This is really a post where 1 picture = 1,000 words, so please consider the datasets charted below:

Pretty different graphs, right?

Yet you might be surprised to hear that **they are all identical** in terms of common summary statistics: mean, variance, (Pearson) correlation and linear regression line.

Here are the figures (identical to two or three decimal places):

Property | Value |
---|---|
Mean of x in each case | 9.0 |
Variance of x in each case | 11.0 |
Mean of y in each case | 7.5 |
Variance of y in each case | 4.12 |
Correlation between x and y in each case | 0.816 |
Linear regression line in each case | y = 3 + 0.5x |

Details of all dataset points are at the bottom of this post if you fancy double-checking this yourself.
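If you would rather let a computer do the double-checking, here is a minimal sketch (Python standard library only) that recomputes the figures above from the appendix data. It assumes the sample variance (dividing by n - 1) and an ordinary least-squares fit; the helper names are my own.

```python
from statistics import mean, variance

# Anscombe's quartet, copied from the appendix table of this post.
X_123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0] * 7 + [19.0] + [8.0] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ols_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def summary(xs, ys):
    """The summary statistics quoted in the table above."""
    intercept, slope = ols_line(xs, ys)
    return {
        "mean_x": mean(xs), "var_x": variance(xs),   # sample variance (n - 1)
        "mean_y": mean(ys), "var_y": variance(ys),
        "r": pearson(xs, ys), "intercept": intercept, "slope": slope,
    }

for name, (xs, ys) in QUARTET.items():
    s = summary(xs, ys)
    print(f"{name:>3}: " + ", ".join(f"{k}={v:.3f}" for k, v in s.items()))
```

All four lines of output agree to the precision shown in the table, even though the scatter plots look nothing alike.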

Credit for this amusing and quite extraordinary illustration goes to statistician Francis Anscombe, who constructed these four datasets, known as **Anscombe’s Quartet**, in 1973 (more info on Wikipedia).

I believe Anscombe’s main point was to show that summary statistics *can be* misleading (a fact greatly abused by today’s media?), and also that outliers can have a strong impact on statistical properties. Both points can surely affect your trading system design and the conclusions you draw from testing.
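To see the outlier effect in numbers, here is a small sketch using dataset III from the appendix: it computes the Pearson correlation with and without the single off-line point at (13.0, 12.74). The helper names are mine, and only the Python standard library is assumed.

```python
from statistics import mean

# Dataset III from the appendix: ten points on a near-perfect line plus one outlier.
xs = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
ys = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

def pearson(a, b):
    """Pearson product-moment correlation coefficient."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5

def drop_index(values, i):
    """Copy of `values` without the element at position i."""
    return values[:i] + values[i + 1:]

outlier = ys.index(max(ys))  # position of the (13.0, 12.74) point
r_all = pearson(xs, ys)
r_clean = pearson(drop_index(xs, outlier), drop_index(ys, outlier))
print(f"with outlier: {r_all:.3f}, without: {r_clean:.4f}")
```

Without the outlier, the remaining ten points sit almost exactly on a straight line and the correlation is very close to 1; that single point is enough to drag it down to 0.816.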

### Exploratory Data Analysis

Anscombe’s quartet is often used to highlight the importance of exploring data graphically before analysing it. This concept is behind an area of statistics known as **Exploratory Data Analysis**. From Wikipedia:

> Exploratory data analysis (EDA) is an approach to analysing data for the purpose of formulating hypotheses worth testing, complementing the tools of conventional statistics for testing hypotheses. It was so named by John Tukey to contrast with Confirmatory Data Analysis, the term used for the set of ideas about hypothesis testing, p-values, confidence intervals etc. which formed the key tools in the arsenal of practising statisticians at the time.

The objectives of EDA are to:

- Suggest hypotheses about the causes of observed phenomena
- Assess assumptions on which statistical inference will be based
- Support the selection of appropriate statistical tools and techniques
- Provide a basis for further data collection through surveys or experiments

In this interview, David Harding, of Winton Capital Management, stresses the importance of research and statistical data analysis at his company. The two books he highlights, *Understanding Robust and Exploratory Data Analysis* and *Nonparametric Statistical Inference*, both cover Exploratory Data Analysis.

### Correlation

Another topic that Anscombe’s quartet illustrates is correlation (and non-linearity).

The most common measure of correlation is the **Pearson product-moment correlation coefficient**. It only measures the **linear** dependence between two variables: if the relationship between them is non-linear, the Pearson coefficient can understate it or miss it entirely.

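To illustrate this, consider Anscombe’s top-right dataset (dataset II in the appendix), which exhibits an almost perfect functional (curved) relationship between x and y. A good measure of dependence should flag this, yet the Pearson coefficient is only 0.816, the same as for the much noisier dataset I.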
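A quick way to confirm that dataset II is essentially a parabola is to fit a quadratic by least squares and look at the R² of the fit. The sketch below solves the 3x3 normal equations with Cramer's rule to stay within the standard library (with NumPy, `numpy.polyfit(xs, ys, 2)` would do the same job); the function names are my own.

```python
from statistics import mean

# Dataset II from the appendix (the curved one).
xs = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
ys = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x^2 via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                  # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y*x^k
    a = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(a)
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace column `col` with the right-hand side t.
        m = [row[:] for row in a]
        for i in range(3):
            m[i][col] = t[i]
        coeffs.append(det3(m) / d)
    return coeffs

def r_squared(xs, ys, coeffs):
    """Share of the variance of y explained by the fitted quadratic."""
    c0, c1, c2 = coeffs
    my = mean(ys)
    ss_res = sum((y - (c0 + c1 * x + c2 * x * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

coeffs = fit_quadratic(xs, ys)
print("quadratic coefficients (c0, c1, c2):", [round(c, 4) for c in coeffs])
print("R^2 of quadratic fit:", round(r_squared(xs, ys, coeffs), 5))
```

The quadratic explains virtually all of the variance (R² very close to 1), whereas the straight line only explains about two thirds of it (0.816² ≈ 0.67).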

Consider the following more extreme case (of the form y = (x-a)^2 + b, with x spread symmetrically around a), where a **100% relationship** translates into a **zero** linear/Pearson **correlation**:
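The zero comes from symmetry: when x is spread evenly around a, every point on the left branch of the parabola is matched by one on the right branch, so their positive and negative contributions to the covariance cancel out. A minimal sketch with the hypothetical values a = 3 and b = 1:

```python
from statistics import mean

def pearson(u, v):
    """Pearson product-moment correlation coefficient."""
    mu, mv = mean(u), mean(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    return cov / (sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v)) ** 0.5

# Hypothetical parameters: vertex at a = 3, offset b = 1.
a, b = 3, 1
xs = [float(x) for x in range(a - 5, a + 6)]   # -2.0 .. 8.0, symmetric around a
ys = [(x - a) ** 2 + b for x in xs]              # y is fully determined by x

print(pearson(xs, ys))  # 0.0
```

Knowing x tells you y exactly, yet the linear correlation is zero. Sampling x asymmetrically (say only x > a) would give a strongly positive coefficient instead, which is another reason not to read too much into a single correlation figure.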

Taleb is probably a bit extreme when he says:

> Anything that relies on correlation is charlatanism.

Still, applying correlation to market data, which is typically described as non-linear, has its pitfalls.

Not long ago I was re-reading Ralph Vince’s paper on the Leverage Space Model, in which he describes the **fallacy of correlation**:

> Correlation fails when you are counting on it the most – at the (fat) tails of the distribution. The point is evident throughout this study; big moves in one market amplify the correlation between other markets, and vice versa.

In the paper, he explains that correlation between two instruments becomes much stronger during periods of large standard-deviation moves, which is precisely when diversification (i.e. low correlation) would be needed most.
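If you want to look for this effect in your own data, one simple approach (my own sketch, not Vince's method) is to split the sample into "calm" and "large-move" days based on the size of one instrument's return, and compute the correlation separately in each regime. The function name, the k-standard-deviation threshold and the sample returns are all hypothetical. Bear in mind that conditioning on large moves can by itself change the measured correlation even when the underlying relationship is stable, so treat the output as a prompt for further digging rather than proof.

```python
from statistics import mean, stdev

def pearson(a, b):
    """Pearson product-moment correlation coefficient."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5

def split_correlation(ret_a, ret_b, k=1.0):
    """Hypothetical helper: correlation on calm days vs large-move days.

    A day counts as a large move when |ret_a| exceeds k sample standard
    deviations of ret_a. Returns (calm_correlation, large_move_correlation).
    """
    threshold = k * stdev(ret_a)
    calm = [(x, y) for x, y in zip(ret_a, ret_b) if abs(x) <= threshold]
    large = [(x, y) for x, y in zip(ret_a, ret_b) if abs(x) > threshold]
    if len(calm) < 3 or len(large) < 3:
        raise ValueError("not enough observations in one of the regimes")
    return pearson(*zip(*calm)), pearson(*zip(*large))

# Made-up illustrative returns (not market data): the small moves are
# unrelated, while the large moves line up exactly.
a = [1, -1, 1, -1, 10, -10, 12, -12]
b = [-1, 1, 1, -1, 10, -10, 12, -12]
print(split_correlation(a, b))  # (0.0, 1.0)
```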

### RIP Benoît Mandelbrot (20/11/1924 – 14/10/2010)

Finally, I would like to pay tribute to Benoît Mandelbrot, who passed away last week.

Mandelbrot has been dubbed “The Father of Fractals” and spent a big part of his life trying to understand how markets work. Looking at market data through the prism of **fractal geometry**, he tried to uncover non-linear, chaotic relationships in the data.

I’ll reiterate my recommendation to read his book, *The (Mis)Behavior of Markets*, one of the most enjoyable rebuttals of the Efficient Market Hypothesis. Although I have not (yet) found a practical application of the principles described in his book and papers, Mandelbrot’s attempt to define a new paradigm for understanding markets is interesting. The book also reinforced my belief in, and understanding of, Trend Following (despite not covering this topic explicitly).

One of his main findings is that price changes in financial markets do not follow a normal (Gaussian) distribution but rather a fat-tailed one (Lévy or Paretian distributions, also called power-law distributions). This may be one of the main reasons Trend Following works (at least, that’s my interpretation).
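A crude way to check your own return series for fat tails is to count how often moves larger than k standard deviations occur and compare that with what a Gaussian distribution predicts (about 0.27% of observations beyond 3 standard deviations). The sketch below is my own illustration with a hypothetical function name. Note that the extreme moves also inflate the estimated standard deviation, so this simple test tends to understate how fat the tails really are.

```python
from statistics import NormalDist, mean, pstdev

def tail_frequency(returns, k=3.0):
    """Hypothetical helper: fraction of returns more than k standard deviations
    from the mean, next to the fraction a normal distribution would predict."""
    mu, sigma = mean(returns), pstdev(returns)
    observed = sum(abs(r - mu) > k * sigma for r in returns) / len(returns)
    expected = 2 * (1 - NormalDist().cdf(k))  # two-sided Gaussian tail probability
    return observed, expected

# Hypothetical series (not market data): 99 flat days and one 10-unit jump.
obs, exp = tail_frequency([0.0] * 99 + [10.0])
print(f"observed {obs:.4f} vs Gaussian {exp:.4f}")
```

On real daily price changes you would feed in a long return series; an observed frequency several times the Gaussian figure is the kind of evidence Mandelbrot pointed to.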

I might actually pick up that book again and read it soon (and write a summary post on it).

### Appendix: Dataset point coordinates

x (I) | y (I) | x (II) | y (II) | x (III) | y (III) | x (IV) | y (IV) |
---|---|---|---|---|---|---|---|
10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |

Pretorian // Oct 20, 2010 at 9:17 am

I think the main “practical” contribution of Fractal Geometry is that it gives a formal explanation for something that has been empirically proven (Trend Following), in contrast to an idealized theory (EMH) that has been an incredible disaster in the real world: VaR, Gaussian Copulas, LTCM, Delta Hedging and so on. You have said that TF works because of fat tails and autocorrelation, phenomena that Mandelbrot showed also occur in nature: the Noah and Joseph effects.

I am so sorry about Mandelbrot, I think he is a real legend. RIP

prazor // Oct 21, 2010 at 7:45 am

Interesting article and references!

A comment on the quote:

“Correlation fails … when correlation between two instruments becomes much stronger during periods of large standard-deviation moves…”

Yes, but correlation is still helpful in the sense that when overall correlation increases, it’s time to adjust the exposure.

Also, thought you would like to know: the link to the interview with Harding has a small error.

prazor

Jez Liberty // Oct 21, 2010 at 8:25 am

@prazor – thanks for the heads up on the link. Fixed now.

Re: correlation, I think it can still provide some value but in a loose, un-optimized fashion…

Adaptive-Trading-Systems // Oct 22, 2010 at 12:18 am

Nice article Jez. It serves as a reminder to maintain a bird’s-eye view when developing trading systems.

James