#### Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals

Today I’ll be talking about an excellent book, which was recommended on several “quant” blogs I read: **Evidence-Based Technical Analysis** by David Aronson. One of the main reasons I picked this book is that *it teaches you to fish* (instead of *giving you a fish*). If you’re after a book of great trading strategies or indicators, this might not be the ideal one. However, if you want to learn about **strategy testing and methodology**, it’s probably a great addition to your trading library. It had been on my list for a while and I wish I’d read it earlier, as it has the potential to add cornerstone methods to trading research and testing procedures. Read on for a summary, with a review right at the end…

### Intro

One of the early quotes from the book defines the concept it covers:

The scientific method is the only rational way to extract useful knowledge from market data and the only rational approach for determining which TA methods have predictive power. I call this evidence-based technical analysis (EBTA).

Aronson introduces early on the distinction between **objective** and **subjective** TA. An objective claim is a meaningful proposition that can be unambiguously verified. For us mechanical trading system developers, that means a set of rules that can be back-tested. Subjective technical analysis, on the other hand, consists of approaches such as Elliott Wave analysis.

However, objective technical analysis is not sufficient on its own: you still need **rigorous statistical inference** to draw conclusions about its predictive power.

### Part One: the Foundations

Part one of the book establishes the methodological, philosophical, psychological and statistical foundations of EBTA.

The first topic covered is the need for **benchmarking** when evaluating **objective** rules. It introduces the concept of detrending, which I have previously discussed.
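To make the idea concrete, here is a minimal sketch of detrending as I understand it: subtract the average daily return of the instrument from each day's return, so that a rule with a long or short bias is not rewarded simply for riding the market's trend. The function names and the sample data are mine, purely for illustration.

```python
from statistics import mean

def detrend(returns):
    """Subtract the average daily return so the series has a zero mean."""
    avg = mean(returns)
    return [r - avg for r in returns]

def rule_returns(positions, returns):
    """Daily rule returns: position (+1 long, -1 short, 0 flat) times market return."""
    return [p * r for p, r in zip(positions, returns)]

# Made-up daily returns of an up-trending instrument.
daily_returns = [0.010, -0.004, 0.007, 0.002, -0.003]
always_long = [1] * len(daily_returns)

# An "always long" rule has no timing skill, yet it looks profitable on raw data.
raw = mean(rule_returns(always_long, daily_returns))
# On detrended data the same rule earns nothing (up to rounding).
detrended = mean(rule_returns(always_long, detrend(daily_returns)))
```

The always-long rule shows a positive average return on the raw series and essentially zero on the detrended one, which is the benchmark you want for a rule with no predictive power.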

The second topic deals with cognitive psychology and gives examples of different types of behavioral biases that can fool us and make us believe in subjective technical analysis:

- Pattern recognition
- Confirmation bias
- Hindsight bias
- Over-confidence
- Illusory correlations
- Mis-perception of randomness

The antidote for these “mind traps” is the **scientific method**. The generic scientific method is covered in the third chapter, along with some history and philosophy of science and logical reasoning. The scientific method – which can and should be applied to Technical Analysis – has five stages:

- Observation
- Hypothesis
- Prediction
- Verification
- Conclusion

Subjective TA does not conform to the scientific method. The author presents an interesting study in which a subjective TA pattern (Head and Shoulders) is objectified to make it testable. It shows that Head and Shoulders is worthless on stocks and of doubtful value on currencies.

### Statistical Analysis of Back-Test Results

The next three chapters introduce and cover **statistical analysis**. The beginning of this part gives a good refresher on statistical inference, starting with concepts such as frequency distribution, standard deviation, probabilities and p-values. The example of sampling and statistical inference using beads in a box makes for a good illustration and a fairly clear parallel with the world of trading rules back-testing.

The book moves on to concepts such as hypothesis testing, statistical significance and confidence intervals, and how they relate to rule testing.

One of the main issues with back-testing results is that they represent only **one** sample of how the system/rule(s) perform. Aronson presents the classical statistical approach to deriving the sampling distribution (required to perform the statistical inference) from a single sample. However, this assumes normality of the distribution, which is unlikely to hold for financial data.
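As a rough illustration of the classical approach, the sketch below estimates the standard error of the mean daily return from the single back-test sample and converts the resulting z-score into a one-sided p-value using a normal distribution. The function name is hypothetical, and the normal approximation is precisely the assumption being questioned above.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def classical_p_value(returns):
    """One-sided p-value for H0 'mean return <= 0', via a normal approximation."""
    n = len(returns)
    standard_error = stdev(returns) / sqrt(n)   # spread of the sample mean
    z = mean(returns) / standard_error          # distance from zero in standard errors
    return 1.0 - NormalDist().cdf(z)            # chance of a z this large under H0

# Made-up daily rule returns, for illustration only.
sample = [0.004, -0.002, 0.003, 0.001, -0.001, 0.005, -0.003, 0.002]
p = classical_p_value(sample)
```

A small p-value suggests the mean return is unlikely to be a fluke, but only to the extent that the normality assumption holds.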

### New Scientific Methods for Back-Testing

This last concept leads to the introduction of the two alternative methods to derive the sampling distribution and perform statistical inference on the back-tested results. These are two computer-based methods:

- The **Bootstrap**
- The **Monte Carlo permutation**

Both methods estimate the sampling distribution by randomly resampling (reusing) the original sample of observations. A test statistic is then computed for each resample.

In practice, the bootstrap method uses resampling with replacement of the daily strategy returns to generate numerous random test statistics used to approximate a sampling distribution.
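A minimal sketch of the bootstrap test might look like the following. It assumes the test statistic is the mean daily return and that the returns are first shifted to a zero mean so that the resamples represent the "no predictive power" hypothesis. That centering step and all names are my assumptions for illustration, not a transcription of the book's procedure.

```python
import random

def bootstrap_p_value(rule_returns, n_resamples=2000, seed=0):
    """Estimate how often a zero-mean version of the rule matches the observed mean."""
    rng = random.Random(seed)
    n = len(rule_returns)
    observed = sum(rule_returns) / n
    # Shift the returns so that their mean is zero (the null hypothesis).
    centered = [r - observed for r in rule_returns]
    at_least_as_good = 0
    for _ in range(n_resamples):
        # Draw n daily returns with replacement and compute their mean.
        resample = rng.choices(centered, k=n)
        if sum(resample) / n >= observed:
            at_least_as_good += 1
    return at_least_as_good / n_resamples

# Made-up daily rule returns, for illustration only.
data_rng = random.Random(42)
example_returns = [data_rng.gauss(0.0005, 0.01) for _ in range(250)]
p_bootstrap = bootstrap_p_value(example_returns)
```

The returned fraction is the estimated p-value: the share of resampled means that were at least as high as the back-tested mean.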

The Monte Carlo permutation method achieves the same result by decoupling the position direction (i.e. long or short) from the daily instrument returns and randomly re-pairing them.
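The permutation idea can be sketched as follows, assuming daily positions coded as +1 (long) or -1 (short) and the mean daily return as the test statistic. The names are hypothetical and this is only an outline of the technique.

```python
import random

def permutation_p_value(positions, market_returns, n_permutations=2000, seed=0):
    """Share of random position/return pairings that do at least as well as the rule."""
    rng = random.Random(seed)
    n = len(market_returns)
    observed = sum(p * r for p, r in zip(positions, market_returns)) / n
    shuffled = list(positions)
    at_least_as_good = 0
    for _ in range(n_permutations):
        # Break any link between the rule's positions and the market's returns.
        rng.shuffle(shuffled)
        permuted = sum(p * r for p, r in zip(shuffled, market_returns)) / n
        if permuted >= observed:
            at_least_as_good += 1
    return at_least_as_good / n_permutations

# Made-up example: a rule that is always on the right side of the market.
market = [0.01 * (i % 5 + 1) * (1 if i % 2 == 0 else -1) for i in range(40)]
perfect_positions = [1 if r > 0 else -1 for r in market]
p_perfect = permutation_p_value(perfect_positions, market)
```

Note that a rule which never changes position gets a p-value of 1 under this test, since every shuffle reproduces exactly the same pairing: the test only measures timing information.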

Using the statistical inference covered in earlier chapters, one can decide whether results found in the back-test are statistically significant or the product of random chance.

**These two methods are the main take-away from the book**, as they are valuable to identify the degree of randomness in a back-tested rule. This should probably be part of a standard trading system research methodology and I will cover these two methods in more detail in later posts.

### On Data Mining

The methods above only deal with one rule/back-test. However, we rarely test a single rule in isolation. Most back-testing covers multiple parameter values, rules and combinations to try and identify the best performing ones: this is **data mining**.

It is, however, wrong to expect the future performance of the best performing systems to stay in line with their past, back-tested results. The best performing systems might have intrinsic value, but some of their over-performance is due to **random variations**. If you run 1,000 different rules with no predictive power, each result will still depart from the zero mean by some random amount. The **“most lucky” rule** will sit furthest to the right of the zero mean (and will therefore be picked up by the data miner), despite having no **intrinsic value**.
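A quick simulation illustrates the point. The sketch below assumes each worthless rule produces daily returns drawn from a zero-mean normal distribution (a simplification of mine, as are the names and parameter values) and reports the mean daily return of the luckiest rule.

```python
import random

def best_of_random_rules(n_rules=1000, n_days=250, daily_vol=0.01, seed=0):
    """Mean daily return of the luckiest among n_rules rules with no predictive power."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_rules):
        # Each rule's true expected return is exactly zero.
        mean_return = sum(rng.gauss(0.0, daily_vol) for _ in range(n_days)) / n_days
        best = max(best, mean_return)
    return best

best = best_of_random_rules()
```

Even though every rule is pure noise, the best of the 1,000 shows a clearly positive mean return, and that figure is what a naive data miner would report.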

Data mining introduces a **bias**, which overstates the value of the “best” rule compared to expected random variations. The data mining bias is linked to several factors:

- Increases with the number of rules back-tested
- Decreases with sample size used in back-testing
- Decreases with the correlation of back-tested rules results
- Increases with the frequency of outliers in the back-test sample
- Decreases with the variation in back-tested returns among rules considered
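The correlation factor in the list above can be explored with a small simulation. This sketch assumes worthless rules whose daily returns share a common component, so that any two rules have pairwise correlation `rho`. The model, the names and the parameter values are my own illustration, not the book's experiment.

```python
import random
from math import sqrt

def average_best_mean(rho, n_rules=100, n_days=50, n_trials=20, seed=0):
    """Average 'best rule' mean return when all rules are worthless and correlated."""
    rng = random.Random(seed)
    w_common, w_own = sqrt(rho), sqrt(1.0 - rho)
    total = 0.0
    for _ in range(n_trials):
        # Component shared by every rule in this trial.
        common = [rng.gauss(0.0, 1.0) for _ in range(n_days)]
        best = float("-inf")
        for _ in range(n_rules):
            # Each rule adds its own independent noise to the shared component.
            rule_mean = sum(
                w_common * c + w_own * rng.gauss(0.0, 1.0) for c in common
            ) / n_days
            best = max(best, rule_mean)
        total += best
    return total / n_trials

# True expected return is zero everywhere, so the result is pure data mining bias.
bias_by_rho = {rho: average_best_mean(rho) for rho in (0.0, 0.9, 1.0)}
```

With uncorrelated rules the best result sits well above zero, while with perfectly correlated rules all results are identical and the average bias shrinks towards zero.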

This is illustrated with examples and charts. The rest of the chapter concentrates on methods to reduce/correct for the data mining bias and adapts the bootstrap method (using *White’s reality check*) and the Monte Carlo permutation for use in “data mining” mode (instead of single rule testing).
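To give a flavour of the “data mining” mode, here is a simplified sketch in the spirit of White's reality check. Each rule's returns are centered on zero, the same randomly drawn days are used for every rule (to preserve the correlation between rules), and the best resampled mean is compared with the best back-tested mean. It uses a plain bootstrap of individual days, which is a simplification on my part, and the names are hypothetical.

```python
import random

def reality_check_p_value(rule_returns, n_resamples=1000, seed=0):
    """rule_returns: one list of daily returns per rule, all of the same length."""
    rng = random.Random(seed)
    n_days = len(rule_returns[0])
    means = [sum(r) / n_days for r in rule_returns]
    observed_best = max(means)
    # Center every rule on zero: under the null hypothesis no rule has an edge.
    centered = [[x - m for x in r] for r, m in zip(rule_returns, means)]
    at_least_as_good = 0
    for _ in range(n_resamples):
        # Use the same resampled days for all rules.
        days = [rng.randrange(n_days) for _ in range(n_days)]
        best = max(sum(r[d] for d in days) / n_days for r in centered)
        if best >= observed_best:
            at_least_as_good += 1
    return at_least_as_good / n_resamples

# Made-up example: 50 worthless rules, 250 days each.
data_rng = random.Random(1)
noise_rules = [[data_rng.gauss(0.0, 0.01) for _ in range(250)] for _ in range(50)]
p_best_rule = reality_check_p_value(noise_rules, n_resamples=200)
```

The key difference from the single-rule test is that the benchmark is the distribution of the *best* of many zero-mean rules, not the distribution of one rule's mean.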

In conclusion, data mining is a valid method to discover the best rule(s) but the researcher should ensure that the results are statistically significant to avoid the risk of discovering “most lucky” rules.

### A Tour of the EMH and Application of Methods

The following chapter deals with the **Efficient Market Hypothesis**, which takes a bit of a beating from the author. The main point is that, from both an empirical and a theoretical point of view, the EMH contains flaws, which supports the idea of **successful TA**.

The last part of the book presents a diverse set of rules and parameters (6,402 combinations) and attempts to test for their statistical significance. The rules are fairly simple and the results do not highlight significant predictive power in any rule.

### Review Conclusion

This book is a very interesting read, on the long side, with 450+ pages. Even though I enjoyed it throughout, I sometimes found myself wishing the author had not expanded so much on some introductory topics (the history and philosophy of science is quite interesting but could well be skim-read to get to the “juicier” parts quicker). If you’re in a rush, I’d advise concentrating on chapters 4, 5 and 6, where the actual bootstrap and Monte Carlo methods get presented and discussed. The discussion of data mining bias is also interesting and very relevant. For a reader new to these concepts, the initial chapters provide a comprehensive introduction to the foundational concepts of scientific reasoning and statistical analysis before putting them all together in application.

For more info, some of the reviews on Amazon are quite insightful (mostly positive – although the book’s got its share of 1-star reviews). There is also a companion website to the book with more info and detailed results of the tests performed in the last part of the book.

Josh // Aug 5, 2010 at 3:22 pm

Hi Jez

“Decreases with the correlation of back-tested rules results.”

Are you saying that while more rules increases the bias, if they are highly correlated with each other this increase is mitigated?

Jez // Aug 5, 2010 at 3:55 pm

Correct (well I am just repeating what Aronson says… ;-):

with a lot of uncorrelated (different) results, there is a higher probability that one set of results will err on the very lucky/over-performance side of things

If you think about the extreme example of 1,000 rule test results all perfectly correlated (i.e. identical results), there is no data mining bias (i.e. the data mining process did not discover a rule over-performing all others purely by chance: they are all the same)

Josh // Aug 5, 2010 at 6:56 pm

That’s what I thought, just seeking some confirmation… well that little quote has got me thinking about some possible new ways to evaluate the robustness of systems.

Perhaps taking the average pairwise correlation for a set of rules to adjust the t-stat? Or number of rules multiplied by ( 1 – absolute of average pairwise correlation ) to get your “adjusted” number of independent rules?

Does Aronson go into the maths of how to adjust the bias for the rules’ correlations?

Josh // Aug 5, 2010 at 7:01 pm

And yes, 1000 rules all with a pairwise correlation of 1 would be completely equivalent to having just one rule… somewhat related, have you read a paper on SSRN titled “Strategy Distinctiveness and Hedge Fund Performance”? Google it if you haven’t.

Andrew // Aug 6, 2010 at 4:04 am

I have this book and can thoroughly recommend it. I have implemented Aronson’s ideas in R scripts and Octave C++ functions and they now form the backbone of my back testing methodology. I have learned from using these tests that very few indicators etc. actually have any statistical validity and I comfort myself with the fact that I now have the knowledge not to risk my money on such dubious TA.

Troy S. // Aug 6, 2010 at 8:18 am

Excellent review Jez, looking forward to your posts on Monte Carlo and bootstrapping.

Jez // Aug 6, 2010 at 3:07 pm

@Josh:

Aronson does not really get into a mathematical/theoretical explanation of each factor in data mining bias. Rather, he presents results based on computerized simulations using some artificial rules where he can control each factor. The results are presented in a chart that shows that data mining bias drops slowly for correlations between 0 and 0.8 and then more drastically past the 0.8 mark (or thereabouts). The more rules/systems being tested, the higher the correlation threshold for a big drop in data mining bias (i.e. at 10 rules tested it starts dropping more heavily at 0.7, whereas for 1,000 rules it drops past 0.95).

Your idea of adjusting the t-stat based on rule correlation sounds good. However, Aronson does not go in that direction; rather, he describes how to adapt the bootstrap and Monte Carlo methods to account for data mining bias.

Thanks for the paper suggestion – I’ll take a look

@Andrew

I do also feel that some of Aronson’s concepts and methods will be incorporated in my standard testing methodology. As I say in the post, this book teaches you to fish!…

Will need to code that up as you did in R and Octave C++ (although I still haven’t learnt these tools and might go a different IT implementation route…)

Josh // Aug 6, 2010 at 7:21 pm

The bias dropping slowly until 0.8 would make sense mathematically, since r square is correlation squared, so it following a power law makes sense.

Ultimately, it sounds like six of one and half a dozen of the other. Hundreds of highly correlated rules with low data mining bias versus hundreds of uncorrelated rules with high bias: it seems like the net edge would be zero. Of course, if the rate at which the bias drops is non-linear relative to the rate at which the correlation increases, then there might be a “sweet spot”.

Interesting stuff, picking up Aronson’s book has been on my todo list for a while.

Bevan // Nov 3, 2010 at 4:31 pm

Very amusing reading some of the loony reviews of the book on Amazon, seem to be a few grudges about. It’s amazing how worked up people get about TA!

Genome // Nov 14, 2010 at 10:46 am

Hi Andrew, after all your testing what are the few indicators worth using?

Cheers.

Andrew // Nov 15, 2010 at 5:21 pm

@Genome

The indicators which I think have some value are those based on Digital Signal Processing concepts such as low lag filters, high frequency filters, measurements of periodicity and adaptive indicators. Fixed length parameters don’t cut the mustard for me.

Albert // Jan 3, 2012 at 10:56 pm

Good book, but (and a rather big “BUT”) there is not a word about “overfitting”, as if it were not an issue at all.

Baz // Sep 11, 2013 at 12:17 pm

I realise I am coming to this a little late, but on the topic of over-fitting the one piece of advice seems to be that rather than trying to maximise one criterion such as profit you should try to maximise the robustness of your results.

So if your max profit comes from one spike for one set of parameters, whereas all the other parameters in the neighbourhood give rise to losses, then it’s likely that your strategy is not robust and will not perform out of sample. So it’s better to search for the highest profit plateau, even if this is not as high as the highest profit spike.

Problem is the number of parameters in even a fairly simple moving average strategy tends to be large once you start adding filtering/stops/slippage etc. Does anyone know a good way to visualize such high dimensional problems in a lower dimensional space? It can be done numerically but you lose the intuition.

potla // Sep 30, 2013 at 6:56 am

Good work, keep it up.