
The second method to evaluate the statistical significance of a back-test result presented by Aronson (in EBTA) is the Monte Carlo Permutation. This is an extension of the classic Monte Carlo method, applied to rule testing.
The concept behind the Monte Carlo Permutation is similar to the Bootstrap method:
- Generate multiple random outputs based on the single sample data from the back-test.
- Compare the random Monte Carlo outputs to the back-test output to evaluate its statistical significance.
The difference lies in how the multiple random outputs are generated. Whereas the bootstrap generates a sampling distribution for the back-tested rule return, the Monte Carlo Permutation focuses on the pairing of the rule positions with the instrument daily return. Its resampling randomly associates the rule positions with the market returns, without replacement.
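To make the "with replacement" versus "without replacement" distinction concrete, here is a minimal sketch using only the Python standard library. The list `daily_values` and the seed are illustrative assumptions, not part of Aronson's text.

```python
import random

# Illustrative daily values in percent (assumed numbers for this sketch).
daily_values = [0.54, -0.32, 1.54, 0.69, -1.02, -0.68, 1.20, -2.50]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Bootstrap-style resample: draw WITH replacement, so some values
# can appear several times and others not at all.
bootstrap_sample = rng.choices(daily_values, k=len(daily_values))

# Permutation: reorder WITHOUT replacement, so every value appears
# exactly once and only its position in the series changes.
permuted = rng.sample(daily_values, k=len(daily_values))

print("bootstrap:", bootstrap_sample)
print("permuted: ", permuted)
```

The permuted series always contains exactly the same values as the original, which is why the permutation test says something about the pairing rather than about the distribution of returns.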
The null hypothesis (H0) in the Monte Carlo Permutation test asserts that the returns of the rule evaluated are a sample from a non-profitable population or, in other words, that the pairing of rule positions with market returns is no better than random.
Monte Carlo Illustration
Imagine the following back-test result, presented day by day:
| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Rule Position | Long | Long | Long | No Pos | No Pos | Short | Short | Short |
| Market Return | 0.54% | -0.32% | 1.54% | 0.69% | -1.02% | -0.68% | 1.20% | -2.50% |
| Output | 0.54% | -0.32% | 1.54% | 0.00% | 0.00% | 0.68% | -1.20% | 2.50% |
| Mean Return | 0.47% | | | | | | | |
There are effectively two input time series:
- Rule Positions
- Market Returns
The way these two time series are linked (by date) produces the daily output for the rule return – and a mean return can be calculated.
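The arithmetic behind the table can be sketched in a few lines of Python. Encoding the positions as +1 (long), 0 (no position) and -1 (short) is an assumption made for this sketch, as are the helper names `rule_outputs` and `mean`.

```python
# Rule positions encoded as +1 (long), 0 (no position), -1 (short).
positions = [1, 1, 1, 0, 0, -1, -1, -1]

# Market returns from the table above, in percent.
market_returns = [0.54, -0.32, 1.54, 0.69, -1.02, -0.68, 1.20, -2.50]


def rule_outputs(positions, market_returns):
    """Daily rule return: the position multiplied by that day's market return."""
    return [p * r for p, r in zip(positions, market_returns)]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


outputs = rule_outputs(positions, market_returns)
mean_return = mean(outputs)
print(f"{mean_return:.2f}%")  # prints 0.47%
```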
The Monte Carlo Permutation reshuffles one of the two time series to produce random pairings, and therefore a different rule output.
Two examples can be found below. The market return time series has been randomly reshuffled to produce two different sample outputs:
| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Rule Position | Long | Long | Long | No Pos | No Pos | Short | Short | Short |
| Market Return | 1.20% | 0.17% | 0.54% | 1.54% | -0.32% | -0.68% | -0.33% | -1.02% |
| Output | 1.20% | 0.17% | 0.54% | 0.00% | 0.00% | 0.68% | 0.33% | 1.02% |
| Mean Return | 0.49% | | | | | | | |
| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Rule Position | Long | Long | Long | No Pos | No Pos | Short | Short | Short |
| Market Return | -0.68% | -0.32% | 1.20% | -0.33% | 0.17% | 0.54% | 0.69% | -2.50% |
| Output | -0.68% | -0.32% | 1.20% | 0.00% | 0.00% | -0.54% | -0.69% | 2.50% |
| Mean Return | 0.18% | | | | | | | |
The Monte Carlo Permutation produces a large number of these random outputs. The p-value of the original back-test can then be computed: it is the fraction of random rule returns that are greater than or equal to the back-tested rule return.
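As a rough sketch of the whole single-rule test, the following Python reshuffles the market returns many times and counts how often the random mean matches or beats the back-tested mean. The function names, the +1/0/-1 position encoding, the number of permutations and the seed are all assumptions for illustration, not Aronson's or Masters' code.

```python
import random


def mean_rule_return(positions, market_returns):
    """Mean daily rule return for one pairing of positions and market returns."""
    total = sum(p * r for p, r in zip(positions, market_returns))
    return total / len(positions)


def permutation_p_value(positions, market_returns, n_permutations=5000, seed=0):
    """Fraction of random pairings whose mean return is >= the back-tested mean."""
    rng = random.Random(seed)
    observed = mean_rule_return(positions, market_returns)
    shuffled = list(market_returns)  # copy, so the caller's list is untouched
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # random pairing, without replacement
        if mean_rule_return(positions, shuffled) >= observed:
            count += 1
    return observed, count / n_permutations


# Data from the illustration: +1 long, 0 no position, -1 short; returns in percent.
positions = [1, 1, 1, 0, 0, -1, -1, -1]
market_returns = [0.54, -0.32, 1.54, 0.69, -1.02, -0.68, 1.20, -2.50]

observed, p_value = permutation_p_value(positions, market_returns)
print(f"back-tested mean: {observed:.2f}%, p-value: {p_value:.3f}")
```

With only eight days of data the resulting p-value is very coarse; the example is there to show the mechanics, not to draw a conclusion about the rule.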
Note that Aronson once again recommends running the back-test evaluated by the Monte Carlo Permutation on detrended data. He also mentions that Timothy Masters (who came up with the idea of applying the Monte Carlo method to rule testing) has performed tests showing that the bootstrap and Monte Carlo Permutation methods produce similar results when using detrended data.
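For reference, a minimal sketch of detrending, assuming it means subtracting the average daily market return over the test period from each day's return so that the detrended series has a mean of zero. The function name `detrend` is hypothetical.

```python
def detrend(market_returns):
    """Subtract the average daily return so the series has zero mean (assumed definition)."""
    average = sum(market_returns) / len(market_returns)
    return [r - average for r in market_returns]


# Market returns from the illustration, in percent.
market_returns = [0.54, -0.32, 1.54, 0.69, -1.02, -0.68, 1.20, -2.50]
detrended = detrend(market_returns)
```

The back-test would then be run against `detrended` rather than the raw series, so that a rule with a long or short bias is not rewarded or penalised by the market's overall drift.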
Step by Step with Data Mining Bias Handling
Of course, when applying this method to more than one rule, data mining bias comes into play.
The methodology for the Monte Carlo Permutation for data mining back-testing can be broken down as follows:
1. N back-tests are run on detrended data. Both the rule position and market return time series are collected for the back-tested rules.
2. The market return time series is randomly reshuffled and paired with each of the N rule position time series to produce a new daily rule output time series for each rule. The same pairings must be used for all rules to ensure that any correlation structure present among the rules is preserved.
3. A mean daily return is calculated for each of the N rules; the best mean return is selected as the value for the sampling distribution in this iteration.
4. Repeat steps 2 and 3 a large number of times.
5. Form the sampling distribution of the best means generated in the steps above.
6. Derive the p-value of the best back-test mean return based on the sampling distribution.
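The procedure above can be sketched as follows. This is only an illustration under stated assumptions: positions are encoded as +1/0/-1, the three rules in the example are made up, and the function names, permutation count and seed are hypothetical. The key point is that a single reshuffle of the market returns is shared by all rules in each iteration.

```python
import random


def mean_rule_return(positions, market_returns):
    """Mean daily rule return for one pairing of positions and market returns."""
    total = sum(p * r for p, r in zip(positions, market_returns))
    return total / len(positions)


def best_rule_permutation_test(rule_positions, market_returns,
                               n_permutations=2000, seed=0):
    """Permutation test for the best of N rules, sharing each shuffle across rules."""
    rng = random.Random(seed)
    # Best mean return among the N original back-tests.
    observed_best = max(mean_rule_return(p, market_returns) for p in rule_positions)

    shuffled = list(market_returns)
    best_means = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # ONE reshuffle, reused for every rule
        best_means.append(
            max(mean_rule_return(p, shuffled) for p in rule_positions)
        )

    # p-value: fraction of random "best means" >= the back-tested best mean.
    p_value = sum(b >= observed_best for b in best_means) / n_permutations
    return observed_best, p_value, best_means


# Hypothetical example: three made-up rules over eight days.
market_returns = [0.54, -0.32, 1.54, 0.69, -1.02, -0.68, 1.20, -2.50]
average = sum(market_returns) / len(market_returns)
detrended = [r - average for r in market_returns]  # assumed detrending step

rule_positions = [
    [1, 1, 1, 0, 0, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [0, 0, 1, 1, 1, 1, 0, 0],
]

observed_best, p_value, best_means = best_rule_permutation_test(
    rule_positions, detrended
)
print(f"best back-tested mean: {observed_best:.2f}%, p-value: {p_value:.3f}")
```

Because the sampling distribution is built from the best of N random means rather than from a single rule's random means, the bar the winning rule has to clear rises with the number of rules tested.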
Some “Criticisms”
Aronson mentions that since the Monte Carlo Permutation does not test a hypothesis about the rule’s mean return (H0 is about random correlation of positions and market returns) it is not possible to use it to derive confidence intervals – as could be done with the bootstrap sampling distribution.
The method also requires access to more information than the bootstrap (which only needs the daily rule returns): it needs both the rule positions and the market returns. This makes it impossible to apply to “black box” systems or programs. For example, the Monte Carlo Permutation method would not enable us to check the statistical significance of a Trend following Wizard as was done in bootstrap post #2.
The same remark concerning the use of arithmetic mean return instead of geometric mean return applies here also, but that can be easily modified.
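As one possible way of making that change, the statistic computed on each pairing could be swapped for a geometric mean. The sketch below assumes daily returns are expressed in percent and never reach -100%; the function name is hypothetical.

```python
import math


def geometric_mean_return(daily_returns_pct):
    """Geometric mean daily return in percent (assumes every return is > -100%)."""
    growth_factors = [1 + r / 100 for r in daily_returns_pct]
    log_sum = sum(math.log(g) for g in growth_factors)
    return (math.exp(log_sum / len(growth_factors)) - 1) * 100


# Daily rule outputs from the illustration, in percent.
outputs = [0.54, -0.32, 1.54, 0.00, 0.00, 0.68, -1.20, 2.50]
print(f"{geometric_mean_return(outputs):.4f}%")
```

The geometric mean is never higher than the arithmetic mean of the same returns, and the gap widens as the returns become more volatile.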
Finally, the method, as formulated, only considers extremely simple cases of money management with identical size for all positions. The method would need to be adapted to be used for rules with more complex money management strategies.
I’ll let you draw your own conclusions and run your own experiments, but it does seem like the Monte Carlo Permutation method has more weak points than the Bootstrap test.
I have done some comparisons of the t-test and the non-parametric tests described in the Aronson book. Basically I found that the results are so close (even in highly non-Gaussian cases) that the extra CPU involved in the non-parametric tests is simply not worth it.
I have recently stumbled across a forum posting (almost 6 years old!) that discusses permutation testing.
http://www.nuclearphynance.com/Show%20Post.aspx?PostIDKey=20934