Systematic Trading research and development, with a flavour of Trend Following
Au.Tra.Sy blog – Automated trading System

How can Walk-Forward testing keep your system a step ahead?

November 5th, 2009 · 17 Comments · Backtest, Software

Out-of-Sample testing is a necessary practice to avoid curve-fitting during the optimisation of a trading system. Walk-Forward testing improves on the idea of out-of-sample testing and is designed as an on-going, adaptive approach. Its invention is mostly credited to Robert Pardo (read more about it in his book).

The way it works is fairly simple. It is a combination of multiple cycles of “in-sample optimisation” with “out-of-sample verification”.

Background on optimisation and out-of-sample testing

The reason for performing out-of-sample verification tests is to check whether the in-sample optimisation resulted in curve-fitting (over-optimisation) or in a robust selection of system parameters.

If the parameters derived from optimisation perform much worse in the out-of-sample verification test, it most likely means that the parameter values were (over-) optimised for the specific in-sample dataset (curve-fitted). If the system performs similarly, it should mean that the system parameters are robust and validate the approach taken for optimisation.

Walk-Forward Process: how it works

Walk-Forward testing is an on-going, dynamic process to determine whether parameter optimisation merely curve-fits the price data and its noise, or produces statistically valid out-of-sample results. Here is how it works:

Let’s say we have 10 years of data from 1999 to 2009. The optimisation period is three years (in-sample data) and the verification period is one year (out-of-sample data). You start by optimising your system using only the first three years of data – in this example, 1999-2001. Once the system is optimised, record the optimal parameter values and use them in a test on new (out-of-sample) data, starting with 2002.

Figure: Walk Forward from 1999 to 2009

Slide the three-year window of data forward by one year (2000-2002) and perform the same process, this time verifying on 2003. Once you have processed all the data available, you can collate the performance of all the out-of-sample tests and compare it with that of the in-sample optimisation runs. If the comparison shows that the system is sufficiently robust to be traded live, you simply continue the walk-forward process in real time by re-optimising every year.
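
The sliding schedule described above can be sketched in a few lines of Python. This is only an illustration, not Pardo's code: the function name `walk_forward_cycles`, the `Cycle` tuple and the choice to slide the window by the length of the out-of-sample period are assumptions made for this example.

```python
from typing import Iterator, NamedTuple


class Cycle(NamedTuple):
    in_sample: range      # years used for optimisation
    out_of_sample: range  # years used for verification


def walk_forward_cycles(first_year: int, last_year: int,
                        in_sample_years: int = 3,
                        out_of_sample_years: int = 1) -> Iterator[Cycle]:
    """Yield successive optimisation/verification windows.

    Assumption: the window slides forward by the out-of-sample length,
    so the verification periods never overlap.
    """
    start = first_year
    # Stop as soon as the verification window would run past the data.
    while start + in_sample_years + out_of_sample_years - 1 <= last_year:
        split = start + in_sample_years
        yield Cycle(range(start, split),
                    range(split, split + out_of_sample_years))
        start += out_of_sample_years


cycles = list(walk_forward_cycles(1999, 2009))
for c in cycles:
    print(f"optimise {c.in_sample[0]}-{c.in_sample[-1]}, "
          f"verify {c.out_of_sample[0]}-{c.out_of_sample[-1]}")
```

With the 1999-2009 example this gives eight cycles: the first optimises on 1999-2001 and verifies on 2002, the last optimises on 2006-2008 and verifies on 2009.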

In Closing

Walk-Forward is an adaptive process which re-optimises the system on a continuous basis to adapt its parameters to the most recent market conditions.

The premise of performing several optimisation/verification steps over time is that the recent past is a better environment for selecting system parameters than the distant past. This is an assumption you need to weigh when deciding whether to use Walk-Forward testing, but it is a useful tool in your system development arsenal.

As discussed earlier, changing system parameters based on recent market conditions could result in a system chasing its tail. The next post on this topic will be an actual comparison of a basic system’s performance when optimised in a “standard way” and when optimised using “Walk-Forward”.


17 Comments so far ↓

  • Milktrader

    Very nice summary, and I like that cool walk-forward graph that moves.

    If I may, I would like to raise an issue concerning precise terminology. Curve fitting is a process that cannot be avoided, and it is what yields the parameter set we choose to trade. May I suggest that Walk Forward ensures that our curve-fitting process (optimization) is not overdone, or over-fit?

    Bob Pardo (the author of ‘walk-forward’) makes this point in his tome. He has also written that walk-forward is an idiot-proof methodology of determining if a system can be traded, and what results can be expected.

    The next level of consideration is what to do with all this walk-forward data. How do you look at it? How do you know what you have? This has been a bit of a stumbling block for me and is the main reason I’ve turned to R to help me organize data.

    What are we comparing? Net Profit? Drawdown? What ‘efficiency’ of the walk forward is acceptable, and what says your system just plain sucks? I think that you could compile about 20 statistics (mean, standard deviation, etc) and not overdo it.

  • Jez

    Hey Milk,

    I see your point on terminology. I know terminology is important and I don’t want to get hung up on it… ;-)
    In my mind, curve fitting is over-fitting (i.e. the goal of curve fitting is to find the parameter values that most closely match the data – if you do that, you get the parameter set that produces the best back-tested system, returns, etc., but which will probably not be very robust going forward). Thanks for that input anyway – I might want to read Pardo for some clarification.

    Regarding your other points, I have to clarify that I haven’t yet – like you – gotten my hands dirty in running a full-on actual walk-forward test… But I was imagining you would first decide on a “bliss” function (this is what Ed Seykota calls the objective function). The value of your bliss function would drive which parameter set you choose from the optimisation and use in the back-test/verification step of your walk-forward testing.
    After the whole walk-forward test is complete, it could just be a matter of comparing the values of the bliss function in optimisation vs. back-test/verification and deciding on a threshold ratio between the two to judge whether the system is robust enough to hold up in the future.

    I would be interested to see your view on this. I remember you did post a fair bit on objective functions on your blog…

    I’ll surely have more to add when I get started with it in TradersStudio!
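
The threshold comparison suggested in this comment could be sketched as follows. Everything here is an assumption made for illustration: the function names, the sample bliss values and the 0.5 threshold are hypothetical, and the bliss values are taken to be on a comparable scale (for example annualised), since the optimisation and verification periods differ in length.

```python
from statistics import mean


def walk_forward_ratio(in_sample_bliss, out_of_sample_bliss):
    """Ratio of average out-of-sample bliss to average in-sample bliss."""
    if len(in_sample_bliss) != len(out_of_sample_bliss):
        raise ValueError("expected one in-sample and one out-of-sample value per cycle")
    avg_in = mean(in_sample_bliss)
    if avg_in <= 0:
        # A ratio against a zero or negative baseline is not meaningful.
        raise ValueError("average in-sample bliss must be positive")
    return mean(out_of_sample_bliss) / avg_in


def is_robust(in_sample_bliss, out_of_sample_bliss, threshold=0.5):
    """True if out-of-sample results retain at least `threshold` of in-sample bliss.

    The default threshold is a hypothetical choice, not a recommended value.
    """
    return walk_forward_ratio(in_sample_bliss, out_of_sample_bliss) >= threshold


# Hypothetical bliss values, one pair per walk-forward cycle.
in_sample = [2.0, 1.8, 2.2, 2.0]
out_of_sample = [1.2, 0.9, 1.5, 1.0]

ratio = walk_forward_ratio(in_sample, out_of_sample)
print(f"walk-forward ratio: {ratio:.3f}, robust: {is_robust(in_sample, out_of_sample)}")
```

Averaging over cycles is only one option; looking at the ratio cycle by cycle would also show whether a decent average hides a few very poor verification periods.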

  • Milktrader

    Pessimistic Return on Margin (PROM)

    Function FF_PROM2 (MarginRequired As Integer)

    Dim AdjWin As Double
    Dim AdjLoss As Double
    Dim MarginValue As Integer
    Dim PROM As Double

    Dim SlowAve As BarArray
    Dim FastAve As BarArray

    'pessimistic adjustment: count fewer wins and more losses than observed
    AdjWin = wins - Sqr(wins)
    AdjLoss = losses + Sqr(losses)

    MarginValue = MarginRequired

    'note: as posted, this check has no effect because PROM is reassigned below
    If SlowAve < FastAve Then
        PROM = 0
    End If

    'assumes AvgLoss is reported as a negative number, so the losing term reduces PROM
    PROM = ((AdjWin * AvgWin) + (AdjLoss * AvgLoss))/MarginValue

    'this next relationship is what allows the factor to be passed into report
    FF_PROM2 = PROM

    End Function
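
For readers without TradersStudio, the same arithmetic can be sketched in Python. This mirrors the function posted above rather than any official definition; the function name `prom`, its argument names and the sample trade statistics are assumptions for this example, and `avg_loss` is assumed to be a negative number.

```python
import math


def prom(wins, losses, avg_win, avg_loss, margin_required):
    """Pessimistic return on margin, following the comment's formula.

    avg_loss is assumed to be negative (e.g. -200.0 for an average
    loss of 200), so the losing term reduces the result.
    """
    adj_wins = wins - math.sqrt(wins)        # pessimistic: fewer wins
    adj_losses = losses + math.sqrt(losses)  # pessimistic: more losses
    return (adj_wins * avg_win + adj_losses * avg_loss) / margin_required


# Hypothetical trade statistics: 36 wins averaging +500, 64 losses averaging -200.
raw_return = (36 * 500.0 + 64 * -200.0) / 10000.0
pessimistic_return = prom(36, 64, 500.0, -200.0, 10000.0)
print(raw_return, pessimistic_return)
```

In this made-up case the unadjusted return on margin is 0.52, while the pessimistic version counts 30 wins and 72 losses and drops to about 0.06, which shows how strongly the adjustment penalises small samples.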

  • Jez

    Hi Milk – for some reason your last comment was “caught” in my “spam nets”…

    I assume the above PROM formula (from Pardo I believe) would be your objective function to optimise a trading system?

    I have not given much thought to an objective function yet, but I have always imagined having a “big” formula that balances the good and bad aspects of the trading system, ie you would want to maximise:
    - Profit (MAR, CAGR)
    - Sharpe ratio
    etc.
    while you want to minimise:
    - average portfolio heat
    - margin-to-equity ratio
    - drawdowns (ie minimise Ed Seykota’s lake ratio)
    - time in market
    etc.

    One thing I would like to try is to formulate all these aspects and prioritise them in one formula.

    The idea of reducing the good aspects of the trading system (# of wins) and increasing the bad ones (# of losses) in the PROM formula above is quite interesting also – Thanks!

  • Milktrader

    Yes, that is the essence of Pardo’s formula. Another cool idea he has is equity correlation with perfect profit. It shows that a system makes money when the market offers it and may make less when the perfect profit is less.

    Instead of creating a composite fitness function, how about we run walk forward under a group of them to determine which one tells us the most about future profitability? That would be quite an undertaking, but an interesting exercise.

  • Milktrader

    How about we formulate a metric for the effectiveness of a fitness function? Maybe something logarithmic with values between zero and 1. Each fitness function can be treated as its own neural network and graded on its ability to guess the right answer.

    Not sure how to do this though.

  • Jez

    That point about comparing perfect profit with the system performance (equity) would be one interesting way of answering the question “Is the system not performing any more or are the markets not offering any opportunities?” (ie if performance is not good but perfect profit stays good, the system might have an issue)

    Regarding the metric for the effectiveness of a fitness function (I take it you mean objective/bliss function) – I am not convinced this is a valid approach, as I don’t see the objective function’s role as having any “predictive” power (i.e. it should just be a way to express the overall performance of a system under your ideal criteria). Although I see your point that this is the input used to select which parameter sets are carried forward (and therefore some sort of predictability value might be useful)…

    Or maybe I misunderstood your point: would you have an objective function as the output metric and a fitness function trained (by neural nets) to determine the best predictability between the optimisation and verification test (ie the fitness function would be used to work out the best/most robust parameter set between optimisation and walk-forward back-test, ie where the ratio of objective functions is most constant and ideally fairly high)? This sounds a bit complicated.

    I am not so keen on neural networks anyway (black-box approach vs KISS, etc.)

  • Milktrader

    Yes, objective = fitness = bliss function. It is the arbiter of best parameters. And I would suggest they are predictive of the robustness of a system. We probably would doubt the value of Net Profit as a good judge of best parameters. Net Profit/ Max Drawdown is better. What is better than that?

    Perhaps, and this is just a thought, there is a way to measure how well a fitness function picks the best parameter set.
    Perhaps th

  • Jez

    Milk,
    I think there are an infinite number of bliss functions…
    The bliss function serves the purpose of quantifying your trading Nirvana (bliss, nirvana: I am staying in the same theme here…).

    The bliss function is your OWN way to rank how well or badly a system has performed – based on your own personal criteria. For example, profit might be proportionally more important to you than drawdowns. Why not use
    (net profit)^2 / MaxDD (or 2 * ln(CAR) - ln(MaxDD)), ie you put more emphasis on profit
    but then you could be more fancy, adding more parameters to the equation and possibly some logic:
    if CAR > 15% then 2 * ln(CAR) - ln(MaxDD) - ln(Pct. time in market) - 3 * ln(average portfolio heat)
    if CAR < 15% then 2 * ln(CAR) - 3 * ln(MaxDD) - ln(Pct. time in market) - 5 * ln(average portfolio heat)
    etc.

    I find Ed Seykota's website pretty good (I'll probably add it to my blogroll) and you could check that link:
    http://seykota.com/tribe/faq/2004_Oct/Oct_11/index.htm
    look for bliss function – it raises other questions too (ie net profit/Max DD can give you a value of 8 with 4% profit and .5% DD. Is that better than 40% profit and 10% DD?)

    But in any case, I am not sure how you could add a predictive element to it – that does not seem to fit its purpose (which is to quantify with one single number how good the back-test results are). Now I think it might be possible to define a robust approach which allows you to pick the parameter set that might perform best in the future (ie compare all optimisation runs and their bliss functions but do not necessarily pick the highest number – potentially a spike – but rather a local stable maximum).
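
One way to read "a local stable maximum rather than a spike" is to score each parameter value by the average bliss of its neighbourhood instead of its own bliss. The sketch below does this for a single parameter; the function name `pick_stable_parameter`, the neighbourhood radius and the sample results are hypothetical, and other smoothing schemes would be just as defensible.

```python
def pick_stable_parameter(bliss_by_param, radius=1):
    """Return the parameter whose neighbourhood has the best average bliss.

    bliss_by_param maps a parameter value (e.g. a moving-average length)
    to the bliss value obtained in optimisation. Values at the edges of
    the grid are averaged over the neighbours that exist.
    """
    params = sorted(bliss_by_param)
    best_param, best_score = None, float("-inf")
    for i, param in enumerate(params):
        lo = max(0, i - radius)
        hi = min(len(params), i + radius + 1)
        neighbourhood = [bliss_by_param[p] for p in params[lo:hi]]
        score = sum(neighbourhood) / len(neighbourhood)
        if score > best_score:
            best_param, best_score = param, score
    return best_param


# Hypothetical optimisation results: 30 is an isolated spike,
# while 50-80 form a plateau of consistently good values.
results = {10: 1.0, 20: 1.2, 30: 4.0, 40: 1.1,
           50: 3.0, 60: 3.2, 70: 3.1, 80: 2.9}

highest = max(results, key=results.get)
stable = pick_stable_parameter(results)
print(f"highest bliss: {highest}, stable choice: {stable}")
```

Here the raw maximum is the spike at 30, while the neighbourhood average prefers 60 in the middle of the plateau.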

  • Dave

    Jez, this is a point in Amibroker that may make one work a little harder in CBT. In the regular backtester, Tomasz closes all trades at the end of the OOS period. There is no carryover. For longer EOD runs this may not be critical, but it plays havoc with shorter periods. Bruce Robinson has started a workaround using a “state” function… but it requires some manipulation. I may be able to dig up Bruce’s AFL if you’re interested.

  • Jez

    That definitely sounds interesting, as I’ll probably move into shorter timeframes at some point. And it’s always good to hear about problems and solutions to learn the platform.

  • dew

    This blog entry helped me finally understand Walk Forward Testing. The picture alone is worth a thousand words.

    Much appreciated.

    Also, I referenced this blog post on the NinjaTrader forum.

  • Alex

    Hi Jez, I like your blog.
    Walk forward is interesting, but how can you deal with start date dependency? I mean, with a simple EMA cross system, when do you start to compute the fast and slow lag for each cycle?

  • Jez Liberty

    Hi Alex – thanks for the comment.
    Not 100% sure I understand your question, but for walk-forward I would have a collection of parameter combinations I want the system to run with. All these combinations would run from the start of the back-test, and at each interval their performance would be measured and the best performing ones chosen for trading…

  • Alex

    By start-date dependency I mean the following: consider a double exponential moving average system. If you start computing the fast and the slow EMA at a certain date, you get some entry and exit signals. If you start computing a month later, for example, with the same parameters, you get different signals and different results as well. In walk-forward analysis you have this problem in every cycle.

  • Jez Liberty

    Agree with that point Alex; however, the ideal Walk-Forward process would only have one start date for indicator calculation (ie the start of the data being tested). For each optimisation cycle, you would look at the performance of all the systems (which started from the beginning of the test) but only for that period of optimisation. In other words, you let all systems/parameter combinations run in parallel from the beginning and look at them in separate time windows corresponding to the optimisation and out-of-sample phases. You would not reactivate each system for each different cycle. Unfortunately this is what TB does, which is one reason it is not ideal.
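
The start-date dependency discussed in this exchange is easy to reproduce. The sketch below computes the same EMA twice over one window: once continuously from the start of the data and then sliced, and once restarted at the window's first bar. The price series is made up, and seeding the EMA with the first price is just one common convention assumed here.

```python
def ema(prices, period):
    """Exponential moving average, seeded with the first price (an assumption)."""
    alpha = 2.0 / (period + 1)
    values = [prices[0]]
    for price in prices[1:]:
        values.append(alpha * price + (1 - alpha) * values[-1])
    return values


# Hypothetical price series; any series with some variation would do.
prices = [100 + (i % 7) * 1.5 - (i % 3) for i in range(72)]
window = slice(48, 72)  # one walk-forward window

# Indicator computed once from the start of the data, then sliced per window.
continuous = ema(prices, 20)[window]
# Indicator restarted at the window's first bar, as some platforms do.
restarted = ema(prices[window], 20)

first_gap = abs(restarted[0] - continuous[0])
last_gap = abs(restarted[-1] - continuous[-1])
print(f"gap at window start: {first_gap:.3f}, gap at window end: {last_gap:.3f}")
```

The gap shrinks as the restarted EMA warms up, but it is largest at the start of each window, which is where it can shift entry and exit signals. Computing indicators once over the full history and only slicing the results avoids the issue.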
