Systematic Trading research and development, with a flavour of Trend Following ## R Code for Walk-Forward LSPM

#### December 23rd, 2010 · 8 Comments · Code At the same time as Josh Ulrich was making some very good points on why to use R, over on his FOSS Trading blog, I was coming to the same realisation that it is a very neat, useful tool indeed. In this post, I’ll present the code I used for the last post on running a Walk-Forward test of the LSPM. Hopefully this can serve as an introduction to R to readers who do not use it yet.

### Getting into R

The LSPM java app distributed by Ralph Vince during his seminar did not seem very robust – and in any case, I could not manage to get it to generate the Joint-Probability Table (JPT) for 6 market-systems over a 20-year history of monthly returns. It would simply hang after 4 of them.

I then decided to write my own JPT tool. Did not take me too long, an hour or two and about 200 lines of C# code. It did the job ok and I could run it in an automated/batch mode. But I had forgotten that Josh had posted some R code to do exactly that (Rats! A couple of hours wasted…). On top of that the code was about 30 lines long. Impressed by this, I decided that the study on the last post was a good opportunity to try and learn R and I attempted to automate the test using R only.

### What the Code Does

At a high level, the code reads several files each containing the history of monthly returns for a market-system. The format of the files is as follows:

```19900201,-0.5 19900301,-1.16 19900401,2.24 19900501,3.3```

First column = date, second column = percentage return

This is a Walk-Forward test, so the optimization runs on a sliding window of past returns, and results are applied to the walk-forward window of returns. The cycle is repeated as many times as necessary.

For each cycle, the return data from the optimization window is extracted from the full set of returns, passed to the LSPM optimization, and calculated leveraged returns (using f amounts from the optimization) are stored for the corresponding walk-forward window. The full results of the test consists of a concatenation of all leveraged returns in each walk-forward window.

The code below is my first forray in R and its robustness could be “enhanced” (it contains several assumptions such as file formats, number of market-systems) but it should be fairly easy to adapt to run different scenarios or make it more generic.
R is quite different from the programming languages I am used to, so some of the syntax might seem a bit hacked – but it does the job (I appreciate any corrections/pointers on better practices from experienced R readers…).

Hopefully, it is a useful example for readers wanting to get started with R. Readers might want to check my initial post on using R to run the LSPM optimization as this code expands on some aspects of it.

### The Code

The first part of the code sets the walk-forward parameters, here: 10 years for the optimization window and one year for the walk-forward window:

```# Set Walk-Forward parameters (number of periods) optim<-120 #10 years = 120 monthly returns wf<-12 #1 year = 12 monthly returns```

Open the market-system files:

```# Open files setwd("C:/MyPath") a <- read.csv("TMA-20-50-200.csv", header = FALSE, as.is = TRUE, sep = ",", dec="."); b <- read.csv("MA-50-200.csv", header = FALSE, as.is = TRUE, sep = ",", dec=".") c <- read.csv("BBO-20.csv", header = FALSE, as.is = TRUE, sep = ",", dec=".") d <- read.csv("Donchian-200.csv", header = FALSE, as.is = TRUE, sep = ",", dec=".") e <- read.csv("Donchian-20.csv", header = FALSE, as.is = TRUE, sep = ",", dec=".") f <- read.csv("BBO-50.csv", header = FALSE, as.is = TRUE, sep = ",", dec=".")```

Calculate the number of Walk-Forward cycles:

```# Calculate number of WF cycles numCycles = ceiling((nrow(a)-optim)/wf)```

Below is the code mentioned earlier, which can be found on Josh Ulrich’s blog – used to create the LSPM JPT:

```# Define JPT function jointProbTable <- function(x, n=3, FUN=median, ...) {   # Load LSPM if(!require(LSPM,quietly=TRUE)) stop(warnings())   # Function to bin data quantize <- function(x, n, FUN=median, ...) { if(is.character(FUN)) FUN <- get(FUN) bins <- cut(x, n, labels=FALSE) res <- sapply(1:NROW(x), function(i) FUN(x[bins==bins[i]], ...)) }   # Allow for different values of 'n' for each system in 'x' if(NROW(n)==1) { n <- rep(n,NCOL(x)) } else if(NROW(n)!=NCOL(x)) stop("invalid 'n'")   # Bin data in 'x' qd <- sapply(1:NCOL(x), function(i) quantize(x[,i],n=n[i],FUN=FUN,...))   # Aggregate probabilities probs <- rep(1/NROW(x),NROW(x)) res <- aggregate(probs, by=lapply(1:NCOL(qd), function(i) qd[,i]), sum)   # Clean up output, return lsp object colnames(res) <- colnames(x) res <- lsp(res[,1:NCOL(x)],res[,NCOL(res)]) return(res) }```

Loop through each Walk-Forward cycle:

`for (i in 1:numCycles) {`

Extract the relevant set of past returns for the optimization:

``` # Define cycle boundaries start<-1+(i*wf) end<-optim+(i*wf)   # Get returns for optimization cycle and create the JPT rtn <- cbind(a\$V2[start:end],b\$V2[start:end],c\$V2[start:end],d\$V2[start:end],e\$V2[start:end],f\$V2[start:end])```

Create the JPT and run the actual LSPM optimization (note that the case above is the “simple” optimization, only taking optimal geometric growth rate into account – no consideration for probability of drawdown. This was mostly to ensure the run did not take hours and hours):

``` jpt <- jointProbTable(rtn,n=c(20,20,20,20,20,20)) outcomes<-jpt[] probs<-jpt[] port<-lsp(outcomes,probs)   # DEoptim parameters (see ?DEoptim) np=60 # 10 * number of mktsys imax=1000 #maximum number of iterations crossover=0.6 #probability of crossover   NR <- NROW(port\$f) DEctrl <- list(NP=np,itermax=imax, CR=crossover, trace=TRUE)   # Optimize res <- optimalf(port,control=DEctrl)```

Retrieve the leverage amounts from the optimization as well as the set of returns from the Walk-Forward window. Apply the leverage amounts to these returns and add the data to the full set of leveraged returns:

``` # Save leverage amounts lev<-c(100/(-jpt\$maxLoss/res\$f), 100/(-jpt\$maxLoss/res\$f), 100/(-jpt\$maxLoss/res\$f), 100/(-jpt\$maxLoss/res\$f), 100/(-jpt\$maxLoss/res\$f), 100/(-jpt\$maxLoss/res\$f)) levmat<-c(rep(1,wf)) %o% lev #so that we can multiply with the wfrtn   # Get the returns for the next Walk-Forward period wfrtn <- cbind(a\$V2[(end+1):(end+wf)],b\$V2[(end+1):(end+wf)],c\$V2[(end+1):(end+wf)],d\$V2[(end+1):(end+wf)],e\$V2[(end+1):(end+wf)],f\$V2[(end+1):(end+wf)]) wflevrtn <- wfrtn*levmat #apply leverage to the returns   if (i==0) fullrtns<-cbind(t(wflevrtn)) else fullrtns<-cbind(fullrtns,t(wflevrtn)) }```

After the loop is finished, print the full set of leveraged returns

`t(fullrtns)`

### In Closing

Despite my very limited experience, I can really see some of the potential in using R for any sort of data manipulation.

If you want to get started in R and test this code, the following links will be useful:

R homepage, which contains link to a download page. The installation is very simple and quick. Make sure you also check the docs as they are good.

The R LSPM package maintained by Josh Ulrich, the real meat of the code above – credits to him for making this!

The R code included in the post above

And a batch file to run the above code from a command prompt.

Finally, if you want to re-run the same test, here are the return files that were used in the code above:
Return Data Files (zipped) Tags: ··

### 8 Comments so far ↓

• Joshua Ulrich

Jez,

Very nice post. I love prose+code!

Something worth noting: my function returns slightly different JPTs than Ralph’s java application due to using different functions to calculate each bucket’s value. I believe Ralph’s application uses min, max, and mean for the smallest, largest, and middle buckets, respectively. By default, my function uses the median for all buckets.

Also, I have some ideas for speeding up the optimizations, in addition to using the win-percent to set the optimization bounds. I’ll let you know when they’re ready to test.

• Jez Liberty

Great – I’m glad you like it, Josh.

I noticed the difference with the JPT construction – actually my “short-lived” implementation in C# was more similar to yours but when I compared the output for all three, they were quite similar (I believe there were other slight differences, which did not seem to impact the optimization very much).

Looking forward to the optimization speed-up!

• Jack White

We have already done internal walk forward testing of Vince’s LSP (he was quite helpful of it then, but this was over a year ago). Acording to Vince, this is not the same thing to testing a trading system. In testing something like LSP model, you know in advance the math is correct for a given jpt (and with a system, you are testing to see if results that were arrived at by testing, that is, parameters, a guess, not something solid like a formula that you can prove does something, hold up).

So what you are testing is the robustness of the jpt from one period to the next.

But you would expect it to not be robust. This is where Vince got very interested because he and we want to know then how to amend it to show robustness. What do you add / subtract from it? How far back do you go? How do you put things in that have not been see? I believe he has a method involving confidence intervals on the jpt to do exactly these types of things. So we determined our walk forward tests were attempts to answer these questions about how to construct robust jpt tables, and not about contesting the math. Vince laughs and says “It’s settled science!” Sometimes I do not know if he is a serious about things he seems a funny guy.

Also, the Java limitiation of a default 64Mb heap size is the reason the app seiztinged up. I am surprised Vince didn;t point that out. I believe the solution (according to him) is to increase the Java heap size by putting -Xms512m -Xmx512m right after the last line that starts with java.exe in the run.cmd file, like this:
java.exe -Xms512m -Xmx512m -classpath….

• Jez Liberty

Jack – thanks for your feedback.
I agree that the LSP model is 100% correct from a maths point of view (on past data). And for the approach I am testing above to work, it relies more on the robustness of the trading systems and their return distribution (or JPT). Every system will most likely display different “robustness behaviour” based on how stable they are.

The only robustness element that Vince talked about during his course was to use half the win ratio (P) as f because we know that f will be anywhere between 0 and P. So, picking P/2 would minimise the gap between future value of f and our chosen value. Thinking about it, I am not so sure it works as it seems to assume f moves between 0 and P but P is constant – whereas both would be moving targets..
Vince also mentioned that one could alter the JPT prior to the optimisation but did not give much method about it – mostly what one should “think” would be realistic…
I would be keen to hear what you guys have done to make the JPT more robust ?

ps: thanks for the note on the heap size parameter. Vince might have mentioned it but I missed it. In any case, I think the R method from Josh Ulrich is more convenient and faster.

• Jack White

Jez,

We’ve found, (and the response of Vince himself seemed to indicate that it was obvious) that “robustness,” though a function of the jpt (and we can speak at great length on how to amend that) is more a function of the criterion you have above all else.

Maximizing for greatest probability of profit is amazingly robust — maximizing for greatest growth is not. This is very interesting. After all, maximizing for greatest growth makes a stupid amount of money when it works — who needs to do that? But Vince’s model allows us to develop very robust money management systems for making plenty of money and very high probability of doing so. This is the real power of LSP and I believe it is exactly this type of implementation that Vince is bringing out in ETF form.

Alos, the p/2 robustness you speak of is very robust if you use long enough data windows. If you use systems where, for an adequate window of time, p dances around a lot, your window is likely not long enough. p tends to be fairly stable. Remember too, the idea of p/2 is not to predict where the peak of the landscape will be with respect to the axes representing each portfolio component, but rather to minimize the price paid for missing the peak, as a best guess.

• Jez Liberty

Thanks for the feedback Jack. The comment regarding optimizing for probability of profit vs equity growth is very interesting and I’ll be sure to look into that.

Thanks again – Jez

• timelyportfolio

just finished the book and playing with implementation; thanks so much for this example. I am trying to do this with the edhec data in PerformanceAnaytics. Also, I will add some charts for display. When done, I’ll post at my site timelyportfolio.blogspot.com.

• brian winters

Jez,

I have heard Ralph Vince talk about his formulas and tried to read his papers, but I am a math primitive and admit I cant follow it.

While he says the math isn’t difficult he obviously doesnt get people like myself. Once he gets past explaining the highest level of overall purpose, which I understand and am very interested in, he falls into speaking in formulas and math speak which goes over most traders heads.

I would think his concepts would be applicable to any system of trading, from simple to complex, and I know that complex math formulas can be broken down into several steps of simple math. I wish someone could do this and explain this information to us mere math mortals (we know */+- and that’s about all).

For instance, to keep it really simple, let’s say you have a portfolio with 3 stocks and cash. You are about to invest in a 4th stock and want to determine how many shares to buy using the Vince formula (with no leverage). How do you do this with simple math?

Vince says that there is no money management strategy unless you know Criteria, Horizon, and Worst-Case Outcome. So I am somehow hoping to understand this as well, because Ralph only seems to be able to explain it using formulas.

If anyone can do this I’d love to understand this, and I’m sure I’m not alone.

Thanks for any help.
Brian