Systematic Trading research and development, with a flavour of Trend Following
Au.Tra.Sy blog – Automated trading System

ETF v. Futures: a Quantification

February 9th, 2011 · Code, Data, Equities, Futures

I have already covered the idea of using ETFs in place of Futures. Today, I wanted to run a quantitative comparison between the two instrument types.

The ETF sector has been growing at an impressive rate, with new offerings popping up every month and providing a wider choice in selecting a portfolio to trade. One of the drawbacks of using ETFs for mechanical trading strategies is the relatively short history of these instruments, making it hard or impossible to run back-tests reaching far into the past – which I believe is critical to understand how a strategy works over different types of market conditions.

In A Practical Guide to ETF Trading Systems, author Anthony Garner uses futures contracts as a proxy for testing strategies on ETFs. If futures are good proxies for ETFs, the back-test results from a long run with futures can be extrapolated to the ETF world.

This post looks at one way of quantifying how well futures can act as a proxy.

Quantification by Correlation

The main “tool” used in this post to test whether futures are good proxies for ETFs is the correlation between prices, as well as between returns. The assumption is that the correlation between an ETF and a good candidate futures proxy must be as close as possible to 1.0: the closer to 1, the better the proxy relationship.

I picked a few ETFs and associated futures and calculated the correlation (using Pearson product-moment) between several of the data attributes:

  • Open
  • High
  • Low
  • Close
  • t-1 Return
  • t-5 Return
  • t-30 Return

The data considered in each case was the list of all dates where both the futures contract and the ETF had data. The futures contracts were proportionally back-adjusted (to keep price ratios at correct values, as discussed in this post).
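As a minimal sketch of how the return attributes can be derived, the R snippet below aligns two series on their common dates and computes k-day log returns. The two data frames hold made-up prices and the `logReturn` helper is a hypothetical name introduced for illustration; neither comes from the files used for the table.

```r
# Two toy price series with partially overlapping dates (illustrative values only)
etf <- data.frame(date = c(20110103, 20110104, 20110105, 20110106, 20110107),
                  C = c(100, 102, 101, 103, 104))
fut <- data.frame(date = c(20110104, 20110105, 20110106, 20110107, 20110110),
                  C = c(51, 50.5, 51.5, 52, 52.5))

# Keep only the dates where both instruments have data
common <- merge(etf, fut, by = "date")   # columns: date, C.x (ETF), C.y (future)

# Hypothetical helper: k-day log return of a price vector
logReturn <- function(p, k) {
  n <- length(p)
  log(p[(k + 1):n] / p[1:(n - k)])
}

# t-1 returns on the common dates, then their Pearson correlation
rtn1.etf <- logReturn(common$C.x, 1)
rtn1.fut <- logReturn(common$C.y, 1)
cor(rtn1.etf, rtn1.fut, method = "pearson")
```

Calling `logReturn` with k = 5 or k = 30 on a longer series would give the t-5 and t-30 returns in the same way.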

The table below shows the number of records for each comparison, the start date and the various correlation coefficients.

                                              Correlation Coefficients
ETF v Future             Records  Start date  Open    High    Low     Close   t-1 Rtn  t-5 Rtn  t-30 Rtn
Aussie Dollar: FXA v AD  1154     20060627    0.9928  0.9939  0.9945  0.9945  0.9000   0.9802   0.9947
Cotton: BAL v CT         652      20080626    0.9980  0.9983  0.9989  0.9991  0.8541   0.9666   0.9954
Cotton: COTN-L v CT      1010     20061221    0.9939  0.9947  0.9928  0.9960  0.3190   0.8825   0.9761
Cocoa: NIB v CC          651      20080626    0.9898  0.9948  0.9961  0.9973  0.8948   0.9781   0.9955
Crude Oil: DBO v CL      1024     20070108    0.9232  0.9251  0.9212  0.9236  0.9080   0.9416   0.9606
Crude Oil: OIL v CL      1120     20060817    0.9989  0.9994  0.9993  0.9994  0.9568   0.9900   0.9973
Crude Oil: USO v CL      1208     20060411    0.9971  0.9978  0.9976  0.9978  0.9536   0.9894   0.9982
Gold: GLD v GC           1553     20041119    0.9977  0.9981  0.9979  0.9980  0.8665   0.9743   0.9940
Japanese Yen: FXY v JY   996      20070214    0.9948  0.9963  0.9958  0.9965  0.9405   0.9856   0.9951
Nasdaq100: QQQQ v NQ     2991     19990311    0.9752  0.9762  0.9740  0.9752  0.9817   0.9957   0.9989
Silver: PHAG-L v SI      932      20070425    0.9964  0.9970  0.9964  0.9965  0.8223   0.9635   0.9929
Silver: SLV v SI         1194     20060501    0.9906  0.9921  0.9918  0.9920  0.8812   0.9775   0.9955
S&P 500: SPY v ES        4533     19930201    0.8757  0.8765  0.8740  0.8751  0.9576   0.9637   0.9704
US T-Notes: IEF v TY     2130     20020731    0.9707  0.9713  0.9708  0.9711  0.9359   0.9689   0.9758


Most of the coefficients are close to 1, which seems to validate the idea of using futures contracts as proxies for ETFs.

Some underlyings can be traded through different ETFs (or ETNs). Crude Oil, for example, has several ETFs aiming to capture its performance (I picked DBO, OIL and USO for this test). We can see that there are some differences between the correlation figures of these three ETFs. Not all ETFs seem to track the underlying equally well.

A more obvious case of divergence is between the two Cotton ETFs: BAL and COTN (traded in London). The COTN ETF has a low correlation reading on t-1 returns (0.319). The two instruments trade in different time zones (London vs. NY), which causes their daily returns to differ as the market still moves after the London close. This is something to keep in mind when choosing an ETF/future pair.

Note on Correlation: Pearson vs. Spearman

I have chosen the Pearson correlation method for these calculations, as the linear relationship between the variables under comparison was what I felt needed to be looked at. However, there are several “limitations” to this correlation calculation. One of them is that it implicitly assumes well-behaved (roughly normally distributed) data, and the calculation is very sensitive to outliers.
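For reference, the sample Pearson coefficient between paired observations (x_i, y_i), i = 1..n, is usually written as below. This is the standard textbook definition, given here as background rather than taken from the code in this post:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Each observation enters through its deviation from the mean, so a single extreme return can dominate both the numerator and the denominator.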

This is best illustrated with the correlation of “t-1 Returns” between the COTN ETF and its corresponding Cotton futures contract. Back in September 2008, the ETF price dropped by 60% before gaining 75% the next day. I seem to remember this was related to a scare around AIG-backed ETFs/ETNs, when the insurance giant was rumoured to be going bankrupt.

For the purpose of a test, I simply removed these 2 extreme returns from the file and re-ran the correlation calculation. The figure came in at 0.518: a large increase, obtained by simply removing two outliers. This highlights the point about Pearson correlation not being the most robust with regard to outlier data points.

There are other methods for calculating correlation: Spearman’s rank method, for example, does not make any assumptions about the distribution of the data. It can also detect some non-linear relationships (which we were not interested in here). Nevertheless, I re-ran the whole set using this alternative method. The figures were approximately the same, except for the case of COTN.

The Spearman correlation coefficients for COTN v CT using the original and the modified files were near-identical, at around 0.61. No big jump from removing two outliers.
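The effect can be reproduced on synthetic data. The sketch below is an illustration only: the series are randomly generated (not the COTN or CT files), the variable names are made up, and the figures it prints will not match the 0.319, 0.518 or 0.61 quoted above. It injects a 60% drop followed by a 75% gain into one of two closely related return series and compares both correlation methods before and after.

```r
set.seed(42)

# Synthetic daily log returns: y tracks x closely (illustrative data only)
x <- rnorm(500, sd = 0.01)
y <- x + rnorm(500, sd = 0.003)

# Inject two extreme, offsetting moves into one series only
y.out <- y
y.out[100] <- log(0.40)   # a 60% drop
y.out[101] <- log(1.75)   # a 75% gain the next day

# Pearson works on the values themselves, Spearman on their ranks
pearson.clean    <- cor(x, y,     method = "pearson")
pearson.outlier  <- cor(x, y.out, method = "pearson")
spearman.clean   <- cor(x, y,     method = "spearman")
spearman.outlier <- cor(x, y.out, method = "spearman")

round(c(pearson.clean, pearson.outlier, spearman.clean, spearman.outlier), 3)
```

Because Spearman only looks at ranks, the two extreme points can each move by at most a few hundred rank positions out of 500, whereas their raw values swamp the variance that Pearson divides by.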

Note on ETFs

ETPs (Exchange Traded Products), be they called ETF, ETN or ETC, all come in different flavours. They can be structured in different ways and track different benchmarks, even when they are associated with the same market. They also add an extra layer between the trader and the underlying market, which brings counterparty risk on the intermediaries, additional costs/fees and tracking error.

A good example of the differences between some products can be found in this ETFdb article: USO vs. OIL: A Better Crude Oil ETF?

Finally, it is not (yet?) possible to trade the equivalent of a fully diversified futures portfolio with ETFs, as not all markets are represented in the ETF universe.

Appendix: R Code

Once again, I used R to generate the various correlation calculations. The idea is that once it is coded up, it can be applied very quickly to any data.

The code takes a parameter file listing the files to be compared (1st column = description, 2nd and 3rd columns = file names), along with the correlation calculation type. See sample here.

Here is the code below. Hopefully the comments are clear enough to help you understand the logic. This might not be the most elegant way to implement this but it does the job:

First, it defines a function to perform the calculations on two files (i.e. ETF vs. future):
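To make the layout concrete, here is a small sketch that writes a parameter file in the three-column format described above and reads it back. The file names (`GLD.csv`, `GC.csv` and so on) and the temporary file name are hypothetical placeholders, and the comma separator is an assumption based on the code reading the file with `read.csv`.

```r
# Hypothetical parameter file: description, ETF file, futures file (no header row)
params <- data.frame(desc = c("Gold: GLD v GC", "Crude Oil: USO v CL"),
                     etf  = c("GLD.csv", "USO.csv"),   # placeholder file names
                     fut  = c("GC.csv", "CL.csv"))     # placeholder file names

# Write to a temporary location so nothing in the working directory is overwritten
param.file <- file.path(tempdir(), "ETFvFut_example.txt")
write.table(params, param.file, sep = ",", row.names = FALSE,
            col.names = FALSE, quote = FALSE)

# Read it back the same way the code below does: one row per comparison
comps <- read.csv(param.file, header = FALSE)
comps
```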

# Define correlETFFut function
correlETFFut <- function(params, i, cor.method,...) {
  colsFut <- c("date", "O", "H", "L", "C", "Vol", "OI", "Con", "UC")
  # Adding headers
  cols1 <- colsFut
  cols2 <- colsFut
  # Read in data sets
  data.1 <- read.csv(toString(params[i,2]), col.names=cols1)
  data.2 <- read.csv(toString(params[i,3]), col.names=cols2)
  # Join both data sets, keeping only the dates present in both files
  data.merged <- merge(data.1, data.2, by.x="date", by.y="date")
  # Number of records common to both files
  n <- nrow(data.merged)
  # Calculate correlation on Open, High, Low and Close
  cor.O <- cor.test(data.merged$O.x, data.merged$O.y, method=cor.method)
  cor.H <- cor.test(data.merged$H.x, data.merged$H.y, method=cor.method)
  cor.L <- cor.test(data.merged$L.x, data.merged$L.y, method=cor.method)
  cor.C <- cor.test(data.merged$C.x, data.merged$C.y, method=cor.method)
  # Calculate correlation on t-1 Rtn (log returns on the Close)
  log.diffs1 <- log(data.merged$C.x[2:n] / data.merged$C.x[1:(n - 1)])
  log.diffs2 <- log(data.merged$C.y[2:n] / data.merged$C.y[1:(n - 1)])
  cor.Rtn <- cor.test(log.diffs1, log.diffs2, method=cor.method)
  # Calculate correlation on t-5 Rtn
  log.diffs1W <- log(data.merged$C.x[6:n] / data.merged$C.x[1:(n - 5)])
  log.diffs2W <- log(data.merged$C.y[6:n] / data.merged$C.y[1:(n - 5)])
  cor.RtnW <- cor.test(log.diffs1W, log.diffs2W, method=cor.method)
  # Calculate correlation on t-30 Rtn
  log.diffs1M <- log(data.merged$C.x[31:n] / data.merged$C.x[1:(n - 30)])
  log.diffs2M <- log(data.merged$C.y[31:n] / data.merged$C.y[1:(n - 30)])
  cor.RtnM <- cor.test(log.diffs1M, log.diffs2M, method=cor.method)
  # Put it all into cor.all (record count, start date, 7 coefficients) and return it
  cor.all <- c(n, data.merged$date[1],
               signif(cor.O$estimate,5), signif(cor.H$estimate,5),
               signif(cor.L$estimate,5), signif(cor.C$estimate,5),
               signif(cor.Rtn$estimate,5), signif(cor.RtnW$estimate,5),
               signif(cor.RtnM$estimate,5))
  cor.all
}

Second, a function reads the parameter file of all files to be compared, iterates through it and calls the first function for each pair of files, building an array of all results:

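# Define fullCorrel function
fullCorrel <- function(file, cor.method,...) {
  # Read input file (one row per comparison, no header row)
  comps <- read.csv(file, header=FALSE)
  # Run each comparison and bind the results column by column
  for (i in 1:nrow(comps)) {
    cor.tmp <- correlETFFut(comps, i, cor.method)
    if (i==1) cor.full <- cor.tmp else cor.full <- cbind(cor.full, cor.tmp)
  }
  # Force a matrix so the labelling also works with a single comparison
  cor.full <- as.matrix(cor.full)
  rownames(cor.full) <- c("Record Count", " Start Date", " cor Open", " cor High", " cor Low", " cor Close", " cor Rtn", " cor RtnW", " cor RtnM")
  colnames(cor.full) <- as.character(comps[,1])
  # Return the full table of results
  cor.full
}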

Finally, set your working directory (where all the files are) and call the second function with the relevant parameters. Here the function is called twice, with the same parameter file and two different correlation methods (make sure to update the working dir to your own path):

# Set Working Dir - Replace with your own path!
setwd("D:/ATS/Blog/Posts/ETF v Futures/Data")
# Run Pearson correlations
fullCorrel("ETFvFut.txt", "pearson")
# Run Spearman correlations
fullCorrel("ETFvFut.txt", "spearman")

Below are included some sample files to run the code above:
