Quantitative Trading: Cointegration is not the same as correlation

Monday, November 06, 2006

Cointegration is not the same as correlation

A reader asked me recently why I believe that energy stock prices (e.g. XLE) are correlated with the front-month crude oil futures contract (QM). Actually, I don’t believe they are necessarily correlated; I only think they are “cointegrated”.

What is the difference between correlation and cointegration? If XLE and QM were really correlated, then when XLE goes up one day, QM would likely go up on the same day too, and vice versa. Their daily (or weekly, or monthly) returns would rise or fall in synchrony. But that’s not what my analysis was about. I claim that XLE and QM are cointegrated, meaning that the two price series cannot wander off in opposite directions for very long without eventually coming back to a mean distance. It doesn’t mean that the two prices have to move in synchrony on a daily basis at all.

Two hypothetical graphs illustrate the differences. In the first graph, stock A and stock B are correlated. You can see that their prices move in the same direction almost every day.

Now consider stock A and stock C.

Stock C clearly doesn’t move in any correlated fashion with stock A: some days they move in the same direction, other days in opposite directions. Most days stock C doesn’t move at all! But notice that the spread in stock prices between C and A always returns to about $1 after a while. This is a manifestation of cointegration between A and C. In this instance, a profitable trade would be to buy A and short C at around day 10, then exit both positions at around day 19. Another profitable trade would be to buy C and short A at around day 31, then close out the positions at around day 40.
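
The A and C situation can be mimicked with a small simulation. This is only a sketch under stated assumptions: the prices are hypothetical, A is modelled as a random walk, C is modelled as A plus $1 plus stationary noise, and all parameter values are invented for illustration. The daily price changes of A and C come out only weakly correlated, yet the spread C - A keeps averaging out to about $1.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate_pair(n_days=2000, seed=42):
    """Hypothetical prices: A is a random walk, C is A + $1 + stationary noise."""
    rng = random.Random(seed)
    price_a, price_c = [50.0], [51.0]
    noise = 0.0
    for _ in range(n_days):
        price_a.append(price_a[-1] + rng.gauss(0.0, 0.1))
        # AR(1) noise: each day half of yesterday's deviation decays away,
        # which is what pulls the spread back toward $1.
        noise = 0.5 * noise + rng.gauss(0.0, 1.0)
        price_c.append(price_a[-1] + 1.0 + noise)
    return price_a, price_c


price_a, price_c = simulate_pair()
change_a = [b - a for a, b in zip(price_a, price_a[1:])]
change_c = [b - a for a, b in zip(price_c, price_c[1:])]
spread = [c - a for a, c in zip(price_a, price_c)]

daily_corr = pearson(change_a, change_c)
mean_spread = sum(spread) / len(spread)
print(f"correlation of daily changes: {daily_corr:.3f}")
print(f"mean of spread C - A: {mean_spread:.3f}")
```

Shrinking the noise term relative to A's own daily moves would raise the daily correlation without changing the fact that the spread is mean-reverting, which is the sense in which the two properties are separate.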

Cointegration is the foundation upon which pair trading (“statistical arbitrage”) is built. If two stocks simply move in a correlated manner, there may never be any widening of the spread. Without a temporary widening of the spread in either direction, there is no opportunity to short (or buy) the spread, and no reason to expect the spread to revert to the mean either.

For further reading:

Alexander, Carol (2001). Market Models: A Guide to Financial Data Analysis. John Wiley & Sons.

51 comments:

NA, Thursday, November 9, 2006 at 8:23:00 AM EST
Interesting post. I've found this kind of cointegration between OIH and USO/XLE.

I usually like to short OIH due to its higher intra-day volatility than XLE and go long XLE as a hedge, or vice versa.

It doesn't always cointegrate as you mentioned, but every now and then there is an opportunity, i.e. on short-covering days.

Ernie Chan, Thursday, November 9, 2006 at 8:42:00 AM EST
Yes, OIH is certainly an alternative to XLE. OIH is the most liquid oil services ETF. Frankly, I no longer remember why I chose XLE instead of OIH to do the analysis. They both cointegrate with USO equally well.

Anonymous, Saturday, February 17, 2007 at 11:10:00 PM EST
Dear Mr. Chan,

Wonderful blog you have over here. I have been looking for something like this for a long time.

You might also try DVN as an alternative to XLE.

Cheers,
Max

Ernie Chan, Sunday, February 18, 2007 at 12:27:00 AM EST
Max,
Glad you like my articles, and thanks for your suggestion.
Hope to exchange more ideas with you in the future!
Ernie

Paul Teetor, Thursday, August 23, 2007 at 1:39:00 AM EDT
Ernie and Yaser: For my trading, I've found XLE has a strong advantage over the alternatives: there is a single-stock future available for XLE, and using the SSF drops the margin requirement from 50% to 20%. This extra leverage is very useful in spread trading. It is difficult to capture spread profits without that leverage, due to the small size of spread changes.

Camilo Rostoker, Saturday, March 22, 2008 at 6:04:00 PM EDT
Hi Ernie et al.,

Just wondering if there are any traders out there that use correlation or cointegration on an intra-day time scale to do day-trading. For example taking data samples as fast as 15 seconds, or maybe longer like every 10 minutes. Is there any useful information in time scales that small? I would think that it would depend highly on the volatility/liquidity of the underlyings so that enough margin could be made on the spread for such a strategy to be profitable. Just wondering if you have any experience or opinions on this.
Cheers,
Jack

Ernie Chan, Tuesday, March 25, 2008 at 5:43:00 PM EDT
Hi Jack,
Theoretically, cointegration is time-scale independent. So we cannot say a pair of stocks is cointegrated on a time scale of years, but not minutes. However, it is meaningful to ask what the average mean-reversion time is. I have written elsewhere on this blog (see Ornstein-Uhlenbeck formula) about a good way to estimate this, and it will help you determine whether the pair of stocks is suitable for trading at the time scale of interest.
Ernie
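
The post on the Ornstein-Uhlenbeck formula is not reproduced in this thread, so what follows is only a sketch of one common way to estimate a mean-reversion time, not necessarily the exact recipe referred to above. It assumes the spread follows a discretized Ornstein-Uhlenbeck process, regresses the change in the spread on its lagged level, and converts the slope `lam` into a half-life of `-ln(2) / lam`. The helper names `ols` and `half_life` and the simulated spread are hypothetical.

```python
import math
import random


def ols(x, y):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def half_life(spread):
    """Half-life implied by the regression ds(t) = a + lam * s(t-1) + error."""
    lagged = spread[:-1]
    delta = [b - a for a, b in zip(spread, spread[1:])]
    lam, _ = ols(lagged, delta)
    if lam >= 0:
        return math.inf  # no evidence of mean reversion in this sample
    return -math.log(2) / lam


# Hypothetical spread: AR(1) with coefficient 0.9, so lam should be near -0.1.
rng = random.Random(0)
spread = [0.0]
for _ in range(5000):
    spread.append(0.9 * spread[-1] + rng.gauss(0.0, 1.0))

estimated_half_life = half_life(spread)
print(f"estimated half-life: {estimated_half_life:.1f} bars")
```

The half-life is expressed in bars of whatever sampling interval the spread uses, so the same pair can be checked against the holding period one has in mind, whether that is minutes or days.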

Unknown, Sunday, May 31, 2009 at 10:34:00 AM EDT
Hi Ernie,

I have been trading pairs in the Indian stock markets. I find your blog very informative and educational. I really appreciate your efforts in sharing in-depth knowledge on the subject.

Can you explain the cointegration method via a spreadsheet and, if possible, share the spreadsheet? I would appreciate it if you could explain it in a non-quantitative style. I want to learn how to interpret the output of the cointegration test, i.e. whether the series is mean-reverting or not for a given time frame.

Thanks
Bhumir

Ernie Chan, Monday, June 1, 2009 at 3:30:00 PM EDT
Hi Bhumir,
Thank you for your interest in my blog. Unfortunately, a cointegration test cannot easily be performed in Excel. I performed mine using Matlab. If you purchase my book, you will find sample code showing how to compute this.
Ernie

Unknown, Friday, July 31, 2009 at 12:07:00 AM EDT
Hi Ernie, I have been reading your book. I must say it's very informative and it has helped me tremendously.

One question though about LeSage's cadf function when testing for co-integration. I notice that if you reverse the order of the y and x parameters (in cadf(y,x,p,nlag)), the resulting t-statistic can be very different for the same two sets of data.

Using your Matlab sample code 7_2.m as an example, if y is GLD and x is GDX, I get a t-statistic of -3.52. If y is GDX and x is GLD, I get -4.11. So what do I make out of this? Which result should I rely on to see if there's co-integration between the two sets of data? Or should I use both results (or the average of the two) as a guideline?

Thanks
Sam

Ernie Chan, Friday, July 31, 2009 at 11:21:00 AM EDT
Hi Sam,
Yes, indeed the results are different depending on which series you pick as the independent variable.
My rule of thumb is to be conservative: regard a pair as cointegrating only if both t-stats meet the criterion.
Ernie
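
The order dependence is easy to reproduce. The sketch below is not LeSage's `cadf` function: it is a simplified stand-in that assumes zero lags, uses the standard library only, and runs on hypothetical simulated data rather than GLD and GDX. It regresses one series on the other, runs a Dickey-Fuller regression on the residuals, and repeats with the roles swapped. Note that the resulting statistic has to be compared with cointegration-test critical values, not ordinary t-tables.

```python
import math
import random


def ols(x, y):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cadf_t_stat(y, x):
    """Regress y on x, then run a zero-lag Dickey-Fuller regression on the residuals."""
    slope, intercept = ols(x, y)
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    lagged = resid[:-1]
    delta = [b - a for a, b in zip(resid, resid[1:])]
    # Regression through the origin: delta(t) = lam * resid(t-1) + error.
    sum_sq = sum(u * u for u in lagged)
    lam = sum(u * d for u, d in zip(lagged, delta)) / sum_sq
    rss = sum((d - lam * u) ** 2 for u, d in zip(lagged, delta))
    std_err = math.sqrt(rss / (len(delta) - 1) / sum_sq)
    return lam / std_err


# Hypothetical cointegrated pair: x is a random walk, y = 3 + 2x + stationary noise.
rng = random.Random(1)
x, y, noise = [100.0], [203.0], 0.0
for _ in range(1000):
    x.append(x[-1] + rng.gauss(0.0, 1.0))
    noise = 0.7 * noise + rng.gauss(0.0, 3.0)
    y.append(3.0 + 2.0 * x[-1] + noise)

t_y_on_x = cadf_t_stat(y, x)  # y as the dependent variable
t_x_on_y = cadf_t_stat(x, y)  # x as the dependent variable
print(f"t-stat with y dependent: {t_y_on_x:.2f}")
print(f"t-stat with x dependent: {t_x_on_y:.2f}")
```

The two regressions minimize errors along different axes, so their residual series are not multiples of each other and the two statistics differ. Requiring both to pass, as suggested above, is the conservative reading.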

Peter Magner, Wednesday, October 14, 2009 at 10:55:00 AM EDT
Hey Ernie, this is Peter from the University of Cape Town, South Africa. I am writing to ask whether correlation tells you anything meaningful when one is testing for integration. I am testing integration across African markets for my thesis and have used the Engle-Granger cointegration test, but thought it might be nice to include a correlation matrix, and I don't want to look stupid. Also, just to confirm, does it matter if I only use A as independent and B as dependent and do not test both ways?

Thanks
Please respond asap if possible!

Ernie Chan, Wednesday, October 14, 2009 at 8:51:00 PM EDT
Peter,
Including a correlation matrix will not convince anybody that the African markets are cointegrated. However, it might serve as a useful comparison in technique.

Indeed cointegration tests are variable-order-dependent, esp. for borderline cases. Try both orders.
Ernie

Fuzhi Cheng, Wednesday, August 3, 2011 at 2:31:00 PM EDT
Hi, Ernie:
I have a rather simple question regarding index tracking using a cointegration-optimal portfolio (following an earlier paper by Dunis & Ho: Cointegration portfolio of European Equities for Index Tracking). Suppose I am able to find cointegration in the following form: ln(index)=2*ln(p1)+3*ln(p2), where p1 and p2 are the prices of constituent stocks in the index. The paper suggests using the "normalized" parameters as weights (can you please explain what normalization means in that paper?). I assume the weights are 2/5=0.4 and 3/5=0.6. Suppose assets 1 and 2 each return 5%; then the portfolio constructed with the 0.4 and 0.6 weights would give a 5%*0.4+5%*0.6=5% return. However, by first differencing the original cointegration result ln(index)=2*ln(p1)+3*ln(p2) (so that both sides become returns), the index return should be 2*5%+3*5%=25%. Clearly the portfolio is not tracking the index. I am sure there is something not right here... Thanks for your help.

Fuzhi

Ernie Chan, Wednesday, August 3, 2011 at 3:15:00 PM EDT
Hi Fuzhi,
You have to apply the normalized weights before computing returns, otherwise the two sides won't match. It would be like comparing the P&L of $1 capital with the P&L of $1M capital if you don't normalize by capital.
Ernie

Fuzhi Cheng, Thursday, August 4, 2011 at 8:06:00 AM EDT
Ernie:
I appreciate your reply very much. However, I am still a little confused. Could you please explain again how you would normalize the weights if this is the cointegration result you get: ln(index)=2*ln(p1)+3*ln(p2)
where "index" is the index price, and "p1" and "p2" are the prices of constituent stocks in the index. It seems everything is in percent return terms and has nothing to do with the amount of capital.

Thanks.

Fuzhi

Ernie Chan, Thursday, August 4, 2011 at 9:35:00 AM EDT
Fuzhi,
The 2 and 3 represent units of capital. So clearly we need to normalize them so that both sides have the same total capital, typically 1 unit.

In any case, I dislike using logs. I prefer raw prices so that the number of shares is fixed.
Ernie
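
The arithmetic behind this exchange can be written out in a few lines. This is only an illustration under assumptions: it reads the coefficients 2 and 3 as units of capital, as the reply suggests, treats changes in log prices as approximate returns, and assumes a 5% return on each asset.

```python
# Coefficients from ln(index) = 2*ln(p1) + 3*ln(p2), read as units of capital.
coefficients = {"p1": 2.0, "p2": 3.0}
asset_return = 0.05  # assumed return on each asset

total_capital = sum(coefficients.values())  # 5 units
weights = {name: c / total_capital for name, c in coefficients.items()}

# P&L on the unnormalized position, which uses 5 units of capital.
unnormalized_pnl = sum(c * asset_return for c in coefficients.values())

# Return per unit of capital, computed two equivalent ways.
return_per_unit = unnormalized_pnl / total_capital
weighted_return = sum(w * asset_return for w in weights.values())

print(weights)
print(f"P&L on {total_capital:.0f} units of capital: {unnormalized_pnl:.2%}")
print(f"return per unit of capital: {return_per_unit:.2%}")
print(f"return with normalized weights: {weighted_return:.2%}")
```

The larger figure is a P&L on 5 units of capital and the smaller one is a return on 1 unit, which is the $1 versus $1M comparison made earlier in the thread.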

Fuzhi Cheng, Thursday, August 4, 2011 at 11:03:00 AM EDT
Ernie:

Thank you so much for your help.

Fuzhi

Suny, Monday, October 10, 2011 at 6:40:00 AM EDT
Dear Ernie, say I manage to identify a good cointegrated pair. My question now is how to work out a hedge ratio. When I regress the price of A on B, compared with B on A, I end up with two different hedge ratios. Which hedge ratio should I choose? I will need to use the residual to determine a band for entry and exit, and depending on which hedge ratio I use, I end up with two different sets of entry and exit levels.

Ernie Chan, Monday, October 10, 2011 at 8:59:00 AM EDT
Suny,
The eigenvector obtained from the Johansen test can be used to determine a unique linear combination (i.e. hedge ratio) of the 2 price series.
Ernie
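
Why the two regressions disagree can be seen from a standard least-squares identity: the slope of A on B times the slope of B on A equals the squared correlation, so the two implied hedge ratios coincide only when the fit is perfect. The sketch below checks this on hypothetical simulated prices. It does not implement the Johansen test, and all names and parameters in it are made up for illustration.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Hypothetical prices: B is a random walk, A = 10 + 1.5 * B + stationary noise.
rng = random.Random(7)
price_b, price_a, noise = [40.0], [70.0], 0.0
for _ in range(500):
    price_b.append(price_b[-1] + rng.gauss(0.0, 0.5))
    noise = 0.5 * noise + rng.gauss(0.0, 0.5)
    price_a.append(10.0 + 1.5 * price_b[-1] + noise)

beta_a_on_b = ols_slope(price_b, price_a)  # from A = const + beta * B
beta_b_on_a = ols_slope(price_a, price_b)  # from B = const + beta' * A
implied_from_reverse = 1.0 / beta_b_on_a   # shares of B per share of A implied by the reverse fit
r_squared = beta_a_on_b * beta_b_on_a      # identity: product of the two slopes is r^2

print(f"hedge ratio from A on B: {beta_a_on_b:.4f}")
print(f"hedge ratio implied by B on A: {implied_from_reverse:.4f}")
print(f"r^2: {r_squared:.4f}")
```

The gap between the two ratios shrinks as r^2 approaches 1, which is consistent with the order of the variables mattering most for borderline pairs.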

Jeet, Thursday, December 8, 2011 at 4:18:00 AM EST
Hi Ernie,
As you said earlier here, S1 ~ S2 and
S2 ~ S1 should both pass the cointegration test.

1. Let us say S1 ~ S2 is cointegrated while the reverse order is not. Does that mean the pair is not cointegrated?

2. Let us say we have 5 stocks to trade. Which one should I use as the independent variable, with the other 4 as dependent, without trying so many combinations?

Ernie Chan, Thursday, December 8, 2011 at 7:59:00 AM EST
Hi Jeet,
1) This indicates the pair is borderline cointegrating. Trade at your own risk!
2) You should use the Johansen test: it will give you all good combinations of symbols with no unique "independent" variable.
Ernie

Jeet, Friday, December 9, 2011 at 4:32:00 AM EST
What should be the logic behind choosing the independent and dependent stocks from a basket of stocks?

The Johansen test might give a result, but I am looking for the logic.

Ernie Chan, Friday, December 9, 2011 at 8:12:00 AM EST
Jeet,
Logic can only be found if you have a fundamental economic understanding of the relationship between the assets. For example, if you believe that firms A and B are both big customers of firm C, you might argue that C's price should be a dependent variable.
However, I usually do not find it important to find out why a variable is independent: it makes no difference to the trading model.
Ernie

Anonymous, Tuesday, February 7, 2012 at 10:57:00 AM EST
Dear Mr.Chan,

I used your file to test ex7_2.m for GLD and GDX.
The t-statistic is -9.72, not -3.36.
What's wrong with it??

By the way, your book is very good.
Thanks.

Ernie Chan, Tuesday, February 7, 2012 at 12:43:00 PM EST
Anon,
Did you use my data file for the test? Did you set all the parameters for the cadf test to be the same as mine?
Ernie

99, Monday, February 13, 2012 at 9:37:00 AM EST
Dear Mr. Chan,

I only used copy and paste (ex7-2, the jplv7 *.m files, and GLD/ GLD).

Parameters?
Do I also need to change parameters, not just use those .m files?

Based on the results, only the t-statistic is wrong.
The others are similar.

I tried GLD against GLD, i.e. the same data twice, and the t-statistic is also -9. @@||

Ernie Chan, Monday, February 13, 2012 at 11:08:00 AM EST
99,
There is no need to change the input parameters to the cadf function for testing cointegration.

Are you using the same input data as I used? Have you made sure the dates of those price series are ascending (most recent data on last row)?

Ernie

99, Tuesday, February 14, 2012 at 7:53:00 AM EST
Dear Chan,

Hmm... I used the GLD/GDX files from your server (2006/05/23~2007/11/30 data).
I ran those files through "adftest.xls", and the "Dickey Fuller Test Statistic" is right.
Is my Matlab wrong? @.@||

99, Tuesday, February 14, 2012 at 7:58:00 AM EST
Dear Chan,

I sent an email to your Gmail with all the .m files and a screenshot of my Matlab.
If you have time, could you take a look at it?
Thanks, and sorry to disturb you.

Ernie Chan, Tuesday, February 14, 2012 at 10:47:00 AM EST
99,
I did not receive your email (I checked the spam folder too). Could u pls resend?
Ernie

FasTechs.com, Inc., Wednesday, March 14, 2012 at 10:15:00 AM EDT
I am wondering what everyone feels is the most reliable cointegration test in Matlab. I have tried egci, adf, and jci and get widely different results. To make matters worse, when I check with catalystcorner, many pairs show significantly different results there.

Ernie Chan, Wednesday, March 14, 2012 at 11:15:00 AM EDT
cbucks,
Have you tried the Johansen test?
Ernie

cf16, Tuesday, February 19, 2013 at 9:22:00 AM EST
Cointegration is not the same as correlation.
Certainly. It can be proven that the Pearson correlation coefficient will be close to 1 only if the variance of each asset is small relative to the variance of the random walk process that generates the data.

cheerful, Friday, May 30, 2014 at 1:35:00 PM EDT
ADF, Variance Ratio, CADF test

Dear Dr Chan,

I have a set of pair data using 5min and daily data@9am.

5min data: ADF = -2.63 vs 10%critical value=2.59, H2 = 0.35, h=1, CADF= -2.6 vs 10% critical value = -3.

Daily data: ADF = -1.6 vs 10%critical value=2.59, H2 = 0.4, h=0 p=0.8, CADF= -1.9 vs 10% critical value = -3

May I know if I can trade this pair? How to overcome different time frame where one shows a trend and the other shows a weak mean-reversal as above.

Thanks
Leo

Ernie Chan, Friday, May 30, 2014 at 1:42:00 PM EDT
Hi Leo,
It looks to me that for both 5 min or daily data, we can't reject the CADF null hypothesis. But that doesn't mean you can't create a profitable mean-reverting strategy: you just have to backtest it with various parameters.

It is common for an instrument to mean-revert in one timeframe while trending in another. You just have to adapt your strategy to the respective timeframes accordingly.

Ernie

cheerful, Monday, June 23, 2014 at 3:13:00 PM EDT
Dear Dr Ernest,

1) It looks to me that for both 5 min or daily data, we can't reject the CADF null hypothesis. But that doesn't mean you can't create a profitable mean-reverting strategy: you just have to backtest it with various parameters.
>> Do you mean adjusting the moving average and the number of standard deviations for the Bollinger band? If the mean reversion is very weak, as indicated by the CADF, we can expect a very poor result regardless of how well we optimize the Bollinger band. May I know how we can overcome this?

2) In example 5.1,
you use AUDUSD data, but I could only find inputData_AUDCAD_20120426 from your "box" link. Could there be a mistake, because the results I get after I replace AUDUSD with AUDCAD look very close?

3) You use Johansen weights for the hedge ratio. If we normalize them, i.e. weight1/weight2, we see huge spikes, e.g. 20 times. If we use the second set of Johansen weights, the normalized weights also have spikes. The weights change drastically from day to day. That looks unstable to me.

Thanks
Leo

Ernie Chan, Monday, June 23, 2014 at 4:08:00 PM EDT
Hi Leo,
1) Yes, you can optimize the parameters of the Bollinger band. If the best results are still too weak, you shouldn't be trading these assets using mean reversion models.
2) My box.net has data on all 3 files: AUDCAD, AUDUSD, and USDCAD.
3) I think I mentioned to you (or some other reader?) before that the Johansen test is not guaranteed to pick the same eigenvector as the "best" one every day, as the order of their eigenvalues does change. So if you want to enforce continuity, you will have to make sure that the same eigenvector is used until the eigenvalue is much "worse" than the best one.

Ernie
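
One way to read point 3 as code is sketched below. It is an interpretation, not the procedure actually used: it assumes the eigenvalues and eigenvectors come from whatever Johansen implementation is being run each day, that a larger eigenvalue means a "better" combination, that "the same eigenvector" is identified as the one closest in direction to yesterday's choice, and that "much worse" means below a fraction `tolerance` of the best eigenvalue. The function name and the default threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def choose_eigenvector(eigenvalues, eigenvectors, previous=None, tolerance=0.5):
    """Keep following yesterday's eigenvector unless its eigenvalue is much worse."""
    best = max(range(len(eigenvalues)), key=lambda i: eigenvalues[i])
    if previous is None:
        return list(eigenvectors[best])
    # Today's vector pointing most nearly along yesterday's choice (sign ignored).
    closest = max(range(len(eigenvectors)),
                  key=lambda i: abs(cosine(eigenvectors[i], previous)))
    keep = eigenvalues[closest] >= tolerance * eigenvalues[best]
    chosen = closest if keep else best
    vector = list(eigenvectors[chosen])
    # Eigenvector signs are arbitrary, so align the sign when continuing a vector.
    if chosen == closest and cosine(vector, previous) < 0:
        vector = [-x for x in vector]
    return vector


day1 = choose_eigenvector([0.05, 0.02], [[1.0, -1.5], [1.0, 0.8]])
# Day 2: the ordering has swapped and the sign has flipped, but the vector is still usable.
day2 = choose_eigenvector([0.04, 0.03], [[1.0, 0.9], [-1.0, 1.4]], previous=day1)
# Day 3: the followed vector's eigenvalue is now far below the best one, so switch.
day3 = choose_eigenvector([0.06, 0.01], [[1.0, 0.9], [1.0, -1.4]], previous=day2)
print(day1, day2, day3)
```

With this rule the hedge ratio moves only as fast as the followed eigenvector itself moves, and jumps only on the days when a switch is forced.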

cheerful, Monday, June 23, 2014 at 4:37:00 PM EDT
Dear Dr Ernest,

1) & 2) You are right. This shows that highly correlated assets may not be highly cointegrated for mean reversion.

3) "I think I mentioned to you (or some other reader?)": Yes. I have tried using a moving average on the eigenvector, switching to the other eigenvector set when one is too high, and taking the average of the two eigenvector sets, but all failed. The change is still drastic. Then I tried to identify the new value that is added and the oldest value that is dropped, and compared them to the average for the period (as you taught me). I don't see much change graphically, but the resulting eigenvector set still changes a lot. What can we do about it? On certain days, for both eigenvector sets, I suddenly have to go long e.g. 20 times more than usual on an asset.

Thank you
Leo


Ernie Chan, Monday, June 23, 2014 at 5:05:00 PM EDT
Leo,
Are you saying that even if you stick with one continuously changing eigenvector, the hedge ratio still changes discontinuously? That seems unlikely.
Ernie

Ernie Chan, Friday, September 26, 2014 at 11:41:00 AM EDT
Hi notcher,
As I wrote before, my web host considers that file too large to upload. If you email me, I can send you a link to the box.net folder.
Ernie

cheerful, Friday, October 3, 2014 at 12:23:00 PM EDT
Dear Dr Chan,

May I know if you use signal properties, i.e. ADF and cointegration tests, to check for mean reversion in the lookback period and, if it is present, proceed to use a mean-reverting strategy for the next period, hoping that the next period will continue to be mean-reverting until the ADF and cointegration tests in the lookback period indicate otherwise?

Thank you
Leo

Ernie Chan, Friday, October 3, 2014 at 1:48:00 PM EDT
Hi Leo,
Actually, we use the ADF test only as a screen for suitable pairs. Once a pair is deemed suitable for pair trading, we just run a Bollinger-band-type strategy on it.
Ernie
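
The reply does not spell out the strategy, so the following is only a generic sketch of what a Bollinger-band-type rule on a spread can look like. The lookback, the entry and exit thresholds, the use of a window that includes the current bar, and the toy spread values are all assumptions chosen to keep the example small.

```python
import math


def bollinger_positions(spread, lookback=4, entry_z=1.0, exit_z=0.0):
    """Position in the spread per bar: +1 long, -1 short, 0 flat."""
    positions, position = [], 0
    for t in range(len(spread)):
        if t + 1 < lookback:
            positions.append(0)  # not enough history for a full window yet
            continue
        window = spread[t + 1 - lookback:t + 1]
        mean = sum(window) / lookback
        std = math.sqrt(sum((s - mean) ** 2 for s in window) / lookback)
        if std > 0:
            z = (spread[t] - mean) / std
            if position == 0:
                if z > entry_z:
                    position = -1  # spread is rich: short it
                elif z < -entry_z:
                    position = 1   # spread is cheap: buy it
            elif position == 1 and z >= -exit_z:
                position = 0       # long spread has reverted: exit
            elif position == -1 and z <= exit_z:
                position = 0       # short spread has reverted: exit
        positions.append(position)
    return positions


toy_spread = [0.0, 0.1, -0.1, 0.0, 3.0, 0.0, -3.0, 0.0]
positions = bollinger_positions(toy_spread)
print(positions)  # [0, 0, 0, 0, -1, 0, 1, 0]
```

In practice the lookback and the thresholds are the parameters one would vary in a backtest, as discussed earlier in the thread.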

cheerful, Thursday, April 14, 2016 at 4:33:00 PM EDT
Dear Ernest Chan,

Do you use Matlab in production? I know prototyping in Matlab is fast in terms of development speed, but Matlab is very slow compared to Python, C++ or Java in terms of processing speed.

Thank you
Leo

Ernie Chan, Thursday, April 14, 2016 at 6:59:00 PM EDT
Hi Leo,
Yes, I use Matlab for trading some low frequency strategies.

I disagree that Matlab is slower than Python. Please see this academic study: Aruoba, S. Borağan and Fernández-Villaverde, Jesús. 2014. A Comparison of Programming Languages in Economics. NBER Working Paper No. 20263. Available at economics.sas.upenn.edu/~jesusfv/comparison_languages.pdf

Ernie

cheerful, Friday, April 15, 2016 at 6:55:00 PM EDT
Hi Dr Ernie

Would you run on Unix or Windows in production? I am thinking of using Cygwin to configure my PC as Unix.

That paper is very good.

Thank you
Leo

Ernie Chan, Friday, April 15, 2016 at 8:23:00 PM EDT
Hi Leo,

Since I am not a high frequency trader, it doesn't matter to me whether we run it in Linux or Windows. We run it in Windows Server.

Ernie

Anonymous, Sunday, January 28, 2018 at 9:40:00 AM EST
Hi,
pg. 111 of "Algorithmic Trading: Winning Strategies..." refers to 2 files:
1) inputData_USDCAD_20120426
2) inputData_AUDUSD_20120426

But when I go to "http://epchan.com/book2/" I don't see either of them.
Where can I find them?

Ernie Chan, Sunday, January 28, 2018 at 10:17:00 AM EST
Hi,
Please email me to get those files.
Ernie

Anonymous, Sunday, January 28, 2018 at 10:59:00 AM EST
Hi Ernie,
my email: polargnome@yahoo.com

Thanks