## Tuesday, January 19, 2010

Some readers have asked whether there is an Excel version of the ADF test for cointegration (mentioned in articles here or here.) You can download one such package here (Hat tip: Bruce H.).

jgpietsch said...

I realize this is self serving, but should you wish to post additional resources, ETF Rewind can automatically calculate ADF cross-matrices for over 50-stocks at a time in Excel. That's quite a grid size.

Keep up the terrific work Ernie.

Sincerely, Jeff

Anonymous said...

Hi Ernie

You can also get an Excel ADF plug-in here:
http://www.web-reg.de/

Don't worry the site is in English!

Question.

I like to run two ADF tests over the last 504 days, 1:252 then 253:end

You'll often find a year ago (1:252) a pair not cointegrated, but now (253:end) cointegrated.

In your experience do you like to see a couple of years of good cointegration (say tstat > 90%) or are you happy to trade a pair that has only been cointegrated to any significance over the last year (253:end)?

Thanks

Matt

Ernie Chan said...

Matt,
Yes, I would often like to see at least 3 years of cointegration before trading a pair. Of course, that's no guarantee that it will continue to cointegrate year-after-year, but I think it is more probable that it would.
Ernie

Ernie Chan said...

Jeff,
Thanks for the tip!
Ernie

Paulo said...

Hi Ernie

I am starting studying pairs trading using the cointegration model and i have a doubt. How do I measure the trade return. I read some articles and books that says the way to measure the trade return is by simple subtracting the exit spread by the enter spread when spread is LN(asset 1) - (coint.coef. * ln(asset 2).
Is this the tradicional way to measure the return in this kind of strategie ?

Thank you

Ernie Chan said...

Hi Paulo,
You can simply calculate the returns of each side of the pair, and assuming you have invested equal capital on each, average these 2 returns.
Ernie

Unknown said...

Hi Ernie,

What input time series data should I use for the Excel tool offered below:

Should I use:

a) difference of prices of stocks 1 and 2?

b) difference of log prices of stocks 1 and 2?

c) residuals of OLS between prices of stocks 1 and 2?

d) residuals of OLS between log prices of stocks 1 and 2?

e) something else?

Thanks!

Ernie Chan said...

Hi Anon,
An adf test should be performed on c).
Ernie

Math Finance said...

thanks, chan, that's what I am looking for, switching from excel to matlab just for an ADF test annoys me, cheers.

biao said...

sorry, I should call you Ernie, apologize.

Anonymous said...

Thanks for the reply Ernie. I get different values for the Dickey Fuller Test Statistic if I use OLS residuals from prices vs. log prices. Is there any reason why you chose c) prices instead of d) log prices?

Thanks

Michael

Ernie Chan said...

Michael,
You will usually get different t-stat if you use log instead of raw prices, but I bet the conclusions on whether they cointegrate are the same.

I like raw prices instead of log prices is because if you pair trade using log prices, you will have to rebalance your positions daily, whereas if you use raw prices, you can have a fixed number of shares until you exit the pair. But in terms of testing for cointegration, they should be the same.

Ernie

Anonymous said...

Hi Earnie,

Other than pairs such as IVV/SPY I'm finding that the cointegrating relationship is often unstable if a rolling vector/adf is computed. e.g. the adf may oscillate from -2 (not cointegration) to -4 (cointegrating) during the past 3 yrs. Would you ignore pairs such as these and just focus on ones which are strongly cointegrating at all times?

Thanks, Paul

Michael said...

Hi Ernie, just to add to Paul's question, I have seen many instances of pairs with very negative ADF test statistics (low p-value) when tested from 2006-2009, but their ADF stats for each of 2006, 2007, 2008, 2009 is much less negative (higher p-value). Any thoughts on why this might be? How is it possible that a pair integrate well over the entire 4 years but not so well over each of the individual years?

Thanks

Ernie Chan said...

Hi Paul and Michael,
The basic problem with applying ADF test (or in fact, ANY statistical tests) to financial time series is that finance is not like physical science: the statistics changes. So it is no surprise that a pair that used to cointegrate can stop doing so for some period, and then become cointegrating again. But yes, if you find a pair consistently cointegrate over any 1 year period, we can say that it has a better likelihood of repeating the same performance going forward.
Ernie

Michael said...

Thanks Ernie. But why is it possible for a pair to co-integrate with ADF stat of say, -1.50, for each of the INDIVIDUAL years 2006, 2007, 2008, 2009, but to show an ADF stat of -4.50 for the ENTIRE 2006-2009 period? i.e. how is it that the pair co-integrates MUCH better when we look at the entire 4 year period as a whole rather than look at each year individually?

Ernie Chan said...

Hi Michael,
That is indeed strange. I have not observed this phenomenon before, but will keep an eye out for it.
Ernie

Anonymous said...

Lets say you have a pair for which the ADF (computed using a rolling 1year window) has oscillated between -2.5 and -4.0 over the past 3 years.

Would you:

a) consider such a pair to be worth trading continuously

b) not trade the pair at all

c) trade the pair only when it was beyond the 5% critical level and deactivating it if it dropped below that (i guess running the risk of 'chasing cointegration')

thanks, Paul

Ernie Chan said...

Hi Michael/Paul,
Actually, if a pair has a long half-life, e.g. over a year, then it is quite possible that it will not show cointegration for each individual year, but can be cointegrating over the longer term, such as 4 years.

So to answer your question, Paul, you should compute the half-life of mean-reversion of the pair, and decide if you can tolerate a holding period (and drawdown) of that duration.

Ernie

Unknown said...

Ernie, what is half life of a pair and how does one go about computing it? Do you have any useful links on this?

Thanks much

Michael

Anonymous said...

Earnie, I've been computing my lifehalf by fitting an OU process to my spread, and for those spreads which are strongly cointegrated the HL is indeed short. Which leads me to the question, couldn't I just derive any spread (using the cointegration vector, price ratio or otherwise) and rely solely on the OU fit to determine whether the pair is tradable? Thanks, Paul

Ernie Chan said...

Michael,
You can just search for the keyword "half life" on my blog, and you will find several posts on this subject.

Also, it is explained in details in my book Quantitative Trading as well.

Ernie

Ernie Chan said...

Paul,
Yes, you can run a profitable short-term mean-reversion strategy on a short-half-life pair, just based on a regression fit.

However, cointegration test is better if you intend to hold the pair over a longer period, since a short half-life does not guarantee cointegration.

Ernie

A said...

Ernie,

To get the OLS residual, how do we decide whether to graph stock A on the X-axis and stock B on the Y-axis or vice versa?

Thanks,
A

Ernie Chan said...

A,
This is indeed a good point.

As with cointegration test, the order of X vs Y does make some difference. My general belief is that this difference is not big enough to matter. However, if you really want an order-independent way to construct your spread, or conduct your cointegration test, use Johansen rather than cadf test. You can use the eigenvector from the Johansen test to construct your spread.

Ernie

A said...

Thanks Ernie,

I apologize for my ignorance when it comes to cointegration. But, I was wondering if you can clarify a few points.
My understanding is that the CADF tests the null hypothesis which is that the series is non-stationary or NOT cointegrated. First, a p value of .05 or smaller is considered statistically significant. Therefore, we would reject the null hypothesis if and only if the p-value is smaller than 0.05, if not, I can't reject the null and my series can be considered as non-stationary or NOT cointegrated. If the p value is less than 0.05 we can consider the series to be cointegrated. Am I correct?

Could you please explain the Dickey Fuller Test Statistic, the lag order, and Rho (1=random walk,0=w noise)? (These are the inputs and outputs in the excel add-in that you provided the link to)
Also, how are the results interpreted?
After we have found that the p value is less than 0.05 for the residuals of OLS between prices of stocks 1 and 2, how do you determine the optimum hedge ratio? Do you just use the slope of the linear regression of stock 1 price vs. Stock 2 price?

I know I’ve asked a lot of questions, but any help would be greatly appreciated.

Thanks,
A

Ernie Chan said...

A,
You are right about the p-value.

You can try different lag orders to find out which one will lead to cointegration.

I am not familiar with Rho, as I don't personally use this Excel program. I use the Matlab program instead.

The optimum hedge ratio is determined by linear regression between the 2 price series.

For more details of how this is done, you can read examples 3.6 and 7.2 of my book.

Ernie

Anonymous said...

Hello,

I run Amibroker and have been trying to find a way to code in the ADF test to get a t-stat or something.
I have already programmed the regression coefficient and the hald life but don't know how to do the ADF test.
Ive looked online and there is no clear explanation(well to me anyways) of what you have to do step by step to obtain this stat. Are you aware of anay such explanation?

Thanks,

J

Anonymous said...

Hi Ernie,

Does any cointegrated pair have a mean reverting reiduals process? In my opinion, if the optimal lag order p, given by the Augmented Dickey Fuller Test, is different from 0 then the residual of OLS between prices of stocks 1 and 2 is not a stationary process and therefore the corresponding pair cannot be used as a trading pair. In fact, in that case, the residuals process is an AR(p) and thus does not have a constant mean. Am I right?

Thank you,

James.

Ernie Chan said...

Hi James,
p is not the lag, it is the order of polynomial. But if p>0, you are right that you have to allow for a non-zero drift in your time series. But that's not a problem with mean-reverting trade: it just says you should also have a long term long or short position, but also trade around it.
Ernie

BSend said...

Ernie,
Have you ever used the adf.test function in R. I ran it in a few time series in pairs, and it keeps giving DF statistics of higher than postive 1.5 and p-values of over 0.99. What do you think could be the cause?

BSend said...

BSendero@gmail.com

Ernie Chan said...

Hi intuit,
I actually haven't tried any programs in R myself.
Ernie

Anonymous said...

Hi Ernie,

Is there a way to conduct ADF test on basket of currencies instead of just a pair? I want to test the mean reverting theory on basket of currencies. If there is, then how to implement or what tool to use. Appreciate your reply.

Sincerely,

Kan

Ernie Chan said...

Hi Kan,
If you want to find a basket of currencies that cointegrate, you should use the Johansen test instead.

A free software package for running adf and Johansen tests in Matlab is available from spatial-econometrics.com
Ernie

michelcaspary said...

Ernie,
I have the closing price series for these stocks and I would like to know what kind of series I need to enter at that spreadsheet for the ADF-test.
thank you and congratulations for your blog

Ernie Chan said...

Michel,
I have not used the Excel adf program myself, but I am guessing that you need to input a price series with ascending order in dates.
Ernie

michelcaspary said...

Ernie,
when I want to use the ADF-test to verify if two stocks (stock1 and stock2) series are cointegrated how should I preoceed? Should I test the ADF on the difference between the stock2 price and the OLS regression line of stock1 and stock2?
Should I use closing price or log of that?
thank you, regards
Michel Caspary

Ernie Chan said...

Michel,
You should first use ols regression to find the hedgeRatio between the closing prices of the 2 stocks, then use this hedgeRatio to find the spread of the 2 closing prices, and finally use adf test to find if this spread is stationary.

Of course, if you have the cadf test available, you can run it directly on the 2 closing prices directly to test for cointegration instead.

Ernie

Anonymous said...

Ernie,

should the coint be done on the residual of stk 1 and 2 or the residual of stk 1 and 2 against time?

Ernie Chan said...

Anon,
Both should be tried, and usually the results are the same. But to eliminate order-dependence, use Johansen test where this problem does not arise.
Ernie

Anonymous said...

Hi Ernie, the link to the Excel package is dead...anywhere else we we can get it?

Ernie Chan said...

Hi anon,
has one.
Ernie

Anonymous said...

Hello Ernie,
so I have a OLS line of a cointegrated pair.
PriceY = PriceX * slope + intercept
lets suppose I want to trade about \$1mm of StockY, so I find the number of shares to trade StockY (sharesY). How do I go about finding the number of shares to trade StockX (sharesX) ?
thank you.
Michel

Ernie Chan said...

Hi Michel,
In your equation, 1 share of Y is hedged with slope shares of X.
So sharesY will be hedged with slope*sharesY of X.
Ernie

Anonymous said...

thank you very much
regards.
Michel

Earnie,

I'm using this spreadsheet but I am a little confused on the ln(A)-ln(B) values. Can I use intraday prices (ie. every second prices) or do I have to use EndofDay prices so there is one datapoint per day?

Will using more frequent data screw up my cointegration test?

I have access to submillisecond price data and I want to make sure I can use it instead of EoD prices...

Ernie Chan said...

There is no point in using intraday data for cointegration tests, unless you are interested in trading intraday. If you do, you have to concatenate all the intraday prices together, and remove the overnight gap, before you can apply cointegration test.

I also seriously doubt that you can apply Excel on sub millisecond prices due to memory and CPU constraints.

Ernie

Hi Ernie,

According to the description in the post there it says, "To know if they are mean reverting, simply create a series = Log(MSFT price)-Log(GOOG price). Paste the data into Sheet1 in cell A4"

However in the comments section here you mentioned one should take the residuals and use that as the time series. Is the description for the excel file wrong?

(btw if I regress stock A's return on stock B's return, the residual you talk about would be the error variable and not the standard dev of that error term right? And the whole point of the test would be to test if the residuals are stationary and have a mean of 0 and if so the pair are cointegrated?)

And one more dumb question: in the excel file you linked to, in cell A4 for the time series, do I place the most recent data at the top or the oldest data?

Thanks and kind regards,

Ernie Chan said...

I did not write the Excel program, nor did I write the note that we should just use log(MSFT)-log(GOOG) as the signal. My strategy would be to use linear regression to find the best hedge ratio, and use the residual of this LR as signal.

The residual can be standardized or not: either can be used as signal.

I never use that Excel code myself, as I am a Matlab-user, so I am afraid I cannot explain that code to you.
Ernie

Ajay said...

Hi Erin,

Am new to all this. Request if you can please send an excel file which can help me understand the working in details along with formulas.

have been trying really hard to understand the concept

Ernie Chan said...

Hi Ajay,
I don't actually use that Excel program for my own ADF test, as I am a Matlab user. If you want to understand how ADF test work in general, and to obtain the sample input data files, please see my second book Algorithmic Trading Example 2.1.
Ernie

Anonymous said...

Hi Ernie,

when you look for the z score of mean reversion pair, do i take the spread 0f the pair for the z score calculation or the residual of that pair?

P.S. i love your books and this blog too : )

Ernie Chan said...

Hi,
Thanks for your kind words on my books.

Zscore is based on the spread (i.e. residuals) of prices. One does not take the spread of the Zscores of residuals.

Ernie

Unknown said...

Kindly email "santoshkhooda@gmail.com"

Any one else having pls email "santoshkhooda@gmail.com"

Anonymous said...

Hi, everyone.

Unfortunately, 'quantcode.com' site closed down a few years ago.

Hj Kim

Anonymous said...

Hi Ernie,

when you talking about the residual, is it the equation of :(asset 1) - (coint.coef.*(asset 2)
or the residual of regression of the 2 stock ?

And the residual is tested for Adf test for a Delta diff of the residual and a 1 lag residual result.

Is i am right ?

Roger

Roger said...

Hi Ernie,

when you talking about the residual, is it the equation of :(asset 1) - (coint.coef.*(asset 2)
or the residual of regression of the 2 stock ?

And the residual is tested for Adf test for a Delta diff of the residual and a 1 lag residual result.

Is i am right ?

Roger

Ernie Chan said...

Hi Roger,
If you are using ADF test for cointegration of two assets, the residual is that of the linear regression between the two prices.

To simplify matters, you can just use the CADF test instead. There will be no need to run a regression, and no need to test residuals.

Ernie