The model is simple: at the end of each calendar quarter, compute the log of BM and ROE for every stock based on the most recent earnings announcement, and regress the next-quarter return against these two factors. One subtlety of this regression is that the factor loadings (log BM and ROE) and the future returns for stocks within an industry group are pooled together. This makes for a cross-sectional factor model, since the factor loadings (log BM and ROE) vary by stock but the factor returns (the regression coefficients) are the same for all stocks within an industry group. (A clear elucidation of cross-sectional vs time-series factor models can be found in Section 17.5 of Ruppert.) If we long the stocks in the top decile of expected returns, short those in the bottom decile, and hold for a quarter, the expected annualized average return of this model is an eye-popping 26% or so.
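
To make the mechanics concrete, here is a minimal sketch of the pooled regression and the decile sort, in plain Python. It is my own illustration, not the authors' code: the function names, the tuple layout of the inputs, and the inclusion of an intercept are all assumptions, and the tiny linear solver does not guard against singular data.

```python
from collections import defaultdict


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (no singularity check)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations; each row already contains a 1.0 for the intercept."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def fit_by_industry(history):
    """history: (industry, log_bm, roe, next_quarter_return) tuples, pooled over stocks and quarters.

    Returns {industry: [intercept, coef_log_bm, coef_roe]}.
    The intercept is an assumption of this sketch.
    """
    grouped = defaultdict(list)
    for industry, log_bm, roe, ret in history:
        grouped[industry].append(([1.0, log_bm, roe], ret))
    return {
        ind: ols([x for x, _ in obs], [r for _, r in obs])
        for ind, obs in grouped.items()
    }


def long_short(current, coefs, frac=0.1):
    """current: (ticker, industry, log_bm, roe) tuples for the latest quarter.

    Ranks stocks by expected return and returns (longs, shorts),
    i.e. the top and bottom `frac` of the ranking.
    """
    scored = []
    for ticker, industry, log_bm, roe in current:
        a, b_bm, b_roe = coefs[industry]
        scored.append((a + b_bm * log_bm + b_roe * roe, ticker))
    scored.sort()
    n = max(1, int(len(scored) * frac))
    shorts = [t for _, t in scored[:n]]
    longs = [t for _, t in scored[-n:]]
    return longs, shorts
```

In use, one would call `fit_by_industry` on the historical quarters and then `long_short` on the latest loadings, rebalancing each quarter.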

I have tried to replicate these results, but unfortunately I couldn't. (My program generated a measly, though positive, APR.) The data requirement and the program are both quite demanding. I was unable to obtain the 60 quarters of fundamental data that the authors recommended; I have merely 40. I used the 65 industry groups defined by the GICS industry classifications, while the authors used the 48 Fama-French industry groups. Finally, I am unsure how to deal with stocks which have negative book values or earnings, so I omitted those quarterly data. If any of our readers are able to replicate these results, please do let us know.

The authors and I used the Compustat database for the fundamental data. If you do not have a subscription to this database, you can consider a new, free website called Thinknum.com. This website makes available all data extracted from companies' SEC filings starting in 2009 (2011 for small caps). There is also a neat integration with R described here.

***** Update *****

**I forgot to point out one essential difference between the method in the cited paper and my own effort: the paper used the entire stock universe except for stocks cheaper than $1, while I did my research only on S&P 500 stocks (hat tip to Prof. Lyle, who clarified this). This turns out to be of major importance: a to-be-published paper by our reader I. Kaplan reached the conclusion that "Linear models based on value factors do not predict future returns for the S&P 500 universe for the past fifteen years (from 1998 to 2013)."**

===

Speaking of new trading technology platforms that provide historical data for backtesting (other than Thinknum.com and the previously mentioned Quantopian.com), here is another interesting one: QuantGo.com. It provides institutional intraday historical data through its data partners, from 1-minute bars to full depth of book, in your own private cloud running on an Amazon EC2 account for a low monthly rate. They give unlimited access to years of historical data for a monthly data access fee. For example, US equities Trades and Quotes (TAQ) for an unlimited number of years are $250 per month of account rental, OPRA TAQ is $250 per month, and tagged news is $200. Subscribers control and manage their own computer instances, so they can install and use whatever software they want on them to backtest or trade using the data. The only hitch is that you are not allowed to download the vendor data to your own computer; it has to stay in the private cloud.

===

Follow @chanep to receive my occasional tweets on interesting quant trading industry news and articles.

===

My online Mean Reversion Strategies Workshop will be offered on April 1-3. Please visit epchan.com/my-workshops for registration details. Furthermore, I will be teaching my Mean Reversion, Momentum, and Millisecond Frequency Trading workshops in London on March 17-21, and in Hong Kong on June 17-20.

## 90 comments:

Hi Ernie,

Usually, how big is the capacity we can trade in one ETF pair?

Thanks.

Hi Ernie,

In a comment on the last post, you mentioned that you trade ETF pairs and that "typically we have 4-8 such pairs." Do these 4-8 pairs include both intraday and interday trading? Do you normally expect a positive result from interday trading in all 4-8 pairs every year? Do you get leverage through "leveraged ETFs" or through margin?

-HK

Hi Anon,

You can trade at least $400K per side without much market impact, especially at the close.

Ernie

Hi HK,

Yes, 4-8 pairs include both inter and intra days. No, we don't expect all pairs to have positive returns. We leverage through margin.

Ernie

Hi Ernie,

For 10% p.a., does that mean 1 million dollars of capital would become 1.1 million dollars after 1 year? What return p.a. should we expect with novice/intermediate pair trading knowledge and without applying leverage or the Kelly formula? Thanks.

-HK

Hi HK,

Yes, 1M becomes 1.1M for 10% p.a.

If you do not apply leverage, the return can be as low as 3-4% p.a.

Ernie

Hi Ernie,

Thanks. For an ETF, is there a way to calculate by what percentage its price can move away from the value it represents because demand is temporarily stronger than supply? For example, the 2823 and 2822 ETFs are both supposed to reflect the A50 index, which reflects the China market, and they normally move by almost exactly the same percentage, while 2823 is more popular than 2822, with higher dollar trading volume (about 40% higher). The China market had a surprisingly good rise today: 2823 rose 1.8% while 2822 rose around 1.3%, and eventually they came much closer in the afternoon and closed up 1.39% and 1.27%. I don't know if there is a fundamental reason to explain this, or any way to calculate the biggest possible gap under an extreme difference in demand and supply, given that they represent the same index and are traded on the same exchange.

-HK

Hi HK,

If you know the exact % of stocks held in an ETF, you can certainly calculate the premium/discount of its market value relative to its NAV.

Ernie
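
As a minimal sketch of that premium/discount calculation: the function names and input layout below are hypothetical, and it assumes you can obtain the fund's share holdings, cash balance and shares outstanding from the issuer.

```python
def nav_per_share(holdings, prices, cash, shares_outstanding):
    """Net asset value per ETF share.

    holdings: {ticker: number of shares the fund holds}
    prices:   {ticker: latest price of that constituent}
    """
    basket_value = sum(qty * prices[ticker] for ticker, qty in holdings.items())
    return (basket_value + cash) / shares_outstanding


def premium(etf_price, nav):
    """Fractional premium of the ETF price over NAV; a negative value is a discount."""
    return etf_price / nav - 1.0
```

Comparing `premium` for two ETFs on the same index at the same moment gives the gap between them that HK describes.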

Hi Ernie,

Thanks. I want to learn more about how a stock/futures exchange technically works in detail, such as how the exchange system decides that demand exceeds supply by enough to move the price by 1 unit. I think this would help me think about short-term trading strategies. Is there any book or article you would suggest?

-HK

Hi HK,

I recommend the book Trading and Exchanges: tinyurl.com/op3nqfo

Ernie

Hi Ernie,

How do I reset the paper account in IB?

I chose a higher amount, such as 10,000,000, but the system tells me "Max reset amount is GBP 60,000". That is weird. It is too small.

Thanks.

Hi Anon,

You can go to Account Management in IB to reset paper trading amount. For further help, you can send them a support request.

Ernie

Hi Ernie,

Do you know any good software for paper trading?

Hi Anon,

My favored platform for paper trading is Matlab connecting to Interactive Brokers via matlab2IB api by exchangeapi.com. You can also use undocumentedmatlab.com API.

Ernie

Hi Ernie,

I like the IB paper trading account too, and I use their Java API.

However, the IB paper account balance is now limited to $1,000,000.

I want to trade with a bigger balance. Do you have any suggestions?

Thanks.

Hi Anon,

I recommend you speak with an IB customer rep to resolve this issue. I am not able to reset the paper account NAV to over 1M either.

Ernie

Hi Ernie,

I spoke with an IB customer rep already.

Now our paper account can only have initial cash of 1M or 2 times our production account.

I guess IB thinks paper accounts consume too much of their resources.

Hi Ernie

I haven't read through the paper yet, but it sounds to me that it works similarly to those simple value investing strategies: if you buy baskets of low-PE stocks and sell high-PE ones once a year, you very likely get an "OK" annualized return over a long period of time. But I think the frequency would be too low, and it can't be leveraged much due to the low Sharpe ratio?

Hi Paul,

Yes, it is a value strategy not dissimilar to ones described in Joel Greenblatt's Little Book that Beats the Market. But the difference is that this is backtested rigorously over 60 quarters with survivorship-bias-free data.

With a supposed APR of 26% or so, I don't think you will need leverage.

Ernie

Hi Ernie,

I am glad to see you back on Twitter!

Hi Ethan,

Thanks!

Yes, I find it is easier to share industry tidbits via Twitter than to have to write a blog article.

The only thing about Twitter I don't like is that their accounts seem easily hacked, and hackers do send out horrific tweets on the account owners' behalf.

Ernie

Hi Ernie,

alexa.com may be helpful for checking which websites, such as Facebook or Twitter, are the most popular, especially in the countries you are interested in.

For an ADF test of whether two price series are cointegrated, I tried to use the Excel ADF tool that you mentioned in a previous post, but it seems that download link is down. I found this one on the same website:

http://www.quantcode.com/modules/mydownloads/singlefile.php?lid=573

The instructions say that to test the cointegration between two price series, just compute Log(price A)-Log(price B) and put it in the single price series column in the Excel sheet to do the calculation.

I tried this with the 2823 China A50 ETF and the 2828 H-share ETF in the HK market. I got the historical data for these two from Yahoo, used some VBA to match the dates and daily open prices, and got these results from data from 31 Dec 2007 to now (around 5 years):

Dickey Fuller Test Statistic -3.098206234

p-value 0.113033535

Lag order 11

I read these numbers as follows: the test statistic of -3.098 should correspond to about the 9.x% level, because it is a little beyond the 10% critical value of -3.042, and the p-value shows there is over an 88.7% (1-0.113) chance that the relationship is stationary. I don't know if we should also take the lag order into account. So what do you think about this pair?

What is a good range of t-statistic and p-value for considering a pair possibly profitable?

Thanks,

-HK

Hi HK,

I think your pair is not very stationary. However, there is no harm in backtesting a strategy with a short lookback on it. Sometimes a non-stationary pair can still exhibit short-term mean reversion.

I would prefer a p-value less than 0.1. t-statistics depends on the size of your data set.

Ernie
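
For readers who want to see the mechanics behind the test discussed above, here is a minimal sketch of the plain (non-augmented) Dickey-Fuller statistic on a log-price spread. It is not the Excel tool or the adf function mentioned in this thread: it uses no lagged differences and reports no p-value. The function names are hypothetical, and the -2.57 threshold in the comment is the standard asymptotic 10% critical value for the constant-only case.

```python
import math


def log_spread(prices_a, prices_b):
    """Log(price A) - Log(price B), bar by bar."""
    return [math.log(a) - math.log(b) for a, b in zip(prices_a, prices_b)]


def df_tstat(series):
    """t-statistic of beta in the regression dy_t = alpha + beta * y_{t-1} + e_t.

    A strongly negative value is evidence against a unit root,
    i.e. in favour of stationarity.
    """
    y_lag = series[:-1]
    dy = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(dy)
    mean_x = sum(y_lag) / n
    mean_d = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in y_lag)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(y_lag, dy))
    beta = sxy / sxx
    alpha = mean_d - beta * mean_x
    ssr = sum((d - alpha - beta * x) ** 2 for x, d in zip(y_lag, dy))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return beta / std_err


# Rough reading for large samples with a constant and no trend:
# a statistic below about -2.57 rejects a unit root at the 10% level.
```

A proper test would add lagged differences, as the "lag order" in HK's output indicates, and use finite-sample critical values, so treat this only as a way to see what the statistic measures.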

Hi Ernie

When you test cointegration of the pairs, do you employ any bootstrapping methodology due to finite data set? Thanks

Hi Anon,

No, I haven't tried that. Bootstrapping is not particularly suitable for cointegration tests because it requires long blocks of dates to be moved together; otherwise you would destroy the cointegration by bootstrapping.

Ernie

Hi Ernie,

How can I use ADF to test a pair for day trading with 1-minute bars, and how many days of data should I include to reach a solid conclusion?

Thanks,

HK

Hi HK,

To test for cointegration on intraday data, you have to concatenate several days of data together while eliminating the overnight gaps, just as you would back-adjust continuous futures contracts. Even with intraday data, you will need a month of it for backtesting.

Ernie
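
One way to do that concatenation is sketched below. It is only an illustration with a hypothetical function name: it uses additive (difference) back-adjustment, which can push early prices toward zero or below, so you may prefer to apply it to log prices or directly to the spread.

```python
def stitch_sessions(sessions):
    """Concatenate intraday sessions while removing the overnight gaps.

    sessions: list of per-day price lists, oldest day first.
    The most recent day is left untouched; each earlier day is shifted
    so that its last price equals the (already adjusted) first price of
    the following day, as in difference back-adjustment of futures.
    """
    chunks = []
    offset = 0.0
    for i in range(len(sessions) - 1, -1, -1):
        day = sessions[i]
        if i < len(sessions) - 1:
            # Overnight gap between this day's close and the next day's open.
            offset += sessions[i + 1][0] - day[-1]
        chunks.append([price + offset for price in day])
    chunks.reverse()
    return [price for chunk in chunks for price in chunk]
```

For example, `stitch_sessions([[100, 101], [103, 104], [100, 102]])` returns `[98.0, 99.0, 99.0, 100.0, 100, 102]`: the last day is unchanged and both overnight jumps are gone.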

Hi Ernie,

Is it hard to buy and sell GS?

I find its bid size and ask size are small, such as 200 shares.

Hi Anon,

In the US market, the BBO sizes for any stock, even the largest cap stock such as AAPL, are about 100 shares at most times. This is a consequence of market makers avoiding getting run over by HFT.

I have discussed the cause and cure of this problem in detail in my course Millisecond Frequency Trading.

Ernie

Hi Ernie,

Is it possible to do Millisecond Frequency Trading on IB?

Hi Anon,

No, it is not possible to trade with millisecond latency on IB. IB does not allow colocation at their servers, so the minimum latency is > 10 ms (for a trade execution to reach your program.)

Ernie

Hi Ernie,

There is a software program called pairtradeFinder.

http://www.pairtradefinder.com/

Have you heard about this software?

Hi Anon,

No, I haven't. I typically pick my pairs using my proprietary algorithm implemented in Matlab.

Ernie

Hi Ernie,

Have you heard about Barchart as a data feed?

Hi Anon,

I have heard of it, but haven't used it myself.

Ernie

Hi Ernie,

For testing a mean reversion strategy on a single ETF, should I also apply the ADF test and use a p-value of less than 0.1 as a general guideline?

Regarding "t-statistics depends on the size of your data set", where can I read more about this to decide if the pair or single instrument is stationary?

If I believe pair trading is the best way, but I want to go beyond 10% APR and I know the consequence could be a blown-up account, would applying more leverage than the Kelly formula suggests to a pair trading strategy, or to a few strategies, be a good way to think about this?

I think if I need to take high risk, an over-leveraged pair trading strategy is still better than a momentum strategy in most cases.

I have used a mean reversion strategy with a single ETF in recent real trading and it is very effective. I am trying to find a good pair trading strategy.

Thanks,

HK

Hi HK,

Yes, you can use the p-value given by the adf function from spatial-econometrics.com to determine stationarity. There is no need for t-statistics.

Sure, if you want more returns and are not afraid of NAV=0, you can apply as much leverage as you want.

I disagree with your statement about risk in momentum vs mean-reverting strategies. But see my second book for detailed reasoning.

Ernie

Hi Ernie,

Do you mean the "Stop Loss" section in chapter 8? I was just trying to say that mean reversion strategies have a higher winning percentage in general, so they would beat momentum strategies in most cases.

-HK

Hi Ernie,

I just used the ADF test on hourly data for 2 index futures, using the open price of each hour. Here are the results of the 2 tests:

Current-month futures (FEB), 19 days, 325 rows with matching date and time:

t-stats -3.554663018

p-value 0.037751633

Lag order 6

Next-month futures (MARCH), data from August to now, 421 rows with matching date and time:

t-stats -2.654071465

p-value 0.301063997

Lag order 7

The contract in this second test is one of the quarterly futures, so it could be traded from August 2013 until now, but it wasn't actively traded, so there weren't many records in 2013; it only became actively traded when it became the "next month future" in February 2014.

Is the p-value too small for the first test, so that the pair would be too stationary to be profitable? Or do I not have enough data to tell yet?

The daily stationarity would not be acceptable in the long term because these two indices went to two extremes in recent months, as we can see in the second test, but it seems the hourly pair can be a stationary match, as we can see in the first test. Is this idea a possibly profitable strategy?

Thanks,

-HK

Hi Ernie,

I just thought about why the p-value of the second test is so big, and I think I know why now. It should be because the contract was so inactively traded that the hourly data only reflected a random price within that hour instead of the real open price.

-HK

Hi HK,

No, I meant the last section of Chapter 6 on Pros and Cons of Momentum Strategies.

Winning ratio is certainly not the only criterion for a good trading strategy.

300-400 data points are sufficient for a good adf test.

Ernie

Hi Ernie,

Have you seen any research or implementation of pairs trading done not on the basis of a cointegrating price relationship between two stocks, but rather on a fundamental metric (e.g. P/E) that may be stationary? I.e., trades entered / exited when the relative valuation diverges from a mean, irrespective of the price series?

Thanks, Danie

Hi Danie,

It is a good idea to test for cointegration on P/E instead of just P. But P/E does involve the price series, even though the E only changes once a quarter.

Ernie

Hi Ernie,

A huge fan of yours. I just wanted your opinion on this. I am backtesting a momentum trading strategy, which seems to work for AUD and CAD. Since these currencies are tied to commodities, I am quite comfortable with the underlying reason. However, I am still not sure whether it is really profitable to run this.

The return p.a. is 1.16% and the stdev is 1.1% (unlevered).

From 2005-2012 it made 90 trades with 65 profitable trades and 24 losing trades. That is roughly a 70% win ratio, and the profit:loss ratio is 1:3, meaning I win 10 pips and lose 30 pips.

It seems to me that this is very close to zero and does not suggest any alpha. Normally, how much return per annum would one look for in an unlevered FX strategy?

Thanks!

Hi Henry,

We should expect a minimum APR of over 4% unlevered. Have you looked at the ratio between APR and max drawdown? It should be higher than one.

Ernie
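
A minimal sketch of that ratio, computed from an equity curve. The function names, the compounding convention, and the default of 252 periods per year are assumptions of this example.

```python
def apr(equity, periods_per_year=252):
    """Compound annualized return of an equity curve sampled at regular periods."""
    n_periods = len(equity) - 1
    return (equity[-1] / equity[0]) ** (periods_per_year / n_periods) - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, returned as a positive fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, 1.0 - value / peak)
    return worst


def apr_to_drawdown(equity, periods_per_year=252):
    """Ratio of APR to maximum drawdown; infinite if there was no drawdown."""
    dd = max_drawdown(equity)
    if dd == 0:
        return float("inf")
    return apr(equity, periods_per_year) / dd
```

With the figures quoted in this thread (1.16% p.a. and a 0.7% maximum drawdown), the ratio is about 1.66.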

Hi Ernie,

Thanks for your insight.

My max drawdown is -0.7%. So the ratio is above 1, but I am just worried this might be a case where the carry paid out is more than that received, and also about slippage.

Thanks

Hi Henry,

If your holding period is very short compared to 1 year, then the carry cost should be minimal. But you do have to include slippage and bid-ask spread in your transaction costs while backtesting.

Ernie

Hi Ernie

In your opinion, are commodity futures suitable for pair trading? Have you ever researched the possibility of pair trading copper futures from 2 different exchanges, for example TOCOM and SGX?

Thx

Hi Anon,

There are very few pairs of futures that are suitable for mean-reverting trading. But of course that doesn't prevent one from trying a specific pair like you suggested.

Ernie

Hi Ernie,

Do you know any good VPS vendors?

Thanks.

Hi Ernie,

I have a question about linear regression and R-squared. I remember you mentioned before that linear analysis is better than non-linear.

That is why I am trying to learn this.

I played with linear regression and the R-squared value on a stock chart, and it seems the R-squared value is sometimes too choppy: it can go over a threshold like 0.50 and come back down again before we can decide whether the trend is cleanly up and tradable.

How do we use this, or is there any other approach in linear analysis that you can recommend for me to explore?

Thanks!

Sonny

Hi Ernie,

What is the difference between VPS service and Amazon EC2?

Thanks.

Hi Anon,

I recommend Rhythmic Technologies for VPS.

The difference with Amazon EC2 is that your "instance" is persistent. You never need to store or restart an instance.

Ernie

Hi Sonny,

I have never used stockcharts. The way you described R-squared doesn't make sense to me.

The entry signal should just be the residual: difference between actual y value vs fitted y value.

Ernie

Hi Ernie, I was just curious - what is your recent experience with mean reversion on currency pairs, if you can share that info.

Thanks.

Hi Anon,

There are very few currency pairs that are mean-reverting. Exceptions are the commodity currency pairs such as AUDCAD.

Ernie

Hi Ernie,

I try to get ETF data out of Yahoo like this:

http://chartapi.finance.yahoo.com/instrument/1.0/2800.hk/chartdata;type=quote;range=5d/csv

I remember you mentioned you used Yahoo data before as well. Is 1 minute the smallest time unit? Can I get hourly data from it? Have you tried to get a real-time live data feed from Yahoo and use it to create signals and trade?

Thanks,

-HK

Hi HK,

I have never tried getting data from Yahoo with frequency higher than 1 day, so your method certainly sounds interesting.

I have used Yahoo RealTime for their live quotes, and they are pretty good.

Ernie

Hi Ernie, regarding my post above (linear reg):

What I actually meant was that I plotted R-squared and linear regression on a chart (in this example, an equity, TSLA),

not necessarily on "stockcharts.com":

http://www.screencast.com/t/b4rku2hgN6rf

I have the linear reg (20) and R-squared (10).

R-squared seems to give too choppy a signal to tell the strength of the linearity of the price....

Any suggestion on how to use R-squared, or on a better and cleaner technique? Thanks

my email is sgtheguitarist at gmail , if that is better to communicate...

Thanks much!

Hi Sonni,

I recommend you use the cadf or Johansen test to establish cointegration between the 2 time series first. Once established, linear regression can be used to determine the mean-reverting residuals to be used as trading signals. I never care much about the R^2 of a linear regression once cointegration is determined to have a p-value smaller than 10%.

You can read my book Algorithmic Trading for further details.

Ernie
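
The regression-residual signal described above can be sketched as follows. This is a minimal illustration rather than the exact procedure in the book: the function names are hypothetical, the z-score normalisation and the entry threshold of 1 are assumptions, and the cointegration test itself is assumed to have been passed already.

```python
import statistics


def hedge_ratio(x, y):
    """OLS slope and intercept of y regressed on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(x, y):
    """Actual y minus fitted y: the candidate mean-reverting spread."""
    slope, intercept = hedge_ratio(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def zscore_signal(res, entry=1.0):
    """Position implied by the latest residual.

    Returns -1 (short the spread) if the residual is more than `entry`
    standard deviations above its mean, +1 (long the spread) if it is
    that far below, and 0 otherwise.
    """
    mu = statistics.fmean(res)
    sd = statistics.pstdev(res)
    z = (res[-1] - mu) / sd
    if z > entry:
        return -1
    if z < -entry:
        return 1
    return 0
```

Here "long the spread" means long y and short `slope` units of x.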

Hi Ernie,

For the Amazon EC2 service,

after we "stop" an instance, does Amazon charge anything? How much do they charge?

Thanks.

Hi Anon,

No, I don't believe you have to pay anything except for disk space charge after you stop an instance.

But best to confirm with their customer service.

Ernie

Hi Ernie,

For the main index of an exchange, if the ETF on this index pays annual interest, does the index future have interest too? Do the index ETF and index future both theoretically rise by 1/(number of trading days) of the annual interest return overnight? Are the index ETF and index future supposed to rise and drop by basically the same percentage on an intraday or interday timeframe, except on the day that the ETF pays its annual interest?

Thanks,

-HK

Hi HK,

By "interest", I think you meant "dividend"?

The index futures value already includes the dividend payout.

The daily returns of the index ETF vs the index future are not necessarily the same. Futures always include roll returns (discussed extensively in my new book), while an index ETF only generates spot returns.

Ernie

Hi Ernie,

May I ask what kind of "instance" you used or would use in Amazon EC2?

Dual cores, or quad cores?

Is internet speed stable in EC2?

Thanks.

Why do people keep asking things that are better directed towards Amazon's customer service?

And can you please read Ernie's books if you are interested in cointegration etc and stop asking basic and often stupid questions.

Sorry, but I am getting a bit frustrated.

Hi Anon,

Whether you need dual or quad cores depends on how many programs/processes you intend to run on these instances.

Amazon servers have better connection to the internet backbone than any office computers.

Ernie

Hi Anon,

Thanks for your kind defense of the space here!

Ernie

Ernie

Do you have any suggestions of papers or other material on intraday mean reversion strategies?

Thanks

Hi Anon,

My second book described an intraday mean reversion strategy on stocks.

Ernie

Hi Ernie,

I did a little research to find out about the structure of ETFs and futures. I looked at this paper:

http://sibresearch.org/uploads/2/7/9/9/2799227/riber_k13-069_223-237.pdf

It seems the ETF premium would be the biggest factor, while the volatility of the future can be higher than that of its underlying market index.

Should I consider both the ETF premium and the roll returns of the future to be unpredictable noise in the pair? It seems there is no pattern to predict the changes in the ETF premium or the roll returns of the future after a few days or a month.

If my plan is to day trade, using only data from the last few days to backtest and optimize for today's trades, is it true that I don't really need to care about the ETF premium and roll returns over a day-trading horizon?

Thanks,

HK

Hi HK,

Actually, both the ETF premium and futures roll returns can be highly persistent ... not at all random. You should build a model to study them. In fact, you can read chapter 5 of my second book to see how to compute the roll returns every day.

If you are day-trading, these are indeed less important, though not necessarily negligible.

Ernie

Hi Ernie,

Do you trade futures calendar spreads?

Are they easy to trade, compared with ETF pairs?

Thanks.

Hi Anon,

Actually, it is more difficult to find a strategy that works for calendar spreads than for ETF pairs.

However, I did suggest one in chapter 5 of my 2nd book.

Ernie

Hi Ernie

I heard of "time varying cointegration".

What is the main advantage of this approach over the one you mentioned in your book, and are there any drawbacks compared to the standard cointegration implementation?

Thanks

Hi Anon,

Time varying cointegration just means that you allow the cointegration vector to vary slowly over time. It can certainly be more realistic than to consider it fixed over a lookback period. The drawback is that you have to propose a model for how this vector varies, and that will involve more parameters and more room for data-snooping bias.

See http://econ.la.psu.edu/~hbierens/TVCOINT.PDF

for more details.

Ernie

Hi Ernie

Why do you prefer the cointegration approach to pairs trading over other approaches like stochastic spread, APT, etc.?

Thanks

Hi Anon,

I have not studied and compared each and every strategy or technique for pair trading. However, my gut feel is that their returns for a specific pair will be similar over the long term.

Ernie

Hi Ernie,

If you were to breakdown the risk in your portfolio over the last 1-2 yr, what would your risk decomposition look like in terms of ETF pairs trading, futures momentum, PEAD, ... Just curious because you've covered quite a few strategies in your 1st and 2nd book as well as on this blog.

Cheers!

Hi Anon,

My risks are dominated by pair trading and futures momentum in the last year.

Ernie

Hi Ernie

I saw you are also offering an HFT training course:

http://www.technicalanalyst.co.uk/training-courses/millisecond-frequency-trading-mft/

But how can a retail trader compete with the big institutions in this domain, when they are equipped with the best technology and huge amounts of money to invest in hardware/infrastructure, and their flow is always toxic?

http://www.extremetech.com/extreme/176551-new-laser-network-between-nyse-and-nasdaq-will-allow-high-frequency-traders-to-make-even-more-money

http://business.financialpost.com/2014/03/11/virtu-financial-the-high-frequency-trader-that-had-1-day-of-trading-losses-in-1238-days-files-for-ipo/

thx

Hi Anon,

Actually, that course is NOT on high frequency trading. You might call it Medium Frequency Trading. HFT would require sub-millisecond latency these days, and is indeed unsuitable for retail traders.

Retail traders can still profit from strategies operating at, say, 10-20 ms. Furthermore, half of the course is on what longer term traders should avoid doing in the face of HFT strategies.

Ernie

Hi Ernie

Normally, how often do you re-run the cointegration test to update the spread profile? Are you using a rolling window to recalculate the spread?

Thx

Hi Anon,

I recommend daily update, using a moving window of 1-3 years.

Ernie
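
A minimal sketch of such a moving-window update. Note the substitution: instead of re-running a full cointegration test each day, this example only re-estimates the hedge ratio by OLS over the trailing window. The function name is hypothetical, and taking one year as 252 daily bars is an assumption.

```python
def rolling_hedge_ratios(x, y, window):
    """OLS slope of y on x, re-estimated at each bar from the trailing `window` bars.

    The first value uses bars 0..window-1, so the output has
    len(x) - window + 1 entries.
    """
    ratios = []
    for end in range(window, len(x) + 1):
        xs = x[end - window:end]
        ys = y[end - window:end]
        mean_x = sum(xs) / window
        mean_y = sum(ys) / window
        sxx = sum((a - mean_x) ** 2 for a in xs)
        sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
        ratios.append(sxy / sxx)
    return ratios


# Example: a 1-year window on daily bars, assuming 252 trading days per year.
# ratios = rolling_hedge_ratios(prices_x, prices_y, window=252)
```

Each day's position would then be sized with the latest entry of the returned list, so the hedge ratio drifts slowly as old bars drop out of the window.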

Hi Ernie

You suggested updating the spread profile daily using a 1-3 year moving window.

In this case, what is the key difference from the time-varying cointegration approach that you described in your reply to another reader? See below.

//----------------------//

Hi Anon,

Time varying cointegration just means that you allow the cointegration vector to vary slowly over time. It can certainly be more realistic than to consider it fixed over a lookback period. The drawback is that you have to propose a model for how this vector varies, and that will involve more parameters and more room for data-snooping bias.

See http://econ.la.psu.edu/~hbierens/TVCOINT.PDF

for more details.

Ernie

Friday, March 7, 2014 at 2:55:00 PM EST

//------------------------------//

Hi Anon,

Whether you use a moving window for cointegration or time-varying cointegration, the hedge ratio will change slowly over time. The latter method however is more "principled", but introduces more parameters with room for data snooping bias.

Ernie

Ernie,

Do you or anyone else know where to get 1-minute bar data for European equities?

I only find tickdata.com but they are as usual very expensive.

Thanks

Hi Anon,

You can check out nanex.net.

Ernie

Hi Ernie

If I write a trading program in Python that also calls an R library,

can this trading program be compiled into an exe file to run on another machine without the source code?

I have been told one can compile Python programs into C executables using a program called NumbaPro.

Ernie
