The model is simple: at the end of each calendar quarter, compute the log of BM and the ROE for every stock based on its most recent earnings announcement, and regress the next-quarter returns against these two factors. One subtlety of this regression is that the factor loadings (log BM and ROE) and the future returns of all stocks within an industry group are pooled together. This makes it a cross-sectional factor model, since the factor loadings (log BM and ROE) vary by stock while the factor returns (the regression coefficients) are the same for all stocks within an industry group. (A clear elucidation of cross-sectional vs. time-series factor models can be found in Section 17.5 of Ruppert.) If we go long the stocks in the top decile of expected returns and short those in the bottom decile, holding for a quarter, the expected annualized average return of this model is an eye-popping 26% or so.
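The quarterly procedure above can be sketched as follows. This is a minimal, in-sample illustration on synthetic data, not the paper's implementation: the column names and industry codes are hypothetical, and real inputs would come from a fundamental database such as Compustat, with the regression fitted on past quarters and applied out of sample.

```python
import numpy as np
import pandas as pd

def quarterly_factor_returns(group):
    """Within one industry group, pool the loadings (log BM, ROE) and
    next-quarter returns of all stocks, and estimate the factor returns
    (regression coefficients) by OLS."""
    X = np.column_stack([np.ones(len(group)), group["log_bm"], group["roe"]])
    y = group["next_ret"].to_numpy()
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(beta, index=["intercept", "f_log_bm", "f_roe"])

def decile_long_short(expected_ret):
    """Long the top decile of expected returns, short the bottom decile."""
    deciles = pd.qcut(expected_ret, 10, labels=False, duplicates="drop")
    return deciles == deciles.max(), deciles == 0

# --- toy example on synthetic data (hypothetical columns and industries) ---
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "industry": rng.integers(0, 4, n),      # stand-in for industry groups
    "log_bm": rng.normal(0.0, 0.5, n),
    "roe": rng.normal(0.05, 0.1, n),
})
# returns loosely related to the two factors, plus noise
df["next_ret"] = 0.02 * df["log_bm"] + 0.3 * df["roe"] + rng.normal(0, 0.05, n)

# factor returns per industry group, then expected returns per stock
factors = df.groupby("industry")[["log_bm", "roe", "next_ret"]].apply(quarterly_factor_returns)
df = df.join(factors, on="industry")
df["exp_ret"] = df["intercept"] + df["f_log_bm"] * df["log_bm"] + df["f_roe"] * df["roe"]

long_mask, short_mask = decile_long_short(df["exp_ret"])
spread = df.loc[long_mask, "next_ret"].mean() - df.loc[short_mask, "next_ret"].mean()
print(f"quarterly long-short spread (in-sample): {spread:.4f}")
```

In the actual strategy this fit-and-rank step would be repeated every quarter, with the positions held until the next rebalance.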
I have tried to replicate these results, but unfortunately I couldn't. (My program generated a measly, though positive, APR.) Both the data requirement and the program are quite demanding. I was unable to obtain the 60 quarters of fundamental data that the authors recommended; I merely have 40. I used the 65 industry groups defined by the GICS industry classification, while the authors used the 48 Fama-French industry groups. Finally, I am unsure how to deal with stocks that have negative book values or earnings, so I omitted those quarterly data points. If any of our readers are able to replicate these results, please do let us know.
The authors and I used the Compustat database for the fundamental data. If you do not have a subscription to this database, you can consider a new, free website called Thinknum.com. It makes available all data extracted from companies' SEC filings starting in 2009 (2011 for small caps). There is also a neat integration with R described here.
*** Update ***
I forgot to point out one essential difference between the method in the cited paper and my own effort: the paper used the entire stock universe except for stocks cheaper than $1, while I ran my research only on SP500 stocks. (Hat tip to Prof. Lyle, who clarified this.) This turns out to be of major importance: a forthcoming paper by our reader I. Kaplan reaches the conclusion that "Linear models based on value factors do not predict future returns for the S&P 500 universe for the past fifteen years (from 1998 to 2013)."
Speaking of new trading technology platforms that provide historical data for backtesting (other than Thinknum.com and the previously mentioned Quantopian.com), here is another interesting one: QuantGo.com. It provides institutional intraday historical data through its data partners, from 1-minute bars to full depth of book, in your own private cloud running on an Amazon EC2 account for a low monthly rate. Subscribers get unlimited access to years of historical data for a monthly data access fee: for example, US equities Trades and Quotes (TAQ) for an unlimited number of years is $250 per month of account rental, OPRA TAQ is $250 per month, and tagged news is $200. Subscribers control and manage their own computer instances, so they can install and use whatever software they want on them to backtest or trade with the data. The only hitch is that you are not allowed to download the vendor data to your own computer; it has to stay in the private cloud.
Follow @chanep to receive my occasional tweets on interesting quant trading industry news and articles.
My online Mean Reversion Strategies Workshop will be offered on April 1-3. Please visit epchan.com/my-workshops for registration details. Furthermore, I will be teaching my Mean Reversion, Momentum, and Millisecond Frequency Trading workshops in London on March 17-21, and in Hong Kong on June 17-20.