Comments on Quantitative Trading: More Data or Fewer Predictors: Which is a Better Cure for Overfitting?

Hi GR, I ran my backtest on a single PC, with just...

2017-06-11T16:13:02.995-04:00

Hi GR,
I ran my backtest on a single PC, with just 16 GB RAM.

My program is similar to the one shown on pages 113-114 of my book Machine Trading. I have never run into memory problems.

We must use daily data because different stocks' data get updated on different days. We update the positions whenever there is an update.

Ernie

Hi Ernie, Really enjoyed your post. It is very ti...

2017-06-11T15:36:26.374-04:00

Hi Ernie,

Really enjoyed your post. It is very timely for me as I am in the process of conducting a very similar test using a different data set. I am currently running into an out-of-memory problem and was hoping to gain some insight into how you conducted your test.

Your test consisted of 500 stocks x 5 yrs x 252 trading days per year = 630,000 rows for the 500 stocks. Your test had 1 dependent (target) variable and the 27 independent (predictor) variables = 28 columns. Therefore, your Matlab matrix size was 630,000 x 28 = 17,640,000.

I am running Win 10, 64-bit, have 32 GB of RAM, i7-4790 3.60GHz Intel processor and cannot complete a stepwise regression on my database in Matlab due to "out of memory issues" for a matrix sized 50122 x 147 (7,367,934). I have my data saved in a table.
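As a rough check on the sizes quoted above, here is a minimal Python sketch (not code from the post or the book) that estimates the raw storage of a dense numeric matrix. It assumes 8 bytes per element (double precision), and the function name is hypothetical.

```python
def dense_matrix_megabytes(rows: int, cols: int, bytes_per_element: int = 8) -> float:
    """Raw storage of a dense numeric matrix, in megabytes (10**6 bytes)."""
    return rows * cols * bytes_per_element / 1e6

# Sizes quoted in the comment above.
ernie_rows = 500 * 5 * 252                      # stocks x years x trading days per year
ernie_mb = dense_matrix_megabytes(ernie_rows, 28)
gr_mb = dense_matrix_megabytes(50122, 147)

print(f"{ernie_rows} x 28 doubles:  {ernie_mb:.2f} MB")
print(f"50122 x 147 doubles: {gr_mb:.2f} MB")
```

Under that assumption both matrices take well under 1 GB, so the raw data alone would not exhaust 32 GB. The out-of-memory error may instead come from intermediate arrays created by the stepwise routine or from the table representation, but that is a guess the thread does not confirm.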

Did you run this simulation on a supercomputer or on a high-performance PC? Any tips for getting this simulation going in Matlab would be appreciated.

Also, why use 252 days of data for each stock? If the majority of fundamental data only changes quarterly (excluding P/E, etc.), couldn't the simulation be run every 63 (252/4) trading days for each stock? I am concerned about overweighting independent variables that change only every 63 days.

Thank you for your time,
GR

Hi aagold, I am not sure which edition of Grinold ...

2017-05-06T21:43:42.466-04:00

Hi aagold,
I am not sure which edition of Grinold & Kahn you are reading, but I am using an old version from 1995. There, on page 48, section titled "Cross-Sectional Comparisons", it starts "These FACTORS [my emphasis] compare attributes of the stocks with no link to the remainder of the economy. These cross-sectional attributes can themselves be classified into two groups: fundamental and market. Fundamental attributes include ratios such as dividend yield and earnings yield, ...".

Hence Grinold & Kahn called "dividend yield" a "factor", while Ruppert & Matteson and I called it a "factor loading", and you called it a "characteristic". Note you did not call it a "factor" - if you did, I might agree that all you did was to reverse the naming convention of Ruppert and Matteson and I would give it a pass.

I think it is clear that there is no industry standard for naming these attributes. What Grinold & Kahn called "characteristic portfolios" Ruppert & Matteson called "hedge portfolios".

I am afraid I do not share your enthusiasm in this case of adopting one author or the other's terminology, so I think I will stop here!

Ernie

Hi Ernie, Thanks for your responses, glad to see ...

2017-05-06T20:49:53.972-04:00

Hi Ernie,

Thanks for your responses, glad to see you're not annoyed by this topic! :-)
Some people find this type of discussion useless but I find terminology is actually very important in scientific discussions.

Actually I think Grinold & Kahn is very consistent with my use of the term "characteristic". They make heavy use of a concept called "characteristic portfolios" on pages 28-35. Here's what they say: "Assets have a multitude of attributes, such as betas, expected returns, E/P ratios, capitalization, membership in an economic sector, and the like. In this appendix, we will associate a characteristic portfolio with each asset attribute". You can also look at Grinold & Kahn page 55 (eq. 3.16) where they define terms like factor loading/exposure and factor return - all consistent with what I've said.

You wrote, "In time series factor models, the "factors" such as HML are observable, and the factor loadings are unobservable, but in cross-sectional factor models, the "factors" are the regression coefficients which are unobservable." That may be Ruppert's terminology, but have you seen that reversal of roles between "factors" and "factor loadings" in the context of cross-sectional regressions anywhere else? I haven't.

I think this confusion stems from the two-step nature of a "Fama-MacBeth Regression", which involves both a time-series Step 1 and a cross-sectional Step 2.

Here's a wikipedia link: https://en.wikipedia.org/wiki/Fama%E2%80%93MacBeth_regression

Here's another link: http://didattica.unibocconi.it/mypage/dwload.php?nomefile=fama-macbeth20141115121157.pdf

In both these examples, the authors call the output of Step 2 a "risk premium", not a "factor". Note: it is true that the observable inputs to Step 2 are called "factor loadings" in general discussions of F-M regressions, since it's assumed these betas/exposures were derived in Step 1 time-series regressions. However, in the case when a characteristic such as B/P, E/P, etc. is directly observed rather than being calculated in Step 1, they're generally called "characteristics" rather than exposures or factor loadings.

Regards,
aagold

aagold, Yes, the terminology is a bit confusing. ...

2017-05-06T19:56:05.984-04:00

aagold,
Yes, the terminology is a bit confusing.

However, I find that using "characteristic" to describe them doesn't make it any easier to remember whether the regression coefficients should be called "factors" or "factor loadings". In time series factor models, the "factors" such as HML are observable, and the factor loadings are unobservable, but in cross-sectional factor models, the "factors" are the regression coefficients which are unobservable. Rather than introducing yet another term "characteristic", I just stuck to 2 terms.

To confuse the matter still further, some books such as Active Portfolio Management by Grinold and Kahn refer to the "characteristics" as "factors", and refer to the regression coefficients as "factor loadings", exactly the opposite terminology of what you and Ruppert's book both use!

Ernie

Ok, to each his own I guess, but personally I find...

2017-05-06T19:31:52.570-04:00

Ok, to each his own I guess, but personally I find that terminology very counter-intuitive and confusing. I think if you do a broader literature search you'll find that calling equity multiples like P/E or P/B (or their reciprocals) "factor loadings" is very rare; in fact, the book you mentioned may be the only place it's done.

Here's a little more evidence:
http://faculty.som.yale.edu/zhiwuchen/Investments/Fama-92.pdf

On page 4 of the classic Fama-French (1992) paper "The Cross-Section of Expected Stock Returns", the authors write "Our asset-pricing tests use the cross-sectional regression approach of Fama and MacBeth (1973). Each month the cross-section of returns on stocks is regressed on variables hypothesized to explain expected returns". Note: these are the guys who basically invented cross-sectional regressions, and nowhere in this paper, or any other paper from them I'm aware of, do they refer to these variables as "factor loadings".

Regards,
aagold

aagold, There can be different names for the same ...

2017-05-06T18:35:55.308-04:00

aagold,
There can be different names for the same object.

My terminology is based on the widely used graduate finance text "Statistics and Data Analysis for Financial Engineering" (2nd ed) by Profs. David Ruppert and David Matteson of Cornell University. In Section 18.5, for example, they discuss cross-sectional factor models, which is the category of factor models I described in this post. On p. 539, they wrote "In a cross-sectional factor model ...; the loadings are directly measured and the factor values are estimated by regression". The loadings here are the fundamental characteristics such as P/E ratio that you referred to.
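To make the quoted sentence concrete, here is a minimal Python sketch of a single cross-sectional regression on simulated data. It is not the code from the post or from Ruppert and Matteson. The loadings, the noise level, and `TRUE_FACTOR_RETURN` are all made-up values used only to generate an example. The loadings are treated as observed, and the factor value for that day is the estimated regression slope.

```python
import random

def ols_slope(x, y):
    """Slope of a one-regressor OLS fit with intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
TRUE_FACTOR_RETURN = 0.002   # hypothetical value, used only to simulate data
n_stocks = 500

# Observed loadings for one day, e.g. a standardized earnings yield per stock.
loadings = [random.gauss(0.0, 1.0) for _ in range(n_stocks)]
# Simulated one-day returns: loading * factor return + idiosyncratic noise.
returns = [TRUE_FACTOR_RETURN * b + random.gauss(0.0, 0.01) for b in loadings]

# The regression runs across stocks on one date; its slope is the factor value.
estimated_factor_return = ols_slope(loadings, returns)
print(f"estimated factor return: {estimated_factor_return:.4f}")
```

Repeating this regression on every date would give a time series of estimated factor values, one per day.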

Ernie

Ernie, I've got a comment on your use of the ...

2017-05-06T18:02:18.742-04:00

Ernie,

I've got a comment on your use of the term "factor loading" in this and a few other blog posts I've seen. I believe the term "characteristic" is used in the literature for a particular stock or portfolio's B/P, E/P, etc. ratio, not "factor loading". The term "factor loading" is used when a time-series regression is done to calculate how much exposure a stock or portfolio has to a "factor" or "risk factor". For example, if we regress the time series of a particular stock's returns against HML (long portfolio of high B/P stocks, short portfolio of low B/P stocks) returns, the calculated regression coefficient (beta) is that stock's "factor loading" on the "value factor" (HML).
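The time-series regression described above can be sketched in a few lines of Python. This is an illustration on simulated data, not real HML returns: `TRUE_BETA`, the volatilities, and the sample length are hypothetical. The factor loading is the slope from regressing one stock's return series on the factor's return series, which equals their covariance divided by the factor's variance.

```python
import random

random.seed(1)
TRUE_BETA = 0.8      # hypothetical loading, used only to simulate data
n_days = 1000

# Simulated daily factor returns (a stand-in for HML) and one stock's returns.
hml = [random.gauss(0.0, 0.005) for _ in range(n_days)]
stock = [TRUE_BETA * f + random.gauss(0.0, 0.005) for f in hml]

mean_f = sum(hml) / n_days
mean_r = sum(stock) / n_days
cov_rf = sum((r - mean_r) * (f - mean_f) for r, f in zip(stock, hml)) / (n_days - 1)
var_f = sum((f - mean_f) ** 2 for f in hml) / (n_days - 1)

# OLS slope of stock returns on factor returns: the stock's factor loading.
factor_loading = cov_rf / var_f
print(f"estimated loading on the factor: {factor_loading:.2f}")
```

Here the factor series is observed and the loading is estimated, whereas a characteristic such as B/P is read directly from the data without running any regression.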

Here's a paper that studies the relationship between these two concepts:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2549578

Here's the abstract:
We develop a methodology for bias-corrected return-premium estimation from cross-sectional regressions of individual stock returns on betas and characteristics. Over the period from July 1963 to December 2013, there is some evidence of positive beta premiums on the profitability and investment factors of Fama and French (2014), a negative premium on the size factor and a less robust positive premium on the market, but no reliable pricing evidence for the book-to-market and momentum factors. Firm characteristics consistently explain a much larger proportion of variation in estimated expected returns than factor loadings, however, even with all six factors included in the model.

Here's another relevant recent paper discussing this issue:
http://cfr.ivo-welch.info/readers/pub/cfr-024.pdf

Regards,
aagold

aagold, Interesting idea. I will look into this wh...

2017-04-23T15:59:23.517-04:00

aagold,
Interesting idea. I will look into this when I have some time.
Thanks,
Ernie

Ernie, I think it would be interesting to explore...

2017-04-22T13:42:24.570-04:00

Ernie,

I think it would be interesting to explore how Kelly optimization interacts with the predictor-variable reduction work you've discussed.

For example, how does the Kelly-optimized 27-variable CAGR compare to the Kelly-optimized 2-variable CAGR? They won't necessarily turn out the same, even though the out-of-sample unleveraged Sharpe ratios ended up the same, because the leverage optimization is done using in-sample data and the testing is done with out-of-sample data. I suspect the in-sample leverage optimization using the 2-variable model will be more effective than that using the 27-variable model.

I don't have your data and backtesting software, otherwise I'd investigate this myself, but if you're interested in doing that study then I think it would be interesting.

Regards,
aagold

aagold, Your deduction is correct. The 2-variable...

2017-04-20T16:50:45.834-04:00

aagold,
Your deduction is correct.

The 2-variable model has higher CAGR and average annualized (uncompounded) return too. The latter is what goes into Sharpe ratio. Since it has the same Sharpe ratio as the 27-variable model, it must mean that the 2-variable model has higher volatility.

Ernie

Thanks for the quick response. Just out of curios...

2017-04-20T15:57:50.826-04:00

Thanks for the quick response.

Just out of curiosity - since the original 27-variable model and the optimized 2-variable version have the same Sharpe ratio but different CAGR, one must have both a higher numerator (mean excess return) and a higher denominator (standard deviation) such that the changes cancel out. Which model has both higher mean excess return and higher standard deviation?

I'm trying to figure out if the CAGR improvement of the 2-variable model happened because it's both more volatile and has higher return than the 27-variable version, which implies the 27-variable version was far below Kelly-optimal leverage, or if it's the opposite, which implies the 27-variable version was far above Kelly-optimal leverage.

aagold, You have a good point. I was reporting un...

2017-04-20T13:22:59.696-04:00

aagold,
You have a good point.

I was reporting unlevered returns. Indeed, if we were to use Kelly-optimal leverage, the levered returns would be the same - but that's assuming that the returns distribution is close to Gaussian. There are usually fat tails that render the Kelly leverage sub-optimal, and so we would prefer a strategy that has higher unlevered returns if its Sharpe ratio is the same as the alternative's.
Ernie

Ernie, The main result of your article is "w...

2017-04-20T11:21:53.291-04:00

Ernie,

The main result of your article is "we achieve a quite significant improvement of the CAGR over the base model: 19.1% vs. 14.7%, with the same Sharpe ratio."

Are you calculating CAGR using levered or unlevered returns?

I'm surprised the CAGR can be so different when the Sharpe Ratio is the same. Eq. 7.3 from Thorp's Kelly Criterion paper shows that the maximum achievable growth rate of a continuous-time diffusion is a function of Sharpe Ratio S (g_opt = S^2/2 + risk-free-rate). So if you're seeing such a major difference in CAGR without any difference in S then it must be because you're operating far from the Kelly-optimal leverage. If you optimized leverage then you probably wouldn't see any difference in CAGR.
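A small numerical sketch of this argument, assuming the standard continuous-time approximation in which the growth rate at leverage f is g(f) = r + f*m - (f*s)^2/2, with m the mean excess return and s the volatility. The two strategies below are hypothetical (they are not the 27-variable and 2-variable models from the post), and the function names are made up for illustration.

```python
def growth_rate(leverage, mean_excess, vol, risk_free=0.0):
    """Continuous-time compounded growth rate at a given leverage."""
    return risk_free + leverage * mean_excess - 0.5 * (leverage * vol) ** 2

def kelly_leverage(mean_excess, vol):
    """Leverage that maximizes growth_rate: f* = m / s^2."""
    return mean_excess / vol ** 2

# Two hypothetical strategies with the same Sharpe ratio (1.0), different volatility.
strategies = {
    "low_vol": {"mean_excess": 0.10, "vol": 0.10},
    "high_vol": {"mean_excess": 0.20, "vol": 0.20},
}

results = {}
for name, s in strategies.items():
    sharpe = s["mean_excess"] / s["vol"]
    f_star = kelly_leverage(s["mean_excess"], s["vol"])
    results[name] = {
        "unlevered": growth_rate(1.0, s["mean_excess"], s["vol"]),
        "kelly": growth_rate(f_star, s["mean_excess"], s["vol"]),
        "half_sharpe_squared": 0.5 * sharpe ** 2,
    }
    print(name, results[name])
```

In this toy setting the unlevered growth rates differ (0.095 versus 0.18), while the growth rates at Kelly-optimal leverage coincide at S^2/2 = 0.5 with a zero risk-free rate.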

Hi, If you determined that a pair is cointegrating...

2017-04-13T08:36:19.343-04:00

Hi,
If you determined that a pair is cointegrating based on applying Johansen test to daily prices, it is quite OK to trade it intraday as well.
Ernie

Hi Ernie, Using Johansen-based pairs trading, is ...

2017-04-13T07:36:59.573-04:00

Hi Ernie,

Using Johansen-based pairs trading, is it possible to get intraday signals when we only use end-of-day prices?

Hi, For portfolio optimization, one should use unl...

2017-04-10T20:06:00.719-04:00

Hi,
For portfolio optimization, one should use unlevered returns.
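As a sketch of what that could look like, the snippet below computes a sample covariance matrix from the daily unlevered returns of two strategies. The return series are made-up numbers and the helper name is hypothetical. It uses the usual n-1 sample estimator, which is an assumption rather than something specified in this thread.

```python
def covariance_matrix(series):
    """Sample covariance matrix (n-1 denominator) of equal-length return series."""
    n = len(series[0])
    means = [sum(s) / n for s in series]
    k = len(series)
    return [
        [
            sum((series[i][t] - means[i]) * (series[j][t] - means[j]) for t in range(n))
            / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]

# Hypothetical daily unlevered returns of two strategies.
strategy_a = [0.01, -0.02, 0.03, 0.00]
strategy_b = [0.02, -0.01, 0.00, 0.01]

cov = covariance_matrix([strategy_a, strategy_b])
for row in cov:
    print(row)
```

The diagonal entries are each strategy's return variance and the off-diagonal entries are the covariances between strategies.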
Ernie

Hi Ernie, Thank you for quick response! If we wa...

2017-04-10T18:56:44.883-04:00

Hi Ernie,

Thank you for the quick response!

If we want to do portfolio optimization (portfolio of strategies, not assets), how do we get the covariance matrix using the return computation methods you mentioned above?

Many thanks.

Hi, There are 2 types of returns you can consider....

2017-04-10T08:06:27.640-04:00

Hi,
There are 2 types of returns you can consider.

For unlevered returns, divide the P&L by the "gross absolute market value" of your portfolio. For example, if you are long $10 AAPL and short $5 MSFT, the gross absolute market value is $15.

For levered returns, divide the P&L by the NAV of the account. In this case, yes, different leverages (i.e. initial NAV) will result in different returns.
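The two definitions can be written out as a short Python sketch using the AAPL/MSFT example above. The daily P&L and the account NAV are hypothetical numbers added only for illustration.

```python
def gross_market_value(positions):
    """Sum of absolute dollar positions (long and short both count as positive)."""
    return sum(abs(value) for value in positions.values())

# Positions from the example above: long $10 AAPL, short $5 MSFT.
positions = {"AAPL": 10.0, "MSFT": -5.0}

daily_pnl = 0.30   # hypothetical P&L in dollars
account_nav = 7.5  # hypothetical account equity in dollars

gross = gross_market_value(positions)          # $15
unlevered_return = daily_pnl / gross           # P&L / gross absolute market value
levered_return = daily_pnl / account_nav       # P&L / NAV, depends on the equity chosen

print(gross, unlevered_return, levered_return)
```

Changing `account_nav` changes only the levered return; the unlevered return depends on the positions and P&L alone.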

Ernie

Hi Ernie, When we backtest a trading strategy, af...

2017-04-10T04:19:59.251-04:00

Hi Ernie,

When we backtest a trading strategy, after we get daily P&L, how do we get daily returns of the strategy?

If we assume different initial equity, we may get different volatility later.

Thanks.

Thanks for the clarification.

2017-03-11T16:41:10.278-05:00

Thanks for the clarification.

Even though each stock has new data 4 times a year...

2017-03-11T15:12:45.148-05:00

Even though each stock has new data 4 times a year, the stocks all have different earnings release dates. Hence we have to compare all the stocks' fundamental numbers every day to make investment decisions.

Please see Example 2.2 in my new book for details and code.
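A toy illustration of this point follows. It is a sketch with hypothetical tickers, release days, and values, not the code from Example 2.2. Each stock's most recently released number is carried forward to every trading day, so the cross-section changes on more days than any single stock reports.

```python
from bisect import bisect_right

def latest_value(release_days, values, day):
    """Most recent value released on or before `day`, or None if none yet."""
    i = bisect_right(release_days, day)
    return values[i - 1] if i > 0 else None

# Hypothetical release schedules: (trading-day indices, reported values).
releases = {
    "AAA": ([0, 63, 126], [0.05, 0.06, 0.04]),
    "BBB": ([20, 83, 146], [0.03, 0.02, 0.05]),
}

days = range(150)
# Daily cross-section: every stock's latest known fundamental value.
panel = {
    day: {ticker: latest_value(d, v, day) for ticker, (d, v) in releases.items()}
    for day in days
}

# Days on which the cross-section differs from the previous day.
change_days = [day for day in days if day > 0 and panel[day] != panel[day - 1]]
print(change_days)
```

Each stock reports only three times in this window, yet the cross-section changes on five separate days, which is why the comparison has to be rerun daily.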

Ernie

Hi Ever, I typically just take 5bps as transaction...

2017-03-11T15:10:25.711-05:00

Hi Ever,
I typically just take 5bps as transaction costs for S&P 500 stocks. It doesn't depend on your account NAV.
I haven't, however, deducted transaction costs in my results discussed above.
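As a sketch of why a basis-point cost does not depend on account size, the snippet below charges 5 bps on the dollar value traded. Applying it one-way to each trade's notional is an assumption, since the comment does not say how the 5 bps is applied. The turnover figure and function name are hypothetical.

```python
COST_BPS = 5.0  # from the comment above; one-way application is an assumption

def transaction_cost(traded_notional, cost_bps=COST_BPS):
    """Dollar cost of trading a given notional at a fixed basis-point rate."""
    return abs(traded_notional) * cost_bps / 1e4

cost_fractions = {}
for nav in (100_000.0, 1_000_000.0):
    traded = 0.5 * nav  # hypothetical: trade half the account value in a day
    cost_fractions[nav] = transaction_cost(traded) / nav
    print(nav, transaction_cost(traded), cost_fractions[nav])
```

The dollar cost scales with the amount traded, but the cost as a fraction of NAV is identical for both account sizes, so it reduces returns by the same amount.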

Ernie

Interesting post. I did not understand your answer...

2017-03-11T13:16:39.187-05:00

Interesting post. I did not understand your answer to dashiell on the number of training rows available. The post refers to using factor loadings from quarterly reports as the only independent factors in the regression model, so how can you use them for daily returns? It would seem you have 4 data points per year, for a total of only 5x4=20 points per stock for predicting the next quarterly return, unless I misunderstood the regression setup. Can you clarify, please?

Does the transaction cost adjust based on the CAGR...

2017-03-11T11:08:51.164-05:00

Does the transaction cost adjust based on the CAGR for say $100K?