$m$ and standard deviation $s$, then many finance students know that the mean log return is $m - s^2/2$. That is, the compound growth rate of the stock is $m - s^2/2$. This can be derived by applying Ito's lemma to the log price process (see e.g. Hull), and is intuitively satisfying because it says that the expected compound growth rate is lowered by risk ("volatility"). OK, we get that - risk is bad for the growth of our wealth.

However, let's find out what the expected price of the stock is at time $t$. If we invest our entire wealth in one stock, that is really asking what our expected wealth is at time $t$. To compute that, it is easier to first find out what the expected log price of the stock is at time $t$, because that is just the expected value of the sum of the log returns in each time interval, and is of course equal to the sum of the expected values of the log returns when we assume a geometric random walk. So the expected value of the log price at time $t$ is just $t(m - s^2/2)$. But what is the expected price (not log price) at time $t$? It isn't correct to say $\exp(t(m - s^2/2))$, because the expected value of the exponential function of a normal variable is not equal to the exponential function of the expected value of that normal variable, or $E[\exp(x)] \neq \exp(E[x])$. Instead, $E[\exp(x)] = \exp(\mu + \sigma^2/2)$, where $\mu$ and $\sigma$ are the mean and standard deviation of the normal variable (see Ruppert). In our case, the normal variable is the log price, and thus $\mu = t(m - s^2/2)$ and $\sigma^2 = t s^2$. Hence the expected price at time $t$ is $\exp(tm)$. Note that it doesn't involve the volatility $s$. Risk doesn't affect the expected wealth at time $t$. But we just argued in the previous paragraph that the expected compound growth rate *is* lowered by risk. What gives?
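For readers who want to see exactly where the volatility drops out, the substitution can be written out line by line. This is only a restatement of the calculation above, writing $P_t$ for the price at time $t$ (a symbol introduced here for convenience) and assuming the initial price is normalized to 1 so that the log price starts at 0.

$$
\begin{aligned}
E[P_t] &= E[\exp(x)] = \exp\left(\mu + \frac{\sigma^2}{2}\right) \\
       &= \exp\left(t\left(m - \frac{s^2}{2}\right) + \frac{t s^2}{2}\right) \\
       &= \exp(tm)
\end{aligned}
$$

The $-t s^2/2$ in the mean of the log price is cancelled exactly by the $+t s^2/2$ that comes from exponentiating a normal variable.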

This brings us to a famous recent paper by Peters and Gell-Mann. (For the physicists among you, this is *the* Gell-Mann who won the Nobel prize in physics for inventing quarks, the fundamental building blocks of matter.) This happens to be the most read paper in the Chaos Journal in 2016, and basically demolishes the use of the utility function in economics, in agreement with John Kelly, Ed Thorp, Claude Shannon, Nassim Taleb, etc., and against the entire academic economics profession. (See Fortune's Formula for a history of this controversy. And just to be clear which side I am on: I hate utility functions.) To make a long story short, the error we have made in computing the expected stock price (or wealth) at time $t$ is that the expectation value there is ill-defined. It is ill-defined because wealth is not an "ergodic" variable: its finite-time average is not equal to its "ensemble average". Finite-time average of wealth is what a specific investor would experience up to time $t$, for large $t$. Ensemble average is the average wealth of many millions of similar investors up to time $t$. Naturally, since we are just one specific investor, the finite-time average is much more relevant to us. What we have computed above, unfortunately, is the ensemble average. Peters and Gell-Mann exhort us (and other economists) to only compute expected values of ergodic variables, and log return (as opposed to log price) is happily an ergodic variable. Hence our average log return is computed correctly - risk *is* bad. Paradox resolved!
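A small simulation can make the gap between the ensemble average and the experience of a typical investor concrete. The sketch below is only an illustration under stated assumptions: the parameter values (`m`, `s`, `t`), the function name `simulate_terminal_wealth`, and the use of the median as a stand-in for "a typical single investor" are all choices made here, not something taken from the paper.

```python
import math
import random
import statistics


def simulate_terminal_wealth(m, s, t, n_paths, seed=0):
    """Draw terminal wealth for n_paths investors under a geometric random walk.

    Log wealth at time t is normal with mean t*(m - s^2/2) and variance t*s^2.
    Initial wealth is normalized to 1.
    """
    rng = random.Random(seed)
    mu = t * (m - 0.5 * s * s)
    sigma = s * math.sqrt(t)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n_paths)]


# Hypothetical annual drift, volatility, and horizon in years.
m, s, t = 0.10, 0.30, 10.0
wealth = simulate_terminal_wealth(m, s, t, n_paths=100_000)

# Average over many investors (the ensemble average).
ensemble_mean = statistics.fmean(wealth)
# Wealth of the investor in the middle of the pack.
median_wealth = statistics.median(wealth)
# Average compound growth rate per unit time, from the log returns.
mean_growth = statistics.fmean([math.log(w) for w in wealth]) / t

print(f"ensemble mean wealth: {ensemble_mean:.3f} vs exp(t*m) = {math.exp(t * m):.3f}")
print(f"median wealth:        {median_wealth:.3f} vs exp(t*(m - s^2/2)) = "
      f"{math.exp(t * (m - 0.5 * s * s)):.3f}")
print(f"mean growth rate:     {mean_growth:.4f} vs m - s^2/2 = {m - 0.5 * s * s:.4f}")
```

The ensemble mean should land near exp(t*m) regardless of `s`, while the median and the average growth rate both carry the s^2/2 penalty. Raising `s` while holding `m` fixed widens the gap between the two.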

===

**My Upcoming Workshops**

May 13 and 20: Artificial Intelligence Techniques for Traders

I will discuss in detail AI techniques as applied to trading strategies, with plenty of in-class exercises, and with emphasis on nuances and pitfalls of these techniques.

June 5-9: London in-person workshops

I will teach 3 courses there: Quantitative Momentum, Algorithmic Options Strategies, and Intraday Trading and Market Microstructure.

(The London courses may qualify for continuing education credits for CFA Institute members.)

## 57 comments:

I don't grok this paper.

On the surface it feels like it's mainly addressing 100 year old arguments. There are very few references to recent economics/finance books or articles.

So if log price is an ergodic variable, then what's wrong with projecting that into the future and converting it to a price?

John,

No, you misunderstood. The paper states that log price (or price) is NOT an ergodic variable. Only the change in log price is. Change in log price is log return.

Ernie

Sorry, I had typed it up wrong. I had meant the log return. So for instance, what's the issue with saying

X_t ~ N(mu, sigma)

where X_t is the log return. Projecting this out n periods gives

X_t_n ~ N(mu*n, sigma*sqrt(n))

then converted to price you have

Y_t_n ~ exp(X_t_n)

So your distribution at the horizon is log normal.

Then you express utility on the distribution of the price at horizon.

John,

What you have done seems exactly what I did in my post. The result is that the expected price or log price depends only on average 1-period net (not log) return, but not the standard deviation of the net return. But the calculation involves an ensemble average E[exp(x)]=exp(μ+σ^2/2), not a time average. If x is not ergodic, the ensemble average isn't equal to the time average, and I am unaware of an analytical formula for a time average.

Ernie

I suppose part of my confusion is on the distinction between ensemble average and time average. I looked up the difference here

http://www.nii.ac.jp/qis/first-quantum/forStudents/lecture/pdf/noise/chapter1.pdf

I still don't know why I should be convinced...

First, the mu in our above comments is, in the frequentist sense, calculated as a time average. For one individual stock, there is no ensemble. There is only one realization in the past.

Second, I'm not really even sure why the difference between time averages and ensemble averages matters. I'm with you completely on the importance of ergodicity, but wealth is supposed to grow over time. Wouldn't it be a bad thing if its mean was unchanged over time?

John,

mu is the average 1-period log return. As log return is ergodic, there is no difference between time or ensemble average.

Wealth grows over time whether you compute the time or ensemble average. But the difference is that the formula for average wealth displayed in my article is independent of risk. But that average wealth applies only if you buy 100,000 stocks, each having the same mean return and standard deviation of returns, and you are interested in the portfolio's return. If you own only 1 stock, then your time-averaged wealth will be reduced by the standard deviation, but I did not display the formula there. What I displayed is the averaged growth rate of wealth, which clearly shows it is reduced by s^2/2. It is not a matter of whether the mean changes over time: it doesn't.

Ernie

Hi

I'm with John on this. It's a non-issue. Keep your estimation separate from your portfolio optimization/analysis and once you've projected your log returns to the correct horizon do the necessary conversions thereafter.

Also, I think your formula for expected price is incorrect: E(P_t) = exp(t*m). If you have defined m as the expectation of the linear price returns, then a simple recombining tree using +10% and -5% jumps will quickly show you that E(P_t) = [ 1 + (10% + -5%)/2]^t = (1+2.5%)^t = (1+m)^t != exp(m*t).

Emlyn

Hi Emlyn,

There is no portfolio optimization involved in this discussion. It is purely a question of whether it is reasonable to compute expected wealth vs computing expected log returns. The authors (and I) demonstrated that computing expected log returns is the only reasonable way, for a single investor in a single strategy. If you want to know the average wealth of 100,000 investors, or 100,000 strategies, it is reasonable to compute expected wealth.

Also, your demonstration seems to corroborate rather than refute my calculation. Your binomial tree formula only holds when t is small, since you have to update your returns frequently in discrete time. For small t, exp(mt)=exp(m)^t~(1+m)^t just as you wrote.

Ernie
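For readers who want to check the numbers in this exchange, here is a minimal Python sketch that computes the exact expected price on the recombining tree Emlyn describes (+10% and -5% moves) and prints it next to (1+m)^t and exp(m*t). The function name `expected_price_binomial` and the equal up/down probabilities are assumptions made for illustration. The sketch does not settle whether m should be read as a per-period net return or as a continuous drift, which is the point under discussion.

```python
import math


def expected_price_binomial(up, down, steps, p_up=0.5, p0=1.0):
    """Exact expected price after `steps` periods on a recombining tree."""
    total = 0.0
    for k in range(steps + 1):  # k = number of up moves
        prob = math.comb(steps, k) * p_up**k * (1 - p_up) ** (steps - k)
        price = p0 * (1 + up) ** k * (1 + down) ** (steps - k)
        total += prob * price
    return total


up, down = 0.10, -0.05
m = 0.5 * up + 0.5 * down  # mean one-period net return (2.5%)

for t in (1, 10, 100):
    tree = expected_price_binomial(up, down, t)
    print(f"t={t:3d}  tree={tree:.4f}  (1+m)^t={(1 + m) ** t:.4f}  "
          f"exp(m*t)={math.exp(m * t):.4f}")
```

The tree value matches (1+m)^t at every horizon, and exp(m*t) is close to it for small t but pulls ahead as t grows.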

Hi Ernie,

My mention of portfolio optimization was because this is one application where you explicitly require expected wealth (or at least expected linear return) estimates. Mean-variance is the classic example. It does not hold true if you use expected log returns.

The binomial tree holds in generality as step size is fully general. I understand your point about short-term linear returns being a first-order approximation of log returns (from Taylor series expansion of exponential function) but that is a separate issue. My point is that expected PRICE is not equal to P_0 * exp(t*m) because you have defined m as the expectation of the linear returns. Expected price is equivalent to expected linear return because your time is known and assuming log-normality as per your discussion above, you have the relationship Exp(LinRet) = exp(t * (mu + 1/2*sigma^2)) - 1. The mu and sigma here are the drift and volatility parameters input directly into the geometric Brownian motion and thus relate to the log-returns. Above, you have incorrectly stated that sigma^2 = t * s^2 (where s is vol of linear returns, not log returns) which is where the final error in the expected price occurs. Dropping time for now, s^2 = [exp(sigma^2) - 1] * [exp(2*mu + sigma^2)]. Hope this clarifies.

Emlyn

Hi Emlyn,

Actually, I have shown that Mean-Variance optimization is equivalent to Kelly optimization in the 1-period case, except for the overall optimal leverage employed which mean-variance optimization doesn't provide. (See http://epchan.blogspot.com/2014/08/kelly-vs-markowitz-portfolio.html). However, Mean Variance optimization does not optimize multi-period growth, hence there is no need to compute expected wealth.

The derivation of E[P(t)] is not as simple as you outlined. It isn't based on expectation of a linear return, extrapolated to t. My derivation was in the main text, so I won't repeat it here. The key formula to use is E[exp(x)]=exp(μ+σ^2/2). Also, sigma^2 = s^2 is correct. Though the mean net return is not the same as the mean log return, the standard deviation of the net return IS the same as the standard deviation of the log return (to first order approximation.) I multiplied that by t in t * s^2 because this is a Gaussian Brownian Motion, hence the variance at time t scales linearly with t. It doesn't matter whether you use sigma or s in this formula since they are the same.

Ernie
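The two statements about volatility in this exchange can be compared numerically. The sketch below uses the standard lognormal moment formulas (the same ones Emlyn quotes) to compute the standard deviation of the net return from the log-return parameters, and prints its ratio to sigma. The function name `net_return_moments` and the three parameter pairs are hypothetical, chosen only to show a small, a moderate, and a large volatility.

```python
import math


def net_return_moments(mu, sigma):
    """Mean and standard deviation of the net return exp(x) - 1, with x ~ N(mu, sigma^2)."""
    mean = math.exp(mu + 0.5 * sigma**2) - 1.0
    var = (math.exp(sigma**2) - 1.0) * math.exp(2.0 * mu + sigma**2)
    return mean, math.sqrt(var)


for mu, sigma in [(0.0005, 0.01), (0.05, 0.20), (0.05, 0.60)]:
    mean, sd = net_return_moments(mu, sigma)
    print(f"sigma={sigma:.2f}  net-return mean={mean:.4f}  "
          f"net-return sd={sd:.4f}  sd/sigma={sd / sigma:.3f}")
```

For small per-period volatility the ratio is very close to 1, which is the first-order agreement mentioned above. The two measures drift apart as sigma grows.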

Hi Ernie,

Thanks for the reply and very nice post on the link between the two frameworks. Remember though, MV framework was originally derived to optimise expected wealth, granted in a one-period setting but the main trick is to make sure your inputs for the horizon of choice are correct. This was John's initial point on distribution projection and subsequent conversion.

On the second point, I'm afraid I just don't agree with you. There is an inconsistency in the mathematics. But I don't think we'll get any further going down this road so I'll leave it here. I urge you to read Meucci's 2005 textbook where he deals with exactly this issue. Thanks for the interaction though and for providing a very good blog.

Emlyn

Thanks for the discussion, Emlyn!

Will take a look at Meucci's book when I have a chance.

Cheers,

Ernie

Ernie, wealth is a non-stationary series (even in the wide-sense), so I don't think there is a meaningful way to define an ensemble average. As you show, for fixed t, mu_t= E[S(t)] = mu*t; this makes sense to me as the expected value of the terminal wealth with expectation taken over investors. Specifically, if you consider the two dimensional S(i, t), where i ranges over stocks/investors and t over time, then A_i(t) = average of S(1,t), ..,S_(n, t) is a random variable sequence that converges in probability to mu_t.

I think the absence of s in the expression for expectation of S(t) merely reflects the difference between expectations of linear and exponential returns.

Hi Ramesh,

I don't agree with your assertion that one cannot define an ensemble average for a non-stationary series. Students of probability have computed the variance of a random walk for centuries, and it is equal to D*t, where D=diffusion coefficient. A "normal" random walk isn't stationary just as a geometric random walk isn't, but the ensemble average of the absolute squared deviation is well-defined, given above.

But I do agree that expectations of linear and exponential returns differ, and wealth is an expectation of the exponential of the sum of log returns. The mean (or sum of) "arithmetic" returns isn't useful unless we rebalance the portfolio at the end of every period. If we do rebalance, then the wealth becomes an ergodic quantity and ensemble average will equal time average.

Ernie

Hi Ernie,

I'm an avid reader of your blog and recall you managed a forex fund.

I was wondering if you could shed a little light on what infrastructure you use for your forex trading. I think I read that you use Interactive Brokers. Is this correct, and if so, are you co-located and what are the fill rates like?

Thanks!

James

Hi James,

Yes, we do run a fund and managed accounts (qtscm.com) that trades FX. For retail clients, we trade mainly at Interactive Brokers, where we find the bid-ask spread is excellent.

For colocation, I recommend https://www.speedytradingservers.com/.

May I ask how you define "fill rate"?

Ernie

I would define fill rate as #filled orders divided by #execution attempts. For example, if I see EURUSD as 1.0871-1.0872 and I attempt to hit the bid using an IOC order, I may or may not get filled. The bid may change or I could be "last look" rejected. Presumably I won't see the latter with IB so more concerned about the rejections due to latency.

Curious as to what fill rate you're seeing.

Hi James,

Ah ok - you are taking liquidity with IOC, so you are right to be concerned about last look. We only run market-making strategies on FX, using LMT orders, so we don't have issue with last look nor do we measure fill rate. In fact, I don't even know if IB allows their FX market makers to employ last look!

Ernie

I see. Thanks Ernie.

Hi Ernie,

May I ask what you trade in your "Commodity Pool"?

What kind of strategies do you use?

Many thanks.

Hi,

We trade mainly FX mean reversion and Futures momentum strategy in our pool. There are a number of other strategies as well that have lower allocations.

Ernie

Hi Ernie,

Thank you for quick response.

For Futures momentum strategy in your pool, what futures do you trade?

commodities or stocks index futures? Many thanks.

Hi,

Mainly E-minis, but small allocation to agricultural and energy futures.

Ernie

Hi Ernie,

Is this something that can be explained by Jensen's inequality?

thanks

Kin Wa

Hi Kin Wa,

Jensen's inequality can certainly explain why E[exp(x)] != exp(E[x]), but it doesn't give the whole picture.

Ernie

Hi Ernie,

Thanks for a great blog and great books. I have a PCA question.

As you know, the signs of eigenvectors are not unique. When using for example pcacov in Matlab, the sign can change from time to time.

When backtesting, I need to make sure the eigenvectors are pointing in the same direction for each time t. Do you have a quick code trick for how to guarantee this?

Thanks

Hi,

Thanks for your kind words.

When you said "signs of eigenvectors are not unique" did you mean the sign of their eigenvalues, or the signs of the components of an eigenvector?

If the former, that's not possible, since the eigenvalues of a covariance matrix are all >= 0 (the covariance matrix is positive semidefinite.)

If the latter, I don't see what's wrong with that. It just means that the hedge portfolio that corresponds to each eigenvector has both long and short positions. You cannot simply ignore the ones with negative values, as that would mean it is no longer an eigenvector of the covariance matrix.

Ernie

Hi Ernie,

I mean the signs of the eigenvector. Let's say we have n=4 time series, then for example the first eigenvector may look like [.54 .23 -.55 .90] or [-.54 -.23 .55 -.90]. If one does a PCA on a set of economic time series one might want to preserve the economic interpretation of the first principal component, i.e. the signs matter. So when doing a PCA walk forward, it would be nice to guarantee that the vector "points in the same direction" for each time t. So my question was just whether you have come across a quick code trick for this. One suggestion I've come across is, at each time t, find the largest (in absolute terms) element in EIG(:,1) and make sure this value is positive or negative, depending on which direction one wants to rotate the first eigenvector. But I'm not sure this is theoretically correct.

Hi,

Certainly if ev is an eigenvector, -ev is one too. So you are free to multiply all components by -1. Does that work for you?

Ernie
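Since ev and -ev are both valid eigenvectors, one way to keep the direction stable in a walk-forward PCA is to pick a sign convention and apply it at every step. The sketch below shows two such conventions: the largest-component rule suggested in the question, and flipping to agree with the previous period's eigenvector. The function name `align_sign` is hypothetical, the code is written in Python rather than Matlab, and plain lists are used so that it does not depend on any particular PCA routine.

```python
def align_sign(vec, reference=None):
    """Return vec or -vec so that its direction is consistent over time.

    With a reference (e.g. the previous period's eigenvector), flip when the
    dot product with it is negative. Without one, make the component with the
    largest absolute value positive.
    """
    if reference is not None:
        dot = sum(v * r for v, r in zip(vec, reference))
        flip = dot < 0
    else:
        pivot = max(vec, key=abs)
        flip = pivot < 0
    return [-v for v in vec] if flip else list(vec)


# The two sign variants from the question map to the same vector.
print(align_sign([0.54, 0.23, -0.55, 0.90]))
print(align_sign([-0.54, -0.23, 0.55, -0.90]))
```

Either convention only multiplies the whole vector by +1 or -1, so the result is still an eigenvector. The reference-based rule tends to be the more stable of the two when the largest component changes from one window to the next.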

Hi Ernie,

Where can we get historical option data (put, call, all strikes and expiry)?

Thanks.

Hi,

ivolatility.com, quandl.com, optionmetrics.com.

Ernie

Very interesting, log-utility just happens to maximize the time average which is what we really should be focusing on. But what about the case when we have a fixed time horizon, then taking the limit as T -> infinity does not make sense and we don't get rid of the stochastic component in the growth rate?

Hi Emil,

The compound growth rate at any finite T asymptotically approaches the formula given in this post (which is exact at T-> infinity). This is true for most theoretical derivations involving time averages in finance.

Ernie

Based on all your books it seems that Kelly is the ideal allocation method for all trading strategies. Just to confirm, the Kelly formula has flaws when in use. I know you covered the half Kelly, but beyond that, have you encountered periods where you didn't use Kelly at all and periods where you used Kelly and it failed? Keep in mind I am not referring to the periods where you've recommended QT investors to opt out of strategies based on regime changes or unknowns. I am aware that this specific does focus on Kelly so we can call it a side note.

Hi Ever,

Yes, as I have shown in another blog post (https://epchan.blogspot.ca/2014/08/kelly-vs-markowitz-portfolio.html), Kelly is essentially the same as the widely adopted Mean Variance optimization method, except that Kelly also suggests an overall leverage of a portfolio.

It is unclear what it means by periods where Kelly "fails". Fails meaning the portfolio is not profitable? But that has nothing to do with Kelly itself. Fails meaning the portfolio underperforms an equal weighted one? Certainly! But again, Kelly does not promise it always outperforms other allocation methods in all periods. It only promises that in the long run, it generates maximum growth rate. The long run, of course, can be very long.

Ernie

Hi Ernie,

I've listened to a couple of your talks online and had a few questions after reading your blog. My background is fundamental investing at asset management and hedge funds and only more recently have I tried to pick up machine learning.

In one of your lectures, you referenced SVM and random forest as two algorithms that work well for stock prediction. However, I wasn't clear if this is a classification or regression problem. In other words, what is more appropriate: to predict the actual stock price, the actual return, or whether the stock will be up or down? I've seen a lot of academic papers with a mix of approaches, but I've seen no explanation as to the pros and cons of using one over the other. Perhaps you could enlighten me here.

My second question is regarding one of your comments about insufficient data. Are you saying that daily price data for your y label is not enough and you need minute by minute data?

And my third question has to do with one of your slides where you say feature rich data sets are a curse. I've read about this before so I'm not questioning what you're saying. However, I remember listening to a talk given by James Simons, and he said that once they find a good predictor they just leave it in in case it comes back. If that's the case, then over time, they would likely have a substantial number of features in the model. Are they doing something that perhaps is different?

My last question is about your recent book, or perhaps you have a suggestion of other resources. I'm trying to find a resource that clearly shows the right way to clean and transform feature data and cross validate in order to use it for prediction. I've not seen a source that accurately describes best practices.

Thank you,

Alex

Hi Alex,

1) SVM is a classification algorithm, and can only be used to predict the direction of price moves.

2) Random forest can be applied to either classification or regression trees. The latter can be used to predict magnitude as well as direction of moves.

3) There is no reason to believe one is superior to the other with respect to various out-of-sample (OOS) performance measures.

4) Daily price is good enough if you aggregate >1,000 stocks and build the same model for all.

5) If a feature was found to work OOS, then of course one should retain it in library. There aren't that many features that work OOS, so no worries about "too many". We can only have too many unproven features - that lead to data snooping bias.

6) My book references many other books and articles. In terms of AI in trading, I recommend following @carlcarrie on Twitter, which posts numerous relevant articles daily.

Ernie

Hi Ernie,

Do you use log return during backtesting?

Thanks.

Hi,

Yes, for signal generation, not for performance evaluation. The latter needs to conform to the industry norm - which uses net return.

Ernie

Hi Ernie,

What is the difference between log return and net return?

Thanks.

Hi,

Log return = log(price(t))-log(price(t-1))

Net return = price(t)/price(t-1) -1

Ernie
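The two definitions can be sketched in a few lines of Python. The price series below is made up, and the function names `net_returns` and `log_returns` are just labels for this illustration. The point of the example is that log returns add across periods while net returns compound, and both give the same total return.

```python
import math


def net_returns(prices):
    """price(t) / price(t-1) - 1 for each consecutive pair."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def log_returns(prices):
    """log(price(t)) - log(price(t-1)) for each consecutive pair."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]


prices = [100.0, 110.0, 99.0]  # hypothetical prices
net = net_returns(prices)
logr = log_returns(prices)

# Log returns are summed, net returns are compounded.
total_from_log = math.exp(sum(logr)) - 1.0
total_from_net = math.prod(1.0 + r for r in net) - 1.0

print("net returns:", [round(r, 4) for r in net])
print("log returns:", [round(r, 4) for r in logr])
print("total return:", round(total_from_log, 4), round(total_from_net, 4))
```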

Hi Ernie,

Thank you for quick response!

When we evaluate performance via profit&loss(pl) and cumulative pl, it is straightforward to include transaction costs.

When we evaluate performance via net return and cumulative return, how do we include transaction costs?

Many thanks.

Hi,

The typical way is to subtract some fixed percentage from each trade's returns. My first book Quantitative Trading has detailed examples on this.

Ernie
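As a rough illustration of subtracting a fixed percentage from each trade's return, here is a minimal Python sketch. The 0.1% round-trip cost, the sample returns, and the function name `cumulative_return_after_costs` are all hypothetical. The detailed examples in the book may differ from this simplified version.

```python
import math


def cumulative_return_after_costs(trade_returns, round_trip_cost=0.001):
    """Deduct a fixed cost from each trade's net return, then compound."""
    after_cost = [r - round_trip_cost for r in trade_returns]
    cumulative = math.prod(1.0 + r for r in after_cost) - 1.0
    return after_cost, cumulative


trade_returns = [0.012, -0.004, 0.007]  # made-up per-trade net returns
after_cost, cumulative = cumulative_return_after_costs(trade_returns)
gross = math.prod(1.0 + r for r in trade_returns) - 1.0

print(f"gross cumulative return: {gross:.5f}")
print(f"after-cost cumulative return: {cumulative:.5f}")
```

Because the cost is charged per trade, a strategy that trades more often pays it more often, even if the gross return per trade is unchanged.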

Hi Ernie,

When we do backtesting, shall we use cumulative pl or cumulative return to evaluate performance? What are their strength or weakness?

Thanks.

Hi,

Cumulative returns are more meaningful. P&L depends on the size of your orders and is arbitrary.

Ernie

Hi Ernie,

I was reading your latest book Machine Trading. In your book you talk about cross validation and in particular K-fold. My question is: I've also read elsewhere that because stock prices are a time series problem, K-fold does not work and roll forward cross validation is what you need to use instead. Any thoughts on this?

Best,

Alex

Hi Alex,

It depends what you use as input features for the problem.

If your input features involve L number of days, then it is best to avoid any overlap of the training set and the validation set by picking data at least L days apart to form each set.

Other than that, most examples I discussed are time series problems. Why do you think cross validation won't work on these problems?

Ernie

Hi Ernie,

This is what I was referring to.

https://www.youtube.com/watch?v=56Nq6cs2-gc

Basically, my understanding is that for time series data, normal K-fold cross validation does not work because training data should always come before testing data. Wondering if you agree with this?

Best,

Alex

Hi Alex,

I am not too concerned about the "peeking ahead" effect during cross-validation. The important point is that our input features occur prior in time to the predicted outcome, for each data point. If the validation data set is consistently prior in time to the training data set, that would be a concern. But generally speaking, the time order will be randomized, so I don't believe there can be much bias there.

Of course, I am not objecting to the roll forward method described in the video - I just don't think it will make a big difference in the out-of-sample results. However, if you want to be absolutely safe, you can certainly adopt that method.

Ernie
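For anyone who wants to try the safer option, here is a minimal sketch of a roll-forward split that also leaves a gap of L samples between training and validation data, as suggested earlier for features built from L days of history. The function name `walk_forward_splits` and the way the folds are sized are assumptions made for this illustration, not a method taken from the book or the video.

```python
def walk_forward_splits(n_samples, n_folds, gap=0):
    """Yield (train_indices, test_indices) with training always before testing.

    `gap` samples are skipped between the two sets, so that features computed
    from a lookback of `gap` days do not straddle the boundary.
    """
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        test_start = train_end + gap
        test_end = min(test_start + fold_size, n_samples)
        if test_start >= test_end:
            break  # not enough data left for another test block
        yield list(range(train_end)), list(range(test_start, test_end))


for train, test in walk_forward_splits(n_samples=100, n_folds=4, gap=5):
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

Each fold trains on an expanding window and tests on the block that follows it, so the validation data never precede the training data.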

Great blog, great books.

Except as an approximation for a runaway stock, we can't use the unrestricted geometric random walk model for large t. Once you consider absorption (delisting and ruin), you will get your dependence on risk. Your derivation is correct for the solvent subset of the ensemble. This is different from the possible issue with non-ergodicity; thanks for the paper.

Hi Ernie,

Just a follow up on the cross validation question. I went ahead and ignored it for now and did a random shuffle during cross validation.

However, I have seen some blogs/sites that show people also taking the log returns to detrend time series data. My question is if I'm using a multivariate dataset (not just stock prices), do I need to do this?

The reason I ask is I used random forest on a set of features and consistently got around 60% for precision, recall and f1. However, when I didn't shuffle the data I got much lower scores. I also tried to detrend my data by taking the log returns and got even worse scores. Just wondering what you think.

Best,

Alex

Hi Alex,

If your input feature is not prices, there is no reason to detrend it. Instead, you can consider using its "Zscore", using a moving lookback period.

Ernie
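A moving-lookback Z-score can be sketched as follows. The function name `rolling_zscore` is hypothetical, and two choices are assumptions rather than anything prescribed above: the window includes the current value, and the sample standard deviation is used. Only data up to and including time t enter the calculation at time t, so there is no look-ahead.

```python
import statistics


def rolling_zscore(values, lookback):
    """Z-score of each value relative to the trailing `lookback` values.

    The first lookback-1 entries are None because the window is incomplete.
    `lookback` must be at least 2 for the sample standard deviation to exist.
    """
    out = [None] * len(values)
    for i in range(lookback - 1, len(values)):
        window = values[i - lookback + 1 : i + 1]
        mean = statistics.fmean(window)
        sd = statistics.stdev(window)
        out[i] = (values[i] - mean) / sd if sd > 0 else 0.0
    return out


feature = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up feature values
print(rolling_zscore(feature, lookback=3))
```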

Hi Ernie,

I have a question about my model that I can't seem to figure out and I'm wondering if you can easily tell what I did. I have a list of 10 features (some price, some technical, some fundamental features). Basically, these are many of the features I used to look at as a fundamental analyst.

I ran those features through a random forest classification algo (I'm using Scikit-learn). I first predict next day stock price and randomly split training vs. testing. When I run it through the model I get no signal (accuracy, recall, precision etc. ~50%).

Then I try two things: First, I extend the prediction from next day to 2, 3, 4, 5, 10, 50, etc. And my accuracy jumps up dramatically to the point where I know it can't be correct. The further I extend it out the higher it goes - 90%. Clearly this is wrong or else someone much smarter would have figured this out.

Second, I try to correct this by not randomly splitting my data. Instead I train on the first 60% and test on the last 40%. My results go back to 50% or lower.

My question is, what's going on in the model when I randomly train/test split my data in the first example vs. the second example where I do not?

My other question is, as I've been reading more, the differences in how ML models find a prediction seem to be categorized into parametric vs. non-parametric models. Has there been any research on whether one type of model is better suited for the stock market? In my mind, it seems that parametric might be suited for more long-term predictions (i.e. like fundamental analysts are trying to do), and non-parametric is more suited for short-term predictions (because of the noise that fundamental analysts often get wrong). Would like to hear your thoughts. I'm also wondering whether most people in practice use regression based models or instance based models?

Best,

Alex

Hi Alex,

1) I don't know why your accuracy changes so dramatically if you use random splitting. It seems that part of your future returns have been used as input feature.

2) To me, non-parametric models simply mean we have assumed no known simple distribution function to describe the data. As a result, there are actually more parameters because we need an empirical distribution to predict the future! I am not sure that one is better for long term vs short term. I generally think parametric models are less subject to overfitting and therefore better for finance models.

Ernie
