Thursday, May 04, 2017

Paradox Resolved: Why Risk Decreases Expected Log Return But Not Expected Wealth

I have been troubled by the following paradox for the past few years. If a stock's log returns (i.e. the change in log price per unit time) follow a Gaussian distribution, and if its net returns (i.e. the percent change in price per unit time) have mean m and standard deviation s, then many finance students know that the mean log return is m - s²/2. That is, the compound growth rate of the stock is m - s²/2. This can be derived by applying Ito's lemma to the log price process (see e.g. Hull), and it is intuitively satisfying because it says that the expected compound growth rate is lowered by risk ("volatility"). OK, we get that - risk is bad for the growth of our wealth.

However, let's find out what the expected price of the stock is at time t. If we invest our entire wealth in one stock, that is really asking what our expected wealth is at time t. To compute that, it is easier to first find out what the expected log price of the stock is at time t, because that is just the expected value of the sum of the log returns in each time interval, and is of course equal to the sum of the expected values of the log returns when we assume a geometric random walk. So the expected value of the log price at time t is just t*(m - s²/2). But what is the expected price (not log price) at time t? It isn't correct to say exp(t*(m - s²/2)), because the expected value of the exponential function of a normal variable is not equal to the exponential function of the expected value of that normal variable: E[exp(x)] != exp(E[x]). Instead, E[exp(x)] = exp(μ + σ²/2), where μ and σ are the mean and standard deviation of the normal variable (see Ruppert). In our case, the normal variable is the log price, and thus μ = t*(m - s²/2) and σ² = t*s². Hence the expected price at time t is exp(t*m). Note that it doesn't involve the volatility s. Risk doesn't affect the expected wealth at time t. But we just argued in the previous paragraph that the expected compound growth rate is lowered by risk. What gives?
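To make this concrete, here is a minimal Monte Carlo check in Python (the values of m, s and t are made up for illustration; the script samples the terminal log price directly from its normal distribution):

import numpy as np

np.random.seed(0)
m, s, t = 0.10, 0.40, 10.0   # hypothetical net-return mean, volatility, horizon in years
n_paths = 1_000_000

# Under the geometric random walk, log P(t) ~ N(t*(m - s^2/2), s*sqrt(t)), with P(0) = 1
log_price = np.random.normal(t * (m - 0.5 * s**2), s * np.sqrt(t), n_paths)

print(log_price.mean())           # approx t*(m - s^2/2) = 0.2
print(np.exp(log_price).mean())   # approx exp(t*m) = 2.72; the volatility s drops out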

This brings us to a famous recent paper by Peters and Gell-Mann. (For the physicists among you, this is the Gell-Mann who won the Nobel prize in physics for inventing quarks, the fundamental building blocks of matter.) This happens to be the most-read paper in the journal Chaos in 2016, and it basically demolishes the use of the utility function in economics, in agreement with John Kelly, Ed Thorp, Claude Shannon, Nassim Taleb, etc., and against the entire academic economics profession. (See Fortune's Formula for a history of this controversy. And just to be clear which side I am on: I hate utility functions.) To make a long story short, the error we have made in computing the expected stock price (or wealth) at time t is that the expectation value there is ill-defined. It is ill-defined because wealth is not an "ergodic" variable: its finite-time average is not equal to its "ensemble average". The finite-time average of wealth is what a specific investor would experience up to time t, for large t. The ensemble average is the average wealth of many millions of similar investors up to time t. Naturally, since we are just one specific investor, the finite-time average is much more relevant to us. What we have computed above, unfortunately, is the ensemble average. Peters and Gell-Mann exhort us (and other economists) to only compute expected values of ergodic variables, and log return (as opposed to log price) is happily an ergodic variable. Hence our average log return is computed correctly - risk is bad. Paradox resolved!
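To see the distinction numerically, here is another small sketch (again with made-up parameters): the median investor's wealth compounds at roughly m - s²/2, while the ensemble-average wealth compounds at m, dragged up by a few lucky outliers.

import numpy as np

np.random.seed(1)
m, s, t = 0.10, 0.40, 20.0    # hypothetical parameters, long horizon
n_investors = 1_000_000

# Terminal wealth of each investor, each starting with W(0) = 1
log_wealth = np.random.normal(t * (m - 0.5 * s**2), s * np.sqrt(t), n_investors)
wealth = np.exp(log_wealth)

print(np.log(np.median(wealth)) / t)   # approx m - s^2/2 = 0.02: the typical investor's growth rate
print(np.log(wealth.mean()) / t)       # approx m = 0.10: ensemble-average growth rate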

===

My Upcoming Workshops

May 13 and 20: Artificial Intelligence Techniques for Traders

I will discuss in detail AI techniques as applied to trading strategies, with plenty of in-class exercises, and with emphasis on the nuances and pitfalls of these techniques.

June 5-9: London in-person workshops

I will teach 3 courses there: Quantitative Momentum, Algorithmic Options Strategies, and Intraday Trading and Market Microstructure.

(The London courses may qualify for continuing education credits for CFA Institute members.)


74 comments:

  1. I don't grok this paper.

    On the surface it feels like it's mainly addressing 100 year old arguments. There are very few references to recent economics/finance books or articles.

    ReplyDelete
  2. So if log price is an ergodic variable, then what's wrong with projecting that into the future and converting it to a price?

    ReplyDelete
  3. John,
    No, you misunderstood. The paper states that log price (or price) is NOT an ergodic variable. Only the change in log price is. The change in log price is the log return.

    Ernie

    ReplyDelete
  4. Sorry, I had typed it up wrong. I had meant the log return. So for instance, what's the issue with saying

    X_t ~ N(mu, sigma)
    where X_t is the log return. Projecting this out n periods gives
    X_t_n ~ N(mu*n, sigma*sqrt(n))
    then converted to price you have
    Y_t_n ~ exp(X_t_n)

    So your distribution at the horizon is log normal.

    Then you express utility on the distribution of the price at horizon.

    ReplyDelete
  5. John,
    What you have done seems exactly what I did in my post. The result is that the expected price or log price depends only on the average 1-period net (not log) return, but not the standard deviation of the net return. But the calculation involves an ensemble average E[exp(x)] = exp(μ + σ²/2), not a time average. If x is not ergodic, the ensemble average isn't equal to the time average, and I am unaware of an analytical formula for a time average.
    Ernie

    ReplyDelete
  6. I suppose part of my confusion is on the distinction between ensemble average and time average. I looked up the difference here
    http://www.nii.ac.jp/qis/first-quantum/forStudents/lecture/pdf/noise/chapter1.pdf

    I still don't know why I should be convinced...

    First, the mu in our above comments is, in the frequentist sense, calculated as a time average. For one individual stock, there is no ensemble. There is only one realization in the past.

    Second, I'm not really even sure why the difference between time averages and ensemble averages matters. I'm with you completely on the importance of ergodicity, but wealth is supposed to grow over time. Wouldn't it be a bad thing if its mean was unchanged over time?


    ReplyDelete
  7. John,

    mu is the average 1-period log return. As the log return is ergodic, there is no difference between the time and ensemble averages.

    Wealth grows over time whether you compute the time or ensemble average. But the difference is that the formula for average wealth displayed in my article is independent of risk. But that average wealth applies only if you buy 100,000 stocks, each having the same mean return and standard deviation of returns, and you are interested in the portfolio's return. If you own only 1 stock, then your time-averaged wealth will be reduced by the standard deviation, but I did not display the formula there. What I displayed is the average growth rate of wealth, which clearly shows it is reduced by s^2/2. It is not a matter of whether the mean changes over time: it doesn't.

    Ernie

    ReplyDelete
  8. This comment has been removed by the author.

    ReplyDelete
  9. Hi

    I'm with John on this. It's a non-issue. Keep your estimation separate from your portfolio optimization/analysis, and once you've projected your log returns to the correct horizon, do the necessary conversions thereafter.

    Also, I think your formula for expected price is incorrect: E(P_t) = exp(t*m). If you have defined m as the expectation of the linear price returns, then a simple recombining tree using +10% and -5% jumps will quickly show you that E(P_t) = [ 1 + (10% + -5%)/2]^t = (1+2.5%)^t = (1+m)^t != exp(m*t).

    Emlyn

    ReplyDelete
  10. Hi Emlyn,
    There is no portfolio optimization involved in this discussion. It is purely a question of whether it is reasonable to compute expected wealth vs computing expected log returns. The authors (and I) demonstrated that computing expected log returns is the only reasonable way, for a single investor in a single strategy. If you want to know the average wealth of 100,000 investors, or 100,000 strategies, it is reasonable to compute expected wealth.

    Also, your demonstration seems to corroborate rather than refute my calculation. Your binomial tree formula only holds when t is small, since you have to update your returns frequently in discrete time. For small t, exp(mt) = exp(m)^t ~ (1+m)^t, just as you wrote.

    Ernie

    ReplyDelete
  11. Hi Ernie,

    My mention of portfolio optimization was because this is one application where you explicitly require expected wealth (or at least expected linear return) estimates. Mean-variance is the classic example. It does not hold true if you use expected log returns.

    The binomial tree holds in generality as the step size is fully general. I understand your point about short-term linear returns being a first-order approximation of log returns (from the Taylor series expansion of the exponential function) but that is a separate issue. My point is that expected PRICE is not equal to P_0 * exp(t*m) because you have defined m as the expectation of the linear returns. Expected price is equivalent to expected linear return because your time is known and, assuming log-normality as per your discussion above, you have the relationship Exp(LinRet) = exp(t * (mu + 1/2*sigma^2)) - 1. The mu and sigma here are the drift and volatility parameters input directly into the geometric Brownian motion and thus relate to the log returns. Above, you have incorrectly stated that sigma^2 = t * s^2 (where s is the vol of linear returns, not log returns), which is where the final error in the expected price occurs. Dropping time for now, s^2 = [exp(sigma^2) - 1] * [exp(2*mu + sigma^2)]. Hope this clarifies.

    Emlyn

    ReplyDelete
  12. Hi Emlyn,
    Actually, I have shown that Mean-Variance optimization is equivalent to Kelly optimization in the 1-period case, except for the overall optimal leverage employed which mean-variance optimization doesn't provide. (See http://epchan.blogspot.com/2014/08/kelly-vs-markowitz-portfolio.html). However, Mean Variance optimization does not optimize multi-period growth, hence there is no need to compute expected wealth.

    The derivation of E[P(t)] is not as simple as you outlined. It isn't based on an expectation of a linear return, extrapolated to t. My derivation was in the main text, so I won't repeat it here. The key formula to use is E[exp(x)] = exp(μ + σ²/2). Also, sigma^2 = s^2 is correct. Though the mean net return is not the same as the mean log return, the standard deviation of the net return IS the same as the standard deviation of the log return (to a first-order approximation). I multiplied that by t in t * s^2 because this is a Gaussian Brownian motion, hence the variance at time t scales linearly with t. It doesn't matter whether you use sigma or s in this formula since they are the same.
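    A quick numerical check of that last claim, using hypothetical daily-scale numbers:

    import numpy as np

    np.random.seed(2)
    # Hypothetical daily log returns: mean 0.04%, standard deviation 1%
    log_ret = np.random.normal(0.0004, 0.01, 1_000_000)
    net_ret = np.exp(log_ret) - 1   # the corresponding net (percent) returns

    print(log_ret.std(), net_ret.std())   # nearly identical at this scale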

    Ernie

    ReplyDelete
  13. Hi Ernie,

    Thanks for the reply and very nice post on the link between the two frameworks. Remember though, the MV framework was originally derived to optimise expected wealth, granted in a one-period setting, but the main trick is to make sure your inputs for the horizon of choice are correct. This was John's initial point on distribution projection and subsequent conversion.

    On the second point, I'm afraid I just don't agree with you. There is an inconsistency in the mathematics. But I don't think we'll get any further going down this road so I'll leave it here. I urge you to read Meucci's 2005 textbook where he deals with exactly this issue. Thanks for the interaction though and for providing a very good blog.
    Emlyn

    ReplyDelete
  14. Thanks for the discussion, Emlyn!

    Will take a look at Meucci's book when I have a chance.

    Cheers,
    Ernie

    ReplyDelete
  15. Ernie, wealth is a non-stationary series (even in the wide-sense), so I don't think there is a meaningful way to define an ensemble average. As you show, for fixed t, mu_t= E[S(t)] = mu*t; this makes sense to me as the expected value of the terminal wealth with expectation taken over investors. Specifically, if you consider the two dimensional S(i, t), where i ranges over stocks/investors and t over time, then A_i(t) = average of S(1,t), ..,S_(n, t) is a random variable sequence that converges in probability to mu_t.

    I think the absence of s in the expression for expectation of S(t) merely reflects the difference between expectations of linear and exponential returns.

    ReplyDelete
  16. Hi Ramesh,
    I don't agree with your assertion that one cannot define an ensemble average for a non-stationary series. Students of probability have computed the variance of a random walk for centuries, and it is equal to D*t, where D = diffusion coefficient. A "normal" random walk isn't stationary just as a geometric random walk isn't, but the ensemble average of the absolute squared deviation is well-defined, as given above.

    But I do agree that expectations of linear and exponential returns differ, and wealth is an expectation of the exponential of the sum of log returns. The mean (or sum) of "arithmetic" returns isn't useful unless we rebalance the portfolio at the end of every period. If we do rebalance, then the wealth becomes an ergodic quantity and the ensemble average will equal the time average.

    Ernie

    ReplyDelete
  17. Hi Ernie,

    I'm an avid reader of your blog and recall you managed a forex fund.

    I was wondering if you could shed a little light on what infrastructure you use for your forex trading. I think I read that you use Interactive Brokers. Is this correct, and if so, are you co-located and what are the fill rates like?

    Thanks!

    James

    ReplyDelete
  18. Hi James,
    Yes, we do run a fund and managed accounts (qtscm.com) that trades FX. For retail clients, we trade mainly at Interactive Brokers, where we find the bid-ask spread is excellent.

    For colocation, I recommend https://www.speedytradingservers.com/.

    May I ask how you define "fill rate"?

    Ernie

    ReplyDelete
  19. I would define fill rate as #filled orders divided by #execution attempts. For example, if I see EURUSD as 1.0871-1.0872 and I attempt to hit the bid using an IOC order, I may or may not get filled. The bid may change or I could be "last look" rejected. Presumably I won't see the latter with IB so more concerned about the rejections due to latency.

    Curious as to what fill rate you're seeing.

    ReplyDelete
  20. Hi James,
    Ah ok - you are taking liquidity with IOC, so you are right to be concerned about last look. We only run market-making strategies on FX, using LMT orders, so we don't have issues with last look, nor do we measure fill rate. In fact, I don't even know if IB allows their FX market makers to employ last look!
    Ernie

    ReplyDelete
  21. Hi Ernie,

    May I ask what you trade in your "Commodity Pool"?
    What kind of strategies do you use?

    Many thanks.

    ReplyDelete
  22. Hi,
    We trade mainly FX mean reversion and futures momentum strategies in our pool. There are a number of other strategies as well that have lower allocations.
    Ernie

    ReplyDelete
  23. Hi Ernie,

    Thank you for quick response.

    For the futures momentum strategy in your pool, what futures do you trade?
    Commodities or stock index futures? Many thanks.

    ReplyDelete
  24. Hi,
    Mainly E-minis, but small allocation to agricultural and energy futures.
    Ernie

    ReplyDelete
  25. Hi Ernie,

    Is this something can be explained by the Jensen's inequality?

    thanks

    Kin Wa

    ReplyDelete
  26. Hi Kin Wa,
    Jensen's inequality can certainly explain why E[exp(x)] != exp(E[x]), but it doesn't give the whole picture.
    Ernie

    ReplyDelete
  27. Hi Ernie,

    Thanks for a great blog and great books. I have a PCA question.

    As you know, the signs of eigenvectors are not unique. When using for example pcacov in Matlab, the sign can change from time to time.

    When backtesting, I need to make sure the eigenvectors are pointing in the same direction for each time t. Do you have a quick code trick for how to guarantee this?

    Thanks

    ReplyDelete
  28. Hi,
    Thanks for your kind words.

    When you said "signs of eigenvectors are not unique" did you mean the sign of their eigenvalues, or the signs of the components of an eigenvector?

    If the former, that's not possible, since the eigenvalues of a covariance matrix are all >= 0 (the covariance matrix is positive semidefinite.)

    If the latter, I don't see what's wrong with that. It just means that the hedge portfolio that corresponds to each eigenvector has both long and short positions. You cannot simply ignore the components with negative values, as that would mean it is no longer an eigenvector of the covariance matrix.

    Ernie

    ReplyDelete
  29. Hi Ernie,

    I mean the signs of the eigenvector. Let's say we have n=4 time series; then for example the first eigenvector may look like [.54 .23 -.55 .90] or [-.54 -.23 .55 -.90]. If one does a PCA on a set of economic time series, one might want to preserve the economic interpretation of the first PC, i.e. the signs matter. So when doing a PCA walk forward, it would be nice to guarantee that the vector "points in the same direction" for each time t. So my question was just whether you have come across a quick code trick for this. One suggestion I've come across is, at each time t, to find the largest (in absolute terms) element in EIG(:,1) and make sure this value is positive or negative, depending on which direction one wants to rotate the first eigenvector. But I'm not sure this is theoretically correct.

    ReplyDelete
  30. Hi,
    Certainly if ev is an eigenvector, -ev is one too. So you are free to multiply all components by -1. Does that work for you?
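    For example, here is a minimal Python sketch of one such sign convention (the function is mine, not from any library): flip each eigenvector so that it lines up with its counterpart from the previous estimation window, or, if there is no previous window, so that its largest-magnitude component is positive. The same idea carries over to the output of Matlab's pcacov.

    import numpy as np

    def align_eigvec_signs(V, V_prev=None):
        # Fix the sign ambiguity of eigenvectors (columns of V).
        # If V_prev (last window's eigenvectors) is given, flip each column so it
        # points in roughly the same direction as before; otherwise make the
        # largest-magnitude component of each column positive.
        V = V.copy()
        for j in range(V.shape[1]):
            if V_prev is not None:
                ref = np.sign(V[:, j] @ V_prev[:, j])
            else:
                ref = np.sign(V[np.abs(V[:, j]).argmax(), j])
            if ref < 0:
                V[:, j] *= -1
        return V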
    Ernie

    ReplyDelete
  31. Hi Ernie,

    Where can we get historical option data (put, call, all strikes and expiry)?

    Thanks.

    ReplyDelete
  32. Hi,
    ivolatility.com, quandl.com, optionmetrics.com.
    Ernie

    ReplyDelete
  33. Very interesting, log-utility just happens to maximize the time average, which is what we really should be focusing on. But what about the case where we have a fixed time horizon? Then taking the limit as T -> infinity does not make sense, and we don't get rid of the stochastic component in the growth rate.

    ReplyDelete
  34. Hi Emil,

    The compound growth rate at any finite T asymptotically approaches the formula given in this post (which is exact at T-> infinity). This is true for most theoretical derivations involving time averages in finance.

    Ernie

    ReplyDelete
  35. Based on all your books it seems that Kelly is the ideal allocation method for all trading strategies. Just to confirm: the Kelly formula has flaws when in use. I know you covered half Kelly, but beyond that, have you encountered periods where you didn't use Kelly at all, and periods where you used Kelly and it failed? Keep in mind I am not referring to the periods where you've recommended QT investors to opt out of strategies based on regime changes or unknowns. I am aware that this post doesn't specifically focus on Kelly, so we can call it a side note.

    ReplyDelete
  36. Hi Ever,
    Yes, as I have shown in another blog post (https://epchan.blogspot.ca/2014/08/kelly-vs-markowitz-portfolio.html), Kelly is essentially the same as the widely adopted Mean Variance optimization method, except that Kelly also suggests an overall leverage of a portfolio.

    It is unclear what is meant by periods where Kelly "fails". Fails meaning the portfolio is not profitable? But that has nothing to do with Kelly itself. Fails meaning the portfolio underperforms an equal-weighted one? Certainly! But again, Kelly does not promise it always outperforms other allocation methods in all periods. It only promises that in the long run, it generates the maximum growth rate. The long run, of course, can be very long.

    Ernie

    ReplyDelete
  37. Hi Ernie,

    I've listened to a couple of your talks online and had a few questions after reading your blog. My background is fundamental investing at asset management and hedge funds and only more recently have I tried to pick up machine learning.

    In one of your lectures, you referenced SVM and random forest as two algorithms that work well for stock prediction. However, I wasn't clear if this is a classification or regression problem. In other words, what is more appropriate: to predict the actual stock price, the actual return, or whether the stock will be up or down? I've seen a lot of academic papers and a mix of what people do, but I've seen no explanation as to the pros and cons of using one over the other. Perhaps you could enlighten me here.

    My second question is regarding one of your comments about insufficient data. Are you saying that daily price data for your y label is not enough and you need minute by minute data?

    And my third question has to do with one of your slides where you say feature-rich data sets are a curse. I've read about this before so I'm not questioning what you're saying. However, I remember listening to a talk given by James Simons, and he said that once they find a good predictor they just leave it in, in case it comes back. If that's the case, then over time they would likely have a substantial number of features in the model. Are they doing something that perhaps is different?

    My last question is about your recent book, or perhaps you have a suggestion of other resources. I'm trying to find a resource that clearly shows the right way to clean and transform feature data and cross-validate in order to use it for prediction. I've not seen a source that accurately describes best practices.

    Thank you,

    Alex






    ReplyDelete
  39. Hi Alex,
    1) SVM is a classification algorithm; it can only be used to predict the direction of price moves.
    2) Random forest can be applied to either classification or regression trees. The latter can be used to predict the magnitude as well as the direction of moves (see the sketch after this list).
    3) There is no reason to believe one is superior to the other with respect to various out-of-sample (OOS) performance measures.
    4) Daily prices are good enough if you aggregate >1,000 stocks and build the same model for all.
    5) If a feature was found to work OOS, then of course one should retain it in the library. There aren't that many features that work OOS, so no worries about "too many". We can only have too many unproven features - those lead to data-snooping bias.
    6) My book references many other books and articles. In terms of AI in trading, I recommend following @carlcarrie on Twitter, who posts numerous relevant articles daily.
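    To illustrate points 1 and 2, here is a minimal scikit-learn sketch (the features X and returns y below are simulated placeholders, not real data):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))                          # placeholder features, e.g. lagged returns
    y = 0.1 * X[:, 0] + rng.normal(scale=1.0, size=1000)    # placeholder next-day returns

    # Classification: predict the direction of the move only
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, (y > 0).astype(int))

    # Regression: predict the magnitude as well as the direction
    reg = RandomForestRegressor(n_estimators=200, random_state=0)
    reg.fit(X, y)

    print(clf.predict(X[:3]), reg.predict(X[:3]))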

    Ernie

    ReplyDelete
  40. Hi Ernie,

    Do you use log return during backtesting?

    Thanks.

    ReplyDelete
  41. Hi,
    Yes, for signal generation, but not for performance evaluation. The latter needs to conform to the industry norm, which uses net returns.
    Ernie

    ReplyDelete
  42. Hi Ernie,

    What is the difference between log return and net return?

    Thanks.

    ReplyDelete
  43. Hi,
    Log return = log(price(t))-log(price(t-1))

    Net return = price(t)/price(t-1) -1

    Ernie

    ReplyDelete
  44. Hi Ernie,

    Thank you for quick response!

    When we evaluate performance via profit & loss (P&L) and cumulative P&L, it is straightforward to include transaction costs.

    When we evaluate performance via net return and cumulative return, how do we include transaction costs?

    Many thanks.


    ReplyDelete
  45. Hi,
    The typical way is to subtract some fixed percentage from each trade's returns. My first book Quantitative Trading has detailed examples on this.
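    For instance, a minimal sketch of that bookkeeping in Python (the cost per trade, returns and positions below are made-up numbers):

    import numpy as np

    gross_ret = np.array([0.004, -0.002, 0.003, 0.001])   # hypothetical per-period strategy returns
    positions = np.array([1, 1, -1, 0])                    # hypothetical positions (+1/0/-1)

    cost_per_trade = 0.0005                                # assumed one-way cost, e.g. 5 bps
    trades = np.abs(np.diff(positions, prepend=0))         # position changes trigger costs
    net_ret = gross_ret - cost_per_trade * trades

    print(net_ret.cumsum())   # arithmetic cumulative return, net of transaction costs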
    Ernie

    ReplyDelete
  46. Hi Ernie,

    When we do backtesting, shall we use cumulative P&L or cumulative return to evaluate performance? What are their strengths and weaknesses?

    Thanks.

    ReplyDelete
  47. Hi,
    Cumulative returns are more meaningful. P&L depends on the size of your orders and is arbitrary.
    Ernie

    ReplyDelete
  48. Hi Ernie,

    I was reading your latest book, Machine Trading. In your book you talk about cross-validation and in particular K-fold. My question is, I've also read elsewhere that because stock prices are a time series problem, K-fold does not work; rather, roll-forward cross-validation is what you need to use. Any thoughts on this?

    Best,

    Alex

    ReplyDelete
  49. Hi Alex,
    It depends on what you use as input features for the problem.

    If your input features involve L number of days, then it is best to avoid any overlap of the training set and the validation set by picking data at least L days apart to form each set.
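    Here is a minimal sketch of one way to enforce that gap (the function is mine, assuming daily bars indexed 0..n-1):

    import numpy as np

    def train_indices_with_gap(n, val_idx, L):
        # Return training indices at least L bars away from every validation bar.
        # n: total number of (daily) observations; val_idx: validation fold indices;
        # L: lookback length of the input features, in days.
        val_idx = np.asarray(val_idx)
        all_idx = np.arange(n)
        dist = np.abs(all_idx[:, None] - val_idx[None, :]).min(axis=1)
        return all_idx[dist >= L]

    # e.g. 500 daily bars, validation fold = bars 200-249, features use L = 10 days
    train_idx = train_indices_with_gap(500, np.arange(200, 250), L=10)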

    Other than that, most examples I discussed are time series problems. Why do you think cross-validation won't work on these problems?


    Ernie

    ReplyDelete
  50. Hi Ernie,

    This was what I was referring to.

    https://www.youtube.com/watch?v=56Nq6cs2-gc

    Basically, my understanding is that for time series data, normal K-fold cross-validation does not work because training data should always come before testing data. Wondering if you agree with this?

    Best,

    Alex

    ReplyDelete
  51. Hi Alex,
    I am not too concerned about the "peeking ahead" effect during cross-validation. The important point is that our input features occur prior in time to the predicted outcome, for each data point. If the validation data set is consistently prior in time to the training data set, that would be a concern. But generally speaking, the time order will be randomized, so I don't believe there can be much bias there.

    Of course, I am not objecting to the roll-forward method described in the video - I just don't think it will make a big difference in the out-of-sample results. However, if you want to be absolutely safe, you can certainly adopt that method.

    Ernie

    ReplyDelete
  52. Great blog, great books.

    Except as an approximation for a runaway stock, we can't use the unrestricted geometric random walk model for large t. Once you consider absorption (delisting and ruin), you will get your dependence on risk. Your derivation is correct for the solvent subset of the ensemble. This is different from the possible issue with non-ergodicity; thanks for the paper.

    ReplyDelete
  53. Hi Ernie,

    Just a follow up on the cross validation question. I went ahead and ignored it for now and did a random shuffle during cross validation.

    However, I have seen some blogs/sites that show people also taking the log returns to detrend time series data. My question is if I'm using a multivariate dataset (not just stock prices), do I need to do this?

    The reason I ask is I used random forest on a set of features and consistently got around 60% for precision, recall and f1. However, when I didn't shuffle the data I got much lower scores. I also tried to detrend my data by taking the log returns and got even worse scores. Just wondering what you think.

    Best,

    Alex

    ReplyDelete
  54. Hi Alex,
    If your input feature is not prices, there is no reason to detrend it. Instead, you can consider using its Z-score, computed over a moving lookback period.
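    For example, a minimal rolling z-score in Python/pandas (the feature name here is just a placeholder):

    import pandas as pd

    def rolling_zscore(x, lookback=20):
        # Z-score of a feature relative to its trailing moving window
        m = x.rolling(lookback).mean()
        s = x.rolling(lookback).std()
        return (x - m) / s

    # e.g. zscored = rolling_zscore(features['volume'], lookback=60)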
    Ernie

    ReplyDelete
  55. Hi Ernie,

    I have a question about my model that I can't seem to figure out, and I'm wondering if you can easily tell what I did. I have a list of 10 features (some price, some technical, some fundamental features). Basically, many of them are features I used to look at as a fundamental analyst.

    I ran those features through a random forest classification algo (I'm using Scikit-learn). I first predict next day stock price and randomly split training vs. testing. When I run it through the model I get no signal (accuracy, recall, precision etc. ~50%).

    Then I try two things: First, I extend the prediction from next day to 2, 3, 4, 5, 10, 50, etc. days ahead, and my accuracy jumps up dramatically, to the point where I know it can't be correct. The further I extend it out, the higher it goes - up to 90%. Clearly this is wrong, or else someone much smarter would have figured this out.

    Second, I try to correct this by not randomly splitting my data. Instead I train on the first 60% and test on the last 40%. My results go back to 50% or lower.

    My question is: what's going on in the model when I randomly train/test split my data (the first example) vs. the second example where I do not?

    My other question is, as I've been reading more, the differences in how ML models find a prediction seem to be categorized into parametric vs. non-parametric models. Has there been any research on whether one type of model is better suited for the stock market? In my mind, it seems that parametric might be suited for more long-term predictions (i.e. like fundamental analysts are trying to do), and non-parametric is more suited for short-term predictions (because of the noise that fundamental analysts often get wrong). I would like to hear your thoughts. I'm also wondering whether most people in practice use regression-based models or instance-based models?

    Best,

    Alex

    ReplyDelete
  56. Hi Alex,
    1) I don't know why your accuracy changes so dramatically if you use random splitting. It seems that part of your future returns has been used as an input feature.
    2) To me, non-parametric models simply mean we have assumed no known simple distribution function to describe the data. As a result, there are actually more parameters, because we need an empirical distribution to predict the future! I am not sure that one is better for long term vs short term. I generally think a parametric model is less subject to overfitting and therefore better for finance models.
    Ernie

    ReplyDelete
  57. This statement:

    "It is ill-defined because wealth is not an "ergodic" variable: its finite-time average is not equal to its "ensemble average"."

    is puzzling to me. Wealth in the sense of a variable depends on how one measures it. If you measure wealth by changes in value, or first differences, then the ensemble average is the average change in value, which multiplied by the sample size gives the value of final wealth. The time average is the rate of growth of the wealth. No one ever said that the ensemble average and the time average should be the same if the former is determined based on first differences and the latter based on arithmetic returns. More importantly, the time average does not even depict the path, as it is calculated from the product of returns and that result does not depend on the order of these returns. It is a misconception that time averages offer more information than ensemble averages. There are many possible paths resulting in the same final wealth that have the same ensemble and time averages.

    Therefore, I still do not get the problem. Someone wrote a paper claiming that time averages are better in the case of a non-ergodic process, but in my opinion this is a strawman. Non-ergodicity does not even come into play when calculating ensemble and time averages the proper way. Although ergodic processes have equal ensemble and time averages, this is a red herring when considering the differences. No one ever said that the two must be the same, and their not being the same does not mean that a time average provides better information than an ensemble average. And no one ever said that volatility does not affect final wealth; it does, but this is irrelevant when it comes to a distinction between ensemble and time averages, because the trick is that in the context of final wealth, ensemble averages must be based on first differences, not arithmetic returns, and time averages on arithmetic returns. Of course, if someone uses arithmetic returns to calculate ensemble averages, final wealth will not be easily deduced. But this is due to the wrong choice of calculation, not because there is some fundamental difference between ensemble and time averages that makes the latter a better choice.

    A minor detail: G = A - V/2 is a gross approximation.

    ReplyDelete
  58. Hi Michael,
    Wealth in this article and the references cited is defined as dollars accumulated. It isn't defined as the first difference (or change) in dollars accumulated. The latter is indeed ergodic, as I pointed out. The central argument in the cited references is that we should only compute averages of first differences, but not averages of wealth, since the former is independent of whether the average is based on time or on ensembles, while the latter is not. Since the usual mathematical apparatus can only compute ensemble averages, it can only be applied to changes in wealth rather than to wealth itself.

    G = A - V/2 is based on Ito's formula, which in turn assumes a log-normal distribution of returns. It is a gross approximation to the same extent that the Black-Scholes equation is a gross approximation of European option prices.

    Ernie

    ReplyDelete
  59. Thanks for the reply Ernie. I'm not here to argue with you as you know I have high respect for you and your work.

    "The central argument in the cited references is that we should only compute averages of first differences, but not averages of wealth, since the former is independent of whether the average is based on time, or on ensembles, while the latter does. "

    No one is supposed to average actual wealth; this is a strawman argument, i.e., an artificial image set up to attack later.

    If we set W(t = 0) = 0, then all we are left with are first differences to sum to get to final wealth.

    In comparing ensemble and time averages, no one ever forced anyone to take actual wealth W(t) in the former and returns W(t)/W(t-1) - 1 in the latter. My point is that if this is assumed, it is a fallacy set up just to make noise about. In an ensemble average one must use only first differences in the first place. Otherwise, that which is computed makes no sense. If we take that nonsense and try to argue against ensemble averages, then it is only we who are wrong, not the ensemble averages. I have never heard of using actual wealth in ensemble averages, and I am surprised this has made so much noise and has bothered some academics so much.

    ReplyDelete
  60. Dear Ernie,

    First of all, congratulations for your great blog.

    I see an issue in the second paragraph where you compute E[P(t)], the expected value of the price at time t. You write: "(...) In our case, the normal variable is the log price." — However, neither the price P nor logP is a normal variable. Both are functions of the path and are not statistically independent. Only the increments ∆P/P and ∆logP may be assumed statistically independent (as far as the time axis is concerned) with a log-normal or normal pdf. We can compute their math-expected values E[∆P/P(t)] or E[∆logP(t)]. We can also compute their functions F(E[∆...]), in particular the price P(t) = P(E[∆logP(t)]). We call it "the expected value of P(t)" as if it were the math expectation of P, i.e. its pdf-weighted value, while it is not. So the way you get to the result, "the expected price at time t is exp(t*m)", seems to me questionable, because of the pdf issue.
    Also, to evaluate whether P is ergodic, we have to compute P(N) across N realisations and not P(t). However, in real life we have only one realisation of a stock. Taking different stocks? In general, they are correlated and do not form a statistically sensible ensemble. So, while we know that in practice the wealth field is strongly non-ergodic, it is not so easy to prove in the framework of Brownian motion.
    Peters constructs a non-ergodic GBM and gets <∆logP(N)> = mu. It looks rather arbitrary. One can imagine that ∆P/P's across the ensemble follow GBM, why not? We'll then get an ergodic <∆logP(N)> = mu - s^2/2. I also wonder what sense we can give to mu across an ensemble. Further, Peters' findings rely on the properties of the log function. If we adopt the ABM, we get ergodic values. So confusing...
    Let me finally comment on the "-s^2/2" component. In my eyes, it is model-dependent, an accidental by-product of using the log function. The GBM model is handy for producing formulae and for teaching students, but it has too many shortcomings when one comes to the reality of markets. The market increments ∆logP follow a Levy-Pareto pdf with alpha ca. 1.4, as measured by Mandelbrot and by Mantegna & Stanley. The trouble is that for alphas < 2 the integral for the 2nd moment diverges, so that the variance is undefined. The results from Itô's lemma, which needs the 2nd moment, then evaporate, and so does "-s^2/2". A direct approach to evaluating the risk by measuring the WDD (i.e. Kolmogorov-Smirnov metrics) is by far more reliable, clearer, and model-independent.

    ReplyDelete
  61. Michael,
    When I suggested taking first differences, I was indeed referring to log(W(t)) - log(W(t-1)), not log(W(t)) - log(W(0)). The assumption is that the time elapsed must be small - a differential time, which makes stochastic calculus applicable.

    Many people are interested in the total compounded growth of their portfolio, and not the average growth rate. The argument I cited is that the total compounded growth cannot be calculated using an ensemble average, while the average growth rate can. I don't agree that no one is interested in the former, though. I therefore don't agree this is some kind of strawman argument.

    There is extensive discussion of this point in Taleb's new book (to be published, but excerpts available.)

    Ernie

    ReplyDelete
  62. Hi Almas,
    If the price process is presumed to be GBM, then log(P(t)) is a normal variable at a given time t, since it is the sum of many independent normal variables. The sum of i.i.d. normal variables is naturally normal as well.

    You can certainly simulate the different price paths a stock can take, and take the ensemble average. I don't see why you have to involve other stocks (with their non-zero correlations to the current one) to compute that. After all, we are computing expectation values, not the actual realized averages, so there is no issue of "in real life, the stock price only traversed a single path".

    Your argument about the -s^2/2 is outside the scope of this argument. We are just demonstrating the problem using the simplest GBM where this is valid. We are not discussing the limitations of GBM. Of course, real price paths do not follow GBM, and the -s^2/2 term is not accurate as you pointed out. It is indeed model-dependent, but it doesn't affect the argument we are focusing on.

    Ernie

    ReplyDelete
  63. (I slightly modified my reply of yesterday).
    Dear Ernie,
    Thanks for your reply. It helps me see the framework better.
    Of course, the sum of normal variables will be normal, and ∆logP(t) = logP(t) - logP(0) is normally distributed. So will be logP(t) = ∆logP(t) + logP(0), if P(0) is assumed to be constant. I drop my concern about an unknown P(0); it unnecessarily complicates the discussion.
    "You can certainly simulate the different price paths a stock can take (...)" — Once again, you are right in this framework. I was too much biased by real time constraints and empiric considerations.
    Still, even remaining in the GBM framework, I miss something. You write that "log return (as opposed to log price) is (...) an ergodic variable."
    The log price in, e.g., Peters and Klein's GBM simulations turns out to be non-ergodic. Their P(0) is fixed: P(0) = 1 and logP(0) = 0. This gives the equality ∆logP(t) = logP(t), so that the log return ∆logP is not an ergodic variable either.

    ReplyDelete
  64. Hi Almas,
    As my reply to Michael above indicates, when we speak of log returns, we are speaking of infinitesimal returns. I.e. log(P(t)) - log(P(t-delta_t)) is ergodic, but log(P(t)) - log(P(0)) isn't.
    Ernie

    ReplyDelete
  65. Hi, Ernie,

    I got it. Thanks a lot for your patience.

    Almas

    ReplyDelete
  66. Hi Ernie,

    I see the importance of ergodicity. Does it affect stuff like Merton's problem? Or, for that matter, some of Alvaro Cartea's stochastic optimization work related to limit order books?

    Curious what's your view on this.

    Thanks!

    ReplyDelete
  67. Hi Pix,
    I believe most work in mathematical finance, such as Merton's or Cartea's, uses time series stochastic methods and thus is kosher. They do not assume ergodicity.
    Ernie

    ReplyDelete
  68. I think there are some issues in this post. First of all, percent returns are also ergodic, not only log returns. Secondly, the Kelly criterion is defined using log utility (contrary to your comment below the article http://epchan.blogspot.com/2018/06/loss-aversion-is-not-behavioral-bias.html), implicitly by using log returns. Using percent returns directly corresponds to a linear utility, which guarantees risk of ruin. In general there is no optimality without specifying a utility function. But yes, we should be mindful of ergodic properties while taking averages of expressions.

    ReplyDelete
  69. The argument is not about whether log or net returns are ergodic. Returns are never ergodic, since they are stationary variables, whether computed as logs or percentages. The argument is that the price, or log price, is not ergodic, hence computing its time average is not the same as computing its ensemble average.

    Ernie

    ReplyDelete
  70. What do you mean, "returns are never ergodic"? Of course they are. In the article above you also claim that log returns are ergodic. So are percent returns.

    Quoting from the paper you referenced:
    "An ergodic observable for (Eq. 3) exists in the relativechanges in wealth, W(t+T∆t)/W(t), whose distribution does not depend on t."

    Yes, log price is not ergodic. Therefore, looking at the geometric Brownian motion result

    E[log(S_t)] = log S_0 + (mu - s^2/2)*t

    is also not an explanation of why risk is bad, because this expression is also an ensemble average of a non-ergodic variable. The only thing it shows is that a non-linear transformation such as log changes the result of the expectation due to Jensen's inequality.

    The above referenced paper from the Chaos journal (and the subsequent paper in Nature) actually received very harsh criticism because, even though the idea is quite old and well known, the evidence provided is terribly wrong.

    After he claims that returns are ergodic, he goes on to say:

    "Increments in the logarithm of W are now stationary and independent". Because he wants to show the mu-s^2/2 as evidence. But this is not an evidence at all. If he had used the %returns, as he claims to be ergodic as well, he doesn't end up with that expression.

    The correct explanation is that the expectation has very bad summarization characteristics in skewed distributions. It makes much more sense to look at things like the median or mode to understand the general long-term behavior of things.

    ReplyDelete
  71. This comment has been removed by the author.

    ReplyDelete
  72. I fear you have completely missed the point.

    We are considering only geometric random walk here. There is no skewness in the assumed returns distribution.

    Ergodicity here refers to the terminal wealth of a single investor, not an ensemble of investors. We are not interested in whether return is or is not ergodic.

    ReplyDelete
  73. Contrary to your claim above, the percent return also seems to be an ergodic variable, not only the log return. The time average converges to the ensemble average almost surely.

    ReplyDelete