tag:blogger.com,1999:blog-35364652.post8230965571724080508..comments2017-11-18T08:19:13.425-05:00Comments on Quantitative Trading: Paradox Resolved: Why Risk Decreases Expected Log Return But Not Expected WealthErnie Chanhttp://www.blogger.com/profile/02747099358519893177noreply@blogger.comBlogger66125tag:blogger.com,1999:blog-35364652.post-44810819284395981152017-09-27T10:57:08.590-04:002017-09-27T10:57:08.590-04:00Hi, Ernie,
I got it. Thanks a lot for your patie...Hi, Ernie,<br /> <br />I got it. Thanks a lot for your patience.<br /><br />AlmasAlmas Chalabaevnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-69946727435435993932017-09-27T09:30:37.350-04:002017-09-27T09:30:37.350-04:00Hi Almas,
As my reply to Michael above indicates, ...Hi Almas,<br />As my reply to Michael above indicates, when we speak of log returns, we are speaking of infinitesimal returns. That is, log(P(t))-log(P(t-delta_t)) is ergodic, but log(P(t))-log(P(0)) isn't.<br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-12078466984458225492017-09-27T05:22:52.536-04:002017-09-27T05:22:52.536-04:00(I slightly modified my reply of yesterday).
Dear ...(I slightly modified my reply of yesterday).<br />Dear Ernie,<br />Thanks for your reply. It helps me see the framework better.<br />Of course, the sum of normal variables will be normal, and ∆logP(t) = logP(t) - logP(0) is normally distributed. So will be logP(t) = ∆logP(t) + logP(0), if P(0) is assumed to be constant. I drop my concern about an unknown P(0); it unnecessarily complicates the discussion.<br />"You can certainly simulate the different price paths a stock can take (...)": once again, you are right in this framework. I was too biased by real-time constraints and empirical considerations.<br />Still, even remaining in the GBM framework, I am missing something. You write that "log return (as opposed to log price) is (...) an ergodic variable." <br />The log price in, e.g., the <a href="https://arxiv.org/abs/1209.4517" rel="nofollow"> Peters and Klein</a> GBM simulations turns out to be <i>non-ergodic</i>. Their P(0) is fixed, P(0) = 1 and logP(0) = 0. This gives the equality ∆logP(t) = logP(t), so that the log return ∆logP is not an ergodic variable either.Almas Chalabaevnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-89891855190179700302017-09-25T07:43:17.396-04:002017-09-25T07:43:17.396-04:00Hi Almas,
If the price process is presumed to be G...Hi Almas,<br />If the price process is presumed to be GBM, then log(P(t)) is a normal variable at a given time t, since it is the sum of many independent normal variables. The sum of i.i.d. normal variables is naturally normal as well.<br /><br />You can certainly simulate the different price paths a stock can take, and take the ensemble average. I don't see why you have to involve other stocks (with their non-zero correlations to the current one) to compute that. After all, we are computing expectation values, not the actual realized averages, so there is no issue of "in real life, the stock price only traversed one single path".<br /><br />Your argument about the -s^2/2 term is outside the scope of this discussion. We are just demonstrating the problem using the simplest GBM where this is valid. We are not discussing the limitations of GBM. Of course, real price paths do not follow GBM, and the -s^2/2 term is not accurate as you pointed out. It is indeed model-dependent, but it doesn't affect the argument we are focusing on.<br /><br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-71329565798072768682017-09-25T07:33:53.691-04:002017-09-25T07:33:53.691-04:00Michael,
When I suggested taking first differences, ...Michael,<br />When I suggested taking first differences, I was indeed referring to log(W(t))-log(W(t-1)), not log(W(t))-log(W(0)). The assumption is that the time elapsed must be small - a differential time, which makes stochastic calculus applicable.<br /><br />Many people are interested in the total compounded growth in their portfolio, and not the average growth rate. The argument I cited is that the total compounded growth cannot be calculated using an ensemble average, while the latter can. I don't agree that no one is interested in the former though. I therefore don't agree this is some kind of straw man argument.<br /><br />There is extensive discussion of this point in Taleb's new book (to be published, but excerpts are available).<br /><br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-91768161929830663322017-09-25T04:58:54.547-04:002017-09-25T04:58:54.547-04:00Dear Ernie,
First of all, congratulations on yo...Dear Ernie, <br /><br />First of all, congratulations on your great blog. <br /><br />I see an issue in the second paragraph where you compute E[P(t)], the expected value of price at time t. You write: " (...) In our case, the normal variable is the log price." However, neither price P nor logP is a normal variable. Both are functions of path and are not statistically independent. Only the increments ∆P/P and ∆logP may be assumed statistically independent (as far as the time axis is concerned) with a log-normal or normal pdf. We can compute their math-expected values E[∆P/P(t)] or E[∆logP(t)]. We can also compute their functions F(E[∆...]), in particular the price P(t) = P(E[∆logP(t)]). We call it "the expected value of P(t)" as if it were the math expectation of P, i.e. its pdf-weighted value, while it is not. So, the way you get to the result, "the expected price at time t is exp(t*m)", seems questionable to me, because of the pdf issue. <br />Also, to evaluate whether P is ergodic, we have to compute P(N) across N realisations and not P(t). However, in real life we get only one realisation of a stock. Taking different stocks? In general, they are correlated and do not form a statistically sensible ensemble. So, while we know that in practice the wealth field is strongly non-ergodic, it is not so easy to prove in the framework of Brownian motion.<br />Peters constructs a non-ergodic GBM and gets <∆logP(N)> = mu. It looks rather arbitrary. One can imagine that ∆P/P's across the ensemble follow GBM, why not? We'll then get ergodic <∆logP(N)> = mu - s^2/2. I wonder also what sense we can give to mu across an ensemble? Further, Peters' findings rely on the log function properties. If we adopt the ABM, we get ergodic values. So confusing...<br />Let me finally comment on the "-s^2/2" component. In my eyes, it is model-dependent, an accidental by-product of using the log function. 
The GBM model is handy to produce formulae and to be taught to students, but has too many shortcomings when it comes to the reality of markets. The market increments ∆logP follow a Levy-Pareto pdf with alpha ca. 1.4 as measured by Mandelbrot and by Mantegna & Stanley. The trouble is that for alpha < 2 the integral for the second moment diverges, so that the variance is undefined. The results from Itô's lemma, which needs the second moment, then evaporate and so does the "-s^2/2" term. A direct approach to evaluate the risk by measuring the WDD (i.e. Kolmogorov-Smirnov metrics) is far more reliable, clearer, and model-independent.Almas Chalabaevnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-66087831838165810262017-09-25T04:13:22.812-04:002017-09-25T04:13:22.812-04:00Thanks for the reply Ernie. I'm not here to ar...Thanks for the reply Ernie. I'm not here to argue with you as you know I have high respect for you and your work. <br /><br />"The central argument in the cited references is that we should only compute averages of first differences, but not averages of wealth, since the former is independent of whether the average is based on time, or on ensembles, while the latter does. "<br /><br />No one is supposed to average actual wealth; this is a strawman argument, i.e., an artificial image set up to be attacked later. <br /><br />If we set W(t = 0) = 0 then all we are left with are first differences to sum to get final wealth. <br /><br />In comparing ensemble and time-averages, no one ever forced anyone to take actual wealth W(t) in the former and returns W(t)/W(t-1) - 1 in the latter. My point is that if this is assumed, it is a fallacy just to then make noise. In an ensemble average one must use only first differences in the first place. Otherwise, that which is computed makes no sense. If we take that nonsense and try to argue against ensemble averages, then it is only us who are wrong, not the ensemble averages. 
I never heard of using actual wealth in ensemble averages and I am surprised this has made so much noise and has bothered some academics so much. Michael Harrishttp://www.priceactionlab.com/Blog/noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-85097785796905110692017-09-24T20:06:31.640-04:002017-09-24T20:06:31.640-04:00Hi Michael,
Wealth in this article and the referen...Hi Michael,<br />Wealth in this article and the references cited is defined as dollars accumulated. It isn't defined as the first difference (or change) in dollars accumulated. The latter is indeed ergodic, as I pointed out. The central argument in the cited references is that we should only compute averages of first differences, but not averages of wealth, since the former is independent of whether the average is based on time, or on ensembles, while the latter does. Since the usual mathematical apparatus can only compute ensemble averages, it can only be applied to changes in wealth rather than to wealth itself.<br /><br />G=A-V/2 is based on Ito's formula, which in turn assumes a log normal distribution of returns. It is a gross approximation to the same extent that the Black-Scholes equation is a gross approximation of European option prices.<br /><br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-35086382424763018292017-09-24T17:42:55.734-04:002017-09-24T17:42:55.734-04:00This statement"
"It is ill-defined beca...This statement"<br /><br />"It is ill-defined because wealth is not an "ergodic" variable: its finite-time average is not equal to its "ensemble average"."<br /><br />Is puzzling to me. Wealth in the sense of a variable depends on how one measures it. If you measure wealth by changes in value, or first differences, then the ensemble average is the average change in value, which multiplied by the sample size gives the value of final wealth. The time-average is the rate of growth of the wealth. No one ever said that the ensemble average and the time average should be the same if the former is determined based on first differences and the latter based on arithmetic returns. More importantly, the time-average does not even depict the path, as it is calculated from the product of returns and that result does not depend on the order of these returns. It is a misconception that time-averages offer more information than ensemble averages. There are many possible paths resulting in the same final wealth that all have the same ensemble and time-averages. <br /><br />Therefore, I still do not get the problem. Some have written a paper claiming that time-averages are better in the case of a non-ergodic process but in my opinion this is a strawman. Non-ergodicity does not even come into play when calculating ensemble and time-averages the proper way. Although ergodic processes have equal ensemble and time-averages, this is a red herring when considering the differences. No one ever said that the two must be the same, and their not being the same does not mean that a time-average provides better information than an ensemble average. And no one ever said that volatility does not affect final wealth; it does but this is irrelevant when it comes to a distinction between ensemble and time-averages because the trick is that in the context of final wealth, ensemble averages must be based on first differences, not arithmetic returns, and time-averages on arithmetic returns. 
Of course, if someone uses arithmetic returns to calculate ensemble averages final wealth will not be easily deduced. But this is due to the wrong choice of calculation, not because there is some fundamental difference between ensemble and time-averages that make the latter a better choice. <br /><br />A minor detail: G = A - V/2 is a gross approximation. Michael Harrishttp://www.priceactionlab.com/Blog/noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-18877068949920761832017-08-31T19:37:53.706-04:002017-08-31T19:37:53.706-04:00Hi Alex,
1) I don't know why your accuracy cha...Hi Alex,<br />1) I don't know why your accuracy changes so dramatically if you use random splitting. It seems that part of your future returns has been used as an input feature.<br />2) To me, non-parametric models simply mean we have assumed no known simple distribution function to describe the data. As a result, there are actually more parameters because we need an empirical distribution to predict the future! I am not sure that one is better for long term vs short term. I generally think parametric models are less subject to overfitting and therefore better for finance.<br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-15103017410823423462017-08-31T10:57:29.086-04:002017-08-31T10:57:29.086-04:00Hi Ernie,
I have a question about my model that I...Hi Ernie,<br /><br />I have a question about my model that I can't seem to figure out and I'm wondering if you can easily tell what I did. I have a list of 10 features (some price, some technical, some fundamental features). Basically, these are many of the features I used to look at as a fundamental analyst.<br /><br />I ran those features through a random forest classification algo (I'm using Scikit-learn). I first predict next day stock price and randomly split training vs. testing. When I run it through the model I get no signal (accuracy, recall, precision etc. ~50%). <br /><br />Then I try two things: First, I extend the prediction from next day to 2, 3, 4, 5, 10, 50, etc. And my accuracy jumps up dramatically to the point where I know it can't be correct. The further I extend it out the higher it goes - 90%. Clearly this is wrong or else someone much smarter would have figured this out.<br /><br />Second, I try to correct this by not randomly splitting my data. Instead I train on the first 60% and test on the last 40%. My results go back to 50% or lower. <br /><br />My question is, what's going on in the model when I randomly train/test split my data, as in the first example, vs. the second example where I do not?<br /><br />My other question is, as I've been reading more: the differences in how ML models find a prediction seem to be categorized in parametric vs. non-parametric models. Has there been any research on whether one type of model is better suited for the stock market? In my mind, it seems that parametric might be suited for more long-term predictions (i.e. like fundamental analysts are trying to do), and non-parametric is more suited for short-term predictions (because of the noise that fundamental analysts often get wrong). Would like to hear your thoughts. 
I'm also wondering whether most people in practice use regression-based models or instance-based models?<br /><br />Best,<br /><br />AlexAlex (tingli1081@gmail.com)noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-79257248918395984082017-08-30T19:13:04.781-04:002017-08-30T19:13:04.781-04:00Hi Alex,
If your input feature is not prices, ther...Hi Alex,<br />If your input feature is not prices, there is no reason to detrend it. Instead, you can consider using its "Z-score", computed over a moving lookback period.<br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-6768562749198281682017-08-30T15:54:20.678-04:002017-08-30T15:54:20.678-04:00Hi Ernie,
Just a follow up on the cross validatio...Hi Ernie,<br /><br />Just a follow up on the cross validation question. I went ahead and ignored it for now and did a random shuffle during cross validation. <br /><br />However, I have seen some blogs/sites that show people also taking the log returns to detrend time series data. My question is if I'm using a multivariate dataset (not just stock prices), do I need to do this? <br /><br />The reason I ask is I used random forest on a set of features and consistently got around 60% for precision, recall and f1. However, when I didn't shuffle the data I got much lower scores. I also tried to detrend my data by taking the log returns and got even worse scores. Just wondering what you think. <br /><br />Best,<br /><br />AlexAlexnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-60875908991483372452017-08-14T16:14:16.550-04:002017-08-14T16:14:16.550-04:00Great blog, great books.
Except as an approximati...Great blog, great books.<br /><br />Except as an approximation for a runaway stock, we can't use the unrestricted geometric random walk model for large t. Once you consider absorption (delisting and ruin), you will get your dependence on risk. Your derivation is correct for the solvent subset of the ensemble. This is different from the possible issue with non-ergodicity; thanks for the paper.Ivan Malyhttps://www.blogger.com/profile/00184181027786404651noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-71633835208455115942017-08-11T14:42:33.123-04:002017-08-11T14:42:33.123-04:00Hi Alex,
I am not too concerned about the "pe...Hi Alex,<br />I am not too concerned about the "peeking ahead" effect during cross-validation. The important point is that our input features occur prior in time to the predicted outcome, for each data point. If the validation data set is consistently prior in time to the training data set, that would be a concern. But generally speaking, the time order will be randomized, so I don't believe there can be much bias there. <br /><br />Of course, I am not objecting to the roll-forward method described in the video - I just don't think it will make a big difference in the out-of-sample results. However, if you want to be absolutely safe, you can certainly adopt that method.<br /><br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-50427379070773831432017-08-11T12:43:19.381-04:002017-08-11T12:43:19.381-04:00Hi Ernie,
This is what I was referring to.
...Hi Ernie, <br /><br />This is what I was referring to.<br /><br />https://www.youtube.com/watch?v=56Nq6cs2-gc<br /><br />Basically, my understanding is that for time series data, normal K-fold cross validation does not work because training data should always come before testing data. Wondering if you agree with this?<br /><br />Best,<br /><br />AlexAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-23484541215331414882017-08-10T14:51:43.392-04:002017-08-10T14:51:43.392-04:00Hi Alex,
It depends on what you use as input features...Hi Alex,<br />It depends on what you use as input features for the problem. <br /><br />If your input features involve L number of days, then it is best to avoid any overlap of the training set and the validation set by picking data at least L days apart to form each set.<br /><br />Other than that, most examples I discussed are time series problems. Why do you think cross validation won't work on these problems?<br /><br /><br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-16787361871333408752017-08-10T13:56:26.784-04:002017-08-10T13:56:26.784-04:00Hi Ernie,
I was reading your latest book Machine ...Hi Ernie,<br /><br />I was reading your latest book Machine Trading. In your book you talk about cross validation and in particular K-fold. My question is: I've also read elsewhere that, because stock prices are a time series problem, K-fold does not work and roll-forward cross validation is what you need to use. Any thoughts on this?<br /><br />Best,<br /><br />AlexTingnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-20259590452740200812017-07-20T07:00:37.573-04:002017-07-20T07:00:37.573-04:00Hi,
Cumulative returns are more meaningful. P&...Hi, <br />Cumulative returns are more meaningful. P&L depends on the size of your orders and is arbitrary.<br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-44331375192297677702017-07-19T22:56:53.194-04:002017-07-19T22:56:53.194-04:00Hi Ernie,
When we do backtesting, should we use c...Hi Ernie,<br /><br />When we do backtesting, should we use cumulative P&L or cumulative return to evaluate performance? What are their strengths and weaknesses?<br /><br />Thanks.<br /><br />Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-7087857709272656402017-07-19T08:37:41.290-04:002017-07-19T08:37:41.290-04:00Hi,
The typical way is to subtract some fixed perc...Hi,<br />The typical way is to subtract some fixed percentage from each trade's returns. My first book Quantitative Trading has detailed examples on this.<br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-5593293347262728672017-07-18T23:55:52.348-04:002017-07-18T23:55:52.348-04:00Hi Ernie,
Thank you for quick response!
When we ...Hi Ernie,<br /><br />Thank you for quick response!<br /><br />When we evaluate performance via profit&loss(pl) and cumulative pl, it is straightforward to include transaction costs.<br /><br />When we evaluate performance via net return and cumulative return, how do we include transaction costs?<br /><br />Many thanks.<br /><br /><br />Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-5581412289105555722017-07-18T08:54:19.125-04:002017-07-18T08:54:19.125-04:00Hi,
Log return = log(price(t))-log(price(t-1))
Ne...Hi,<br />Log return = log(price(t))-log(price(t-1))<br /><br />Net return = price(t)/price(t-1) -1<br /><br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-88474123201385864172017-07-18T08:44:19.695-04:002017-07-18T08:44:19.695-04:00Hi Ernie,
What is the difference between log retu...Hi Ernie,<br /><br />What is the difference between log return and net return?<br /><br />Thanks.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-59978983177263834012017-07-18T06:39:50.677-04:002017-07-18T06:39:50.677-04:00Hi,
Yes, for signal generation, not for performanc...Hi,<br />Yes, for signal generation, not for performance evaluation. The latter needs to conform to the industry norm, which uses net return.<br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.com
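
The two return definitions given earlier in this thread (log return = log(price(t))-log(price(t-1)), net return = price(t)/price(t-1) -1) can be sketched as follows. This is only an illustration, assuming NumPy is available; the function names `log_returns` and `net_returns` and the price series are made up for the example and do not come from the thread.

```python
import numpy as np


def log_returns(prices):
    """log(price(t)) - log(price(t-1)) for each consecutive pair of prices."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))


def net_returns(prices):
    """price(t)/price(t-1) - 1 for each consecutive pair of prices."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1] - 1.0


# Illustrative prices only (hypothetical, not real market data).
prices = [100.0, 102.0, 99.0, 105.0]
lr = log_returns(prices)
nr = net_returns(prices)

# Log returns add up over time, while net returns compound.
# Both routes recover the same total net return over the whole period.
total_from_log = np.exp(lr.sum()) - 1.0
total_from_net = np.prod(1.0 + nr) - 1.0
```

The two series are linked by log return = log(1 + net return), so either can be converted to the other, for example when a signal is built on log returns but performance is reported in net returns.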