tag:blogger.com,1999:blog-35364652.post8230965571724080508..comments2017-09-22T08:18:11.468-04:00Comments on Quantitative Trading: Paradox Resolved: Why Risk Decreases Expected Log Return But Not Expected WealthErnie Chanhttp://www.blogger.com/profile/02747099358519893177noreply@blogger.comBlogger57125tag:blogger.com,1999:blog-35364652.post-18877068949920761832017-08-31T19:37:53.706-04:002017-08-31T19:37:53.706-04:00Hi Alex,
1) I don't know why your accuracy cha...Hi Alex,<br />1) I don't know why your accuracy changes so dramatically if you use random splitting. It seems that some of your future returns have been used as input features.<br />2) To me, non-parametric models simply mean we have assumed no known simple distribution function to describe the data. As a result, there are actually more parameters because we need an empirical distribution to predict the future! I am not sure that one is better for long term vs short term. I generally think parametric models are less subject to overfitting and therefore better for finance models.<br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-15103017410823423462017-08-31T10:57:29.086-04:002017-08-31T10:57:29.086-04:00Hi Ernie,
I have a question about my model that I...Hi Ernie,<br /><br />I have a question about my model that I can't seem to figure out and I'm wondering if you can easily tell what I did. I have a list of 10 features (some price, some technical, some fundamental features). Basically, these are many of the features I used to look at as a fundamental analyst.<br /><br />I ran those features through a random forest classification algo (I'm using Scikit-learn). I first predict next day stock price and randomly split training vs. testing. When I run it through the model I get no signal (accuracy, recall, precision etc. ~50%). <br /><br />Then I try two things: First, I extend the prediction from next day to 2, 3, 4, 5, 10, 50, etc. And my accuracy jumps up dramatically to the point where I know it can't be correct. The further I extend it out, the higher it goes, reaching 90%. Clearly this is wrong or else someone much smarter would have figured this out.<br /><br />Second, I try to correct this by not randomly splitting my data. Instead I train on the first 60% and test on the last 40%. My results go back to 50% or lower. <br /><br />My question is, what's going on in the model when I randomly train/test split my data in the first example vs. the second example where I do not?<br /><br />My other question: as I've been reading more, the differences in how ML models find a prediction seem to be categorized as parametric vs. non-parametric models. Has there been any research on whether one type of model is better suited for the stock market? In my mind, it seems that parametric might be suited for more long-term predictions (i.e. like fundamental analysts are trying to do), and non-parametric is more suited for short-term predictions (because of the noise that fundamental analysts often get wrong). I would like to hear your thoughts. 
I'm also wondering whether most people in practice use regression-based models or instance-based models?<br /><br />Best,<br /><br />AlexAlex (tingli1081@gmail.com)noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-79257248918395984082017-08-30T19:13:04.781-04:002017-08-30T19:13:04.781-04:00Hi Alex,
If your input feature is not prices, ther...Hi Alex,<br />If your input feature is not prices, there is no reason to detrend it. Instead, you can consider using its "Zscore", using a moving lookback period.<br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-6768562749198281682017-08-30T15:54:20.678-04:002017-08-30T15:54:20.678-04:00Hi Ernie,
Just a follow up on the cross validatio...Hi Ernie,<br /><br />Just a follow up on the cross validation question. I went ahead and ignored it for now and did a random shuffle during cross validation. <br /><br />However, I have seen some blogs/sites that show people also taking the log returns to detrend time series data. My question is if I'm using a multivariate dataset (not just stock prices), do I need to do this? <br /><br />The reason I ask is I used random forest on a set of features and consistently got around 60% for precision, recall and f1. However, when I didn't shuffle the data I got much lower scores. I also tried to detrend my data by taking the log returns and got even worse scores. Just wondering what you think. <br /><br />Best,<br /><br />AlexAlexnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-60875908991483372452017-08-14T16:14:16.550-04:002017-08-14T16:14:16.550-04:00Great blog, great books.
Except as an approximati...Great blog, great books.<br /><br />Except as an approximation for a runaway stock, we can't use the unrestricted geometric random walk model for large t. Once you consider absorption (delisting and ruin), you will get your dependence on risk. Your derivation is correct for the solvent subset of the ensemble. This is different from the possible issue with non-ergodicity; thanks for the paper.Ivan Malyhttps://www.blogger.com/profile/00184181027786404651noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-71633835208455115942017-08-11T14:42:33.123-04:002017-08-11T14:42:33.123-04:00Hi Alex,
I am not too concerned about the "pe...Hi Alex,<br />I am not too concerned about the "peeking ahead" effect during cross-validation. The important point is that our input features occur prior in time to the predicted outcome, for each data point. If the validation data set is consistently prior in time to the training data set, that would be a concern. But generally speaking, the time order will be randomized, so I don't believe there can be much bias there. <br /><br />Of course, I am not objecting to the roll forward method described in the video - I just don't think it will make a big difference in the out-of-sample results. However, if you want to be absolutely safe, you can certainly adopt that method.<br /><br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-50427379070773831432017-08-11T12:43:19.381-04:002017-08-11T12:43:19.381-04:00Hi Ernie,
This is what I was referring to.
...Hi Ernie, <br /><br />This is what I was referring to.<br /><br />https://www.youtube.com/watch?v=56Nq6cs2-gc<br /><br />Basically, my understanding is that for time series data, normal K-fold cross validation does not work because training data should always come before testing data. Wondering if you agree with this?<br /><br />Best,<br /><br />AlexAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-23484541215331414882017-08-10T14:51:43.392-04:002017-08-10T14:51:43.392-04:00Hi Alex,
It depends on what you use as input features...Hi Alex,<br />It depends on what you use as input features for the problem. <br /><br />If your input features involve L number of days, then it is best to avoid any overlap of the training set and the validation set by picking data at least L days apart to form each set.<br /><br />Other than that, most examples I discussed are time series problems. Why do you think cross validation won't work on these problems?<br /><br /><br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-16787361871333408752017-08-10T13:56:26.784-04:002017-08-10T13:56:26.784-04:00Hi Ernie,
I was reading your latest book Machine ...Hi Ernie,<br /><br />I was reading your latest book Machine Trading. In your book you talk about cross validation and in particular K-fold. My question is that I've also read elsewhere that because stock prices are a time series problem, K-fold does not work and roll forward cross validation is what you need to use instead. Any thoughts on this?<br /><br />Best,<br /><br />AlexTingnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-20259590452740200812017-07-20T07:00:37.573-04:002017-07-20T07:00:37.573-04:00Hi,
Cumulative returns are more meaningful. P&...Hi, <br />Cumulative returns are more meaningful. P&L depends on the size of your orders and is arbitrary.<br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-44331375192297677702017-07-19T22:56:53.194-04:002017-07-19T22:56:53.194-04:00Hi Ernie,
When we do backtesting, shall we use c...Hi Ernie,<br /><br />When we do backtesting, shall we use cumulative pl or cumulative return to evaluate performance? What are their strengths and weaknesses?<br /><br />Thanks.<br /><br />Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-7087857709272656402017-07-19T08:37:41.290-04:002017-07-19T08:37:41.290-04:00Hi,
The typical way is to subtract some fixed perc...Hi,<br />The typical way is to subtract some fixed percentage from each trade's returns. My first book Quantitative Trading has detailed examples on this.<br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-5593293347262728672017-07-18T23:55:52.348-04:002017-07-18T23:55:52.348-04:00Hi Ernie,
Thank you for the quick response!
When we ...Hi Ernie,<br /><br />Thank you for the quick response!<br /><br />When we evaluate performance via profit & loss (pl) and cumulative pl, it is straightforward to include transaction costs.<br /><br />When we evaluate performance via net return and cumulative return, how do we include transaction costs?<br /><br />Many thanks.<br /><br /><br />Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-5581412289105555722017-07-18T08:54:19.125-04:002017-07-18T08:54:19.125-04:00Hi,
Log return = log(price(t))-log(price(t-1))
Ne...Hi,<br />Log return = log(price(t))-log(price(t-1))<br /><br />Net return = price(t)/price(t-1) -1<br /><br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-88474123201385864172017-07-18T08:44:19.695-04:002017-07-18T08:44:19.695-04:00Hi Ernie,
What is the difference between log retu...Hi Ernie,<br /><br />What is the difference between log return and net return?<br /><br />Thanks.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-59978983177263834012017-07-18T06:39:50.677-04:002017-07-18T06:39:50.677-04:00Hi,
Yes, for signal generation, not for performanc...Hi,<br />Yes, for signal generation, not for performance evaluation. The latter needs to conform to the industry norm, which uses net return.<br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-20412434851051244192017-07-17T20:22:56.106-04:002017-07-17T20:22:56.106-04:00Hi Ernie,
Do you use log return during backtestin...Hi Ernie,<br /><br />Do you use log return during backtesting?<br /><br />Thanks.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-61666151110616730032017-07-15T08:10:43.985-04:002017-07-15T08:10:43.985-04:00Hi Alex,
1) SVM is a classification algorithm, can...Hi Alex,<br />1) SVM is a classification algorithm and can only be used to predict the direction of price moves.<br />2) Random forest can be applied to either classification or regression trees. The latter can be used to predict magnitude as well as direction of moves.<br />3) There is no reason to believe one is superior to the other with respect to various out-of-sample (OOS) performance measures.<br />4) Daily price is good enough if you aggregate >1,000 stocks and build the same model for all.<br />5) If a feature was found to work OOS, then of course one should retain it in the library. There aren't that many features that work OOS, so no worries about "too many". We can only have too many unproven features, which lead to data snooping bias.<br />6) My book references many other books and articles. In terms of AI in trading, I recommend following @carlcarrie on Twitter, who posts numerous relevant articles daily.<br /><br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-34152931114268247302017-07-14T21:11:03.546-04:002017-07-14T21:11:03.546-04:00Hi Ernie,
I've listened to a couple of your t...Hi Ernie,<br /><br />I've listened to a couple of your talks online and had a few questions after reading your blog. My background is fundamental investing at asset management and hedge funds and only more recently have I tried to pick up machine learning. <br /><br />In one of your lectures, you referenced SVM and random forest as two algorithms that work well for stock prediction. However, I wasn't clear if this is a classification or regression problem. In other words, what is more appropriate: to predict the actual stock price, the actual return, or whether the stock will be up or down? I've seen a lot of academic papers and a mix of what people do, but I've seen no explanation of the pros and cons of using one over the other. Perhaps you could enlighten me here.<br /><br />My second question is regarding one of your comments about insufficient data. Are you saying that daily price data for your y label is not enough and you need minute by minute data? <br /><br />And my third question has to do with one of your slides where you say feature rich data sets are a curse. I've read about this before so I'm not questioning what you're saying. However, I remember listening to a talk given by James Simons, and he said that once they find a good predictor they just leave it in in case it comes back. If that's the case, then over time, they would likely have a substantial number of features in the model. Are they doing something that perhaps is different?<br /><br />My last question is about your recent book or perhaps you have a suggestion of other resources. I'm trying to find a resource that clearly shows the right way to clean and transform feature data and cross validate in order to use it for prediction. 
I've not seen a source that accurately describes best practices.<br /><br />Thank you,<br /><br />Alex<br /><br /><br /><br /><br /><br /><br />Alexnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-6143439676377351042017-07-11T17:47:58.568-04:002017-07-11T17:47:58.568-04:00Hi Ever,
Yes, as I have shown in another blog post...Hi Ever,<br />Yes, as I have shown in another blog post (https://epchan.blogspot.ca/2014/08/kelly-vs-markowitz-portfolio.html), Kelly is essentially the same as the widely adopted Mean Variance optimization method, except that Kelly also suggests an overall leverage of a portfolio.<br /><br />It is unclear what you mean by periods where Kelly "fails". Fails meaning the portfolio is not profitable? But that has nothing to do with Kelly itself. Fails meaning the portfolio underperforms an equal weighted one? Certainly! But again, Kelly does not promise it always outperforms other allocation methods in all periods. It only promises that in the long run, it generates maximum growth rate. The long run, of course, can be very long.<br /><br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-46811520081273823602017-07-11T16:24:53.828-04:002017-07-11T16:24:53.828-04:00Based on all your books it seems that Kelly is the...Based on all your books it seems that Kelly is the ideal allocation method for all trading strategies. Just to confirm, the Kelly formula has flaws when in use, I know you covered the half Kelly but beyond that, have you encountered periods where you didn't use Kelly at all and periods where you used Kelly and it failed? Keep in mind I am not referring to the periods where you've recommended QT investors to opt out of strategies based on regime changes or unknowns. I am aware that this specific does focus on Kelly so we can call it a side note. Ever Garciahttps://www.blogger.com/profile/01357178781354564617noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-31639604784274558092017-07-07T08:30:13.600-04:002017-07-07T08:30:13.600-04:00Hi Emil,
The compound growth rate at any finite T...Hi Emil,<br /><br />The compound growth rate at any finite T asymptotically approaches the formula given in this post (which is exact at T-> infinity). This is true for most theoretical derivations involving time averages in finance.<br /><br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.comtag:blogger.com,1999:blog-35364652.post-22994586560565362342017-07-07T05:42:36.765-04:002017-07-07T05:42:36.765-04:00Very interesting, log-utility just happens to maxi...Very interesting, log-utility just happens to maximize the time average which is what we really should be focusing on. But what about the case when we have a fixed time horizon, then taking the limit as T -> infinity does not make sense and we don't get rid of the stochastic component in the growth rate?Emilnoreply@blogger.comtag:blogger.com,1999:blog-35364652.post-49181928037315186512017-06-28T07:20:18.623-04:002017-06-28T07:20:18.623-04:00Hi,
ivolatility.com, quandl.com, optionmetrics.com...Hi,<br />ivolatility.com, quandl.com, optionmetrics.com.<br />ErnieErnie Chanhttps://www.blogger.com/profile/02747099358519893177noreply@blogger.com
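For readers following the log return vs. net return exchange earlier in this thread, here is a minimal Python sketch of the two definitions quoted there, log(price(t))-log(price(t-1)) and price(t)/price(t-1) - 1. The price series and the function names are made up for illustration only; they are not from the blog or the books. The sketch also checks the standard identity that compounding net returns gives the same cumulative return as exponentiating the sum of log returns.

```python
import math

def net_return(p_prev, p_now):
    # Net return = price(t)/price(t-1) - 1
    return p_now / p_prev - 1.0

def log_return(p_prev, p_now):
    # Log return = log(price(t)) - log(price(t-1))
    return math.log(p_now) - math.log(p_prev)

# Hypothetical price series, for illustration only.
prices = [100.0, 110.0, 99.0, 108.9]

# One return per consecutive pair of prices.
net = [net_return(a, b) for a, b in zip(prices, prices[1:])]
logs = [log_return(a, b) for a, b in zip(prices, prices[1:])]

# Cumulative return from compounding the net returns.
cum_from_net = math.prod(1.0 + r for r in net) - 1.0

# Log returns add across periods, so the same cumulative return
# is recovered by exponentiating their sum.
cum_from_log = math.exp(sum(logs)) - 1.0

print(net)
print(logs)
print(cum_from_net, cum_from_log)
```

The two cumulative figures agree up to floating-point rounding, which is why log returns are convenient to sum when generating signals, while net returns remain the form used for reporting performance, as noted in the thread.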