Comments on Quantitative Trading: "Is News Sentiment Still Adding Alpha?" (5 comments)

---
Ernie Chan (2020-01-12):

Thanks for the paper!
Ernie

---
Anonymous (2020-01-12):

I think the key to getting sentiment working is having a better measure of it, although it doesn't seem like you had much control over that during this competition. Some AQR researchers claim to have done a good job at this:

https://www.aqr.com/Insights/Research/Working-Paper/Predicting-Returns-with-Text-Data
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3489226

---
Jan Dil (2019-06-27):

According to the Efficient Market Hypothesis (EMH), all publicly available information is reflected in historical prices. It is not difficult to screen for stocks trading at least $1 million in daily dollar volume from the entire collection of past and present non-OTC Wall Street stocks. It is equally simple to rank those stocks in order of decreasing probability of a price increase over a given correlation time or set of correlation times. It is then straightforward to show that the complete collection has produced an alpha of 10.8% since 2010. If you consistently pick the top-ranked 12 long and 12 short positions from these collections, you produce an alpha of almost 25% since 2010. The details are spelled out in https://www.enterergodics.com/en/alpha-daily-trading-liquidity .
---
Ernie Chan (2019-04-28):

Thanks for all the great suggestions on how to overcome the constraints imposed by Kaggle!

1) Indeed, we can save all output to a file for later analysis, or commit code to save the output, but these steps all fall under the "cumbersome" and "time-consuming" description! When I do quant research on our local GPU machine with Matlab, none of these steps is required, and progress is about 10x faster.

2) Indeed, we figured out ways to save on memory usage. Some are as you described. Another trick is to reduce a Pandas DataFrame to a plain Numpy array. Again, these fall under the cumbersome category.

As I wrote, we do not dispute that Jupyter is great for collaboration and for presenting research. Our quibble is with its limited debugging facilities, especially the lack of breakpoints and of ways to inspect intermediate values of variables.

But it is good to know that many have trained DL models despite these limitations.

3) I agree with you that Two Sigma's primary goal in running this competition is not to discover whether news sentiment has alpha (I am sure they have known the answer for years), but to recruit. Our goal is the opposite: we really want to know whether news has alpha, and we have no interest in showing off our strategies or code to Two Sigma. Hence we did not submit our code (another, more humble, reason is that we ran out of time!).

Yes, we too are very interested in finding out whether the top competitors find news features important out-of-sample.

Ernie

---
Anonymous (2019-04-27):

It would be interesting if the top leaderboard entries have no news features at all. Some discussion in the competition suggested that news does not add much value.

Did you make submissions to the competition? If so, what is your latest score?

Two points:

1. "Kaggle kills a kernel if left idle for a few hours. Good luck training a machine learning model overnight and not getting up at 3 a.m. to save the results just in time."

That's not how you use the kernel. You write and test your code in the interactive interface (tip: start with a small subset of the data to run and debug); when you are done, change the code to process the whole dataset. You can add code to print the output you want, plot charts, or save the output to a CSV file that you can download and analyze later. Afterwards, commit the kernel so that it runs on the server, which then saves the results. Once you click commit and see it running, you can close the window and shut down your PC; the kernel will run until your script finishes or is terminated for exceeding the 9-hour hard limit on running time. I usually write code, click commit, then go to sleep. If the script finishes, you can come back later and analyze the results.
If the kernel exceeded 9 hours, you can try to figure out which code block takes the most time and adjust your code accordingly.

2. "Not only is Jupyter Notebook a sub-optimal tool for efficient research and software development, we are only allowed to use 4 CPUs and a very limited amount of memory for the research. GPU access is blocked, so good luck running your deep learning models. Even simple data pre-processing killed our kernels (due to memory problems) so many times that our hair was thinning by the time we were done."

Some people think that Two Sigma may want to make new hires from the competition, so demonstrating skill at preprocessing data, understanding data types, and writing efficient algorithms is probably something Two Sigma wants to see in the code.

You can delete columns that you don't need, especially object (string) columns, which take the most memory. df.info(memory_usage='deep') gives clear information on how much memory a DataFrame uses. For floats, you can downcast to float32 to reduce the memory footprint.

I have to disagree that Jupyter is a sub-optimal tool for research and development. People have their own preferences, of course. In terms of performance, it uses essentially the same Python kernel as any other editor. The ability to save the code, its immediate output, and annotations alongside it is a great way to explain and DOCUMENT your thought process, both for the code and for the results. I learned so much from other public kernels, both from the code and from the explanations their authors wrote. And if you prefer to work with scripts only, Kaggle kernels allow that too.

Nine hours of running time on 4 CPUs is sufficient to train a deep learning model; several public kernels in the competition use deep learning models. Training time also depends on the type of model: convolutional and dense networks train relatively quickly, while LSTMs take longer. Still, most deep learning models in the competition can be trained within the 9-hour hard limit, with good results.
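[Editor's note] The memory tricks discussed in this thread (dropping unneeded object columns, downcasting floats to float32, and Ernie's trick of reducing a DataFrame to a plain NumPy array) can be sketched in a few lines of pandas. The column names below are hypothetical, not the competition's actual schema:

```python
import numpy as np
import pandas as pd

# Hypothetical market-data frame: a float64 price column plus an
# object (string) column, loosely mimicking the competition's data.
df = pd.DataFrame({
    "close": np.random.rand(100_000),             # float64 by default
    "assetName": ["Some Company Inc"] * 100_000,  # object dtype: costly
})

before = df.memory_usage(deep=True).sum()

# 1) Drop object/string columns you don't need; they take the most memory.
df = df.drop(columns=["assetName"])

# 2) Downcast float64 to float32, halving that column's footprint.
df["close"] = df["close"].astype(np.float32)

after = df.memory_usage(deep=True).sum()
print(f"memory reduced roughly {before / after:.0f}x")

# 3) Ernie's further trick: reduce the DataFrame to a plain NumPy array,
# shedding the index and column metadata entirely.
arr = df.to_numpy()
print(arr.dtype, arr.shape)
```

Running `df.info(memory_usage='deep')` before and after, as the comment suggests, shows the same reduction column by column; the `deep=True` accounting is what makes the cost of object columns visible.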