Quantitative Trading: Hidden Markov model applied to FX prediction

Saturday, March 03, 2012

Hidden Markov model applied to FX prediction

I read with interest an older paper "Can Markov Switching Models Predict Excess Foreign Exchange Returns?" by Dueker and Neely of the Federal Reserve Bank of St. Louis. I have a fondness for hidden Markov models (HMMs) because of their great success in speech recognition applications, but I confess that I have never been able to create an HMM that outperforms simple technical indicators. I blame that both on my own lack of creativity and on the fact that HMMs tend to have too many parameters that need to be fitted to historical data, which makes them vulnerable to data-snooping bias. Hence I approached this paper with the great hope that experts can teach me how to apply HMMs properly to finance.

The objective of the model is simple: to predict the excess return of an exchange rate over an 8-day period. (Excess return in this context is measured by the % change in the exchange rate minus the interest rate differential between the base and quote currencies of the currency pair.) If the expected excess return is higher than a threshold (called "filter" in the paper), then go long. If it is lower than another threshold, go short. Even though the prediction is of an 8-day return, the trading decision is made daily.
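The trading rule described above can be sketched in a few lines of Python. This is only an illustration of the thresholding logic, not the authors' code: the function names, the example filter values, and the choice to return 0 (no new signal) when the forecast falls between the two filters are all assumptions.

```python
def excess_return(pct_change, rate_differential):
    """Excess return as defined in the post: the % change in the exchange
    rate minus the interest rate differential over the same horizon.
    Both inputs are decimals (0.01 means 1%)."""
    return pct_change - rate_differential


def trading_signal(expected_excess, long_filter, short_filter):
    """+1 = go long, -1 = go short, 0 = no new signal (an assumption)."""
    if expected_excess > long_filter:
        return 1
    if expected_excess < short_filter:
        return -1
    return 0


# Hypothetical numbers: a 1.2% move against a 0.2% differential over 8 days.
realized = excess_return(0.012, 0.002)

# Hypothetical daily forecasts checked against filters of +/-0.5%.
signals = [trading_signal(f, 0.005, -0.005) for f in (0.010, 0.000, -0.010)]
```

Because the decision is made daily while the forecast horizon is 8 days, the function would be called once per day with that day's 8-day forecast.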

The excess return is assumed to have a 3-parameter Student-t distribution. The 3 parameters are the mean, the degrees of freedom, and the scale. The scale parameter (which controls the variance) can switch between a high and a low value based on a Markov model. The degrees of freedom (which control the kurtosis, a.k.a. "thickness of the tails") can also switch between 2 values based on another Markov model. The mean is linearly dependent on the values assumed by the degrees of freedom and the scale as well as on another Markov variable that switches between 2 values. Hence the mean can assume 2 x 2 x 2 = 8 distinct values. The 3 Markov models are independent. The Student-t distribution is more appropriate for modelling financial returns than the normal distribution because it allows for heavy tails. The authors also believe that this model captures the switch between periods of high and low volatility, with the consequent change of preference (=different mean returns) for "safe" versus "risky" currencies, a phenomenon well-demonstrated in the period from August 2011 to January 2012.
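To make the state counting concrete, here is a minimal sketch of how three independent two-state Markov variables give 8 joint regimes, each with its own mean. The coefficients and transition probabilities below are made-up numbers, and the function names are hypothetical; only the structure (a linear mean over three binary states, with independent chains) is taken from the description above.

```python
from itertools import product


def regime_means(base, d_scale, d_dof, d_mean):
    """Mean for every (scale, dof, mean) state combination, each state 0 or 1.
    The mean is linear in the three binary state indicators."""
    return {
        (s, v, m): base + d_scale * s + d_dof * v + d_mean * m
        for s, v, m in product((0, 1), repeat=3)
    }


def kron(a, b):
    """Kronecker product of two row-stochastic matrices (lists of lists).
    For independent chains this is the transition matrix of the joint state."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


# Illustrative (not estimated) coefficients and transition matrices.
means = regime_means(base=0.0, d_scale=-0.04, d_dof=0.02, d_mean=0.01)
p_scale = [[0.95, 0.05], [0.10, 0.90]]
p_dof = [[0.90, 0.10], [0.20, 0.80]]
p_mean = [[0.85, 0.15], [0.15, 0.85]]
joint = kron(kron(p_scale, p_dof), p_mean)  # 8 x 8 joint transition matrix
```

Independence keeps the parameter count down: each two-state chain needs only 2 free transition probabilities, whereas an unrestricted 8-state chain would need 56.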

The parameters of the Markov models and the Student-t distributions are estimated in the in-sample period (1974-1981) for each currency pair in order to minimize the cumulative deviation of the excess returns from zero. There are a total of 14 parameters to be so estimated. After these estimations, we also have to estimate the 2 trading thresholds by maximizing the in-sample return of the trading strategy, assuming transaction costs of 10 basis points per trade.
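The second estimation step, choosing the 2 thresholds, can be illustrated with a brute-force grid search. This is a sketch under stated assumptions rather than the paper's procedure: the forecasts and realized returns are toy numbers, `realized[i]` is taken to be the return earned by the position chosen from `forecasts[i]`, the position is held when the forecast falls between the filters, and the 10 basis point cost (0.001) is charged on every position change.

```python
def strategy_return(forecasts, realized, long_filter, short_filter, cost=0.001):
    """Cumulative return of the filter rule, net of a per-trade cost."""
    position, total = 0, 0.0
    for f, r in zip(forecasts, realized):
        if f > long_filter:
            target = 1
        elif f < short_filter:
            target = -1
        else:
            target = position  # between the filters: keep the current position
        if target != position:
            total -= cost  # 10 bp charged whenever the position changes
            position = target
        total += position * r
    return total


def best_filters(forecasts, realized, grid, cost=0.001):
    """Pick the (long_filter, short_filter) pair with the highest in-sample return."""
    candidates = [(lf, sf) for lf in grid for sf in grid if sf <= lf]
    return max(
        candidates,
        key=lambda c: strategy_return(forecasts, realized, c[0], c[1], cost),
    )


# Toy in-sample data (hypothetical, not from the paper).
forecasts = [0.02, 0.02, -0.02, -0.02]
realized = [0.01, 0.01, -0.01, -0.01]
grid = [-0.03, -0.01, 0.01, 0.03]
long_f, short_f = best_filters(forecasts, realized, grid)
best_total = strategy_return(forecasts, realized, long_f, short_f)
```

Since the thresholds are tuned to maximize in-sample return, only the out-of-sample result says anything about whether they generalize.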

With this large number (16 in total) of parameters, I dreaded seeing the out-of-sample (1982-2005) results. Amazingly, these are far better than I expected: the annualized returns range from 1.1% to 7.5% for 4 major currency pairs. The Sharpe ratios are not as impressive: they range from 0.11 to 0.71. Of course, when researchers report out-of-sample results, one should take them with a grain of salt. If the out-of-sample results weren't good, they wouldn't be reporting them, and they would have kept changing the underlying model until good "out-of-sample" results were obtained! So it is really up to us to implement this model and apply it to data after 2005 and to more currency pairs, to find out if there is really something here. In fact, this is the reason why I prefer to read older papers: they allow for true out-of-sample tests immediately.

What do you think can be done to improve this model? I suspect that as a first step, one can see whether the estimated Markov states correspond reasonably to what traders think of as risk-on vs risk-off regimes. If they do, then regardless of the usefulness of this model as a signal generator, it can at least generate good risk indicators. If not, then maybe the hidden Markov model needs to be replaced with a Markov model that is conditioned on observable indicators.

36 comments:

Anonymous said...: Ernie,

You've got a typo in the title of the paper. The word "reserves" should be replaced with *returns*. Man, I was really confused when I saw the title that you wrote! I was thinking, "why on earth would anybody care about predicting excess foreign exchange reserves"?

- aagold; Saturday, March 3, 2012 at 12:15:00 PM EST
Anonymous said...: Ernie,

Your comment about "out of sample tests" in research papers not really being so out-of-sample is spot on! I don't think many people understand the issue you raised, and I think it's a very important point.

- aagold; Saturday, March 3, 2012 at 12:23:00 PM EST
Ernie Chan said...: aagold,
Thanks for pointing that out ... actually, the typo was in the original preprint, which is why I copied it!
Ernie; Saturday, March 3, 2012 at 1:07:00 PM EST
Anonymous said...: Ernie,
Not to question your quant capabilities, but are you seriously suggesting that a model with that many parameters to fit has any applicability to trading? I say this as a quant trader with over 14 years of industry experience, running my own mid-frequency to HFT firm. To me this paper is absolute nonsense, and the mentioned Sharpe ratios are way too low, even in their own "out of sample" backtests, to justify taking such a paper seriously.; Saturday, March 3, 2012 at 6:48:00 PM EST
Ernie Chan said...: AsiaProp,
Actually, the 16 parameters are not as many as they sound. 14 of those are for fitting the time series itself: they are independent of the trading strategy. Only 2 of the parameters are used to optimize the strategy return.

The Sharpe ratios reported in academic research are almost always low. If they were high, they wouldn't be published. Our job as traders is to take that research as inspiration and tweak it into practical strategies.

Ernie; Saturday, March 3, 2012 at 7:10:00 PM EST
Zack said...: Ernie,

Thanks again for all your hard work. On top of your blog and book, I gain great insight just reading through your conversations with other commenters on your site.

In a previous comment thread from the other day you mentioned that a large portion of your returns in 2011 came from mean-reversion strategies in the FX market. I was wondering if you employ any type of regime switching model in your FX trading to determine whether you want to be allocated primarily to your momentum or mean-reversion strategies?

- Zack; Monday, March 5, 2012 at 6:29:00 PM EST
Ernie Chan said...: Zack,
No, I didn't use any regime switching models. I have never found that these models work out-of-sample.
Ernie; Monday, March 5, 2012 at 8:21:00 PM EST
Anonymous said...: Hi Ernie

Have you read this paper before? Any comments?

liu.diva-portal.org/smash/get/diva2:17431/FULLTEXT01; Wednesday, March 7, 2012 at 2:12:00 PM EST
Ernie Chan said...: Hi Anon,
No, I haven't seen that paper, but will put that on my reading list!

Also, Chris Neely, the author of the paper I described, mentioned to me this other relevant paper:

http://research.stlouisfed.org/wp/more/2006-046/

and his website:

http://research.stlouisfed.org/econ/cneely/

Ernie; Thursday, March 8, 2012 at 7:27:00 AM EST
Dave said...: Just speaking from an academic perspective, instead of the plain HMM perhaps something like the Maximum Entropy Hidden Markov Model may work better?; Friday, March 9, 2012 at 5:08:00 AM EST
Ernie Chan said...: Dave,
Why do you think maximum entropy HMM will work better? It seems to be just another method to estimate the parameters.
Ernie; Friday, March 9, 2012 at 7:14:00 AM EST
Dave said...: I have no empirical evidence and financial prediction isn't really my area of expertise. It is just that in my few attempts at using machine learning for financial predictions, I learnt that the amount of noise tends to swamp out any trends the market may have. As a result most learners tend to perform really poorly, quite possibly due to over-fitting to the training data.

So one of my ideas is to use techniques like Maximum Entropy to reduce the degree of over-fitting. However, I have not actually tried this out.; Friday, March 9, 2012 at 8:14:00 AM EST
Anonymous said...: Hi ernie:
I am currently reading your book "Quantitative Trading", and have already programmed and tried MATLAB for backtesting. However, the results differ from those of the MetaTrader Strategy Tester/Optimization.

In MT4, I have hundreds of passes which agree with most of my real trades (thankfully) but the latter is not as positive. I use the same dataset, which I track from 2001-2009.

The main reason I use MATLAB is that I wish to employ the Sharpe ratio. Usually, in MT4, choosing my parameters is fairly easy and straightforward. I choose the ones with minimal drawdown + best returns, and then run separate copies of them.

After reading your book, I was thinking of choosing parameters with:
1) Minimal drawdown
2) Best returns

and adding a third criterion, the Sharpe ratio. This way, I feel I can increase my returns, no? The formula looks complicated but nonetheless, there's no harm in trying. What do you think? And thanks!; Tuesday, March 20, 2012 at 3:53:00 AM EDT
Ernie Chan said...: Hi Anon,
When you said the results from Matlab differ from Metatrader, can you be more specific? Are you sure that the logic in the 2 programs is identical?

You can employ the Sharpe ratio in any program you choose, not necessarily Matlab. It is just the mean return divided by the standard deviation of returns.
Ernie; Tuesday, March 20, 2012 at 8:01:00 AM EDT
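As a concrete version of "mean return divided by standard deviation", here is a minimal sketch using only the Python standard library. The annualization factor of 252 trading days and the use of the sample standard deviation are common conventions assumed here, not something specified in this thread; the function name is hypothetical, and the inputs should already be excess returns if a risk-free rate matters.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio: sqrt(N) * mean(returns) / stdev(returns).
    `returns` are per-period (e.g. daily) returns as decimals."""
    mean = statistics.mean(returns)
    stdev = statistics.stdev(returns)  # sample standard deviation
    return math.sqrt(periods_per_year) * mean / stdev


# Hypothetical per-period returns; periods_per_year=1 gives the raw ratio.
raw_ratio = sharpe_ratio([0.01, 0.02, 0.03], periods_per_year=1)
```

The same few lines translate directly to MQL4, MATLAB, or a spreadsheet, which is the point being made above.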
farmland investment in Australia said...: I also thought that the Sharpe ratio could still be employed in any program. Is it really just limited to MATLAB?; Tuesday, March 20, 2012 at 4:42:00 PM EDT
Anonymous said...: Ernie Chan said...
Hi Anon,
When you said the results from Matlab differ from Metatrader, can you be more specific? Are you sure that the logic in the 2 programs is identical?

-------------

Yes, I'm very sure they are.

OK, I'll be more specific. My strategy is extremely simple, but profitable (at least for me): just 2 lines of logic and 2 integer parameters. I can't see how or why such simple logic differs so greatly between the two.

The difference is that in MT4 I get hundreds of passes, but in MATLAB I only get around 50 passes. In MATLAB, one of the 1-year test passes returns a balance of 200+K from an initial capital of 10K, but in MT4 the balances are within the range 50K-100K for all the passes.

One more thing: in MT4, the times of the bars are handled within the tester. I don't need to re-program anything. But in MATLAB, I have to separate this data set. Maybe that's why there is a difference?

Thx again for your kind help.

Rgds
Ruthstein; Wednesday, March 21, 2012 at 2:45:00 AM EDT
Ernie Chan said...: Hi Ruthstein,
Yes, it is likely that errors in data preparation are what caused the differences. In Metatrader, data is installed as part of the program. But Matlab is a general computing platform, much like a calculator. You have to be very careful in preparing data for input into Matlab.
Ernie; Wednesday, March 21, 2012 at 8:44:00 AM EDT
Anonymous said...: Hi Ernie, thank you very much for your comments. Someone helped me out with his plug-in for the time part, and there was a very slight error in the time preparation in MATLAB. Still, the results remain inconsistent. But surprisingly, the Sharpe ratio is now almost the same for the top 5 minimal-drawdown passes, though the profits are not.

On the bright side, this makes the choice much easier than before: I can just decide in terms of the safest drawdown, since the Sharpe ratios for all of them are pretty acceptable.

Again, thanks for your kind help and I must say, your book is a good read... I have no doubt that I will buy your next book!; Wednesday, March 21, 2012 at 10:42:00 PM EDT
Ernie Chan said...: Hi Ruthstein,
I am glad you found a bug. If the programming logic is the same in Matlab and MT, then the only reason the results can be different is that the input data are wrong.
Ernie; Thursday, March 22, 2012 at 8:06:00 AM EDT
Anonymous said...: Ernie,
When will you come to the USA to teach a Quantitative Trading class?; Wednesday, April 11, 2012 at 1:38:00 PM EDT
Ernie Chan said...: Anon,
It is up to the organizer of the workshops, Technical Analyst magazine. If you are interested, please request a New York or Chicago workshop at training@technicalanalyst.co.uk
Ernie; Wednesday, April 11, 2012 at 5:29:00 PM EDT
Anonymous said...: Ernie,

I am trying to use Matlab's HMM functions to do some simple modeling. I am still trying to understand how to use all the functions to make a prediction. Say I have a time series of daily returns, which I convert to Up, Flat or Down (1, 0, -1) as my observations. Say I have a simple 2-state model. Now I can pass the entire observation series, along with some initial guesses for the emission and transition probabilities, to estimate the transition and emission probability matrices.

[TRANS_EST2, EMIS_EST2] = hmmtrain(seq, TRANS_GUESS, EMIS_GUESS)

Now, with these two matrices, what do you do to create a new prediction?

Do you just run [seq,states] = hmmgenerate(1,TRANS,EMIS) to generate 1 number, which is your next observation, and call it your prediction?

Thanks; Friday, May 24, 2013 at 8:58:00 AM EDT
Ernie Chan said...: Anon,
I am not familiar with the specific Matlab function that you use (I use a free package instead), but generally speaking, yes, if you want to predict the next measurement variable, that's what you do. In other applications, traders are more interested in the state variable (e.g. a hedge ratio, which is not directly observable and thus "hidden"), and the state variable prediction would be the focus.
Ernie; Friday, May 24, 2013 at 10:02:00 AM EDT
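One caveat worth illustrating: a single simulated draw is random, and the call in the question takes no observed sequence, so it cannot condition on the history. An alternative is to compute the one-step-ahead predictive distribution with the forward (filtering) recursion. The sketch below is plain Python rather than MATLAB, with hypothetical function names and made-up matrices; symbols are coded 0, 1, 2 for Down, Flat, Up, and `trans[i][j]` / `emis[i][k]` are assumed to be row-stochastic.

```python
def filter_state_probs(seq, trans, emis, init):
    """Forward recursion with normalization: P(state_t | observations 1..t)."""
    n = len(init)
    probs = [init[i] * emis[i][seq[0]] for i in range(n)]
    total = sum(probs)
    probs = [p / total for p in probs]
    for obs in seq[1:]:
        # Propagate one step through the transition matrix...
        pred = [sum(probs[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # ...then reweight by how well each state explains the observation.
        probs = [pred[j] * emis[j][obs] for j in range(n)]
        total = sum(probs)
        probs = [p / total for p in probs]
    return probs


def predict_next_emission(seq, trans, emis, init):
    """P(next observation = k | observed sequence) for every symbol k."""
    n = len(init)
    probs = filter_state_probs(seq, trans, emis, init)
    next_state = [sum(probs[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return [sum(next_state[j] * emis[j][k] for j in range(n))
            for k in range(len(emis[0]))]


# Made-up 2-state model: state 0 favors Down, state 1 favors Up.
trans = [[0.9, 0.1], [0.2, 0.8]]
emis = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
init = [0.5, 0.5]
dist = predict_next_emission([2, 2, 2], trans, emis, init)  # after three Ups
```

The output is a probability for each of Down, Flat and Up, which can be thresholded or used to size a position instead of relying on one random sample.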
Anonymous said...: Thanks Ernie. Those functions are provided by Matlab Statistics toolbox. There are five functions available there.

* hmmgenerate: Generates a sequence of states and emissions from a Markov model
* hmmestimate: Calculates maximum likelihood estimates of transition and emission probabilities from a sequence of emissions and a known sequence of states
* hmmtrain: Calculates maximum likelihood estimates of transition and emission probabilities from a sequence of emissions
* hmmviterbi: Calculates the most probable state path for a hidden Markov model
* hmmdecode: Calculates the posterior state probabilities of a sequence of emissions

Regarding your comment on predicting the state variables, the reality is that we have no idea what the states are or how many of them there should be. So do people just assume some arbitrary states, such as "Sunny, Rainy, Cloudy" or a (RiskOn, RiskOff, RiskNeutral) type of scenario?

For me to get the most likely states, I need to use the Viterbi function.

likelystates = hmmviterbi(seq, TRANS, EMIS).

But I will first need to estimate the TRANS and EMIS probability matrices from our own sequence of observations.

[TRANS_EST2, EMIS_EST2] = hmmtrain(seq, TRANS_GUESS, EMIS_GUESS)

After all, it sounds like there is quite a bit of estimating and guesswork here. You estimate the probability matrices, and then use the estimated matrices to deduce your states.

After all this hard work, what you find is a sequence of state numbers, which is called the "most likely" state path given what has already happened.

The question is: how do we use it NOW for future prediction?

Am I missing something here?; Monday, May 27, 2013 at 12:41:00 AM EDT
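For readers who want to see what the "most likely state path" computation involves, here is a minimal Viterbi sketch in plain Python. It is not the MATLAB implementation: the function name and the matrices are hypothetical, symbols are coded 0, 1, 2 for Down, Flat, Up, and an explicit initial state distribution `init` is assumed.

```python
import math


def viterbi(seq, trans, emis, init):
    """Most probable hidden state path for an observed symbol sequence."""
    n = len(init)

    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    # Best log-probability of any path ending in each state at time 0.
    score = [log(init[i]) + log(emis[i][seq[0]]) for i in range(n)]
    back = []  # back[t][j] = best predecessor of state j at time t + 1
    for obs in seq[1:]:
        prev, score, pointers = score, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + log(trans[i][j]))
            pointers.append(best_i)
            score.append(prev[best_i] + log(trans[best_i][j]) + log(emis[j][obs]))
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Made-up 2-state model: state 0 favors Down, state 1 favors Up.
trans = [[0.9, 0.1], [0.2, 0.8]]
emis = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
path = viterbi([0, 0, 0, 2, 2, 2], trans, emis, [0.5, 0.5])
```

To turn the decoded path into a forecast, one option is to take the last decoded state, read its row of the transition matrix as the distribution of tomorrow's state, and combine that with the emission matrix. The decoded path itself only describes the past.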
Ernie Chan said...: Anon,
To determine what a state variable should be, often you need some domain knowledge. I.e., you need more than HMM to constrain your model. A good example is given in Chapter 3 of my new book, which illustrates the use of HMM in finding the hedge ratio of a cointegrating pair of ETFs. The state variable chosen in this case is not arbitrary at all. Also, in this case, the objective is not in predicting the next measurement, though you can choose to do so.

Ernie; Monday, May 27, 2013 at 7:39:00 AM EDT
Anonymous said...: Hi Ernie,

I think this paper from Jerry Hong is worth reading for you, very interesting (on HMM and SVM) : http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-63.pdf

Laurent; Saturday, February 15, 2014 at 3:16:00 AM EST
Ernie Chan said...: Hi Laurent,
I have actually read this paper before. In fact, some collaborators and I have tried to replicate and extend the results to more stocks. The effort was a failure, and reinforced my opinion that machine learning techniques that directly learn rules are unsuitable for trading.
Ernie; Saturday, February 15, 2014 at 10:57:00 AM EST
Anonymous said...: This is interesting. I implemented my own version of the Markov model, and backtests gave me an average win rate of 66% on an hourly trading period over a cumulative trading period of 5 years. I then applied a PPMC method to these results and the win rate went up to an average of 83%. In terms of actual trading, I've been trading for 7 months now and the average win ratio is 69% using both methods. It gets better with time and adapts to changing market conditions, so I'm confident in it. Anyways, just saying that it is possible to do this thing.; Saturday, September 20, 2014 at 5:54:00 AM EDT
Ernie Chan said...: Thanks for your report of success with the HMM model!
By PPMC, do you mean particle filter Monte Carlo?

Ernie; Saturday, September 20, 2014 at 8:05:00 AM EDT
Anonymous said...: Hi Ernie,
You mentioned in your book that you used the "Buy on gap" strategy in live trading.
How do you handle a case where there are no trades/quotes for one or more instruments during the pre-opening session?
Analyzing historical data, I see that this sometimes happens. Another problem occurs when there are trades/quotes but they are too old, for instance when the timestamp is 08:55 AM.

I'll be grateful for the help; Sunday, November 23, 2014 at 7:54:00 AM EST
Ernie Chan said...: All intraday backtesting should be done with quotes instead of trades.

Quotes are always present at 9:30am.

Ernie; Sunday, November 23, 2014 at 7:56:00 AM EST
Stefan said...: Well, once the subject/research directly relates to a money-making opportunity, it is totally pointless to expect any kind of useful feedback/contribution: fools contribute, smarts make money...
If someone has a working idea, it's very simple to validate: make money. The alternative would be to contribute and to have a lot of nice talk...; Friday, January 9, 2015 at 2:47:00 PM EST
Richard Harrison said...: Hi Ernie,

Regarding the discussion of "ppmc" in connection with Markov models above, did you ever do any more research into this and/or figure out exactly what ppmc means in the realm of Markov models? A few things I've found googling are:
posterior predictive model checking,
particle photon Monte Carlo,
parallel processing of Markov chain,
prediction by partial match method-C,
pairwise partially Markov chains, and finally
Pearson product moment correlation

Thanks,

Richard; Wednesday, August 11, 2021 at 12:55:00 PM EDT
Ernie Chan said...: Hi Richard,
I am waiting for the commenter to clarify what exactly s/he meant by PPMC. My guess is that s/he meant PFMC (particle filter Monte Carlo), which is a well-established technique.
Ernie; Monday, August 16, 2021 at 4:50:00 PM EDT