Saturday, March 03, 2012

Hidden Markov model applied to FX prediction

I read with interest an older paper, "Can Markov Switching Models Predict Excess Foreign Exchange Returns?" by Dueker and Neely of the Federal Reserve Bank of St. Louis. I have a fondness for hidden Markov models (HMMs) because of their great success in speech recognition applications, but I confess that I have never been able to create an HMM that outperforms simple technical indicators. I blame that both on my own lack of creativity and on the fact that HMMs tend to have too many parameters that need to be fitted to historical data, which makes them vulnerable to data-snooping bias. Hence I approached this paper with the great hope that experts can teach me how to apply HMMs properly to finance.

The objective of the model is simple: to predict the excess return of an exchange rate over an 8-day period. (Excess return in this context is measured by the % change in the exchange rate minus the interest rate differential between the base and quote currencies of the currency pair.) If the expected excess return is higher than a threshold (called a "filter" in the paper), then go long. If it is lower than another threshold, go short. Even though the prediction is of an 8-day return, the trading decision is made daily.
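
A minimal Python sketch of the rule just described may help. It is an illustration only, not the paper's implementation: the function names, the sign convention of the interest rate differential, the numbers, and the choice to return 0 when the forecast lies between the two thresholds are all assumptions made for the example.

```python
def excess_return(rate_start, rate_end, interest_differential):
    """Percentage change in the exchange rate minus the interest rate
    differential over the same holding period (both as decimal fractions)."""
    pct_change = rate_end / rate_start - 1.0
    return pct_change - interest_differential


def trading_signal(expected_excess_return, long_filter, short_filter):
    """Return +1 (go long), -1 (go short) or 0 (placeholder for 'neither')."""
    if expected_excess_return > long_filter:
        return 1
    if expected_excess_return < short_filter:
        return -1
    return 0


# Hypothetical numbers, for illustration only.
realized = excess_return(rate_start=1.3000, rate_end=1.3130,
                         interest_differential=0.0005)
signal = trading_signal(expected_excess_return=0.004,
                        long_filter=0.002, short_filter=-0.002)
print(realized, signal)
```

Because the decision is made daily, `trading_signal` would be called once per day with that day's 8-day forecast.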

The excess return is assumed to have a 3-parameter Student-t distribution. The 3 parameters are the mean, the degrees of freedom, and the scale. The scale parameter (which controls the variance) can switch between a high and a low value based on a Markov model. The degrees of freedom (which control the kurtosis, a.k.a. "thickness of the tails") can also switch between 2 values based on another Markov model. The mean is linearly dependent on the values assumed by the degrees of freedom and the scale, as well as on another Markov variable that switches between 2 values. Hence the mean can assume 8 distinct values. The 3 Markov models are independent. The Student-t distribution is more appropriate for modelling financial returns than the normal distribution because it allows for heavy tails. The authors also believe that this model captures the switch between periods of high and low volatility, with the consequent change of preference (=different mean returns) for "safe" versus "risky" currencies, a phenomenon well demonstrated in the period between August 2011 and January 2012.
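
To see why three binary Markov variables give 8 possible means, here is a small Python sketch. The linear form with one loading per state and all coefficient values are hypothetical, chosen only to illustrate the counting argument; the paper's actual specification may differ.

```python
from itertools import product

# Hypothetical coefficients: an intercept plus one loading per binary state.
INTERCEPT = 0.0
LOADINGS = {"scale": 0.4, "dof": -0.2, "mean_state": 0.1}


def conditional_mean(scale_state, dof_state, mean_state):
    """Mean of the excess return given three binary (0/1) Markov states."""
    return (INTERCEPT
            + LOADINGS["scale"] * scale_state
            + LOADINGS["dof"] * dof_state
            + LOADINGS["mean_state"] * mean_state)


# Enumerate all 2 x 2 x 2 = 8 combinations of the three states.
means = {states: conditional_mean(*states)
         for states in product((0, 1), repeat=3)}
for states, mu in sorted(means.items()):
    print(states, round(mu, 2))
```

With generic loadings every combination yields a different mean, which is where the 8 distinct values come from.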

The parameters of the Markov models and the Student-t distributions are estimated in the in-sample period (1974-1981) for each currency pair in order to minimize the cumulative deviation of the excess returns from zero. There are a total of 14 parameters to be so estimated. After these estimations, we also have to estimate the 2 trading thresholds by maximizing the in-sample return of the trading strategy, assuming a transaction cost of 10 basis points per trade.

With this large number (16 in total) of parameters, I dreaded seeing the out-of-sample (1982-2005) results. Amazingly, these are far better than I expected: the annualized returns range from 1.1% to 7.5% for 4 major currency pairs. The Sharpe ratios are not as impressive: they range from 0.11 to 0.71. Of course, when researchers report out-of-sample results, one should take them with a grain of salt. If the out-of-sample results weren't good, they wouldn't be reporting them, and they would have kept changing the underlying model until good "out-of-sample" results were obtained! So it is really up to us to implement this model and apply it to data after 2005 and to more currency pairs, to find out if there is really something here. In fact, this is the reason why I prefer to read older papers: they allow for the possibility of true out-of-sample tests immediately.

What do you think can be done to improve this model? I suspect that as a first step, one can see whether the estimated Markov states correspond reasonably to what traders think of as risk-on vs risk-off regimes. If they do, then regardless of the usefulness of this model as a signal generator, it can at least generate good risk indicators. If not, then maybe the hidden Markov model needs to be replaced with a Markov model that is conditioned on observable indicators.

36 comments:

  1. Ernie,

    You've got a typo in the title of the paper. The word "reserves" should be replaced with *returns*. Man, I was really confused when I saw the title that you wrote! I was thinking, "why on earth would anybody care about predicting excess foreign exchange reserves?"

    - aagold

  2. Ernie,

    Your comment about "out of sample tests" in research papers not really being so out-of-sample is spot on! I don't think many people understand the issue you raised, and I think it's a very important point.

    - aagold

  3. aagold,
    Thanks for pointing that out ... actually, the typo was in the original preprint, which is why I copied it!
    Ernie

  4. Ernie,
    Not to question your quant capabilities, but are you seriously suggesting that a model with that many parameters to fit has any applicability to trading? I say this as a quant trader with over 14 years of industry experience who runs my own mid- to high-frequency trading firm. To me this paper is absolute nonsense, and the Sharpe ratios mentioned are way too low, even in their own "out of sample" backtests, to justify taking such a paper seriously.

  5. AsiaProp,
    Actually, the 16 parameters are not as many as they sound. 14 of those are for fitting the time series itself: they are independent of the trading strategy. Only 2 of the parameters are used to optimize the strategy return.

    The Sharpe ratios reported in academic research are almost always low. If they were high, they wouldn't be published. Our job as traders is to take that research as inspiration and tweak it into practical strategies.

    Ernie

  6. Ernie,

    Thanks again for all your hard work. On top of your blog and book, I gain great insight just reading through your conversations with other commenters on your site.

    In a previous comment thread from the other day you mentioned that a large portion of your returns in 2011 came from mean-reversion strategies in the FX market. I was wondering if you employ any type of regime switching model in your FX trading to determine whether you want to be allocated primarily to your momentum or mean-reversion strategies?

    - Zack

  7. Zack,
    No, I didn't use any regime switching models. I have never found that these models work out-of-sample.
    Ernie

  8. Hi Ernie,

    Have you read this paper before? Any comments?

    liu.diva-portal.org/smash/get/diva2:17431/FULLTEXT01

  9. Hi Anon,
    No, I haven't seen that paper, but will put that on my reading list!

    Also, Chris Neely, the author of the paper I described, mentioned to me this other relevant paper:

    http://research.stlouisfed.org/wp/more/2006-046/

    and his website:

    http://research.stlouisfed.org/econ/cneely/

    Ernie

  10. Just speaking from an academic perspective, instead of the plain HMM perhaps something like the Maximum Entropy Hidden Markov Model may work better?

  11. This comment has been removed by the author.

  12. Dave,
    Why do you think maximum entropy HMM will work better? It seems to be just another method to estimate the parameters.
    Ernie

  13. I have no empirical evidence and financial prediction isn't really my area of expertise. It is just that in my few attempts at using machine learning for financial predictions, I learnt that the amount of noise tends to swamp out any trends the market may have. As a result most learners tend to perform really poorly, quite possibly due to over-fitting to the training data.

    So one of my ideas is to use techniques like Maximum Entropy to reduce the degree of over-fitting. However, I have not actually tried this out.

  14. Hi ernie:
    I am currently reading your book "Quantitative Trading", and have already programmed and tried MATLAB for backtesting. However, the results differ from those of the MetaTrader Strategy Tester/Optimization.

    In MT4, I have hundreds of passes which agree with most of my real trades (thankfully) but the latter is not as positive. I use the same dataset, which I track from 2001-2009.

    The main reason I use MATLAB is that I wish to employ the Sharpe ratio. Usually, in MT4, choosing my parameters is fairly easy and straightforward: I choose the ones with minimal drawdown and best returns, and then run separate copies of them.

    After reading your book, I was thinking of choosing parameters with:
    1) Minimal drawdown
    2) Best returns

    and adding a third criterion, the Sharpe ratio. This way, I feel I can increase my returns, no? The formula looks complicated, but nonetheless there's no harm in trying. What do you think? And thanks!

  15. Hi Anon,
    When you said the results from Matlab differ from Metatrader, can you be more specific? Are you sure that the logic in the 2 programs is identical?

    You can employ the Sharpe ratio in any program you choose, not necessarily Matlab. It is just the mean return divided by the standard deviation of returns.
    Ernie
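
The formula is short enough to write out in full. The sketch below uses plain Python rather than Matlab or MT4; the function name, the made-up returns, and the annualization by the square root of 252 trading days are assumptions added for illustration (the comment above gives only the mean-over-standard-deviation ratio).

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a list of per-period (excess) returns."""
    mean = statistics.mean(returns)
    stdev = statistics.stdev(returns)  # sample standard deviation
    if stdev == 0:
        raise ValueError("returns have zero variance")
    return math.sqrt(periods_per_year) * mean / stdev


daily_returns = [0.01, -0.005, 0.002, 0.007, -0.003]  # made-up data
print(sharpe_ratio(daily_returns))
```

Setting `periods_per_year=1` gives the raw, un-annualized ratio.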

  16. I also thought that the Sharpe ratio could be employed in any program. Is it really just limited to MATLAB?

  17. Ernie Chan said...
    Hi Anon,
    When you said the results from Matlab differ from Metatrader, can you be more specific? Are you sure that the logic in the 2 programs is identical?

    -------------

    Yes, I'm very sure they are.

    OK, I'll be more specific. My strategy is extremely simple but profitable (at least for me): just 2 lines of logic and 2 integer parameters. I can't see how or why the results of such simple logic differ so greatly between the two.

    The difference is that in MT4 I get hundreds of passes, but in MATLAB I only get around 50 passes. In MATLAB, one of the 1-year test passes returns a balance of 200+K from an initial capital of 10K, but in MT4 the balances are within the range 50K-100K for all the passes.

    One more thing: in MT4, the times of the bars are handled within the tester, so I don't need to re-program anything. But in MATLAB, I have to separate this data set. Maybe that's why there is a difference?

    Thx again for your kind help.

    Rgds
    Ruthstein

  18. Hi Ruthstein,
    Yes, it is likely that errors in data preparation are what caused the differences. In Metatrader, data is installed as part of the program. But Matlab is a general computing platform, much like a calculator. You have to be very careful in preparing data for input into Matlab.
    Ernie

  19. Hi Ernie, thank you very much for your comments. Someone helped me out with his plug-in for the time part, and there was a very slight error in the time preparation in MATLAB. Still, the results remain inconsistent. But surprisingly, the Sharpe ratio now has almost the same value for the top 5 minimal-drawdown passes, though the profits do not.

    On the bright side, this makes the choice much easier than before: I can just decide in terms of the safest drawdown, since the Sharpe ratios for all of them are pretty acceptable.

    Again, thanks for your kind help, and I must say, your book is a good read... I have no doubt that I will buy your next book!

  20. Hi Ruthstein,
    I am glad you found a bug. If the programming logic is the same in Matlab and MT, then the only reason the results can be different is that the input data is wrong.
    Ernie

  21. Ernie,
    When will you come to the USA to teach your Quantitative Trading class?

  22. Anon,
    It is up to the organizer of the workshops, Technical Analyst magazine. If you are interested, please request a New York or Chicago workshop at training@technicalanalyst.co.uk
    Ernie

  23. Ernie,

    I am trying to use Matlab's HMM functions to do some simple modeling. I am still trying to understand how to use all the functions to make a prediction. Say I have a time series of daily returns, which I convert to Up, Flat or Down (1, 0, -1) as my observations. Say I have a simple 2-state model. Now I can pass the entire observation series, along with some initial guesses for the emission and transition probabilities, to estimate the transition and emission probability matrices.

    [TRANS_EST2, EMIS_EST2] = hmmtrain(seq, TRANS_GUESS, EMIS_GUESS)

    Now, with these two matrices, what do you do to create the new prediction?

    Do you just run [seq,states] = hmmgenerate(1,TRANS,EMIS) to generate 1 number, which is your next observation, and call it your prediction?

    Thanks

  24. Anon,
    I am not familiar with the specific Matlab function that you use (I use a free package instead), but generally speaking, yes, if you want to predict the next measurement variable, that's what you do. In other applications, traders are more interested in the state variable (e.g. a hedge ratio, which is not directly observable and thus "hidden"), and the state variable prediction would be the focus.
    Ernie

  25. Thanks Ernie. Those functions are provided by the Matlab Statistics Toolbox. There are five functions available there:

    * hmmgenerate: Generates a sequence of states and emissions from a Markov model
    * hmmestimate: Calculates maximum likelihood estimates of transition and emission probabilities from a sequence of emissions and a known sequence of states
    * hmmtrain: Calculates maximum likelihood estimates of transition and emission probabilities from a sequence of emissions
    * hmmviterbi: Calculates the most probable state path for a hidden Markov model
    * hmmdecode: Calculates the posterior state probabilities of a sequence of emissions


    Regarding your comment on predicting the state variables, the reality is that we have no idea what the states are or how many of them there should be. So do people just assume some arbitrary states, such as "Sunny, Rainy, Cloudy" or a (RiskOn, RiskOff, RiskNeutral) type of scenario?

    For me to get the most likely states, I need to use the Viterbi function.

    likelystates = hmmviterbi(seq, TRANS, EMIS).

    But I will first need to find the TRANS and EMIS probability matrices given our own sequence of observations.

    [TRANS_EST2, EMIS_EST2] = hmmtrain(seq, TRANS_GUESS, EMIS_GUESS)

    All in all, it sounds like there will be quite a bit of estimation and guesswork here. You estimate the probability matrices, and then use the estimated matrices to deduce your states.

    After all this hard work, what you find is a bunch of state numbers, which they call the "most likely" states given what has happened.

    The question is: how do we use them NOW for future prediction?

    Am I missing something here?
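
One common way to turn estimated transition and emission matrices into a forecast is to filter the state probabilities through the observed history and then step the chain forward once. The sketch below does this in plain Python rather than with the Matlab toolbox; the function names, the explicit initial distribution `INIT`, and all the numbers are hypothetical, and the matrices would in practice come from an estimation step such as hmmtrain.

```python
def forward_filter(obs, trans, emis, init):
    """Return P(state at last step | all observations) using the
    normalized forward recursion. obs holds integer symbol indices."""
    n = len(init)
    belief = [init[i] * emis[i][obs[0]] for i in range(n)]
    total = sum(belief)
    belief = [b / total for b in belief]
    for symbol in obs[1:]:
        # Propagate one step through the transition matrix...
        predicted = [sum(belief[i] * trans[i][j] for i in range(n))
                     for j in range(n)]
        # ...then reweight by the likelihood of the observed symbol.
        belief = [predicted[j] * emis[j][symbol] for j in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
    return belief


def predict_next_observation(obs, trans, emis, init):
    """Return the probability of each symbol at the next time step."""
    n = len(init)
    belief = forward_filter(obs, trans, emis, init)
    next_state = [sum(belief[i] * trans[i][j] for i in range(n))
                  for j in range(n)]
    return [sum(next_state[j] * emis[j][k] for j in range(n))
            for k in range(len(emis[0]))]


# Hypothetical 2-state model; symbols 0, 1, 2 stand for Down, Flat, Up.
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIS = [[0.2, 0.3, 0.5], [0.5, 0.3, 0.2]]
INIT = [0.5, 0.5]
probs = predict_next_observation([2, 2, 0, 2, 2], TRANS, EMIS, INIT)
print(probs)
```

The result is a probability for each of Down, Flat and Up on the next day, conditioned on everything observed so far, rather than a single random draw.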

  26. Anon,
    To determine what a state variable should be, often you need some domain knowledge. I.e., you need more than HMM to constrain your model. A good example is given in Chapter 3 of my new book, which illustrates the use of HMM in finding the hedge ratio of a cointegrating pair of ETFs. The state variable chosen in this case is not arbitrary at all. Also, in this case, the objective is not to predict the next measurement, though you can choose to do so.

    Ernie

  27. Hi Ernie,

    I think this paper by Jerry Hong (on HMM and SVM) is worth reading; it is very interesting: http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-63.pdf

    Laurent

  28. Hi Laurent,
    I have actually read this paper before. In fact, some collaborators and I have tried to replicate and extend the results to more stocks. The effort was a failure, and reinforced my opinion that machine learning techniques that directly learn rules are unsuitable for trading.
    Ernie

  29. This is interesting. I implemented my version of the markov model and backtests gave me results of an average of 66% win rate on an hourly trading period over a cumulative trading period of 5 years. I then applied a ppmc method to these results and the win rate went up to an average of 83%. In terms of actual trading I've been trading for 7 months now and the average win ratio is 69% using both methods. It gets better with time and similarly adapts to changing market conditions so I'm confident in it. Anyways just saying that it is possible to do this thing.

  30. Thanks for your report of success with the HMM model!
    By PPMC, do you mean particle filter Monte Carlo?

    Ernie

  31. Hi Ernie,
    You mentioned in your book that you used the "Buy on gap" strategy in live trading.
    How do you handle a case where there are no trades/quotes for one or more instruments during the pre-opening session?
    Analyzing historical data, I see that this is sometimes the case. Another problem occurs when there are trades/quotes but they are too old, for instance when the timestamp is 08:55 AM.


    I'll be grateful for the help

  32. Hi Ernie,
    You mentioned in your book that you used the "Buy on gap" strategy in live trading.
    How do you handle a case where there are no trades/quotes for one or more instruments during the pre-opening session?
    Analyzing historical data, I see that this is sometimes the case. Another problem occurs when there are trades/quotes but they are too old, for instance when the timestamp is 08:55 AM.


    I'll be grateful for the help

  33. All intraday backtesting should be done with quotes instead of trades.

    Quotes are always present at 9:30am.

    Ernie

  34. Well, once the subject/research directly relates to money making opportunity it is totally pointless to expect any kind of useful feedback/contribution: fools contribute, smarts make money...
    If someone has a working idea, it's very simple to validate: make money. The alternative would be to contribute and to have a lot of nice talk...

  35. Hi Ernie,

    Regarding the discussion of "ppmc" in connection with Markov models above, did you ever do any more research into this and/or figure out exactly what ppmc means in the realm of Markov models? A few things I've found by googling are:
    posterior predictive model checking,
    particle photon Monte Carlo,
    parallel processing of Markov chain,
    prediction by partial match method-C,
    pairwise partially Markov chains, and finally
    Pearson product moment correlation


    Thanks,

    Richard

  36. Hi Richard,
    I am waiting for the commenter to clarify what exactly s/he meant by PPMC. My guess is that s/he meant PFMC (particle filter Monte Carlo), which is a well-established technique.
    Ernie
