Friday, November 17, 2017

Optimizing trading strategies without overfitting

By Ernest Chan and Ray Ng

===

Optimizing the parameters of a trading strategy via backtesting has one major problem: there are typically not enough historical trades to achieve statistical significance. Whatever optimal parameters one finds are likely to suffer from data-snooping bias, and there may be nothing optimal about them in the out-of-sample period. That's why parameter optimization of trading strategies often adds no value. On the other hand, optimizing the parameters of a time series model (such as a maximum likelihood fit to an autoregressive or GARCH model) is more robust, since the input data are prices, not trades, and we have plenty of prices. Fortunately, it turns out that there are clever ways to take advantage of the ease of optimizing time series models in order to optimize the parameters of a trading strategy.

One elegant way to optimize a trading strategy is to utilize the methods of stochastic optimal control theory - elegant, that is, if you are mathematically sophisticated and able to analytically solve the Hamilton-Jacobi-Bellman (HJB) equation (see Cartea et al.). Even then, this will only work when the underlying time series is a well-known one, such as the continuous Ornstein-Uhlenbeck (OU) process that underlies all mean-reverting price series. This OU process is neatly represented by a stochastic differential equation. Furthermore, the HJB equation can typically be solved exactly only if the objective function is of a simple form, such as a linear function. If your price series happens to be neatly represented by an OU process, and your objective is profit maximization, which happens to be a linear function of the price series, then stochastic optimal control theory will give you the analytically optimal trading strategy, with exact entry and exit thresholds given as functions of the parameters of the OU process. There is no more need to find such optimal thresholds by trial and error during a tedious backtest, a process that invites overfitting to a sparse number of trades. As we indicated above, the parameters of the OU process can be fitted quite robustly to prices, and in fact there is an analytical maximum likelihood solution to this fit given in Leung et al.
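To make that last point concrete, here is a minimal Python sketch (an illustration only, not the exact maximum likelihood formulas of Leung et al.) of how OU parameters can be estimated robustly from a price series alone: the exact discretization of the OU process is an AR(1), so a simple regression of each observation on the previous one recovers the parameters. The function name, the input series, and the sampling interval dt are assumptions of this sketch.

```python
import numpy as np

def fit_ou(x, dt):
    """Estimate Ornstein-Uhlenbeck parameters (theta, mu, sigma) from a sampled
    path x (e.g. the log prices of a mean-reverting series) observed at interval dt.
    The exact discretization of dX = theta*(mu - X)*dt + sigma*dW is an AR(1)
    process, so an OLS regression of x[t+1] on x[t] recovers all three parameters."""
    x0, x1 = np.asarray(x[:-1]), np.asarray(x[1:])
    b, a = np.polyfit(x0, x1, 1)            # x[t+1] ~ a + b * x[t]
    resid = x1 - (a + b * x0)
    theta = -np.log(b) / dt                 # mean-reversion speed (requires 0 < b < 1)
    mu = a / (1.0 - b)                      # long-run mean
    # invert the variance of the exact discretization to recover the diffusion coefficient
    sigma = np.sqrt(np.var(resid, ddof=2) * 2.0 * theta / (1.0 - b**2))
    return theta, mu, sigma

# e.g. theta, mu, sigma = fit_ou(np.log(prices), dt=1.0/252)   # daily bars, dt in years
```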

But what if you want something more sophisticated than the OU process to model your price series or require a more sophisticated objective function? What if, for example, you want to include a GARCH model to deal with time-varying volatility and optimize the Sharpe ratio instead? In many such cases, there is no representation as a continuous stochastic differential equation, and thus there is no HJB equation to solve. Fortunately, there is still a way to optimize without overfitting.

In many optimization problems, when an analytical optimal solution does not exist, one often turns to simulations. Examples of such methods include simulated annealing and Markov Chain Monte Carlo (MCMC). Here we shall do the same: if we cannot find an analytical solution to our optimal trading strategy, but can fit our underlying price series quite well to a standard discrete time series model such as ARMA, then we can simply simulate many instances of the underlying price series. We shall backtest our trading strategy on each instance of the simulated price series, and find the trading parameters that most frequently generate the highest Sharpe ratio. This process is much more robust than applying a backtest to the real time series, because there is only one real price series, but we can simulate as many price series (all following the same ARMA process) as we want. That means we can simulate as many trades as we want and obtain optimal trading parameters with as high a precision as we like. This is almost as good as an analytical solution. (See the flow chart below that illustrates this procedure - click to enlarge.)

Optimizing a trading strategy using simulated time series
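As a minimal Python sketch of the outer loop in this flow chart (simulate_series and backtest_sharpe are hypothetical placeholders for your own model simulator and your own strategy backtest), picking the parameter that most frequently wins across simulated paths might look like this:

```python
import numpy as np
from collections import Counter

def optimal_param_by_simulation(simulate_series, backtest_sharpe, param_grid, n_sims=1000):
    """For each of n_sims simulated price series, find the parameter in param_grid
    with the highest Sharpe ratio, then return the parameter that wins most often
    (the mode of the per-series optima).

    simulate_series() -> one simulated price (or return) series
    backtest_sharpe(series, param) -> Sharpe ratio of the strategy on that series
    """
    winners = []
    for _ in range(n_sims):
        series = simulate_series()
        sharpes = [backtest_sharpe(series, p) for p in param_grid]
        winners.append(param_grid[int(np.argmax(sharpes))])
    return Counter(winners).most_common(1)[0][0]
```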

Here is a somewhat trivial example of this procedure. We want to find an optimal strategy that trades AUDCAD on an hourly basis. First, we fit an AR(1)+GARCH(1,1) model to the data using log midprices. The maximum likelihood fit is done using a one-year moving window of historical prices, and the model is refitted every month. We use MATLAB's Econometrics Toolbox for this fit. Once the sequence of monthly models is found, we can use them to predict both the log midprice at the end of each hourly bar, as well as the expected variance of log returns. So a simple trading strategy can be tested: if the expected log return in the next bar is higher than K times the expected volatility (square root of variance) of log returns, buy AUDCAD and hold for one bar, and vice versa for shorts. But what is the optimal K?
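The fit above was done in MATLAB's Econometrics Toolbox; as a rough Python sketch of the same idea using the arch package (here the AR(1)+GARCH(1,1) is fitted to hourly log returns rather than to the log prices themselves, and the window length, the scaling by 100, and the variable names are all assumptions of this sketch), the fit and the one-bar-ahead signal might look like this:

```python
import numpy as np
from arch import arch_model   # pip install arch

# log_mid: a pandas Series of hourly AUDCAD log midprices (assumed to exist).
# This sketch fits the AR(1)+GARCH(1,1) to hourly log returns (differences of the
# log midprices), scaled by 100 for numerical stability.
ret = 100 * log_mid.diff().dropna()
window = ret.iloc[-24 * 260:]            # roughly one year of hourly bars

am = arch_model(window, mean='AR', lags=1, vol='GARCH', p=1, q=1, rescale=False)
res = am.fit(disp='off')                 # maximum likelihood fit

# One-bar-ahead forecasts of the expected log return and its conditional variance
fc = res.forecast(horizon=1)
mu = fc.mean.iloc[-1, 0]
sigma = np.sqrt(fc.variance.iloc[-1, 0])

# The strategy: long one bar if the expected return exceeds K times the expected
# volatility, short one bar if it is below -K times the expected volatility
K = 0.5                                  # placeholder; K is the parameter we want to optimize
signal = 1 if mu > K * sigma else (-1 if mu < -K * sigma else 0)
```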

Following the procedure outlined above, each time we fit a new AR(1)+GARCH(1,1) model, we use it to simulate the log prices for the next month's worth of hourly bars. In fact, we simulate this 1,000 times, generating 1,000 time series, each with the same number of hourly bars as in a month. Then we simply iterate through all reasonable values of K and remember which K generates the highest Sharpe ratio for each simulated time series. We pick the K that most often results in the best Sharpe ratio among the 1,000 simulated time series (i.e. we pick the mode of the distribution of optimal K's across the simulated series). This is the sequence of K's (one for each month) that we use for our final backtest. Below is a sample distribution of optimal K's for a particular month, and the corresponding distribution of Sharpe ratios:

Histogram of optimal K and corresponding Sharpe ratio for 1,000 simulated price series
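A rough Python sketch of the simulation and the sweep over K behind a histogram like the one above (continuing the earlier sketches and reusing the hypothetical optimal_param_by_simulation helper; the 500-bar month, the K grid, the annualization factor, and computing each bar's conditional mean and volatility directly from the fitted parameters and the simulated volatility path are all simplifying assumptions):

```python
n_bars = 500                                  # roughly one month of hourly bars (assumption)
k_grid = [round(k, 1) for k in np.arange(0.0, 2.01, 0.1)]   # grid of "reasonable" K values

# Same model specification with no data attached, used purely for simulation
sim_mod = arch_model(None, mean='AR', lags=1, vol='GARCH', p=1, q=1, rescale=False)

def simulate_month():
    """One month of hourly log returns simulated from the fitted AR(1)+GARCH(1,1)."""
    return sim_mod.simulate(res.params, nobs=n_bars, burn=100)

def backtest_sharpe(sim, K):
    """Sharpe ratio of the K-threshold strategy on one simulated series."""
    r = sim['data'].values                    # simulated (scaled) hourly log returns
    vol = sim['volatility'].values            # conditional volatility at each bar
    const, phi = res.params['Const'], res.params['y[1]']
    mu = const + phi * np.r_[0.0, r[:-1]]     # one-bar-ahead conditional mean at each bar
    pos = np.where(mu > K * vol, 1.0, np.where(mu < -K * vol, -1.0, 0.0))
    pnl = pos * r                             # hold for one bar, zero transaction costs
    if pnl.std() == 0:
        return -np.inf
    return np.sqrt(24 * 260) * pnl.mean() / pnl.std()    # annualized (assumption)

# The K used to trade the coming month is the mode of the per-path optimal K's
best_K = optimal_param_by_simulation(simulate_month, backtest_sharpe, k_grid, n_sims=1000)
```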

Interestingly, the mode of the optimal K is 0 for every month. That certainly makes for a simple trading strategy: just buy whenever the expected log return is positive, and vice versa for shorts. The CAGR is about 4.5%, assuming zero transaction costs and midprice executions. Here is the cumulative returns curve:


You may exclaim: "This can't be optimal, because I am able to trade AUDCAD hourly bars with much better returns and Sharpe ratio!" Of course, optimal in this case only means optimal within a certain universe of strategies, and assuming an underlying AR(1)+GARCH(1, 1) price series model. Our universe of strategies is a pretty simplistic one: just buy or sell based on whether the expected return exceeds a multiple of the expected volatility. But this procedure can be extended to whatever price series model you assume, and whatever universe of strategies you can come up with. In every case, it greatly reduces the chance of overfitting.

P.S. We invented this procedure for our own use a few months ago, borrowing similar ideas from Dr. Ng’s computational research in condensed matter physics systems (see Ng et al. here or here). But later on, we found that a similar procedure has already been described in a paper by Carr et al.

===

About the authors: Ernest Chan is the managing member of QTS Capital Management, LLC. Ray Ng is a quantitative strategist at QTS. He received his Ph.D. in theoretical condensed matter physics from McMaster University. 

===

Upcoming Workshops by Dr. Ernie Chan

November 18 and December 2:  Cryptocurrency Trading with Python

I will be moderating this online workshop for Nick Kirk, a noted cryptocurrency trader and fund manager, who taught this widely acclaimed course here and at CQF in London.

February 24 and March 3: Algorithmic Options Strategies

This online course focuses on backtesting intraday and portfolio option strategies. No pesky options pricing theories will be discussed, as the emphasis is on arbitrage trading.



33 comments:

Anonymous said...

Could you include the Matlab for this post?

John said...

Interesting post. I see this as basically the same thing as portfolio resampling, but applied to trading instead of portfolio optimization.

Ernie Chan said...

Anon,
You are welcome to email me at ernest@epchan.com for source codes.
Ernie

Ernie Chan said...

John,
It isn't really resampling, since resampling means we use real historical data to generate more historical data. Here, we merely use the model that describes the historical data to generate more historical data.
Ernie

Jonathan Shore said...

Very nice idea. Overfitting is indeed a big problem for strategy development. A potential issue in using this is how well one can model the underlying price / volume processes. Depending on what one's signal is dependent on, the process may either not express the pattern or may have a different outcome than realized on average in the market.

One thing I have done in creating more data for optimizing equity strategies has been to do the following:

- normalize equity bars (by mean/ sd)
- cluster equities into groups by some similarity measures
- within each group, evaluate signal on the combined histories of equities in the group

The MC / model-based approach is very appealing however, as one can generate even larger amounts of data. I'll have to give this approach a shot with equities and see how it does.

Ernie Chan said...

Hi Jonathan,
Yes, you pointed out some very valid limitations on this approach.
Thanks for describing how you approached the problem with equities strategies! It does make sense in that context.
Ernie

John said...

@Ernie Perhaps it's more similar than you think. In Michaud resampling, you are estimating a model. Implicitly you are assuming the assets follow random walks with multivariate normal error (parameters mu and sigma as mean and covariance). Then you resample more mus and sigmas, optimize a portfolio for each, and then average the final portfolio weights.

So if you took the model, resampled to get new parameters to the model, then sample a path of asset prices, you could calculate the mean and covariance at the end and input this into the optimizer. The optimizer is like whatever you would use to set up a trading strategy.

I suppose the difference is if you are using one version of the model, whereas the resampling is like many versions of the model. So it's like you have some posterior distribution over mu and sigma.

Ernie Chan said...

@John, I can see the similarity now - thanks.

Ernie

Michael Harris said...

Very interesting article Ernie. I have a few questions and comments.

1. Will there be cases where the optimal parameters will result in negative performance in actual price series? Would you then trade the strategy regardless or reject it?

2. Are all generated price series equally probable in the real world? If not, will that result in high Type-I error?

3. The max DD in your example is about 50% or even more. I realize this is an example. Will this method be effective if minimization of DD is desired?

4. I have become skeptical of any generalizations against over-fitted strategies in recent years after discovering some simple ones that worked for several decades very well. If you find a chance, I have examples in my paper. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2810170

Thanks

Ernie Chan said...

Hi Michael,
1) Theoretically, that is possible, but that often indicates that the time series model isn't a good fit to the underlying prices. If we ascertain that it is a good fit and it still gives bad backtest performance, we would reject the strategy. In practice, it hasn't happened yet.

2) The simulated price series are sampled according to their likelihood of occurring in reality. Hence an unweighted average of their performance is a good estimate of their actual performance in reality.

3) Yes, you can certainly choose to reduce DD instead of maximizing Sharpe. A better metric may be the Calmar ratio, the ratio of returns to maximum DD, which you can maximize instead.

4) Thanks for the link! I also agree that simple strategies that work are the best, and parameter optimization is best avoided if possible.

Ernie

Unknown said...

Ernie

Good stuff. I have been reviewing many volatility strategies recently and overfitting has been a common theme in the blogosphere. For that reason, I am very conscious moving forward with regard to optimization! Thanks for sharing the papers and your thoughts.

Ernie Chan said...

Good to know - thanks Andrew!

Roge Klay said...

Hi Ernie,

As usual, a very interesting article, thank you. Have you thought about a Bayesian approach in this context? Given the highly uncertain nature of financial markets, I think Bayesian methods are really the way to go, except maybe if you have tons of data.

First, without talking about the underlying model: instead of picking the mode of K, what would happen if you picked the expectation of K? This is a nice way to avoid parameter optimization and express the result in a pseudo-Bayesian way. I believe that it could add some robustness. In a full Bayesian framework, you could add parameter uncertainty to the model, or even use model combination.

L



Ernie Chan said...

Hi Laurent,
Interesting suggestion - yes, Bayesian approach makes sense.
Thanks,
Ernie

SR said...

Ernie, thanks for your writing. Big fan of your books. Some background: I am no quant PhD, and certainly no hedge fund guy. I am just an engineer with decent programming skill.

I've found a few profitable strategies inspired by your books and confirmed using paper trading and backtesting. Can you help me intuit how your systems work?

- What are the market forces making these systems work?
- Is this effectively taking money from less sophisticated traders? Or, am I riding the wave with smart money who just don't mind taking smaller losses which are actually quite meaningful for retail traders?
- Why haven't hedge funds/smarter people arbitraged or sucked the alpha out of my system?
- From your books, it seems it might be related to 'capacity' but I am still not sure why it is actually working.

Thanks in advance.

Ernie Chan said...

Hi SR,
Thanks for your kind words on my books, and great to hear you have found some good strategies!

The market forces at work are typically either temporary liquidity demands leading to mean reversion, or news/fundamental events leading to trends.

Yes, often arbitrage strategies are making money from less sophisticated (or slower, or less informed, or more emotional) traders. But in other times, it is merely the rewards of providing liquidity.

If the capacity of your strategies is small, hedge funds are not interested. Capacity is proportional to the total amount of money you can make from this opportunity. Individuals may not have your sophistication to find these opportunities. Even if they have found them, they may not have enough capital to arbitrage away all profits.

Without knowing more about your strategies, these are of course just general observations.

Ernie

SR said...

@Ernie - thanks for your comments. Above all, I am very surprised by the simplicity of some of the strategies implemented (most are momentum based, with modifications of your examples) - and how they are managing to stay consistently in the green even in ranging markets. This is what made me question the actual economics of the trade and where the money comes from. I trade spot FX mainly, and am curious to see how long I'll be able to hold this edge. Guess I'll know/report back in a year or so!

Aym said...

Hi Ernie,

Thanks for your useful writings, and like SR I'm a big fan of your work. Your method is very interesting for a single-asset strategy. In my case, I only build and use multi-asset strategies. To avoid overfitting, I already use walk-forward optimization, and bootstrap methods for resampling and generating "new time series" with properties as close as possible to the originals, in order to preserve the relationships between the time series. But the work is hard, and often the time series properties are altered.

Have you ever used your method for a multi-asset strategy?

Thank you Ernie.

Ernie Chan said...

Hi Aym,
I have in fact discussed a bootstrapping procedure similar to what you described in my QuantCon 2017 talk: https://twitter.com/quantopian/status/955545871348895746. That is for a multi-asset strategy, and it certainly improved results.
Ernie

Unknown said...

Hi Ernest,

Thank you for the informative post.

Can we treat price data from other markets as out of sample data?

For example, if I am backtesting EUR USD, can I use data from GBP JPY or XAU USD as out-of-sample?

Ernie Chan said...

Hi Alex,
To some extent, they are OOS. However, many currencies are correlated to some degree, so it isn't ideal OOS testing if they are contemporaneous.
Ernie

Unknown said...

Thank you for your answer.

If we had markets that are non-correlated, would you then accept them as OOS?

Let's say for the same EUR USD example I would take the 10y Note and Copper.

The idea came from public-domain trend-following models which are painfully simple, but have worked on many markets for decades. They just treat every market as OOS data....

Ernie Chan said...

Hi Alex,
Even when markets are not correlated (i.e. no first order correlation), they can still have higher order dependence (often called tail dependence). It is very hard to find contemporaneous markets that do not have tail dependence.

I would not accept contemporaneous data as true OOS testing. However, they are OK for a quick and dirty backtest.

Ernie

Klaus said...

Hi Ernest,

One of the first commenters asked you for the code and you asked him to send you an email. I also sent you an email asking for the code but haven't received a reply so far.

May I ask you to send me the code, too?

You will find my inquiry in your emails.

Thanks in advance,

Klaus.

Ernie Chan said...

Hi Klaus,
Actually, we are going to publish a new, related post this weekend; the simulation code for that will be in Python and is posted on GitHub. Please stay tuned!

(On the other hand, I still haven't received your email, nor that of anon, for this request.)
Ernie

Klaus said...

Hi Ernie,

thanks for your reply and the hint to the new post.

I could imagine that these emails landed in your spam folder? Anyhow, I sent them to ernest@epchan.com .

Bye,

Klaus

Ernie Chan said...

Hi Klaus,
Yes, that's the correct email, and I did check the spam box, but there was no email from Klaus. You can see that we published a blog post on the same topic using GAN to simulate time series, and the source code is available on GitHub.
Cheers,
Ernie

Unknown said...

Hi Ernie

The majority of your recent posts have addressed metalabelling and PredictNow. I'm wondering what role overfitting plays in metalabelling, and whether simulated data backtests of the kind you describe in this post still form part of the trader's toolbox.

Suppose you have a simple trading strategy, like a moving average crossover, and that you can vary its parameters to realise different trade characteristics and performance statistics. If the set of all possible parameter combinations, S, is sufficiently large, there's a high likelihood that a significant number of the parameter combinations are overfitted.

You could obviously subject each of those parameter combinations to thousands of backtests on simulated data in order to filter out those that are likely overfitted, retaining all combinations that meet some minimum profitability threshold on average. Alternatively, you may instead train a metalabelling classifier on all the trades generated under each of those parameter combinations, using (among other things) the specific parameters as input variables to the classifier. This would yield a classifier that predicts when trades are likely to be profitable, presumably assigning a low probability to those parameter combinations that are overfitted.

Would simulated data backtests still play a role in this scenario? If the metalabelling classifier is already doing most of the heavy lifting in determining which parameter combinations are overfitted, is there any need to run thousands of backtests on simulated data for the sake of filtration?

Kind regards,

Jeff

Ernie Chan said...

Hi Jeff,
Actually, much of the advance of AI/ML in the last couple of decades was due to overcoming overfitting by various means (regularization, cross validation, CPCV, dropout, ensemble methods, feature selection, etc.). So it is a much more studied and researched topic than the typical trading strategy development procedure.

Certainly simulation is one major method for avoiding overfitting. Simulation, combined with our Conditional Parameter Optimization (CPO) algorithm that I described in https://www.predictnow.ai/blog/conditional-parameter-optimization-adapting-parameters-to-changing-market-regimes/, is similar to what you suggest above. Without simulations, there may not be enough data to train the CPO algorithm.

Best,
Ernie

Anonymous said...

Hello sir Chan,

Thank you for the knowledge you share.

Regarding your last posted comment, can you please explain how simulation can help add data for CPO training?

I believe market features that are needed in CPO learning cannot be simulated since they are inherently tied to the real non-simulated price data.

In my understanding, I would use simulation as a first step to identify the best non-overfitted trading parameters, and then use that parameter set during the CPO procedure. But I can't think of a way to use simulation to add data to the CPO machine learning dataset.

Kind regards,
Clément

Ernie Chan said...

Hi Clement,
For CPO, the only input needed is the returns series of individual components of a portfolio. They should not be simulated returns - they should be backtest or actual historical returns.
Ernie

Anonymous said...

Thank you for your feedback.

Could those returns come from simulated backtests, in the case of a strategy (not a portfolio)?

Kind regards,
Clément

Ernie Chan said...

Hi Clement,
If you are referring to Conditional Parameter Optimization instead of Portfolio Optimization, then it can be applied to a single strategy. Obviously it has to be a backtest because live track record can only be obtained for a single set of parameters.
Ernie