Thursday, April 01, 2021

Conditional Parameter Optimization: Adapting Parameters to Changing Market Regimes via Machine Learning

Every trader knows that there are market regimes that are favorable to their strategies, and other regimes that are not. Some regimes are obvious, like bull vs bear markets, calm vs choppy markets, etc. These regimes affect many strategies and portfolios (unless they are market-neutral or volatility-neutral portfolios) and are readily observable and identifiable (but perhaps not predictable). Other regimes are more subtle, and may only affect your specific strategy. Regimes may change every day, and they may not be observable. It is often not as simple as saying the market has two regimes, and we are currently in regime 2 instead of 1. For example, with respect to the profitability of your specific strategy, the market may have 5 different regimes. But it is not easy to specify exactly what those 5 regimes are, and which of the 5 we are in today, not to mention predicting which regime we will be in tomorrow. We won’t even know that there are exactly 5!

Regime changes sometimes necessitate a complete change of trading strategy (e.g. trading a mean-reverting instead of a momentum strategy). Other times, traders just need to change the parameters of their existing trading strategy to adapt to a different regime. My colleagues and I at PredictNow.ai have come up with a novel way of adapting the parameters of a trading strategy, a technique we call “Conditional Parameter Optimization” (CPO). This patent-pending invention allows traders to adopt new parameters as frequently as they like, perhaps for every trading day or even every single trade.

CPO uses machine learning to place orders optimally based on changing market conditions (regimes) in any market. Traders in these markets typically already possess a basic trading strategy that decides the timing, pricing, type, and/or size of such orders. This trading strategy will usually have a small number of adjustable trading parameters. Conventionally, these parameters are optimized on a fixed historical data set (“train set”). Alternatively, they may be periodically reoptimized using an expanding or rolling train set. (The latter is often called “Walk Forward Optimization”.) With a fixed train set, the trading parameters clearly cannot adapt to changing regimes. With an expanding train set, the trading parameters still cannot respond to rapidly changing market conditions because the additional data is but a small fraction of the existing train set. Even with a rolling train set, there is no evidence that the parameters optimized in the most recent historical period give better out-of-sample performance. A too-small rolling train set will also give unstable and unreliable predictive results given the lack of statistical significance. All these conventional optimization procedures can be called unconditional parameter optimization, as the trading parameters do not intelligently respond to rapidly changing market conditions. Ideally, we would like trading parameters that are much more sensitive to the market conditions and yet are trained on a large enough amount of data.
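To make the difference between expanding and rolling train sets concrete, here is a minimal sketch of how the two schemes slice a history of observations. The function name `walk_forward_windows` and its arguments are hypothetical, chosen only for this illustration.

```python
from typing import Iterator, Tuple


def walk_forward_windows(
    n_samples: int, initial_train: int, test_size: int, rolling: bool
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs for periodic reoptimization.

    rolling=True keeps the train set at a fixed length of `initial_train`;
    rolling=False lets it expand from index 0.
    """
    train_end = initial_train
    while train_end + test_size <= n_samples:
        train_start = train_end - initial_train if rolling else 0
        yield range(train_start, train_end), range(train_end, train_end + test_size)
        train_end += test_size


for label, rolling in (("rolling", True), ("expanding", False)):
    for train, test in walk_forward_windows(10, 4, 2, rolling=rolling):
        print(label, "train:", list(train), "test:", list(test))
```

In the expanding case each reoptimization adds only `test_size` new rows to an ever larger train set, which is why the newest data has less and less influence on the fitted parameters.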

To address this adaptability problem, we apply a supervised machine learning algorithm (specifically, random forest with boosting) to learn from a large predictor (“feature”) set that captures various aspects of the prevailing market conditions, together with specific values of the trading parameters, to predict the outcome of the trading strategy. (An example outcome is the strategy’s future one-day return.) Once such a machine-learning model is trained to predict the outcome, we can apply it to live trading by feeding in the features that represent the latest market conditions as well as various combinations of the trading parameters. The set of parameters that results in the optimal predicted outcome (e.g., the highest future one-day return) will be selected as optimal, and will be adopted for the trading strategy for the next period. The trader can make such predictions and adjust the trading strategy as frequently as needed to respond to rapidly changing market conditions.
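The procedure described above can be sketched on synthetic data. This is only an illustration under stated assumptions, not the PredictNow.ai implementation: it uses a plain scikit-learn `RandomForestRegressor` rather than a boosted random forest, a single made-up market feature, a single made-up trading parameter, and an invented relationship between them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic history: each row pairs the market conditions on one day with
# the parameter value that was traded and the outcome that followed.
n_rows = 2000
feature = rng.uniform(0.0, 1.0, n_rows)       # hypothetical market feature
param = rng.choice([1.0, 2.0, 3.0], n_rows)   # hypothetical trading parameter
best_param = 1.0 + 2.0 * feature              # assumed: best parameter rises with the feature
outcome = -(param - best_param) ** 2 + rng.normal(0.0, 0.05, n_rows)

# Train on market features AND parameter values together.
X = np.column_stack([feature, param])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, outcome)


def conditional_optimum(model, todays_feature, candidates):
    """Return the candidate parameter with the highest predicted outcome today."""
    grid = np.column_stack([np.full(len(candidates), todays_feature), candidates])
    predicted = model.predict(grid)
    return float(candidates[int(np.argmax(predicted))])


candidates = np.array([1.0, 2.0, 3.0])
for todays_feature in (0.05, 0.95):
    print(todays_feature, conditional_optimum(model, todays_feature, candidates))
```

The key design point is that the parameter is an input column of the model, so one model trained on the full history can be queried with today's features and every candidate parameter value.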

In the example you can download here, I illustrate how we apply CPO using PredictNow.ai’s financial machine learning API to adapt the parameters of a Bollinger Band-based mean reversion strategy on GLD (the gold ETF) and obtain superior results which I highlight here:

 

 

 

| | Unconditional Optimization | Conditional Optimization |
|---|---|---|
| Annual Return | 17.29% | 19.77% |
| Sharpe Ratio | 1.947 | 2.325 |
| Calmar Ratio | 0.984 | 1.454 |
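For readers unfamiliar with the kind of parameters being adapted, here is a generic sketch of a Bollinger Band mean reversion rule. It is not the strategy from the downloadable example: the entry and exit rules, and the parameter names `lookback` and `entry_z`, are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def bollinger_positions(prices: List[float], lookback: int, entry_z: float) -> List[int]:
    """Return a position per bar: +1 long, -1 short, 0 flat.

    Go long when the price is below the lower band, short when it is above
    the upper band, and exit once the price returns to the moving average.
    """
    positions = []
    position = 0
    for t, price in enumerate(prices):
        if t + 1 < lookback:          # not enough history yet
            positions.append(0)
            continue
        window = prices[t + 1 - lookback : t + 1]
        mid = mean(window)
        width = entry_z * pstdev(window)
        if price < mid - width:
            position = 1
        elif price > mid + width:
            position = -1
        elif (position == 1 and price >= mid) or (position == -1 and price <= mid):
            position = 0
        positions.append(position)
    return positions


prices = [10.0, 10.0, 10.0, 10.0, 8.0, 9.0, 10.0, 11.0, 10.0]
print(bollinger_positions(prices, lookback=4, entry_z=1.0))
```

Here `lookback` and `entry_z` are the adjustable parameters: unconditional optimization fixes them from history, while a conditional approach would choose them anew from current market features.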

 

The CPO technique is useful in industry verticals other than finance as well. After all, optimization under time-varying and stochastic conditions is a very general problem. For example, wait times in a hospital emergency room may be minimized by optimizing various parameters, such as staffing level, equipment and supplies readiness, discharge rate, etc. Current state-of-the-art methods generally find the optimal parameters by looking at what worked best on average in the past. There is also no mathematical function that exactly determines wait time based on these parameters. The CPO technique employs other variables such as time of day, day of week, season, weather, whether there are recent mass events, etc. to predict the wait time under various parameter combinations, and thereby find the optimal combination under the current conditions in order to achieve the shortest wait time.

We can provide you with the scripts to run CPO on your own strategy using PredictNow.ai’s API. Please email info@predictnow.ai for a free trial.

Friday, January 22, 2021

The Amazing Efficacy of Cluster-based Feature Selection

One major impediment to widespread adoption of machine learning (ML) in investment management is its black-box nature: how would you explain to an investor why the machine makes a certain prediction? What's the intuition behind a certain ML trading strategy? How would you explain a major drawdown? This lack of "interpretability" is not just a problem for financial ML; it is a prevalent issue in applying ML to any domain. If you don’t understand the underlying mechanisms of a predictive model, you may not trust its predictions.

Feature importance ranking goes a long way towards providing better interpretability to ML models. The feature importance score indicates how much information a feature contributes when building a supervised learning model. The importance score is calculated for each feature in the dataset, allowing the features to be ranked. The investor can therefore see the most important predictors (features) used in the predictions, and in fact apply "feature selection" to only include those important features in the predictive model. However, as my colleague Nancy Xin Man and I have demonstrated in Man and Chan 2021a, common feature selection algorithms (e.g. MDA, LIME, SHAP) can exhibit high variability in the importance rankings of features: different random seeds often produce vastly different importance rankings. For example, if we run MDA on some cross validation set multiple times with different seeds, a feature ranked at the top of the list in one run may drop to the bottom in the next run. This variability of course eliminates any interpretability benefit of feature selection. Interestingly, despite this variability in importance ranking, feature selection still generally improves out-of-sample predictive performance on multiple data sets that we tested in the above paper. This may be due to the "substitution effect": many alternative (substitute) features can be used to build predictive models with similar predictive power. (In linear regression, the substitution effect is called "collinearity".)
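One way to see this seed dependence for yourself is to rerun a permutation-based importance (MDA) calculation under several seeds and compare the top of each ranking. The sketch below is not the experimental setup of Man and Chan 2021a. It assumes scikit-learn's `permutation_importance` as the MDA implementation, a small random forest, a single train/test split of the benchmark breast cancer dataset, and one permutation per feature.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0
)


def mda_ranking(seed, n_repeats=1):
    """Rank features by mean decrease in accuracy for one random seed."""
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(X_train, y_train)
    result = permutation_importance(
        model, X_test, y_test, n_repeats=n_repeats, random_state=seed
    )
    order = np.argsort(-result.importances_mean)  # most important first
    return [str(data.feature_names[i]) for i in order]


rankings = [mda_ranking(seed) for seed in range(5)]
for seed, ranking in enumerate(rankings):
    print(seed, ranking[:3])
```

Comparing the printed top-three lists across seeds gives a quick, informal feel for how stable (or not) the ranking is on this dataset.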

To reduce variability (or what we called instability) in feature importance rankings and to improve interpretability, we found that LIME is generally preferable to SHAP, and definitely preferable to MDA. Another way to reduce instability is to increase the number of iterations during runs of the feature importance algorithms. In a typical implementation of MDA, every feature is permuted multiple times. But standard implementations of LIME and SHAP have set the number of iterations to 1 by default, which isn't conducive to stability. In LIME, each instance and its perturbed samples only fit one linear model, but we can perturb them multiple times to fit multiple linear models. In SHAP, we can permute the samples multiple times. Our experiments have shown that the instability of the top ranked features does approximately converge to some minimum as the number of iterations increases; however, this minimum is not zero. So there remains some residual variability of the top ranked features, which may be attributable to the substitution effect as discussed before.
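The effect of the number of iterations can be illustrated with MDA, where the iteration count is simply the number of permutations per feature. The sketch below is an assumption-laden illustration, not the instability metric used in our paper: it holds one fitted model fixed, varies only the permutation seed, and measures the seed-to-seed standard deviation of each importance score. To keep the run short it uses only the ten "mean ..." columns of the breast cancer dataset.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X = data.data[:, :10]  # the ten "mean ..." features only, for speed
X_train, X_test, y_train, y_test = train_test_split(
    X, data.target, test_size=0.3, random_state=0
)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)


def importance_spread(n_repeats, n_seeds=8):
    """Average over features of the std of the importance score across seeds."""
    scores = np.array([
        permutation_importance(
            model, X_test, y_test, n_repeats=n_repeats, random_state=seed
        ).importances_mean
        for seed in range(n_seeds)
    ])
    return float(scores.std(axis=0).mean())


spread = {n_repeats: importance_spread(n_repeats) for n_repeats in (1, 10)}
print(spread)
```

Averaging over more permutations shrinks the noise in each score, but it cannot remove variability that comes from features substituting for one another.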

To further improve interpretability, we want to remove the residual variability. López de Prado (2020) described a clustering method that groups together features that are similar and should receive the same importance rankings. This promises to be a great way to remove the substitution effect. In our new paper Man and Chan 2021b, we applied a hierarchical clustering methodology prior to MDA feature selection to the same data sets we studied previously. This method is generally called cMDA. As they say in social media clickbait, the results will (pleasantly) surprise you.

For the benchmark breast cancer dataset, the top two clusters found were:

| Topic | Cluster Importance Scores | Cluster Rank | Features |
|---|---|---|---|
| Geometry summary | 0.360 | 1 | 'mean radius', 'mean perimeter', 'mean area', 'mean compactness', 'mean concavity', 'mean concave points', 'radius error', 'perimeter error', 'area error', 'worst radius', 'worst perimeter', 'worst area', 'worst compactness', 'worst concavity', 'worst concave points' |
| Texture summary | 0.174 | 2 | 'mean texture', 'worst texture' |


Not only do these clusters have clear interpretations (provided by us as a "Topic"), they also almost never change in their top importance rankings under 100 random seeds!
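The general idea of clustering features first and then permuting each cluster as a unit can be sketched as follows. This is a simplified illustration and not the exact procedure of Man and Chan 2021b or López de Prado (2020): the correlation-based distance, average linkage, the cap of four clusters, and the choice to shuffle all of a cluster's columns with the same row permutation are assumptions made here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0
)

# 1. Hierarchically cluster features using the distance 1 - |correlation|.
distance = 1.0 - np.abs(np.corrcoef(X_train, rowvar=False))
np.fill_diagonal(distance, 0.0)
condensed = squareform(distance, checks=False)
labels = fcluster(linkage(condensed, method="average"), t=4, criterion="maxclust")

# 2. Fit the model once on all features.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
baseline = model.score(X_test, y_test)

# 3. Permute every feature of a cluster together and record the accuracy drop.
rng = np.random.default_rng(0)


def cluster_importance(cluster_id, n_repeats=5):
    columns = np.flatnonzero(labels == cluster_id)
    drops = []
    for _ in range(n_repeats):
        X_perm = X_test.copy()
        shuffled_rows = rng.permutation(len(X_test))
        X_perm[:, columns] = X_test[shuffled_rows][:, columns]
        drops.append(baseline - model.score(X_perm, y_test))
    return float(np.mean(drops))


scores = {int(c): cluster_importance(c) for c in np.unique(labels)}
for cluster_id, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    members = [str(name) for name in data.feature_names[labels == cluster_id]]
    print(cluster_id, round(score, 3), members)
```

Because substitutes are shuffled together, the model cannot fall back on a correlated feature inside the same cluster, which is the intuition for why cluster-level scores are more stable than feature-level ones.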

Closer to our financial focus, we also applied cMDA to a public dataset with features that may be useful for predicting S&P 500 index excess monthly returns. The two clusters found are:

| Topic | Cluster Scores | Cluster Rank | Features |
|---|---|---|---|
| Fundamental | 0.667 | 1 | d/p, d/y, e/p, b/m, ntis, tbl, lty, dfy, dfr, infl |
| Technical | 0.333 | 2 | d/e, svar, ltr, tms |



The two clusters can clearly be interpreted as fundamental vs technical indicators, and their rankings don't change: fundamental indicators are always found to be more important than technical indicators in all 100 runs with different random seeds.

Finally, we apply this technique to our proprietary features for predicting the success of our Tail Reaper strategy. Again, the top 2 clusters are highly interpretable, and never change with random seeds. (Since these are proprietary features, we omit displaying them.) 

If we select only those clearly interpretable, top clusters of features as input to training our random forest, we find that their out-of-sample predictive performances are also improved in many cases. For example, the accuracy of the S&P 500 monthly returns model improves from 0.517 to 0.583 when we use cMDA instead of MDA, while the AUC score improves from 0.716 to 0.779.

 

S&P 500 monthly returns prediction

| | F1 | AUC | Acc |
|---|---|---|---|
| cMDA | 0.576 | 0.779 | 0.583 |
| MDA | 0.508 | 0.716 | 0.517 |
| Full | 0.167 | 0.467 | 0.333 |


Meanwhile, the accuracy of the Tail Reaper metalabeling model improves from 0.529 to 0.614 when we use cMDA instead of MDA and select all clustered features with above-average importance scores, while the AUC score improves from 0.537 to 0.672.

 

| | F1 | AUC | Acc |
|---|---|---|---|
| cMDA | 0.658 | 0.672 | 0.614 |
| MDA | 0.602 | 0.537 | 0.529 |
| Full | 0.481 | 0.416 | 0.414 |

This added bonus of improved predictive performance is a by-product of capturing all the important, interpretable features, while removing most of the unimportant, uninterpretable features. 
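The selection rule mentioned above for the Tail Reaper model, keeping every cluster whose importance score is above average, can be written in a few lines. The cluster names, scores, and feature names below are hypothetical placeholders, not values from our data.

```python
from statistics import mean
from typing import Dict, List


def select_features(
    cluster_scores: Dict[str, float], cluster_features: Dict[str, List[str]]
) -> List[str]:
    """Keep all features from clusters scoring above the average cluster score."""
    threshold = mean(cluster_scores.values())
    selected = []
    for name, score in cluster_scores.items():
        if score > threshold:
            selected.extend(cluster_features[name])
    return selected


# Hypothetical clusters for illustration only.
cluster_scores = {"A": 0.5, "B": 0.3, "C": 0.2}
cluster_features = {"A": ["f1", "f2"], "B": ["f3"], "C": ["f4"]}
print(select_features(cluster_scores, cluster_features))
```

The selected feature list is then used as the input to retrain the model, replacing the full feature set.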

You can try out this hierarchical cluster-based feature selection for free on our financial machine learning SaaS predictnow.ai. You can use the no-code version, or ask for our API. Details of our methodology can be found here.

Industry News

  1. Jay Dawani recently published a very readable, comprehensive guide to deep learning "Hands-On Mathematics for Deep Learning".
  2. Tradetron.tech is a new algo strategy marketplace that allows one to build algo strategies without coding and others to subscribe to them and take trades in their own linked brokerage accounts automatically. It can handle complex strategies such as arbitrage and options strategies. Currently some 400 algos are on offer.
  3. Jonathan Landy, a Caltech physicist, together with 3 of his physicist friends, has started a deep data science and machine learning blog with special emphasis on finance.