Friday, July 22, 2022

The demise of Zillow Offers: it is not AI's fault!

The story is now familiar: Zillow Group built an AI-based home price prediction system in order to become a market maker in the housing industry. As a market maker, the goal is simply to buy low and sell high, quickly, and with minimal transaction cost. Backtests showed that its AI model's predictive accuracy was over 96% (Hat tip: Peter U., for that article). In reality, though, it lost half a billion dollars.

This is a cautionary tale for anyone using AI to predict prices or returns, including those of us who trade in more liquid markets than housing. Although Zillow failed, the root cause of the discrepancy between its backtest and its live market-making results is well known, and it has nothing to do with machine learning or AI. The failure was due to adverse selection, which can happen to any market maker, whether human or machine. In this context, "market maker" is used in a broad sense: a market maker provides liquidity to the market using limit orders. For instance, any mean-reversion trader is a market maker. As long as the market maker is trading against a counterparty who has more information (a.k.a. the "informed trader"), adverse selection will take money away from the market maker and give it to the informed trader. This is because a market maker's only model is to buy when prices are cheap, no matter why they are cheap. In contrast, informed traders may know why the asset is cheap and whether it will get cheaper, so they are happy to sell to a market maker. In the opposite situation, if informed traders believe that the current price is cheap but will go higher, they will refrain from selling. In this case, the market maker's limit order will not get executed, and the market maker suffers an "opportunity cost". In Zillow's case, the informed traders are the homeowners, who have a better understanding of the value of their own homes due to qualitative factors (e.g. views, interior design, neighborhood safety, etc.) outside of Zillow's model.

In my book Machine Trading, I wrote, "Adverse selection happens when prices on average go down after we buy something, and go up when we sell something". Therefore, adverse selection can be measured quite easily by computing the difference between the (paper) P&L of unfilled orders and the P&L of filled orders over a short time frame. To determine whether your AI predictive model will work in reality, it is ideal to deploy it live in a small capacity and measure these differences over time. If there is significant adverse selection, the trader can always choose not to participate in the market. For example, high frequency traders famously stopped providing liquidity to the market during extreme events such as flash crashes. Traders don't want to be the suckers at the game. Unfortunately for Zillow, it wasn't aware of the well-practiced art of market making.
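A minimal sketch of that measurement, assuming each limit order is logged with its side, limit price, a fill flag, and the price observed a short, fixed horizon later (the `LimitOrder` record and its field names are hypothetical). The P&L is a per-unit paper P&L at the limit price; a clearly positive result means filled orders did worse than the orders the market declined to trade with us, which is the signature of adverse selection.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class LimitOrder:
    side: int            # +1 for a buy order, -1 for a sell order
    limit_price: float   # price at which we offered to trade
    filled: bool         # whether the order was executed
    later_price: float   # price a short, fixed horizon after the order (hypothetical field)

def order_pnl(order: LimitOrder) -> float:
    """Per-unit (paper) P&L if the order is, or had been, filled at its limit price."""
    return order.side * (order.later_price - order.limit_price)

def adverse_selection(orders: list[LimitOrder]) -> float:
    """Average paper P&L of unfilled orders minus average P&L of filled orders."""
    filled = [order_pnl(o) for o in orders if o.filled]
    unfilled = [order_pnl(o) for o in orders if not o.filled]
    if not filled or not unfilled:
        raise ValueError("need both filled and unfilled orders")
    return mean(unfilled) - mean(filled)

orders = [
    LimitOrder(side=+1, limit_price=100.0, filled=True, later_price=99.0),    # bought, then price fell
    LimitOrder(side=+1, limit_price=100.0, filled=False, later_price=101.5),  # missed buy, price rose
    LimitOrder(side=-1, limit_price=101.0, filled=True, later_price=102.0),   # sold, then price rose
    LimitOrder(side=-1, limit_price=101.0, filled=False, later_price=100.5),  # missed sell, price fell
]
print(adverse_selection(orders))  # 2.0
```

The toy orders are made up purely to show the sign convention: both fills lost money, while both unfilled orders would have made money.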

Another common way to reduce adverse selection is to keep close tabs on your inventory. If, in a short period of time, inventory suddenly changes significantly compared to its average trend, that may indicate new information arriving in the market that you are not aware of (e.g. mortgage rates going up by 1%). In this situation, it would be wise to cancel your limit orders until the coast is clear. For a mathematical treatment of this concept, see the formulation by Avellaneda and Stoikov. Inventory management was a key technique that Zillow did not adopt, and it could have minimized their adverse selection risk.
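One simple way to operationalize "inventory suddenly changes significantly compared to its average trend" is a rolling z-score of inventory changes. This is a hedged sketch, not the Avellaneda-Stoikov model itself: it assumes evenly spaced inventory snapshots, and the window length and 3-sigma threshold are illustrative choices rather than values from the post.

```python
from statistics import mean, stdev

def inventory_alarm(inventory: list[float], window: int = 20, threshold: float = 3.0) -> list[bool]:
    """Flag periods whose inventory change is abnormal relative to recent changes.

    Returns one flag per inventory change (len(inventory) - 1 flags). A flag is raised
    when the latest change lies more than `threshold` standard deviations from the mean
    of the previous `window` changes; with fewer than 2 past changes, no flag is raised.
    """
    changes = [b - a for a, b in zip(inventory, inventory[1:])]
    flags = []
    for i, change in enumerate(changes):
        history = changes[max(0, i - window):i]
        if len(history) < 2:
            flags.append(False)
            continue
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            flags.append(change != mu)  # any deviation from a perfectly steady pattern
        else:
            flags.append(abs(change - mu) > threshold * sigma)
    return flags
```

When the alarm fires, a market maker following the advice above would cancel resting limit orders until inventory changes look normal again.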

AI has been a major asset in numerous business processes, including market making, but it is just one part of complex production machinery. As we can see from Zillow's case, predictions, even accurate ones, are not enough to generate profits. As I explained in my previous blog post, we don't think that AI is the be-all and end-all of decision making. Instead, we believe the value of AI lies in its ability to correct human-made decisions. But an even larger lesson here is that experts in one industry (e.g. housing) can benefit from the knowledge of experts in another (e.g. quantitative finance). This transdisciplinary knowledge is exactly what enables enterprises to improve and enhance their processes.

Friday, January 28, 2022

800+ New Crypto Features

By Quentin Viville, Sudarshan Sawal, and Ernest Chan

We are excited to announce that we're expanding our feature zoo to cover crypto features! This follows our work on US stock features, and features based on options activities, ETFs, futures, and macroeconomic indicators. To read more on our previous work, click here. These new crypto features can be used as input to our machine-learning API to help improve your trading strategy. In this blog post we outline the new crypto features and demonstrate how we have used them for short-term alpha generation and crypto portfolio optimization.

Our new crypto features are designed to capture market activity from subtle movements to large overarching trends. These features quantify the variations of the price, the return, the order flow, the volatility and the correlations that appear among them.

To create these features, we first constructed the Base Features using raw market data that includes microstructure information. Next, we applied simple mathematical functions such as exponential moving averages to create the Final Features.

Base Features

The Base Features are constructed using Binance’s dollar bar data, which includes:

  • Open
  • High
  • Low
  • Close
  • Volume
  • Order flow (sum of signed volumes)
    • Positive volume for trades with a buy aggressor tag and negative volume for trades with a sell aggressor tag
  • Buy market order value (sum of volumes corresponding to buy aggressor tag)
  • Sell market order value (sum of volumes corresponding to sell aggressor tag)
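The post uses Binance's dollar bars as given. As a minimal sketch of how such bars and the order-flow fields above could be built from raw trades, assuming each trade carries a price, a quantity, and an aggressor flag (the `Trade` type, the field names and the dollar threshold are hypothetical), following the text in summing volumes rather than dollar values for the buy/sell fields:

```python
from dataclasses import dataclass

@dataclass
class Trade:
    price: float
    qty: float
    buyer_is_aggressor: bool  # True if the trade was initiated by a buy market order

def dollar_bars(trades: list[Trade], dollar_threshold: float) -> list[dict]:
    """Group consecutive trades into bars holding at least `dollar_threshold` of traded value."""
    bars, current, dollars = [], [], 0.0
    for t in trades:
        current.append(t)
        dollars += t.price * t.qty
        if dollars >= dollar_threshold:
            bars.append(summarize(current))
            current, dollars = [], 0.0
    return bars  # a trailing, incomplete bar is discarded

def summarize(trades: list[Trade]) -> dict:
    """OHLCV plus order-flow fields for one bar."""
    prices = [t.price for t in trades]
    buy_vol = sum(t.qty for t in trades if t.buyer_is_aggressor)
    sell_vol = sum(t.qty for t in trades if not t.buyer_is_aggressor)
    return {
        "open": prices[0], "high": max(prices), "low": min(prices), "close": prices[-1],
        "volume": buy_vol + sell_vol,
        "order_flow": buy_vol - sell_vol,  # +ve for buy aggressors, -ve for sell aggressors
        "buy_market_volume": buy_vol,
        "sell_market_volume": sell_vol,
    }
```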

Base Features are based on:

  1. Relations between the open, high, low and close prices.
    • Relative High: High Price relative to Open Price.
    • Relative Low: Low Price relative to Open Price.
    • Relative Close: Close Price relative to Open Price.
    • Relative Volume: Buy orders relative to total absolute volume.
    • Target Effort: an estimate of the "effort" the price has to make to reach the target price, obtained by comparing the observed low and high prices.
  2. Volume exchanged.
    • Dollar Speed: Average signed quantity of dollars exchanged per second.
  3. Relations and potential correlations among the variations of the price, the order flow and the intensity of the activity in the market.
    • Kyle’s Lambda: Relation between price change and order flow.
    • SCOF: Correlation of Order Flow with its lagged series.
    • VPIN: Volume-synchronized probability of informed trading. 
  4. Volatility observed.
    • VLT: Volatility of the returns (Exponentially Weighted)
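The post does not spell out formulas for these Base Features, so the following is only a hedged sketch of common textbook definitions applied to a sequence of bars: relative prices as ratios to the open minus 1, Relative Volume as buy volume over total volume, Kyle's lambda as the least-squares slope of price changes on order flow, SCOF as the lag-1 autocorrelation of order flow, and VLT as an exponentially weighted standard deviation of returns. Target Effort, Dollar Speed and VPIN are omitted because their exact definitions are not given.

```python
import numpy as np

def relative_prices(open_, high, low, close):
    """High, low and close relative to the open of the same bar (as returns)."""
    o = np.asarray(open_, dtype=float)
    return np.asarray(high) / o - 1, np.asarray(low) / o - 1, np.asarray(close) / o - 1

def relative_volume(buy_volume, total_volume):
    """Share of total (absolute) volume initiated by buy market orders."""
    return np.asarray(buy_volume, dtype=float) / np.asarray(total_volume, dtype=float)

def kyles_lambda(close, order_flow):
    """Least-squares slope of bar-to-bar price changes on the order flow of the same bar."""
    dp = np.diff(np.asarray(close, dtype=float))
    q = np.asarray(order_flow, dtype=float)[1:]
    q_c = q - q.mean()
    return float(np.dot(q_c, dp - dp.mean()) / np.dot(q_c, q_c))

def scof(order_flow, lag=1):
    """Serial correlation of order flow with its own lagged series."""
    q = np.asarray(order_flow, dtype=float)
    return float(np.corrcoef(q[lag:], q[:-lag])[0, 1])

def ewm_volatility(close, span=20):
    """Exponentially weighted standard deviation of simple returns (latest value)."""
    c = np.asarray(close, dtype=float)
    r = np.diff(c) / c[:-1]
    alpha = 2.0 / (span + 1.0)
    weights = (1 - alpha) ** np.arange(len(r))[::-1]  # most recent return gets weight 1
    mu = np.average(r, weights=weights)
    return float(np.sqrt(np.average((r - mu) ** 2, weights=weights)))
```

Each function would be evaluated over a rolling lookback of bars, which is where the "time span" described next comes in.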

Each feature is associated with a ‘time span’, or lookback period, which helps capture market activity across multiple time frames.

Final Features

Once we had generated the Base Features, we derived a new, varied set of features called the Final Features. These Final Features are transformations of the Base Features into exponential moving averages and probabilities over many time periods.

This approach has allowed us to produce a large set of Final Features (879 features to be exact), which can capture and quantify the activity of the market within any time span we choose.
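As a hedged sketch of the moving-average part of this transformation (the spans are illustrative, since the post does not list them, and the "probabilities" transformation is not shown), every base feature series can be expanded into one EMA per span:

```python
import numpy as np

def ema(x, span):
    """Exponential moving average with smoothing factor alpha = 2 / (span + 1)."""
    x = np.asarray(x, dtype=float)
    alpha = 2.0 / (span + 1.0)
    out = np.empty_like(x)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = alpha * x[t] + (1 - alpha) * out[t - 1]
    return out

def final_features(base_features: dict, spans=(5, 20, 60)) -> dict:
    """Turn every base feature series into one EMA feature per span."""
    return {
        f"{name}_ema{span}": ema(series, span)
        for name, series in base_features.items()
        for span in spans
    }
```

With N base features and S spans this yields N × S derived features, which is how a modest set of base features can grow into hundreds.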

Applications to Short Term Alpha Generation

Our core functionality is metalabelling, which assigns a Probability of Profit to every trade of an existing strategy (or to a future time period of an existing portfolio). This requires us to build a machine learning model using a large number of input features and a target (label), which would be the trades' (or portfolio's) returns.
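A minimal sketch of the metalabelling idea, assuming a feature matrix with one row per trade of the base strategy and each trade's realized return as the raw target. To stay dependency-light it swaps in a plain logistic regression fitted by gradient descent for whatever model the production system uses; the synthetic data and the 0.5 probability cut-off are illustrative assumptions.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit a logistic regression by batch gradient descent; returns (weights, bias)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y  # derivative of the log-loss with respect to the logit
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def probability_of_profit(X, w, b):
    return 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ w + b)))

# Hypothetical data: features observed at each trade's entry, and the trade's return.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 3))
trade_returns = 0.01 * features[:, 0] + 0.002 * rng.normal(size=500)
labels = (trade_returns > 0).astype(float)  # metalabel: was the trade profitable?

w, b = fit_logistic(features, labels)
prob = probability_of_profit(features, w, b)
take_trade = prob > 0.5  # skip (or downsize) trades with a low probability of profit
```

In practice the model must be evaluated out of sample; the in-sample fit here only illustrates the mechanics of turning trade returns into labels and probabilities.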

To evaluate the performance of the features described above, we first built a base strategy and then applied metalabelling to the signals of that strategy with those features as input. The base strategy is a high frequency strategy which predicts abnormal returns due to unusual order flow. The out-of-sample backtest performance of just the base strategy:

Maximum drawdown: −6.250%

Annualized Sharpe ratio: 3.3

Annualized profit: 32.6% 

Using the Final Features described above as input to metalabelling, we drastically improved the strategy's performance. The improved performance after applying metalabelling:

Maximum drawdown: −4.998%

Annualized Sharpe ratio: 5.6

Annualized profit: 227% 

Comparative plot to give an idea of the metalabelling model’s performance in comparison to the base strategy:

By applying metalabelling with our new crypto features, we increased the Sharpe ratio from 3.1 to 5.6 and the annual returns almost 7x, to 227%.

Applying CPO to Crypto Portfolio

Mean-Variance Optimization (MVO) is a popular method of portfolio optimization that generates the portfolio with the maximum expected return for a given level of risk. One shortcoming of MVO is that the selected portfolio is optimal only on average over the past; this doesn't guarantee that it is optimal in different market regimes. This limitation gives us an opportunity to apply our patent-pending Conditional Parameter Optimization (CPO) technique.
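For reference, the textbook unconstrained mean-variance (maximum Sharpe) solution puts weights proportional to the inverse covariance matrix times the mean return vector. The post does not give its exact MVO setup (constraints, estimation window, risk-free rate), so this sketch is only illustrative; weights are scaled to unit gross exposure because both long and short positions are allowed.

```python
import numpy as np

def mvo_weights(returns: np.ndarray) -> np.ndarray:
    """Unconstrained mean-variance (maximum Sharpe) weights from historical returns.

    `returns` has one row per period and one column per asset. Weights are proportional
    to inverse(covariance) @ mean and scaled to unit gross exposure, so negative
    weights are short positions.
    """
    mu = returns.mean(axis=0)
    cov = np.cov(returns, rowvar=False)
    raw = np.linalg.solve(cov, mu)
    return raw / np.abs(raw).sum()
```

Because the means and covariances are estimated over a fixed past window, the resulting weights are optimal only on average over that window, which is the limitation CPO is meant to address.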

Our CPO technique can be used to improve strategy performance in different market regimes by adapting a trading strategy’s parameters to fit those regimes. Similarly, it can optimize allocations to different constituents of a portfolio in different market regimes. Rather than optimizing based only on the historical means and covariances of a portfolio’s constituents’ returns, CPO involves training a machine learning model with a vast number of external “big data” features to drive the optimization process.

In our next example, we used our crypto features as input. We then compared the Sharpe ratios of a crypto portfolio based on the conventional MVO technique vs our CPO technique on out-of-sample data.

Backtest Result:

  • Portfolios consist of 8 symbols (all crypto perpetual futures): BTCUSDT, ETHUSDT, XRPUSDT, ADAUSDT, EOSUSDT, LTCUSDT, ETCUSDT, XLMUSDT
  • Positions can be long or short
  • The target variable is the forward Sharpe ratio, computed as the 3-hour return divided by the standard deviation of the consecutive 5-minute returns within that 3-hour period
  • The out-of-sample test data set runs from January 2020 to June 2021
  • Results (annualized Sharpe ratio over 365 days per year):

  • CPO improves the Sharpe ratio by a factor of 3.8!
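A sketch of the target variable as described in the list, assuming a series of 5-minute close prices and simple returns (the post does not say whether simple or log returns are used). Three hours contain 36 five-minute returns.

```python
import numpy as np

def forward_sharpe(prices_5min: np.ndarray, start: int, bars: int = 36) -> float:
    """Forward 3-hour return divided by the std. dev. of the 5-minute returns in that window.

    `start` is the index of the price at which the window begins; the window covers
    prices[start : start + bars + 1], i.e. `bars` consecutive 5-minute returns.
    """
    window = np.asarray(prices_5min[start : start + bars + 1], dtype=float)
    if len(window) < bars + 1:
        raise ValueError("not enough data after `start`")
    three_hour_return = window[-1] / window[0] - 1
    five_min_returns = np.diff(window) / window[:-1]
    return float(three_hour_return / five_min_returns.std(ddof=1))
```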


We have demonstrated that our new crypto features are powerful additions to any crypto trader's or investor's toolkit by applying them to a crypto trading strategy in live deployment, and to optimizing a crypto portfolio using our proprietary CPO technique. Combined with our machine learning software, our features have been shown to increase a base trading strategy's annual returns almost 7x and a crypto portfolio's Sharpe ratio 3.8x over MVO. Additionally, with our Explainable AI function using our feature selection methodology, we've removed the guesswork so you'll know exactly which of our new crypto features are important to improving your strategy.

To sign up for a free trial to experiment with these new features using our API or to explore our machine learning software please click here. Institutional investors can also inquire about subscribing to our trading signals from our crypto strategy or to updates from our dynamically optimized long-short crypto portfolio.

If you have any questions or would like to work with us, please email us at:

Wednesday, September 22, 2021

Welcome to Our Feature Zoo with 600+ features!

 By Akshay Nautiyal and Ernest Chan

This has been a summer of feature engineering for us. First, we launched the US stock cross-sectional features and the time-series market-wide features. Now we have launched features based on options activities, ETFs, futures, and macroeconomic indicators. In total, we are now offering 616 ready-made features to our subscribers.

There is a lot to read here. If you would rather join our October 1, 12pm EST webinar where Ernie and I will discuss these factors / features and answer your questions, please sign up here.

NOPE - Net options pricing effect - is a normalized measure of the net delta imbalance between the put and call options of a traded instrument across its entire option chain, calculated at the market close for contracts of all maturities. This indicator was invented by Lily Francus (Twitter: @nope_its_lily) and is normalized by the total traded volume of the underlying instrument. The imbalance estimates the amount of delta hedging market makers need to do to keep their positions delta-neutral. This hedging causes price movement in the underlying, which NOPE should ideally capture. The data were sourced from Delta Neutral, and the instrument we applied it to was SPY ETF options. SPX index options weren't used because the daily traded volume of the underlying SPX index "stock" is not meaningful: it would have to be calculated as the traded volume of the index's constituents.
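A hedged sketch of the NOPE calculation, assuming per-contract daily volumes and closing deltas for the whole chain (puts carrying negative deltas) and the standard 100-share contract multiplier. The parallel-list layout is hypothetical, and Delta Neutral's exact implementation may differ.

```python
def nope(option_volumes, option_deltas, underlying_volume, multiplier=100):
    """Net option delta traded, in shares, per share of underlying volume.

    option_volumes and option_deltas are parallel sequences covering every contract
    (all strikes and maturities) in the chain; put deltas are negative, so put and
    call volume offset each other.
    """
    net_delta_shares = sum(v * d * multiplier for v, d in zip(option_volumes, option_deltas))
    return net_delta_shares / underlying_volume

# 1,000 calls at delta 0.5 and 500 puts at delta -0.4 against 1,000,000 shares traded.
print(round(nope([1000, 500], [0.5, -0.4], 1_000_000), 6))  # 0.03
```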

Canary - an indicator that acts like a canary in a coal mine, raising an alarm when there's impending danger. This indicator comes from the dual momentum strategies of Vigilant and Defensive Asset Allocation. The canary value can be 0, 1 or 2. It is a daily count of how many of the two canary ETFs, one bond and one stock, have negative absolute momentum: 1) BND - Vanguard Total Bond Market ETF; 2) VWO - Vanguard Emerging Markets Stock Index Fund ETF. The momentum is calculated using the 13612W method, a weighted average of the ETF's percentage price changes over the last 1 month, 3 months, 6 months and 1 year, with weights 12, 4, 2 and 1 respectively. In the paper, the value "0", "1" or "2" represents how many of the canary assets are bearish, and it determines what proportion of the asset portfolio is allocated to global risky assets (equity, bond and commodity ETFs) and what proportion to cash. For example, a "2" implies 100% cash or cash equivalents, while a "0" implies 100% allocation to global risky assets. A value of "1" implies 50% allocation to global risky assets and 50% to cash.
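A minimal sketch of the canary computation, assuming a list of month-end prices with the latest price last (at least 13 entries). For a daily update, the same function would be fed prices sampled 1, 3, 6 and 12 months back from the current day. Any positive rescaling of 13612W leaves its sign, which is all the canary uses, unchanged.

```python
def momentum_13612w(monthly_prices: list[float]) -> float:
    """13612W momentum: 12*r1 + 4*r3 + 2*r6 + 1*r12, where rN is the N-month return."""
    p0 = monthly_prices[-1]

    def ret(n: int) -> float:
        return p0 / monthly_prices[-1 - n] - 1

    return 12 * ret(1) + 4 * ret(3) + 2 * ret(6) + ret(12)

def canary(bnd_prices: list[float], vwo_prices: list[float]) -> int:
    """Number of canary assets (BND, VWO) with negative absolute momentum: 0, 1 or 2."""
    return sum(momentum_13612w(p) < 0 for p in (bnd_prices, vwo_prices))

def risky_allocation(canary_value: int) -> float:
    """Share of the portfolio in global risky assets: 0 -> 100%, 1 -> 50%, 2 -> 0% (cash)."""
    return 1.0 - canary_value / 2
```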

Carry - “Carry”,  defined a carry feature as, “the return on a futures position when the price stays constant over the holding period”. (It is also called “roll yield” or “convenience yield”.) We calculate carry for 1) global equities - calculated as a ratio of expected dividend and daily close prices; 2) SPX futures - calculated from price of front month SPX futures contract and spot price of the index; and3) Currency -  calculated from the two nearest months futures data.

Macro factors - derived from global macroeconomic data for the US and 12 other major economies, sourced from either FactSet or FRED. The factors offered are:

1) US market index adjusted for inflation and money supply - the S&P 500 adjusted for CPI, PCE, M1 and M2. These tell us whether the market index is "inflated", i.e. bubbled up by increasing money supply or rising prices. All these features are daily percentage changes, to make them stationary.

2) Principal components of continuous maturity bond data

Pricing factors can be extracted as the principal components of the cross-section of Treasury yields, i.e. these factors are linear combinations of the Treasury yields. The first three PCs are prime candidates because they generally explain over 99% of the variability in the term structure of bond yields and, based on their loadings, may be interpreted as the level, slope and curvature factors. More can be explored in the paper, Equity tail risk in the treasury bond market.
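A minimal sketch of the extraction, assuming a matrix of daily yield changes with one column per maturity; it uses NumPy's SVD rather than any particular library's PCA class. As with any PCA, the sign of each component is arbitrary.

```python
import numpy as np

def yield_curve_pcs(yield_changes: np.ndarray, n_components: int = 3):
    """Principal components of yield changes (rows = days, columns = maturities).

    Returns (factor time series, loadings, explained variance ratios). On real Treasury
    data the first three loadings typically look like level (all the same sign),
    slope (one sign change across maturities) and curvature.
    """
    centered = yield_changes - yield_changes.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    loadings = vt[:n_components]      # one row per component
    factors = centered @ loadings.T   # daily factor values
    return factors, loadings, explained[:n_components]
```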

3) Common sovereign ratios (calculated month-on-month and year-on-year):

  1. Sovereign debt normalised by GDP

  2. Foreign exchange normalised by GDP

  3. Government spending normalised by GDP

  4. Current account balance normalised by GDP

  5. Government budget balance normalised by GDP

  6. Labour force as a percentage of population

4) Fixed income term premia - the term premium is the compensation a bondholder receives for the risk that short-term interest rates deviate from their expected path. The data are sourced from FRED. The methodology of the term structure model used to calculate the term premia is covered in the paper, Three-Factor Nominal Term Structure Model. All the term premia features are daily percentage changes, to make them stationary.

5) Features that are calculated as month-on-month and year-on-year percentage changes: 

  1. Current Account Balance - the percentage change in a country’s international transactions with other countries. 

  2. Exports - the percentage change in a country’s exports to other countries.

  3. Industrial production - the percentage change in a country’s output by industrial sector. 

  4. Imports - the percentage change in a country’s imports from other countries. 

  5. Money supply - the percentage change in a country’s M2 money supply. 

  6. Retail Sales Index - the percentage change in a country’s demand for durable and non-durable goods.

  7. Employment - the percentage change in a country’s employment numbers.

  8. Housing Starts - the percentage change in a country’s new residential construction projects. 

  9. Trade balance - the percentage change in a country’s net sum of imports and exports. 

  10. Unemployment rate - the percentage change in the share of a country’s labour force that is jobless.

  11. Labour force - the percentage change in a country’s active labour force. 

  12. Foreign Exchange Reserves - the percentage change in a country’s forex reserves. 

  13. Consumer Price Index - the percentage change in a country’s CPI inflation measure.

  14. Wholesale Price Index - the percentage change in a country’s WPI inflation measure. 
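Both kinds of change are simple lags of the same series, as in this sketch. It assumes a complete, evenly spaced monthly series with no gaps; for the quarter-on-quarter features in the next group, the same `pct_change` with lag 1 would be applied to a quarterly series.

```python
from typing import Optional

def pct_change(series: list[float], lag: int) -> list[Optional[float]]:
    """Percentage change versus the value `lag` periods earlier (None when unavailable)."""
    return [
        None if i < lag else (series[i] / series[i - lag] - 1) * 100
        for i in range(len(series))
    ]

def mom_yoy(monthly: list[float]):
    """Month-on-month (lag 1) and year-on-year (lag 12) percentage changes."""
    return pct_change(monthly, 1), pct_change(monthly, 12)
```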

6) Features that are calculated as quarter-on-quarter changes:

  1. Government Spending - the percentage change in a country’s government spending.

  2. Fixed Investment - the percentage change in a country’s investment in fixed assets.

  3. Personal Consumption Expenditure - the percentage change in a country’s household expenditures.

  4. Government debt - the percentage change in a country’s government debt.

  5. Gross Domestic product - the percentage change in a country’s gross domestic product.

  6. Real Gross Domestic Product - the percentage change in a country’s GDP adjusted for inflation.

  7. GDP Price deflator - the percentage change in a country’s price levels.

7) Seasonally adjusted features - calculated using additive seasonal decomposition to break each series into trend, seasonal and noise components. Only the trend is kept, giving a seasonally adjusted signal. After seasonal adjustment, we calculate the month-on-month and year-on-year changes.

a) Seasonally adjusted Employment 

b) Seasonally adjusted  Retail Sales Index 

c) Seasonally adjusted Housing Starts 
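A minimal sketch of the trend extraction for a monthly series, assuming the classical additive decomposition, where the trend is a centered 2 × 12 moving average (the usual choice for monthly data; the post does not specify its implementation, and this version assumes an even period). The first and last six months get no trend value; month-on-month and year-on-year changes would then be taken on the returned trend.

```python
import numpy as np

def additive_trend(monthly: np.ndarray, period: int = 12) -> np.ndarray:
    """Trend component of a classical additive decomposition (centered moving average).

    For an even period this is a 2 x period moving average: weight 1/(2*period) on the
    two end points and 1/period on the points in between. Entries without a full
    window are NaN.
    """
    x = np.asarray(monthly, dtype=float)
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    half = period // 2
    trend = np.full_like(x, np.nan)
    for t in range(half, len(x) - half):
        trend[t] = np.dot(weights, x[t - half : t + half + 1])
    return trend
```

A 12-month seasonal pattern that sums to zero over a year cancels exactly under these weights, which is why only trend (plus smoothed noise) survives.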

8) Total credit to the non-financial sector -

A measure of the credit extended to the non-financial sector in selected developed economies. This is a leading indicator and can inform us about future movements in indicators such as gross domestic product. We calculate the quarter-on-quarter change for these features.

9) Treasury interest rate spreads - spreads between sovereign yields of different maturities, in various combinations. These measure the slopes of the yield curves. Read more about the difference between term spread and term premium here.

10) Retail inventory-to-sales ratio - the ratio of retailers' inventories of durable and non-durable goods to their sales. This can forecast changes in gross domestic product. We calculate the month-on-month change for these features.

11) Fed funds rate - daily percentage change in the overnight interbank rate at which banks lend and borrow reserves held in excess of their reserve requirements. The FOMC decides on rate adjustments based on key economic indicators that may show signs of inflation, recession, or other issues that can affect sustainable economic growth.



The underlying reason for the price movement of an asset is the imbalance of buyers and sellers. An onslaught of market sell orders portends a decrease in price, and vice versa.


Order flow is the signed transaction volume, aggregated over a period of time and over the many transactions in that period to create a more robust measure. It is positively correlated with price movement. This feature is calculated using tick data from Algoseek with aggressor tags (which flag each trade as a buy or sell market order). The data are time-stamped to the millisecond. We aggregate the tick-based order flow to form order flow per minute.

An example: 

The order flow feature with time stamp 10:01 am considers trades from 10:00:00 am to 10:00:59 am:

Time          Trade Size    Aggressor Tag
10:00:01 am   1             B
10:00:03 am   4             S
10:00:09 am   2             B
10:00:19 am   1             S
10:00:37 am   5             S
10:00:59 am   2             S

The order flow would be 1 - 4 + 2 - 1 - 5 - 2 = -9.

This would be reflected in our feature as Time: 10:01, Order flow: -9.
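The per-minute aggregation in this example can be sketched as follows, assuming ticks arrive as (timestamp, size, aggressor tag) tuples and that each minute is labeled with the timestamp of the following minute, as in the 10:01 example. The trading date is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def order_flow_per_minute(ticks):
    """Sum signed trade sizes per minute: +size for 'B' (buyer-initiated) trades and
    -size for 'S' (seller-initiated) trades. Trades from hh:mm:00 to hh:mm:59.999 are
    labeled with the next minute's timestamp."""
    flow = defaultdict(int)
    for ts, size, tag in ticks:
        sign = 1 if tag == "B" else -1
        label = ts.replace(second=0, microsecond=0) + timedelta(minutes=1)
        flow[label] += sign * size
    return dict(flow)

day = datetime(2021, 9, 1)  # hypothetical trading day
ticks = [
    (day.replace(hour=10, minute=0, second=1), 1, "B"),
    (day.replace(hour=10, minute=0, second=3), 4, "S"),
    (day.replace(hour=10, minute=0, second=9), 2, "B"),
    (day.replace(hour=10, minute=0, second=19), 1, "S"),
    (day.replace(hour=10, minute=0, second=37), 5, "S"),
    (day.replace(hour=10, minute=0, second=59), 2, "S"),
]
result = order_flow_per_minute(ticks)
print(result[day.replace(hour=10, minute=1)])  # -9
```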


With the 616 features we have developed for our subscribers, applying machine learning to risk management and portfolio optimization is easier than ever, especially given our built-in financial machine learning API. Our feature importance ranking and selection function can indicate which of our features are most important for predicting a user's portfolio or strategy returns, so there's no need to spend hours deciding which features to include. Ideally, users will also merge them with their own proprietary features to improve predictive accuracy. If you have any questions or would like to learn more about these features, download our detailed user manual here, or book a live demo and chat with one of our consultants here.

Wednesday, July 14, 2021

Metalabeling and the duality between cross-sectional and time-series factors

By Ernest Chan and Akshay Nautiyal

Features are inputs to supervised machine learning (ML) models. In traditional finance, they are typically called “factors”, and they are used in linear regression models to either explain or predict returns. In the former usage, the factors are contemporaneous with the target returns, while in the latter the factors must be from a prior period.

There are generally two types of factors: cross-sectional vs time-series. If you are modeling stock returns, cross-sectional factors are variables that are specific to an individual stock, such as its earnings yield, dividend yield, etc. In our previous blog post, we described how we provide 40 such factors to our subscribers for backtesting and live predictions. But as we advocate using ML for risk management and capital allocation purposes (i.e. metalabeling), not for returns predictions, you may wonder how these factors can help predict the returns of your trading strategy or portfolio. For example, if you have a long-short portfolio of tech stocks such as AAPL, GOOG, AMZN, etc., and want to predict whether the portfolio as a whole will be profitable in a certain market regime, does it really make sense to have the earnings yields of AAPL, GOOG, and AMZN as individual features?

Meanwhile, time-series factors are typically market-wide or macroeconomic variables such as the familiar Fama-French three factors: market (simply, the market index return), SMB (the relative return of small cap vs large cap stocks), and HML (the relative return of value vs growth stocks). These time-series factors are eminently suitable for metalabeling, because they can be used to predict your portfolio or strategy’s returns.

Given that there are many more obvious cross-sectional factors than time-series factors available, it seems a pity that we cannot use cross-sectional factors as features for metalabeling. Actually, we can –  Eugene Fama and Ken French themselves showed us how. If we have a cross-sectional factor on a stock, all we need to do is to use it to rank the stocks, form a long-short portfolio using the rankings, and use the returns of this portfolio as a time-series factor. The long-short portfolio is called a hedge portfolio.

We illustrate how to create a hedge portfolio with an example, starting with Sharadar’s fundamental cross-sectional factors (which we generated as shown in the blog). There are 40 cross-sectional factors, updated at three different frequencies: quarterly, yearly and trailing twelve months. In this exercise, however, we use only the quarterly cross-sectional factors. Given a factor like capex (capital expenditure), we consider the normalized capex (the normalization procedure is described in the previously cited blog post) of approximately 8,500 stocks on particular dates from January 1st, 2010 to the present. There are 4 dates of interest every year: January 15th, April 15th, July 15th and October 15th. We call these the ranking dates. On each of these dates we find the percentile rank of each stock based on its normalized capex. The dates are chosen to capture changes in the cross-sectional factors of the maximum number of stocks after their quarterly filings.

Once the capex across stocks is ranked on each of the 4 ranking dates each year, we obtain the stocks in the upper quartile (i.e. ranked above the 75th percentile) and the stocks in the lower quartile (i.e. ranked below the 25th percentile). We take a long position in the stocks with the highest normalized capex and a short position in those with the lowest. Together, these two sets make up our long-short hedge portfolio.

Once we have the portfolio on a given ranking date, we generate its daily returns using risk parity allocation (i.e. allocating in proportion to inverse volatility). The daily returns of each chosen stock are calculated for each day until the next ranking date. The portfolio weights on each day are the normalized inverses of the rolling standard deviations of returns over a two-month window. These weights change daily and are multiplied by the daily returns of the individual stocks to get the daily portfolio returns. If a portfolio stock is delisted between ranking dates, we simply drop it and do not use it to calculate the portfolio returns. The daily returns generated in this process form the capex time-series factor. This process is repeated for all the other Sharadar cross-sectional factors.
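A compact sketch of one ranking period, under stated assumptions: a vector of normalized factor values on the ranking date, a matrix of daily returns, and a two-month window approximated as 42 trading days. Two further choices are assumptions, since the post leaves them open: the inverse-volatility weights are normalized to unit gross exposure across both legs (short leg negative), and a delisted stock is marked by NaN returns and dropped from the day's calculation.

```python
import numpy as np

def hedge_portfolio_returns(factor: np.ndarray, returns: np.ndarray, start: int,
                            end: int, vol_window: int = 42) -> np.ndarray:
    """Daily returns of a long-short quartile portfolio between two ranking dates.

    factor:  normalized factor value per stock on the ranking date (1-D array).
    returns: daily returns, shape (n_days, n_stocks); NaN marks a delisted stock.
    start, end: row of the first day after the ranking date and of the next ranking
    date (exclusive); start must be >= vol_window.
    """
    pct_rank = factor.argsort().argsort() / (len(factor) - 1)  # 0 = lowest, 1 = highest
    side = np.where(pct_rank > 0.75, 1.0, np.where(pct_rank < 0.25, -1.0, 0.0))
    daily = []
    for t in range(start, end):
        past = returns[t - vol_window : t]                       # window ends the day before t
        vol = np.nanstd(past, axis=0, ddof=1)
        alive = (side != 0) & ~np.isnan(returns[t]) & (vol > 0)  # drop delisted names
        inv_vol = np.where(alive, 1.0 / np.where(vol > 0, vol, np.nan), 0.0)
        weights = side * inv_vol / inv_vol.sum()                 # unit gross exposure
        daily.append(np.nansum(weights * returns[t]))
    return np.array(daily)
```

Running this for every pair of consecutive ranking dates and concatenating the outputs gives one time-series factor per cross-sectional factor.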

So, voila! 40 cross-sectional factors become 40 time-series factors, and they can be used for metalabeling any portfolio or trading strategy, whether it trades stocks, futures, FX, or anything at all.

What about the opposite conversion? Can we turn time-series factors into cross-sectional factors suitable for predicting the returns of individual stocks? Actually, there is no need. You can directly add any time-series factor to your feature set for predicting individual stocks' returns. This is equivalent to building a linear factor model with an individual stock’s returns as the dependent variable and the time-series factor as the independent variable, a process well known in traditional finance.

On a side note: besides these 40 time-series (and their corresponding cross-sectional) features, we have compiled an additional 197 proprietary time-series features available to our Premium subscribers, and available via our API.

Thursday, April 01, 2021

Conditional Parameter Optimization: Adapting Parameters to Changing Market Regimes via Machine Learning

Every trader knows that there are market regimes that are favorable to their strategies, and other regimes that are not. Some regimes are obvious, like bull vs bear markets, calm vs choppy markets, etc. These regimes affect many strategies and portfolios (unless they are market-neutral or volatility-neutral portfolios) and are readily observable and identifiable (but perhaps not predictable). Other regimes are more subtle, and may only affect your specific strategy. Regimes may change every day, and they may not be observable. It is often not as simple as saying the market has two regimes, and we are currently in regime 2 instead of 1. For example, with respect to the profitability of your specific strategy, the market may have 5 different regimes. But it is not easy to specify exactly what those 5 regimes are, and which of the 5 we are in today, not to mention predicting which regime we will be in tomorrow. We won’t even know that there are exactly 5!

Regime changes sometimes necessitate a complete change of trading strategy (e.g. trading a mean-reverting instead of a momentum strategy). Other times, traders just need to change the parameters of their existing trading strategy to adapt to a different regime. My colleagues and I have come up with a novel way of adapting the parameters of a trading strategy, a technique we call “Conditional Parameter Optimization” (CPO). This patent-pending invention allows traders to adapt new parameters as frequently as they like, perhaps for every trading day or even every single trade.

CPO uses machine learning to place orders optimally based on changing market conditions (regimes) in any market. Traders in these markets typically already possess a basic trading strategy that decides the timing, pricing, type, and/or size of such orders. This trading strategy will usually have a small number of adjustable trading parameters. Conventionally, they are optimized based on a fixed historical data set (“train set”). Alternatively, they may be periodically reoptimized using an expanding or rolling train set. (The latter is often called “Walk Forward Optimization”.) With a fixed train set, the trading parameters clearly cannot adapt to changing regimes. With an expanding train set, the trading parameters still cannot respond to rapidly changing market conditions because the additional data is but a small fraction of the existing train set. Even with a rolling train set, there is no evidence that the parameters optimized in the most recent historical period give better out-of-sample performance. A rolling train set that is too small will also give unstable and unreliable predictive results given the lack of statistical significance. All these conventional optimization procedures can be called unconditional parameter optimization, as the trading parameters do not intelligently respond to rapidly changing market conditions. Ideally, we would like trading parameters that are much more sensitive to the market conditions and yet are trained on a large enough amount of data.

To address this adaptability problem, we apply a supervised machine learning algorithm (specifically, random forest with boosting) to learn from a large predictor (“feature”) set that captures various aspects of the prevailing market conditions, together with specific values of the trading parameters, to predict the outcome of the trading strategy. (An example outcome is the strategy’s future one-day return.) Once such a machine-learning model is trained to predict the outcome, we can apply it to live trading by feeding in the features that represent the latest market conditions as well as various combinations of the trading parameters. The set of parameters that results in the optimal predicted outcome (e.g., the highest future one-day return) is selected as optimal and adopted for the trading strategy for the next period. The trader can make such predictions and adjust the trading strategy as frequently as needed to respond to rapidly changing market conditions.
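A toy sketch of the fit-then-select mechanics, under loud assumptions: a single hypothetical regime feature, one trading parameter with a three-value grid, synthetic outcomes, and a least-squares model with a feature × parameter interaction term standing in for the random forest used in CPO. It only shows the two steps: fit (features, parameters) → outcome, then score every candidate parameter under today's features and pick the best.

```python
import numpy as np

def design(features: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Columns: intercept, feature, parameter, feature x parameter interaction."""
    return np.column_stack([np.ones_like(features), features, params, features * params])

def fit_outcome_model(features, params, outcomes):
    coef, *_ = np.linalg.lstsq(design(features, params), outcomes, rcond=None)
    return coef

def select_parameter(coef, todays_feature: float, candidate_params):
    """Predict the outcome of every candidate parameter under today's conditions."""
    cands = np.asarray(candidate_params, dtype=float)
    preds = design(np.full_like(cands, todays_feature), cands) @ coef
    return cands[int(np.argmax(preds))], preds

# Hypothetical history in which the best parameter depends on the regime feature's sign.
rng = np.random.default_rng(42)
feat = rng.normal(size=300)
par = rng.choice([1.0, 2.0, 3.0], size=300)
outcome = feat * par + 0.1 * rng.normal(size=300)  # e.g. next-day strategy return

coef = fit_outcome_model(feat, par, outcome)
best_up, _ = select_parameter(coef, 0.8, [1.0, 2.0, 3.0])     # picks 3.0 in this regime
best_down, _ = select_parameter(coef, -0.8, [1.0, 2.0, 3.0])  # picks 1.0 in this regime
```

An unconditional optimization on the same history would pick one parameter for both regimes; the conditional model switches as the feature changes.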

In the example you can download here, I illustrate how we apply CPO using our financial machine learning API to adapt the parameters of a Bollinger Band-based mean reversion strategy on GLD (the gold ETF) and obtain superior results, which I highlight here:

[Table: Annual Return, Sharpe Ratio, and Calmar Ratio under Unconditional Optimization vs. Conditional Optimization]

The CPO technique is useful in industry verticals other than finance as well; after all, optimization under time-varying and stochastic conditions is a very general problem. For example, wait times in a hospital emergency room may be minimized by optimizing various parameters, such as staffing level, equipment and supplies readiness, discharge rate, etc. Current state-of-the-art methods generally find the optimal parameters by looking at what worked best on average in the past. Moreover, there is no mathematical function that exactly determines wait time from these parameters. The CPO technique employs other variables, such as time of day, day of week, season, weather, and whether there have been recent mass events, to predict the wait time under various parameter combinations, and thereby finds the optimal combination under the current conditions to achieve the shortest wait time.

We can provide you with the scripts to run CPO on your own strategy using our API. Please email us for a free trial.

Friday, January 22, 2021

The Amazing Efficacy of Cluster-based Feature Selection

One major impediment to widespread adoption of machine learning (ML) in investment management is its black-box nature: how would you explain to an investor why the machine makes a certain prediction? What's the intuition behind a certain ML trading strategy? How would you explain a major drawdown? This lack of "interpretability" is not just a problem for financial ML; it is a prevalent issue in applying ML to any domain. If you don’t understand the underlying mechanisms of a predictive model, you may not trust its predictions.

Feature importance ranking goes a long way towards providing better interpretability to ML models. The feature importance score indicates how much information a feature contributes when building a supervised learning model. The importance score is calculated for each feature in the dataset, allowing the features to be ranked. The investor can therefore see the most important predictors (features) used in the predictions, and in fact apply "feature selection" to include only those important features in the predictive model. However, as my colleague Nancy Xin Man and I have demonstrated in Man and Chan 2021a, common feature selection algorithms (e.g. MDA, LIME, SHAP) can exhibit high variability in the importance rankings of features: different random seeds often produce vastly different importance rankings. For example, if we run MDA on a given cross-validation set multiple times with different seeds, a feature may be ranked at the top of the list in one run but drop to the bottom in the next. This variability of course eliminates any interpretability benefit of feature selection. Interestingly, despite this variability in importance ranking, feature selection still generally improves out-of-sample predictive performance on multiple data sets that we tested in the above paper. This may be due to the "substitution effect": many alternative (substitute) features can be used to build predictive models with similar predictive power. (In linear regression, the substitution effect is called "collinearity".)
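A quick way to see this seed sensitivity for yourself is sketched below. It assumes scikit-learn's `permutation_importance` as the MDA implementation, a default random forest, and the scikit-learn breast cancer dataset with a single 70/30 split; this is not the exact experimental setup of Man and Chan 2021a, and `mda_ranking` is a hypothetical helper. Each run uses one permutation pass (`n_repeats=1`, overriding scikit-learn's default of 5) and changes only the seed.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0
)


def mda_ranking(seed):
    """Fit a forest and run one permutation (MDA) pass, both seeded with `seed`.

    Returns feature indices sorted from most to least important.
    """
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    result = permutation_importance(model, X_test, y_test, n_repeats=1, random_state=seed)
    return np.argsort(-result.importances_mean)


rankings = [mda_ranking(seed) for seed in range(5)]
top_feature = data.feature_names[rankings[0][0]]
# Rank position of the seed-0 top feature in every run; a wide spread signals instability.
positions = [int(np.where(r == rankings[0][0])[0][0]) for r in rankings]
print(top_feature, positions)
```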

To reduce variability (or what we call instability) in feature importance rankings and to improve interpretability, we found that LIME is generally preferable to SHAP, and definitely preferable to MDA. Another way to reduce instability is to increase the number of iterations during runs of the feature importance algorithms. In a typical implementation of MDA, every feature is permuted multiple times. But standard implementations of LIME and SHAP set the number of iterations to 1 by default, which isn't conducive to stability. In LIME, each instance and its perturbed samples fit only one linear model, but we can perturb them multiple times to fit multiple linear models. In SHAP, we can permute the samples multiple times. Our experiments have shown that the instability of the top-ranked features does approximately converge to some minimum as the number of iterations increases; however, this minimum is not zero. So there remains some residual variability of the top-ranked features, which may be attributable to the substitution effect discussed before.
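The effect of more iterations can be measured along these lines. This sketch covers MDA only (via scikit-learn's `permutation_importance` and its `n_repeats` parameter), not LIME or SHAP. It fixes one trained model, varies only the permutation seed, and scores stability as the mean pairwise Jaccard overlap of the top-5 feature sets; that metric, the dataset, and the helper `top_k_stability` are assumptions chosen for illustration.

```python
import itertools

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0
)
# One fixed model, so any ranking changes come only from the permutations.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)


def top_k_stability(n_repeats, k=5, n_seeds=5):
    """Mean pairwise Jaccard overlap of the top-k MDA features across permutation seeds."""
    top_sets = []
    for seed in range(n_seeds):
        result = permutation_importance(
            model, X_test, y_test, n_repeats=n_repeats, random_state=seed
        )
        top_sets.append(set(np.argsort(-result.importances_mean)[:k].tolist()))
    overlaps = [len(a & b) / len(a | b) for a, b in itertools.combinations(top_sets, 2)]
    return float(np.mean(overlaps))


stability = {n: top_k_stability(n) for n in (1, 10)}
print(stability)  # 1.0 would mean identical top-5 sets under every seed
```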

To further improve interpretability, we want to remove the residual variability. López de Prado, M. (2020) described a clustering method to group together features that are similar and should receive the same importance rankings. This promises to be a great way to remove the substitution effect. In our new paper Man and Chan 2021b, we applied a hierarchical clustering methodology prior to MDA feature selection on the same data sets we studied previously. This method is generally called cMDA. As they say in social media clickbait, the results will (pleasantly) surprise you.
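The two steps of cluster-based MDA can be sketched as: (1) hierarchically cluster the features on a correlation distance, then (2) permute all features of a cluster together and record the drop in out-of-sample accuracy. This is an illustrative sketch, not the code from Man and Chan 2021b or López de Prado (2020): it assumes SciPy average-linkage clustering on the distance 1 − |correlation|, a hypothetical cut at distance 0.5, accuracy as the score, and the breast cancer dataset. The helper `cluster_importance` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
names = list(data.feature_names)
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0
)

# Step 1: hierarchical clustering of features on a correlation distance.
dist = 1.0 - np.abs(np.corrcoef(X_train, rowvar=False))
dist = (dist + dist.T) / 2.0  # enforce exact symmetry
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=0.5, criterion="distance")  # hypothetical cut-off
clusters = {c: np.where(labels == c)[0] for c in np.unique(labels)}

# Step 2: clustered MDA on a held-out set.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
base = model.score(X_test, y_test)
rng = np.random.default_rng(0)


def cluster_importance(cols, n_repeats=10):
    """Mean accuracy drop when all columns in `cols` are shuffled with one shared row permutation."""
    drops = []
    for _ in range(n_repeats):
        X_perm = X_test.copy()
        # A shared permutation keeps within-cluster correlations intact, so a
        # substitute feature in the same cluster cannot mask the loss.
        X_perm[:, cols] = X_test[rng.permutation(len(X_test))][:, cols]
        drops.append(base - model.score(X_perm, y_test))
    return float(np.mean(drops))


scores = {c: cluster_importance(cols) for c, cols in clusters.items()}
for c in sorted(scores, key=scores.get, reverse=True)[:2]:
    print(round(scores[c], 4), [names[i] for i in clusters[c]])
```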

For the benchmark breast cancer dataset, the top two clusters found were:

Cluster rank 1, topic "Geometry summary": 'mean radius', 'mean perimeter', 'mean area', 'mean compactness', 'mean concavity', 'mean concave points', 'radius error', 'perimeter error', 'area error', 'worst radius', 'worst perimeter', 'worst area', 'worst compactness', 'worst concavity', 'worst concave points'

Cluster rank 2, topic "Texture summary": 'mean texture', 'worst texture'

Not only do these clusters have clear interpretations (provided by us as a "Topic"), but their top importance rankings almost never change across 100 random seeds!

Closer to our financial focus, we also applied cMDA to a public dataset with features that may be useful for predicting S&P 500 index excess monthly returns. The two clusters found are:

Cluster rank 1: d/p, d/y, e/p, b/m, ntis, tbl, lty, dfy, dfr, infl

Cluster rank 2: d/e, svar, ltr, tms

The two clusters can clearly be interpreted as fundamental vs technical indicators, and their rankings don't change: fundamental indicators are always found to be more important than technical indicators in all 100 runs with different random seeds.

Finally, we apply this technique to our proprietary features for predicting the success of our Tail Reaper strategy. Again, the top two clusters are highly interpretable, and never change with random seeds. (Since these are proprietary features, we do not display them.)

If we use only these clearly interpretable top clusters of features as inputs for training our random forest, we find that out-of-sample predictive performance also improves in many cases. For example, the accuracy of the S&P 500 monthly returns model improves from 0.517 to 0.583 when we use cMDA instead of MDA, while the AUC score improves from 0.716 to 0.779.

[Table: S&P 500 monthly returns prediction]

Meanwhile, the accuracy of the Tail Reaper metalabeling model improves from 0.529 to 0.614 when we use cMDA instead of MDA and select all clustered features with above-average importance scores, while the AUC score improves from 0.537 to 0.672.
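The selection rule just mentioned (keep every feature in a cluster whose importance score is above the average cluster score, then retrain and measure accuracy and AUC) might look like this. The cluster scores below are made-up numbers for illustration only, not results from our paper; in practice they would come from a cMDA run. The breast cancer dataset stands in for the proprietary Tail Reaper metalabeling data, and `select_above_average` and `evaluate` are hypothetical helpers.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def select_above_average(cluster_scores):
    """cluster_scores maps a cluster id to (feature names, importance score).

    Keep every feature in clusters whose score exceeds the mean cluster score.
    """
    mean_score = np.mean([score for _, score in cluster_scores.values()])
    return [f for feats, score in cluster_scores.values() if score > mean_score for f in feats]


def evaluate(feature_names, seed=0):
    """Retrain a random forest on the selected columns; return (accuracy, AUC) out of sample."""
    data = load_breast_cancer()
    cols = [list(data.feature_names).index(f) for f in feature_names]
    X_train, X_test, y_train, y_test = train_test_split(
        data.data[:, cols], data.target, test_size=0.3, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    return accuracy_score(y_test, model.predict(X_test)), roc_auc_score(y_test, proba)


# Hypothetical cluster scores, for illustration only.
example_scores = {
    1: (["mean radius", "mean perimeter", "mean area"], 0.030),
    2: (["mean texture", "worst texture"], 0.010),
    3: (["mean symmetry"], 0.001),
}
selected = select_above_average(example_scores)
print(selected, evaluate(selected))
```

Selecting whole clusters rather than individual features means a feature is never dropped merely because a near-duplicate happened to absorb its importance in one run.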

This added bonus of improved predictive performance is a by-product of capturing all the important, interpretable features, while removing most of the unimportant, uninterpretable features. 

You can try out this hierarchical cluster-based feature selection for free on our financial machine learning SaaS. You can use the no-code version, or ask for our API. Details of our methodology can be found here.

Industry News

  1. Jay Dawani recently published a very readable, comprehensive guide to deep learning, "Hands-On Mathematics for Deep Learning".
  2. is a new algo strategy marketplace that lets users build algo strategies without coding, and lets others subscribe to them and have the trades taken automatically in their own linked brokerage accounts. It can handle complex strategies such as arbitrage and options strategies. Currently some 400 algos are on offer.
  3. Jonathan Landy, a Caltech physicist, together with three of his physicist friends, has started a deep data science and machine learning blog with a special emphasis on finance.