Saturday, April 23, 2011

The many facets of linear regression

Many years ago, a portfolio manager asked me in a phone interview: "Do you believe that linear or nonlinear models are more powerful in building trading models?" Being a babe-in-the-woods, I did not hesitate in answering "Nonlinear!" Little did I know that this is the question that separates the men from the boys in the realm of quantitative trading. Subsequent experience showed me that nonlinear models have mostly been unmitigated disasters in terms of trading profits. As Max Dama said in a recent excellent article on linear regression: "...when the signal to noise ratio is .05:1, ... there's not much point in worrying about [higher order effects]". One is almost certain to overfit a nonlinear model to non-recurring noise.


Until recently, I had used linear regression mainly to find hedge ratios between two instruments in pair trading, or more generally to find the weightings (in numbers of shares) of individual stocks in a basket in some form of index arbitrage. Of course, others have found linear algebra useful in principal component analysis and, more generally, factor analysis as well. But thanks to a number of commenters on this blog as well as various private correspondents, I have begun to apply linear regression more directly in trading models.


One way to apply linear regression directly to trading is to use it in place of moving averages. Using a moving average implicitly assumes that there is no trend in a price series, i.e. that the mean of the prices will remain the same. This of course may not be true. So using linear regression to project the current equilibrium price is sometimes more accurate than setting it equal to a moving average. I have found that in some cases this equilibrium price results in better mean-reverting models: e.g. short an instrument when its current price is way above the equilibrium price. Of course, one can also use linear regression in a similar way in momentum models: e.g. if the current price is way above the equilibrium price, consider this a "breakout" and buy the instrument.
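For concreteness, here is a minimal Matlab sketch of this idea (the variable names, the 20-bar lookback, and the entry rule are all arbitrary choices for illustration, not recommendations):

lookback=20;                        % arbitrary lookback, to be optimized
T=length(prices);                   % prices: Tx1 column vector of closes
eqPrice=NaN(T,1);
X=[ones(lookback,1) (1:lookback)']; % regressors: intercept and time index
for t=lookback:T
    y=prices(t-lookback+1:t);       % most recent window of prices
    beta=X\y;                       % OLS fit of price = a + b*t
    eqPrice(t)=[1 lookback]*beta;   % fitted value at the latest bar
end
deviation=prices-eqPrice;           % >0: price is above its trend line
% Mean reversion: short when deviation is large and positive.
% Momentum: treat the same condition as a breakout and buy instead.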


Max, in his article referenced above, also pointed out a more sophisticated version of linear regression, commonly called "weighted least squares regression" (WLS). WLS is to linear regression what the exponential moving average (EMA) is to the simple moving average (SMA): it gives more weight to recent data points. Indeed I have found that EMA often gives better results than SMA in trading. However, so far I have not found WLS to be better than ordinary least squares. Max also referenced an article which establishes the equivalence between weighted least squares and the Kalman filter. Now the Kalman filter is a linear model that is very popular among quantitative traders. The nice feature of the Kalman filter is that it has very few free parameters: the model gradually adapts itself to the means and covariances of the input time series. Furthermore, it can do so one step at a time (or in technical jargon, as an "online" algorithm): there is no need to separate the data into "training" and "test" sets, and, unlike with moving averages, no need to define a "lookback" period. It makes use of "hidden states" much like Hidden Markov Models (HMM), but unlike HMM, the Kalman filter is faithfully linear.
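And here is the WLS analogue, reusing the last window X and y from the sketch above, with exponentially decaying weights (lambda is an assumed decay factor with no special significance; lscov is Matlab's built-in weighted least squares):

lambda=0.94;                        % assumed decay factor, to be tuned
w=lambda.^((lookback-1:-1:0)');     % newest data point gets weight 1
betaW=lscov(X, y, w);               % weighted least squares fit
eqPriceW=[1 lookback]*betaW;        % the WLS "equilibrium price"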


I haven't used Kalman filter much myself, but I would welcome any comments from our readers on its usage. Also, if you know of other ways to use linear regression in trading, do share with us here!



87 comments:

Damian said...

I've often heard people refer to the Kalman filter as a T3 moving average - but I've not seen one coded up that didn't include a lookback period.

Here's an implementation in Amibroker - curious what you think.

http://www.wisestocktrader.com/indicators/240-t3-function-include-afl

Ernie Chan said...

Hi Damian,
Thanks for the link. I think the implementation of the T3 Kalman filter is too complicated and ad hoc. The "nonlinear" in "... T3 is a six-pole non-linear Kalman filter ..." is precisely what most of us want to avoid.

In the Kalman implementation referenced by Max (http://www2.econ.iastate.edu/tesfatsi/FLSTemporalDataMining.GMontana2009.pdf), there is a parameter that controls how fast the regression parameter is allowed to change. This can be viewed as some kind of lookback parameter, since the faster it is allowed to change, the shorter the effective lookback period is.

Ernie

Damian said...

Yes, a very different thing... pretty interesting. Thanks for the link.

Matthew said...

The Kalman filtering approach is a really important concept. Two things to keep in mind: first, the filter uses a model to predict the system's next state, so you still have to choose between linear and non-linear modeling even after deciding to apply a Kalman filter.

Also, be aware that the mathematical underpinnings of the Kalman filter assume continuous, normally distributed variables, items which are strikingly hard to come by in trading.

Mr.LoL said...

Hi Ernest:

I am pretty new in this quant world, I have been reading your posts and I have a question if you don't mind...

What is the point of focusing on HFT when using pair trading and holding positions for some days seems so much easier?

I mean that you could get 10%, maybe 20%, using that approach, and you just have to worry about the mathematical model itself, not the implementation, slippage, etc.

I also assume that the shorter the timeframe, the greater the randomness. Is this right? Does it really pay off in terms of reward? Could you give us some approximation comparing those two ways of trading?

Ernie Chan said...

Hi LoL,
The advantage of HFT is that the Sharpe ratio is typically much higher than for overnight trading, which allows you to use more leverage, and that in turn allows much higher returns.

I do not believe that randomness increases with trading frequency. In fact, I find the opposite to be true, as short time scales prevent extraneous events from disrupting the model.

Ernie

Anonymous said...

Hi Ernie,

Dumb me, I just noticed I missed the seasonal trade on RBOB (+$3381 per RBOB contract this year)...

Did you remember to trade it?

Ernie Chan said...

Hi Anon,
Good to hear RBOB is still working!
No, I didn't trade either: I am focusing on higher frequency trading these days.
Ernie

Anonymous said...

Can you elaborate on how you use linear regression in place of moving averages? What's the dependent variable and what's the regressor? Thanks.

Shaun Overton said...

Hi Ernie,

This is a decidedly non-mathematical approach, but I've taken to resetting my moving averages whenever a bar > 2 standard deviations forms. The MA period grows linearly until the next volatility event.

Click on the "Resetting Moving Average" to get the indicator code for MT4 or NinjaTrader.

Ernie Chan said...

Anon,
In using LR instead of MA, the time variable t=1,2,3... is the independent variable, and the price is the dependent one.
Ernie

Ernie Chan said...

Hi Shaun,
That's an interesting approach and it does make sense. Thanks for sharing!
Ernie

Wei said...

Hi Ernie,

Great article and thanks for sharing your thoughts on linear regression and other technical methods. There are a couple of important aspects worth pointing out:

1. OLS and WLS require specifying hyperparameters like the length of the lookback window, with expanding or rolling windows being popular choices. The coefficients, however, are sensitive to the size of the window: too slow to adapt if the window is too long, high sampling error if it is too short. I have encountered situations where the hedge ratio changes sign as the data sample rolls forward, which is complete nonsense and purely an artifact of LR properties and sampling errors. It led to a breakdown of the regression-based trading model, but I consider this a fortunate revelation, as I had long worried about the arbitrariness of selecting a hyperparameter without clear economic justification. One could "optimize" the window size to get the best backtest results, but the problem is that tomorrow is a different day. That leads to the second and more serious problem with linear regression.

2. OLS, and even WLS, disregard the intertemporal structure of the time series data. Max claims that WLS solves this problem, but my experience has been that it makes no significant difference, and you seem to agree. Again, one faces the problem of deciding what kind of weights and decay rate to apply: yet another hyperparameter, decided not on economic grounds.

3. The Kalman filter solves these problems to a large extent, and it works well with discrete data (contrary to what one commenter claimed). It's also simple and efficient to implement, but it's not a free lunch. To use it, you need a model specification, and there is no off-the-shelf way of producing one: it's completely up to your creativity and your understanding of the trading problem. Of course, it has its own set of issues too, but at least you can frame them in economic terms, because hopefully you have created a model specification based on sensible economics.

4. Last but not least, on whether high frequency microstructure is more or less noisy: it depends on the market and the assets you are looking at. For an asset with high intraday volatility, you might be better off using low frequency data. Almost by definition, high volatility is an indication of a high degree of noise around the "true" fundamental value.

Ernie Chan said...

Hi Wei,
Thanks for your thoughtful comments on OLS, WLS, KF, and noise.

In my experience, hedge ratios do not vary much with the lookback period. Perhaps that's because I focus only on ETF pairs, and they are pretty stable. I am surprised, though, to hear that even WLS has such a sensitive dependence, as the weights are supposed to smooth this out.

With regard to choosing the right model for KF, I stick to Occam's razor as usual! But yes, if you know a bit about the economics of the trade, it would be a big help, though usually I am clueless until after the fact.

With regard to noise: for a mean-reversion trader, more noise means more profit opportunities! (We assume that the noise is mean-reverting.) So if intraday trading is noisy in that sense, it is also very profitable. The noise we don't like is the type that does not mean-revert, e.g. the noise created by exogenous corporate/economic/political events.

Ernie

Kenneth said...

Hi Ernest,

I just started my own blog for my personal usage, can I have your permission to add your link to my blog?

Thank you.

Regards
Kenneth

Ernie Chan said...

Hi Kenneth,
Sure, please feel free to link to my blog.
Ernie

Anonymous said...

Ernie,

This may sound like a strange question to ask on a blog called, "Quantitative Trading", but have you ever evaluated the quant-oriented daytrading methods discussed on this blog in comparison to plain old value investing and security analysis? That method can be summarized as follows: gain a detailed understanding of a stock or bond security through research, buy when the price is much lower than what it's really worth (i.e., intrinsic value), and sell as it approaches intrinsic value.

The reason I ask such a fundamental question is that my background is very technical and mathematical, much like yours. I have a Ph.D. in Electrical Engineering and a background in things like linear regression and Kalman filters. But after evaluating all the investment methods I was aware of, the simple non-quant idea of buying securities at a deep discount to fair value still makes the most sense to me.

I do enjoy your blog - don't get me wrong. I was just wondering if you ever thought about this more fundamental question.

Thanks,
aagold

Ernie Chan said...

Anon,
Both approaches are equally valid.

The value investing approach usually implies a long holding period. However, value investing is not antithetical to a quantitative approach. In fact, many people call Ben Graham the first "quant". For example, factor models utilize many fundamental and economic indicators in order to determine the fair value of a stock.

What people usually have in mind as algorithmic or quantitative trading typically occurs at a higher frequency. At such frequencies, fundamental information becomes less important.

Value investing typically has low Sharpe ratio and large drawdown, but it has very high capacity. High frequency algorithmic trading has the opposite characteristics.

An ideal hedge fund should encompass both approaches, but few managers have equally excellent skills in both, not even Jim Simons.

Ernie

Ronnie said...

Hi Ernie!

Sorry for the off topic..

Could you explain which is the better way for a new trader to start?

Imagine you don't have a lot of capital: pair trading often involves buying and selling contracts that are quite big for small accounts, so the way to do this would be over-leveraging, which is dangerous.

My question is: where should a trader with, for example, 20k look? Forex? Commodities? Stocks?

Would pair trading be OK for this? Or should the trader go for other options like volatility trading?

Thanks in advance.

Ernie Chan said...

Hi Ronnie,
FX and futures are the best areas for a trader with a small capital base to start, due to the small margin requirements. Of course, that assumes that you have good strategies in those areas!

Pair trading ETFs is pretty easy and safe, but, as you said, it requires a good bit of capital to make a living.

Ernie

Anonymous said...

Hi Ernie

Roughly, how much do you have to spend on setting up the infrastructure (co-location) of an HFT business? The hardware seems pretty expensive. The setup cost seems too much for a retail trader.

Kat

Ernie Chan said...

Hi Kat,
Hardware is not expensive. Any server of about $5K will do. What's expensive is what your broker will charge for the ongoing colocation expense: at least $2K / month.

None of these matter if your HFT strategy actually works!
Ernie

Anonymous said...

Hi Ernie

HFT seems a quite profitable strategy for a fund with small capital. I heard that some of the banks embedded their trading strategies in a microchip to gain extra speed. It sounds like everyone keeps investing in hardware to front-run the other traders. What software language do you use to implement HFT, Matlab?

Dave

Ernie Chan said...

Hi Dave,
I hesitate to call my strategies HFT: I can certainly tolerate latency of a few seconds.

While the high turnover of HFT does allow a small fund to use its small capital base very efficiently, the infrastructure cost for a true HFT strategy is beyond most small funds.

Yes, I implement all my strategies in Matlab.

Ernie

Alpha said...

Hi Ernie, what are your thoughts on measuring divergence between the price and an oscillator such as RSI?

Given that the divergence is measured on the swings, and not on the raw data points, is linear regression a good candidate?

Issy

Ernie Chan said...

Hi Issy,
I am not exactly sure what you mean by "divergence is measured on the swings, and not on the raw data points". Could you please elaborate?
Ernie

GTji said...

I suspect that a linear regression average may cause your system to be overly curve-fitted. What's your opinion on that?

Suny said...

I have a question about the OLS function in Spatial Econometrics. I used that function as suggested by Ernie in his book. Somehow the hedge ratio (or beta) of the regression comes out different from when I run it with the glmfit function in the Statistics Toolbox. The results from ols in Spatial Econometrics and regress in Matlab come out the same, but different from glmfit. I have tested with a simple Excel regression and a SAS function; those numbers agree with glmfit. I am just wondering what makes the difference here. Am I missing something? Thanks

Ernie Chan said...

Suny,
Have you made sure that the offset (intercept) is handled the same way in the regression fit in all cases?
Ernie

Anonymous said...

Hi Ernie - quick question, if you don't mind. I appreciate your time as always. I'm wondering how you set up the regression in place of the MA:

> In using LR instead of MA, the time variable t=1,2,3... is the independent variable, and the price is the dependent one.

Price = a + b * t

1) Are you using an intercept, or is it better to leave it out?

2) The time variable t - are you using just an integer that increments by 1 as you move forward in time?

3) I imagine you are doing a rolling-window regression, similar to how a moving average rolls forward based on the selected window period.

Greatly appreciate the insights.

Shal

Ernie Chan said...

Anon,
1) An intercept is needed here, because prices do not go to zero at an arbitrary t=0.

2) t can increment by 1 at every bar.

3) For ordinary OLS, a rolling window is needed. For WLS or the Kalman filter, we don't need a rolling window.
Ernie

Anonymous said...

Thanks Ernie for the feedback on the regression setup. Here is a simple way to do this in R if anyone wants to fiddle around.

library('quantmod')
getSymbols('AAPL', from='2003-01-01')   # dates must be in 'YYYY-MM-DD' format

results <- lm(Cl(AAPL) ~ index(AAPL))   # regress price on time

summary(results)
plot(index(AAPL), Cl(AAPL), type='l')   # 'l', not 'line'

abline(coef=coef(results))              # overlay the fitted trend line

Shal

Boris said...

Hi Ernie,
you mentioned you have also used LR in basket arbitrage trading. I am using it in forex basket arbitrage trading with R, where the regressand is EURUSD. Can you tell me how you derive the lot sizes from the calculated coefficients?

Ernie Chan said...

Hi Boris,
I assume you are regressing A.USD, B.USD, ..., against EUR.USD, so that all independent variables are denominated in USD? If so, then the regression coefficients are the lot sizes.
Ernie

Boris said...

Hi Ernie, thanks for the feedback. I am using mixed pairs, xxxUSD and USDxxx, like USDJPY and USDCHF. The deposit currency is USD.

Ernie Chan said...

Hi Boris,
In that case, you have to first convert all the pairs to X.USD, do the LR, obtain the lot sizes, and then convert the lot sizes back to USD.X.
Ernie
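
To make the conversion concrete, a minimal Matlab sketch (usdjpy, usdchf, and eurusd are hypothetical Tx1 vectors of time-aligned quotes, not series from the thread):

jpyusd=1./usdjpy;                     % invert USD.JPY into JPY.USD
chfusd=1./usdchf;                     % invert USD.CHF into CHF.USD
X=[ones(size(eurusd)) jpyusd chfusd]; % all regressors now denominated in USD
beta=X\eurusd;                        % OLS fit against EUR.USD
lotSizes=beta(2:end);                 % coefficients (ex-intercept) = X.USD lot sizes
% A short X.USD position then maps back to a long USD.X position, and
% vice versa, when placing the actual orders.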

Boris said...

Thanks for the feedback. It sounds good. I am thinking about another approach: normalization of the currency pairs could also be achieved by dividing the quotes by their related pip value per lot, calculated with tick size and tick value. Correct?

Ernie Chan said...

Boris,
Yes, as long as each point move represents the same dollar amount, you can run your LR on any price series.
Ernie

Unknown said...

Hi Ernie,

I have one question regarding LR. I am using linear regressions as moving averages, for example a 21-day LR and a 63-day LR. I look for crossovers, and also for price crossing them from above or below. My question is: what is the better option for filtering trend identification, and how do I avoid the whipsaws we often see in MA crossovers?

Ernie Chan said...

Unknown,
A different lookback is optimal for each time series. If you are looking for trends, you should check the correlation coefficient between returns over various lookback periods and returns over your holding period, and see which lookback is optimal for your time series.
Ernie
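
A minimal Matlab sketch of that check (prices is a Tx1 series; the holding period and candidate lookbacks are arbitrary examples):

holding=21;                                  % assumed holding period in bars
lookbacks=[5 10 21 63 126];                  % candidate lookbacks
for k=1:length(lookbacks)
    lb=lookbacks(k);
    t=(lb+1):(length(prices)-holding);
    pastRet=prices(t)./prices(t-lb)-1;       % return over each lookback
    futRet=prices(t+holding)./prices(t)-1;   % subsequent holding-period return
    c=corrcoef(pastRet, futRet);
    fprintf('lookback %4d: corr=%6.3f\n', lb, c(1,2));
end
% A clearly positive correlation suggests a trending lookback;
% a negative one suggests mean reversion at that horizon.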

sg said...

By the way, what do "independent variable" and "dependent variable" mean in this context?

Ernie Chan said...

Hi sg,
For pair trading, you can arbitrarily pick either price series as the independent variable and the other as the dependent one. However, it is a good idea to try both permutations.
Ernie

Reid Minto said...

Hi Ernie,

(Stats grad student who just started following your blog here) -- I wanted to comment on the use of OLS and WLS. For those who may not know, WLS is a special case of Generalized Least Squares (GLS) when we have no autocorrelation in the model errors (in other words all the off-diagonal terms in the covariance matrix are zero). GLS outperforms OLS (among all other linear unbiased estimators) in terms of efficiency when there is heteroskedasticity (non-constant variance) and/or autocorrelation in the error terms by essentially weighting observations according to the magnitude of the model errors. If you use GLS/WLS and choose the weights according to time periods instead of giving relatively larger weights to observations with smaller errors and giving less weight to the ones with larger errors, then you will indeed get some funky results which I think explains why you weren't getting better results using WLS over OLS. If there is strong evidence of heteroskedasticity and/or autocorrelation (usually at least autocorrelation in financial time series) then WLS/GLS should give you better results than OLS.

BTW - Great blog. I'm just recently getting into computational finance and I'm enjoying your blog along with all the comments. It's been very helpful.

Reid Minto said...

Hi all,

One more comment on the use of linear regression. Anonymous asked if he should leave out the intercept term in his model,
price = a + b*t, which would give us the different model
price = b*t,
forcing the regression line to go through the origin. For purposes of interpretation, we would want to keep the intercept term, since excluding it would imply a price of 0 at t=0. However, for forecasting purposes it does not matter much. Given that it's hardly any extra work to run both models, I would suggest trying both and comparing them using cross-validated root mean squared error.


Also, following my previous post on why we might want to use generalized or weighted least squares: after seeing the specific regression equation, I felt compelled to add my two cents. To get the best results out of a linear regression of
price = a + b*t, I would suggest the following method.

1. Test our variables for non-stationarity

2. Use a stationary transformation (differencing) on any non-stationary variables

3. Try lagging the variables in the regression equation

4. Using the time series plots, autocorrelation function plots, etc., we can estimate the number of lags and the order of differencing we should use. We'll get a regression equation that looks something like

price(t) = a + price(t-1) + b1*((t-1)-(t-2)) + b2*(price(t-1)-price(t-2))

for a single order of differencing and a one-period lag. Once we're happy with our stationary transformations, we can run OLS, then check whether we still have heteroskedasticity and/or autocorrelation, and go from there, using GLS/WLS as needed.

Ernie Chan said...

Hi RM,
Thank you for your detailed comment and insights!

I am not sure what you are referring to when you said "choose the weights according to time periods". What we did was to give more weight to more recent data, which is generally deemed more relevant to the current market condition. Is that bad?

Ernie

Reid Minto said...

Hi Ernie,

I apologize; I should have asked you for details before commenting. If you are interested in forecasting, and wish to extrapolate the model several periods forward out of sample, then the use of WLS with more heavily weighted recent data is perfectly fine. On the other hand, if the goal were prediction (whether on your data, a test subset of your data, validation on another dataset, etc.), then using the weights in that manner would be arbitrary and definitely not advisable.

That said, I think we could conjure up cases where, if the weights were tilted too heavily towards recent data, we would run into problems. If I have time this weekend I might look at some time series price data and give a few examples of all these procedures and problems. I think another commenter here alluded to the fact that the weights we choose for WLS are a nuisance parameter we might want to somehow average out or avoid altogether.

In fact, if we follow the correct procedure for testing non-stationarity, using the ACF plots, etc., then we get a pretty good estimate of the order of differencing and the number of lagged variables we should use in our regression equation. These estimates just indicate how much "memory" our variables contain. If we know how much memory our variables contain, then we know what to include in our regression equation: the transformed version of the data that gives us only white noise for errors, rather than the problematic error structure that breaks the classical assumptions justifying the use of OLS in the first place. That leaves us without any need for WLS and its extra nuisance parameters, the weights.

Too long; didn't read version: if we correctly transform the data by differencing and using lagged variables, we get a stationary series with only white noise as errors, and should therefore use OLS and ignore WLS.

hammy said...

When we find the hedge ratio using the ordinary least squares method and then apply the augmented Dickey-Fuller test to the spread (constructed with that hedge ratio), is it possible for the ADF test to say the pair is not cointegrated? If yes, why is that so?

Ernie Chan said...

hammy,
Just because one uses OLS to find the hedge ratio doesn't guarantee that the resulting time series is stationary. For example, the R^2 of the OLS fit can be close to zero and the fit very poor.
Ernie
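
A minimal Matlab sketch of the workflow hammy describes (x and y are Tx1 price series; adftest is from the Econometrics Toolbox):

beta=[ones(size(x)) x]\y;       % OLS fit of y on x, with intercept
hedgeRatio=beta(2);
spread=y-hedgeRatio*x;          % the candidate stationary series
[h,pValue]=adftest(spread);     % h=1 rejects a unit root (cointegrated)
% h=0 is entirely possible: OLS only minimizes the in-sample variance
% of the spread; it does not guarantee stationarity.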

Stefan Martinek said...

The backtest of the linear regression strategy applied on the large portfolio is here: http://www.oxfordstrat.com/trading-strategies/linear-regression/

Jacob Seltz said...

Hi guys. I read the book and am now programming a Kalman filter. The issue I'm having (and I'm not sure where it is in my code; I'm coding in Mathematica) is that the intercept term in my beta stays very low when it shouldn't. I'm still getting a grip on the Kalman state updates.

Without going into more detail or providing code (for now), I'm wondering if anyone else has experienced the same issue in their implementation; perhaps they had a matrix operation wrong.

Note: I'm using a simple linear regression model where beta represents a slope and an intercept.

Jacob Seltz said...

I believe my state covariance update was not proper. In Mathematica I had to make sure to use an outer product with K * x[[t]], which looks like:
Outer[Times, x[[t]], K]

making the covariance update:

P = R - Outer[Times, x[[t]], K].R, where t is the iterator.

Chicago said...

Instead of trying both products as the dependent variable, try an orthogonal regression (total least squares) approach. It adds value by not assigning the regression errors to just one product, but by distributing them on an orthogonal basis.

Anonymous said...

Hi, 2 questions:

Do you have a copy of the article on linear regression by Max Dama? I can't find it anywhere.

Ernie Chan said...

You can google "Max Dama" to see links to his articles. I don't have the link specifically to linear regression anymore, but perhaps the link to Quantopian includes that.

Ernie

Unknown said...

Hi Ernie,

This is an interesting article. I do have one question with regard to EMAs. If I were to build a regression model where I smooth the independent variable using an EMA, how do you decide on the value of the alpha weighting? Should it be optimized by finding the lowest MSE, or are there better alternatives?

Ernie Chan said...

Hi Akhil,
Yes, alpha is just like any other parameter in a trading model: it needs to be optimized using in-sample data, and the optimized model validated using out-of-sample data. Alternatively, use cross-validation.
Ernie
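
A minimal sketch of that in-sample optimization in Matlab (prices is a Tx1 training series; the grid of candidate alphas is arbitrary):

alphas=0.01:0.01:0.99;                 % candidate decay factors
mse=NaN(size(alphas));
for k=1:length(alphas)
    a=alphas(k);
    ema=prices(1);                     % seed the EMA with the first price
    err=NaN(length(prices)-1,1);
    for t=2:length(prices)
        err(t-1)=prices(t)-ema;        % one-step-ahead prediction error
        ema=a*prices(t)+(1-a)*ema;     % standard EMA update
    end
    mse(k)=mean(err.^2);
end
[~,best]=min(mse);
bestAlpha=alphas(best);                % validate out-of-sample before use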

Lucas Huang said...

Hi Ernie,

Where would I be able to better understand how to implement the Kalman filter? I am struggling to apply it to a historical series of futures/spot prices.

Thanks,

luke

Ernie Chan said...

Hi Luke,
Have you read my book Algorithmic Trading? It has two examples on using Kalman filter for trading.
Ernie

Jdemp0913 said...

Hi Ernie, do you have any examples or references on how one would use a Kalman filter for ETF or stock market making?

Ernie Chan said...

Have you checked out the sections on using Kalman Filter for ETF arbitrage and market making in my book Algorithmic Trading?

Ernie

Jdemp0913 said...

Hi Ernie, I have read the Kalman Filter as Market Making Model section on pages 82 and 83 of your book, but I am not sure I have adapted the Kalman equations correctly.
I am using 5-minute bars of the SVXY ETF from Jan 1, 2015 to Sept 4, 2015. There are 78 data points each day.

I have attempted to adapt the equations from Example 3.3, but something seems to be wrong.

For each day the size of y is 79x1.

yhat=NaN(size(y)); % measurement prediction
e=NaN(size(y)); % measurement prediction error
Q=NaN(size(y)); % measurement prediction error variance

% For clarity, we denote R(t|t) by P(t).
% initialize R, P and beta.
R=zeros(1);
P=zeros(1);
Vw=delta/(1-delta)*eye(1);
Ve=0.001;

% initialize the first value to zero
m(1)=0;


% Given initial beta and R (and P)
for t=1:length(y)
if (t > 1)
m(t)=m(t-1); % state prediction. Equation 3.7 (3.15 in MM section)
R=P+Vw; % state covariance prediction. Equation 3.8
end

yhat(t)=m(t); % ** measurement prediction. Equation 3.9

Q(t)=var(m(t))+R; % ** measurement variance prediction. Equation 3.10


% Observe y(t)
e(t)=y(t)-yhat(t); % measurement prediction error

K=R/(R+Ve); % Kalman gain
m(t)=m(t)+ K*e(t); % State update. Equation 3.11 (3.16 in MM section)
P=R-K*m(t)*R; % State covariance update. Equation 3.12 (3.18 in MM section)

end

It seems that I made a mistake somewhere. Any insights would be appreciated.
I love your books (I have both of them) and am looking forward to your next book.

Thank you

Ernie Chan said...

Hi Jdemp,

The only error I spot in your equations is that

P=R(t|t)=R(t|t-1)-K*R(t|t-1) from Eq. 3.12

which becomes P=(1-K)*R(t|t-1) as in Eq. 3.19.

x does not appear in this application, since the observation model is just the unit matrix. And m should not appear in Eq. 3.12 or 3.19.

Ernie
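
For readers following along, a minimal Matlab sketch that folds these corrections into the loop above (delta and Ve as in the original comment; seeding m(1) with y(1) rather than 0 is an added tweak to shorten the startup transient):

m=NaN(size(y)); yhat=NaN(size(y)); e=NaN(size(y)); Q=NaN(size(y));
R=0; P=0;
Vw=delta/(1-delta); Ve=0.001;
m(1)=y(1);
for t=1:length(y)
    if t>1
        m(t)=m(t-1);          % state prediction (Eq. 3.15)
        R=P+Vw;               % state covariance prediction (Eq. 3.8)
    end
    yhat(t)=m(t);             % measurement prediction (Eq. 3.9)
    Q(t)=R+Ve;                % measurement variance prediction (Eq. 3.17)
    e(t)=y(t)-yhat(t);        % measurement prediction error
    K=R/Q(t);                 % Kalman gain, = R/(R+Ve)
    m(t)=m(t)+K*e(t);         % state update (Eq. 3.16)
    P=(1-K)*R;                % corrected state covariance update (Eq. 3.19)
end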

Jdemp0913 said...

Ernie, thank you so much for your reply. I have made the correction. I have one more question. On page 83 you show in Equation 3.20 that Ve = R(t|t-1)*((T/Tmax) - 1).

That implies that Ve should be computed inside the t-indexed loop, and that the trade size (T) should be available. Would you recommend assuming T = Tmax for back-testing purposes?

Thank you again for your help on this Labor Day weekend.

Ernie Chan said...

Hi Jdemp,
No, you should not use T=Tmax for backtesting. That would remove the essential ingredient of this strategy.

Ernie

Sqrt Alpha said...

Hi,
Did you try the grey model? I think it is a linear regression on the MA and the original data itself when set to GM(1,1).

Ernie Chan said...

Hi Sqrt Alpha,
No, I haven't. ARMA itself already incorporates linear regression, so how is this different from ARMA?
Ernie

Sqrt Alpha said...

I am a freshman in quantitative research, so probably I lack a precise understanding. The grey model treats autoregression as differential equations in some ways, so maybe they are the same in nature.

Christophe said...

About using a KF as a better alternative to OLS for finding regression coefficients... I am struggling to use an existing KF implementation in Java (the one from Apache commons math3) for that purpose: http://commons.apache.org/proper/commons-math/userguide/filter.html

It seems to me that the code provided in the book as an example for trading EWA/EWC does not exactly implement a KF as usually defined, that is, with a constant measurement matrix.

That implementation requires a classical iteration over each measurement, with two phases (prediction and then correction) within each iteration step. What bothers me is that if I try to implement the algo from the book, I somehow find myself forced to update the measurement matrix within each iteration (a new third phase, between the predict and correct phases).

By searching on the web I found another example implemented in Python: http://www.thealgoengineer.com/2014/online_linear_regression_kalman_filter/ , but here again the algo does not really iterate over each measurement sample; you have to provide the whole set of dependent and independent data at once.

So, my question finally is: has someone already tried to implement a regression with an existing KF package that allows iterating over the measurements?

Ernie Chan said...

Christophe,
I am not sure what you mean by "forced to update the measurement matrix within each iteration". In Equation 3.5 of my second book, the measurement matrix is the price series of one of the ETFs. So it is of course updated at every time step and is not constant.
Ernie

Christophe said...

Thank you Ernie for your time with this. My point is exactly that the measurement matrix is updated at each step in the book, which does not follow the Kalman filter requirement of keeping it constant... unless I have misunderstood the Kalman filter equations, of course!

Ernie Chan said...

Christophe,
Thanks for your clarification.

Yes, classical derivations of the KF equations assume everything is constant, but I am not sure this is a requirement. You should refer to the original paper I cited in the book (Montana et al, 2009), where I adapted this methodology for determining the hedge ratio. They may describe the theoretical justifications in detail.

Ernie

Christophe said...

Thank you Ernie for the pointer! Will give it a try.

Ben Human said...

Hi Ernie,

I have a quick question regarding the Kalman filter example you included in your book Algorithmic Trading: Winning Strategies and Their Rationale, as a market making model. You said in Equation 3.17 that the measurement variance prediction Q(t) is var(m(t)) + Ve. Firstly, what lookback does the var calculation assume? And secondly, shouldn't Q(t) be a function of the measurement variable y(t), not the hidden variable m(t)? Also, on the next page you suggest weighting this as a function of the traded volume. Using Equation 3.20 I can't seem to get good results, either because R becomes so small (on the order of 10^-6) or because (T/Tmax - 1) can go negative. I'm sure there are many ways to do this, but I was wondering what sort of results you would expect? For me the value of Q is so small that yhat(t) hardly changes at all (of course, because the gain is close to zero).

Any help on this would be much appreciated!

Many thanks
Ben

Ernie Chan said...

Hi Ben,
Actually, I should have written Eq. 3.17 as Q(t) = R(t|t-1) + Ve instead, making it clear that R(t|t-1) is the state covariance variable, which is updated at every time step. There is no explicit lookback for computing this covariance; its values are given by Eq. 3.8 and 3.12. Thanks for pointing out my unfortunate notation.

Q(t) is the variance of the measurement variable y, but it is expressed as a function of the covariance of the hidden state variable via Eq. 3.10.

I have in fact tried this approach on some trade data, and although I didn't get results better than not using the Kalman filter, they weren't too terrible. Did you set Tmax to the previous day's volume?

Ernie

John Nash said...

I found this Kalman filter for Amibroker using PyKalman. What do you think of it?
http://www.marketcalls.in/amibroker/kalman-filter-unscented-kalman-filter-afl-amibroker-using-python-comserver.html

How can one take advantage of Kalman filter estimation when it comes to stock prediction?

Ernie Chan said...

Hi John,
You can think of the KF as a weighted linear regression (with lower weights for older data). To use linear regression for prediction, you supply the input (x), and it gives you the predicted value (y). For example, if there is a relationship between EWA and EWC, and you know the value of EWA, the KF will tell you what to expect for EWC.

The details are in my second book Algorithmic Trading.

Ernie
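
A minimal Matlab sketch of this online-regression view of the KF, in the spirit of the book's EWA/EWC example (x and y are Tx1 price series, e.g. EWA and EWC; delta and Ve are assumed hyperparameters, not values from the book):

T=length(y);
beta=zeros(2,1);                   % state: [slope; intercept]
P=zeros(2);                        % state covariance
Vw=delta/(1-delta)*eye(2); Ve=0.001;
yhat=NaN(T,1); e=NaN(T,1);
for t=1:T
    H=[x(t) 1];                    % time-varying observation matrix
    R=P+Vw;                        % state covariance prediction
    yhat(t)=H*beta;                % predicted y (EWC) given x (EWA)
    Q=H*R*H'+Ve;                   % measurement variance prediction
    e(t)=y(t)-yhat(t);             % forecast error
    K=R*H'/Q;                      % Kalman gain
    beta=beta+K*e(t);              % updated regression coefficients
    P=R-K*H*R;                     % state covariance update
end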

Alejandro said...

Hi Ernie,

Do you mind telling me whether the Kalman filter in pair trading uses maximum likelihood estimation for the unknown parameters?

greetings from Argentina!

Ernie Chan said...

Hi Alejandro,
Yes, the Kalman filter's hyperparameters can be estimated using MLE. See my discussion in Machine Trading, pp. 71-80. (The KF is one type of state space model.)
Ernie

Jdemp0913 said...

Hi Ernie,

On page 78 of your book Algorithmic Trading, Example 3.3 (Kalman filter), you show two values, Ve and Vw. Do those values remain constant in the remaining calculations shown on page 79, where you have:

Eq. 3.10: Q(t) = x(t,:)*R*x(t,:)' + Ve;

Eq. 3.8: R = P + Vw;

Is the best way (or at least an acceptable way) to get these values just to do a brute-force search on training data?

Also, should the value of Q(end) equal (or at least be close to) the value of var(e(:)), where e is the residual vector y(:)-yhat(:)?

Thank you,

Jack

Ernie Chan said...

Hi Jack,
Yes, Ve and Vw are assumed to be constant matrices. In that example, you can certainly regard them as hyperparameters to be optimized on training data. The objective function can be the minimum squared error of the predicted spread. However, in my third book Machine Trading, I describe using the Econometrics Toolbox's ssm (state space model) function to estimate them by maximum likelihood.

On p. 80, I stated that var(e(t)) is indeed Q(t). So you are right.

Ernie

Swimtwitter said...

Hi Ernie,

Season's Greetings!

I have some queries on an equation you mentioned earlier in this comment thread:

Eq. 3.17: Q(t) = R(t|t-1) + Ve

where Ve can be derived from Ve = R(t|t-1) * ((T/Tmax) - 1), in the market making section of your book (page 83).

I have 3 queries pertaining to Ve :)

1) Can Ve be positive (i.e. in the case where T > Tmax), or shall we cap T/Tmax at 1 when T exceeds Tmax?

2) If Ve can be positive, then when applying it back to Eq. 3.17, should Ve be taken as an absolute value or carry its +/- sign? This will impact the calculation of Q(t).

3) When applying Ve in the Kalman gain equation (Eq. 3.18, K = R/(R+Ve)), if Ve is negative one may get K > 1, which I think should not be the case. Since I think of Ve (the measurement noise) as a magnitude, should we take the absolute value, i.e. abs(Ve)? Could you help confirm? Thanks!

Thanks in advance for your time!

Darren

Ernie Chan said...

Hi Darren,
1) Yes, there is no reason why Ve can't be positive.
2) Ve should always be signed.
3) It may, if the time series exhibits a geometric random walk or some other explosive behavior.
Ernie

Swimtwitter said...

Hi Ernie,

Thanks for the reply. I understand the reasoning now, and have implemented a strategy with some not-too-bad backtesting results.

I have just one question on the volume term Tmax in Ve = R(t|t-1) * ((T/Tmax) - 1). It would be great if you could provide some guidance on a good method for training toward an ideal Tmax. For now, I am taking a fraction of yesterday's maximum volume to represent the noise in the market, but the fraction is an arbitrary number based on backtest results (which is a bad approach). I don't yet have an intuition for what to train on to get a good Tmax, and I searched the literature to no avail. Any starting direction would be a great help! Thanks in advance!


Darren

Ernie Chan said...

Hi Darren,
Tmax is a parameter that needs to be optimized. Traditional optimization methods include finding the best Sharpe ratio on a training set, or Walk Forward Optimization, which finds the best return over a rolling lookback period. But we have recently developed a novel, patent-pending technique called Conditional Parameter Optimization using machine learning. Watch for my next blog post on predictnow.ai for that!
Ernie

Swimtwitter said...

Thanks for the pointers; I look forward to the exciting new stuff!

Darren