Monday, August 18, 2014

Kelly vs. Markowitz Portfolio Optimization

In my book, I described a very simple and elegant formula for determining the optimal asset allocation among N assets:

$$F = C^{-1} M \qquad (1)$$

where F is an N×1 vector indicating the fraction of the equity to be allocated to each asset, C is the covariance matrix, and M is the mean vector of the excess returns of these assets. Note that these "assets" can in fact be "trading strategies" or "portfolios" themselves. If these are real assets that incur a carry (financing) cost, then excess returns are returns minus the risk-free rate.

Notice that these fractions, or weights as they are usually called, are not normalized: they don't necessarily add up to 1. This means that F determines not only the allocation of the total equity among the N assets, but also the overall optimal leverage. Because each component of F is already a fraction of the total equity, the sum of the absolute values of the components of F is the overall leverage. This is the beauty of the Kelly formula: optimal allocation and optimal leverage in one simple formula, which is supposed to maximize the compounded growth rate of one's equity (or equivalently the equity at the end of many periods).
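A minimal sketch of formula (1) in Python with NumPy, assuming you already have (or can estimate) the mean vector M and covariance matrix C of per-period excess returns. The two-asset numbers below are made up for illustration, and the helper names are hypothetical. The code solves C F = M rather than forming C^{-1} explicitly, which is numerically safer, and reports the overall leverage as the sum of |F_i|.

```python
import numpy as np

def estimate_inputs(excess_returns):
    """Estimate M and C from a T x N array of per-period excess returns (N >= 2 assumed)."""
    M = excess_returns.mean(axis=0)
    C = np.cov(excess_returns, rowvar=False)
    return M, C

def kelly_fractions(M, C):
    """Kelly-optimal fractions of equity, F = C^{-1} M (solved without inverting C)."""
    return np.linalg.solve(C, M)

def overall_leverage(F):
    """Overall leverage: the sum of the absolute values of the fractions in F."""
    return float(np.abs(F).sum())

# Illustrative (hypothetical) daily excess-return statistics for two assets
M = np.array([0.0005, 0.0003])
C = np.array([[1.0e-4, 0.2e-4],
              [0.2e-4, 0.5e-4]])

F = kelly_fractions(M, C)
print("F =", F, " overall leverage =", overall_leverage(F))
```

Because both M and C scale linearly with the length of the period (for independent returns), F comes out the same whether you feed it daily or annualized statistics, as long as both are on the same basis.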

However, most students of finance are not taught Kelly portfolio optimization. They are taught Markowitz mean-variance portfolio optimization. In particular, they are taught that there is a portfolio called the tangency portfolio which lies on the efficient frontier (the set of portfolios with minimum variance consistent with a certain expected return) and which maximizes the Sharpe ratio. Left unsaid are

  • What's so good about this tangency portfolio?
  • What's the real benefit of maximizing the Sharpe ratio?
  • Is this tangency portfolio the same as the one recommended by Kelly optimal allocation?
I want to answer these questions here, and provide a connection between Kelly and Markowitz portfolio optimization.

According to Kelly and Ed Thorp (and explained in my book), F above not only maximizes the compounded growth rate, but also maximizes the Sharpe ratio. Put another way: the maximum growth rate is achieved when the Sharpe ratio is maximized. Hence we see why the tangency portfolio is so important. And in fact, the tangency portfolio is the same as the Kelly optimal portfolio F, except that the tangency portfolio is normalized so that its weights sum to 1, whereas F goes one step further and determines the optimal leverage for us. Otherwise, the percent allocation to each asset is the same in both (assuming that we haven't imposed additional constraints in the optimization problem). How do we prove this?

The usual way Markowitz portfolio optimization is taught is by setting up a constrained quadratic optimization problem (quadratic because we want to minimize the portfolio variance, which is a quadratic function of the weights of the underlying assets), solving it with a numerical quadratic programming (QP) solver, and then further maximizing the Sharpe ratio to find the tangency portfolio. But this is unnecessarily tedious and actually obscures the elegant formula for F shown above. Instead, we can apply Lagrange multipliers to the following optimization problem (see http://faculty.washington.edu/ezivot/econ424/portfolioTheoryMatrix.pdf for a similar treatment):

Maximize

$$\text{Sharpe ratio} = \frac{F^T M}{(F^T C F)^{1/2}} \qquad (2)$$

subject to the constraint

$$F^T \mathbf{1} = 1 \qquad (3)$$

(To emphasize that the 1 on the left-hand side is a column vector of ones, I set it in bold face.)

So we should find the stationary point of the following unconstrained quantity with respect to the weights $F_i$ of each asset i and the Lagrange multiplier λ:

$$\frac{F^T M}{(F^T C F)^{1/2}} - \lambda\,(F^T \mathbf{1} - 1) \qquad (4)$$

But taking the partial derivatives of this fraction, with a square root in the denominator, is unwieldy. Equivalently (as long as the Sharpe ratio is positive, so that its logarithm is defined), we can maximize the logarithm of the Sharpe ratio subject to the same constraint. Thus we can take the partial derivatives of

$$\log(F^T M) - \tfrac{1}{2}\log(F^T C F) - \lambda\,(F^T \mathbf{1} - 1) \qquad (5)$$

with respect to $F_i$. Setting each component i of the resulting gradient to zero gives the matrix equation

$$\frac{1}{F^T M}\,M - \frac{1}{F^T C F}\,C F = \lambda\,\mathbf{1} \qquad (6)$$
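To double-check the step from (5) to (6), here is a small numerical sketch (Python/NumPy, hypothetical three-asset inputs). It compares a central finite-difference gradient of the first two terms of (5), log(F^T M) − ½ log(F^T C F), with the analytic left-hand side of (6), M/(F^T M) − C F/(F^T C F). The λ term is left out because its derivative with respect to every $F_i$ is simply the constant λ.

```python
import numpy as np

def log_sharpe(F, M, C):
    # First two terms of (5); requires F^T M > 0 so that the logarithm is defined
    return np.log(F @ M) - 0.5 * np.log(F @ C @ F)

def analytic_gradient(F, M, C):
    # Left-hand side of (6)
    return M / (F @ M) - (C @ F) / (F @ C @ F)

def numeric_gradient(f, F, h=1e-6):
    # Central differences, perturbing one component F_i at a time
    g = np.zeros_like(F)
    for i in range(len(F)):
        e = np.zeros_like(F)
        e[i] = h
        g[i] = (f(F + e) - f(F - e)) / (2 * h)
    return g

M = np.array([0.05, 0.03, 0.04])
C = np.array([[0.040, 0.006, 0.010],
              [0.006, 0.020, 0.004],
              [0.010, 0.004, 0.030]])
F = np.array([0.5, 0.3, 0.2])      # an arbitrary portfolio with F^T M > 0

g_num = numeric_gradient(lambda x: log_sharpe(x, M, C), F)
g_ana = analytic_gradient(F, M, C)
print("max abs difference:", np.max(np.abs(g_num - g_ana)))
```

Evaluating `analytic_gradient` at F = C^{-1} M gives a zero vector, which is the λ = 0 result derived next.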

Multiplying the whole equation on the left by $F^T$ gives

$$\frac{1}{F^T M}\,F^T M - \frac{1}{F^T C F}\,F^T C F = \lambda\,F^T \mathbf{1} \qquad (7)$$

Remembering the constraint (3), we recognize the right-hand side as just λ. The left-hand side is 1 − 1, exactly zero, which means that λ is zero. A Lagrange multiplier that turns out to be zero means that the constraint won't affect the solution of the optimization problem up to a proportionality constant. This is satisfying, since the Sharpe ratio (2) is unchanged when we multiply all the weights by the same positive constant: applying an equal leverage to all the assets should leave the maximum Sharpe ratio unaffected. So we are left with the matrix equation for the optimal F:
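The scale invariance behind λ = 0 is easy to see numerically. This sketch (hypothetical two-asset inputs) evaluates the Sharpe ratio (2) for a portfolio and for the same portfolio levered up or down by a constant factor k.

```python
import numpy as np

def sharpe(F, M, C):
    # Sharpe ratio (2): F^T M / sqrt(F^T C F), in per-period units
    return (F @ M) / np.sqrt(F @ C @ F)

M = np.array([0.05, 0.03])
C = np.array([[0.04, 0.01],
              [0.01, 0.02]])
F = np.array([0.6, 0.4])

for k in (0.5, 1.0, 3.0):
    print(k, sharpe(k * F, M, C))   # the same value for every k > 0
```

Only the sign matters: a negative k flips the sign of the Sharpe ratio, which is why the argument assumes positive leverage.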

$$C F = \frac{F^T C F}{F^T M}\,M \qquad (8)$$

If you know how to solve this for F using matrix algebra, I would like to hear from you. But let's try the ansatz $F = C^{-1} M$ as in (1). The left-hand side of (8) becomes $M$. Since $CF = M$ implies $F^T C F = F^T M$, the right-hand side becomes $(F^T M / F^T M)\,M = M$ as well. So the ansatz works, and the solution is in fact (1), up to a proportionality constant. To satisfy the normalization constraint (3), we can write

$$F = \frac{C^{-1} M}{\mathbf{1}^T C^{-1} M} \qquad (9)$$

So there it is: the tangency portfolio is the same as the Kelly optimal portfolio up to a normalization constant, except that it does not tell us what the optimal leverage is.
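As a sanity check on (9), the sketch below (Python/NumPy, hypothetical two-asset inputs) computes the normalized Kelly weights and compares them with a brute-force search for the maximum-Sharpe portfolio among all weight vectors (w, 1 − w) on a fine grid. Short positions are allowed and no other constraints are imposed, matching the assumption in the text.

```python
import numpy as np

# Hypothetical two-asset inputs (per-period mean excess returns and covariance)
M = np.array([0.06, 0.02])
C = np.array([[0.040, 0.006],
              [0.006, 0.010]])

# Kelly fractions, equation (1), and their normalized version, equation (9)
F_kelly = np.linalg.solve(C, M)
w_tangency = F_kelly / F_kelly.sum()

# Brute force: Sharpe ratio of every portfolio (w, 1 - w) on a fine grid, shorts allowed
grid = np.linspace(-2.0, 3.0, 200001)
W = np.column_stack([grid, 1.0 - grid])
sharpes = (W @ M) / np.sqrt(np.sum((W @ C) * W, axis=1))
best = W[np.argmax(sharpes)]

print("Kelly fractions (with leverage):", F_kelly)
print("normalized Kelly weights:       ", w_tangency)
print("grid-search max-Sharpe weights: ", best)
```

The grid search recovers the normalized Kelly weights; what it cannot tell you is the overall leverage, F_kelly.sum(), that Kelly adds on top.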

===
Workshop Update:

Based on popular demand, I have revised the dates for my online Mean Reversion Strategies workshop to be August 27-29. 

===
Follow me @chanep on Twitter.




75 comments:

HK said...

This is a true PhD-level post :)
-HK

marco said...

I never thought about Kelly in these terms; I should have done it before. Really elegant solution to the problem.
http://nightlypatterns.wordpress.com

Tom said...

Thank you for this post. I too have always wondered why Markowitz is taught but not Kelly portfolio optimization.

Anonymous said...

Hi Ernie,

Great post.

I have a question about the Sharpe ratio. Let's say I have an intraday FX strategy and I'm using hourly data. FX trading is 24 hours but my strategy is only active 8 hours a day. Is the correct way to calculate Sharpe:

sqrt(252*8) * mean(ret)/std(ret)

or

sqrt(252*24) * mean(ret)/std(ret)

where

ret = hourly return vector (24 hours). Data is actually from 5pm EST Sundays until 4pm EST Fridays


Many thanks

Ernie Chan said...

You should use sqrt(252*8)*mean(ret)/std(ret),
where ret contains only the 8 hourly bars per day during which the strategy is active. Otherwise the zeros in the other 16 hours will distort the true picture.
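A minimal sketch of this recommendation in Python/NumPy. It assumes the hourly returns come with an array giving each bar's hour of day, and that the active window is a hypothetical 8-hour block (hours 8 to 15 here; substitute your own). The synthetic data use an idealized 24 × 252 grid rather than the Sunday-to-Friday session described in the question.

```python
import numpy as np

def annualized_sharpe_active_hours(hourly_ret, hour_of_day, active_hours, trading_days=252):
    """Annualized Sharpe ratio computed only from the bars in which the strategy is active.

    hourly_ret   : 1-D array of hourly returns
    hour_of_day  : 1-D array of the same length giving each bar's hour (0-23)
    active_hours : collection of hours during which the strategy trades
    """
    active = list(active_hours)
    mask = np.isin(hour_of_day, active)
    ret = hourly_ret[mask]                        # drop the idle hours and their zeros
    bars_per_year = trading_days * len(active)    # e.g. 252 * 8
    # ddof=1 gives the sample standard deviation (the same normalization as MATLAB's std)
    return np.sqrt(bars_per_year) * ret.mean() / ret.std(ddof=1)

# Synthetic example: the strategy is only active during a hypothetical window (hours 8-15)
rng = np.random.default_rng(0)
hours = np.tile(np.arange(24), 252)
active_hours = range(8, 16)
in_window = np.isin(hours, list(active_hours))
ret = np.where(in_window, rng.normal(1e-4, 1e-3, hours.size), 0.0)

print(annualized_sharpe_active_hours(ret, hours, active_hours))
```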

Ernie

Anonymous said...

Hi Ernie,

Is it possible to trade intraday US stocks pairs?

Could we get better sharpe ratio?

How long do we need to do backtesting for intraday strategies?

Thanks.

Ernie Chan said...

Sure, intraday stock pairs may improve Sharpe ratio as it may have more trades than interday pairs.

Even with intraday pairs, we need to backtest whether the strategy held up during extreme events, so it is best to start with 2007.

Ernie

Anonymous said...

Hi Ernie,

For pairs trading, how do we size every trade to have a potential equal dollar impact on our portfolio?

Anonymous said...

Hi Ernie,

I find there are 1-second, 5-second, 15-second, 30-second, 1-minute and 5-minute bars for intraday data.

Do we need to test all of them?

Thanks.

Ernie Chan said...

You can set each side of a pair trade to equal dollar value, or you can use a hedge ratio derived from linear regression or Johansen test to set the number of shares. The returns and risks differ for each approach.
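For the regression-based hedge ratio, a minimal sketch (Python/NumPy, synthetic prices). It regresses the price of stock Y on the price of stock X by ordinary least squares with an intercept; the slope is the number of shares of X to short per share of Y held long. Regressing prices rather than log prices, and the choice of lookback, are assumptions left to you; the Johansen-test alternative is not shown. The equal-dollar alternative is included for comparison, and both helper names are hypothetical.

```python
import numpy as np

def ols_hedge_ratio(y_prices, x_prices):
    """Fit y = beta * x + alpha by least squares; beta is the hedge ratio."""
    X = np.column_stack([x_prices, np.ones_like(x_prices)])
    (beta, alpha), *_ = np.linalg.lstsq(X, y_prices, rcond=None)
    return beta, alpha

def equal_dollar_shares(capital_per_leg, price_y, price_x):
    """Alternative sizing: the same dollar value on each side of the pair."""
    return capital_per_leg / price_y, capital_per_leg / price_x

# Synthetic pair: Y tracks 1.5 * X plus noise
rng = np.random.default_rng(2)
x = 50 + np.cumsum(rng.normal(0, 1, 1000))
y = 1.5 * x + 10 + rng.normal(0, 0.5, 1000)

beta, alpha = ols_hedge_ratio(y, x)
print("hedge ratio:", beta)   # short about beta shares of X per share of Y
```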

Ernie

Ernie Chan said...

Ideally, you can backtest at the highest frequency. If you desire longer holding period, there are many ways to accomplish that without artificially limiting yourself to lower frequency data. There is nothing magical about 5-min bars vs. 1-min bars. The latter is a superset of the former.

Ernie

Anonymous said...

Hi Ernie,

Thank you for response.

I mean, to size trades from different stocks pairs in our portfolio.
It seems some pairs have higher returns while some have lower returns. Do we need to balance them to smooth equity curve?

It seems this is more important for trend following strategies.

Ernie Chan said...

Allocating capital to different pairs is similar to allocating capital to different stocks, so the topic of my current article is relevant here. Other practitioners prefer allocating capital inversely proportional to volatility, resulting in a minimum variance portfolio. All these different approaches are discussed comprehensively in Prof. Ang's book, at the top of my Recommended Books list on this blog.
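To make the comparison concrete, a sketch with made-up inputs for three pairs: inverse-volatility weights (which ignore both expected returns and correlations), the global minimum-variance weights C^{-1}1/(1^T C^{-1}1) (which use the correlations but not the expected returns), and the normalized Kelly weights from (9). All three are normalized to sum to 1 so that only the allocation, not the leverage, is compared.

```python
import numpy as np

# Hypothetical inputs for three pairs (per-period mean excess returns and covariance)
M = np.array([0.08, 0.05, 0.03])
C = np.array([[0.09, 0.01, 0.00],
              [0.01, 0.04, 0.00],
              [0.00, 0.00, 0.01]])
ones = np.ones(len(M))

vol = np.sqrt(np.diag(C))
w_inv_vol = (1.0 / vol) / np.sum(1.0 / vol)       # ignores M and the correlations

w_min_var = np.linalg.solve(C, ones)
w_min_var /= w_min_var.sum()                      # C^{-1} 1 / (1^T C^{-1} 1)

w_kelly = np.linalg.solve(C, M)
w_kelly /= w_kelly.sum()                          # normalized Kelly weights, equation (9)

for name, w in [("inverse vol", w_inv_vol), ("min variance", w_min_var), ("Kelly (9)", w_kelly)]:
    print(f"{name:>12}: {np.round(w, 3)}")
```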

Ernie

Anonymous said...

Hi Ernie,

So we can just use Kelly formula to allocate capital to different stock pairs in our portfolio?

Ernie Chan said...

Yes.

Ernie

Anonymous said...

Hi Ernie,

Could we trade strategies on IB TWS if the holding period is 2 mins?

HK said...

Hi Ernie,

Would you use options in some strategies? It seems like a lot of hedge funds use options with stocks/futures for statistical arbitrage.

-HK

Ernie Chan said...

Yes, IB's latency is short enough for you to trade at 2-min bars.

Ernie

Ernie Chan said...

Hi HK,
Yes, I have considered using options to implement statarb strategies. (See an earlier blog article of mine on this topic.)
However, I generally find that the bid-ask spread is too large for my strategies.
Ernie

Anonymous said...

Hi Ernie,

To compute returns in Kelly formula,
usually, how long is "one-period"?

Could we set risk-free rate as zero?
Or where could we get that number?

Ernie Chan said...

We typically take one period to be one trading day, but of course it can be one minute, one hour, or one week depending on your strategy.

Risk-free rate is zero if your portfolio or strategy is self-financing. Otherwise, you need to look up the 3-month Treasury rate on the Federal Reserve's website (for US investors).

Ernie

Anonymous said...

Hi Ernie,

How long is the lookback window to compute expected returns and variance in Kelly?

Pairs trading is self-financing if I hold dollar neutral positions?

Ernie Chan said...

Minimum of 3 years. Ideally, the lookback will include periods of market stress, just as in a backtest.

Pairs trading is self-financing.

Ernie

Anonymous said...

Hi Ernie,

If "one-period" is one day,
we need to calculate leverage, F* every day using moving windows, 3 years? Drop one return, add one every day?

Btw, IB only has one year of bid/ask one-minute bars. Where could we get longer histories of bid/ask bars?

Thanks.

Ernie Chan said...

Yes.
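A minimal sketch of that daily update (Python/NumPy). It assumes a T × N array of daily excess returns and uses a 756-day window as a stand-in for "3 years" (252 trading days per year is an assumption). Each day the oldest return drops out of the window, the newest comes in, and F is re-estimated; row t uses only returns observed before day t.

```python
import numpy as np

def rolling_kelly(excess_returns, window=756):
    """Kelly fractions F_t = C_t^{-1} M_t re-estimated each day from a trailing window.

    Row t of the output uses returns t-window .. t-1, so it contains only data
    available before day t. Rows before the first full window are NaN.
    """
    T, N = excess_returns.shape
    F = np.full((T, N), np.nan)
    for t in range(window, T):
        R = excess_returns[t - window:t]
        M = R.mean(axis=0)
        C = np.atleast_2d(np.cov(R, rowvar=False))   # keeps N = 1 working as a 1x1 matrix
        F[t] = np.linalg.solve(C, M)
    return F

# Synthetic example with two strategies
rng = np.random.default_rng(4)
returns = rng.normal(0.0005, 0.01, size=(800, 2))
F_daily = rolling_kelly(returns)
print(F_daily[-1])   # fractions to use on the last day
```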

Follow the link called "High Frequency Historical Data" in the Links section of the right side bar of my blog. That is the cheapest source. (See also the Tech Update section of my article Short Interest as a Factor.)

Ernie

HK said...

Hi Ernie,

Thanks for the reply. There are many inactive ETFs in the Hong Kong market which let me trade worldwide indices. For example, there is a Brazil index ETF. Most of the time the bid-ask spread is still reasonable, at one official unit; sometimes it is two official units. I guess the ETF management companies would not take advantage of the price since they are supposed to earn from management fees.

So is there any disadvantage to trading inactive ETFs?

-HK

Ernie Chan said...

Hi HK,
Besides the bid-ask spread, one should pay attention to the bid-ask sizes as well. Are they big enough to support your proposed order? If so, there is no reason not to trade these ETFs.
Ernie

Anonymous said...

Hi Ernie,

I just found that we can buy Quote Booster packs to get 4 years of historical data in IB. So we may get 4 years of 1-min bid/ask bars from IB.

Have you heard about that?

Ernie Chan said...

Good to know that - thanks!
Ernie

Anonymous said...

Hi Ernie,

If we trade intraday stock pairs, do we need to read company news every day?

Ernie Chan said...

Yes, if you want to avoid pairs that dis-cointegrate.

Ernie

Anonymous said...

Hi Ernie,

Stocks Pairs trading is a little bit like gap trading.

I mean, every morning, we have stocks gap up, gap down, which generate buy/sell signals for stock pairs.

HK said...

Dear Ernie,

What do you think about LPPL model for bubble burst forecasting? Is there any easier way to understand it and implement it with coding?

-HK

Ernie Chan said...

Hi HK,
I have not studied the LPPL model, and it has a low priority for me because we tend to trade market-neutral models, so bubbles in asset prices are not of major concern to us.
Ernie

Anonymous said...

Hi Ernie,

For stock sector and industry classifications, there are several different lists, and I am a little bit confused. Would you please recommend one?

Ernie Chan said...

There are many more industry groups than sectors. Industry group is a fine-grained categorization, while sector is more coarse-grained.

Which one to use depends on your strategy.

Ernie

Anonymous said...

Hi Ernie,

Thank you for quick response.

Would you please recommend websites or documents which provide US stocks lists for different industry groups and sectors?

I find that, to some extent, they are all different. For example,
sectors and groups in the IB TWS filter are different from those in Yahoo Finance.
Ideally, in the same groups or sectors, companies have similar business activities.

Thanks.

Ernie Chan said...

Yahoo Finance has list of stocks in various industries or sectors. E.g. http://biz.yahoo.com/p/821conameu.html

or http://biz.yahoo.com/p/515conameu.html

Ernie

Anonymous said...

Hi Ernie,

For a stock pair, I got two sets of statistics. The first has Sharpe ratio 3.36, 33% return, 147 trades a year (69% wins), max drawdown 3.95%.

The second has Sharpe ratio 3.45, 20% return, 33 trades a year (79% wins), max drawdown 1.12%.

Which one would you pick to trade in real-time?

Thanks

Ernie Chan said...

For a mean reverting strategy, I am concerned about tail risk, so I like to look at Calmar ratio as well.

The first one has Calmar ratio = 8.4, the second one 17.9. So the second one has higher Sharpe and Calmar ratios, and I prefer that.
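Here the Calmar ratio is taken, as in the reply, to be the annual return divided by the maximum drawdown (the conventional definition uses a trailing 36-month window, which the figures above don't specify). A short Python sketch with the numbers quoted in the question, plus a hypothetical helper for computing the maximum drawdown from an equity curve:

```python
import numpy as np

def max_drawdown(equity):
    """Maximum peak-to-trough decline of an equity curve, as a fraction of the peak."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    return float(np.max((running_peak - equity) / running_peak))

def calmar(annual_return, max_dd):
    """Simplified Calmar ratio: annual return divided by maximum drawdown (same units)."""
    return annual_return / max_dd

first = calmar(33.0, 3.95)    # about 8.4
second = calmar(20.0, 1.12)   # about 17.9
print(round(first, 1), round(second, 1))
```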

Ernie

Anonymous said...

Hi Ernie,

I just read about your managed accounts.
You only have one big drawdown.
May I ask what happened in the market on Sep. 2011 causing that drawdown?

Do you still trade ETF pairs or stock pairs? Or do you focus on FX?

Ernie Chan said...

In Sept 2011, there was a strange day for the Mexican Peso, which moved more than 2.5% in a few hours. Their central bank since then has decided to intervene in the markets to keep peso from changing more than a certain band daily and therefore this tail risk was eliminated.

We still trade long-short stock portfolios. Not trading ETFs at the moment, but may start again soon.

Ernie

Anonymous said...

Hi Ernie,


What is the difference between long-short stock portfolios and stock pairs trading?

I find some people trade stock pairs in the opposite way (not mean-reverting, but directional). Do you have any comments about that?
Thanks.

Ernie Chan said...

Long-short stock portfolios involve many long and short stocks, not just a pair. We bet on "cross-sectional" mean reversion or momentum.

An example where we can trade a pair using a momentum strategy is merger arbitrage.

Ernie

Anonymous said...

Hi Ernie,

Is it OK to trade 40 stock pairs at the same time if we find they are profitable in backtesting?

Ernie Chan said...

Why not?
Ernie

Anonymous said...

Great post Ernie! Just what I was looking for.

Many thanks, Tom

Tom said...

Hi Ernie, I don't want to waste too much of your time, but if you have a minute, would you care to give a little more detail on the step where you go from the derivative of the log() to the next expression? You mention that you set $F_i = 0$; wouldn't that set the whole expression to zero? I would be grateful for some more explanation. Thanks so much.

Ernie Chan said...

Hi Tom,
When I wrote "...we can take the partial derivatives of

$$\log(F^T M) - \tfrac{1}{2}\log(F^T C F) - \lambda\,(F^T \mathbf{1} - 1) \qquad (5)$$

with respect to $F_i$. Setting each component i to zero ..." I did not mean setting $F_i$ to zero. I meant setting the expression that results from taking the partial derivative w.r.t. $F_i$ to zero.

Hope this helps.

Ernie

Tom said...

Doh! Of course, now I get it. Thank you so much for your help. Really good work!

Tom

Anonymous said...

Hi Ernie,
I am reading through your 2nd book. I like your style, code and details. You have provided nice examples with code. However, I was sad to see no code for Example 8.1, though it was implicitly referred to in another part of the book. I wanted to see your implementation.
Regards
Anon

Ernie Chan said...

Example 8.1 does not require code. I have already displayed the entire arithmetic calculation on what amount of stock to sell under constant leverage. The arithmetic calculation can be done on a simple calculator or by hand.

Ernie

Anonymous said...

I meant something on p164 for, "The APR of trading xxx is 15 percent with a Sharpe ratio of 1.8 from October 12, 2011, to October 25, 2012"
Regards,
Anon

Ernie Chan said...

The strategy described on p. 164 is very simple to backtest - hence no code was provided. You can do this on Excel.

Ernie

yifan said...

Hi Ernie

What happens if there is a positive value constraint for portfolio allocation?

Ernie Chan said...

Hi Yifan,
Generally speaking, imposing inequality constraints means the optimization problem cannot be solved analytically, so you would have to resort to numerical solutions. See http://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions

Ernie

manuka said...

After reading some theory (e.g. Estrada, 2010, "Geometric mean optimisation"), it seems to me that the Kelly criterion does not lead to sharpe maximisation, but to growth maximisation, and that the two generally lead to different outcomes. I.e. if you use CAPM, you maximise sharpe, if you use Kelly, you maximise the expected geometric return.

Ernie Chan said...

Hi manuka,
The formula for the maximum growth rate is g=r+S^2/2, where r is the risk free rate and S is the Sharpe ratio (see http://www.edwardothorp.com/sitebuildercontent/sitebuilderfiles/KellyCriterion2007.pdf). So maximum growth rate coincides with maximum Sharpe ratio.
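The formula can be checked numerically under the continuous-time, normally distributed returns approximation behind it (the one used in the Thorp paper): the growth rate at leverage f is g(f) = r + f·m − f²s²/2, where m is the mean excess return and s its standard deviation. The sketch below, with hypothetical annual inputs, maximizes g over a grid of f and compares the result with the Kelly leverage m/s² and with r + S²/2.

```python
import numpy as np

# Hypothetical annualized inputs
r = 0.02      # risk-free rate
m = 0.08      # mean excess return of the strategy
s = 0.20      # volatility of the strategy

def growth_rate(f):
    """Compounded growth rate at leverage f (continuous-time, Gaussian approximation)."""
    return r + f * m - 0.5 * f**2 * s**2

f_grid = np.linspace(0.0, 5.0, 50001)
g = growth_rate(f_grid)
f_best = f_grid[np.argmax(g)]

S = m / s                                   # Sharpe ratio of the unlevered strategy
print("grid optimum f:", f_best, " Kelly m/s^2:", m / s**2)
print("max growth:", g.max(), " r + S^2/2:", r + S**2 / 2)
```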

Ernie

manuka said...

Ernie,
Thanks for your excellent comment, and sorry for my very tardy reply; I was travelling and could not access the blog. Having read the paper you refer to, you are absolutely right. Intuition failed me: I thought if Kelly tells us how much to allocate to each investment, then surely we should just look to maximise Kelly, i.e. we could use Kelly both for sizing and for choosing between investments...no doubt this is wrong. In fact, playing around with some simple examples shows that a higher Sharpe ratio does not always lead to a higher Kelly: for example suppose the risk free rate is 0, and we have two strategies A and B. mu A = 0.3, std A = 0.15; mu B = 0.2, std B = 0.11. Here strategy A has a higher Sharpe, but Kelly guides us to invest a higher portion into B. And this is the path that leads to maximised expected geometric growth. Another very interesting thing about the link you referred me to is that Thorp's derivation shows a reason for maximising the Sharpe ratio (i.e. maximum geometric growth of wealth) which has nothing to do with minimum variance portfolio theory...i.e. he derives the capital asset pricing model key result with a whole different set of assumptions. Thanks for the help.
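manuka's two-strategy example can be reproduced in a few lines of Python. With a zero risk-free rate, the Sharpe ratio is μ/σ and the single-strategy Kelly leverage is μ/σ² (each strategy is treated on its own here, ignoring any correlation between them). The growth rate at that leverage uses the g = r + S²/2 formula quoted above.

```python
# manuka's example: (mean, standard deviation) of each strategy, risk-free rate = 0
strategies = {"A": (0.30, 0.15), "B": (0.20, 0.11)}

results = {}
for name, (mu, sigma) in strategies.items():
    sharpe = mu / sigma
    kelly = mu / sigma**2              # single-strategy Kelly leverage
    growth = sharpe**2 / 2             # growth rate at that leverage, g = r + S^2/2 with r = 0
    results[name] = (sharpe, kelly, growth)
    print(f"{name}: Sharpe={sharpe:.2f}  Kelly leverage={kelly:.1f}  growth at Kelly={growth:.2f}")
```

B gets the larger Kelly leverage, yet A, with the higher Sharpe ratio, has the higher growth rate when each is run at its own Kelly leverage.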

Ernie Chan said...

Hi manuka,
I think you may be a little confused. We can indeed use Kelly to choose the optimal leverage, and to optimally allocate investments (see my book Quantitative Trading's chapter 6 for examples.)

The maximum growth rate formula I quoted before only works when we are levered at exactly the Kelly leverage. Otherwise maximizing Sharpe ratio does not in general lead to maximum growth. You can definitely simultaneously optimize leverage and the Sharpe ratio: the Kelly ratio of a portfolio will tell you how much leverage you need, and it is independent of the internal asset allocations. But if you compute the Kelly ratio of each individual asset taking into account their covariances of returns, then each individual Kelly ratio will tell you the asset allocation as well as the individual (and portfolio level) leverage to use.

Ernie

manuka said...

Ernie,
Many Thanks again for your insightful reply. I did the derivation for half Kelly and found that it implies a geometric (per period) growth of r+3/8*S^2, i.e. a bit lower than the one for the full Kelly, but still implying that the growth is maximised when sharpe ratio is maximised. Hence my current understanding is that in order to maximise the geometric growth: 1. Always pick investment strategies with the highest sharpe ratios 2. Allocate between them using Kelly (or fractional Kelly), 3. Leverage according to Kelly (relatively). I think this is in line with what you are saying in both your books, but part of my confusion was the wrong view that I can forget about sharpe ratios altogether and just pick those strategies that show the highest Kelly. Thanks again, and I hope I got it right now.
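manuka's half-Kelly result generalizes: at a fraction c of the Kelly leverage, f = c·m/s², the growth formula g(f) = r + f·m − f²s²/2 gives g = r + (c − c²/2)·S², which is r + S²/2 at c = 1 and r + 3S²/8 at c = 1/2. A small numerical check with hypothetical inputs:

```python
# Hypothetical per-period inputs
r, m, s = 0.01, 0.06, 0.15
S = m / s                                  # Sharpe ratio

def growth_at_fraction(c):
    """Growth rate when levered at a fraction c of the Kelly leverage m / s^2."""
    f = c * m / s**2
    return r + f * m - 0.5 * f**2 * s**2

for c in (0.25, 0.5, 1.0, 1.5):
    closed_form = r + (c - c**2 / 2) * S**2
    print(c, growth_at_fraction(c), closed_form)
```

For any fixed fraction c between 0 and 2, the growth rate increases with S, which is why picking the highest-Sharpe strategies still matters under fractional Kelly.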

Ernie Chan said...

Hi Manuka,
Yes, I agree with your latest statements.
Ernie

Wen said...

Hi Ernie,

The Kelly formula may give negative weight to a strategy. For a long-only portfolio, can the Kelly formula still apply with minor revision? Thank you!

Wen

Ernie Chan said...

Hi Wen,
Yes, you can add the positivity constraint when you maximize the Sharpe ratio in formulas 2-3 above.
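With the positivity constraint there is generally no closed form, so a numerical method is needed. The sketch below is one simple option for a handful of assets, not necessarily what the author uses. It enumerates every subset of assets, computes the unconstrained tangency weights C_SS^{-1} M_S on that subset (the same formula as (1), restricted to the subset), keeps only subsets where all weights are positive, normalizes as in (9), and returns the one with the highest Sharpe ratio. It assumes C is positive definite and that at least one asset has a positive expected excess return. Its cost grows as 2^N, so for larger N a quadratic-programming solver would be the usual choice.

```python
import itertools
import numpy as np

def long_only_tangency(M, C):
    """Maximum-Sharpe weights subject to w >= 0 and sum(w) = 1, by active-set enumeration."""
    M = np.asarray(M, dtype=float)
    N = len(M)
    best_w, best_sharpe = None, -np.inf
    for k in range(1, N + 1):
        for subset in itertools.combinations(range(N), k):
            idx = list(subset)
            w_sub = np.linalg.solve(C[np.ix_(idx, idx)], M[idx])
            if np.any(w_sub <= 0):
                continue                          # violates the positivity constraint
            w = np.zeros(N)
            w[idx] = w_sub / w_sub.sum()          # normalize as in (9)
            sharpe = (w @ M) / np.sqrt(w @ C @ w)
            if sharpe > best_sharpe:
                best_w, best_sharpe = w, sharpe
    return best_w, best_sharpe

# Hypothetical inputs where the unconstrained Kelly solution shorts asset 2
M = np.array([0.05, 0.01, 0.04])
C = np.array([[0.04, 0.03, 0.00],
              [0.03, 0.04, 0.00],
              [0.00, 0.00, 0.04]])

print("unconstrained:", np.linalg.solve(C, M))
w, sr = long_only_tangency(M, C)
print("long-only:", w, sr)
```

The Kelly leverage for the constrained portfolio can then be obtained by treating the long-only portfolio as a single asset.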
Ernie

PavelB said...

Hi Ernie,

Am I right that this formula F=C^(-1)*M implies equal leverage for different components of a portfolio? For example, if we have two components with F = (2.5, 4.5) then overall leverage will be 7.0 and weights 2.5/7.0=0.36 and 4.5/7.0=0.64. So each component has the same leverage 7 and weights 0.36 and 0.64 summing up to 1. If we have $100 then we have to buy component one by value 100*7*0.36=$252 and component two by value 100*7*0.64=$448, summing up our portfolio to $700 according to leverage 7.

What if we calculate leverage using Kelly formula for each component separately before building portfolio? And then use leveraged returns as input into portfolio optimization with constraint on weights summing them to 1? My intuition tells me that in this case we will get different weights (from the first approach) and as we used different leverages for each component, the overall leverage of portfolio will remain the same (as in the first approach where it was 7 in the example). What do you think?

thank you
Pavel

Ernie Chan said...

Hi Pavel,
No, F=C^(-1)*M most certainly does not imply equal leverage. The whole point of using this formula is to find out what the optimal leverages are for each instrument in the portfolio. Those leverages are given by F. So if F=(2.5, 4.5), you should apply 2.5 leverage to the first stock, and 4.5 leverage to the second. Remember, leverage is with respect to account equity, which of course is the same for all stocks in the account.

Assuming that account has $1M, you would trade stock 1 with market value $2.5M, and stock 2 with market value $4.5M.

Ernie

Emiil said...

David Varadi posted a study using the Kelly maximization formula for multiple trading strategies with Spearman's rank correlation alongside Pearson. Apparently Spearman worked better, which I can see the reasoning for (taking non-linear relationships into account), but it seems to lack a mathematical justification. Do you think it would make sense to use Spearman in the same formula derived using standard correlation?

Ernie Chan said...

Emil,
The simple formula for Kelly assumes normality of returns. If one has to use Spearman correlation because returns are not normal, then the entire derivation won't work.
Ernie

Franco said...

Hello Ernie, I got to your blog through some YouTube videos I saw. I must say that it's the first time I see someone with such a firm theoretical mathematical approach to these topics; I really like the style, and for us "engineer-minded" folks it really helps.

Sorry to come back to such an old post but by no means outdated, let me cut to the chase.
I am trying to find good information about asset allocation for active strategies, but I am stuck with buy and hold approach...

How would you adapt this analysis for strategies such as mid-term trend following, where the distribution of assets keeps changing constantly, but where I still want to have a large fraction of the total capital invested?
If I optimize the portfolio (either Kelly or Markowitz) for each asset "Under surveillance", then I will get a distribution of assets that will then not be real since I may not be invested in all of them if the signals are not triggered for the strategy. On the other hand, recalculating and re-balancing too often seems impractical and not too cost effective.

I would appreciate your insight on this topic and any reference on where to get further data on how to apply asset allocation for active strategies.

Thanks!



Ernie Chan said...

Hi Franco,
The returns discussed in this post do not have to refer to "assets". It can refer to a portfolio of strategies. So the capital allocation would be to different strategies. If you only have one strategy, then Kelly optimization would just give you the optimal leverage to apply on that strategy.

I actually discuss this in some detail in the first chapter of my new book "Machine Trading".

Ernie

Franco said...

Having only one strategy but applied to different assets should bring a matrix of optimal leverages for that mix, since the means and variances of returns of the same strategy applied to different assets should be substantially different if the correlations are low and the assets have different dynamics, shouldn't it? Eg, if I want to trade oil, gold, S&P on the same strategy, then each one will have its optimal leverage ratio (same strategy on different underlying is actually like a new strategy).
So this means that every time one asset has an entry or exit signal, leverages need to be recalculated for the set of assets that currently have open positions under the one strategy being applied?

I will try to get a hold of your book in the meanwhile...

Thanks for the comments.

Ernie Chan said...

Hi Franco,
Yes, if you apply same strategy to different assets, Kelly formula will determine both asset allocation and overall leverage simultaneously.

However, this doesn't mean that you should recalculate the leverage on each asset every time there is an entry or exit. The leverage applies to the maximum position you should hold, irrespective of whether your strategy actually recommends a position at any given moment.

Ernie

igrivin said...

I am a little confused: what if the covariance matrix is singular (or almost singular), as is usually true in "real life" market applications for multiple securities. Your formula blows up, which is a bit counterintuitive...

igrivin said...

The formula at the beginning of the post blows up if the covariance matrix is singular (or close to it). This seems counter-intuitive, in particular if you have two strategies, with the returns of one simply equal to the returns of the other plus, say, 5 bp/day. You should obviously allocate all your money to the "better" strategy; there is no obvious instability created.

Ernie Chan said...

Hi igrivin,
The singular case where 2 assets are perfectly correlated is indeed interesting, but the allocation scheme in that case isn't what you described. It would be to short the asset with the lower return by an infinite amount, and use the money to long the other by an infinite amount. Hence the mathematical difficulty.

This difficulty can be avoided by taking the limit of the correlation going to 1. For example, correl(1, 2)=0.99, var(1)=0.1, var(2)=0.1, M=[1 0.1]', hence C=[0.1 0.99*sqrt(0.1*0.1); 0.99*sqrt(0.1*0.1) 0.1], inv(C)*M=[453 -447]'. If correl(1,2) -> 0.999, inv(C)*M -> [4503 -4497]'. You see the pattern?
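The limiting behavior can be reproduced with a short Python/NumPy sketch using the same inputs as the reply: var(1) = var(2) = 0.1, M = [1, 0.1]', and correlations ρ approaching 1. Here C^{-1}M = 10/(1 − ρ²)·[1 − 0.1ρ, 0.1 − ρ]', so the net position F_1 + F_2 = 11/(1 + ρ) stays near 5.5 while the gross leverage |F_1| + |F_2| grows without bound like 1/(1 − ρ).

```python
import numpy as np

def kelly_two_assets(rho, var1=0.1, var2=0.1, M=(1.0, 0.1)):
    """F = C^{-1} M for two assets with correlation rho, using the inputs from the reply."""
    cov12 = rho * np.sqrt(var1 * var2)
    C = np.array([[var1, cov12],
                  [cov12, var2]])
    return np.linalg.solve(C, np.asarray(M))

for rho in (0.9, 0.99, 0.999, 0.9999):
    F = kelly_two_assets(rho)
    print(rho, F.round(1), "net:", round(F.sum(), 3), "gross:", round(np.abs(F).sum(), 1))
```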

Ernie