## Monday, August 18, 2014

### Kelly vs. Markowitz Portfolio Optimization

In my book, I described a very simple and elegant formula for determining the optimal asset allocation among N assets:

$$F = C^{-1} M \qquad (1)$$

where F is an N×1 vector indicating the fraction of the equity to be allocated to each asset, C is the covariance matrix, and M is the mean vector for the excess returns of these assets. Note that these "assets" can in fact be "trading strategies" or "portfolios" themselves. If these are in fact real assets that incur a carry (financing) cost, then excess returns are returns minus the risk-free rate.

Notice that these fractions, or weights as they are usually called, are not normalized - they don't necessarily add up to 1. This means that F not only determines the allocation of the total equity among N assets, but it also determines the overall optimal leverage to be used. Since each component of F is already a fraction of the total equity, the sum of the absolute values of the components of F is in fact the overall leverage. Such is the beauty of the Kelly formula: optimal allocation and optimal leverage in one simple formula, which is supposed to maximize the compounded growth rate of one's equity (or equivalently the equity at the end of many periods).
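As a minimal sketch of formula (1), the snippet below computes F, the overall leverage, and the implied dollar positions. The mean vector, covariance matrix, and account equity are hypothetical numbers chosen for illustration; they do not come from the book or this post.

```python
import numpy as np

# Hypothetical inputs: mean excess returns and covariance matrix
# of two assets (or strategies), measured over the same period length.
M = np.array([0.10, 0.06])
C = np.array([[0.04, 0.01],
              [0.01, 0.02]])

# Formula (1): F = C^{-1} M. Solving the linear system avoids
# forming the matrix inverse explicitly.
F = np.linalg.solve(C, M)

# Overall leverage: sum of the absolute values of the weights.
leverage = np.abs(F).sum()

# Dollar positions for a hypothetical account equity.
equity = 100_000.0
positions = F * equity

print("Kelly weights F:", F)
print("Overall leverage:", leverage)
print("Dollar positions:", positions)
```

Because F is not normalized, the positions can add up to several times the account equity; that multiple is the leverage.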

However, most students of finance are not taught Kelly portfolio optimization. They are taught Markowitz mean-variance portfolio optimization. In particular, they are taught that there is a portfolio called the tangency portfolio which lies on the efficient frontier (the set of portfolios with minimum variance consistent with a certain expected return) and which maximizes the Sharpe ratio. Left unsaid are:

• What's the real benefit of maximizing the Sharpe ratio?
• Is this tangency portfolio the same as the one recommended by Kelly optimal allocation?

I want to answer these questions here, and provide a connection between Kelly and Markowitz portfolio optimization.

According to Kelly and Ed Thorp (and explained in my book), F above not only maximizes the compounded growth rate, but it also maximizes the Sharpe ratio. Put another way: the maximum growth rate is achieved when the Sharpe ratio is maximized. Hence we see why the tangency portfolio is so important. And in fact, the tangency portfolio is the same as the Kelly optimal portfolio F, except for the fact that the tangency portfolio is assumed to be normalized and has a leverage of 1, whereas F goes one step further and determines the optimal leverage for us. Otherwise, the percent allocation of an asset is the same in both (assuming that we haven't imposed additional constraints in the optimization problem). How do we prove this?

The usual way Markowitz portfolio optimization is taught is by setting up a constrained quadratic optimization problem (quadratic because we want to minimize the portfolio variance, which is a quadratic function of the weights of the underlying assets), using a numerical quadratic programming (QP) solver to solve it, and then further maximizing the Sharpe ratio to find the tangency portfolio. But this is unnecessarily tedious and actually obscures the elegant formula for F shown above. Instead, we can proceed by applying Lagrange multipliers to the following optimization problem (see http://faculty.washington.edu/ezivot/econ424/portfolioTheoryMatrix.pdf for a similar treatment):

Maximize the Sharpe ratio

$$\frac{F^T M}{(F^T C F)^{1/2}} \qquad (2)$$

subject to the constraint

$$F^T \mathbf{1} = 1 \qquad (3)$$

(To emphasize that the $\mathbf{1}$ on the left hand side is a column vector of ones, I used boldface.)

So we should maximize the following unconstrained quantity with respect to the weights $F_i$ of each asset i and the Lagrange multiplier λ:

$$\frac{F^T M}{(F^T C F)^{1/2}} - \lambda (F^T \mathbf{1} - 1) \qquad (4)$$

But taking the partial derivatives of this fraction with a square root in the denominator is unwieldy. So equivalently, we can maximize the logarithm of the Sharpe ratio subject to the same constraint. Thus we can take the partial derivatives of

$$\log(F^T M) - \frac{1}{2} \log(F^T C F) - \lambda (F^T \mathbf{1} - 1) \qquad (5)$$

with respect to $F_i$. Setting each component i to zero gives the matrix equation

$$\frac{1}{F^T M} M - \frac{1}{F^T C F} C F = \lambda \mathbf{1} \qquad (6)$$
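The step from (5) to (6) can be spelled out term by term. This is only a sketch using standard matrix calculus, assuming C is symmetric (as any covariance matrix is):

$$\frac{\partial}{\partial F} \log(F^T M) = \frac{M}{F^T M}, \qquad \frac{\partial}{\partial F}\, \frac{1}{2} \log(F^T C F) = \frac{C F}{F^T C F}, \qquad \frac{\partial}{\partial F}\, \lambda (F^T \mathbf{1} - 1) = \lambda \mathbf{1}$$

The gradient of (5) is the first term minus the second minus the third. Setting that gradient (not F itself) to zero and moving the λ term to the right hand side reproduces (6).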

Multiplying the whole equation by $F^T$ on the left gives

$$\frac{1}{F^T M} F^T M - \frac{1}{F^T C F} F^T C F = \lambda F^T \mathbf{1} \qquad (7)$$

Remembering the constraint, we recognize the right hand side as just λ. The left hand side comes out to be exactly zero, which means that λ is zero. A Lagrange multiplier that turns out to be zero means that the constraint won't affect the solution of the optimization problem up to a proportionality constant. This is satisfying since we know that if we apply an equal leverage on all the assets, the maximum Sharpe ratio should be unaffected. So we are left with the matrix equation for the solution of the optimal F:

$$C F = \frac{F^T C F}{F^T M} M \qquad (8)$$

If you know how to solve this for F using matrix algebra, I would like to hear from you. But let's try the ansatz $F = C^{-1} M$ as in (1). The left hand side of (8) becomes M; the right hand side becomes $(F^T M / F^T M) M = M$ as well, because $F^T C F = M^T C^{-1} C C^{-1} M = F^T M$ for a symmetric C. So the ansatz works, and the solution is in fact (1), up to a proportionality constant. To satisfy the normalization constraint (3), we can write

$$F = \frac{C^{-1} M}{\mathbf{1}^T C^{-1} M} \qquad (9)$$
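Equation (8) can also be solved directly rather than by guessing. A short sketch, assuming C is invertible: the ratio $F^T C F / F^T M$ is a scalar, call it k, so (8) reads

$$C F = k M \quad \Longrightarrow \quad F = k \, C^{-1} M$$

Substituting back, $F^T C F = k^2 M^T C^{-1} M$ and $F^T M = k M^T C^{-1} M$, so the ratio is indeed k for any nonzero k. Every solution of (8) is therefore a multiple of $C^{-1} M$, and the constraint (3) fixes $k = 1 / (\mathbf{1}^T C^{-1} M)$.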

So there, the tangency portfolio is the same as the Kelly optimal portfolio, up to a normalization constant, and without telling us what the optimal leverage is.
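The conclusion can be checked numerically. The sketch below uses hypothetical inputs (the same kind of made-up two-asset M and C one might use to test formula (1)) and verifies that the normalized Kelly portfolio satisfies equation (8), has the same Sharpe ratio as the unnormalized one, and is not beaten by any fully invested two-asset portfolio on a grid.

```python
import numpy as np

# Hypothetical mean excess returns and covariance matrix.
M = np.array([0.10, 0.06])
C = np.array([[0.04, 0.01],
              [0.01, 0.02]])

def sharpe(F):
    """Sharpe ratio of a portfolio with weights F, as in equation (2)."""
    return F @ M / np.sqrt(F @ C @ F)

kelly = np.linalg.solve(C, M)      # equation (1)
tangency = kelly / kelly.sum()     # equation (9): divide by 1^T C^{-1} M

# Both sides of equation (8), evaluated at the tangency portfolio.
lhs = C @ tangency
rhs = (tangency @ C @ tangency) / (tangency @ M) * M

# Grid of fully invested portfolios [w, 1 - w], including short positions.
w_grid = np.linspace(-5.0, 5.0, 2001)
best_on_grid = max(sharpe(np.array([w, 1.0 - w])) for w in w_grid)

print("Kelly weights:   ", kelly)
print("Tangency weights:", tangency)
print("Sharpe (Kelly, tangency, best on grid):",
      sharpe(kelly), sharpe(tangency), best_on_grid)
```

The normalization in (9) presumes that the components of $C^{-1} M$ do not sum to zero; in that case the budget constraint cannot be met by rescaling.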

===
Workshop Update:

Based on popular demand, I have revised the dates for my online Mean Reversion Strategies workshop to be August 27-29.

===

HK said...

This is a true PHD level post:）
-HK

marco said...

I never thought about Kelly in these terms, I should have done it before. Really elegant solution to the problem.
http://nightlypatterns.wordpress.com

Tom said...

Thank you for this post. I too have always wondered why Markowitz is taught but not Kelly portfolio optimization.

Anonymous said...

Hi Ernie,

Great post.

I have a question about sharpe ratio. Let's say I have an intraday fx strategy and I'm using hourly data. fx trading is 24 hours but my strategy is only active 8 hours a day. Is the correct way to calculate sharpe:

sqrt(252*8) * mean(ret)/std(ret)

or

sqrt(252*24) * mean(ret)/std(ret)

where

ret = hourly return vector (24 hours). Data is actually from 5pm EST Sundays until 4pm EST Fridays

Many thanks

Ernie Chan said...

You should use sqrt(252*8)*mean(ret)/std(ret),
where the ret has only 8 hourly bars. Otherwise the zeros in the other 16 hours will distort the true picture.
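A minimal sketch of this calculation follows. The hourly returns are simulated, and the layout (24 bars per day with the strategy active in the first 8) is an assumption made for illustration. Only the active bars are kept before applying the sqrt(252*8) annualization.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, bars_per_day, active_bars = 252, 24, 8

# Hypothetical hourly returns: nonzero only in the 8 active bars of
# each day, zero in the other 16 bars when the strategy is flat.
ret = np.zeros((n_days, bars_per_day))
ret[:, :active_bars] = rng.normal(loc=1e-4, scale=1e-3,
                                  size=(n_days, active_bars))

# Boolean mask marking the bars in which the strategy is active.
active = np.zeros(bars_per_day, dtype=bool)
active[:active_bars] = True

# Keep only the active bars, then annualize with 252 * 8 periods per year.
active_ret = ret[:, active].ravel()
sharpe = (np.sqrt(252 * active_bars)
          * active_ret.mean() / active_ret.std(ddof=1))

print("Annualized Sharpe ratio:", sharpe)
```

Here `ddof=1` gives the sample standard deviation, which is what MATLAB's `std` returns by default.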

Ernie

Anonymous said...

Hi Ernie,

Could we get better sharpe ratio?

How long do we need to do backtesting for intraday strategies?

Thanks.

Ernie Chan said...

Sure, intraday stock pairs may improve Sharpe ratio as it may have more trades than interday pairs.

Even with intraday pairs, we need to backtest whether the strategy held up during extreme events, so it is best to start with 2007.

Ernie

Anonymous said...

Hi Ernie,

For pairs trading, how do we size every trade to have a potential equal dollar impact on our portfolio?

Anonymous said...

Hi Ernie,

I find there are 1 second, 5 secs, 15 secs, 30 secs, 1 min , 5 mins bars for intraday.

Do we need to test all of them?

Thanks.

Ernie Chan said...

You can set each side of a pair trade to equal dollar value, or you can use a hedge ratio derived from linear regression or Johansen test to set the number of shares. The returns and risks differ for each approach.
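The two sizing approaches can be sketched as follows. The price series are simulated and the capital figure is hypothetical; the hedge ratio here comes from an ordinary least squares fit of one price on the other (the Johansen test is not shown).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical price series: y tracks roughly 2 * x plus noise.
x = 50.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 500)))
y = 2.0 * x + 10.0 + rng.normal(0.0, 1.0, 500)

capital_per_side = 10_000.0  # hypothetical dollars per leg

# (a) Equal dollar value: long y and short x for the same dollar amount.
shares_y_equal = capital_per_side / y[-1]
shares_x_equal = -capital_per_side / x[-1]

# (b) Hedge ratio from a linear regression of y on x:
#     for each share of y held long, short beta shares of x.
beta, intercept = np.polyfit(x, y, 1)
shares_y_hedge = capital_per_side / y[-1]
shares_x_hedge = -beta * shares_y_hedge

print("Equal dollar shares (y, x):", shares_y_equal, shares_x_equal)
print("Hedge ratio beta:", beta)
print("Hedge ratio shares (y, x):", shares_y_hedge, shares_x_hedge)
```

With the hedge-ratio approach the two legs generally do not have equal dollar value, which is one reason the returns and risks of the two approaches differ.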

Ernie

Ernie Chan said...

Ideally, you can backtest at the highest frequency. If you desire a longer holding period, there are many ways to accomplish that without artificially limiting yourself to lower frequency data. There is nothing magical about 5-min bars vs. 1-min bars. The latter is a superset of the former.

Ernie

Anonymous said...

Hi Ernie,

Thank you for response.

I mean, to size trades from different stocks pairs in our portfolio.
It seems some pairs have higher returns while some have lower returns. Do we need to balance them to smooth equity curve?

It seems this is more important for trend following strategies.

Ernie Chan said...

Allocating capital to different pairs is similar to allocating capital to different stocks. So the topic of my current article is relevant here. Other practitioners prefer allocating capital inversely proportional to volatility, resulting in a minimum variance portfolio. All these different approaches are discussed comprehensively in Prof. Ang's book, on top of my Recommended Books list on this blog.

Ernie

Anonymous said...

Hi Ernie,

So we can just use Kelly formula to allocate capital to different stock pairs in our portfolio?

Ernie Chan said...

Yes.

Ernie

Anonymous said...

Hi Ernie,

Could we trade strategies on IB TWS if the holding period is 2 mins?

HK said...

Hi Ernie,

Would you use options in some strategies? It seems like a lot of hedge funds use options with stocks/futures for statistical arbitrage.

-HK

Ernie Chan said...

Yes, IB's latency is short enough for you to trade at 2-min bars.

Ernie

Ernie Chan said...

Hi HK,
Yes, I have considered using options to implement statarb strategies. (See an earlier blog article of mine on this topic.)
However, I generally find that the bid-ask spread is too large for my strategies.
Ernie

Anonymous said...

Hi Ernie,

To compute returns in Kelly formula,
usually, how long is "one-period"?

Could we set risk-free rate as zero?
Or where could we get that number?

Ernie Chan said...

We typically take one period to be one trading day, but of course it can be one minute, one hour, or one week depending on your strategy.

Risk-free rate is zero if your portfolio or strategy is self-financing. Otherwise, you need to check the Federal Reserve's website (for US investors) for the 3-month Treasury rate.

Ernie

Anonymous said...

Hi Ernie,

How long is the lookback window to compute expected returns and variance in Kelly?

Pairs trading is self-financing if I hold dollar neutral positions?

Ernie Chan said...

Minimum of 3 years. Ideally, the lookback will include periods of market stress, just as in a backtest.
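A rolling-window estimate of the Kelly weights might be sketched like this. The returns are simulated, the 756-day lookback stands in for roughly three years of trading days, and the function name `rolling_kelly` is made up for illustration.

```python
import numpy as np

def rolling_kelly(excess_returns, lookback):
    """Kelly weights F = C^{-1} M re-estimated on a moving window.

    excess_returns: array of shape (T, N), one row per period.
    Returns an array of shape (T - lookback + 1, N); row j uses the
    `lookback` periods ending at period lookback - 1 + j.
    """
    T, _ = excess_returns.shape
    weights = []
    for end in range(lookback, T + 1):
        window = excess_returns[end - lookback:end]
        M = window.mean(axis=0)
        C = np.atleast_2d(np.cov(window, rowvar=False))
        weights.append(np.linalg.solve(C, M))
    return np.array(weights)

# Hypothetical daily excess returns for two strategies.
rng = np.random.default_rng(2)
excess = rng.normal(loc=[0.0005, 0.0003], scale=0.01, size=(1000, 2))

F_path = rolling_kelly(excess, lookback=756)
print("Number of daily estimates:", len(F_path))
print("Latest Kelly weights:", F_path[-1])
```

Each day the oldest return drops out of the window and the newest one enters, so the weights drift gradually rather than jump.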

Ernie

Anonymous said...

Hi Ernie,

If "one-period" is one day,
we need to calculate leverage, F* every day using moving windows, 3 years? Drop one return, add one every day?

Btw, IB only has one year of BID, ASK one-min bars. Where could we get longer historical BID, ASK bars?

Thanks.

Ernie Chan said...

Yes.

Follow the link called "High Frequency Historical Data" in the Links section of the right side bar of my blog. That is the cheapest source. (See also the Tech Update section of my article Short Interest as a Factor.)

Ernie

HK said...

Hi Ernie,

Thanks for the reply. There are many inactive ETFs in the Hong Kong market that let me trade worldwide indices. For example, there is a Brazil index ETF. Most of the time the bid-ask spread is still reasonable at one official unit; sometimes it is two official units. I guess the ETF management companies would not take advantage of the price, since they are supposed to earn from the management fee.

-HK

Ernie Chan said...

Hi HK,
Besides the bid-ask spread, one should pay attention to the bid-ask sizes as well. Are they big enough to support your proposed order? If so, there is no reason not to trade these ETFs.
Ernie

Anonymous said...

Hi Ernie,

I just find that we can buy Quote Booster packs to get 4 years historical data in IB. So we may get 4 years 1 min BID/ASK bars in IB.

Ernie Chan said...

Good to know that - thanks!
Ernie

Anonymous said...

Hi Ernie,

we need to read companies news every day?

Ernie Chan said...

Yes, if you want to avoid pairs that dis-cointegrate.

Ernie

Anonymous said...

Hi Ernie,

I mean, every morning, we have stocks gap up, gap down, which generate buy/sell signals for stock pairs.

HK said...

Dear Ernie,

What do you think about LPPL model for bubble burst forecasting? Is there any easier way to understand it and implement it with coding?

-HK

Ernie Chan said...

Hi HK,
I have not studied the LPPL model, and it has a low priority for me because we tend to trade market-neutral models, so bubbles in asset prices are not of major concern to us.
Ernie

Anonymous said...

Hi Ernie,

For stocks sectors and industries categories, there are some different lists. I am a little bit confused. Would you please recommend one?

Ernie Chan said...

There are many more industry groups than sectors. Industry group is a fine-grained categorization, while sector is more coarse-grained.

Which one to use depends on your strategy.

Ernie

Anonymous said...

Hi Ernie,

Thank you for quick response.

Would you please recommend websites or documents which provide US stocks lists for different industry groups and sectors?

I find that, to some extent, they are all different. For example,
sectors and groups in IB TWS filter are different from those in Yahoo finance.
Ideally, in the same groups or sectors, companies have similar business activities.

Thanks.

Ernie Chan said...

Yahoo Finance has lists of stocks in various industries or sectors. E.g. http://biz.yahoo.com/p/821conameu.html

or http://biz.yahoo.com/p/515conameu.html

Ernie

Anonymous said...

Hi Ernie,

For a stock pair, I got two sets of statistic. The first has Sharpe ratio 3.36, 33% return, 147 trades a year(69% wins), max drawdown 3.95%.

The second has Sharpe ratio 3.45,20% return, 33 trades a year(79% wins),
max drawdown 1.12%.

Which one would you pick to trade in real-time?

Thanks

Ernie Chan said...

For a mean reverting strategy, I am concerned about tail risk, so I like to look at Calmar ratio as well.

The first one has Calmar ratio=8.4, the second one is 17.9. So the second one has higher Sharpe and Calmar ratio, and I prefer that.
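The Calmar ratios here are the annual return divided by the maximum drawdown. A minimal check of that arithmetic, using the figures from the question (the function name is only illustrative):

```python
def calmar_ratio(annual_return, max_drawdown):
    """Annual return divided by maximum drawdown, both as fractions."""
    return annual_return / max_drawdown

# Figures from the question: 33% return with 3.95% max drawdown,
# and 20% return with 1.12% max drawdown.
first = calmar_ratio(0.33, 0.0395)
second = calmar_ratio(0.20, 0.0112)

print("Calmar ratio, first set: ", round(first, 1))
print("Calmar ratio, second set:", round(second, 1))
```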

Ernie

Anonymous said...

Hi Ernie,

You only have one big drawdown.
May I ask what happened in the market on Sep. 2011 causing that drawdown?

Do you still trade ETFs pairs or stock pairs? Or you focus on fx.

Ernie Chan said...

In Sept 2011, there was a strange day for the Mexican Peso, which moved more than 2.5% in a few hours. Their central bank since then has decided to intervene in the markets to keep peso from changing more than a certain band daily and therefore this tail risk was eliminated.

We still trade long-short stock portfolios. Not trading ETFs at the moment, but may start again soon.

Ernie

Anonymous said...

Hi Ernie,

What is the difference between
long-short stock portfolios and stock pairs trading?

I find some people trade stock pairs in the opposite way (not mean-reverting, but directional). Do you have any comments about that?
Thanks.

Ernie Chan said...

Long-short stock portfolios involve many long and short stocks, not just a pair. We bet on "cross-sectional" mean reversion or momentum.

An example where we can trade a pair using a momentum strategy is merger arbitrage.

Ernie

Anonymous said...

Hi Ernie,

IS it ok to trade 40 stock pairs at the same time if we find they are profitable in backtesting?

Ernie Chan said...

Why not?
Ernie

Anonymous said...

Great post Ernie! Just what I was looking for.

Many thanks, Tom

Tom said...

Hi Ernie, I don't want to waste too much of your time but if you have a minute, would you care giving a little more detail of the step where you go from the derivative of the log() to the next expression. You mention that you set Fi=0, wouldn't that set the whole expression to zero? I would be grateful for some more explanation. Thanks so much.

Ernie Chan said...

Hi Tom,
When I wrote "...we can take the partial derivatives of
$\log(F^T M) - \frac{1}{2} \log(F^T C F) - \lambda (F^T \mathbf{1} - 1)$ (5)
with respect to $F_i$. Setting each component i to zero ..." I did not mean setting $F_i$ to zero. I meant setting the expression that results from taking the partial derivative w.r.t. $F_i$ to zero.

Hope this helps.

Ernie

Tom said...

Doh! Of course, now I get it. Thank you so much for your help. Really good work!

Tom

Anonymous said...

Hi Ernie,
I am reading through your 2nd book. Like your style, code and details. You have provided nice examples with code. However, I was sad to see no code for example 8.1, though it was implicitly referred in another part of the book. I wanted to see your implementation.
Regards
Anon

Ernie Chan said...

Example 8.1 does not require code. I have already displayed the entire arithmetic calculation on what amount of stock to sell under constant leverage. The arithmetic calculation can be done on a simple calculator or by hand.

Ernie

Anonymous said...

I meant something on p164 for, "The APR of trading xxx is 15 percent with a Sharpe ratio of 1.8 from October 12, 2011, to October 25, 2012"
Regards,
Anon

Ernie Chan said...

The strategy described on p. 164 is very simple to backtest - hence no code was provided. You can do this on Excel.

Ernie

yifan said...

Hi Ernie

What happens if there is a positive value constraint for portfolio allocation?

Ernie Chan said...

Hi Yifan,
Generally speaking, imposing inequality constraints makes the optimization problem insolvable analytically. So you would have to resort to numerical solutions. See http://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions
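For a small number of assets, one numerical route is brute force: every long-only maximum-Sharpe portfolio is the unconstrained solution on some subset of assets, so one can try every subset and keep the best one whose weights are all positive. The sketch below does this with hypothetical inputs; the function name is made up, the cost grows exponentially with the number of assets, and it assumes at least one asset has a positive mean excess return. For larger problems a quadratic programming solver would be the usual choice.

```python
import itertools
import numpy as np

def long_only_max_sharpe(M, C):
    """Best Sharpe ratio portfolio with non-negative weights summing to 1.

    Enumerates asset subsets, solves C_S f = M_S on each subset, and
    keeps the feasible candidate (all f > 0) with the highest Sharpe.
    """
    n = len(M)
    best_w, best_sharpe = None, -np.inf
    for k in range(1, n + 1):
        for subset in itertools.combinations(range(n), k):
            idx = list(subset)
            f = np.linalg.solve(C[np.ix_(idx, idx)], M[idx])
            if np.any(f <= 0):
                continue  # would need a short position: not allowed
            w = np.zeros(n)
            w[idx] = f / f.sum()
            s = w @ M / np.sqrt(w @ C @ w)
            if s > best_sharpe:
                best_w, best_sharpe = w, s
    return best_w, best_sharpe

# Hypothetical inputs: assets 0 and 1 are highly correlated, so the
# unconstrained Kelly solution shorts asset 1.
M = np.array([0.10, 0.02, 0.06])
C = np.array([[0.040, 0.018, 0.000],
              [0.018, 0.010, 0.000],
              [0.000, 0.000, 0.020]])

unconstrained = np.linalg.solve(C, M)
w, s = long_only_max_sharpe(M, C)
print("Unconstrained Kelly weights:", unconstrained)
print("Long-only weights:", w, "Sharpe ratio:", s)
```

The long-only result is normalized to sum to 1, so like the tangency portfolio it says nothing about leverage.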

Ernie

manuka said...

After reading some theory (e.g. Estrada, 2010, "Geometric mean optimisation"), it seems to me that the Kelly criterion does not lead to sharpe maximisation, but to growth maximisation, and that the two generally lead to different outcomes. I.e. if you use CAPM, you maximise sharpe, if you use Kelly, you maximise the expected geometric return.

Ernie Chan said...

Hi manuka,
The formula for the maximum growth rate is g=r+S^2/2, where r is the risk free rate and S is the Sharpe ratio (see http://www.edwardothorp.com/sitebuildercontent/sitebuilderfiles/KellyCriterion2007.pdf). So maximum growth rate coincides with maximum Sharpe ratio.
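This formula can be checked against the growth-rate function from the same Thorp paper, g(f) = r + f*m - f^2*s^2/2, where f is the leverage, m the mean excess return, and s the standard deviation of returns. The numbers below are hypothetical and the sketch relies on that continuous-time (Gaussian) approximation.

```python
import math

def growth_rate(f, m, s, r=0.0):
    """Compounded growth rate at leverage f under the Gaussian
    approximation: g(f) = r + f*m - (f*s)**2 / 2."""
    return r + f * m - 0.5 * (f * s) ** 2

# Hypothetical annualized inputs: mean excess return, volatility,
# and risk-free rate.
m, s, r = 0.08, 0.16, 0.02

kelly_f = m / s ** 2       # optimal leverage for a single strategy
sharpe = m / s
g_max = growth_rate(kelly_f, m, s, r)

print("Kelly leverage:", kelly_f)
print("Growth at Kelly leverage:", g_max)
print("r + S^2/2:", r + sharpe ** 2 / 2)

# Leverage below or above the Kelly value gives lower growth.
for f in (0.5 * kelly_f, 1.5 * kelly_f, 2.0 * kelly_f):
    print("leverage", f, "growth", growth_rate(f, m, s, r))
```

At twice the Kelly leverage the growth rate falls back to the risk-free rate, which is why over-leveraging is costly even for a high-Sharpe strategy.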

Ernie

manuka said...

Ernie,
Thanks for your excellent comment, and sorry for my very tardy reply - I was travelling and could not access the blog. Having read the paper you refer to, you are absolutely right. Intuition failed me: I thought if Kelly tells us how much to allocate to each investment, then surely we should just look to maximise Kelly, i.e. we could use Kelly both for sizing and for choosing between investments...no doubt this is wrong. In fact, playing around with some simple examples shows that a higher Sharpe ratio does not always lead to a higher Kelly: for example suppose the risk free rate is 0, and we have two strategies A and B. mu A = 0.3, std A=0.15; mu B=0.2, std B=0.11. Here strategy A has a higher Sharpe, but Kelly guides us to invest a higher portion into B. And this is the path that leads to maximised expected geometric growth. Another very interesting thing about the link you referred me to is that Thorp's derivation shows a reason for maximising the Sharpe ratio (i.e. maximum geometric growth of wealth) which has nothing to do with minimum variance portfolio theory...i.e. he derives the capital asset pricing model key result with a whole different set of assumptions. Thanks for the help.

Ernie Chan said...

Hi manuka,
I think you may be a little confused. We can indeed use Kelly to choose the optimal leverage, and to optimally allocate investments (see my book Quantitative Trading's chapter 6 for examples.)

The maximum growth rate formula I quoted before only works when we are levered at exactly the Kelly leverage. Otherwise maximizing Sharpe ratio does not in general lead to maximum growth. You can definitely simultaneously optimize leverage and the Sharpe ratio: Kelly ratio of a portfolio will tell you how much you need to leverage, it is independent of the internal asset allocations. But if you compute the Kelly ratio of each individual asset taking into account their covariances of returns, then each individual Kelly ratio will tell you the asset allocation as well as the individual (and portfolio level) leverage to use.

Ernie

manuka said...

Ernie,
Many Thanks again for your insightful reply. I did the derivation for half Kelly and found that it implies a geometric (per period) growth of r+3/8*S^2, i.e. a bit lower than the one for the full Kelly, but still implying that the growth is maximised when sharpe ratio is maximised. Hence my current understanding is that in order to maximise the geometric growth: 1. Always pick investment strategies with the highest sharpe ratios 2. Allocate between them using Kelly (or fractional Kelly), 3. Leverage according to Kelly (relatively). I think this is in line with what you are saying in both your books, but part of my confusion was the wrong view that I can forget about sharpe ratios altogether and just pick those strategies that show the highest Kelly. Thanks again, and I hope I got it right now.
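The half-Kelly figure can be checked from the growth-rate function in the Thorp paper cited earlier in this thread, $g(f) = r + f m - f^2 s^2 / 2$, where m is the mean excess return and s the standard deviation of returns (a sketch under that paper's Gaussian approximation). Levering at a fraction k of the Kelly optimum $f^* = m / s^2$ gives

$$g(k f^*) = r + k \frac{m^2}{s^2} - \frac{k^2}{2} \frac{m^2}{s^2} = r + \left(k - \frac{k^2}{2}\right) S^2$$

so k = 1 recovers $r + S^2 / 2$ and k = 1/2 gives $r + \frac{3}{8} S^2$. For any fixed fraction k between 0 and 2, the growth rate increases with the Sharpe ratio S.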

Ernie Chan said...

Hi Manuka,
Yes, I agree with your latest statements.
Ernie

Wen said...

Hi Ernie,

The Kelly formula may give negative weight to a strategy. For a long-only portfolio, can the Kelly formula still apply with minor revision? Thank you!

Wen

Ernie Chan said...

Hi Wen,
Yes, you can add the positivity constraint when you maximize the Sharpe ratio in formulas 2-3 above.
Ernie

PavelB said...

Hi Ernie,

Am I right that this formula $F = C^{-1} M$ implies equal leverage for different components of a portfolio? For example, if we have two components with F = (2.5, 4.5) then the overall leverage will be 7.0 and the weights 2.5/7.0=0.36 and 4.5/7.0=0.64. So each component has the same leverage 7 and weights 0.36 and 0.64 summing up to 1. If we have \$100 then we have to buy component one by value 100*7*0.36=\$252 and component two by value 100*7*0.64=\$448, summing up our portfolio to \$700 according to leverage 7.

What if we calculate leverage using Kelly formula for each component separately before building portfolio? And then use leveraged returns as input into portfolio optimization with constraint on weights summing them to 1? My intuition tells me that in this case we will get different weights (from the first approach) and as we used different leverages for each component, the overall leverage of portfolio will remain the same (as in the first approach where it was 7 in the example). What do you think?

thank you
Pavel

Ernie Chan said...

Hi Pavel,
No, F=C^(-1)*M most certainly does not imply equal leverage. The whole point of using this formula is to find out what the optimal leverages are for each instrument in the portfolio. Those leverages are given by F. So if F=(2.5, 4.5), you should apply 2.5 leverage to the first stock, and 4.5 leverage to the second. Remember, leverage is with respect to account equity, which of course is the same for all stocks in the account.

Assuming that account has \$1M, you would trade stock 1 with market value \$2.5M, and stock 2 with market value \$4.5M.

Ernie

Emiil said...

David Varadi posted a study using the Kelly maximization formula for multiple trading strategies with Spearman's rank correlation alongside Pearson. Apparently Spearman worked better, which I can see the reasoning for (taking non-linear relationships into account), but it seems to lack a mathematical justification. Do you think it would make sense to use Spearman in the same formula derived using standard correlation?

Ernie Chan said...

Emil,
The simple formula for Kelly assumes normality of returns. If one has to use Spearman correlation because returns are not normal, then the entire derivation won't work.
Ernie

Franco said...

Hello Ernie, I got to your blog through some YouTube videos I saw. I must say that it's the first time I have seen someone with such a firm theoretical mathematical approach to these topics. I really like the style, and for us "engineer-minded" folks it really helps.

Sorry to come back to such an old post but by no means outdated, let me cut to the chase.
I am trying to find good information about asset allocation for active strategies, but I am stuck with buy and hold approach...

How would you adapt this analysis for strategies such as mid-term trend following, where the distribution of assets keep changing constantly, but still want to have a large invested ratio of the total capital?
If I optimize the portfolio (either Kelly or Markowitz) for each asset "Under surveillance", then I will get a distribution of assets that will then not be real since I may not be invested in all of them if the signals are not triggered for the strategy. On the other hand, recalculating and re-balancing too often seems impractical and not too cost effective.

I would appreciate your insight on this topic and any reference on where to get further data on how to apply asset allocation for active strategies.

Thanks!

Ernie Chan said...

Hi Franco,
The returns discussed in this post do not have to refer to "assets". It can refer to a portfolio of strategies. So the capital allocation would be to different strategies. If you only have one strategy, then Kelly optimization would just give you the optimal leverage to apply on that strategy.

I actually discuss this in some detail in the first chapter of my new book "Machine Trading".

Ernie

Franco said...

Having only one strategy but applied to different assets should bring a matrix of optimal leverages for that mix, since the means and variances of returns of the same strategy applied to different assets should be substantially different if the correlations are low and the assets have different dynamics, shouldn't it? Eg, if I want to trade oil, gold, S&P on the same strategy, then each one will have its optimal leverage ratio (same strategy on different underlying is actually like a new strategy).
So this means that every time one asset has an entry signal or exit signal leverages need to be recalculated for the set of assets that are currently with open positions for the one strategy being applied?

I will try to get a hold of your book in the meanwhile...

Ernie Chan said...

Hi Franco,
Yes, if you apply same strategy to different assets, Kelly formula will determine both asset allocation and overall leverage simultaneously.

However, this doesn't mean that you should recalculate the leverage on each asset every time there is an entry or exit. The leverage applies to the maximum position you should hold, irrespective of whether your strategy actually recommends a position at any given moment.

Ernie

Anonymous said...

I am a little confused: what if the covariance matrix is singular (or almost singular), as is usually true in "real life" market applications for multiple securities. Your formula blows up, which is a bit counterintuitive...

Anonymous said...

The formula at the beginning of the post blows up if the covariance matrix is singular (or close to it). This seems counter-intuitive. In particular, if you have two strategies, with the returns of one simply equal to the returns of the other plus, say, 5 bp/day, you should obviously allocate all your money to the "better" strategy; there is no obvious instability created.

Ernie Chan said...

Hi igrivin,
The singular case where 2 assets are perfectly correlated is indeed interesting, but the allocation scheme in that case isn't what you described. It would be to short the asset with the lower return by an infinite amount, and use the money to long the other by an infinite amount. Hence the mathematical difficulty.

This difficulty can be avoided by taking the limit of the correlation going to 1. E.g. correl(1, 2)=0.99, var(1)=0.1, var(2)=0.1, M=[1 0.1]', hence C=[0.1 0.99*sqrt(0.1*0.1); 0.99*sqrt(0.1*0.1) 0.1], inv(C)*M=[453 -447]'. If correl(1,2) -> 0.999, inv(C)*M -> [4503 -4497]'. You see the pattern?
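The numbers in this reply can be reproduced with a few lines of NumPy. This is a sketch using the reply's stated inputs (variance 0.1 for both assets, M = [1, 0.1]); the function name `kelly_two_assets` is made up for illustration. The quoted [453 -447] and [4503 -4497] correspond to correlations of 0.99 and 0.999 respectively.

```python
import numpy as np

def kelly_two_assets(rho, var=0.1, M=(1.0, 0.1)):
    """Kelly weights inv(C) * M for two assets with equal variance
    `var` and correlation `rho`."""
    cov = rho * np.sqrt(var * var)
    C = np.array([[var, cov],
                  [cov, var]])
    return np.linalg.solve(C, np.array(M))

# As the correlation approaches 1, the long and short positions
# grow without bound while their sum stays modest.
for rho in (0.9, 0.99, 0.999, 0.9999):
    F = kelly_two_assets(rho)
    print("rho =", rho, "F =", np.round(F), "net =", round(F.sum(), 2))
```

The net position (the sum of the two weights) stays close to 5.5 throughout, while the gross position grows roughly tenfold each time the gap between the correlation and 1 shrinks by a factor of ten.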

Ernie

Anonymous said...

Hi Ernie,

I have a strategy with 20 assets. Each day I go long 10 and short 10. Any suggestions how I can optimize allocation for these assets with the long/short constraint in mind?

Many thanks

Ernie Chan said...

Hi,
Are the assets the same every day? Or are you selecting different assets to long and short each day?
Ernie

Anonymous said...

I have the same 20 assets but each day I select different assets to go long and short.

Many thanks

Ernie Chan said...

Once you have decided which assets to buy and short each day, you can definitely run Markowitz portfolio optimization on that to determine size. After all, this is a one-period optimization.

Alternatively, you may use a risk parity allocation: assign capital weights inversely proportional to the asset historical volatility.
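The inverse-volatility alternative can be sketched as follows. The return history and the long/short signals are simulated placeholders; in practice the `side` vector would come from the strategy's daily selection.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical daily return history for four assets with
# different volatilities.
true_vol = np.array([0.01, 0.02, 0.04, 0.02])
returns = rng.normal(0.0, true_vol, size=(500, 4))

# Hypothetical signals for today: +1 = long, -1 = short.
side = np.array([1, 1, -1, -1])

# Capital weights inversely proportional to historical volatility,
# scaled so that the absolute weights sum to 1.
vol = returns.std(axis=0, ddof=1)
inv_vol = 1.0 / vol
weights = side * inv_vol / inv_vol.sum()

print("Estimated volatilities:", vol)
print("Signed capital weights:", weights)
```

With this scheme each position contributes the same stand-alone volatility (weight times volatility), ignoring correlations between the assets.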

Ernie

Liam said...

Suppose you have the classic biased coin which comes up heads with probability p and an opportunity to bet at evens on this (heads returns 1, tails returns -1). The mean return M is 2p-1. The variance C of the return is 4p(1-p). (C is a 1x1 covariance matrix)

The formula (C^-1)M gives the optimal stake as (2p-1)/(4p(1-p)). But this is not the same as the Kelly stake which is 2p-1.

I think the problem may be that the covariance matrix should actually be the correlation matrix. This would fix the problem neatly in this simple case and I would expect that it is the general answer too. (I will check this)
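The size of the discrepancy for the coin example can be tabulated directly. The sketch below compares the exact even-money Kelly stake 2p-1 with the mean-over-variance stake (2p-1)/(4p(1-p)), and evaluates the expected log growth of each; the function names are illustrative only.

```python
import math

def exact_kelly_stake(p):
    """Kelly stake for an even-money bet won with probability p."""
    return 2 * p - 1

def mean_over_variance_stake(p):
    """Stake from (C^-1)M with M = 2p - 1 and C = 4p(1 - p)."""
    return (2 * p - 1) / (4 * p * (1 - p))

def expected_log_growth(f, p):
    """Expected log growth per bet when staking a fraction f."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

for p in (0.51, 0.55, 0.60, 0.75):
    exact = exact_kelly_stake(p)
    approx = mean_over_variance_stake(p)
    print("p =", p,
          "exact =", round(exact, 4),
          "mean/variance =", round(approx, 4),
          "growth exact =", round(expected_log_growth(exact, p), 5),
          "growth approx =", round(expected_log_growth(approx, p), 5))
```

The two stakes nearly coincide when p is close to 0.5, where the edge is small relative to the volatility, and drift apart as p grows.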

Ernie Chan said...

Liam,
The Kelly formula for discrete outcomes is inapplicable to investment finance, which has continuous outcomes.
Ernie

Liam said...

That is true, but note that I used Thorp's formula from your article to compare. If it was precise for arbitrary continuous distributions of returns, it would be precise for discrete outcomes (which can be very well approximated with continuous distributions).

I have compared examples with a continuous distribution of real-valued outcomes now as well. I chose a log normal distribution for returns with mean zero (which gives a positive mean return when sigma is non-zero). As I am sure you know, this avoids the technical difficulty with using a normal distribution for returns, which contains negative values (and the same for any non-zero leverage), which is why it is popular in the field.

What I found is that when the sigma parameter of the log normal distribution is small (so returns are small percentages of either sign and the unleveraged returns have a roughly normal distribution, with no values below -100% * leverage), Thorp's formula, as in your article, gives a close approximation to the optimal leverage. When sigma is not small, and the unleveraged returns are also quite skewed (to the upside; the downside remains bounded by -100% * leverage), the approximation becomes increasingly misleading. As it happens, due to the special properties of the log normal distribution with mean zero, the optimal leverage for this is always 0.5, regardless of what sigma is. Intuitively, this is because combining a sequence of log normal distributions of return gives another with a larger sigma, because the sum of a set of gaussians is gaussian.
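The claim that the optimal leverage is always 0.5 can be checked directly. A sketch, writing X for the log return, normally distributed with mean 0 and standard deviation sigma, and f for the leverage:

$$\frac{d}{df} E\left[\log\left(1 + f (e^X - 1)\right)\right] \Big|_{f = 1/2} = E\left[\frac{e^X - 1}{\frac{1}{2} (e^X + 1)}\right] = 2 \, E\left[\tanh(X / 2)\right] = 0$$

The last step holds because tanh is an odd function and X is symmetric about zero. Since the expected log return is concave in f, f = 1/2 is the maximizer for every sigma.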

This does not mean the formula cannot be used, but it is important to be aware that it implicitly assumes the distribution of returns is roughly normal, which may not always be so.

Here is a python script which explores the way the approximation behaves for different values of sigma in the log normal distribution and plots a graph of the comparison for a variety of log normal distributions with different values of sigma. I would post an image of the graph, but I don't see how to here.

    import numpy as np
    import matplotlib.pyplot as plt

    # arrays of results to be calculated for distributions of returns with different variance
    empirical_best_leverage = np.zeros(20)  # the empirical best leverage
    average_over_variance = np.zeros(20)  # the formula (C^-1)*mu

    # define parameters for log normal distributions of returns
    log_mu = 0
    sigma = np.arange(1, 21) / 10  # 0.1, 0.2, ..., 2.0

    # candidate leverages to search over: 0.00, 0.01, ..., 0.99
    leverages = np.arange(100) / 100

    for i in range(20):
        print(".", end="", flush=True)

        # take a large sample of values with different values of sigma
        log_return = np.random.normal(loc=log_mu, scale=sigma[i], size=1000000)

        # unleveraged_return is > -1. -1 would be an entire loss of capital (the 1e-24 is for numerical stability)
        unleveraged_return = np.exp(log_return) - 1 + 1e-24

        variance = np.var(unleveraged_return)
        average = np.mean(unleveraged_return)
        average_over_variance[i] = average / variance

        # mean log return of the leveraged position at each candidate leverage
        mean_log_leveraged_return = [np.mean(np.log(1 + leverage * unleveraged_return)) for leverage in leverages]
        empirical_best_leverage[i] = leverages[np.argmax(mean_log_leveraged_return)]

    calc_lev, = plt.plot(sigma, average_over_variance, label='Average over variance')
    emp_lev, = plt.plot(sigma, empirical_best_leverage, label='Empirical best leverage')

    plt.title('Comparison of calculated leverage and empirical best leverage')
    plt.xlabel("sigma for log normal distribution")
    plt.ylabel("leverage")
    plt.legend([calc_lev, emp_lev], ['Average over variance', 'Empirical best leverage'])
    plt.show()

Ernie Chan said...

Liam,
Yes, I quite agree that there is an implicit Gaussian assumption in Thorp's Kelly formula for continuous outcomes. Which is why it isn't widely used in practice. (Imagine using Kelly leverage for the SPX just before Black Monday!)
Ernie

Liam said...

Indeed! The probability of extreme adverse events is extremely important to the optimum leverage, and while limiting losses is a common mantra among traders, events like the financial crisis make clear that very large concerns can fail to do this (for whatever reasons) with catastrophic results.

Unknown said...

With the typical F*1 = 1 constraint, one can never have F = {0.5, -0.5}, a balanced 2-asset long-short portfolio, as a solution. This means the connection between Markowitz and Kelly breaks. How do you fix this problem?

Ernie Chan said...

I don't see why F=[0.5 -0.5] cannot be a solution. Let's say M=[1; -0.75], C=[1 -1; -1 0.5], then F=inv(C)*M=[0.5; -0.5].
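The arithmetic in this example can be checked with a few lines of NumPy. One caveat shows up in the check: the quoted C has determinant -0.5, so it has a negative eigenvalue and is not a valid covariance matrix. The second half of the sketch uses a hypothetical positive-definite C and a matching M (both are assumptions chosen for illustration, not taken from the comment) that give the same balanced long-short F.

```python
import numpy as np

# Values quoted in the comment above.
M = np.array([1.0, -0.75])
C = np.array([[1.0, -1.0], [-1.0, 0.5]])
F = np.linalg.solve(C, M)            # Kelly allocation F = C^{-1} M
eig_quoted = np.linalg.eigvalsh(C)   # a covariance matrix needs all eigenvalues >= 0

# Hypothetical positive-definite alternative (assumed values, not from the comment).
C_valid = np.array([[1.0, 0.5], [0.5, 1.0]])
M_valid = np.array([0.25, -0.25])
F_valid = np.linalg.solve(C_valid, M_valid)
eig_valid = np.linalg.eigvalsh(C_valid)

print("quoted:  F =", F, "eigenvalues =", eig_quoted)
print("assumed: F =", F_valid, "eigenvalues =", eig_valid)
```

Either way, an unconstrained Kelly allocation whose weights sum to zero is attainable, which is the point of the reply.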

Unknown said...

Because in the Markowitz case it won't satisfy the budget constraint of F*1 = 1. In this case F*1 is 0.5 - 0.5 = 0. The two scenarios (Markowitz and Kelly) are unfortunately not always equivalent up to a constant.

Ernie Chan said...

Actually, in all cases, Kelly and Markowitz are only related up to a normalization constant (the budget constraint). That's because Kelly is really unconstrained w.r.t. leverage. But the point is that they give the same proportional allocation.

Unknown said...

No, they are not. When the Kelly weights add up to 0, it creates a singularity in the Markowitz solution due to the 0 in the denominator. Hence the two solutions are no longer constant multiples of each other. Please check the math before bluntly disagreeing.

Ernie Chan said...

I see what you mean. I started the derivation with the constraint that the assets' weights sum to 1. And you wanted to explore the case when they sum to zero. Indeed this derivation does not apply to that case. But are you sure that the usual mean variance optimization will work with this singular constraint? To be precise: are you sure that quadratic optimization will converge numerically to a stable solution with this singular constraint?

Most derivations of mean variance optimization that I have seen assume the same constraint that I used. For example, see https://en.wikipedia.org/wiki/Modern_portfolio_theory. If you insist on a singular constraint, it is possible that neither approach will work, and so it doesn't really negate their equivalence under non-singular constraints.

Unknown said...

I am not insisting on a singular constraint.

Those derivations are incorrect. If you look at the paper you referenced, http://faculty.washington.edu/ezivot/econ424/portfolioTheoryMatrix.pdf, page 10, when solving for the Lagrange multiplier lambda, he makes a crucial mistake by assuming that portfolio weights never add up to 0 (by dividing by the sum of weights). But sometimes they actually do (especially true for pairs trading). Hence it needs to be solved separately around that region.

If you use a solver, it should find a solution under the given budget constraint that is not proportional to Kelly.

Ernie Chan said...

Portfolio weights not adding up to zero is not a mistake, it is an assumption. I assert it is an assumption in most if not all mean variance derivations, just like mine above.

It is possible that you can solve it numerically with quadratic optimization under your unusual constraint (though you haven't demonstrated that), but that doesn't have anything to do with this analytical derivation. What we are saying is that under the usual constraint, Kelly and Markowitz are equivalent. Unusual constraints like the one you suggested may or may not be solvable analytically, and we look forward to your demonstration that it can be done either numerically or analytically. After solving them, you can then demonstrate that the Kelly solution is not proportional to the Markowitz solution. I remain in doubt of that last proposition.

Unknown said...

I haven't communicated properly. I am not assuming that the weights add up to 0. In the derivation, when he solves for lambda, he implicitly assumes (eq 1.12) that 1'*C^(-1)*1 is non-zero for the min variance portfolio, or similarly that 1'*C^(-1)*R is non-zero for the max Sharpe portfolio. This may or may not be true (if it is 0, you can no longer divide by this expression to solve for lambda). If this condition is not satisfied, then there is no solution in the Markowitz world that is proportional to Kelly.

Please look at the derivation, before again bluntly disagreeing. I am not placing a different budget constraint.

Ernie Chan said...

Yes, but isn't the assumption that 1'*C^(-1)*R = sum(F) is not 0 the same as the assumption that the sum of the allocated capital weights is not 0?

Unknown said...

If what you are saying is true and he is indeed making this assumption, then you should automatically see that the Markowitz solution cannot always be a constant multiple of Kelly, because Kelly doesn't make the same assumption, and may in fact yield a weight vector of [0.5 -0.5], which will never be a finite multiple of a Markowitz solution.

Ernie Chan said...

Yes, in those special cases, Kelly and Markowitz do not give the same solutions.
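The conclusion of this exchange can be illustrated with a small sketch: rescaling the Kelly weights F = C^(-1)*M to sum to 1 gives fully invested weights proportional to Kelly, and the rescaling breaks down exactly when sum(F) = 1'*C^(-1)*M is zero. The covariance matrix, the mean vectors, the tolerance, and the function names below are all hypothetical choices for illustration; this is not the derivation from the post or from the referenced notes.

```python
import numpy as np


def kelly_weights(C, M):
    """Unnormalized Kelly allocation F = C^{-1} M."""
    return np.linalg.solve(np.asarray(C, dtype=float), np.asarray(M, dtype=float))


def normalized_weights(C, M, tol=1e-12):
    """Kelly weights rescaled to sum to 1, or None when sum(F) is numerically zero."""
    F = kelly_weights(C, M)
    total = F.sum()  # equals 1' C^{-1} M
    if abs(total) < tol:
        return None  # singular case: no finite multiple of F sums to 1
    return F / total


# Hypothetical covariance matrix (positive definite) and mean excess returns.
C = np.array([[0.04, 0.01], [0.01, 0.09]])
M_regular = np.array([0.02, 0.03])      # Kelly weights sum to a non-zero value
M_balanced = np.array([0.015, -0.04])   # chosen so that F = [0.5, -0.5]

F_regular = kelly_weights(C, M_regular)
w_regular = normalized_weights(C, M_regular)
F_balanced = kelly_weights(C, M_balanced)
w_balanced = normalized_weights(C, M_balanced)

print("regular:  F =", F_regular, "normalized =", w_regular)
print("balanced: F =", F_balanced, "normalized =", w_balanced)
```

In the regular case the normalized weights are a constant multiple of F. In the balanced case the function returns None, since F sums to zero and cannot be rescaled to satisfy the budget constraint.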