Friday, April 05, 2019

The most overlooked aspect of algorithmic trading

Many algorithmic traders justifiably worship the legends of our industry, people like Jim Simons, David Shaw, or Peter Muller, but there is one aspect of their greatness most traders have overlooked. They have built their businesses and vast wealth not just by sitting in front of their trading screens or scribbling complicated equations all day long, but by collaborating with and managing other talented traders and researchers. If you read the recent interview with Simons, or the book by Lopez de Prado (head of machine learning at AQR), you will notice that both emphasized a collaborative approach to quantitative investment management. Simons declared that total transparency within Renaissance Technologies is one reason for their success, and Lopez de Prado deemed the "production chain" (assembly line) approach the best meta-strategy for quantitative investment. One does not need to be a giant of the industry to practice team-based strategy development, but doing it well requires years of practice and trial and error. While this sounds no easier than developing strategies on your own, it is more sustainable and scalable - we as individual humans do get tired, overwhelmed, sick, or old sometimes. My experience in team-based strategy development falls into 3 categories: 1) pair-trading, 2) hiring researchers, and 3) hiring subadvisors. Here are my thoughts.

From Pair Programming to Pair Trading


Software developers may be familiar with the concept of "pair programming": two programmers sitting in front of the same screen, staring at the same piece of code, and taking turns at the keyboard. According to software experts, this practice reduces bugs and vastly improves the quality of the code. I have found that it works equally well in trading research and execution, which gives new meaning to the term "pair trading".

The more different the pair-traders are, the more they will learn from each other at the end of the day. One trader may be detail-oriented, while another may be bursting with ideas. One trader may be a programmer geek, and another may have a CFA. Here is an example. In financial data science and machine learning, data cleansing is a crucial step, often seriously affecting the validity of the final results. I am, unfortunately, often too impatient with this step, eager to get to the "red meat" of strategy testing. Fortunately, my colleagues at QTS Capital are much more patient and careful, leading to much better quality work and invalidating quite a few of my bogus strategies along the way. Speaking of invalidating strategies, it is crucial to have a pair-trader independently backtest a strategy before trading it, preferably in two different programming languages. As I have written in my book, I backtest with Matlab and others in my firm use Python, while the final implementation as a production system by my pair-trader Roger is always in C#. Often, subtle biases and bugs in a strategy will be revealed only at this last step. After the strategy is "cross-validated" by your pair-trader, and you have moved on to live trading, it is a good idea to have one human watching over the trading programs at all times, even for fully automated strategies.  (For the same reason, I always have my foot ready on the brake even though my car has a collision avoidance system.) Constant supervision requires two humans, at least, especially if you trade in international as well as domestic markets.

Of course, pair-trading is not just about finding bugs and monitoring live trading. It brings you new ideas, techniques, strategies, or even completely new businesses. I have started two hedge funds in the past. In both cases, it started with me consulting for a client; the consulting progressed to a collaboration, and the collaboration became so fruitful that we decided to start a fund to trade the resulting strategies.

For balance, I should mention a few downsides to pair-trading. Though the final product's quality is usually higher, collaborative work often takes a lot longer. Your pair-trader's schedule may be different from yours. If the collaboration takes the form of a formal partnership in managing a fund or business, be careful not to share ultimate control of it with your pair-trading partner (sharing economic benefits is of course necessary). I had one of my funds shut down due to the early retirement of my partner. One of the reasons I started trading independently instead of working for a large firm is to avoid having my projects or strategies prematurely terminated by senior management, and having a partner involuntarily shut you down is just as bad.

Where to find your pair-trader? Publishing your ideas and knowledge on social media is the easiest way (this blog being a case in point). Whether you blog, tweet, post on Quora or LinkedIn, podcast, or make YouTube videos, if your audience finds you knowledgeable, you can entice them into a collaboration.

Hiring Researchers

Besides pair-trading with partners on a shared intellectual property basis, I have also hired various interns and researchers, where I own all the IP. They range from undergraduates to post-doctoral researchers (and I would not hesitate to hire talented high schoolers either.) The difference from pair-traders is that hired quants are typically more junior in experience and hence require more supervision, and they need to be paid a guaranteed fee instead of sharing profits only. Because of the guaranteed fee, the screening criteria are more important. I found short interviews, even ones with brain teasers, to be quite unpredictive of future performance (no offence, D.E. Shaw.) We settled on giving an applicant a tough financial data science problem to be done at their leisure. I also found that there is no particular advantage to being in the same physical office as your staff. We have worked very well with interns spanning the globe from the UK to Vietnam.

Though physical meetings are unimportant, regular Google Hangouts with screen-sharing are essential when working with remote researchers. Unlike with pair-traders, there isn't time to work together on coding with all the different researchers. But it is very beneficial to walk through their code whenever results are available. Bugs will be detected, nuances explained, and very often, new ideas come out of the video meetings. We used to have company-wide weekly video meetings where a researcher would present his or her results using PowerPoint, but I have found that kind of high-level presentation to be less useful than an in-depth code and result review. PowerPoint presentations are also much more time-consuming to prepare, whereas a code walk-through needs little preparation.

Generally, even undergraduate interns prefer to develop a brand new strategy on their own. But that is not necessarily the most productive use of their talent for the firm. It is rare to be able to develop and complete a trading strategy using machine learning within a summer internship. Also, if the goal of the strategy is to be traded as an independent managed account product (e.g. our Futures strategy), it takes a few years to build a track record for it to be marketable. On the other hand, we can often see immediate benefits from improving an existing strategy, and the improvement can be researched within 3 or 4 months. This also fits within the "production chain" meta-strategy described by Lopez de Prado above, where each quant should mainly focus on one aspect of the strategy production.

This whole idea of emphasizing improving existing strategies over creating new strategies was suggested to us by our post-doctoral researcher, which leads me to the next point.

Sometimes we hire people because we need help with something we could do ourselves but don't have time for. This is generally the reason to hire undergraduate interns. But sometimes, I hire people who are better than I am at something. For example, despite my theoretical physics background, my stochastic calculus isn't top notch (to put it mildly). This is remedied by hiring our postdoc Ray, who finds tedious mathematics a joy rather than a drudgery. While undergraduate interns improve our productivity, graduate and post-doctoral researchers are generally able to break new ground for us. These quants require more freedom to pursue their projects, but that doesn't mean we can skip the code reviews and weekly video conferences, just as we do with pair-traders.

Some firms may spend a lot of time and money to find such interns and researchers using professional recruiters. In contrast, these hires generally found their way to us, despite our minuscule size. That is because I am known as an educator (both formally as adjunct faculty in universities, as well as informally on social media and through books). Everybody likes to be educated while getting paid. If you develop a reputation of being an educator in the broadest sense, you shall find recruits coming to you too.

Hiring Subadvisors

If one decides to give up on intellectual property creation, and just go for returns on investment, finding subadvisors to trade your account isn't a bad option. After all, creating IP takes a lot of time and money, and finding a profitable subadvisor will generate that cash flow and diversify your portfolio and revenue stream while you are patiently doing research. (In contrast to Silicon Valley startups where the cash for IP creation comes from venture capital, cash flow for hedge funds like ours comes mainly from fees and expense reimbursements, which are quite limited unless the fund is large or very profitable.)

We have tried a lot of subadvisors in the past. All but one failed to deliver. Why? Because we were cheap. We picked "emerging" subadvisors who had profitable but short track records, and who charged lower fees. To our chagrin, their long and deep drawdowns typically began immediately once we hired them. There is a name for this: selection bias. If you generate 100 geometric random walks representing the equity curves of subadvisors, it is likely that one of them has a Sharpe ratio greater than 2 when each random walk has only 252 steps.

Here, I simulated 100 normally distributed returns series with 252 bars, and sure enough, the maximum Sharpe ratio of those is 2.8 (indicated by the red curve in the graph below.)
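Here is a minimal Python sketch of that experiment (the zero mean, 1% daily volatility, random seed, and sqrt(252) annualization are illustrative choices, not the exact settings of my original simulation):

import numpy as np

rng = np.random.default_rng(12345)
n_series, n_bars = 100, 252
# 100 "subadvisors" with zero true edge: i.i.d. Gaussian daily returns
returns = rng.normal(loc=0.0, scale=0.01, size=(n_series, n_bars))
# Annualized Sharpe ratio of each simulated one-year track record
sharpe = np.sqrt(252) * returns.mean(axis=1) / returns.std(axis=1, ddof=1)
print("Best Sharpe among 100 skill-free track records:", round(sharpe.max(), 2))

Run it with a few different seeds and the best of the 100 zero-edge track records typically shows an annualized Sharpe well above 2, purely by luck.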



(The first 3 readers who can email me a correct analytical expression with a valid proof that describes the cumulative probability P of obtaining a Sharpe ratio greater than or equal to S of a normally distributed returns series of length T will get a free copy of my book Machine Trading. At their option, I can also tweet their names and contact info to attract potential employment or consulting opportunities.)

These lucky subadvisors are unlikely to maintain their Sharpe ratios going forward. To overcome this selection bias, we adopted this rule: whenever a subadvisor approaches us, we time-stamp that as Day Zero. We will only pay attention to the performance thereafter. This is similar in concept to "paper trading" or "walk-forward testing". 

Subadvisors with longer profitable track records do pass this test more often than "emerging" subadvisors. But those subadvisors typically charge the full 2 and 20 fees, and the more profitable ones may charge even more. Some investors balk at such high fees. I think these investors suffer from a behavioral finance bias, which for lack of a better term I will call the "Scrooge syndrome". Suppose one owns Amazon stock, which went up 92,461% since its IPO. Does one begrudge Jeff Bezos's wealth? Does one begrudge the many millions he rakes in every day? No, the typical investor only cares about the net return on equity. So why does this investor suddenly become so concerned with the difference between the gross and net return of a subadvisor? As long as the net return is attractive, we shouldn't care how much the subadvisor is raking in fees. Renaissance Technologies' Medallion Fund reportedly charges 5 and 44, but most people would jump at the chance of investing if they were allowed.

Besides fees, some quant investors balk at hiring subadvisors because of pride. That is another behavioral bias, known as the "NIH syndrome" (Not Invented Here). Nobody feels diminished buying AAPL even though they were not involved in creating the iPhone at Apple, so why should they feel diminished paying for a service that generates uncorrelated returns? Do they think they alone can create every new strategy ever discoverable by humankind?

Epilogue


Your ultimate wealth when you are 100 years old will more likely be determined by the strategies created by your pair-traders, your consultants/employees, and your subadvisors than by the amazing strategies you created in your twenties. Hire well.

===

Industry update


1) A Python package for market simulations by Techila is available here. It enables easy parallel computations.

2) A very readable new book on using R in Finance by Jonathan Regenstein, who is the Director of Financial Services Practice at RStudio.

3) PsyQuation now provides an order flow sentiment indicator.

4) Larry Connors published a new book on simple but high Sharpe ratio strategies. I enjoyed reading it very much.

5) QResearch is a backtest platform for the Chinese stock market for non-programmers. 

6) Logan Kane describes an innovative application of volatility prediction here.

7) If you aren't following @VolatilityQ on Twitter, you are missing out on a lot of quant research and alphas.

45 comments:

aagold said...

Very good post! Glad to see you're back to blogging, at least for today.

Your comments on "selection bias" are well taken, so I understand why you've adopted your new policy of "whenever a subadvisor approaches us, we time-stamp that as Day Zero. We will only pay attention to the performance thereafter".

However, isn't that policy mathematically sub-optimal? For example, let's say 10 subadvisors approach you on Day Zero and they all have 1-year *live* (not backtested) track records. Let's say you monitor them all for the next year. To decide which of the 10 to invest with, wouldn't it make the most sense to consider the full 2-year track record and pick the one with the best Sharpe ratio? Doesn't make sense to throw away the year-1 data if it's a real (non-backtested) track record, right?

Ernie Chan said...

Hi aagold,
Actually, by tracking the forward performance of the subadvisor, we have already used the information that their past performance was good. If it wasn't good, they wouldn't even approach us (and if they did approach us, we wouldn't bother to track their forward performance.)
Ernie

aagold said...

Ok, then how about this example. Let's say two subadvisors A and B approach you with a 1-year live track record. A has Sharpe Ratio SR=1 and B has SR=2. So A is good, but B is really good that first year. During year 2, A has SR=1.3 and B has SR=1.25. Who do you choose? Mathematically, seems clear you should choose B, right?

ZHD said...

Not necessarily. You have to dig in much deeper into their respective universes and stated strategies.

One thing that's problematic in your hypothetical is the implicit assumption that you're annualizing Sharpe ratios from higher frequency returns -- which in most cases isn't legit. However, if we're not annualizing from higher frequency returns, then your B would've seen a very bad year to bring the Sharpe down that much. So the variance in the Sharpe becomes a concern on its own.

However, it really depends on the product groups and strategy types the subadvisors are supposedly deploying. Fund of fund managers will increase allocations to funds that have had a nearly zero return year under certain circumstances including underperformance.

aagold said...

Ok, but I was really just trying to address the theoretical concept of "selection bias" Ernie brought up, and to consider the method he's using to address it. Sounds like his method is to effectively apply a threshold to the live track record at Day Zero: if the performance is greater than some threshold to be considered "good", then he starts measuring their returns from Day Zero and doesn't consider the prior track record at all. My point is that's sub-optimal. Better to analyze the entire live track record to make the decision.

ZHD said...

Haha I think he was being a bit more informal than what you're getting at. The implication is that people with trading systems don't approach others for investment unless the backtests (or short track records) are good. So by the fact that someone has approached you, you can assume the historical results are good, so you then start watching the forward ones.

Ernie Chan said...

Aagold,
It is a valid question of how much weight we should apply to the Sharpe ratio prior to Day Zero, vs the weight we apply to forward period Sharpe. I would certainly put less weight on the past due to selection bias. As to the optimal allocation, I don't have a mathematical model for that yet.

However, if the forward period Sharpe is significantly lower than the look back period, it is a red flag. It may be a result of bad luck, or it could be selection bias at work.
Ernie

aagold said...

Yeah, I got that ZHD. And I realize I probably have too much time on my hands and am making a big deal out of something people probably don't really care about.... it's happened before! :-)
However - I still think considering just the forward returns, and completely ignoring the *live* historical track record before the person approached you (other than confirming it's "good") doesn't make much sense.
IMO there's really no solution to the "selection bias" problem other than having a long enough live track record, under enough different types of market regimes, for the trading strategy being used.

aagold said...

Ernie,

Glad you at least think my comment raised a valid question! :-)

Maybe I haven't thought this through well enough, but it seems to me that as long as the track record prior to Day Zero is a valid live track record, and not backtested, then it should be given equal weight to the forward period Sharpe.

To me, this seems like a classic case of "regression to the mean" and separating out luck from skill. In his book Thinking, Fast and Slow, Kahneman gives some classic examples: in a two-day golf tournament, it's highly likely that the top performer on Day 1 will do worse on Day 2 than they did on Day 1. That's because their performance on Day 1 was partially due to luck and partially due to skill.

So, it's certainly true that people who did exceptionally well in a 1-year track record are likely to approach you, and like the golf tournament example above, it's highly likely they won't do as well in year 2 as they did in year 1. However, I still think it's true that there's no better way of differentiating luck from skill than simply considering the full 2-year track record.

aagold said...

Thought about this a bit more on a long drive... I find the topic quite interesting (obviously).

If we assume a stationary return distribution, so it has for example an unknown but fixed mean and variance, then throwing away all past data before Day Zero does completely eliminate selection bias. Therefore, statistics measured after Day Zero are an unbiased estimate of the true (infinite test time) statistics.

If we include the backward data, I agree it will be *biased* upwards due to selection bias.
However, it will have lower *variance* than if we throw away the backward data. So, to arrive at the best estimate of true future return, I think there's a classic bias-variance tradeoff that needs to be addressed.

Ernie Chan said...

Aagold,
The mathematical problem we are trying to solve is this: assume the returns are sampled from some unknown distribution with Sharpe of S*. We observed past year has Sharpe S1, and forward year has Sharpe S2. What is the cumulative probability that S* >= S0, where S0 is at least 0 but typically chosen to be 1?

Using Bayes' theorem: P(S*|S1, S2)=P(S2|S1, S*)*P(S1, S*)/P(S1, S2)

If we assume that the returns are distributed independently (i.e. no serial codependence), then P(S1, S2)=P(S1)*P(S2), and P(S2|S1, S*)=P(S2|S*), and P(S1, S*)=P(S1|S*)*P(S*). Hence
P(S*|S1, S2)=P(S1|S*)*P(S2|S*)*P(S*)/(P(S1)*P(S2)). Hence indeed S1 and S2 are symmetric.

However, if returns are not independent, then knowing S1 tells us something about S2. In fact, I would argue that if S1 and S2 are highly positively correlated (as we should hope for a consistent strategy's returns and Sharpe ratios), and we find S1 and S2 to be very different, it *may* imply that CDF(S* >= S0) is small. I haven't worked out the math in this case, but I hope some reader will!

Ernie

Logan Kane said...

Very interesting post, Ernie! The subadvisor idea has me thinking differently about what's possible for a trading business. I agree with everything you wrote.

Ernie Chan said...

Thanks Logan!
Ernie

Socrates said...

You seem to have missed that Lopez de Prado is leaving AQR! Bad timing.

aagold said...

Hi Ernie,

Here's a mathematical description of how I'm thinking about this problem. I think this is the simplest formulation of the type of "selection bias" you discussed in this post. It's simpler than our actual problem of interest, but I think it retains the key elements. If a solution to this probability question exists, it seems like someone should have solved it already. I may post this to stack exchange or someplace like that.
Let's say we have N independent normal random variables X_i ~ N(mu_i, sigma_i), each with unknown mean mu_i and unknown standard deviation sigma_i, i=0...N-1.

M independent random samples are drawn from each of the N random variables: x_ij, i=0....N-1, j=0...M-1

Compute N sample means m_i = (1/M)*sum_j {x_ij}
and N sample variances v_i = (1/(M-1))*sum_j{(x_ij - m_i)^2}.

Search for index imax such that m_imax = max_i{m_i}. That is, imax is the index of the random variable with the maximum sample mean.

What is the best unbiased estimate of true mean mu_imax? Is it simply m_imax? Probably not, due to the form of "selection bias" discussed in this post.
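A quick Monte Carlo sketch of this setup in Python (the particular N, M, mu_i, and sigma_i below are arbitrary illustrative values):

import numpy as np

rng = np.random.default_rng(0)
N, M, trials = 10, 52, 5000                 # N variables, M samples each, many repetitions
mu = rng.normal(0.0, 0.001, size=N)         # unknown true means mu_i (fixed across trials)
sigma = rng.uniform(0.005, 0.02, size=N)    # unknown true standard deviations sigma_i

errors = []
for _ in range(trials):
    x = rng.normal(mu, sigma, size=(M, N))  # x_ij: M draws from each of the N variables
    m = x.mean(axis=0)                      # sample means m_i
    imax = m.argmax()                       # index of the maximum sample mean
    errors.append(m[imax] - mu[imax])       # estimation error for the "winner"
print("Mean of (m_imax - mu_imax):", np.mean(errors))

The average error should come out positive: the winner's sample mean overstates its own true mean, which is exactly the selection bias under discussion.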

Ernie Chan said...

Hi aagold,
Your formulation of the problem is interesting, but I am thinking of our house rule as more of a hypothesis testing.

Given the track record of a strategy, we estimate that it has a true Sharpe ratio S1. What is the probability that the forward Sharpe ratio is S2, which may be much lower than S1? If the probability is small, then we reject the hypothesis that the true Sharpe ratio is S1.

Ernie

aagold said...

Don't we need to somehow model a large number (N) of potential subadvisors that ran their trading strategy over the lookback period, and that only those few that generated exceptionally good returns will approach you and show their track record? That's the key effect I'm trying to model.

In my formulation, I simplified it to only the best 1 out of N and to estimation of the true mean of a Gaussian random variable. If this simpler problem were solved, perhaps it could be extended to a hypothesis test on the true Sharpe ratio, as you've described.

I'm kind of surprised I haven't been able to find a known solution to the problem as I formulated it. Seems like this sort of effect would show up in many situations. Maybe I just haven't found the right terminology and search words...


Ernie Chan said...

aagold,
Ah, now I see your point!
Yes, your model may indeed capture this effect - looking forward to the solution!
Ernie

aagold said...

Ernie,

I'm continuing to pursue the problem as I formulated it, since I find it interesting, but since there's no practical way of knowing N, I think your rule of only considering returns after Day Zero is probably the best approach. I'll certainly let you know if I make progress on my formulation though.

One caveat that occurs to me: even with your Day Zero rule, there's some potential for selection bias if you start paying attention to too many potential sub-advisors. If you pay attention to too many after Day Zero, someone will eventually have good forward returns just by getting lucky...

Regards,
aagold

Ernie Chan said...

aagold,
In this business of uncertainty, I think luck is inevitable.
So besides using statistics to screen out selection bias, we will have to take into account qualitative factors such as whether the strategy has particularly favorable/unfavorable market regimes, and whether the logic matches the regime out/under-performance.
Ernie

DeeGee said...

Hi -- Great post. Thanks, Ernie.
One thing -- Is the link to the Connors book the right book? It seems to be an older title. Thanks.

Ernie Chan said...

Thanks DeeGee!
You are right. His new book is Buy the Greed, Sell the Fear. I have updated the link.
Ernie

Pont said...

Regarding your question on the CDF for the Sharpe ratio: are you referring to the arithmetic Sharpe ratio (as described by Sharpe, arithmetic sample mean) or the geometric Sharpe ratio (geometric sample mean)? Since you simulate a geometric random walk, I guess you are searching for the CDF for the geometric Sharpe. In that case, the result is a sum of lognormal variables, which does not have an exact closed form CDF. Please let me know how to interpret this question.

Ernie Chan said...

Pont,
That is a good question. However, the Sharpe ratio is conventionally computed from the arithmetic average of net (not log) returns. I actually simulated a set of Gaussian numbers as returns, and then compounded them geometrically to obtain the geometric random walk. But that does not affect the Sharpe ratio computation.

Regarding closed form CDF, you are free to just express your answer in terms of the CDF of standard distributions, whether those are expressed in terms of integrals or not.

By the way, as I just tweeted at @chanep, I have announced the 3 winners already. But I would be happy to send you the solution to this problem if you submit your answer.

Ernie

Ken said...

Hello,

I think that the problem of subadvisor selection is linked to the fact that the population of subadvisors is unknown.
Scientific papers have the same problem with the p-value at 0.05. How many hypotheses have been tested to find something below 0.05?
Marcos Lopez de Prado applies this to quantitative modelling, saying we can't use out-of-sample results to improve our modelling, because then this out-of-sample data becomes in-sample.

Returning to our subadvisor selection, we should weight the Sharpe ratio of subadvisors by the likely population of subadvisors. If a subadvisor is trading trend following/smart beta stuff (mainstream strategies) on a very small account, it is likely that the subadvisor population is very high and thus some subadvisors could have excellent results just by luck.

On the contrary, if the subadvisor is trading an original strategy on a big account/with many clients, the population of subadvisors is likely to be very small and the results less likely to be due to luck.

Ernie Chan said...

Hi Ken,
Indeed, we do not select subadvisors based on purely statistical criteria. The logic and nature of the strategy are important inputs, as is its co-dependence with our other strategies. Certainly the AUM of the strategy is another input.
Ernie

Watua said...

That's a very interesting article, Ernie. You have highlighted the value of leading strategically by building the different parts of a team that eventually wins. I think the three approaches you have just described are exactly the working parts that any winning organization should have. That is simply remarkable.

Regarding the question you asked:
What is the correct analytical expression, with a valid proof, that describes the cumulative probability P of obtaining a Sharpe ratio greater than or equal to S for a normally distributed returns series of length T?

What is the answer to this?

Thanks!

Ernie Chan said...

Hi Watua,
Thank you for your kind words!

The solution of the challenge can be found in Bailey and Lopez de Prado 2012 "The Sharpe Ratio Efficient Frontier" equations 10-11.

Ernie
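For readers who prefer a numerical check to the closed form, here is a short Python sketch. It is only an illustration: it assumes i.i.d. Gaussian returns with a true Sharpe ratio of zero, and compares a brute-force Monte Carlo estimate with the familiar asymptotic approximation in which the Sharpe estimator has a standard error of roughly sqrt((1 + SR^2/2)/T):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
T, S_annual, trials = 252, 2.0, 20_000
# i.i.d. Gaussian daily returns with zero true Sharpe ratio
r = rng.normal(0.0, 0.01, size=(trials, T))
sr = np.sqrt(252) * r.mean(axis=1) / r.std(axis=1, ddof=1)  # annualized sample Sharpe

p_mc = (sr >= S_annual).mean()
# With true SR = 0, the annualized Sharpe estimator has standard error ~ sqrt(252/T)
p_approx = norm.sf(S_annual / np.sqrt(252.0 / T))
print("P(Sharpe >= 2): Monte Carlo", p_mc, " normal approximation", round(p_approx, 4))

For T = 252 this probability is roughly 2%, which is why a pool of 100 zero-edge track records will usually contain at least one with an annualized Sharpe above 2.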

Cameron Wicentowich said...

Hi Ernie,

Now that the proverbial "cat is out of the bag," a few comments:

1) Prado et al. provide the asymptotic distribution of the Sharpe Ratio (SR) for
both normal (eqn. 4) and non-normal (eqn. 8) i.i.d returns

2) Eqns. 10 and 11 use asymptotic normality albeit with a correction to the standard
deviation because the statistic is the SR, and because one doesn't ever really
know the true SR, we replace it with the sample SR. (At least we have a law of
large numbers so that's quite reasonable, although i.i.d in practice seems quite
strong assumption, so one may need a larger scale parameter to estimate
confidence intervals )

3) Under the normal i.i.d returns for finite n, SR~T(n)/sqrt(n) where T(n) is
distributed as a non-central T distribution with n-1 degrees of freedom because
the denominator is the unbiased sample variance
(https://en.wikipedia.org/wiki/Noncentral_t-distribution).
The non-central parameter is sqrt(n) sr, where sr is the population Sharpe Ratio

4) The paper by BENTKUS et al., Bernoulli 13(2), 2007, 346–364
(https://projecteuclid.org/download/pdfview_1/euclid.bj/1179498752)
shows that T(n)~sqrt(n) sr + v Z, Z~N(0,1) for large n (Remark 2.1); v is a unit
standard deviation corrected for skew and kurtosis in the returns; (has to be
what Prado et.al. give in eqn. 8, but I didn't check it)

5) For large n then, SR~sr+v/sqrt(n) Z; like a law of large numbers for the SR

I should mention Steve Paz here, who does a lot of work on SR. Check out his website http://www.sharperat.io.

Cameron
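Cameron's point 3 is easy to verify numerically. A small Python sketch (the mean, volatility, and sample size are arbitrary) comparing simulated Sharpe ratios with scipy's noncentral t:

import numpy as np
from scipy.stats import nct

rng = np.random.default_rng(7)
n, mu, sigma, trials = 252, 0.0005, 0.01, 20_000
r = rng.normal(mu, sigma, size=(trials, n))
sr_hat = r.mean(axis=1) / r.std(axis=1, ddof=1)   # per-period sample Sharpe ratio

# Point 3: sqrt(n) * SR_hat is a t-statistic, distributed as a noncentral t
# with n-1 degrees of freedom and noncentrality sqrt(n) * (mu / sigma)
dist = nct(df=n - 1, nc=np.sqrt(n) * mu / sigma)
for q in (0.05, 0.50, 0.95):
    print(q, round(np.quantile(np.sqrt(n) * sr_hat, q), 3), round(dist.ppf(q), 3))

The empirical and theoretical quantiles should agree to within Monte Carlo noise.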

Ernie Chan said...

Thanks for the extensive references and comments, Cameron!
Ernie

Unknown said...

Hey Ernie,
You mentioned in the post that you'll often give "an applicant a tough financial data science problem to be done at their leisure" for interviews. As someone pretty keen to break into the industry, any chance you'd be able to share what one such problem may look like?
Thanks,
Nick

Ernie Chan said...

Hi Nick,
Merging 2 different financial data sets, with no obvious common key.
Ernie

Anonymous said...

Hi Ernie,

I did some calculations based on using a random walk, i.e. a discrete approximation of Brownian motion.
Using a digital up/down model, the Sharpe ratio S can be computed directly in terms of the number np of up steps, and an analytical function S(np) can be found:
S(np) = (2 np - N)/(2 Sqrt[(N - np) np]), from which we can invert to get
np(S) = (N (1 + S^2 +/- Sqrt[S^2 + S^4]))/(2 (1 + S^2)),
where the two roots correspond to positive and negative S. S(np) is a monotonic function, so the CDF can be obtained by integrating S(np) times its probability for np > np(S*).
The probability of each path with np up steps is given by the Bernoulli distribution, but for large N it is well approximated by a Gaussian.
So we can finally compute the pdf by integrating
S(np) Gaussian(m(np),\sigma(np)) dnp between np(S*) and +inf.
I get

1/2 (Erf[Sqrt[N]/Sqrt[2]] - Erf[(S Sqrt[N/(1 + S^2)])/Sqrt[2]])

I think the interesting paper by Prado may contain several typos:

For example, eqs 4 and 11 are inconsistent, i.e. in the gamma3=gamma4=0 case the pdf in eq 4 should be the derivative of the cdf in eq 11, but it is not.
As for eq 11, in the same limit gamma3=gamma4=0, for SR>2 the cdf is not real, which is not a very nice property for a cdf.

In fact the formula I derived differs from eq 11 exactly in the coefficient of SR^2, and my formula does not have this problem of not being real for SR>2.
In the cited paper, before eq 11, they write "we propose", which does not sound like a proof, but more like some kind of ansatz without further justification.

\gamma3 and \gamma4 could be included by using random walks with 4 different values instead of 2, by tuning appropriately their probabilities.

I hope to hear some comments from you

Findango
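Findango's +/-1-step setup can be cross-checked empirically. A rough Python sketch (it uses the population-style standard deviation sqrt(1 - m^2), consistent with the S(np) expression above, and simply prints the Monte Carlo estimate next to the closed form rather than claiming exact agreement, since the binomial count is discrete):

import numpy as np
from scipy.special import erf

rng = np.random.default_rng(3)
N, trials = 252, 20_000
S = 2.0 / np.sqrt(N)                            # per-period threshold (~ annualized Sharpe of 2)
x = rng.choice([-1.0, 1.0], size=(trials, N))   # uniform +/-1 "digital" returns
m = x.mean(axis=1)
sr = m / np.sqrt(1.0 - m**2)                    # equals (2*np - N)/(2*Sqrt[(N - np)*np])

p_mc = (sr >= S).mean()
p_formula = 0.5 * (erf(np.sqrt(N / 2.0)) - erf(S * np.sqrt(N / (1.0 + S**2)) / np.sqrt(2.0)))
print("Monte Carlo:", p_mc, " closed form above:", round(p_formula, 4))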



Ernie Chan said...

Hi Findango,
Thank you for highlighting some potential problems with the paper by Lopez de Prado et al. Your calculations make sense to me. Perhaps you can tweet @lopezdeprado to seek his comments?
Ernie

Findango said...

Hi Ernie,
Thank you for your comment. I'll soon follow your advice.
After a more careful look I realized the Brownian motion limit, according to the definitions used in the paper, is gamma3=0, gamma4=3.
In the paper eq 4 is consistent with eq 8, i.e. 4 is the Gaussian limit of 8.
Eqs 10 and 11 are not consistent; eq 11 should be 1-Z() by the very definition of the CDF.

Besides this, the real issue seems to be eq 8, which is borrowed from another author. The binomial random walk has \gamma_{2n+1}=0, \gamma_{2n}=1 for any n>=1, but eq 8 for gamma3=0, gamma4=1 is not the P(S) I get.

The problem is in sigma_S^2 in eq 8 (trivially derived in sec 2.5), which should always be positive, but for gamma3=0 and gamma4<1, for example, it is negative.
I get sigma_S=Sqrt[1 + S^2]/Sqrt[N], while for gamma3=0, gamma4=1, using the eq in section 2.5 we get something independent of SR, which is quite doubtful and inconsistent, as explained, with the random walk approach.

I did not check how eq 8 is derived, but my guess is that the correct expression for sigma_S in eq 8 may involve something like (1+gamma4)/4*S^2.

In any case the root problem is that eq 8 gives a negative variance for certain combinations of gammas and S, which is not possible for a Gaussian, and any result based on it has the same problem.

Findango



Unknown said...

There is no problem with the asymptotics regarding the variance of the Sharpe ratio in
"THE SHARPE RATIO EFFICIENT FRONTIER" by Lopez de Prado et al. There may be an issue with sample estimates for SR, skew and kurtosis for small n, but if you know the population distribution for returns (which of course we don't) the math works.

Eqn. 8 says the variance approaches the following value as n tends to infinity:

(1/n) * [ 1+SR^2/2-g3*SR+(g4-3)/4 * SR^2]

Now write g4-3 as (g4-1) - 2; simplifying, the above becomes

(1/n) * [ 1-g3*SR+(g4-1)/4 * SR^2]

Almost equation 11, except that above I'm assuming population values, whereas equation 11 uses samples. (Practically you have to, because you don't know the distribution of the returns.)

Now to the concern Findango points out:
we need [1 - g3*SR + (g4-1)/4*SR^2] > 0 for this all to make sense, since Var > 0.

In particular, if there are real roots to [1 -g3*SR+(g4-1)/4*SR^2]=0 for a particular SR, then we have a problem.

The discriminant is g3^2-4 * (g4-1)/4 = 1+g3^2-g4,
if 1+g3^2-g4<0 then there are no real roots and there is no problem.

From Wikipedia page, (https://en.wikipedia.org/wiki/Kurtosis#cite_note-Pearson1929-3) and reference Pearson, K. (1929). "Editorial note to 'Inequalities for moments of frequency functions and for various statistical constants'". Biometrika. 21 (1–4): 361–375. doi:10.1093/biomet/21.1-4.361.

It states that 1+g3^2 <= g4. So at worst, Var is zero, but it can't be negative. I haven't read the reference, but 1+g3^2 = g4 may require something unusual in the distribution or its parameters. There may also be issues with using sample values for small n. (I don't know if de Prado is clear enough on what he's using for sample values of skew and kurtosis, but these can be adjusted for sample size too.)

Conclusion:
1) Caution is required when using sample estimates.
2) If you know the distribution of returns, Var will not be negative, but it could be zero, so you still have to be careful.


Unknown said...

A reference on the kurtosis lower bound that's readily accessible:
https://projecteuclid.org/download/pdf_1/euclid.aoms/1177731243

Unknown said...

That the consistency of the asymptotic distribution of the Sharpe ratio should come down to, IMHO, an obscure identity relating the kurtosis (g4) and skew (g3),
g4 >= g3^2+1,
is surprising.

I would never have thought that this inequality, which before now I never knew about, would be relevant. I also thought it would involve some fairly advanced math, but it doesn't.

I was able to come up with a simpler and easier-to-understand proof than the reference I gave.

Define the following:

m:=E[x] (Expected value of x)
v^2:=E[(x-m)^2]
z:= (x-m)/v

We then have
E[z]=0,E[z^2]=1, E[z^3]=g3, E[z^4]=g4

Define QF:=(z^2+a*z+b)^2 with a,b arbitrary.

Now QF>=0, so
0<=E[QF]=

E[z^4]+a^2*E[z^2]+b^2+2*a*E[z^3]+2*a*b*E[z]+2*b*E[z^2]=

E[z^4]+a^2 + b^2+2*a*E[z^3]+2*b.

Since a and b are abitrary, set a and b as follows:
a=-E[z^3], b=-1. So we have that
0<=E[z^4]-(E[z^3])^2-1, and hence g4>=(g3)^2+1.


One still needs to be cautious when using sample estimates for the mean, variance, skew and kurtosis. If the only adjustment is letting n -> n-1, then it would be a fluke to get sample estimates that resulted in Var = 0.
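The inequality, and the caveat about sample estimates, can be illustrated in a few lines of Python (the three test distributions are arbitrary; scipy's default biased estimators are used, i.e. the moments of the empirical distribution, for which Pearson's bound holds):

import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(11)
samples = {
    "normal":    rng.normal(size=10_000),
    "lognormal": rng.lognormal(size=10_000),
    "student-t": rng.standard_t(5, size=10_000),
}
for name, x in samples.items():
    g3 = skew(x)                    # biased (population-style) skew of the sample
    g4 = kurtosis(x, fisher=False)  # biased, non-excess (population-style) kurtosis
    print(name, round(g4, 2), ">=", round(g3**2 + 1, 2), g4 >= g3**2 + 1)

With the default biased estimators the bound holds for any non-degenerate data set, since those are just the moments of the empirical distribution; the small-sample corrections mentioned above are a separate concern.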


Anonymous said...

Hi Cameron,
thank you for your post.
Neat proof of the inequality.

Still, as I pointed out previously, a binomial distribution of returns, corresponding to g3=0, g4=1, according to eq. 8 would give zero variance, which is obviously not possible. In fact this case corresponds to the limit case g4=g3^2+1.
My calculation, on the contrary, gives a meaningful result.
Note that a binomial random walk is a quite common model for financial return series.

I don't know if this is the only case where g4=g3^2+1, and consequently the variance of SR is zero, but this definitely raises doubts about the general validity of eq 8.

Findango

Unknown said...

Hi Findango,

You raised a fair, but fine point, IMHO.

Recall Eq. 8 in Sharpe Frontier by De Prado, et al.:

“Mertens (2002) concludes that the Normality assumption on returns could be dropped, and still the estimated Sharpe ratio would follow a Normal distribution with parameters”

Mean = 0 and

Var = (1 + SR^2/2 - g3*SR + SR^2*(g4-3)/4)/n = (1 - g3*SR + SR^2*(g4-1)/4)/n

For the binomial model, g3=0 and g4=1, so

Var=1/n.

Which means the usual thing: asymptotically the sample value of the Sharpe Ratio, ^SR, converges in probability to the population value, SR. (I just want to be clear that Var = 1/n is not saying Var = 0).

Equivalently, sqrt(n)*(^SR - SR) -> N(0,1) (the central limit theorem, under iid and finite variance assumptions).

However some minor care is needed when dropping Normality assumptions. At a minimum g3 and g4 need to be finite, and I characterized the safe cases as those distributions for which g4 > g3^2+1. That is a sufficient condition, but not a necessary one, as you point out with the binomial model example.

The paper, “Limiting distributions of the non-central
t-statistic and their applications to the power of t-tests under non-normality”, by Vidmantas Bentkus, Bing-Yi Jing , Qi-Man Shao and Wang Zhou, Bernoulli 13(2), 2007, 346–364” (https://projecteuclid.org/download/pdfview_1/euclid.bj/1179498752)
gives the complete answer on distributions.

The only case when eqn. 8 is not valid is when the returns are binomially distributed such that the population Sharpe Ratio, mu/sigma, is
mu/sigma=2*sqrt(p*(1-p))/(2p-1) and p<>1/2 so the denominator is not zero; p is the "up" probability. (See case (i) of Remark 2.1. The proof is given in Theorem 2.2 in Bentkus et al., Bernoulli 13(2), 2007, 346–364.)

In the case where eqn. 8 is not valid, the sample Sharpe Ratio is distributed asymptotically as a non-central chi-squared distribution with 1 degree of freedom.

Typically, we don't know the population distribution for single period asset returns and variance, so we end up having to use sample values. Although in practice you don't see just two values for single period asset returns. If that did occur, it seems highly likely you could eye-ball it. In practice you can generally safely use Eqn. 11.


Conclusion:
In the situation where you shouldn't use Eqn 11, IMHO, you'd see it in the single period asset returns, and based on Bentkus et al., Bernoulli 13(2), 2007, 346–364 you'd know what the correct asymptotic distribution is.
However, Eqn 8 is generally applicable in practice and Eqn. 11 is safe to use because you don’t see just 2 possible values for single period asset returns.

Anonymous said...

Hi Cameron,
thank you for your post. I am amazed that you could find that paper, even though the title is apparently completely unrelated and there is no mention of SR, but indeed T_n = sqrt(n) SR.
As for Var<>0, that is an important clarification, since the zero discriminant in that case does not imply Var=0.

I think there may be some language confusion. When I was talking about a binomial random walk I meant that the returns follow a *uniform* "binomial" distribution (I think this is standard terminology). That means g3=0 and g4=1 for a uniform distribution with P(x_up)=P(x_down)=1/2, x_up=-x_down, not for a binomial distribution Bin(n,p) of the type considered in remark 2.1. If the returns follow a binomial distribution, g3 and g4 would be different I think, and could be computed with the MGF. The cumulative return follows a binomial distribution, but SR is defined in terms of the returns, not their cumulative sum.

Apparently the case of uniformly distributed up/down returns should be included in case (ii) (which is Prado's assumption), since it is not a binomial, but this is not what I got for the CDF (see above):
1/2 (Erf[Sqrt[N]/Sqrt[2]] - Erf[(SR Sqrt[N/(1 + SR^2)])/Sqrt[2]]),
while using (ii) would give something independent of SR. The above expression corresponds to sigma_SR=Sqrt[1 + SR^2]/Sqrt[N].

It could be my calculation contain some error.

Thanks

Anonymous said...

Hi,
just a few more thoughts.
Since var(SR)=(1-g3*SR+SR^2*(g4-1)/4), for g4-1<0 it is a concave-down parabola in SR, so there would always be some region where it is negative for sufficiently large SR. But if 1+g3^2 <= g4, then g4<1 would imply g3^2<0, which is impossible.
So var is always a concave-up function of SR and g4>=1, and if g4=1 then necessarily g3=0, which is the uniform digital return distribution used in the binomial random walk model of prices.

If g4>1 the parabola can be tangent to the Var=0 horizontal axis when 1+g3^2=g4, as Cameron pointed out. So this is a real issue in general, but it is not the case for the digital uniform distribution where g3=0, g4=1.
As for the case of a binomial distribution of the returns, the distribution is not normal but N^2; but that is not a binomial random walk, whose returns are uniformly distributed.

Unknown said...

This comment refers to your second-to-last comment:

I've reviewed a lot of statistical models, and one is always looking at t-stats to see if regressors are actually present: sample mean / sample standard deviation.

When Ernie asked, "what's the distribution of the Sharpe Ratio, when the single period returns are Gaussian?" I immediately recognized that it looks like a T-stat
[SR(n)=T-stat(n)/sqrt(n) ] and I knew what to search for in terms of distributional properties. [Steve Paz, a quant researcher referenced earlier in this discussion, mentions that people seem generally unaware that the Sharpe Ratio is just T-stat/sqrt(n)]

The following may just be semantics.

Re: Binomial distribution in (ii).
----------------------------------------------
The distribution in Bentkus et al. is defined as
X~B(p,mu, sigma)
where
X=sigma*Y+mu, Y~B(p,0,1) and

P(Y=Y_up)=p, P(Y=Y_dn)=1-p, Y_up=+(1-p)/D, Y_dn=-p/D, D=[p*(1-p)]^(1/2)

Bentkus et al. referred to Y as a standardized Bernoulli distribution.

A general, finance-type binomial model is defined as follows:
P(Z=+a)=p,
P(Z=-b)=1-p,
a,b>0

m=E[Z]=a*p-b*(1-p) => (a+b)*p-b= m
Var:=E[(Z-m)^2]=(a+b)^2*p*(1-p) => a+b=[Var/p/(1-p)]^(1/2)

This includes the uniform binomial model with a=b=+1, p=1/2. Also, the finance type binomial model can be related to X~B(p,mu,sigma) as follows:

a= mu + Y_up * sigma
b= -mu - Y_dn * sigma

m=E[Z]=a * p - b * (1-p) = mu * p + sigma * Y_up * p +mu * (1-p) + sigma * Y_dn * (1-p) = mu = E[X]

and

Var=E[(Z-m)^2] = (a+b)^2*p*(1-p) = sigma^2 * (Y_up-Y_dn)^2 * p*(1-p) = E[(X-mu)^2]




Re: Validity of Eqn. 8 in de Prado when g4=g3^2+1.
-----------------------------------------------------
Based on Bentkus et al., X=sigma*Y+mu, Y~B(p,0,1) satisfies g4=g3^2+1, but only when mu/sigma=2*sqrt(p*(1-p))/(2p-1) and p<>1/2 is eqn. 8 in de Prado invalid.

Also, if there are continuous distributions (I don't know of any at the moment) where g4=g3^2+1, then one would expect eqn. 8 in de Prado to work.


Re: Your Calculation on the probability
---------------------------------------
Eqn. 8 in de Prado and the formulas in Bentkus require only the population values for mu and sigma.

So in your formula,
sigma_SR=Sqrt[1+SR^2]/Sqrt[N], isn't mu=0 for the returns in the uniform binomial model? In which case SR=mu/sigma=0?

Perhaps you've already worked this out, I haven't, but what's the distribution of the Sharpe ratio for a given n when the single period returns are P(+/-X_up)=1/2?

Patrick Clensfield said...

Thank you sir, for the invaluable insights.