#### From Pair Programming to Pair Trading

Software developers may be familiar with the concept of "pair programming": two programmers sitting in front of the same screen, looking at the same piece of code, and taking turns at the keyboard. According to software experts, this practice reduces bugs and vastly improves the quality of the code. I have found that it works equally well in trading research and execution, which gives new meaning to the term "pair trading".

The more different the pair-traders are, the more they will learn from each other at the end of the day. One trader may be detail-oriented, while another may be bursting with ideas. One trader may be a programmer geek, and another may have a CFA. Here is an example. In financial data science and machine learning, data cleansing is a crucial step, often seriously affecting the validity of the final results. I am, unfortunately, often too impatient with this step, eager to get to the "red meat" of strategy testing. Fortunately, my colleagues at QTS Capital are much more patient and careful, leading to much better quality work and invalidating quite a few of my bogus strategies along the way. Speaking of invalidating strategies, it is crucial to have a pair-trader independently backtest a strategy before trading it, preferably in two different programming languages. As I have written in my book, I backtest with Matlab and others in my firm use Python, while the final implementation as a production system by my pair-trader Roger is always in C#. Often, subtle biases and bugs in a strategy will be revealed only at this last step. After the strategy is "cross-validated" by your pair-trader, and you have moved on to live trading, it is a good idea to have one human watching over the trading programs at all times, even for fully automated strategies. (For the same reason, I always have my foot ready on the brake even though my car has a collision avoidance system.) Constant supervision requires two humans, at least, especially if you trade in international as well as domestic markets.

Of course, pair-trading is not just about finding bugs and monitoring live trading. It brings to you new ideas, techniques, strategies, or even completely new businesses. I have started two hedge funds in the past. In both cases, it started with me consulting for a client, and the consulting progressed to a collaboration, and the collaboration became so fruitful that we decided to start a fund to trade the resulting strategies.

For balance, I should talk about a few downsides to pair-trading. Though the final product's quality is usually higher, collaborative work often takes a lot longer. Your pair-trader's schedule may be different from yours. If the collaboration takes the form of a formal partnership in managing a fund or business, be careful not to share ultimate control of it with your pair-trading partner (sharing economic benefits is of course necessary). I had one of my funds shut down due to the early retirement of my partner. One of the reasons I started trading independently instead of working for a large firm is to avoid having my projects or strategies prematurely terminated by senior management, and having a partner involuntarily shut you down is just as bad.

Where to find your pair-trader? Publishing your ideas and knowledge on social media is the easiest way (note this blog here). Whether you blog, tweet, post on Quora or LinkedIn, podcast, or make YouTube videos, if your audience finds you knowledgeable, you can entice them into a collaboration.

#### Hiring Researchers

Besides pair-trading with partners on a shared intellectual property basis, I have also hired various interns and researchers, where I own all the IP. They range from undergraduates to post-doctoral researchers (and I would not hesitate to hire talented high schoolers either). The difference from pair-traders is that hired quants are typically more junior and hence require more supervision, and they need to be paid a guaranteed fee instead of sharing profits only. Due to the guaranteed fee, the screening criterion is more important. I found short interviews, even ones with brain teasers, to be quite unpredictive of future performance (no offence, D.E. Shaw). We settled on giving an applicant a tough financial data science problem to be done at their leisure. I also found that there is no particular advantage to being in the same physical office with your staff. We have worked very well with interns spanning the globe from the UK to Vietnam.

Though physical meetings are unimportant, regular Google Hangouts with screen-sharing are essential in working with remote researchers. Unlike with pair-traders, there isn't time to work together on coding with all the different researchers. But it is very beneficial to walk through their code whenever results are available. Bugs will be detected, nuances explained, and very often, new ideas come out of the video meetings. We used to have a company-wide weekly video meeting where a researcher would present his/her results using PowerPoint, but I have found that kind of high-level presentation to be less useful than an in-depth code and result review. PowerPoint presentations are also much more time-consuming to prepare, whereas a code walk-through needs little preparation.

Generally, even undergraduate interns prefer to develop a brand new strategy on their own. But that is not necessarily the most productive use of their talent for the firm. It is rare to be able to develop and complete a trading strategy using machine learning within a summer internship. Also, if the goal of the strategy is to be traded as an independent managed account product (e.g. our Futures strategy), it takes a few years to build a track record for it to be marketable. On the other hand, we can often see immediate benefits from improving an existing strategy, and the improvement can be researched within 3 or 4 months. This also fits within the "production chain" meta-strategy described by Lopez de Prado above, where each quant should mainly focus on one aspect of the strategy production.

This whole idea of emphasizing improving existing strategies over creating new strategies was suggested to us by our post-doctoral researcher, which leads me to the next point.

Sometimes we hire people because we need help with something we can do ourselves but don't have time to. This would generally be the reason to hire undergraduate interns. But sometimes, I hire people who are better than I am at something. For example, despite my theoretical physics background, my stochastic calculus isn't top notch (to put it mildly). This is remedied by hiring our postdoc Ray, who finds tedious mathematics a joy rather than a drudgery. While undergraduate interns improve our productivity, graduate and post-doctoral researchers are generally able to break new ground for us. These quants require more freedom to pursue their projects, but that doesn't mean we can skip the code reviews and weekly video conferences, just as with pair-traders.

Some firms may spend a lot of time and money to find such interns and researchers using professional recruiters. In contrast, these hires generally found their way to us, despite our minuscule size. That is because I am known as an educator (both formally as adjunct faculty in universities, as well as informally on social media and through books). Everybody likes to be educated while getting paid. If you develop a reputation of being an educator in the broadest sense, you shall find recruits coming to you too.

#### Hiring Subadvisors

If one decides to give up on intellectual property creation, and just go for returns on investment, finding subadvisors to trade your account isn't a bad option. After all, creating IP takes a lot of time and money, and finding a profitable subadvisor will generate that cash flow and diversify your portfolio and revenue stream while you are patiently doing research. (In contrast to Silicon Valley startups where the cash for IP creation comes from venture capital, cash flow for hedge funds like ours comes mainly from fees and expense reimbursements, which are quite limited unless the fund is large or very profitable.)

We have tried a lot of subadvisors in the past. All but one failed to deliver. Why? That is because we were cheap. We picked "emerging" subadvisors who had profitable, but short, track records, and charged lower fees. To our chagrin, their long and deep drawdown typically immediately began once we hired them. There is a name for this: it is called selection bias. If you generate 100 geometric random walks representing the equity curves of subadvisors, it is likely that one of them has a Sharpe ratio greater than 2 if the random walk has only 252 steps.

Here, I simulated 100 normally distributed returns series with 252 bars, and sure enough, the maximum Sharpe ratio of those is 2.8 (indicated by the red curve in the graph below.)
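The experiment just described can be sketched in a few lines of Python. This is a minimal reconstruction, not the code behind the graph: the function names, the zero mean, the 1% daily volatility, and the random seed are all assumptions chosen for illustration, and the maximum Sharpe ratio it prints will vary with the seed.

```python
import math
import random
import statistics

def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns (risk-free rate assumed zero)."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)  # sample standard deviation
    return math.sqrt(periods_per_year) * mean / stdev

def simulate_sharpes(n_series=100, n_bars=252, daily_vol=0.01, seed=0):
    """Sharpe ratios of zero-mean Gaussian return series: pure luck, no skill."""
    rng = random.Random(seed)
    return [
        annualized_sharpe([rng.gauss(0.0, daily_vol) for _ in range(n_bars)])
        for _ in range(n_series)
    ]

sharpes = simulate_sharpes()
print(f"max Sharpe: {max(sharpes):.2f}")
print(f"series with Sharpe > 2: {sum(s > 2 for s in sharpes)}")
```

Every series here has a true Sharpe ratio of zero, so any large value among the 100 is due to luck alone.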

(The first 3 readers who can email me a correct analytical expression with a valid proof that describes the cumulative probability P of obtaining a Sharpe ratio greater than or equal to S of a normally distributed returns series of length T **will get a free copy of my book Machine Trading**. At their option, I can also tweet their names and contact info to attract potential employment or consulting opportunities.)

These lucky subadvisors are unlikely to maintain their Sharpe ratios going forward. To overcome this selection bias, we adopted this rule: whenever a subadvisor approaches us, we time-stamp that as Day Zero. We will only pay attention to the performance thereafter. This is similar in concept to "paper trading" or "walk-forward testing".

Subadvisors with longer profitable track records do pass this test more often than "emerging" subadvisors. But these subadvisors typically charge the full 2 and 20 fees, and the more profitable ones may charge even more. Some investors balk at those high fees. I think these investors suffer from a behavioral finance bias, which for lack of a better term I will call "Scrooge syndrome". Suppose one owns Amazon stock, which has gone up 92461% since its IPO. Does one begrudge Jeff Bezos's wealth? Does one begrudge the many millions he rakes in every day? No, the typical investor only cares about the net returns on equity. So why does this investor suddenly become so concerned with the difference between the gross and net return of a subadvisor? As long as the net return is attractive, we shouldn't care how much in fees the subadvisor is raking in. Renaissance Technologies' Medallion Fund reportedly charges 5 and 44, but most people would jump at the chance of investing if they were allowed.

Besides fees, some quant investors balk at hiring subadvisors because of pride. That is another behavioral bias, known as the "NIH syndrome" (Not Invented Here). Nobody would feel diminished buying AAPL even though they were not involved in creating the iPhone at Apple. So why should they feel diminished paying for a service that generates uncorrelated returns? Do they think they alone can create every new strategy ever discoverable by humankind?

#### Epilogue

Your ultimate wealth when you are 100 years old will more likely be determined by the strategies created by your pair-traders, your consultants/employees, and your subadvisors, than the amazing strategies you created in your twenties. Hire well.

===

#### Industry update

1) A Python package for market simulations by Techila is available here. It enables easy parallel computations.

2) A very readable new book on using R in Finance by Jonathan Regenstein, who is the Director of Financial Services Practice at RStudio.

3) PsyQuation now provides an order flow sentiment indicator.

4) Larry Connors published a new book on simple but high Sharpe ratio strategies. I enjoyed reading it very much.

5) QResearch is a backtest platform for the Chinese stock market for non-programmers.

6) Logan Kane describes an innovative application of volatility prediction here.

7) If you aren't following @VolatilityQ on Twitter, you are missing out on a lot of quant research and alphas.

## 31 comments:

Very good post! Glad to see you're back to blogging, at least for today.

Your comments on "selection bias" are well taken, so I understand why you've adopted your new policy of "whenever a subadvisor approaches us, we time-stamp that as Day Zero. We will only pay attention to the performance thereafter".

However, isn't that policy mathematically sub-optimal? For example, let's say 10 subadvisors approach you on Day Zero and they all have 1-year *live* (not backtested) track records. Let's say you monitor them all for the next year. To decide which of the 10 to invest with, wouldn't it make most sense to consider the full 2-year track record and pick the one with the best sharpe ratio? Doesn't make sense to throw away the year-1 data if it's a real (non-backtested) track record, right?

Hi aagold,

Actually, by tracking the forward performance of the subadvisor, we have already used the information that their past performance was good. If it wasn't good, they wouldn't even approach us (and if they did approach us, we wouldn't bother to track its forward performance.)

Ernie

Ok, then how about this example. Let's say two subadvisors A and B approach you with a 1-year live track record. A has Sharpe Ratio SR=1 and B has SR=2. So A is good, but B is really good that first year. During year 2, A has SR=1.3 and B has SR=1.25. Who do you choose? Mathematically, seems clear you should choose B, right?

Not necessarily. You have to dig in much deeper into their respective universes and stated strategies.

One thing that's problematic in your hypothetical is the implicit assumption that you're annualizing Sharpe ratios from higher frequency returns -- which in most cases isn't legit. However, if we're not annualizing from higher frequency returns, then your B would've seen a very bad year to bring the Sharpe down that much. So the variance in the Sharpe becomes a concern on its own.

However, it really depends on the product groups and strategy types the subadvisors are supposedly deploying. Fund of fund managers will increase allocations to funds that have had a nearly zero return year under certain circumstances including underperformance.

Ok, but I was really just trying to address the theoretical concept of "selection bias" Ernie brought up, and to consider the method he's using to address it. Sounds like his method is to effectively apply a threshold to the live track record at Day Zero: if the performance is greater than some threshold to be considered "good", then he starts measuring their returns from Day Zero and doesn't consider the prior track record at all. My point is that's sub-optimal. Better to analyze the entire live track record to make the decision.

Haha I think he was being a bit more informal than what you're getting at. The implication is that people with trading systems don't approach others for investment unless the backtests (or short track records) are good. So by the fact that someone has approached you, you can assume the historical results are good, so you then start watching the forward ones.

Aagold,

It is a valid question of how much weight we should apply to the Sharpe ratio prior to Day Zero, vs the weight we apply to forward period Sharpe. I would certainly put less weight on the past due to selection bias. As to the optimal allocation, I don't have a mathematical model for that yet.

However, if the forward period Sharpe is significantly lower than the look back period, it is a red flag. It may be a result of bad luck, or it could be selection bias at work.

Ernie

Yeah, I got that ZHD. And I realize I probably have too much time on my hands and am making a big deal out of something people probably don't really care about.... it's happened before! :-)

However - I still think considering just the forward returns, and completely ignoring the *live* historical track record before the person approached you (other than confirming it's "good") doesn't make much sense.

IMO there's really no solution to the "selection bias" problem other than having a long enough live track record, under enough different types of market regimes, for the trading strategy being used.

Ernie,

Glad you at least think my comment raised a valid question! :-)

Maybe I haven't thought this through well enough, but it seems to me that as long as the track record prior to Day Zero is a valid live track record, and not backtested, then it should be given equal weight to the forward period Sharpe.

To me, this seems like a classic case of "regression to the mean" and separating out luck from skill. In his book Thinking Fast and Slow Kahneman gives some classic examples, like in a two day golf tournament, it's highly likely that the top performer in Day 1 will do worse in Day 2 than they did in Day 1. That's because their performance in Day 1 was partially due to luck and partially due to skill.

So, it's certainly true that people who did exceptionally well in a 1-year track record are likely to approach you, and like the golf tournament example above, it's highly likely they won't do as well in year 2 as they did in year 1. However, I still think it's true that there's no better way of differentiating luck from skill than simply considering the full 2-year track record.

Thought about this a bit more on a long drive... I find the topic quite interesting (obviously).

If we assume a stationary return distribution, so it has for example an unknown but fixed mean and variance, then throwing away all past data before Day Zero does completely eliminate selection bias. Therefore, statistics measured after Day Zero are an unbiased estimate of the true (infinite test time) statistics.

If we include the backward data, I agree it will be *biased* upwards due to selection bias.

However, it will have lower *variance* than if we throw away the backward data. So, to arrive at the best estimate of true future return, I think there's a classic bias-variance tradeoff that needs to be addressed.
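The tradeoff can be made concrete with a small Monte Carlo sketch. Everything in it is an illustrative assumption rather than a model of real subadvisors: all advisors share the same true mean return (so year-1 differences are pure luck), only the year-1 winner approaches us, and the names `compare_estimators`, `forward`, and `pooled` are made up for this example.

```python
import random
import statistics

def compare_estimators(n_advisors=20, n_obs=50, true_mean=0.0, sigma=1.0,
                       n_trials=500, seed=1):
    """Select on year-1 results, then estimate the winner's true mean two ways.

    Every advisor has the same true mean (an assumption that isolates luck).
    Returns (forward_errors, pooled_errors), one estimation error per trial.
    """
    rng = random.Random(seed)
    forward_errors, pooled_errors = [], []
    for _ in range(n_trials):
        # Year 1: every advisor trades; only the best one "approaches us".
        year1_means = [
            statistics.fmean([rng.gauss(true_mean, sigma) for _ in range(n_obs)])
            for _ in range(n_advisors)
        ]
        best_year1 = max(year1_means)
        # Year 2: fresh, independent returns for the selected advisor.
        year2_mean = statistics.fmean(
            [rng.gauss(true_mean, sigma) for _ in range(n_obs)]
        )
        forward_errors.append(year2_mean - true_mean)
        pooled_errors.append((best_year1 + year2_mean) / 2 - true_mean)
    return forward_errors, pooled_errors

forward, pooled = compare_estimators()
for name, errors in (("forward only", forward), ("pooled 2-year", pooled)):
    bias = statistics.fmean(errors)
    variance = statistics.pvariance(errors)
    print(f"{name}: bias={bias:+.3f} variance={variance:.4f} "
          f"mse={bias**2 + variance:.4f}")
```

The pooled estimate has the lower variance but carries part of the year-1 selection bias. Which estimator has the smaller mean squared error depends on the number of advisors, the sample length, and how much true skill varies, none of which this sketch tries to calibrate.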

Aagold,

The mathematical problem we are trying to solve is this: assume the returns are sampled from some unknown distribution with Sharpe of S*. We observed past year has Sharpe S1, and forward year has Sharpe S2. What is the cumulative probability that S* >= S0, where S0 is at least 0 but typically chosen to be 1?

Using Bayes' theorem: P(S*|S1, S2)=P(S2|S1, S*)*P(S1, S*)/P(S1, S2)

If we assume that the returns are distributed independently (i.e. no serial codependence), then P(S1, S2)=P(S1)*P(S2), and P(S2|S1, S*)=P(S2|S*), and P(S1, S*)=P(S1|S*)*P(S*). Hence

P(S*|S1, S2)=P(S1|S*)*P(S2|S*)*P(S*)/(P(S1)*P(S2)). Hence indeed S1 and S2 are symmetric.

However, if returns are not independent, then knowing S1 tells us something about S2. In fact, I would argue that if S1 and S2 are highly positively correlated (as we should hope for a consistent strategy's returns and Sharpe ratios), and we find S1 and S2 to be very different, it *may* imply that CDF(S* >= S0) is small. I haven't worked out the math in this case, but I hope some reader will!

Ernie

Very interesting post, Ernie! The subadvisor idea has me thinking differently about what's possible for a trading business. I agree with everything you wrote.

Thanks Logan!

Ernie

You seem to have missed that Lopez de Prado is leaving AQR! Bad timing.

Hi Ernie,

Here's a mathematical description of how I'm thinking about this problem. I think this is the simplest formulation of the type of "selection bias" you discussed in this post. It's simpler than our actual problem of interest, but I think it retains the key elements. If a solution to this probability question exists, it seems like someone should have solved it already. I may post this to stack exchange or someplace like that.

Let's say we have N independent normal random variables X_i ~ N(mu_i, sigma_i), each with unknown mean mu_i and unknown standard deviation sigma_i, i=0...N-1.

M independent random samples are drawn from each of the N random variables: x_ij, i=0...N-1, j=0...M-1

Compute N sample means m_i = (1/M)*sum_j {x_ij}

and N sample variances v_i = (1/(M-1))*sum_j{(x_ij - m_i)^2}.

Search for index imax such that m_imax = max_i{m_i}. That is, imax is the index of the random variable with the maximum sample mean.

What is the best unbiased estimate of true mean mu_imax? Is it simply m_imax? Probably not, due to the form of "selection bias" discussed in this post.
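A quick simulation suggests the answer is indeed no. The sketch below follows the formulation above but has to fill in details it leaves open, so treat these as assumptions: the true means mu_i are drawn from N(0, 0.1), every sigma_i is 1, and M is 25. The function name `selection_bias` is likewise made up. It reports the average of m_imax - mu_imax for several values of N.

```python
import random
import statistics

def selection_bias(n_vars, n_samples=25, n_trials=200, seed=2):
    """Average of (m_imax - mu_imax) over many independent trials.

    True means mu_i ~ N(0, 0.1) and sigma_i = 1 are illustrative assumptions,
    not part of the original formulation.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        mus = [rng.gauss(0.0, 0.1) for _ in range(n_vars)]
        sample_means = [
            statistics.fmean([rng.gauss(mu, 1.0) for _ in range(n_samples)])
            for mu in mus
        ]
        # Index of the variable with the largest sample mean.
        imax = max(range(n_vars), key=sample_means.__getitem__)
        gaps.append(sample_means[imax] - mus[imax])
    return statistics.fmean(gaps)

for n in (1, 10, 100):
    print(f"N={n:3d}: average m_imax - mu_imax = {selection_bias(n):+.3f}")
```

With N=1 there is no selection and the gap averages out near zero. As N grows, the winning sample mean increasingly overstates the true mean of the variable it came from.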

Hi aagold,

Your formulation of the problem is interesting, but I am thinking of our house rule as more of a hypothesis testing.

Given the track record of a strategy, we estimate that it has a true Sharpe ratio S1. What is the probability that the forward Sharpe ratio is S2, which may be much lower than S1? If the probability is small, then we reject the hypothesis that the true Sharpe ratio is S1.

Ernie

Don't we need to somehow model a large number (N) of potential subadvisors that ran their trading strategy over the lookback period, and that only those few that generated exceptionally good returns will approach you and show their track record? That's the key effect I'm trying to model.

In my formulation, I simplified it to only the best 1 out of N and to estimation of the true mean of a Gaussian random variable. If this simpler problem were solved, perhaps it could be extended to a hypothesis test on the true Sharpe ratio, as you've described.

I'm kind of surprised I haven't been able to find a known solution to the problem as I formulated it. Seems like this sort of effect would show up in many situations. Maybe I just haven't found the right terminology and search words...

aagold,

Ah, now I see your point!

Yes, your model may indeed capture this effect - looking forward to the solution!

Ernie

Ernie,

I'm continuing to pursue the problem as I formulated it, since I find it interesting, but since there's no practical way of knowing N, I think your rule of only considering returns after Day Zero is probably the best approach. I'll certainly let you know if I make progress on my formulation though.

One caveat that occurs to me: even with your Day Zero rule, there's some potential for selection bias if you start paying attention to too many potential sub-advisors. If you pay attention to too many after Day Zero, someone will eventually have good forward returns just by getting lucky...

Regards,

aagold

aagold,

In this business of uncertainty, I think luck is inevitable.

So besides using statistics to screen out selection bias, we will have to take into account qualitative factors such as whether the strategy has particularly favorable/unfavorable market regimes, and whether the logic matches the regime out/under-performance.

Ernie

Hi -- Great post. Thanks, Ernie.

One thing -- Is the link to the Connors book the right book? It seems to be an older title. Thanks.

Thanks DeeGee!

You are right. His new book is Buy the Greed, Sell the Fear. I have updated the link.

Ernie

Regarding your question on the CDF for the Sharpe ratio: are you referring to the arithmetic Sharpe ratio (as described by Sharpe, arithmetic sample mean) or the geometric Sharpe ratio (geometric sample mean)? Since you simulate a geometric random walk, I guess you are searching for the CDF for the geometric Sharpe. In that case, the result is a sum of lognormal variables, which does not have an exact closed form CDF. Please let me know how to interpret this question.

Pont,

That is a good question. However, Sharpe ratio is conventionally computed from the arithmetic average of net (not log) returns. I actually simulated a set of Gaussian numbers as returns, and then compounded them geometrically to obtain the geometric random walk. But that does not affect the Sharpe ratio computation.

Regarding closed form CDF, you are free to just express your answer in terms of the CDF of standard distributions, whether those are expressed in terms of integrals or not.

By the way, as I just tweeted at @chanep, I have announced the 3 winners already. But I would be happy to send you the solution to this problem if you submit your answer.

Ernie

Hello,

I think that the problem of sub advisor selection is linked to the fact that the population of subadvisors is unknown.

Scientific papers have the same problem with the p-value at 0.05. How many hypotheses have been tested to find something below 0.05?

Marcos Lopez de Prado applies this to quantitative modelling, saying we can't use out-of-sample results to improve our modelling because then the out-of-sample data becomes in-sample.

Returning to our subadvisor selection, we should weight the Sharpe ratio of subadvisors by the likely population of subadvisors. If the subadvisor is trading trend following/smart beta stuff (mainstream strategies) on a very small account, it is likely that the subadvisor population is very high and thus some subadvisors could have excellent results just by luck.

On the contrary, if the subadvisor is trading an original strategy on a big account/with many clients, the population of subadvisor is likely to be very small and the results less due to luck.

Hi Ken,

Indeed, we do not select subadvisors based on purely statistical criteria. The logic and nature of the strategy are important inputs, as is its co-dependence with our other strategies. Certainly the AUM of the strategy is another input.

Ernie

That's a very interesting article Ernie, you have brought the attribute of leading strategically through building different parts of the team that eventually win. I think the three strategies you have just highlighted are the exact working parts that any winning organization should have. That is simply remarkable.

Regarding the question you asked:

What is the correct analytical expression with a valid proof that describes the cumulative probability P of obtaining a Sharpe ratio greater than or equal to S of a normally distributed returns series of length T

What is the answer to this?

Thanks!

Hi Watua,

Thank you for your kind words!

The solution of the challenge can be found in Bailey and Lopez de Prado 2012 "The Sharpe Ratio Efficient Frontier" equations 10-11.

Ernie

Hi Ernie,

Now that the proverbial cat is out of the bag, a few comments:

1) Prado et al. provide the asymptotic distribution of the Sharpe Ratio (SR) for both normal (eqn. 4) and non-normal (eqn. 8) i.i.d returns

2) Eqns. 10 and 11 use asymptotic normality albeit with a correction to the standard deviation because the statistic is the SR, and because one doesn't ever really know the true SR, we replace it with the sample SR. (At least we have a law of large numbers so that's quite reasonable, although i.i.d in practice seems quite strong assumption, so one may need a larger scale parameter to estimate confidence intervals )

3) Under the normal i.i.d returns for finite n, SR~T(n)/sqrt(n) where T(n) is distributed as a non-central T distribution with n-1 degrees of freedom because the denominator is the unbiased sample variance (https://en.wikipedia.org/wiki/Noncentral_t-distribution). The non-central parameter is sqrt(n) sr, where sr is the population Sharpe Ratio

4) The paper by BENTKUS et al., Bernoulli 13(2), 2007, 346–364 (https://projecteuclid.org/download/pdfview_1/euclid.bj/1179498752) shows that T(n)~sqrt(n) sr + v Z, Z~N(0,1) for large n (Remark 2.1); v is a unit standard deviation corrected for skew and kurtosis in the returns; (has to be what Prado et.al. give in eqn. 8, but I didn't check it)

5) For large n then, SR~sr+v/sqrt(n) Z; like a law of large numbers for the SR
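As a rough numerical illustration of point 5, here is a minimal sketch that turns the large-n approximation into a tail probability. It assumes i.i.d. normal returns, for which v = sqrt(1 + sr^2/2) (the normal-returns case of the asymptotic variance), and it works with per-period Sharpe ratios internally; the function names and the 252-periods-per-year convention are assumptions for this example. It is the asymptotic approximation, not the exact non-central t result in point 3.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_sharpe_at_least(s_annual, true_sr_annual=0.0, n_obs=252,
                         periods_per_year=252):
    """Approximate P(sample Sharpe >= s) for i.i.d. normal returns, large n.

    Uses SR ~ sr + v/sqrt(n) * Z with v = sqrt(1 + sr^2 / 2), where sr and SR
    are per-period (not annualized) Sharpe ratios.
    """
    scale = math.sqrt(periods_per_year)
    s, sr = s_annual / scale, true_sr_annual / scale
    v = math.sqrt(1.0 + 0.5 * sr * sr)
    return 1.0 - normal_cdf((s - sr) * math.sqrt(n_obs) / v)

p_single = prob_sharpe_at_least(2.0)
p_any_of_100 = 1.0 - (1.0 - p_single) ** 100
print(f"P(Sharpe >= 2 | no skill, 252 bars) = {p_single:.4f}")
print(f"P(at least one of 100 reaches 2)   = {p_any_of_100:.2f}")
```

Under these assumptions a single skill-free series reaches an annualized Sharpe ratio of 2 over 252 bars about 2.3% of the time, so among 100 independent series the chance that at least one does is roughly 90%.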

I should mention Steve Paz here, who does a lot of work on SR. Check out his website http://www.sharperat.io.

Cameron

Thanks for the extensive references and comments, Cameron!

Ernie
