Thursday, December 24, 2009

Selecting tradeable pairs: which measure to use?

A guest blog by Paul Farrington

One of the most important factors in statistical arbitrage pairs trading is the selection of the paired instruments.  We can use basic heuristics to guide us, such as grouping stocks by industry in the anticipation that stocks with similar fundamental characteristics will share factor risk and tend to exhibit co-movement.  But this still leaves us with potentially thousands of combinations.  There are some statistical techniques we can use to quantify the tradeability of a pair: one approach is to calculate the correlation coefficient of each pair's return series. Another is to consider cointegration measures on the ratio of the prices, to see if it remains stationary over time.

In this article I briefly summarise the alternative approaches and apply them to a universe of stock pairs in the oil and gas industry.  To gauge how effective each measure is in real-world trading, I back-test the pairs using a simple mean-reversion system, then regress the generated win rates against the statistical results.  Some basic insights emerge as to the effectiveness of correlation and cointegration as tools for selecting candidate pairs.
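A toy version of such a back test might look like the sketch below (my own illustration, not the article's system): trade the price ratio back toward its rolling mean using z-score thresholds, and record the fraction of round trips that are profitable. The lookback and entry/exit thresholds are assumptions, and costs and slippage are ignored.

```python
import numpy as np

def win_rate_mean_reversion(ratio, entry_z=1.0, exit_z=0.0, lookback=60):
    """Fraction of profitable round trips from a simple z-score
    mean-reversion strategy on a price ratio (toy backtest)."""
    ratio = np.asarray(ratio, dtype=float)
    wins = trades = 0
    pos = 0          # +1 long the ratio, -1 short
    entry_px = 0.0
    for t in range(lookback, len(ratio)):
        window = ratio[t - lookback:t]
        mu, sd = window.mean(), window.std()
        if sd == 0:
            continue
        z = (ratio[t] - mu) / sd
        if pos == 0:
            if z > entry_z:          # ratio rich: short it
                pos, entry_px = -1, ratio[t]
            elif z < -entry_z:       # ratio cheap: buy it
                pos, entry_px = 1, ratio[t]
        elif (pos == 1 and z >= exit_z) or (pos == -1 and z <= exit_z):
            pnl = pos * (ratio[t] - entry_px)
            trades += 1
            wins += pnl > 0
            pos = 0
    return wins / trades if trades else float('nan')

# Demo on a synthetic mean-reverting ratio.
rng = np.random.default_rng(1)
spread = np.zeros(2000)
for t in range(1, 2000):
    spread[t] = 0.9 * spread[t - 1] + 0.05 * rng.standard_normal()
wr = win_rate_mean_reversion(1.0 + spread)

# The final step the article describes: regress the per-pair win rates
# against each pair's statistical measure (e.g. its ADF statistic),
# for instance with np.polyfit(adf_stats, win_rates, 1).
```

The slope and fit of that regression indicate how well the statistical measure predicts real trading performance.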

Please visit for details of my methodology and results.


sjev said...

I'm currently busy solving the same 'ranking' problem. Maybe Paul's idea of regressing the metrics could be taken a step further by training a neural network on a set of metric/Sharpe-ratio pairs produced by backtesting. A trained NN could then be used to predict the Sharpe ratio for a set of metrics, without backtesting.

Anonymous said...

Very interesting.
Is there any chance you can republish the ADF vs. Win Rate graph? It seems (on my PC at least) to show only the top third.

Bryan said...

Thanks, Ernie. I just started to read your book last night. This is my first visit. Keep it up.

Eric D said...

Good idea to regress some of the inputs against an output, but why win rate? There are other metrics (outputs) that would better describe a successful system, such as the Sharpe ratio or some other measure of risk-adjusted return.


Unknown said...

Sjev: good point about the Sharpe ratio; I like the idea of adjusting the pair return for volatility. I'm not sure neural nets would be suitable for this problem domain, though. I used least-squares regression because I was only modelling a single input and output variable. Also, my worry with neural nets generally is falling prey to the well-known problem of overfitting the data set.

Eric: The choice of win rate as a profitability metric was partly arbitrary, but also because using the compounded return could be distorted by volatile pair returns. For example, a stock pair could exhibit good mean reversion but lose all the profits on a single bad trade. My intention was to measure "profitable tendencies" such as win rate and relate these to statistical measurements. Actual system development would of course use risk-adjusted returns.

Paul F

Unknown said...

FYI: some folks on this thread have mentioned difficulty in seeing the graphics in the article. I've uploaded it in Word '97/2003 format here: pairs article.

quantivity said...

For those interested in pairs / statarb techniques, Avellaneda and Lee recently published similar results (spanning 1997 - 2007) in their article "Statistical Arbitrage in the U.S. Equities Market" (15 July 2009), covering both the standard statarb methodologies: sector-style ETF cointegration and PCA-style decomposition.

Anonymous said...

Hi Ernie,

One pitfall I find when backtesting hundreds of stocks, ETFs and other instruments is that even a boosted PC runs out of memory and thus crashes before completion. (8 cores and 16GB of memory).

May I ask what kind of computer you are using? Have you looked into NVIDIA's Tesla Personal Supercomputer or Cray's CX1? Furthermore, what are your thoughts on GPU vs. CPU, or are both needed to run hundreds of positions and a sizable portfolio?

Ernie Chan said...

Hi Anonymous,
I am not sure how complicated your trading algorithm is, and how many bars you are backtesting. I use Matlab for my backtest, and it never takes more than a few minutes for a portfolio of several thousand stocks on a very modest and old desktop computer.

Anonymous said...

Dear Ernie,

The optimal strategy calls for 30-min bars, and I use parfor loops in Matlab, as well as object-oriented programming. So that leaves me puzzled, as I have pretty much a top-of-the-line retail PC and it always runs out of memory...

Ernie Chan said...

I have not used Matlab's Parallel Computing Toolbox before, so I can't really comment on the efficacy of parfor. However, I do find that using OO tends to slow things down, and perhaps requires more memory.
I recommend just straightforward procedural programming when using Matlab.

AndyW said...

Hi Ernie - Andy here from Automated Trader magazine. Just a thought re Anon's PC prob. The Accelereyes Jacket program that enables MATLAB for GPU computing can make a huge difference. I'm in the process of scribbling a review of it for our next issue and the early signs are extremely promising. I'm using a machine with 3 Tesla C1060s in it - combining that with Jacket makes a big difference. Anon - apologies if you're already running this set up or using MATLAB's own beta GPU functionality - but just thought I'd mention it. Regds A

Ernie Chan said...

Hi Andy,
Thanks for mentioning Jacket. Yes, I think Matlab's Parallel Computing Toolbox also has GPU-computing capability. I wonder whether you have any comparisons of the two?