Quantitative Trading: Selecting tradeable pairs: which measure to use?

Thursday, December 24, 2009

A guest blog by Paul Farrington

One of the most important factors in statistical arbitrage pairs trading is the selection of the paired instruments. We can use basic heuristics to guide us, such as grouping stocks by industry on the expectation that stocks with similar fundamental characteristics will share factor risk and tend to exhibit co-movement. But this still leaves us with potentially thousands of combinations. There are statistical techniques we can use to quantify the tradeability of a pair. One approach is to calculate the correlation coefficient of each pair's return series. Another is to apply cointegration measures to the ratio of the prices, to see whether that ratio remains stationary over time.
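
As a rough sketch of the two measures described above, the following computes the correlation of two return series and a basic Dickey-Fuller t-statistic on the price ratio. The synthetic prices, the function names, and the choice of the plain (no-lag, constant-only) Dickey-Fuller regression are illustrative assumptions, not the methodology used in the linked article.

```python
import math
import random


def pct_returns(prices):
    """Simple period-over-period returns of a price series."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dickey_fuller_stat(series):
    """t-statistic of b in the regression dy[t] = a + b * y[t-1] + e[t].

    Basic Dickey-Fuller form with a constant and no lagged difference
    terms (a simplification of the augmented test).
    """
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series, series[1:])]
    n = len(diffs)
    mx, my = sum(lagged) / n, sum(diffs) / n
    sxx = sum((x - mx) ** 2 for x in lagged)
    sxy = sum((x - mx) * (d - my) for x, d in zip(lagged, diffs))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((d - a - b * x) ** 2 for x, d in zip(lagged, diffs))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return b / se_b


# Synthetic (hypothetical) pair: both stocks share a random-walk factor,
# and stock B also carries a mean-reverting AR(1) spread.
rng = random.Random(42)
log_factor, spread = 0.0, 0.0
prices_a, prices_b = [], []
for _ in range(500):
    log_factor += rng.gauss(0.0, 0.01)
    spread = 0.5 * spread + rng.gauss(0.0, 0.005)
    prices_a.append(50.0 * math.exp(log_factor))
    prices_b.append(30.0 * math.exp(log_factor + spread))

ratio = [a / b for a, b in zip(prices_a, prices_b)]
corr = pearson(pct_returns(prices_a), pct_returns(prices_b))
df_stat = dickey_fuller_stat(ratio)
print(f"return correlation: {corr:.3f}, Dickey-Fuller t-stat on ratio: {df_stat:.2f}")
```

A more negative statistic is stronger evidence that the ratio is stationary; for this constant-only form the asymptotic 5% critical value is roughly -2.86. A production test would normally add lagged difference terms, which this sketch omits.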

In this article I briefly summarise the alternative approaches and apply them to a universe of stock pairs in the oil and gas industry. To gauge how effective each measure is in real-world trading, I backtest the pairs using a simple mean reversion system, then regress the resulting win rate against the statistical results. Some basic insights emerge as to the effectiveness of correlation and cointegration as tools for selecting candidate pairs.
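
The regression step can be sketched as follows, assuming each pair has already been backtested and scored. The pair names, test statistics, and trade returns below are made-up placeholders, and `least_squares` and `win_rate` are hypothetical helper names; the actual data and results are in the linked article.

```python
def least_squares(x, y):
    """Fit y = intercept + slope * x by ordinary least squares; also return R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def win_rate(trade_returns):
    """Fraction of trades that closed with a positive return."""
    return sum(1 for r in trade_returns if r > 0) / len(trade_returns)


# Hypothetical per-pair results: (test statistic, per-trade backtest returns).
pairs = {
    "pair_1": (-4.1, [0.02, 0.01, -0.01, 0.03, 0.02]),
    "pair_2": (-3.0, [0.01, -0.02, 0.02, 0.01, -0.01]),
    "pair_3": (-1.8, [-0.01, 0.02, -0.03, 0.01, -0.02]),
    "pair_4": (-0.9, [-0.02, -0.01, 0.01, -0.03, -0.01]),
}
stats = [stat for stat, _ in pairs.values()]
rates = [win_rate(trades) for _, trades in pairs.values()]

slope, intercept, r_squared = least_squares(stats, rates)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

The sign of the slope and the size of R^2 give a first indication of whether the statistic carries any information about win rate. With a single input and a single output, a one-variable least squares fit like this is enough.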

Please visit http://www.paulfarrington.com/research/Selecting%20tradeable%20pairs.htm for details of my methodology and results.

13 comments:

sjev, Friday, December 25, 2009 at 9:08:00 AM EST
I'm currently busy solving the same 'ranking' problem. Maybe Paul's idea of regressing the metrics could be taken a step further by training a neural network on a set of metrics and the Sharpe ratios produced by backtesting. The trained network could then be used to predict the Sharpe ratio for a new set of metrics, without backtesting.

Anonymous, Friday, December 25, 2009 at 4:17:00 PM EST
Paul
Very interesting
Is there any chance you can republish the ADF vs. win rate graph? On my PC at least, it seems to show only the top third.
Matt

Bryan, Friday, December 25, 2009 at 8:29:00 PM EST
Thanks, Ernie. I just started to read your book last night. This is my first visit. Keep it up.

Eric D, Saturday, December 26, 2009 at 7:45:00 AM EST
Good idea to regress some of the inputs against an output, but why win rate? There seem to be many other metrics (outputs) that would better describe a successful system, such as the Sharpe ratio or some other measure of risk-adjusted return.

Regards,
Eric

Unknown, Saturday, December 26, 2009 at 11:29:00 AM EST
Sjev: good point about the Sharpe ratio. I like the idea of adjusting the pair return for volatility. I'm not sure whether neural nets would be suitable for this problem domain, though. I used least squares regression because I was only modelling a single input and a single output variable. Also, my worry with neural nets generally is falling prey to the well-known problem of overfitting the data set.

Eric: The choice of win rate as a profitability metric was partly arbitrary, but I also chose it because the compounded return can be distorted by volatile pair returns. For example, a stock pair could exhibit good mean reversion but lose all its profits on a single bad trade. My intention was to measure "profitable tendencies" such as win rate and relate these to the statistical measurements. Actual system development would of course use risk-adjusted returns.

Regards
Paul F

Unknown, Saturday, December 26, 2009 at 11:44:00 AM EST
FYI - some folks on this thread have mentioned difficulty in seeing the graphics in the article. I've uploaded it in Word '97/2003 format here.

Cheers

pairs article

quantivity, Monday, December 28, 2009 at 1:50:00 AM EST
For those interested in pairs / statarb techniques, Avellaneda and Lee recently published similar results (spanning 1997 - 2007) in their article "Statistical Arbitrage in the U.S. Equities Market" (15 July 2009), covering both the standard statarb methodologies: sector-style ETF cointegration and PCA-style decomposition.

Anonymous, Saturday, January 9, 2010 at 8:20:00 PM EST
Hi Ernie,

One pitfall I find when backtesting hundreds of stocks, ETFs and other instruments is that even a boosted PC (8 cores and 16GB of memory) runs out of memory and crashes before completion.

May I ask what kind of computer you are using? Have you looked into NVidia's Tesla Personal Supercomputer and Cray's CX1? Furthermore, what are your thoughts on GPU vs. CPU, or are both needed to run hundreds of positions and a sizable portfolio?

Ernie Chan, Saturday, January 9, 2010 at 8:36:00 PM EST
Hi Anonymous,
I am not sure how complicated your trading algorithm is, and how many bars you are backtesting. I use Matlab for my backtest, and it never takes more than a few minutes for a portfolio of several thousand stocks on a very modest and old desktop computer.
Ernie

Anonymous, Sunday, January 10, 2010 at 9:24:00 AM EST
Dear Ernie,

The optimal strategy calls for 30-min bars, and I also use parfor loops in Matlab, as well as object-oriented programming. So that leaves me puzzled, as I have pretty much the top-of-the-line retail PC and it always runs out of memory...

Ernie Chan, Sunday, January 10, 2010 at 9:47:00 AM EST
Anonymous,
I have not used Matlab's parallel computing toolbox before, so can't really comment on the efficacy of parfor. However, I do find that using OO tends to slow things down, and perhaps require more memory.
I recommend just straightforward procedural programming when using Matlab.
Ernie

AndyW, Saturday, January 23, 2010 at 2:19:00 AM EST
Hi Ernie - Andy here from Automated Trader magazine. Just a thought re Anon's PC prob. The Accelereyes Jacket program that enables MATLAB for GPU computing can make a huge difference. I'm in the process of scribbling a review of it for our next issue and the early signs are extremely promising. I'm using a machine with 3 Tesla C1060s in it - combining that with Jacket makes a big difference. Anon - apologies if you're already running this set up or using MATLAB's own beta GPU functionality - but just thought I'd mention it. Regds A

Ernie Chan, Friday, January 27, 2012 at 11:26:00 AM EST
Hi Andy,
Thanks for mentioning Jacket. Yes, I think Matlab's Parallel Computing Toolbox also has GPU-computing capability. I wonder whether you have any comparisons of the two?
Ernie