Thursday, October 24, 2013

How Useful is Order Flow and VPIN?

Can short-term price movements be predicted? (I am speaking of seconds or minutes here.) This question is relevant not only to high-frequency traders, but to every long-term investor as well. Even if one plans to buy and hold a stock for years, nobody likes to suffer a short-term negative P&L immediately after entering a position.

One short-term prediction method that has long found favor with academic researchers and traders alike is order flow. Order flow is just signed transaction volume: if a transaction of 100 shares is classified as a "buy", the order flow is +100; if it is classified as a "sell", the order flow is -100. This might strike some as rather strange: every transaction has a buyer and a seller, so what does it mean to call it a "buy" or a "sell"? Well, the "buyer" is defined as the "aggressor", i.e. the one who uses a market order to buy at the ask price. (And vice versa for the seller, whom I will henceforth omit from this discussion.) The intuitive reason why a series of large "buy" market orders is predictive of a short-term price increase is that if someone is so eager to go long, s/he is likely to know something about the market that others don't (due to either superior fundamental knowledge or a better technical model), so we had better join her/him! Such superior traders are often called "informed traders", and their order flow is often called "toxic flow". Toxic, that is, to the uninformed market maker.
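To make the definition concrete, here is a minimal sketch of order flow as signed volume. It assumes each trade already carries a label saying which side was the aggressor; the `Trade` class and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    size: int       # number of shares in the transaction
    aggressor: str  # "buy" or "sell" (hypothetical label for the side that used the market order)


def signed_order_flow(trades):
    """Sum signed volume: +size for each "buy", -size for each "sell"."""
    total = 0
    for trade in trades:
        if trade.aggressor == "buy":
            total += trade.size
        elif trade.aggressor == "sell":
            total -= trade.size
        else:
            raise ValueError(f"unknown aggressor label: {trade.aggressor!r}")
    return total


trades = [Trade(100, "buy"), Trade(100, "sell"), Trade(300, "buy")]
print(signed_order_flow(trades))  # 300
```

A persistently positive running total is what the text above would read as eager buying.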

In theory, if one has a tick data feed, one can tell whether an execution is a "buy" or a "sell" by comparing the trade price with the bid and ask prices: if the trade price is equal to the ask, it is a "buy". This is called the "Quote Rule". But in practice, there is a hitch. If the bid and ask prices change quickly, a buy market order may end up buying at the bid price if the market has fortuitously moved lower since the order was sent. Besides, perhaps 1/3 of trading in the US equities markets takes place in dark pools or via hidden orders, so the quotes are simply invisible and the order flow is not computable. So this classification scheme is not foolproof. Therefore, a number of researchers (see "Flow Toxicity and Volatility in a High Frequency World" by Easley et al.) proposed an alternative, "easier" method to compute order flow. Instead of checking the trade price of each tick, they just need the "open" and "close" trade prices of a bar, preferably a volume bar, and assign a fraction of the volume in that bar to "buy" or "sell" depending on whether the close price is higher or lower than the open price. (The assignment formula is based on the cumulative distribution function of a Gaussian distribution, which incidentally models price changes of volume bars, but not time bars, pretty well.) The absolute difference between buy and sell volume, expressed as a fraction of the total volume, is called "VPIN" by the authors, or Volume-Synchronized Probability of Informed Trading. The higher VPIN is, the more likely we will experience short-term momentum due to informed trading.
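The two classification schemes described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's reference implementation: the function names are mine, the Quote Rule is extended to treat trades above the ask (or below the bid) like trades at the ask (or bid), and the standard deviation of bar price changes is estimated naively from the same bars being scored.

```python
from statistics import NormalDist, pstdev


def quote_rule(trade_price, bid, ask):
    """Classify one trade by comparing its price with the prevailing quotes."""
    if trade_price >= ask:
        return "buy"
    if trade_price <= bid:
        return "sell"
    return "unknown"  # inside the spread: the rule as described cannot decide


def buy_fraction(open_price, close_price, sigma):
    """Fraction of a bar's volume assigned to buys: Gaussian CDF of the scaled price change."""
    if sigma == 0:
        return 0.5  # assumption: with no price variation, split the volume evenly
    return NormalDist().cdf((close_price - open_price) / sigma)


def vpin(bars):
    """bars: list of (open, close, volume) tuples, preferably volume bars.

    Returns the absolute buy/sell volume imbalance as a fraction of total volume.
    """
    if not bars:
        raise ValueError("need at least one bar")
    # Assumption: sigma is the in-sample standard deviation of open-to-close changes.
    sigma = pstdev([close - open_ for open_, close, _ in bars])
    imbalance = 0.0
    total = 0.0
    for open_, close, volume in bars:
        buy = volume * buy_fraction(open_, close, sigma)
        sell = volume - buy
        imbalance += abs(buy - sell)
        total += volume
    return imbalance / total


bars = [(100.0, 100.5, 1000), (100.5, 100.4, 1000), (100.4, 101.0, 1000)]
print(quote_rule(10.01, bid=10.00, ask=10.01))
print(round(vpin(bars), 3))
```

The result always lies between 0 (buy and sell volume balanced in every bar) and 1 (every bar entirely one-sided). In live use one would estimate `sigma` from past bars only, to avoid look-ahead.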

Theory and intuition aside, how well does order flow work in practice as a short-term predictor in various markets? And how predictive is VPIN compared to the old Quote Rule? In my experience, while this indicator is predictive of price change, the change is often too small to overcome transaction costs, including the bid-ask spread. And more disturbingly, in those markets where both the Quote Rule and VPIN should work (e.g. futures markets), VPIN has so far underperformed the Quote Rule, despite (?) it being patented and highly touted. I have informally polled other investment professionals on their experience, and the answers usually come back indifferent as well.

Do you have live experience with VPIN? Or more generally, do you find strategies built using volume bars superior to those using time bars? If so, please leave us your comments!



31 comments:

Anonymous said...

Check out the recent academic literature on this topic: the concept of VPIN has been debunked by well-known econometricians.

The whole scheme is a joke.

Ernie Chan said...

Anon,
Thanks for your input. Do you have a link to a relevant paper debunking it?
Ernie

Anonymous said...

Ernie,

I agree with you. I have read a lot on VPIN but have not found it to work well for trading. Signed order flow seems to work better. This is a bit disappointing since the VPIN literature is quite cool. There are plenty of VPIN papers on SSRN that support it and plenty that refute it.

GekkoQuant said...

Volume bars are an interesting concept; they contain much richer information than time bars alone. Volume bars naturally sample the price faster at important parts of the day.

One pitfall to look out for during backtesting is to check that when a new volume bar is formed (and some relevant entry/exit criterion is met), it happens during market hours.
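As a rough illustration of the volume bars discussed in this comment, the sketch below groups a stream of trades into bars that each hold the same number of shares. The function name, the `(price, size)` input format, and the choices to split a trade that straddles a bar boundary and to drop a trailing partial bar are all assumptions made for the example.

```python
def volume_bars(trades, bar_volume):
    """Group (price, size) trades into bars of exactly bar_volume shares.

    Returns a list of (open, close, volume) tuples. A trade that straddles a
    bar boundary is split across bars; an unfinished final bar is dropped.
    """
    if bar_volume <= 0:
        raise ValueError("bar_volume must be positive")
    bars = []
    open_price = None  # first trade price of the bar being built
    filled = 0         # shares accumulated in the current bar
    for price, size in trades:
        remaining = size
        while remaining > 0:
            if open_price is None:
                open_price = price
            take = min(remaining, bar_volume - filled)
            filled += take
            remaining -= take
            if filled == bar_volume:
                bars.append((open_price, price, bar_volume))
                open_price = None
                filled = 0
    return bars


ticks = [(10.0, 60), (10.1, 60), (10.2, 80)]
print(volume_bars(ticks, bar_volume=100))
```

Because a bar closes only when enough shares have traded, busy periods produce many bars per minute and quiet periods produce few. A backtest would also need each bar's timestamp to apply the market-hours check mentioned above.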

Gary said...

Marcos Lopez de Prado, one of the creators of VPIN, has many videos on YouTube which he claims are in real time, but I have not seen anyone replicate the same results yet.

experquisite said...

Re: Volume bars, I did a brief study of SPY trades ordered by tick count (not volume sum), for normality:

http://experquisite.tumblr.com/post/62621839837/spy-1000-tick-open-close-log-returns-over-the-last

The series definitely seem better behaved when organized in some fashion of volume-clock/tick-clock/event-time/etc., but I have yet to attempt to adapt a co-integrated pairs trading scheme to inhomogeneous timescales.

DR said...

The HFT shops that have good predictive models (i.e. are frequently trading in a directional way) are way more sophisticated than VPIN. They do trade on order flow (among other things), but in a deeply complex way derived from massive datasets, tons of computing power, and state-of-the-art machine learning techniques.

The hope of a simple indicator like VPIN beating this is pretty small. I'd expect you could find some evidence that it works well pre-2007, but I highly doubt there'd be any alpha left in any market of reasonable liquidity.

Ernie Chan said...

Hi all,
Hat tip to a reader Mark who has shared with us these 2 papers:

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1881731

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2062450

Ernie

Ernie Chan said...

experquisite:
Yes, I do agree that returns based on volume bars are more normally distributed than those based on time bars.
Ernie

Anonymous said...

Hi Ernie,

Thank you very much for sharing your comments on this interesting topic, on which other academics and I in Italy have been working since the mid-2000s (one example is the "quite old" working paper published in 2010
http://www.eea-esem.com/files/papers/eea-esem/2011/885/Mosconi_Carlini_Manzoni_Oslo.pdf).

I really appreciate the fact that you point out the impossibility of generating profits net of transaction costs, considering that if you are able to earn, from a complete trading cycle, a gross profit equal to the bid-ask spread plus a one-tick price movement, you are really skilled.

But I'm still doubtful about one point: if you say that a trading rule based on order imbalance cannot generate profits because gross profits are lower than transaction costs, how do high frequency traders profit from arbitraging the same financial products traded on different venues? I don't think that high frequency traders making profits from this type of arbitrage are able to generate gross profits higher than the bid-ask spread plus one tick of price movement. Moreover, I'm not really sure that the estimate of transaction costs can be easily generalized, because it depends on the characteristics of each market operator.

I would also like to share another point: actually, no one in the academic literature has shown that the main intuitions of Easley, De Prado and O'Hara are based not on the VPIN indicator (which is quite similar to simple indicators based on order imbalance used by other authors in the recent past), but on the innovative idea of discretizing high frequency data using volume buckets rather than clock time: the choice of bucketing algorithm is the main explanation of why order imbalance hardly improves its predictive capabilities on financial returns.

Ernie Chan said...

Hi Anon,
Thanks for your input!

HFT is able to take advantage of order flow plus "exploratory trading" to exploit short term returns. Please google: Clark-Joseph, 2013 “Exploratory Trading”.

Actually, many traders knew about using volume bars instead of time bars prior to this paper, but I agree this may be a better way to backtest many strategies.

Ernie

Anonymous said...

Ernie, thanks for your reply.

I would like to ask you one more thing: how large a gross profit (in basis points) before transaction costs per trading cycle (that is to say, open buy/sell and then close sell/buy) do you estimate is sufficient to make a trading rule based on order flow profitable?

Ernie Chan said...

Anon,
The main transaction cost is the bid-ask spread, which for ES is about 2 bps for a round trip. Commissions, exchange fees, and regulatory fees total about 0.3 bps one way for a retail account. So you need a profit of about 1.3 bps per trade (half the round-trip spread plus the one-way fees) just to break even.

Ernie
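The break-even arithmetic in the reply above can be sketched as follows. The function name is hypothetical, and reading the 1.3 bps figure as a one-way cost (half the round-trip spread plus one-way fees) is my interpretation of the numbers given.

```python
def break_even_bps(spread_round_trip_bps, fees_one_way_bps):
    """Return (one-way, round-trip) break-even cost in basis points."""
    # Crossing the spread costs half of the round-trip spread on each side.
    one_way = spread_round_trip_bps / 2 + fees_one_way_bps
    return one_way, 2 * one_way


# Figures quoted in the reply: ES spread about 2 bps round trip, fees about 0.3 bps one way.
one_way, round_trip = break_even_bps(2.0, 0.3)
print(f"one-way: {one_way:.1f} bps, round trip: {round_trip:.1f} bps")
```

Under this reading, a complete open-and-close cycle needs a gross profit of about 2.6 bps before it makes money.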

Anonymous said...

Thank you again Ernie!

I think that if I want to profit from order flow, I must turn the bid-ask spread from a cost center into a profit center. Or, maybe, try to be neutral to the bid-ask spread (that is to say, when I open the position I can use market orders, directly hitting the bid or ask on the limit order book, but when I close the position I must use limit orders; in this way I'll be neutral to the bid-ask spread, because first time). In this way, if I'm able to statistically forecast the short-term direction of price movement, gaining profit from only a 1-tick price movement, and this 1-tick price movement is larger than commissions + exchange fee + regulatory fee, I can succeed with this kind of trading strategy.

Anonymous said...

hi Ernie,

If we get portfolio margin in IB, and trade long/short ETFs, could we get about 6 times leverage for overnight positions?

Or if we trade intraday long/short ETFs, could we get even higher leverage?

Ernie Chan said...

Hi Anon,
IB's portfolio margin is based on the exact composition of your portfolio. So yes, it is theoretically possible to get 6x intraday and/or overnight leverage with very safe constituents (e.g. long-short large-cap ETFs). But it all depends on running their risk model on your specific portfolio.

Ernie

Anonymous said...

hi Ernie,

In your new book, you mention the "Danger of Data errors."

You said that your broker's data feed caused some losing trades, so you switched to a third-party data provider.
May I ask which is this 3rd party provider? Which real-time data feed is stable and cheap?
I think I cannot afford Bloomberg now.

I guess the broker you mentioned here is IB.

Ernie Chan said...

Hi Anon,
I have been told that IQFeed is a good one.
Ernie

Anonymous said...

hi Ernie,

Do you do cross validation?

For Kalman filter, as you mentioned before, there is no need to separate the data into "training" and "test" sets. Therefore, how do we do cross validation?

Ernie Chan said...

Hi Anon,
Even in Kalman Filter, you still need a separate training set for parameters that define the initial distributions for the state and observed variables.

Cross-validation is a different way to separate training and test data, since it involves dividing the data into many subsets and picking each subset as the test data in turn. This method is not particularly suitable for Kalman Filter because the time series won't be the same if you introduce gaps into the sequence of prices.

Ernie

Anonymous said...

Hi Ernie,

In your book, you mentioned "Primary vs. Consolidated stock prices."

Could we get historical prices from the primary exchanges from IQfeed?

I guess it should be OK for tick data because they usually have an "Exchange" column.

I wonder if it is ok for 1 minute bars or end-of-day.

Ernie Chan said...

Hi Anon,
I personally have not used IQFeed, so I don't know if they provide primary exchange data. If they do, I am sure that it should be available both for 1-min bars and EOD prices.
Ernie

Anonymous said...

hi Ernie,

In IB, when we download historical stock data, we can choose "Primary Exchange" instead of "SMART" in contract.
Does that mean we can get historical 1 minute bars directly from "NYSE" or "NASDAQ"?

Ernie Chan said...

Hi Anon,
Actually, I don't believe IB will let you download primary exchange data. I am not sure setting exchange=NYSE will work for historical data.
Ernie

Anonymous said...

Hi Ernie,
I have a trading system based on some technical indicators and I would now like to set up a quantitative trading algorithm for it so it can trade faster. Could you suggest someone I can work with?

J

Ernie Chan said...

Hi J,
My associate does work on such projects. Please email me so I can connect you both.
Ernie

Anonymous said...

I would take the other side to all of you fools. You're always looking at the wrong thing in the wrong places. VPIN, computer models. Who programs the models, you muppets? Trading. Proper trading is done by people. There is your biggest and only clue...

sunnycalif said...

Hi Ernie, I think volume is the heartbeat of the market. I am a retail trader and do not have the sophisticated platforms available to quants. However, I have been studying speed in the markets, and it is volume and speed that turn the markets. I have been building indicators with sub-second precision on retail platforms such as TradeStation for a number of years.
Recently I came across the concept of VPIN and decided to look into your thoughts as well.
The topic of VPIN interests me quite a bit, as it involves balance/imbalance, speed, and the study of volume. I would like to know if you have any recommendations for developing VPIN on retail platforms such as NinjaTrader, which is coded in C# .NET, and how I can try to pursue this. Any insights will be appreciated. I did see you mention an associate in a previous post and hence thought of checking.

Ernie Chan said...

Hi Sunnycalif,

The best way to compute order flow accurately is to use a data feed that has an aggressor flag to determine whether a trade is buy-side or sell-side initiated. But such a data feed is very expensive. So for a less accurate estimate, we can use the VPIN method, which only requires the volume and the last trade prices of bars. Any broker's data feed would provide that. The method to compute order flow using just bar data is given in the papers I cited in the article above.
Ernie

Francois Laurent said...

This boils down to adverse selection and the probability of informed trading (PIN), and subsequently how to skew prices and quantities. Are there any good documents/papers on this topic that have had some practical use? Especially on the last bit, which is about integrating the PIN into a strategy (I already have a good estimate of my PIN).

Ernie Chan said...

Hi Francois,
Actually the Intraday Trading chapter of my new book Machine Trading has a complete implementation of a trading strategy using VPIN. Have you taken a look?
Ernie