First, some facts about COT:

1) The CFTC collects reports of the number of long and short futures and options contracts ("open interest") held by different types of firms as of each Tuesday, and publishes them every Friday by 4:30 CT.

2) Options positions are added to COT as if they were futures but adjusted by their deltas.

3) COT are then broken down into contracts held by different types of firms. The most familiar types are "Commercial" (e.g. an ethanol plant) and "Non-Commercial" (i.e. speculators).

4) Other types are "Spreaders" who hold calendar spreads, "Index traders", "Money Managers", etc. There are 9 mutually exclusive types in total.

Since we only have historical COT data from csidata.com, and they do not collect data on all these types, we have to restrict our present analysis to Commercial and Non-Commercial only. Also, beware that csidata tags a COT report by its Tuesday data collection date. Since the report is only released on Friday afternoon, that information is not actionable until the following Sunday evening when the market re-opens.

A simple strategy would be to compute the ratio of long vs short COT for Non-Commercial traders. We buy the front contract when this ratio is equal to or greater than 3, exiting when the ratio drops to or below 1. We short the front contract when this ratio is equal to or less than 1/3, exiting when the ratio rises to or above 1. Hence this is a momentum strategy: we trade in the same direction as the speculators did. As most profitable futures traders are momentum traders, it would not be surprising this strategy could be profitable.
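The entry and exit rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `cot_positions` and `actionable_date` and the sample open interest numbers are hypothetical, and the 5-day shift simply maps a Tuesday report date to the following Sunday, when the information first becomes tradable.

```python
from datetime import date, timedelta
from typing import List


def actionable_date(report_tuesday: date) -> date:
    """Map a Tuesday-tagged COT report to the following Sunday (market re-open)."""
    return report_tuesday + timedelta(days=5)


def cot_positions(long_oi: List[float], short_oi: List[float],
                  enter: float = 3.0, exit_: float = 1.0) -> List[int]:
    """Position implied by each weekly report: +1 long, -1 short, 0 flat."""
    positions = []
    pos = 0
    for lng, sht in zip(long_oi, short_oi):
        ratio = lng / sht if sht > 0 else float("inf")
        # Exit rules: leave a long when the ratio drops to 1 or below,
        # leave a short when it rises to 1 or above.
        if pos == 1 and ratio <= exit_:
            pos = 0
        elif pos == -1 and ratio >= exit_:
            pos = 0
        # Entry rules: follow the speculators when they are heavily one-sided.
        if pos == 0:
            if ratio >= enter:
                pos = 1
            elif ratio <= 1.0 / enter:
                pos = -1
        positions.append(pos)
    return positions


# Hypothetical Non-Commercial long and short open interest, one entry per week.
longs = [300, 250, 150, 90, 30, 60, 120]
shorts = [100, 100, 100, 100, 100, 100, 100]
positions = cot_positions(longs, shorts)
```

In a backtest, each position should be applied only from `actionable_date` of its report onward; otherwise the Tuesday tagging introduces look-ahead bias.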

Over the period from 1999 to 2014, applying this strategy on CME soybean futures returns about 9% per annum, though its best period seems to be behind us already. I have plotted the cumulative returns below.

I have applied this strategy to a few other agricultural commodities, but it doesn't seem to work on them. It is therefore quite possible that the positive result on soybeans is a fluke. Also, it is very unsatisfactory that we do not have data on the Money Managers (which include the all-important CPOs and CTAs), since they would likely be an important source of alpha. Of course, we can go directly to cftc.gov, download all the historical reports in .xls format, and compile the data ourselves. But that is a project for another day.

===

**My Upcoming Talks and Workshops**

3/22-∞: "Algorithmic Trading of Bitcoins" pre-recorded online workshop.

**3/24-25: "Millisecond Frequency Trading" live online workshop.**

5/13-14: "Mean Reversion Strategies", "AI techniques in Trading" and "Portfolio Optimization" at Q-Trade Bootcamp 2015, Milan, Italy.

5/18-22: "Statistical Arbitrage", "Quantitative Momentum", and "Millisecond Frequency Trading", London, UK.

===

**Managed Account Program Update**

Our FX Managed Account program has a net return of +7.68% in February (YTD: +8.06%).

===

Follow me on Twitter: @chanep

## 99 comments:

this may be helpful

https://www.quandl.com/data/CFTC/S_FO_ALL-Commitment-of-Traders-Soybeans-Futures-and-Options

with an API to MATLAB

Thanks, Bill - very useful!

Ernie

and the roll your own version here

http://www.cftc.gov/MARKETREPORTS/COMMITMENTSOFTRADERS/INDEX.HTM

Data on Money Managers can be found on the same COT but listed under Traders in Financial Futures, no?

'Money managers' are a segment of the market participants in any futures market (i.e. soybeans, S&P 500, 10-year Treasuries, etc.). In soybeans there are commercials (farmers and ethanol plants), money managers (CTAs), and non-reportables (the little guy). So 'money managers' are a segment of the market participants, not a market. Look at the link in the first reply for an example of the different participants in the soybean market.

Dear Ernie

I am not able to make the Q-Trade Bootcamp but I would love to see your presentation on Forecasting and Artificial Intelligence Based Strategies.

Is there a chance you could make this available to see please?

Will you be running a course on AI Based Strategies in the future?

Best Regards

Nick Kirk

Hi Nick,

Thank you for your interest.

I may run that as an online workshop in the future.

Please watch this space and email me in June for that.

Ernie

I will do. Many thanks.

I used these data but I have never found anything really powerful.

One of the problems is that the data are delayed, and that some trades end up in the non-reportable category.

I agree. maybe more of a market regime type indicator. In grains there are a few major market tops when 90% of the money managers are long but that only happens a few times every 5 years.

Hi Ernie,

Have you tried Markov regime switching regression for trading?

What is your comment?

I have tried regime switching models, and they didn't work well for me, mainly because the number of switches was small, which makes the results highly vulnerable to data snooping bias.

However, check out http://jonathankinlay.com/index.php/2011/04/a-practical-application-of-regime-switching-models/

Ernie

Hi Ernie,

When do we decide to abandon a momentum strategy?

For any strategy, I would start cutting capital rapidly whenever its drawdown duration approaches its historical maximum.

Ernie

Hi Ernie,

I have five different strategies which I am combining into one strategy by weighting each strategy with the inverse of its volatility.

The five individual strategies each have skewness around zero but the resulting aggregate strategy turns out to have a skewness of -0.40.

Do you know how this is possible? I.e. do you know how skewness aggregates across strategies?

Thanks.

Indeed, I don't see how the sum of symmetric distributions can be asymmetric!

Maybe the net skewness is still zero within statistical errors?

Ernie

I found the answer to the skewness issue. The skewness of a portfolio is the average of the skewness of the individual assets plus the co-skewness of the assets. If the latter terms are negative enough, overall skewness can be negative. For example, individual stocks tend to have positive skewness, but a portfolio of them (e.g. a market index) often has negative skewness.

See for example:

http://smgapps.bu.edu/smgnet/Personal/Faculty/Publication/pubUploads/skewness_v9.pdf?did=861
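A toy illustration of this point, using made-up numbers rather than anything from the paper: two return series that are each perfectly symmetric, but whose large losses coincide while their large gains do not. The helper name `skewness` and the sample data are assumptions for the sketch.

```python
import numpy as np


def skewness(x):
    """Population skewness: third central moment over variance to the power 1.5."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


# Each series has one -2 and one +2, so each is symmetric with zero skewness.
# The losses occur on the same day; the gains occur on different days.
x = np.array([-2, 2, 0, 0, 0, 0, 0, 0, 0, 0], dtype=float)
y = np.array([-2, 0, 2, 0, 0, 0, 0, 0, 0, 0], dtype=float)
portfolio = x + y  # equal weights; rescaling does not change skewness

# Third moment of the sum = own third moments + 3 * co-skewness terms.
own_terms = (x ** 3).mean() + (y ** 3).mean()                # 0.0
co_terms = 3 * (x ** 2 * y).mean() + 3 * (x * y ** 2).mean()  # -4.8

skew_x = skewness(x)
skew_y = skewness(y)
skew_portfolio = skewness(portfolio)
```

Here both individual skewnesses are zero, yet the portfolio skewness is about -1.29, and the whole of its negative third moment comes from the co-skewness terms.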

Interesting!

That essentially means that we cannot model this portfolio of stocks with a multidimensional distribution that has symmetric correlation matrix. Otherwise the extreme co-occurred positive events will cancel out the extreme co-occurred negative events. The resulting negative skewness means that negative events co-occur more often than positive events, even if individually they occur equally often. This of course makes sense. I wonder what kind of multi-dimensional distribution the paper used to model this asymmetric tail dependence?

Ernie

Hi Ernie,

In Markov switching model,

what is the difference between filtered

probabilities and smoothed probabilities?

Btw, do you know any code which can do Markov ARMA?

Marcelo's Matlab code can only do AR.

In a HMM, filtered probability refers to the probability of being in a certain hidden state at time t given the observables from time=0 to t, and the smoothed probability refers to the probability of the hidden state, but at an earlier time k given those same observables from time=0 to t. (See http://en.wikipedia.org/wiki/Hidden_Markov_model#Filtering)

I am not sure what you meant exactly by Markov ARMA. All ARMA time series models are automatically Markov, since the value at t can depend only on the past max(p, q) periods. ARMA modelling can be found in Matlab's Econometrics toolbox, or in R's arima and Arima functions.

Ernie
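The difference between filtered and smoothed probabilities described above can be seen in a small forward-backward sketch for a discrete HMM. The two-state model, its parameters and the function name `forward_backward` are all assumptions chosen for illustration; this is not Marcelo's package or any particular library's API.

```python
import numpy as np


def forward_backward(pi, A, B, obs):
    """Return (filtered, smoothed) state probabilities for a discrete HMM.

    pi  : initial state probabilities, shape (K,)
    A   : transition matrix, A[i, j] = P(state j at t+1 | state i at t)
    B   : emission matrix, B[i, m] = P(symbol m | state i)
    obs : sequence of observed symbol indices
    """
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))  # filtered: P(state_t | obs_0..t)
    scale = np.zeros(T)
    a = pi * B[:, obs[0]]
    scale[0] = a.sum()
    alpha[0] = a / scale[0]
    for t in range(1, T):
        a = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = a.sum()
        alpha[t] = a / scale[t]
    beta = np.ones((T, K))    # scaled backward messages
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    gamma = alpha * beta      # smoothed: P(state_t | obs_0..T-1)
    return alpha, gamma


pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.8, 0.2], [0.2, 0.8]])
obs = [0, 0, 1, 0, 0]
filtered, smoothed = forward_backward(pi, A, B, obs)
```

At the last time step the two estimates coincide, since there are no later observations to use. At t=2 the single odd observation pulls the filtered probability of state 0 down, while the smoothed probability recovers because the subsequent observations are known.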

Hi Ernie,

Quote from Marcelo's document below:

"The package also comes with a simple wrapper function for estimating a general autoregressive markov switching model. In the current version of the package, moving average terms and error correction models are not supported."

Therefore, we cannot do general ARMA.

I have not used Marcelo's codes before, so I suggest Marcelo is the best person to ask regarding Markov ARMA.

Ernie

Hi Ernie,

streaming ticks data is very fast.

Before we can finish computation, new data arrive again.

What is the better way to store it?

How do we handle it properly?

There is a database vendor kx.com that specializes in storing streaming tick data.

Ernie

Hi Ernie,

Thanks for quick response.

But how do we handle it in real-time API?

For example, once we receive tick data, we may want to accumulate it into 10-second bars, but I find the ticks arrive too fast to do that.

I think it is because the data feed keeps pushing data to my API, so the variable keeps changing while I am using it. That then triggers errors.

Do you know a better way to handle it?

If your program is not fast enough to handle the tick feed, you need to consider a multi-threaded program.

Ernie
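A minimal sketch of the multi-threaded approach suggested above, written in Python for brevity even though the question concerns the Java API. The callback `on_tick`, the worker `bar_builder` and the sample ticks are hypothetical; the point is only that the feed callback does nothing except enqueue, while a separate thread builds the 10-second bars from a thread-safe queue.

```python
import queue
import threading

SENTINEL = None  # pushed onto the queue to tell the worker to stop


def on_tick(q, ts, price, size):
    """Stand-in for the feed callback: no computation here, just enqueue."""
    q.put((ts, price, size))


def bar_builder(q, bars, bar_seconds=10):
    """Worker thread: drain the queue and aggregate ticks into time bars."""
    current = None
    while True:
        tick = q.get()
        if tick is SENTINEL:
            break
        ts, price, size = tick
        start = int(ts // bar_seconds) * bar_seconds
        if current is None or start != current["start"]:
            if current is not None:
                bars.append(current)  # previous bar is complete
            current = {"start": start, "open": price, "high": price,
                       "low": price, "close": price, "volume": 0}
        current["high"] = max(current["high"], price)
        current["low"] = min(current["low"], price)
        current["close"] = price
        current["volume"] += size
    if current is not None:
        bars.append(current)  # flush the last, possibly partial, bar


def run_demo():
    q = queue.Queue()
    bars = []
    worker = threading.Thread(target=bar_builder, args=(q, bars))
    worker.start()
    # Hypothetical ticks: (seconds since start, last price, last size).
    for ts, price, size in [(0.5, 205.27, 1), (3.0, 205.30, 2),
                            (9.9, 205.26, 5), (10.1, 205.28, 3),
                            (19.0, 205.31, 4)]:
        on_tick(q, ts, price, size)
    q.put(SENTINEL)
    worker.join()
    return bars
```

Because only the worker thread touches the bar being built, there is no shared mutable state between the callback and the computation, which avoids the kind of errors described in the question.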

Hi Ernie,

Thank you for response.

Yes, I use Java API with IB TWS.

So it is multi-threaded.

I find that I can store streaming tick data in a dynamic array, and compute them when they are ready.

Hi Ernie,

Have you tried snapshot data in IB?

I use Java API to record both SPY and ES myself today.

I find there are many outliers for SPY (suddenly jump from 206 to 208), but ES looks quite clean.

Do you know why?

What exactly do you mean by "outliers" and how do you know that they are not in the exchange feed as well?

Ernie

Hi Ernie,

Yes, outliers could be real deals traded in exchange too.

But I wonder if people really trade these deals.

For example, last_price and last_size below.

205.27, 208.37, 205.26

1, 5011, 5

Suddenly, it goes up to 208 with large size. I check IB chart and Yahoo Finance. Today's range is

204.93 - 206.81. The price never goes up to 208.

There are more than 20 points like this. They destroy the pattern.

Once I clean the data, they look as good as ES

Are they special (lump-sum) deals traded on the exchange but not released officially? It looks very weird to me.

If the high of the day was 206.81 and this tick is 208, then this was an erroneous tick. It might be in the exchange feed too, but it was later cancelled/corrected by the exchange.

Cleaning it in your backtest data may or may not be a good idea, since I don't know how you could tell in live trading that this is an outlier!

Ernie

If you want to do ARMA with Markov switching, look at this:

http://www.timeseriesmodelling.com/

It uses OX, but OX is fast and can be integrated with C++/C#/Java

Hi,

That sounds interesting.

It is said that Ox is an object-oriented matrix programming language with a comprehensive mathematical and statistical function library.

But how can it be integrated with Java?

Do you mean we can call Ox function/class from Java main program?

Do you know where their document is?

Thanks.

Great post for everyone!

Hi Ernie,

Have you tried to build constant volume bars?

Do you think they have look ahead bias?

It seems people usually pass extra volume to the next bar.

Yes, we discussed VPIN here: http://epchan.blogspot.com/2013/10/how-useful-is-order-flow-and-vpin.html

using volume bars.

I don't think splitting volume into 2 bars induces look-ahead bias. Most often volume bars are created "on-the-fly", using only historical data to generate signals.

But check out my Twitter feed for a recent article throwing doubts on VPIN.

Ernie

Hi Ernie,

Thank you for response.

Recently, I try to build VPIN.

We can use either 1 min bars or constant-volume bars to build bulk-volume VPIN.

If we want to trade at higher frequency (intraday), we may choose a smaller volume size to build VPIN (average_trade_volume/200 or average_trade_volume/400).

However, I find that we do have some large size ticks, which are at least 2 times more than VPIN volume size.

I am a little bit confused here.

If we pass extra volume to the next VPIN, we get immediately 2,3,or 4 identical VPINs in a row.

Of course, if just for risk management, we can choose large size for VPIN(average_trade_volume/50), then this is not a serious problem.

Thanks.

Hi Ernie,

Moreover, when we use time bars, we can have 1s, 5s, 1 min bars (390 minutes a day).

However, when we build constant-volume bars and choose a smaller size, such as average_daily_volume/400 (comparable to 1 min bars) or average_daily_volume/1200, we may have problems.

I find that some tick sizes are at least 2 times larger than the volume bars size. It could be 4 times, 5 times or even more.

If we pass extra volume to the next constant-volume bar, we get immediately 2,3,or 4 identical volume bars in a row.

It is the same problem for building VPIN when we choose a smaller bulk size.

Thanks.
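The behaviour described above can be reproduced with a short sketch of a constant-volume bar builder that passes excess volume on to the next bar. The function name `volume_bars` and the sample ticks are hypothetical, and sizes are assumed to be positive integers.

```python
def volume_bars(ticks, bar_volume):
    """Build constant-volume OHLC bars from (price, size) ticks.

    Volume that overflows a bar is carried into the next one, so a single
    large tick can complete several bars in a row.
    """
    bars = []
    filled = 0
    o = h = l = None
    for price, size in ticks:
        remaining = size
        while remaining > 0:
            if filled == 0:
                o = h = l = price  # this tick opens a new bar
            take = min(remaining, bar_volume - filled)
            h = max(h, price)
            l = min(l, price)
            filled += take
            remaining -= take
            if filled == bar_volume:
                bars.append({"open": o, "high": h, "low": l,
                             "close": price, "volume": bar_volume})
                filled = 0
    return bars  # any partially filled bar at the end is not emitted


# Hypothetical ticks: the second one is 3.5 times the bar size.
ticks = [(100.0, 30), (100.5, 350), (100.2, 20)]
bars = volume_bars(ticks, bar_volume=100)
```

With these inputs the large tick completes the first bar and then fills the second and third bars entirely, so those two bars have identical open, high, low and close. The same bars would be produced tick by tick in live trading, since each bar uses only ticks that have already arrived.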

Why is it a problem to generate several consecutive volume bars due to a large tick size?

Ernie

Hi Ernie,

I believe that it creates look ahead bias when we do some statistical analysis, such as ARMA.

The smaller the volume size we choose, the more consecutive identical VPIN bars we have. They all have the same open and close prices.

Also, even though there is only one VPIN volume bar created, the extra volume we pass to the next bar may be large (i.e. more than 1/2 size of volume).

How can there be look-ahead bias when you are creating and using volume bars in exactly this same way in live trading? Even though you are "predicting" that the prices of the next few bars are the same, it doesn't mean your algo will make any money on this "prediction", and thus no look-ahead bias is present.

Ernie

Hi Ernie,

Have you dealt with tick data (snapshot) in IB?

For Java API, IB use tickString() to return RTVolume data.

RTVolume is the API equivalent to opening the Time and Sales Window in Trader Workstation and viewing the updates in real time.

It updates fast ( 250 milliseconds).

I believe it is about multi-threading sharing the same array data and Synchronization.

I use array to collect data in tickString(), but when I want to compute it in main(), it creates errors.

I can print it, but I cannot do more complicated operations. I think it is because complicated operations take more time.

May I have your advice?

When you said "snapshot", I hope you haven't set the snapshot parameter in reqMktData() to true, otherwise subscription will be cancelled automatically after 1 tick is reported.

When a new tick arrives and calls the tickSize or tickPrice method, you need to do the analysis and take any trading action within 250ms. Otherwise, you need another thread to handle the analysis while the first thread just stores the data.

Ernie

Hi Ernie

Can adding a strategy with a negative Sharpe ratio to a portfolio with positive Sharpe ratio increase overall Sharpe ratio?

Yes, if it is negatively correlated with other strategies.

Ernie
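A worked example of this effect, with made-up numbers: strategy A has a Sharpe ratio of 1.0, strategy B has a Sharpe ratio of -0.1, and their returns have a correlation of -0.9. The function name `portfolio_sharpe` is hypothetical, and returns are assumed to be already in excess of the risk-free rate.

```python
import math


def portfolio_sharpe(mu_a, vol_a, mu_b, vol_b, corr, w_a=0.5):
    """Sharpe ratio of a two-strategy portfolio with weights w_a and 1 - w_a."""
    w_b = 1.0 - w_a
    mean = w_a * mu_a + w_b * mu_b
    var = ((w_a * vol_a) ** 2 + (w_b * vol_b) ** 2
           + 2 * w_a * w_b * corr * vol_a * vol_b)
    return mean / math.sqrt(var)


sharpe_a = 0.10 / 0.10    # 10% mean return, 10% volatility
sharpe_b = -0.01 / 0.10   # -1% mean return, 10% volatility
combined = portfolio_sharpe(0.10, 0.10, -0.01, 0.10, corr=-0.9)
```

The 50/50 portfolio earns only 4.5%, but the negative correlation cuts its volatility to about 2.2%, so the combined Sharpe ratio is about 2.0, double that of strategy A alone.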

Hi Ernie,

Have you tried Scalping strategies before?

What is your comment?

Scalping is essentially a market-making, mean-reverting strategy. It is very lucrative if done well and if you can avoid big losses during market crashes and risk-off periods.

Ernie

Hi Ernie,

Is IB fast enough to do 1 tick futures scalping?

No, you cannot trade on IB if your strategy is based on ticks. You can't get lower than 1 or 2 ms latency on IB and you also don't have direct access to the futures exchanges. Try Mirus futures, Lightspeed, or Trading Technologies.

Ernie

Hi Ernie,

How about 2 or 3 ticks with US stocks scalping on IB?

Is it the same situation? Does IB have direct access to US stocks market?

Thanks.

Hi Ernie,

I just find this news

"NinjaTrader Acquires Mirus Futures, Launches NinjaTrader Brokerage."

I have little experience with NinjaTrader. Do you have any comment?

No, IB's latency is too high for any tick scalping in stocks. Lime Brokerage is more suited for that.

I heard Ninjatrader is a good platform for automated trading.

Ernie

Hi Ernie,

I just surf Trading Technologies website.

Their flagship software X_TRADER costs $500 per month, which can connect to futures exchanges all over the world.

However, I do not find information about margin account requirement(minimum $), and commission fees structure.

They said they are broker-neutral, but how does it work? I am confused.

Is their business model like NinjaTrader's (both broker-neutral and data-vendor-neutral)? I cannot find any information on which brokers TT works with.

Thanks.

I myself have not used TT before, so I cannot advise you on their fees or configuration. It is best to contact their sales team directly for such detailed information.

Ernie

Hi Ernie,

Have you tried Lightspeed Gateway Co-location service? What is your comment?

What is Gateway Messaging Protocol?

I also have not traded on Lightspeed, so these questions are best directed at their sales team as well.

Ernie

Hi Ernie,

What checks, in your opinion, can be used to rank one mean-reverting portfolio as being more likely to produce higher returns than another?

Thanks

The half-life of the mean reversion can be used for such ranking. Usually the shorter the half-life, the more profitable it is.

Ernie
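One common way to estimate the half-life is to regress the change in the portfolio value on its lagged level and convert the slope using the Ornstein-Uhlenbeck relation, half-life = -ln(2)/lambda. The sketch below assumes that approach; the function name `half_life` and the simulated series are illustrative only.

```python
import numpy as np


def half_life(y):
    """Estimate the mean-reversion half-life (in bars) of a series.

    Fits dy(t) = lambda * y(t-1) + c by least squares and returns
    -ln(2) / lambda. A non-negative lambda means the series is not
    mean reverting, and the result is then meaningless.
    """
    y = np.asarray(y, dtype=float)
    y_lag = y[:-1]
    dy = np.diff(y)
    X = np.column_stack([y_lag, np.ones_like(y_lag)])
    lam, _intercept = np.linalg.lstsq(X, dy, rcond=None)[0]
    return -np.log(2) / lam


# Simulated AR(1) series with coefficient 0.9, i.e. lambda of about -0.1.
rng = np.random.default_rng(0)
phi = 0.9
y = np.zeros(5000)
for t in range(1, len(y)):
    y[t] = phi * y[t - 1] + rng.standard_normal()

hl = half_life(y)
```

For this simulated series the estimate should come out near 7 bars. To rank candidate portfolios, the same function would be applied to each portfolio's market value series.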

Thank you. How about persistence or stability of the relationship. Do you look at the stability of correlations amongst the instruments within a portfolio?

The stability of the relationship can be judged from the maximum drawdown and maximum drawdown duration of your backtest.

Ernie

Hi Ernie,

In your book you describe the Kalman filter. If I wanted to add a third parameter so for example the beta variable becomes a 3x1 matrix for each iteration is there any issue in doing so?

Thinking in terms of Multivariable regression here.

Sure, Kalman filter is applicable to multi-dimensional state vectors.

Ernie
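A sketch of how this might look with a 3x1 state vector, i.e. a regression with two explanatory variables and an intercept whose coefficients follow a random walk. The parameter values `delta` and `Ve`, the identity prior covariance and the function name `kalman_regression` are assumptions for illustration and are not taken from the book.

```python
import numpy as np


def kalman_regression(X, y, delta=1e-4, Ve=1e-3):
    """Kalman filter for y(t) = x(t) . beta(t) + noise, beta a random walk.

    X : observation matrix, shape (n, k); here k = 3
    y : observed values, shape (n,)
    Returns the filtered beta estimates, shape (n, k).
    """
    n, k = X.shape
    Vw = delta / (1 - delta) * np.eye(k)  # state noise covariance
    beta = np.zeros(k)
    P = np.eye(k)                         # prior state covariance (assumed)
    betas = np.zeros((n, k))
    for t in range(n):
        x = X[t]
        R = P + Vw                 # predicted state covariance
        Q = x @ R @ x + Ve         # variance of the forecast error
        e = y[t] - x @ beta        # forecast error
        K = R @ x / Q              # Kalman gain, shape (k,)
        beta = beta + K * e        # state update
        P = R - np.outer(K, x) @ R
        betas[t] = beta
    return betas


# Synthetic check: constant true coefficients and noise-free observations.
rng = np.random.default_rng(1)
X = np.column_stack([rng.standard_normal(500),
                     rng.standard_normal(500),
                     np.ones(500)])
true_beta = np.array([1.0, -2.0, 0.5])
y = X @ true_beta
betas = kalman_regression(X, y)
```

The only change from the two-parameter case is the dimension of the state vector and its covariance matrices; the update equations themselves are unchanged.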

Hi Ernie,

If the latency is less than 1 ms (Co-location), does that mean we can do 2 or 3 ticks scalping in US stocks market?

Hi Ernie,

What is the advantage if we find a stock returns follow normal distribution?

Yes, if your latency is less than 1ms, you can scalp ticks.

Ernie

The only advantage in finding a stock that has normal distribution of log returns is that Black-Scholes option pricing formula will work very well, and there shouldn't be much of a "volatility smile".

Ernie

Hi Ernie,

How many cores (CPUs) are there in your computer for real-time trading?

Thanks.

8.

Ernie

Hi Ernie,

I use IB paper account to trade SPY.

For 200 shares, I find commission is $1.00 for long, but $1.77 for short.

I thought commission should be $1.00 for both long/short.

Do you know why? Thanks.

You are right that long and short should have same commissions on IB. This is a question best sent to IB.

Ernie

Hi Ernie,

IB also charges transaction fees, which apply only to sell orders.

The formula is as follows.

USD 0.0000184*Value of Aggregate Sales

That is why short selling commission is higher.

https://www.interactivebrokers.com/en/index.php?f=commission&p=stocks1
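As a rough check of the numbers in this thread, the formula can be applied to a 200-share SPY sale. The fill price of $209 is a hypothetical value chosen for illustration, and any other regulatory fees are ignored.

```python
TRANSACTION_FEE_RATE = 0.0000184  # per USD of sales, as quoted above


def sell_side_fee(shares, price, rate=TRANSACTION_FEE_RATE):
    """Transaction fee charged on the dollar value of a sale."""
    return rate * shares * price


commission = 1.00   # USD, as reported for 200 shares
price = 209.0       # hypothetical SPY fill price
fee = sell_side_fee(200, price)
total_sell_cost = commission + fee
```

Under these assumptions the fee is about $0.77, which would bring the total cost of the short sale to roughly $1.77 against $1.00 for the buy.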

Good to know the distinction between commissions and transaction fees on IB... thanks!

Ernie

Hi Ernie,

I use 250-millisecond bars to backtest an intraday strategy. There are about 25 trades a day.

How many days of data do I need to backtest this strategy? Thanks.

Even for strategies that trade intraday, I recommend you backtest it since 2008 to see how it performed under extreme conditions. At least, you should see if it survived the 2010 flash crash.

Ernie

Hi Ernie,

If a stock pair only has 10 trades(6% profit) last year, would you trade it?

Why not?

You can always add more pairs later as you discover them.

Ernie

Hi Ernie,

What is uptick rule?

Do they still have this rule in US stock market? Thanks.

The latest version of the uptick rule is part of Regulation SHO, and has been in effect since 2010. (http://www.sec.gov/news/press/2010/2010-26.htm)

"This alternative uptick rule is designed to restrict short selling from further driving down the price of a stock that has dropped more than 10 percent in one day. It will enable long sellers to stand in the front of the line and sell their shares before any short sellers once the circuit breaker is triggered. ... At that point, short selling would be permitted if the price of the security is above the current national best bid."

Ernie

Hi Ernie,

"... is above the current national best bid."

So we can short sell at best ask (market order)? Thanks.

No, a market short sell order would result in an execution at the best bid. The only way to short once the circuit breaker is triggered is to place a limit sell order at the best ask.

Ernie

Hi Ernie,

To compute toxic flow indicator, we need to compute CDF(VPIN).

Is this CDF an empirical CDF or a normal CDF function?

It should be a normal cdf.

Ernie

Hello!

I'm curious, are there reliable sources of past forex data to download?

I've found a few sources (google), but I'm suspicious of their legitimacy. Some sources seem kind of slimy and have a ton of adware...

It is best to get FX data from the broker you will be trading on, since FX markets are not consolidated.

But if you don't have an account at a FX broker, try http://ratedata.gaincapital.com/

Ernie

Hi Ernie,

Have you heard about ActiveTick?

They provide real-time and historical tick data, and it is not expensive.

Thanks for the tip.

How does it compare with IQFeed?

Ernie

Hi Ernie,

One IT guy compare them. Here is the link.

http://www.elitetrader.com/et/index.php?threads/activetick-vs-iqfeed-vs-ib-toftt.248841/

I think ActiveTick is cheaper than IQFeed because you need to pay extra fees (about $300, not refundable) for using IQFeed API. That is bad, and we do not know their API in advance.

Hi Ernie,

Have you read "Trade like a hedge fund" by James Altucher?

What is your comment?

Thanks.

No, I haven't. Do you recommend it? Is there any strategy that looks encouraging?

Ernie

Hi Ernie,

It looks interesting to me.

He introduces several different trading strategies (18) in a simple way, and he provides some backtesting results. The first strategy is gaps which is different from your version.

It may only take you 2 hours to read it. It was published in 2004, so we can now test its strategies out-of-sample.

Great - thanks for the pointers!

Ernie

Hi Ernie,

I am currently developing a pairs trading program in Python and R. I am facing some difficulties and would like to seek your advice.

The historical tick data is huge. I have 3 years of trade (time and sales) and quote data, and I test only the active futures contracts (say Feb, June, Oct, etc.), but the data is still too large to load into the database in one go, so I cannot run a 3-year simulation in a single pass. Someone suggested that I break the 3-year test into pieces of 2 to 3 months, reloading the next 2 to 3 months of test data into the database each time until the whole 3 years is done. I find this solution too troublesome and slow, especially when I have to run the simulation many times to try different settings.

Would you mind sharing how you usually do this in C++ or Matlab?

Have you ever heard of the HDF5 format? It seems it can handle huge amounts of numerical data.

Thanks

Francis

Hi Ernie,

What is the reason for the large drawdown of your FX managed account in 2011?

Thank you

Yvonne

Hi Yvonne,

The steep drawdown in 2011 occurred before we imposed any stop loss, when we were in fact using a trading system that was inferior in many respects. We backtested our current system with the new stop loss on the 2011 data, and the stop was not hit. So had we traded the current system in 2011, the max drawdown would have been less than 20%. Of course, hindsight is 20/20!

Ernie

Hi Francis,

Your friend's advice is correct: you do need to load the data in chunks of, say, 3 months into memory and backtest them individually.

This need not be tedious, since your program should just run a loop to read in these chunks, and save the backtest positions and returns for each chunk in memory. After all the chunks are tested, you can run performance metrics on these aggregated returns.

You don't need any fancy data format for millisecond backtests.

Ernie
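A minimal sketch of such a loop, written in Python since that is what the question mentions. The loader `iter_chunks`, the toy trading rule and the prices are all hypothetical; in practice each chunk would be read from disk. The one detail that needs care is carrying the strategy's state (here the last price and last return) across chunk boundaries.

```python
from typing import Iterable, Iterator, List


def iter_chunks() -> Iterator[List[float]]:
    """Hypothetical loader: yields one chunk of prices at a time."""
    yield [100.0, 101.0, 102.0]
    yield [101.0, 103.0]
    yield [104.0, 102.0, 105.0]


def backtest_in_chunks(chunks: Iterable[List[float]]) -> List[float]:
    """Toy rule: be long for one bar if the previous bar's return was positive.

    Only one chunk is held in memory at a time; prev_price and prev_ret
    carry the strategy state from one chunk to the next.
    """
    all_returns: List[float] = []
    prev_price = None
    prev_ret = 0.0
    for prices in chunks:
        for p in prices:
            if prev_price is not None:
                ret = p / prev_price - 1.0
                position = 1.0 if prev_ret > 0 else 0.0
                all_returns.append(position * ret)
                prev_ret = ret
            prev_price = p
    return all_returns


chunked = backtest_in_chunks(iter_chunks())
# Same data loaded in one go, for comparison.
single = backtest_in_chunks([[100.0, 101.0, 102.0, 101.0, 103.0,
                              104.0, 102.0, 105.0]])
```

The chunked run and the single-pass run give identical returns, so performance metrics computed on the aggregated list are unaffected by the chunking.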

Hi Ernie,

Do you mind having a look at my strategy?

It is not so automatic but still relatively more than macro funds or alike

So. My strategy is:

1) Every Sunday check the COT report.

Mark the currencies as long or short.

2) Find inconsistency between the COT and the actual currency pair.

For example, if traders' positioning in a currency pair has peaked and then fallen sharply, yet the pair has risen, this is a short idea with a 0.9% stop loss and a 2% profit target.

I am a recent graduate and unemployed, and I do not yet have enough confidence that this strategy works, but I would appreciate it if you could share some feedback and input from your experience.

Many thanks!

Hi Vahe,

It is not easy to say whether or not your strategy would work. Why not backtest it with historical data?

Ernie

Hello everybody,

I have found a few interesting things on YouTube about Order Flow Trading.

Here are the links below :

https://www.youtube.com/channel/UCW-m-lEeboc7OQ8pNTuWpBg

https://www.youtube.com/channel/UCEJQmAIhc83eqP6ymuwfL8g

https://www.youtube.com/channel/UC3HKlZ_7gxRgef9SCxu54Lw

https://www.youtube.com/user/OrderFlowAnalytics
