Friday, June 06, 2025

Cross-Attention for Cross-Asset Applications: How to mash up features from multiple assets

In the previous blog post, we saw how to apply a self-attention transformer to a matrix of time-series features for a single stock. The output of that transformer is a transformed feature vector r of dimension 768 × 1. The 768 comes from 12 × 64: all the lagged features are concatenated / flattened into one vector, where 12 is the number of lagged months and 64 is the dimension of the embedding space for the 52 features we constructed.
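
To make the shapes concrete, here is a minimal sketch of that single-stock setup in PyTorch. The class name, number of heads, and number of encoder layers are illustrative assumptions; only the 12 lags, 52 features, and 64-dimensional embedding (hence the 768-dim output r) come from the description above.

```python
# Minimal sketch: embed 52 features per lagged month into 64 dims,
# self-attend across the 12 lags, then flatten to a 768-dim vector r.
import torch
import torch.nn as nn

N_LAGS, N_FEATURES, D_EMB = 12, 52, 64

class SingleStockEncoder(nn.Module):        # name is hypothetical
    def __init__(self):
        super().__init__()
        # Embed the 52 raw features of each lagged month into a 64-dim space.
        self.embed = nn.Linear(N_FEATURES, D_EMB)
        # Standard self-attention encoder over the 12 lagged months
        # (heads and depth chosen arbitrarily for the sketch).
        layer = nn.TransformerEncoderLayer(d_model=D_EMB, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        # x: (batch, 12, 52) -- one row of features per lagged month
        h = self.embed(x)               # (batch, 12, 64)
        h = self.encoder(h)             # (batch, 12, 64), attention across lags
        return h.flatten(start_dim=1)   # (batch, 768) -- the vector r

x = torch.randn(8, N_LAGS, N_FEATURES)    # a batch of 8 examples
r = SingleStockEncoder()(x)
print(r.shape)                            # torch.Size([8, 768])
```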

What if we have a portfolio of many stocks whose returns we want to predict, or whose allocations we want to optimize? That’s where cross-attention transformers come in. The idea behind cross-attention is that a feature of asset i may be relevant as context for a feature of asset j.
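
As a rough illustration of that idea, the sketch below applies cross-attention between two assets using PyTorch's MultiheadAttention: queries come from asset j's embedded feature sequence, while keys and values come from asset i's, so asset i's features act as context when transforming asset j's representation. The variable names and head count are assumptions for the sketch, not the post's actual implementation.

```python
# Minimal sketch of cross-attention between two assets:
# asset j (queries) attends over asset i (keys/values).
import torch
import torch.nn as nn

D_EMB = 64
cross_attn = nn.MultiheadAttention(embed_dim=D_EMB, num_heads=4, batch_first=True)

h_i = torch.randn(1, 12, D_EMB)  # asset i: 12 lagged months of 64-dim embeddings
h_j = torch.randn(1, 12, D_EMB)  # asset j: same shape

# query = h_j, key = value = h_i
h_j_ctx, attn_weights = cross_attn(query=h_j, key=h_i, value=h_i)
print(h_j_ctx.shape)       # torch.Size([1, 12, 64]) -- asset j enriched by asset i
print(attn_weights.shape)  # torch.Size([1, 12, 12]) -- which of i's lags inform j's
```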

