Attention-Augmented Multilinear Networks For Time-series Classification
Tran, Thanh Dat
Time-series analysis has long been a challenging problem and has been studied extensively over the past decades. Many phenomena are inherently dynamic, and the data collected from them is naturally expressed as time-series. While current hardware and software infrastructure provides us with a tremendous amount of data to build and validate our models, the noisy and stochastic nature of many data modalities still prevents us from reaching definite solutions. This is especially true for chaotic systems such as the stock market, in which the involvement of actors with different goals in the feedback loop leads to complex behaviors. In this thesis, the author proposes a neural network layer design that applies the intuitive idea of bilinear mapping to multivariate time-series, together with an attention module that enables the layer to automatically calculate and focus on important temporal instances. The contribution of the new design is twofold. First, the proposed layer is highly interpretable thanks to its ability to quantify the contribution of the different instances that encode temporal information. After training and during inference, the attention quantities can be visualized to highlight the time instances of interest, opening up the opportunity for further analysis. Second, the new layer design requires both less memory and fewer computations than the popular attention-based Long Short-Term Memory design, which is the state-of-the-art solution. To validate the proposed architecture, the author has conducted experiments on the problem of stock mid-price movement prediction using information available in the Limit Order Book. In the algorithmic trading regime, an automated forecasting system is required to be both accurate and efficient since the market operates at nanosecond resolution. The experimental results demonstrate that the proposed architecture establishes new state-of-the-art forecasting performance on the problem of interest while running much faster than previously proposed solutions.
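To make the idea of a bilinear mapping with temporal attention more concrete, the following is a minimal NumPy sketch of one plausible forward pass. It is an illustration under stated assumptions, not the thesis's exact formulation: the function name `bilinear_attention_layer`, the weight names `W1`, `W_att`, `W2`, `B`, the fixed mixing coefficient `lam`, and all dimensions are hypothetical choices made here for illustration. The input is assumed to be a matrix with one row per feature and one column per time step.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def bilinear_attention_layer(X, W1, W_att, W2, B, lam=0.5):
    """Sketch of a bilinear layer with attention over time steps.

    X     : (D, T)   input series, D features by T time steps
    W1    : (D2, D)  mixes information across features
    W_att : (T, T)   produces one attention score per time step
    W2    : (T, T2)  mixes information across time steps
    B     : (D2, T2) bias
    lam   : assumed mixing weight between plain and attended features
    """
    X_bar = W1 @ X                     # (D2, T): transform the feature mode
    E = X_bar @ W_att                  # (D2, T): unnormalized temporal scores
    A = softmax(E, axis=1)             # (D2, T): each row sums to 1 over time
    X_tilde = lam * X_bar + (1.0 - lam) * (X_bar * A)  # soft attention
    Y = X_tilde @ W2 + B               # (D2, T2): transform the temporal mode
    return Y, A

# Hypothetical sizes: 40 features observed over 10 time steps.
rng = np.random.default_rng(0)
D, T, D2, T2 = 40, 10, 8, 3
X = rng.standard_normal((D, T))
W1 = rng.standard_normal((D2, D)) * 0.1
W_att = rng.standard_normal((T, T)) * 0.1
W2 = rng.standard_normal((T, T2)) * 0.1
B = np.zeros((D2, T2))

Y, A = bilinear_attention_layer(X, W1, W_att, W2, B)
```

In this sketch the matrix `A` plays the role of the attention quantities mentioned above: each row is a distribution over time steps, so it can be plotted as a heat map to see which instances the layer weights most. The parameter count scales with the two mode sizes separately rather than with their product, which is one reason a bilinear design can be lighter than a recurrent one.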