Fine-grained Attention in Hierarchical Transformers for Tabular Time-series
This work addresses a specific bottleneck in modeling tabular time-series data, such as financial or healthcare records. It is aimed at machine learning researchers and practitioners, and it is incremental in that it builds on existing hierarchical transformer methods.
The authors address the limited attention granularity of hierarchical transformers for tabular time-series data by proposing Fieldy, a fine-grained model that contextualizes fields at both the row and column levels. Fieldy improves performance on regression and classification tasks without increasing model size.
Tabular data is ubiquitous in many real-life systems. In particular, time-dependent tabular data, where rows are chronologically related, is typically used for recording historical events, e.g., financial transactions, healthcare records, or stock history. Recently, hierarchical variants of the attention mechanism of transformer architectures have been used to model tabular time-series data. First, rows (or columns) are encoded separately by computing attention between their fields. Subsequently, the encoded rows (or columns) attend to one another to model the entire tabular time-series. While efficient, this approach constrains the attention granularity and limits the model's ability to learn field-level patterns across separate rows or columns. We take a first step toward addressing this gap by proposing Fieldy, a fine-grained hierarchical model that contextualizes fields at both the row and column levels. We compare our proposal against state-of-the-art models on regression and classification tasks using public tabular time-series datasets. Our results show that combining row-wise and column-wise attention improves performance without increasing model size. Code and data are available at https://github.com/raphaaal/fieldy.
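The idea of contextualizing each field along both its row and its column can be illustrated with a minimal sketch. This is not the released Fieldy implementation: the function names (attend, row_wise, column_wise, contextualize_fields) are hypothetical, attention uses identity projections instead of learned weights, and combining the two views by concatenation is an assumption made only for illustration.

```python
import math

def attend(vectors):
    """Single-head self-attention with identity projections (no learned weights)."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        # Scaled dot-product scores of this query against every vector.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the attention-weighted average of the input vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return out

def row_wise(table):
    """Each field attends to the other fields of the same row."""
    return [attend(row) for row in table]

def column_wise(table):
    """Each field attends to the same field in the other rows (time steps)."""
    n_rows, n_cols = len(table), len(table[0])
    cols = [attend([table[r][c] for r in range(n_rows)]) for c in range(n_cols)]
    # Transpose back so the result is indexed as [row][column].
    return [[cols[c][r] for c in range(n_cols)] for r in range(n_rows)]

def contextualize_fields(table):
    """Assumed combination: concatenate the row-wise and column-wise views per field."""
    rw, cw = row_wise(table), column_wise(table)
    return [[rw[r][c] + cw[r][c] for c in range(len(table[0]))]
            for r in range(len(table))]

# table[r][c] is the embedding of field c in row r (3 rows, 2 fields, dimension 2).
table = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 1.0]],
]
fields = contextualize_fields(table)
print(len(fields), len(fields[0]), len(fields[0][0]))  # 3 2 4
```

In this sketch every field keeps its own representation after the first stage, so a later stage could attend over individual fields rather than over whole encoded rows or columns, which is the granularity limitation described above.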