From Hawkes Processes to Attention: Time-Modulated Mechanisms for Event Sequences
This work addresses the problem of modeling event sequences with heterogeneous temporal effects, with applications in medical, social, commercial, and financial domains. It represents an incremental improvement over existing methods.
The paper tackles a limitation of Transformer-based methods for Marked Temporal Point Processes (MTPPs): their difficulty in capturing heterogeneous, type-specific temporal effects. It introduces Hawkes Attention, an attention operator derived from multivariate Hawkes process theory, and reports better performance than the baselines.
Marked Temporal Point Processes (MTPPs) arise naturally in medical, social, commercial, and financial domains. However, existing Transformer-based methods typically inject temporal information only via positional encodings and rely on shared or parametric decay structures, which limits their ability to capture heterogeneous and type-specific temporal effects. Motivated by this observation, we derive a novel attention operator, Hawkes Attention, from multivariate Hawkes process theory for MTPPs. It uses learnable per-type neural kernels to modulate the query, key, and value projections, replacing the corresponding parts of standard attention. Thanks to this design, Hawkes Attention unifies event timing and content interaction, learning both time-relevant behavior and type-specific excitation patterns from the data. Experimental results show that our method outperforms the baselines. Beyond general MTPPs, our attention mechanism can also be easily applied to specific temporal structures, such as time series forecasting.
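
To make the connection between Hawkes processes and attention concrete, the following is a minimal NumPy sketch, not the paper's implementation. The first function evaluates the classical multivariate Hawkes intensity with an exponential kernel, where every past event excites each type with a strength and decay that depend on the pair of types. The second is a toy causal attention in which keys and values are scaled by a per-type decay of the elapsed time. All names (`hawkes_intensity`, `time_modulated_attention`, `decay`) are hypothetical, the exponential decay is only a stand-in for the learnable neural kernels described above, and the sketch leaves the query unmodulated for brevity.

```python
import numpy as np

def hawkes_intensity(t, times, types, mu, alpha, beta):
    """Classical multivariate Hawkes intensity with an exponential kernel.

    lambda_k(t) = mu[k] + sum over events i with t_i < t of
                  alpha[k, m_i] * exp(-beta[k, m_i] * (t - t_i))
    mu: (K,), alpha and beta: (K, K); types holds integer marks m_i.
    """
    times = np.asarray(times, dtype=float)
    types = np.asarray(types, dtype=int)
    past = times < t
    dt = t - times[past]                      # elapsed time per past event
    src = types[past]                         # mark of each past event
    excitation = alpha[:, src] * np.exp(-beta[:, src] * dt)  # (K, n_past)
    return mu + excitation.sum(axis=1)

def time_modulated_attention(x, times, types, Wq, Wk, Wv, decay):
    """Toy causal attention with per-type temporal modulation (assumption).

    x: (n, d) event embeddings; times: (n,) sorted timestamps;
    decay: (K, d) non-negative per-type, per-channel decay rates,
    a hypothetical stand-in for a learnable per-type neural kernel.
    """
    n, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    dt = np.maximum(times[:, None] - times[None, :], 0.0)   # dt[i, j] = t_i - t_j
    # kernel[i, j, :] depends on the type of the source event j.
    kernel = np.exp(-decay[types][None, :, :] * dt[:, :, None])  # (n, n, d)
    k_mod = k[None, :, :] * kernel
    v_mod = v[None, :, :] * kernel
    scores = np.einsum("id,ijd->ij", q, k_mod) / np.sqrt(d)
    mask = np.tril(np.ones((n, n), dtype=bool))              # attend to j <= i only
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return np.einsum("ij,ijd->id", weights, v_mod)
```

In both functions the contribution of a past event depends on its type and on the time elapsed since it occurred. The attention version additionally weights that contribution by content similarity between the current event and the past one.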