Attention Mechanism in Randomized Time Warping
This work provides a novel interpretation linking RTW to attention mechanisms, potentially benefiting researchers in motion recognition and sequential pattern analysis, though it is incremental as it builds on existing RTW and Transformer methods.
The paper shows that Randomized Time Warping (RTW) can be interpreted as a self-attention mechanism similar to the one used in Transformers, and reports that RTW achieves a 5% performance improvement over Transformers on the Something-Something V2 dataset.
This paper shows that the fundamental function of Randomized Time Warping (RTW) can be interpreted as a type of self-attention mechanism, a core technology of Transformers in motion recognition. Self-attention is a mechanism that enables models to identify and weigh the importance of different parts of an input sequential pattern. RTW, on the other hand, is a general extension of Dynamic Time Warping (DTW), a technique commonly used for matching and comparing sequential patterns. In essence, RTW searches for optimal contribution weights for each element of the input sequential pattern to produce discriminative features. Although the two approaches look different, these contribution weights can be interpreted as self-attention weights. In fact, the two weight patterns are similar, with a high average correlation of 0.80 across the ten smallest canonical angles. However, they work in different ways: RTW attention operates on the entire input sequential pattern, while self-attention focuses on only a local view, a subset of the input sequential pattern, because of the computational cost of the self-attention matrix. This difference in scope gives RTW an advantage over Transformers, as demonstrated by a 5% performance improvement on the Something-Something V2 dataset.
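To make the two building blocks named in the abstract concrete, the following is a minimal sketch of (a) scaled dot-product self-attention weights and (b) classic DTW, the technique that RTW extends. It is not the RTW algorithm and not the paper's Transformer. The function names are hypothetical, and the attention sketch assumes identity query/key projections (a real Transformer learns these projections).

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(seq):
    """Attention weights softmax(Q K^T / sqrt(d)) for a sequence of vectors.

    Assumption: Q = K = seq (identity projections), for illustration only.
    Row i holds the weights that element i assigns to every element.
    """
    d = len(seq[0])
    scores = [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        for q in seq
    ]
    return [softmax(r) for r in scores]


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors.

    Uses Euclidean local cost and the standard three-way recursion
    (insertion, deletion, match).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Toy data (made up for illustration): the same motion at two speeds.
slow = [[0.0], [0.0], [1.0], [2.0]]
fast = [[0.0], [1.0], [2.0]]
warp_cost = dtw_distance(slow, fast)

# Each row of the attention matrix is a probability distribution.
weights = self_attention_weights([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
```

The attention matrix has one row and one column per sequence element, so its size grows quadratically with sequence length. This is the computational cost the abstract cites as the reason self-attention is restricted to a local view. DTW, by contrast, aligns the two toy sequences despite their different speeds.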