LG NE Jan 31, 2024

Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary

arXiv:2402.00236v5 | 9 citations | h-index: 1 | Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

For researchers and practitioners in natural language processing, this paper offers an incremental result: positional encoding improves RNN performance when the input vocabulary is large.

The study found that adding positional encoding to recurrent neural networks (RNNs) improves their learning, particularly for large vocabularies with low-frequency tokens, by stabilizing gradients where vanilla RNNs struggle.

This study reports an unintuitive finding that positional encoding enhances learning of recurrent neural networks (RNNs). Positional encoding is a high-dimensional representation of time indices on input data. Most famously, positional encoding complements the capabilities of Transformer neural networks, which lack an inherent mechanism for representing the data order. By contrast, RNNs can encode the temporal information of data points on their own, rendering their use of positional encoding seemingly redundant. Nonetheless, investigations through synthetic benchmarks reveal an advantage of coupling positional encoding and RNNs, especially for handling a large vocabulary that yields low-frequency tokens. Further scrutiny reveals that these low-frequency tokens destabilize the gradients of vanilla RNNs, and that positional encoding resolves this instability. These results shed new light on the utility of positional encoding beyond its canonical role as a timekeeper for Transformers.
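To make "a high-dimensional representation of time indices" concrete, here is a minimal NumPy sketch. It assumes the sinusoidal encoding familiar from Transformers and assumes the position codes are concatenated to the token embeddings before a vanilla tanh RNN. The abstract specifies neither choice, so the function names, the frequency base of 10000, and the concatenation step are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np


def sinusoidal_positional_encoding(seq_len, dim):
    """Return a (seq_len, dim) array of sinusoidal position codes (dim must be even)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    positions = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    # Assumed frequency schedule: geometric, base 10000 (Transformer-style).
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))   # (dim / 2,)
    angles = positions * freqs[None, :]                       # (seq_len, dim / 2)
    pe = np.zeros((seq_len, dim))
    pe[:, 0::2] = np.sin(angles)  # even columns hold sines
    pe[:, 1::2] = np.cos(angles)  # odd columns hold cosines
    return pe


def rnn_forward_with_positions(token_ids, embed, W_in, W_rec, pos_dim):
    """Run a vanilla tanh RNN over token embeddings concatenated with position codes.

    Hypothetical helper: concatenation is an assumed way of coupling the two inputs.
    """
    pe = sinusoidal_positional_encoding(len(token_ids), pos_dim)
    x = np.concatenate([embed[token_ids], pe], axis=1)  # (T, embed_dim + pos_dim)
    h = np.zeros(W_rec.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(W_in @ x_t + W_rec @ h)  # standard Elman-style update
        states.append(h)
    return np.stack(states)                  # (T, hidden)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab, embed_dim, pos_dim, hidden = 1000, 16, 8, 32
    embed = rng.normal(size=(vocab, embed_dim))
    W_in = rng.normal(scale=0.1, size=(hidden, embed_dim + pos_dim))
    W_rec = rng.normal(scale=0.1, size=(hidden, hidden))
    states = rnn_forward_with_positions(np.array([3, 999, 42, 7]), embed, W_in, W_rec, pos_dim)
    print(states.shape)  # (4, 32)
```

In this sketch the position code depends only on the time step, so every step receives the same code no matter which token occupies it, including tokens that appear rarely in a large vocabulary.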

Code Implementations: 1 repo