Dispatcher: A Message-Passing Approach To Language Modelling
This work addresses efficiency bottlenecks in language modelling for NLP applications, though it appears incremental, as it builds on existing paradigms.
The paper tackles language modelling by introducing a message-passing mechanism that replaces self-attention for unidirectional sequence generation. It achieves perplexity comparable to prior methods with improved efficiency: for N tokens, O(N log N) computational complexity and O(N) memory complexity.
This paper proposes a message-passing mechanism for language modelling. A new layer type, the Dispatcher layer, is introduced as a substitute for self-attention in unidirectional sequence generation tasks. The system is shown to be competitive with existing methods: given N tokens, the computational complexity is O(N log N) and the memory complexity is O(N) under reasonable assumptions. Overall, the Dispatcher layer is reported to achieve perplexity comparable to prior results while being more efficient.