Attention Mechanism with Energy-Friendly Operations
This work addresses energy efficiency in NLP models; it is an incremental improvement focused on reducing computational cost.
The paper tackles the high energy consumption of attention mechanisms in NLP by replacing multiplications with energy-friendly operations such as selective operations or additions. On three machine translation tasks, it achieves competitive accuracy while saving 99% of the energy in alignment calculation and 66% over the whole attention procedure.
The attention mechanism has become the dominant module in natural language processing models. It is computationally intensive and relies on a massive number of power-hungry multiplications. In this paper, we rethink variants of the attention mechanism from the perspective of energy consumption. After concluding that several energy-friendly operations cost far less energy than their multiplication counterparts, we build a novel attention model by replacing multiplications with either selective operations or additions. Empirical results on three machine translation tasks demonstrate that the proposed model, compared with the vanilla one, achieves competitive accuracy while saving 99% and 66% of the energy during alignment calculation and the whole attention procedure, respectively. Code is available at: https://github.com/NLP2CT/E-Att.
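The sketch below illustrates why selective operations and additions can stand in for multiplications in alignment. A dot product against a {0,1} vector needs no multiplications, because it reduces to summing the selected entries. It is a minimal illustration under stated assumptions: the sign-based binarization of the query and the negative-L1-distance score are hypothetical stand-ins. The function names `binarize`, `selective_scores` and `l1_scores` are illustrative and are not taken from the E-Att repository, and the paper's actual formulation may differ.

```python
import math
from typing import List

Vector = List[float]


def dot_product_scores(query: Vector, keys: List[Vector]) -> List[float]:
    """Vanilla alignment: one multiplication per dimension per key."""
    return [sum(q * k for q, k in zip(query, key)) for key in keys]


def binarize(vector: Vector) -> List[int]:
    """Hypothetical binarization (assumption): 1 where the entry is positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]


def selective_scores(query_bits: List[int], keys: List[Vector]) -> List[float]:
    """Multiplication-free alignment: with a {0,1} query, q . k equals the sum of
    the key entries the query selects, so only selection and addition are used."""
    return [sum(k for bit, k in zip(query_bits, key) if bit) for key in keys]


def l1_scores(query: Vector, keys: List[Vector]) -> List[float]:
    """Alternative addition-only alignment (assumption): negative L1 distance,
    computed with subtractions, absolute values and additions."""
    return [-sum(abs(q - k) for q, k in zip(query, key)) for key in keys]


def softmax(scores: List[float]) -> List[float]:
    """Standard normalization of alignment scores; this step is unchanged and
    is not multiplication-free."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    query = [0.5, -1.2, 2.0, -0.3]
    keys = [[1.0, 2.0, 3.0, 4.0], [-1.0, 0.5, 0.0, 2.0]]
    bits = binarize(query)
    print(selective_scores(bits, keys))  # [4.0, -1.0]
    print(softmax(l1_scores(query, keys)))
```

For n keys of dimension d, the vanilla score uses n·d multiplications. The selective variant uses none, only at most n·(d−1) additions over the selected entries. Softmax and the weighted sum over values still involve multiplications in this sketch. That is consistent with the savings over the whole attention procedure being smaller than the savings in alignment alone.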