CL, Apr 28, 2022

Attention Mechanism with Energy-Friendly Operations

Yu Wan, Baosong Yang, Dayiheng Liu, Rong Xiao, Derek F. Wong, Haibo Zhang, Boxing Chen, Lidia S. Chao

arXiv:2204.13353v1 31.9638 citations h-index: 31 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the energy efficiency of NLP models; it is an incremental improvement focused on reducing computational cost.

The paper tackles the high energy consumption of attention mechanisms in NLP by replacing multiplications with energy-friendly operations such as selective operations or additions. On three machine translation tasks it reaches competitive accuracy while saving 99% of the energy in alignment calculation and 66% over the whole attention procedure.

The attention mechanism has become the dominant module in natural language processing models. It is computationally intensive and depends on a massive number of power-hungry multiplications. In this paper, we rethink variants of the attention mechanism from the perspective of energy consumption. After concluding that several energy-friendly operations cost far less energy than their multiplication counterparts, we build a novel attention model by replacing multiplications with either selective operations or additions. Empirical results on three machine translation tasks demonstrate that the proposed model, compared with the vanilla one, achieves competitive accuracy while saving 99% and 66% of the energy during alignment calculation and the whole attention procedure, respectively. Code is available at: https://github.com/NLP2CT/E-Att.
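To make "replacing multiplications with additions" concrete, here is a minimal sketch of an alignment step that scores each key by its negative L1 distance to the query, so scoring needs only subtractions, absolute values, and additions. The L1 distance is an assumed stand-in chosen for illustration, not the formulation used in the paper (the linked repository has the authors' implementation), and the names `dot_product_scores`, `additive_scores`, and `softmax` are hypothetical.

```python
import math

def dot_product_scores(query, keys):
    """Vanilla scaled dot-product alignment: one multiplication per feature per key."""
    scale = 1.0 / math.sqrt(len(query))
    return [scale * sum(q * k for q, k in zip(query, key)) for key in keys]

def additive_scores(query, keys):
    """Assumed multiplication-free alignment: negative L1 distance to each key.

    Uses only subtraction, absolute value, and addition. This is an
    illustrative stand-in, not the paper's exact operation.
    """
    return [-sum(abs(q - k) for q, k in zip(query, key)) for key in keys]

def softmax(scores):
    """Normalize scores into attention weights (left unchanged in this sketch)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: one query against three keys.
query = [1.0, 0.0, -1.0]
keys = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

dot_weights = softmax(dot_product_scores(query, keys))
add_weights = softmax(additive_scores(query, keys))

# Both scoring rules put the most weight on the key identical to the query.
print(dot_weights.index(max(dot_weights)), add_weights.index(max(add_weights)))
```

On this toy input both scoring rules rank the keys in the same order, which is the intuition behind swapping the operation. Whether accuracy holds up on real translation data is what the paper's experiments evaluate.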

