LG, CL · Aug 3, 2020

DeLighT: Deep and Light-weight Transformer

arXiv:2008.00623v2 · 36 citations · Has Code
Originality: Incremental advance
AI Analysis

DeLighT addresses the computational cost that users of large transformer models face. The advance is incremental because it builds on existing transformer architectures rather than replacing them.

The paper tackles the problem of reducing the parameter count in transformer models while maintaining performance, achieving similar or better results with 2 to 3 times fewer parameters on average in machine translation and language modeling tasks.

We introduce a deep and light-weight transformer, DeLighT, that delivers similar or better performance than standard transformer-based models with significantly fewer parameters. DeLighT more efficiently allocates parameters both (1) within each Transformer block using the DeLighT transformation, a deep and light-weight transformation, and (2) across blocks using block-wise scaling, which allows for shallower and narrower DeLighT blocks near the input and wider and deeper DeLighT blocks near the output. Overall, DeLighT networks are 2.5 to 4 times deeper than standard transformer models and yet have fewer parameters and operations. Experiments on benchmark machine translation and language modeling tasks show that DeLighT matches or improves the performance of baseline Transformers with 2 to 3 times fewer parameters on average. Our source code is available at: https://github.com/sacmehta/delight
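
The block-wise scaling idea in the abstract (shallower, narrower blocks near the input; deeper, wider blocks near the output) can be illustrated with a minimal sketch. The abstract does not give the scaling rule, so this sketch assumes simple linear interpolation between a minimum and a maximum setting. The function name `blockwise_scaling` and the parameters `n_min`, `n_max`, `m_min`, and `m_max` are hypothetical and are not taken from the authors' repository.

```python
def blockwise_scaling(num_blocks, n_min, n_max, m_min, m_max):
    """Return one (depth, width_multiplier) pair per block.

    Assumption: depth and width grow linearly from the input-side
    block (index 0) to the output-side block (index num_blocks - 1).
    The abstract only says blocks get deeper and wider toward the
    output; the linear rule here is an illustrative choice.
    """
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")

    configs = []
    for b in range(num_blocks):
        # Fraction of the way from the first block to the last block.
        frac = b / (num_blocks - 1) if num_blocks > 1 else 0.0
        # Depth must be a whole number of layers, so round it.
        depth = round(n_min + (n_max - n_min) * frac)
        # The width multiplier may stay fractional.
        width_mult = m_min + (m_max - m_min) * frac
        configs.append((depth, width_mult))
    return configs


if __name__ == "__main__":
    for idx, (depth, width) in enumerate(blockwise_scaling(4, 4, 8, 2.0, 4.0)):
        print(f"block {idx}: depth={depth}, width_multiplier={width:.2f}")
```

With four blocks, depths from 4 to 8, and width multipliers from 2.0 to 4.0, this sketch yields depths 4, 5, 7, 8, so most of the layers sit near the output. The values are chosen only to show the shape of the allocation, not to reproduce any configuration reported in the paper.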

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.