CL | AI | Feb 19, 2021

Learning Dynamic BERT via Trainable Gate Variables and a Bi-modal Regularizer

arXiv:2102.09727v1
AI Analysis

This paper addresses deployment challenges for BERT on resource-limited devices, though the contribution is incremental because it builds on existing dynamic inference techniques.

The paper tackles the high latency and computational cost of BERT models by proposing a dynamic inference method using trainable gate variables and a bi-modal regularizer, achieving reduced computational cost on the GLUE dataset with minimal performance drop.

The BERT model has shown significant success on various natural language processing tasks. However, due to its heavy model size and high computational cost, the model suffers from high latency, which is a serious obstacle to its deployment on resource-limited devices. To tackle this problem, we propose a dynamic inference method on BERT via trainable gate variables applied on input tokens and a regularizer that has a bi-modal property. Our method shows reduced computational cost on the GLUE dataset with a minimal performance drop. Moreover, the trade-off between performance and computational cost can be adjusted through a user-specified hyperparameter.
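The abstract does not give the exact formulation, so the following is only a minimal sketch of how per-token gates and a bi-modal regularizer might fit together, not the authors' method. Everything in it is an assumption made for illustration: the sigmoid gate, the penalty g * (1 - g), the average-gate cost proxy, the 0.5 threshold, and the names gate_values, bimodal_penalty, cost_penalty, total_loss, keep_tokens and lam.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate_values(logits):
    """Map hypothetical trainable per-token gate logits to soft gates in (0, 1)."""
    return [sigmoid(z) for z in logits]


def bimodal_penalty(gates):
    """Assumed bi-modal regularizer: g * (1 - g) is zero at g = 0 and g = 1
    and largest at g = 0.5, so minimizing it pushes gates toward either end."""
    return sum(g * (1.0 - g) for g in gates) / len(gates)


def cost_penalty(gates):
    """Assumed proxy for computational cost: the average gate value,
    i.e. roughly the fraction of tokens that stay active."""
    return sum(gates) / len(gates)


def total_loss(task_loss, gates, lam):
    """Task loss plus both penalties; lam stands in for the user-specified
    hyperparameter that trades performance against computational cost."""
    return task_loss + bimodal_penalty(gates) + lam * cost_penalty(gates)


def keep_tokens(tokens, gates, threshold=0.5):
    """At inference, keep only tokens whose gate is at or above the threshold."""
    return [t for t, g in zip(tokens, gates) if g >= threshold]


# Illustrative values only; these logits are made up, not learned.
tokens = ["[CLS]", "the", "movie", "was", "great", "[SEP]"]
logits = [4.0, -3.0, 2.0, -2.5, 3.5, 4.0]
gates = gate_values(logits)
kept = keep_tokens(tokens, gates)
print(kept)
```

In this sketch, a larger lam penalizes open gates more heavily, so training would tend to drop more tokens and trade accuracy for lower cost. The paper's actual gating and regularizer may differ.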

