CL

Sep 15, 2021

ARCH: Efficient Adversarial Regularized Training with Caching

Simiao Zuo, Chen Liang, Haoming Jiang, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao

arXiv:2109.07048v230.7662 citations

Has Code

Originality: Incremental advance

AI Analysis

This work reduces the training cost of adversarial regularization for researchers and practitioners working on NLP tasks. It is an incremental advance, since it builds on existing adversarial regularization methods.

The paper tackles the computational expense of adversarial regularization in NLP by proposing ARCH, which caches perturbations every few epochs and uses a KNN strategy to reduce memory usage, achieving up to 70% time savings and often better model generalization.

Adversarial regularization can improve model generalization in many natural language processing tasks. However, conventional approaches are computationally expensive since they need to generate a perturbation for each sample in each epoch. We propose a new adversarial regularization method ARCH (adversarial regularization with caching), where perturbations are generated and cached once every several epochs. As caching all the perturbations raises memory usage concerns, we adopt a K-nearest neighbors-based strategy to tackle this issue. The strategy only requires caching a small number of perturbations, without introducing additional training time. We evaluate our proposed method on a set of neural machine translation and natural language understanding tasks. We observe that ARCH significantly eases the computational burden (saving up to 70% of computational time in comparison with conventional approaches). More surprisingly, by reducing the variance of stochastic gradients, ARCH produces notably better (in most of the tasks) or comparable model generalization. Our code is available at https://github.com/SimiaoZuo/Caching-Adv.
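The caching idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal illustration under stated assumptions. The class `PerturbationCache`, the helper `toy_generate`, the parameter `refresh_every`, and the choice of a 1-nearest-neighbor assignment computed once before training are all hypothetical. The sketch only shows how refreshing a small cache every few epochs, and letting nearby samples share a cached perturbation, reduces the number of expensive perturbation-generation calls.

```python
import math


def toy_generate(x, eps=0.1):
    """Hypothetical stand-in for the expensive adversarial step.

    A real method would run gradient ascent on the model's loss; here we
    simply return a vector of norm eps pointing along x.
    """
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [eps * v / norm for v in x]


class PerturbationCache:
    """Caches perturbations for a few anchor samples and shares them
    with each sample's nearest anchor (assumed 1-NN, Euclidean)."""

    def __init__(self, data, anchor_idx, generate, refresh_every):
        self.anchors = [data[i] for i in anchor_idx]
        self.generate = generate
        self.refresh_every = refresh_every
        self.cache = None
        self.num_generated = 0
        # Neighbor assignment is computed once, before training starts,
        # so lookups during training are plain list indexing.
        self.assignment = [
            min(range(len(self.anchors)),
                key=lambda j: math.dist(x, self.anchors[j]))
            for x in data
        ]

    def maybe_refresh(self, epoch):
        """Regenerate the cached perturbations once every refresh_every epochs."""
        if epoch % self.refresh_every == 0:
            self.cache = [self.generate(a) for a in self.anchors]
            self.num_generated += len(self.anchors)

    def lookup(self, sample_index):
        """Return the cached perturbation of the sample's nearest anchor."""
        return self.cache[self.assignment[sample_index]]


data = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.2]]
cache = PerturbationCache(data, anchor_idx=[0, 2],
                          generate=toy_generate, refresh_every=3)

num_epochs = 6
for epoch in range(num_epochs):
    cache.maybe_refresh(epoch)
    for i, x in enumerate(data):
        delta = cache.lookup(i)
        x_adv = [a + b for a, b in zip(x, delta)]
        # A training step would use x and x_adv in the regularized loss here.

conventional_calls = len(data) * num_epochs  # one perturbation per sample per epoch
print(cache.num_generated, "generation calls vs", conventional_calls)
```

In this toy run the cache is refreshed at epochs 0 and 3 for two anchors, so the generator is called far less often than in the per-sample, per-epoch baseline. The actual savings reported in the paper depend on the model, the task, and the authors' specific caching schedule, none of which this sketch reproduces.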
