LG CL CV Oct 11, 2022

Designing Robust Transformers using Robust Kernel Density Estimation

Xing Han, Tongzheng Ren, Tan Minh Nguyen, Khai Nguyen, Joydeep Ghosh, Nhat Ho

arXiv:2210.05794v3 11.811 citations h-index: 31

Originality: Incremental advance

AI Analysis

This paper addresses robustness issues in Transformers and is relevant to practitioners in NLP and computer vision. The advance is incremental, since it builds on existing robust KDE methods.

The paper tackles the vulnerability of Transformer architectures to contaminated data by introducing self-attention mechanisms based on robust kernel density estimation. These mechanisms show robust performance on language modeling and image classification tasks while maintaining competitive results on clean datasets.

Recent advances in Transformer architectures have driven their empirical success in a variety of tasks across different domains. However, existing work mainly focuses on predictive accuracy and computational cost, without considering other practical issues, such as robustness to contaminated samples. Recent work by Nguyen et al. (2022) has shown that the self-attention mechanism, which is at the core of the Transformer architecture, can be viewed as a non-parametric estimator based on kernel density estimation (KDE). This motivates us to leverage a set of robust kernel density estimation methods to alleviate the issue of data contamination. Specifically, we introduce a series of self-attention mechanisms that can be incorporated into different Transformer architectures and discuss the special properties of each method. We then perform extensive empirical studies on language modeling and image classification tasks. Our methods demonstrate robust performance in multiple scenarios while maintaining competitive results on clean datasets.
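
The KDE view of self-attention mentioned in the abstract can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: it compares standard softmax attention with a Nadaraya-Watson style estimator that uses a Gaussian kernel, assuming unit-norm keys and a bandwidth of sigma^2 = sqrt(d) so that the two coincide. The function names and the optional `key_weights` argument are hypothetical; the weights only illustrate one place where a robust density estimate could replace the uniform weighting of keys, and they are not taken from the paper.

```python
import numpy as np

def softmax_attention(Q, K, V, scale):
    """Standard scaled dot-product attention: softmax(Q K^T / scale) V."""
    logits = Q @ K.T / scale
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return w @ V

def kernel_attention(Q, K, V, sigma2, key_weights=None):
    """Kernel-regression form with a Gaussian kernel on ||q_i - k_j||^2.

    key_weights is a hypothetical per-key weight vector (assumption for
    illustration); None means every key is weighted equally.
    """
    d2 = ((Q[:, None, :] - K[None, :, :]) ** 2).sum(axis=-1)
    log_k = -d2 / (2.0 * sigma2)
    log_k -= log_k.max(axis=1, keepdims=True)  # cancels in the ratio below
    kern = np.exp(log_k)
    if key_weights is not None:
        kern = kern * key_weights[None, :]
    return (kern @ V) / kern.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 6, 4
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
K /= np.linalg.norm(K, axis=1, keepdims=True)  # assumption: unit-norm keys
V = rng.normal(size=(n, d))

sigma2 = np.sqrt(d)  # bandwidth chosen to match the usual 1/sqrt(d) scaling
out_softmax = softmax_attention(Q, K, V, scale=sigma2)
out_kernel = kernel_attention(Q, K, V, sigma2=sigma2)
max_gap = np.abs(out_softmax - out_kernel).max()
print("max abs difference:", max_gap)
```

With equal-norm keys, the terms in ||q||^2 and ||k||^2 cancel between numerator and denominator, which is why the two outputs agree up to floating-point error. Setting a key's weight to zero removes its value from every output row, which is the intuition behind down-weighting contaminated samples.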
