An Empirical Study of Memorization in NLP
This work addresses the lack of empirical verification of memorization in NLP, which matters to NLP practitioners, although the contribution is incremental because it builds on the existing long-tail theory.
This paper tackles the problem of verifying memorization behavior in NLP models. It finds that the top-memorized training instances tend to be atypical, and that removing them causes a larger drop in test accuracy than removing the same number of instances at random. It also develops a memorization attribution method, which shows that the top-memorized parts of a training instance tend to be features negatively correlated with its class label.
A recent study by Feldman (2020) proposed a long-tail theory to explain the memorization behavior of deep learning models. However, this memorization behavior has not been empirically verified in the context of NLP, a gap that this work addresses. In this paper, we use three different NLP tasks to check whether the long-tail theory holds. Our experiments demonstrate that the top-ranked memorized training instances are likely to be atypical, and that removing the top-memorized training instances leads to a larger drop in test accuracy than removing training instances at random. Furthermore, we develop an attribution method to better understand why a training instance is memorized. We empirically show that our memorization attribution method is faithful, and we share the interesting finding that the top-memorized parts of a training instance tend to be features negatively correlated with the class label.
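Feldman (2020) defines the memorization score of training example i, for a learning algorithm A and training set S, as mem(A, S, i) = Pr[h(x_i) = y_i | h trained on S] - Pr[h(x_i) = y_i | h trained on S without example i]. The Python sketch below estimates this score by training many models on random subsets and comparing accuracy on x_i when i is included versus held out. It then runs the removal comparison described above (remove the top-k memorized instances versus k random instances, retrain, and measure test accuracy). This is an illustrative sketch under loudly stated assumptions: the 1-nearest-neighbour learner, the 2-D toy data, the subsampling estimator, and all function names (`train_1nn`, `memorization_scores`, `accuracy_after_removal`) are hypothetical stand-ins, not the models, NLP tasks, or estimation procedure used in this study.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Learner = Callable[[Sequence[Point], Sequence[int]], Callable[[Point], int]]


def train_1nn(xs: Sequence[Point], ys: Sequence[int]) -> Callable[[Point], int]:
    """Hypothetical toy learner: 1-nearest-neighbour, which fits every training point."""
    data = list(zip(xs, ys))

    def predict(x: Point) -> int:
        # Return the label of the closest stored point (squared Euclidean distance).
        _, label = min(data, key=lambda d: (d[0][0] - x[0]) ** 2 + (d[0][1] - x[1]) ** 2)
        return label

    return predict


def memorization_scores(xs, ys, train: Learner, num_models=200, keep_prob=0.7, seed=0) -> List[float]:
    """Estimate mem(A, S, i) = P[h(x_i)=y_i | i in training subset] - P[h(x_i)=y_i | i held out]."""
    rng = random.Random(seed)
    n = len(xs)
    hits = {True: [0] * n, False: [0] * n}    # correct predictions, keyed by "i was in the subset"
    counts = {True: [0] * n, False: [0] * n}  # number of models in each group
    for _ in range(num_models):
        mask = [rng.random() < keep_prob for _ in range(n)]
        subset = [i for i in range(n) if mask[i]]
        if not subset:
            continue  # skip the (unlikely) empty training subset
        h = train([xs[i] for i in subset], [ys[i] for i in subset])
        for i in range(n):
            hits[mask[i]][i] += int(h(xs[i]) == ys[i])
            counts[mask[i]][i] += 1
    if any(c == 0 for c in counts[True] + counts[False]):
        raise ValueError("some point was never included or never held out; increase num_models")
    return [hits[True][i] / counts[True][i] - hits[False][i] / counts[False][i] for i in range(n)]


def accuracy_after_removal(xs, ys, removed, test_xs, test_ys, train: Learner) -> float:
    """Retrain without the indices in `removed` and return test accuracy."""
    keep = [i for i in range(len(xs)) if i not in removed]
    h = train([xs[i] for i in keep], [ys[i] for i in keep])
    return sum(h(x) == y for x, y in zip(test_xs, test_ys)) / len(test_xs)


# Hypothetical data: two dense clusters plus one rare class-1 point near class 0.
xs = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
      (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5),
      (0, 3)]
ys = [0] * 5 + [1] * 5 + [1]
test_xs = [(0.2, 0.3), (5.2, 5.3), (0, 3.2)]  # last test point comes from the rare subpopulation
test_ys = [0, 1, 1]

scores = memorization_scores(xs, ys, train_1nn)
k = 1
top_k = set(sorted(range(len(xs)), key=lambda i: scores[i], reverse=True)[:k])
random_k = set(random.Random(1).sample(range(len(xs)), k))
acc_top = accuracy_after_removal(xs, ys, top_k, test_xs, test_ys, train_1nn)
acc_random = accuracy_after_removal(xs, ys, random_k, test_xs, test_ys, train_1nn)
print("top-memorized index:", max(range(len(xs)), key=lambda i: scores[i]))
print(f"accuracy after removing top-{k}: {acc_top:.2f}, random-{k}: {acc_random:.2f}")
```

Because a 1-nearest-neighbour model always fits its own training points, the estimated score should be close to 0 for points surrounded by same-label neighbours and close to 1 for the isolated class-1 point, a toy analogue of the claim that top-memorized instances are atypical. Removing that point also removes the only training evidence for the rare test point, which is the mechanism the long-tail theory uses to explain why removing memorized instances can hurt test accuracy.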