Measuring Memorization Effect in Word-Level Neural Networks Probing
This work addresses a reliability issue in NLP probing studies and offers a tool for interpreting probing results more reliably. The contribution is incremental, however, as it builds on existing efforts to minimize memorization.
The authors tackled the problem of measuring memorization in word-level neural network probing, where a classifier may memorize the labels of individual words rather than extract linguistic abstractions, yielding false positives. They proposed quantifying memorization with symmetric, comparable test sets of words seen versus unseen in training, and demonstrated the method on a part-of-speech probing case study of a neural machine translation encoder.
Multiple studies have probed representations emerging in neural networks trained for end-to-end NLP tasks and examined what word-level linguistic information may be encoded in the representations. In classical probing, a classifier is trained on the representations to extract the target linguistic information. However, there is a risk that the classifier simply memorizes the linguistic labels of individual words instead of extracting linguistic abstractions from the representations, thus producing false positive results. While considerable efforts have been made to minimize the memorization problem, the task of actually measuring the amount of memorization happening in the classifier has so far been understudied. In our work, we propose a simple, general method for measuring the memorization effect, based on a symmetric selection of comparable sets of test words seen versus unseen in training. Our method can be used to explicitly quantify the amount of memorization happening in a probing setup, so that an adequate setup can be chosen and the probing results can be interpreted with a reliability estimate. We demonstrate this with a case study of probing a trained neural machine translation encoder for part of speech.
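The abstract does not spell out the exact selection procedure or metric, so the following Python sketch assumes one plausible instantiation of the idea: word types are ranked by frequency, each pair of adjacent types is split at random between a "seen" set (used for probe training) and an "unseen" set, and memorization is estimated as the accuracy gap between the two test sets. The helper names (`frequency_matched_split`, `train_lookup_probe`, `memorization_effect`), the toy corpus, and the lookup-table probe, which takes the word itself instead of an encoder representation, are all hypothetical. In a real setup, the seen test set would consist of held-out occurrences of the training word types rather than the very same tokens.

```python
import random
from collections import Counter, defaultdict


def frequency_matched_split(freqs, seed=0):
    """Split word types into frequency-matched 'seen'/'unseen' sets (hypothetical helper)."""
    rng = random.Random(seed)
    # Rank types by descending frequency; ties are broken alphabetically for determinism.
    ranked = sorted(freqs, key=lambda w: (-freqs[w], w))
    seen, unseen = set(), set()
    # Adjacent types have similar frequencies; randomly send one of each pair to each side.
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)
        seen.add(pair[0])
        unseen.add(pair[1])
    # With an odd number of types, the least frequent type is left out of both sets.
    return seen, unseen


def train_lookup_probe(train_examples, fallback):
    """Toy probe that purely memorizes each word's most frequent label (hypothetical)."""
    counts = defaultdict(Counter)
    for word, label in train_examples:
        counts[word][label] += 1
    table = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    # Words never seen in training get the fallback label.
    return lambda word: table.get(word, fallback)


def accuracy(predict, examples):
    return sum(predict(w) == gold for w, gold in examples) / len(examples)


def memorization_effect(predict, seen_test, unseen_test):
    """Accuracy gap between test words seen and unseen during probe training."""
    return accuracy(predict, seen_test) - accuracy(predict, unseen_test)


# Toy POS-tagged corpus, invented for illustration only.
corpus = ([("the", "DET")] * 4 + [("a", "DET")] * 3 + [("cat", "NOUN")] * 2
          + [("dog", "NOUN")] * 2 + [("runs", "VERB"), ("sleeps", "VERB")])
gold = dict(corpus)
seen, unseen = frequency_matched_split(Counter(w for w, _ in corpus))

train = [(w, t) for w, t in corpus if w in seen]  # unseen types never reach training
seen_test = [(w, gold[w]) for w in sorted(seen)]
unseen_test = [(w, gold[w]) for w in sorted(unseen)]

probe = train_lookup_probe(train, fallback="NOUN")
print(round(memorization_effect(probe, seen_test, unseen_test), 3))  # → 0.667
```

The purely memorizing probe scores perfectly on seen words but only guesses on unseen ones, so its gap is large; a probe that genuinely generalizes from the representations (for instance `gold.get` here, standing in for an ideal abstraction) would show a gap near zero. Pairing types by frequency rank is a design choice meant to keep the two test sets comparable, so that the gap is not simply an artifact of unseen words being rarer.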