Probing What Different NLP Tasks Teach Machines about Function Word Comprehension
This work addresses the need for better evaluation of how well NLP models understand function words, which matters for improving language processing tasks. The contribution is incremental, since it builds on existing probing methods and datasets.
The authors study how different NLP pretraining objectives affect the comprehension of function words. They create nine challenge tasks by structurally mutating sentences and find that language modeling performs best on average, with CCG supertagging and NLI showing comparable results. Individual objectives also show specific strengths, such as NLI aiding negation comprehension.
We introduce a set of nine challenge tasks that test for the understanding of function words. These tasks are created by structurally mutating sentences from existing datasets to target the comprehension of specific types of function words (e.g., prepositions, wh-words). Using these probing tasks, we explore the effects of various pretraining objectives for sentence encoders (e.g., language modeling, CCG supertagging, and natural language inference (NLI)) on the learned representations. Our results show that pretraining on language modeling performs best on average across our probing tasks, supporting its widespread use for pretraining state-of-the-art NLP models. CCG supertagging and NLI pretraining perform comparably. Overall, no pretraining objective dominates across the board, and our function word probing tasks highlight several intuitive differences between pretraining objectives, e.g., that NLI helps the comprehension of negation.
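To make the idea of a structural mutation concrete, the following is a minimal sketch of how a preposition-targeted probing pair might be generated. It is an illustration under stated assumptions, not the authors' actual procedure: the preposition list, the function names `swap_preposition` and `make_probe_pair`, and the labeling scheme (1 for the original sentence, 0 for the mutated one) are all hypothetical.

```python
import random

# Hypothetical closed set of prepositions; the paper's real inventory is not given here.
PREPOSITIONS = ("in", "on", "at", "with", "from", "to", "by", "under")


def swap_preposition(sentence, rng):
    """Return a copy of `sentence` with one preposition replaced by a different one.

    Returns None when the sentence contains no preposition from PREPOSITIONS.
    Tokenization is plain whitespace splitting, which is an assumption of this sketch.
    """
    tokens = sentence.split()
    positions = [i for i, tok in enumerate(tokens) if tok.lower() in PREPOSITIONS]
    if not positions:
        return None
    i = rng.choice(positions)
    alternatives = [p for p in PREPOSITIONS if p != tokens[i].lower()]
    mutated = list(tokens)
    mutated[i] = rng.choice(alternatives)
    return " ".join(mutated)


def make_probe_pair(sentence, seed=0):
    """Build (sentence, label) examples: 1 = original, 0 = mutated.

    The label only records whether the sentence was mutated; it does not
    guarantee that the mutated sentence is unacceptable.
    """
    rng = random.Random(seed)
    mutated = swap_preposition(sentence, rng)
    if mutated is None:
        return []
    return [(sentence, 1), (mutated, 0)]


if __name__ == "__main__":
    for text, label in make_probe_pair("The cat sat on the mat"):
        print(label, text)
```

A swapped preposition can still yield an acceptable sentence, so a dataset built this way would likely need filtering or human validation before the labels could be treated as acceptability judgments.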