Still a Pain in the Neck: Evaluating Text Representations on Lexical Composition
This work addresses a fundamental problem in natural language processing that matters to both researchers and practitioners, though its contribution is incremental: it evaluates existing methods rather than proposing new ones.
The paper evaluates how well text representations capture the effects of lexical composition on phrase meaning, finding that contextualized word embeddings outperform static ones but still fall far short of human performance, especially in recovering implicit information.
Building meaningful phrase representations is challenging because phrase meanings are not simply the sum of their constituent meanings. Lexical composition can shift the meanings of the constituent words and introduce implicit information. We tested a broad range of textual representations for their capacity to address these issues. We found that, as expected, contextualized word representations perform better than static word embeddings, more so in detecting meaning shift than in recovering implicit information, where their performance is still far from that of humans. Our evaluation suite, which includes five tasks related to lexical composition effects, can serve future research aiming to improve such representations.