Distributional Analysis of Polysemous Function Words
This work addresses a long-standing problem in computational linguistics, showing that modern distributional methods can overcome traditional limitations in analyzing function words. The contribution is incremental in that it builds on existing distributional frameworks.
The paper addresses the challenge of analyzing polysemous function words with distributional semantics. Applying contextualized word embeddings to the German reflexive pronoun 'sich', it finds that these embeddings capture theoretically motivated word senses to the extent that those senses are mirrored systematically in linguistic usage.
In this paper, we are concerned with the phenomenon of function word polysemy. We adopt the framework of distributional semantics, which characterizes word meaning by observing occurrence contexts in large corpora and which is in principle well suited to modeling polysemy. Nevertheless, function words were traditionally considered impossible to analyze distributionally because of their highly flexible usage patterns. We establish that contextualized word embeddings, the most recent generation of distributional methods, offer hope in this regard. Using the German reflexive pronoun 'sich' as an example, we find that contextualized word embeddings capture theoretically motivated word senses for 'sich' to the extent to which these senses are mirrored systematically in linguistic usage.
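The idea of representing each occurrence of a word by its context can be illustrated with a minimal sketch. It is not the method used in the paper: it swaps contextualized embeddings for simple bag-of-words counts over a small window, and the sentences, function names, and window size are all assumptions made up for illustration. It only shows how occurrences of 'sich' that appear in similar contexts end up with similar vectors.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, index, window=2):
    """Count the tokens within `window` positions of tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return Counter(left + right)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def occurrence_vectors(sentences, target="sich", window=2):
    """Build one context vector per occurrence of `target`."""
    vectors = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == target:
                vectors.append(context_vector(tokens, i, window))
    return vectors


# Hypothetical toy sentences, invented for this sketch (not corpus data).
TOY_SENTENCES = [
    "Anna wäscht sich jeden Morgen",
    "Peter wäscht sich jeden Abend",
    "Die Tür öffnet sich langsam",
]

vecs = occurrence_vectors(TOY_SENTENCES)
print(cosine(vecs[0], vecs[1]))  # 0.5: the two occurrences share context words
print(cosine(vecs[0], vecs[2]))  # 0.0: no shared context words
```

A contextualized embedding model would replace `context_vector` with a dense vector per occurrence, but the comparison step (similarity between occurrence vectors) would look much the same.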