Learning to Represent Words in Context with Multilingual Supervision
This work addresses the challenge of representing words in context for NLP applications and reports improvements on multiple downstream tasks, although the contribution is incremental in that it builds on existing neural methods.
The paper tackles the problem of learning context-sensitive word representations using a bidirectional LSTM architecture, achieving state-of-the-art results in semantic supersense prediction, low-resource machine translation, and lexical substitution tasks.
We present a neural network architecture based on bidirectional LSTMs to compute representations of words in their sentential contexts. These context-sensitive word representations are suitable for, e.g., distinguishing different word senses and other context-modulated variations in meaning. To learn the parameters of our model, we use cross-lingual supervision, hypothesizing that a good representation of a word in context will be one that is sufficient for selecting the correct translation into a second language. We evaluate the quality of our representations as features in three downstream tasks: prediction of semantic supersenses (which assign nouns and verbs to a few dozen semantic classes), low-resource machine translation, and a lexical substitution task, and obtain state-of-the-art results on all of these.
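The idea in the abstract can be illustrated with a minimal sketch. It shows only the forward pass of a bidirectional LSTM over a source sentence and a softmax over candidate translations for one word in context. The abstract gives no architectural details, so everything here is an assumption for illustration, not the authors' implementation: the dimensions, the random initialization, the concatenation of the two directions, the linear output layer `W_out`, and the toy vocabularies. Training by backpropagation is omitted.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class LSTMCell:
    """Standard LSTM cell; one weight matrix per gate over [x; h]."""
    GATES = ("input", "forget", "output", "cand")

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.W = {k: rand_matrix(hidden_dim, input_dim + hidden_dim, rng)
                  for k in self.GATES}
        self.b = {k: [0.0] * hidden_dim for k in self.GATES}

    def step(self, x, h, c):
        xh = x + h  # list concatenation of input and previous hidden state
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], xh), self.b[k])]
               for k in self.GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        g = [math.tanh(v) for v in pre["cand"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

def run_lstm(cell, inputs):
    # Return the hidden state after each input, starting from zeros.
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    states = []
    for x in inputs:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states

def contextual_representations(embeddings, fwd, bwd):
    # Left-to-right states, plus right-to-left states re-aligned to positions.
    left = run_lstm(fwd, embeddings)
    right = run_lstm(bwd, embeddings[::-1])[::-1]
    return [l + r for l, r in zip(left, right)]  # concatenate per position

def translation_distribution(context_vec, W_out):
    # Softmax over target-language candidates (assumed linear output layer).
    scores = matvec(W_out, context_vec)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical toy setup: vocabularies and sizes are illustrative only.
rng = random.Random(0)
source_vocab = {"the": 0, "bank": 1, "river": 2, "approved": 3, "loan": 4}
target_vocab = ["banque", "rive", "autre"]
emb_dim, hid = 4, 3
E = rand_matrix(len(source_vocab), emb_dim, rng)
fwd = LSTMCell(emb_dim, hid, rng)
bwd = LSTMCell(emb_dim, hid, rng)
W_out = rand_matrix(len(target_vocab), 2 * hid, rng)

def encode(sentence):
    return contextual_representations(
        [E[source_vocab[w]] for w in sentence], fwd, bwd)

# The same word type receives a different vector in each context.
r1 = encode(["the", "bank", "approved", "the", "loan"])[1]
r2 = encode(["the", "river", "bank"])[2]

# Cross-lingual supervision: negative log-likelihood of the aligned translation.
p = translation_distribution(r1, W_out)
loss = -math.log(p[target_vocab.index("banque")])
```

In this sketch, `r1` and `r2` are both representations of "bank" but differ because each depends on the surrounding words in both directions. Minimizing `loss` over aligned word pairs from a parallel corpus is what would push such representations to separate senses that translate differently.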