CLApr 8, 2020
Deep daxes: Mutual exclusivity arises through both learning biases and pragmatic strategies in neural networksKristina Gulordava, Thomas Brochhagen, Gemma Boleda
Children's tendency to associate novel words with novel referents has been taken to reflect a bias toward mutual exclusivity. This tendency may be advantageous both as (1) an ad-hoc referent selection heuristic to single out referents lacking a label and as (2) an organizing principle of lexical acquisition. This paper investigates under which circumstances cross-situational neural models can come to exhibit analogous behavior to children, focusing on these two possibilities and their interaction. To this end, we evaluate neural networks' on both symbolic data and, as a first, on large-scale image data. We find that constraints in both learning and selection can foster mutual exclusivity, as long as they put words in competition for lexical meaning. For computational models, these findings clarify the role of available options for better performance in tasks where mutual exclusivity is advantageous. For cognitive research, they highlight latent interactions between word learning, referent selection mechanisms, and the structure of stimuli of varying complexity: symbolic and visual.
CLJun 12, 2019
Putting words in context: LSTM language models and lexical ambiguityLaura Aina, Kristina Gulordava, Gemma Boleda
In neural network models of language, words are commonly represented using context-invariant representations (word embeddings) which are then put in context in the hidden layers. Since words are often ambiguous, representing the contextually relevant information is not trivial. We investigate how an LSTM language model deals with lexical ambiguity in English, designing a method to probe its hidden representations for lexical and contextual information about words. We find that both types of information are represented to a large extent, but also that there is room for improvement for contextual information.
CLMar 29, 2018
Colorless green recurrent networks dream hierarchicallyKristina Gulordava, Piotr Bojanowski, Edouard Grave et al.
Recurrent neural networks (RNNs) have achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language. We investigate here to what extent RNNs learn to track abstract hierarchical syntactic structure. We test whether RNNs trained with a generic language modeling objective in four languages (Italian, English, Hebrew, Russian) can predict long-distance number agreement in various constructions. We include in our evaluation nonsensical sentences where RNNs cannot rely on semantic or lexical cues ("The colorless green ideas I ate with the chair sleep furiously"), and, for Italian, we compare model performance to human intuitions. Our language-model-trained RNNs make reliable predictions about long-distance agreement, and do not lag much behind human performance. We thus bring support to the hypothesis that RNNs are not just shallow-pattern extractors, but they also acquire deeper grammatical competence.