Stochastic dynamics of lexicon learning in an uncertain and nonuniform world
This work addresses the problem of lexicon acquisition, which is relevant to both computational linguistics and cognitive science. It offers insight into efficient learning mechanisms, though its contribution is incremental in that it refines existing models.
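The paper investigates the time a language learner needs to correctly identify the meanings of all words in a lexicon when meanings are uncertain and drawn from a nonuniform distribution. It shows that basic cross-situational learning can be inefficient when words are learned independently. When learners additionally assume that no two words share a meaning, a phase transition separates a maximally-efficient regime, where the learning time is as short as it can possibly be, from a partially-efficient regime, where incorrect candidate meanings persist at late times.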
We study the time taken by a language learner to correctly identify the meaning of all words in a lexicon under conditions where many plausible meanings can be inferred whenever a word is uttered. We show that the most basic form of cross-situational learning - whereby information from multiple episodes is combined to eliminate incorrect meanings - can perform badly when words are learned independently and meanings are drawn from a nonuniform distribution. If learners further assume that no two words share a common meaning, we find a phase transition between a maximally-efficient learning regime, where the learning time is reduced to the shortest it can possibly be, and a partially-efficient regime where incorrect candidate meanings for words persist at late times. We obtain exact results for the word-learning process through an equivalence to a statistical mechanical problem of enumerating loops in the space of word-meaning mappings.
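The elimination step of basic cross-situational learning described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the model analysed in the paper: the function name `simulate_learning_time`, its parameters, the uniform word frequencies, and the Zipf-like weighting of confounding meanings are all hypothetical choices made for illustration. Each word is learned independently, and the mutual-exclusivity assumption (no two words share a meaning) is not used.

```python
import random


def simulate_learning_time(num_words=20, context_size=3, zipf_exponent=1.0,
                           seed=0, max_episodes=1_000_000):
    """Return the number of episodes until every word has one candidate meaning.

    Assumptions (illustrative only): word i has true meaning i, words are
    uttered with uniform probability, and confounding meanings are drawn
    from a Zipf-like nonuniform distribution.
    """
    rng = random.Random(seed)
    meanings = list(range(num_words))
    # Nonuniform weights: low-index meanings appear as confounders more often.
    weights = [1.0 / (rank + 1) ** zipf_exponent for rank in range(num_words)]

    # candidates[w] is the set of meanings still plausible for word w;
    # None means the word has not been heard yet.
    candidates = [None] * num_words
    unresolved = num_words

    for episode in range(1, max_episodes + 1):
        word = rng.randrange(num_words)
        # The true meaning is always present, alongside sampled confounders
        # (drawn with replacement, so the context may hold fewer distinct ones).
        context = {word, *rng.choices(meanings, weights=weights, k=context_size)}

        before = candidates[word]
        # Cross-situational elimination: keep only the meanings that have
        # appeared in every episode in which this word was uttered.
        after = context if before is None else before & context
        candidates[word] = after

        if len(after) == 1 and (before is None or len(before) > 1):
            unresolved -= 1
        if unresolved == 0:
            return episode

    return None  # not all words were learned within max_episodes


if __name__ == "__main__":
    for exponent in (0.0, 1.0, 2.0):
        t = simulate_learning_time(zipf_exponent=exponent, seed=1)
        print(f"zipf_exponent={exponent}: learned after {t} episodes")
```

Setting `zipf_exponent=0.0` gives a uniform confounder distribution, so varying the exponent offers a rough way to explore how nonuniformity affects the learning time when words are learned independently. Averaging over many seeds would be needed before drawing any conclusion from such a toy experiment.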