Arijit Gupta

CL · May 11, 2021

Towards Using Diachronic Distributed Word Representations as Models of Lexical Development

Arijit Gupta, Rajaswa Patil, Veeky Baths

Recent work has shown that distributed word representations can encode abstract information from child-directed speech. In this paper, we use diachronic distributed word representations to perform temporal modeling and analysis of lexical development in children. Unlike previous work, we use a temporally sliced corpus to learn distributed word representations of child speech and child-directed speech in a curriculum-learning setting. In our experiments, we perform a lexical categorization task to plot the trajectories of semantic and syntactic knowledge acquisition in children. Next, we fit linear mixed-effects models to the diachronic representational changes to study the role of input word frequencies in the rate of word acquisition in children. We also perform a fine-grained analysis of lexical knowledge transfer from adults to children using Representational Similarity Analysis. Finally, we perform a qualitative analysis of the diachronic representations from our model, which reveals the grounding and word associations in the mental lexicon of children. Our experiments demonstrate the ease of use and effectiveness of diachronic distributed word representations in modeling lexical development.
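The abstract names Representational Similarity Analysis (RSA) as the tool for comparing adult and child representations. A minimal sketch of RSA follows, assuming one common formulation: build the cosine-similarity matrix over a shared word list for each model, then take the Spearman correlation between their upper triangles. The paper's exact procedure may differ. The function names and toy vectors are hypothetical and are not data from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rsm_upper(emb, words):
    """Upper triangle (excluding the diagonal) of the pairwise cosine-similarity matrix."""
    vecs = [emb[w] for w in words]
    return [cosine(vecs[i], vecs[j])
            for i in range(len(vecs)) for j in range(i + 1, len(vecs))]


def ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rsa_score(emb_a, emb_b, words):
    """Spearman correlation between two models' similarity structures over `words`."""
    return pearson(ranks(rsm_upper(emb_a, words)), ranks(rsm_upper(emb_b, words)))


# Hypothetical toy "adult" embedding and a rotated copy standing in for a "child" model.
words = ["ball", "dog", "cat", "milk"]
adult = {"ball": [1.0, 0.0], "dog": [0.9, 0.4], "cat": [0.2, 1.0], "milk": [-0.5, 0.8]}
theta = 0.7
child = {w: [math.cos(theta) * x - math.sin(theta) * y,
             math.sin(theta) * x + math.cos(theta) * y]
         for w, (x, y) in adult.items()}
print(rsa_score(adult, child, words))
```

A rotation preserves every cosine, so the two toy models share the same similarity structure and score at (numerically) 1. In practice, one would compare, for example, a child-speech model and a child-directed-speech model trained on the same age slice.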

IR · Jul 21, 2015

Random mappings designed for commercial search engines

Roger Donaldson, Arijit Gupta, Yaniv Plan et al.

We give a practical random mapping that takes any set of documents represented as vectors in Euclidean space and maps them to a sparse subset of the Hamming cube while retaining the ordering of inter-vector inner products. Once documents are represented in the sparse space, it is natural to index them with commercial text-based search engines, which are specialized to exploit this sparse, discrete structure for large-scale document retrieval. We give a theoretical analysis of the mapping scheme, characterizing its exact asymptotic behavior and giving non-asymptotic bounds, which we verify through numerical simulations. We balance the theoretical treatment with several practical considerations that allow a substantial speedup of the method. We further illustrate the method on search over two real data sets: a corpus of images represented by their color histograms, and a corpus of daily stock market index values.
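The abstract does not spell out the mapping itself. The sketch below assumes one simple construction of this kind: project each vector with a Gaussian random matrix and keep the indices of the k largest projected coordinates as the support of a sparse 0/1 vector. It may not match the paper's scheme. The inner product of two such 0/1 vectors is the size of their shared support, which a text search engine can score if each index is emitted as a token. The helper names and the `t<i>` token format are hypothetical.

```python
import random


def make_projection(dim_in, dim_out, seed=0):
    """Gaussian random matrix with dim_out rows and dim_in columns (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]


def sparse_code(x, proj, k):
    """Support of a sparse 0/1 vector: indices of the k largest projected coordinates."""
    scores = [sum(a * b for a, b in zip(row, x)) for row in proj]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return frozenset(top)


def overlap(code_a, code_b):
    """Inner product of two 0/1 vectors: the number of shared nonzero positions."""
    return len(code_a & code_b)


def to_tokens(code):
    """Render a code as a whitespace-separated 'document' for a text search engine."""
    return " ".join(f"t{i}" for i in sorted(code))


# Usage: rank a few toy documents against a query by code overlap.
proj = make_projection(dim_in=3, dim_out=64, seed=42)
docs = {"a": [1.0, 0.9, 0.1], "b": [0.0, -1.0, 0.5], "c": [-1.0, -0.8, 0.0]}
query = [1.0, 1.0, 0.0]
q_code = sparse_code(query, proj, k=8)
ranking = sorted(docs, key=lambda d: overlap(q_code, sparse_code(docs[d], proj, k=8)),
                 reverse=True)
print(ranking)
print(to_tokens(q_code))
```

Keeping only the top-k indices makes every code exactly k-sparse. This suits an inverted index. It also makes the code invariant to positive rescaling of the input vector.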


2 Papers