CL, Mar 13, 2025

Using Context to Improve Word Segmentation

arXiv:2503.10023v1

Originality: Synthesis-oriented

AI Analysis

This work addresses how infants might learn to segment words from speech. It is incremental: it extends existing models without introducing new methods.

The study tackles word segmentation in language acquisition by implementing a unigram model and a bigram model. The bigram model outperformed the unigram model at predicting word segmentation, consistent with prior research.

An important step in understanding how children acquire language is studying how infants learn word segmentation. Previous research has established that infants may use statistical regularities in speech to learn word segmentation. Goldwater et al. demonstrated that incorporating context into models improves their ability to learn word segmentation. We implemented two of their models, a unigram model and a bigram model, to examine how context can improve statistical word segmentation. The results are consistent with our hypothesis that the bigram model outperforms the unigram model at predicting word segmentation. Extending the work of Goldwater et al., we also explored basic ways to model how young children might use previously learned words to segment new utterances.
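
To make the unigram/bigram contrast concrete, the following is a minimal sketch of what "context" adds when scoring a candidate word sequence. It is not the implementation described in the abstract and not the models of Goldwater et al.: the toy corpus, the add-alpha smoothing, the `ALPHA` value, and all function names are assumptions chosen for illustration. A unigram model scores each word independently, so it cannot tell word orders apart; a bigram model conditions each word on the previous one.

```python
import math
from collections import Counter

# Hypothetical toy corpus of already-segmented utterances (illustration only).
corpus = [
    ["the", "dog"],
    ["the", "cat"],
    ["a", "dog"],
    ["the", "dog", "ran"],
]

BOUNDARY = "<s>"  # assumed utterance-start symbol
ALPHA = 0.1       # assumed add-alpha smoothing constant

# Count single words, and (previous word, word) pairs with their contexts.
unigram_counts = Counter(w for utt in corpus for w in utt)
bigram_counts = Counter()
context_counts = Counter()
for utt in corpus:
    prev = BOUNDARY
    for w in utt:
        bigram_counts[(prev, w)] += 1
        context_counts[prev] += 1
        prev = w

total_tokens = sum(unigram_counts.values())
vocab_size = len(unigram_counts) + 1  # +1 reserves probability mass for unseen words


def unigram_logprob(word):
    """Smoothed log P(word), ignoring context."""
    return math.log((unigram_counts[word] + ALPHA) / (total_tokens + ALPHA * vocab_size))


def bigram_logprob(prev, word):
    """Smoothed log P(word | prev)."""
    return math.log(
        (bigram_counts[(prev, word)] + ALPHA) / (context_counts[prev] + ALPHA * vocab_size)
    )


def score_unigram(words):
    """Log-probability of a word sequence under the unigram sketch."""
    return sum(unigram_logprob(w) for w in words)


def score_bigram(words):
    """Log-probability of a word sequence under the bigram sketch."""
    total, prev = 0.0, BOUNDARY
    for w in words:
        total += bigram_logprob(prev, w)
        prev = w
    return total


if __name__ == "__main__":
    for candidate in (["the", "dog"], ["dog", "the"]):
        print(candidate, score_unigram(candidate), score_bigram(candidate))
```

In this sketch the unigram score is identical for both orderings, while the bigram score prefers the ordering seen in the toy corpus. This only illustrates sensitivity to context; it does not by itself reproduce the segmentation results reported above.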
