A learning perspective on the emergence of abstractions: the curious case of phonemes
This research addresses the fundamental question of how abstract linguistic units emerge from raw sensory input, which is significant for understanding language acquisition in linguistically untrained language users.
This paper investigates whether abstract phonemes can emerge from speech sound exposure using Memory-Based Learning (MBL) and Error-Correction Learning (ECL) models. The study found that ECL models can learn abstractions, reliably identifying part of the phone inventory and its grouping into traditional types from the input.
In the present paper we use a range of modeling techniques to investigate whether an abstract phone could emerge from exposure to speech sounds. In effect, the study represents an attempt to operationalize a theoretical device of Usage-based Linguistics: the emergence of an abstraction from language use. Our quest focuses on the simplest of such hypothesized abstractions. We test two opposing principles regarding the development of language knowledge in linguistically untrained language users: Memory-Based Learning (MBL) and Error-Correction Learning (ECL). A process of generalization underlies the abstractions linguists operate with, and we probed whether MBL and ECL could give rise to a type of language knowledge that resembles linguistic abstractions. Each model was presented with a significant amount of pre-processed speech produced by one speaker. We assessed the consistency or stability of what these simple models have learned and their ability to give rise to abstract categories. The two types of models fare differently with regard to these tests. We show that ECL models can learn abstractions and that at least part of the phone inventory and its grouping into traditional types can be reliably identified from the input.
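To make the contrast between the two learning principles concrete, the following is a minimal sketch under stated assumptions, not the models used in the study. It assumes MBL can be illustrated by k-nearest-neighbour lookup over stored exemplars and ECL by a delta-rule (Widrow-Hoff style) update of cue-to-outcome weights; the text above specifies neither choice. The two-dimensional "acoustic" features, the labels `a` and `i`, and all function names are hypothetical and invented for illustration.

```python
import math
import random


def make_toy_data(n_per_class=50, seed=0):
    """Made-up data: two hypothetical 'phone' classes as noisy 2-D cue vectors."""
    rng = random.Random(seed)
    centers = {"a": (1.0, 0.0), "i": (0.0, 1.0)}
    data = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            cues = (cx + rng.gauss(0, 0.2), cy + rng.gauss(0, 0.2))
            data.append((cues, label))
    rng.shuffle(data)
    return data


def mbl_classify(memory, x, k=3):
    """Memory-based sketch: keep every exemplar, vote among the k nearest."""
    nearest = sorted(memory, key=lambda ex: math.dist(ex[0], x))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def ecl_train(data, labels=("a", "i"), rate=0.1, epochs=5):
    """Error-correction sketch: adjust weights in proportion to prediction error."""
    n_cues = len(data[0][0]) + 1  # input cues plus a constant bias cue
    weights = {lab: [0.0] * n_cues for lab in labels}
    for _ in range(epochs):
        for x, label in data:
            cues = list(x) + [1.0]
            for lab in labels:
                target = 1.0 if lab == label else 0.0
                predicted = sum(w * c for w, c in zip(weights[lab], cues))
                error = target - predicted
                weights[lab] = [
                    w + rate * error * c for w, c in zip(weights[lab], cues)
                ]
    return weights


def ecl_classify(weights, x):
    """Pick the outcome with the highest summed cue support."""
    cues = list(x) + [1.0]
    return max(
        weights, key=lambda lab: sum(w * c for w, c in zip(weights[lab], cues))
    )


def accuracy(classify, items):
    """Proportion of items whose predicted label matches the stored label."""
    return sum(classify(x) == y for x, y in items) / len(items)


data = make_toy_data()
train, test = data[:80], data[80:]
weights = ecl_train(train)
print("MBL accuracy:", accuracy(lambda x: mbl_classify(train, x), test))
print("ECL accuracy:", accuracy(lambda x: ecl_classify(weights, x), test))
```

The design difference is what matters here: the memory-based learner retains all exemplars and generalizes only at lookup time, whereas the error-correction learner discards the exemplars and keeps only a small set of weights shaped by its prediction errors. Whether either representation counts as an abstraction is the question the paper itself tests; this toy example does not settle it.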