I Have No Mouth, and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2
This work addresses how large language models represent information internally, a question relevant to researchers in AI and linguistics. Its contribution is incremental, since it focuses on a single model and a single task.
The study investigates how Llama-3.2-1B-Instruct represents token-level phonetic information. Its results suggest that the model uses a rich internal model of phonemes for tasks like rhyming and, without direct supervision, learns a model of vowels similar to the standard IPA vowel chart, though with notable differences.
Large language models demonstrate proficiency on phonetic tasks, such as rhyming, without explicit phonetic or auditory grounding. In this work, we investigate how \verb|Llama-3.2-1B-Instruct| represents token-level phonetic information. Our results suggest that Llama uses a rich internal model of phonemes to complete phonetic tasks. We provide evidence for high-level organization of phoneme representations in its latent space. In doing so, we also identify a ``phoneme mover head'' that promotes phonetic information during rhyming tasks. We visualize the output space of this head and find that, while notable differences exist, Llama learns a model of vowels similar to the standard IPA vowel chart for humans, despite receiving no direct supervision to do so.