Toshiki Nakai

CL
h-index: 17
Papers: 3
Citations: 5
Novelty: 43%
AI Score: 41

3 Papers

CL · Apr 2
Generation-Step-Aware Framework for Cross-Modal Representation and Control in Multilingual Speech-Text Models

Toshiki Nakai, Varsha Suresh, Vera Demberg

Multilingual speech-text models rely on cross-modal language alignment to transfer knowledge between speech and text, but it remains unclear whether this reflects shared computation for the same language or modality-specific processing. We introduce a generation-step-aware framework for evaluating cross-modal computation that (i) identifies language-selective neurons for each modality at different decoding steps, (ii) decomposes them into language-representation and language-control roles, and (iii) enables cross-modal comparison via overlap measures and causal intervention, including cross-modal steering of output language. Applying our framework to SeamlessM4T v2, we find that cross-modal language alignment is strongest at the first decoding step, where language-representation neurons are shared across modalities, but weakens as generation proceeds, indicating a shift toward modality-specific autoregressive processing. In contrast, language-control neurons identified from speech transfer causally to text generation, revealing partially shared circuitry for output-language control that strengthens at later decoding steps. These results show that cross-modal processing is both time- and function-dependent, providing a more nuanced view of multilingual computation in speech-text models.
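The abstract does not say how language-selective neurons are scored or which overlap measure is used. As a rough illustration only, the sketch below assumes a mean-activation-difference score with top-k selection, and Jaccard (intersection-over-union) overlap between the speech-derived and text-derived neuron sets. The function names and data layout are hypothetical and are not the authors' code.

```python
from typing import Dict, List, Set

# Hypothetical layout: language -> list of activation vectors (one per example),
# all recorded at the same decoding step, with the same number of neurons.
Activations = Dict[str, List[List[float]]]


def column_means(rows: List[List[float]]) -> List[float]:
    """Mean activation of each neuron over a list of examples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def select_language_neurons(acts: Activations, target: str, k: int) -> Set[int]:
    """Assumed criterion: rank neurons by (mean on target) - (mean on all other languages)."""
    target_mean = column_means(acts[target])
    others = [row for lang, rows in acts.items() if lang != target for row in rows]
    other_mean = column_means(others)
    scores = [t - o for t, o in zip(target_mean, other_mean)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Intersection-over-union of two neuron index sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy activations at one decoding step: the German-selective neurons differ by modality.
    speech = {"de": [[5, 4, 0, 0], [5, 4, 0, 0]], "fr": [[0, 0, 0, 1], [0, 0, 0, 1]]}
    text = {"de": [[0, 4, 5, 0]], "fr": [[0, 0, 0, 1]]}
    s = select_language_neurons(speech, "de", k=2)
    t = select_language_neurons(text, "de", k=2)
    print(sorted(s), sorted(t), round(jaccard(s, t), 3))  # [0, 1] [1, 2] 0.333
```

To make such an analysis generation-step-aware, one would repeat the selection on activations collected separately at each decoding step and compare the resulting overlaps across steps.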

CL · Jul 27, 2025
What Language(s) Does Aya-23 Think In? How Multilinguality Affects Internal Language Representations

Katharina Trinley, Toshiki Nakai, Tatiana Anikina et al.

Large language models (LLMs) excel at multilingual tasks, yet their internal language processing remains poorly understood. We analyze how Aya-23-8B, a decoder-only LLM trained on balanced multilingual data, handles code-mixed, cloze, and translation tasks compared to predominantly monolingual models like Llama 3 and Chinese-LLaMA-2. Using logit lens and neuron specialization analyses, we find: (1) Aya-23 activates typologically related language representations during translation, unlike English-centric models that rely on a single pivot language; (2) code-mixed neuron activation patterns vary with mixing rates and are shaped more by the base language than the mixed-in one; and (3) Aya-23's language-specific neurons for code-mixed inputs concentrate in final layers, diverging from prior findings on decoder-only models. Neuron overlap analysis further shows that script similarity and typological relations impact processing across model types. These findings reveal how multilingual training shapes LLM internals and inform future cross-lingual transfer research.
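The logit lens decodes an intermediate hidden state by passing it through the model's final normalization and unembedding matrix, as if that layer were the last one. The pure-Python sketch below is a minimal illustration, not the authors' analysis code. It assumes an RMSNorm-style final norm (substitute whatever normalization the model actually uses), toy weights, and a hypothetical token-to-language tagging used to sum probability mass per language.

```python
import math
from typing import Dict, List


def rms_norm(h: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """RMS normalization with a learned per-dimension scale (assumed final norm)."""
    rms = math.sqrt(sum(x * x for x in h) / len(h) + eps)
    return [w * x / rms for w, x in zip(weight, h)]


def logit_lens(h: List[float], norm_weight: List[float], unembed: List[List[float]]) -> List[float]:
    """Project an intermediate hidden state to a next-token distribution.

    unembed has one row per vocabulary item (vocab_size x hidden_size).
    """
    z = rms_norm(h, norm_weight)
    logits = [sum(w * x for w, x in zip(row, z)) for row in unembed]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mass_per_language(probs: List[float], token_lang: List[str]) -> Dict[str, float]:
    """Sum probability over the tokens tagged with each language (hypothetical tagging)."""
    mass: Dict[str, float] = {}
    for p, lang in zip(probs, token_lang):
        mass[lang] = mass.get(lang, 0.0) + p
    return mass


if __name__ == "__main__":
    # Toy model: hidden size 2, vocabulary of 3 tokens.
    unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    probs = logit_lens([1.0, 0.0], [1.0, 1.0], unembed)
    print(mass_per_language(probs, ["en", "de", "de"]))
```

Applied to the hidden state at every layer, this shows which languages' tokens each intermediate layer favors.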

CL · Oct 3, 2025
TRepLiNa: Layer-wise CKA+REPINA Alignment Improves Low-Resource Machine Translation in Aya-23 8B

Toshiki Nakai, Ravi Kiran Chikkala, Lena Sophie Oberkircher et al.

The 2025 Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo) Language Challenge addresses one of India's most pressing linguistic gaps: the lack of resources for its diverse low-resource languages (LRLs). In this study, we investigate whether enforcing cross-lingual similarity in specific internal layers of a decoder-only multilingual large language model (LLM) can improve translation quality from LRLs into a high-resource language (HRL). Specifically, we combine Centered Kernel Alignment (CKA), a representation-similarity measure that, used as a training objective, encourages representations of different languages to align, with REPINA, a regularization method that constrains parameter updates to remain close to the pretrained model, into a joint method we call TRepLiNa. We experiment with zero-shot, few-shot, and fine-tuning settings using Aya-23 8B with QLoRA across the MMLoSo shared-task language pairs (Mundari, Santali, Bhili) with Hindi/English pivots. Our results show that aligning mid-level layers using TRepLiNa (CKA+REPINA) is a low-cost, practical approach to improving LRL translation, especially in data-scarce settings.
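Linear CKA compares two representations of the same n inputs through their centered Gram matrices, CKA(X, Y) = ⟨K_X, K_Y⟩_F / (‖K_X‖_F ‖K_Y‖_F) with K_X = X_c X_cᵀ. It is invariant to orthogonal transforms and isotropic scaling, which is what makes 1 - CKA usable as an alignment penalty. The pure-Python sketch below assumes the linear kernel, sentence-level representations of parallel LRL/HRL inputs taken at one layer, and a loss of the form 1 - CKA. The abstract does not confirm these choices, the function names are hypothetical, and the REPINA term is omitted.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = examples (e.g. parallel sentences), columns = features


def center_columns(m: Matrix) -> Matrix:
    """Subtract each feature's mean over the examples."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[x - mu for x, mu in zip(row, means)] for row in m]


def gram(m: Matrix) -> Matrix:
    """Example-by-example inner products (linear kernel)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


def frobenius_inner(a: Matrix, b: Matrix) -> float:
    """Sum of elementwise products of two equally shaped matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between two representations of the same n examples.

    Feature widths may differ; row i of x and row i of y must describe the same example.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of examples")
    kx = gram(center_columns(x))
    ky = gram(center_columns(y))
    denom = math.sqrt(frobenius_inner(kx, kx) * frobenius_inner(ky, ky))
    if denom == 0.0:
        raise ValueError("CKA is undefined for constant representations")
    return frobenius_inner(kx, ky) / denom


def cka_alignment_loss(x: Matrix, y: Matrix) -> float:
    """Hypothetical loss term: 0 when the two representations are perfectly aligned."""
    return 1.0 - linear_cka(x, y)


if __name__ == "__main__":
    lrl = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]  # toy layer states, LRL side
    hrl = [[2 * b + 1.0, 2 * a - 3.0] for a, b in lrl]  # column-swapped, scaled, shifted copy
    print(round(linear_cka(lrl, hrl), 6))  # 1.0
```

In actual fine-tuning, the same quantity would be computed on framework tensors so that it is differentiable; this list-based version only makes the arithmetic easy to check.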