CL · Sep 15, 2025

SENSE models: an open source solution for multilingual and multimodal semantic-based tasks

arXiv:2509.12093v1 · 6 citations · h-index: 11 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work provides an incremental improvement for researchers and practitioners in natural language processing and speech processing by offering an open-source tool for semantic alignment across languages and modalities.

The paper addresses multilingual and multimodal semantic tasks by introducing SENSE, an open-source model that updates the SAMU-XLSR framework with a stronger teacher text model and a better initial speech encoder. The authors report highly competitive performance in their experiments.

This paper introduces SENSE (Shared Embedding for N-lingual Speech and tExt), an open-source solution inspired by the SAMU-XLSR framework and conceptually similar to Meta AI's SONAR models. These approaches rely on a teacher-student framework to align a self-supervised speech encoder with the language-agnostic continuous representations of a text encoder at the utterance level. We describe how the original SAMU-XLSR method has been updated by selecting a stronger teacher text model and a better initial speech encoder. The source code for training and using SENSE models has been integrated into the SpeechBrain toolkit, and the first SENSE model we trained has been publicly released. We report experimental results on multilingual and multimodal semantic tasks, where our SENSE model achieves highly competitive performance. Finally, this study offers new insights into how semantics are captured in such semantically aligned speech encoders.
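The teacher-student alignment described in the abstract can be illustrated with a minimal sketch. It assumes the speech encoder's frame-level outputs are mean-pooled into a single utterance vector and that the training objective is the cosine distance to the frozen teacher's text embedding; the abstract states neither choice, so both are assumptions, and the function names (`mean_pool`, `cosine_distance`, `alignment_loss`) are hypothetical and not part of SpeechBrain or the SENSE code.

```python
import math


def mean_pool(frames):
    """Average frame-level vectors into one utterance-level vector.

    Assumption: simple mean pooling; the actual model may pool differently.
    """
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def cosine_distance(u, v):
    """Return 1 - cos(u, v); 0 means the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def alignment_loss(speech_frames, teacher_text_embedding):
    """Utterance-level loss between the student (speech) and teacher (text).

    Assumption: the teacher embedding is precomputed and kept frozen, so only
    the speech side would receive gradients in a real training loop.
    """
    student_embedding = mean_pool(speech_frames)
    return cosine_distance(student_embedding, teacher_text_embedding)


# Toy example: two 2-dimensional speech frames and one text embedding.
frames = [[1.0, 0.0], [0.0, 1.0]]
teacher = [1.0, 1.0]
loss = alignment_loss(frames, teacher)
```

In this toy case the pooled speech vector points in the same direction as the teacher embedding, so the loss is approximately zero. A real implementation would compute the same quantity on batched tensors with automatic differentiation.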

