CL · MM · SD · Nov 13, 2025

HI-TransPA: Hearing Impairments Translation Personal Assistant

arXiv:2511.09915v2 · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses daily communication challenges for hearing-impaired people by providing an assistive technology solution, though it appears incremental as it builds on existing multimodal and curriculum learning approaches.

The paper tackles communication barriers for hearing-impaired individuals by introducing HI-TransPA, an audio-visual personal assistant that fuses indistinct speech with lip dynamics for translation and dialogue. It reports state-of-the-art performance in literal accuracy and semantic fidelity on the HI-Dialogue dataset.

Hearing-impaired individuals often face significant barriers in daily communication due to the inherent challenges of producing clear speech. To address this, we introduce the Omni-Model paradigm into assistive technology and present HI-TransPA, an instruction-driven audio-visual personal assistant. The model fuses indistinct speech with lip dynamics, enabling both translation and dialogue within a single multimodal framework. To address the distinctive pronunciation patterns of hearing-impaired speech and the limited adaptability of existing models, we develop a multimodal preprocessing and curation pipeline that detects facial landmarks, stabilizes the lip region, and quantitatively evaluates sample quality. These quality scores guide a curriculum learning strategy that first trains on clean, high-confidence samples and progressively incorporates harder cases to strengthen model robustness. Architecturally, we employ a novel unified 3D-Resampler to efficiently encode the lip dynamics, which are critical for accurate interpretation. Experiments on the purpose-built HI-Dialogue dataset show that HI-TransPA achieves state-of-the-art performance in both literal accuracy and semantic fidelity. Our work establishes a foundation for applying Omni-Models to assistive communication technology, providing an end-to-end modeling framework and essential processing tools for future research.
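
The quality-guided curriculum described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Sample` type, the `quality` field (assumed to lie in [0, 1], higher meaning cleaner), the `curriculum_stages` function, and the equal-sized three-stage schedule are all hypothetical choices made for illustration. The abstract only states that training starts on clean, high-confidence samples and progressively adds harder ones.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: str
    quality: float  # hypothetical quality score in [0, 1]; higher means cleaner


def curriculum_stages(samples, num_stages=3):
    """Return cumulative training pools, cleanest samples first.

    Stage 1 holds only the highest-quality samples; each later stage
    adds the next-hardest slice, so the final stage covers all samples.
    The equal-sized split is an assumption, not taken from the paper.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ranked = sorted(samples, key=lambda s: s.quality, reverse=True)
    stages = []
    for stage in range(1, num_stages + 1):
        # Ceiling division so the final stage always includes every sample.
        cutoff = -(-len(ranked) * stage // num_stages)
        stages.append(ranked[:cutoff])
    return stages


if __name__ == "__main__":
    data = [
        Sample("a", 0.9),
        Sample("b", 0.2),
        Sample("c", 0.7),
        Sample("d", 0.5),
        Sample("e", 0.4),
    ]
    for i, pool in enumerate(curriculum_stages(data), start=1):
        print(f"stage {i}: {[s.sample_id for s in pool]}")
```

The pools are cumulative rather than disjoint, so clean samples stay in the training mix while harder cases are introduced. A threshold-based schedule on the quality score would be an equally plausible reading of the abstract.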

