CL | SD | AS | Jun 4, 2025

A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions

arXiv:2506.04077v2 | h-index: 1 | Slate
Originality: Incremental advance
AI Analysis

This work addresses low-resource constraints in automated speaking assessment for opinion expressions, enabling more reliable scoring with cross-modal information, though it is incremental in its approach.

The paper addresses automated speaking assessment on opinion expressions, where labeled recordings are scarce. It proposes a training paradigm that uses LLMs and text-to-speech synthesis to generate diverse responses, and it reports improved performance on the LTTC dataset over methods relying on real data or conventional augmentation.

Automated speaking assessment (ASA) on opinion expressions is often hampered by the scarcity of labeled recordings, which restricts prompt diversity and undermines scoring reliability. To address this challenge, we propose a novel training paradigm that leverages a large language model (LLM) to generate diverse responses of a given proficiency level, converts responses into synthesized speech via speaker-aware text-to-speech synthesis, and employs a dynamic importance loss to adaptively reweight training instances based on feature distribution differences between synthesized and real speech. Subsequently, a multimodal large language model integrates aligned textual features with speech signals to predict proficiency scores directly. Experiments conducted on the LTTC dataset show that our approach outperforms methods relying on real data or conventional augmentation, effectively mitigating low-resource constraints and enabling ASA on opinion expressions with cross-modal information.
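
The abstract does not give the form of the dynamic importance loss, so the following is only a minimal sketch of the general idea of reweighting synthesized instances by how closely their features match real speech. Everything in it is an assumption made for illustration: the function names `importance_weights` and `weighted_cross_entropy`, the `temperature` parameter, the use of a real-feature centroid, and the Gaussian-style kernel are hypothetical and are not taken from the paper.

```python
import math

def importance_weights(synth_feats, real_feats, temperature=1.0):
    """Hypothetical weighting: synthetic instances near the real-feature centroid get larger weights."""
    dim = len(real_feats[0])
    centroid = [sum(f[d] for f in real_feats) / len(real_feats) for d in range(dim)]
    # Squared Euclidean distance of each synthetic feature vector to the centroid.
    dists = [sum((f[d] - centroid[d]) ** 2 for d in range(dim)) for f in synth_feats]
    raw = [math.exp(-dist / temperature) for dist in dists]
    mean_raw = sum(raw) / len(raw)
    # Normalise so the weights average to 1 and the overall loss scale is preserved.
    return [r / mean_raw for r in raw]

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of the negative log-likelihood of the correct class."""
    losses = [-math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)

# Toy example (assumed data): two real and two synthetic 2-D feature vectors.
real = [[0.0, 0.0], [2.0, 0.0]]
synth = [[1.0, 0.0], [4.0, 0.0]]   # the first lies on the real centroid, the second is far away
weights = importance_weights(synth, real)
probs = [[0.5, 0.5], [0.5, 0.5]]   # predicted class probabilities per synthetic instance
loss = weighted_cross_entropy(probs, [0, 1], weights)
```

In this sketch the weights would be recomputed for each batch as the feature extractor changes during training, which is one possible reading of "dynamic"; the paper may define it differently.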
