SD AI AS | Oct 10, 2025

Serial-Parallel Dual-Path Architecture for Speaking Style Recognition

Guojian Li, Qijie Shao, Zhixian Zhao, Shuiyuan Wang, Zhonghua Fu, Lei Xie

arXiv:2510.11732v1 | 4.0 | h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses the limitation of existing approaches that rely primarily on linguistic information, offering a more efficient and accurate method for recognizing speaking styles in speech processing.

The paper proposes a serial-parallel dual-path architecture for speaking style recognition that integrates acoustic and linguistic information. Compared to the OSUM baseline, it reduces parameter size by 88.4% and improves recognition accuracy on eight styles by 30.3% on the test set.

Speaking Style Recognition (SSR) identifies a speaker's speaking style characteristics from speech. Existing style recognition approaches primarily rely on linguistic information, with limited integration of acoustic information, which restricts recognition accuracy improvements. The fusion of acoustic and linguistic modalities offers significant potential to enhance recognition performance. In this paper, we propose a novel serial-parallel dual-path architecture for SSR that leverages acoustic-linguistic bimodal information. The serial path follows the ASR+STYLE serial paradigm, reflecting a sequential temporal dependency, while the parallel path integrates our designed Acoustic-Linguistic Similarity Module (ALSM) to facilitate cross-modal interaction with temporal simultaneity. Compared to the existing SSR baseline, the OSUM model, our approach reduces parameter size by 88.4% and achieves a 30.3% improvement in SSR accuracy for eight styles on the test set.
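
The abstract does not describe how the ALSM works internally, so the following is only an illustrative sketch of one common way to let acoustic and linguistic representations interact through similarity. It assumes both modalities have already been projected to vectors of the same dimension, uses cosine similarity, and appends a similarity-weighted mix of token vectors to each acoustic frame. The function names (`similarity_matrix`, `fuse`) and the toy inputs are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(acoustic, linguistic):
    """Rows index acoustic frames, columns index linguistic tokens."""
    return [[cosine(frame, token) for token in linguistic] for frame in acoustic]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(acoustic, linguistic):
    """Append to each acoustic frame a similarity-weighted mix of token vectors.

    Assumption: this stands in for cross-modal interaction in general;
    it is not the paper's ALSM.
    """
    dim = len(linguistic[0])
    fused = []
    for frame, row in zip(acoustic, similarity_matrix(acoustic, linguistic)):
        weights = softmax(row)
        context = [
            sum(w * token[d] for w, token in zip(weights, linguistic))
            for d in range(dim)
        ]
        fused.append(list(frame) + context)
    return fused


# Toy data (hypothetical): 3 acoustic frames and 2 token embeddings, 2 dims each.
acoustic = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
linguistic = [[1.0, 0.0], [0.0, 1.0]]

sim = similarity_matrix(acoustic, linguistic)
fused = fuse(acoustic, linguistic)
```

In this sketch each fused frame carries both its own acoustic vector and a linguistic context vector, so a downstream style classifier would see the two modalities side by side at every time step. A real system would use learned projections and encoders in place of the fixed toy vectors.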
