SD, AI, AS | Oct 10, 2025

Serial-Parallel Dual-Path Architecture for Speaking Style Recognition

arXiv:2510.11732v1 | h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the limitation of existing approaches that rely primarily on linguistic information, offering a more efficient and accurate method for recognizing speaking styles in speech processing.

The paper tackles the problem of speaking style recognition by proposing a serial-parallel dual-path architecture that integrates acoustic and linguistic information, resulting in an 88.4% reduction in parameter size and a 30.3% improvement in accuracy for eight styles compared to the baseline.

Speaking Style Recognition (SSR) identifies a speaker's speaking style characteristics from speech. Existing style recognition approaches rely primarily on linguistic information, with limited integration of acoustic information, which limits further gains in recognition accuracy. The fusion of acoustic and linguistic modalities offers significant potential to enhance recognition performance. In this paper, we propose a novel serial-parallel dual-path architecture for SSR that leverages acoustic-linguistic bimodal information. The serial path follows the ASR+STYLE serial paradigm, reflecting a sequential temporal dependency, while the parallel path integrates our designed Acoustic-Linguistic Similarity Module (ALSM) to facilitate cross-modal interaction with temporal simultaneity. Compared to the OSUM model, the existing SSR baseline, our approach reduces parameter size by 88.4% and achieves a 30.3% improvement in SSR accuracy for eight styles on the test set.
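
The abstract does not specify how the ALSM computes cross-modal similarity, so the following is only a minimal sketch of one common way to compare two modalities: a cosine-similarity matrix between acoustic frame embeddings and linguistic token embeddings. The function name `cosine_similarity_matrix`, the toy vectors, and the choice of cosine similarity are all assumptions made for illustration. They are not taken from the paper and should not be read as its method.

```python
import math


def cosine_similarity_matrix(acoustic, linguistic):
    """Hypothetical helper (not from the paper).

    acoustic:   list of acoustic frame embeddings (each a list of floats)
    linguistic: list of linguistic token embeddings of the same dimension
    Returns S where S[i][j] is the cosine similarity between
    acoustic frame i and linguistic token j.
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    sim = []
    for a in acoustic:
        row = []
        for t in linguistic:
            dot = sum(x * y for x, y in zip(a, t))
            denom = norm(a) * norm(t)
            # Guard against zero vectors to avoid division by zero.
            row.append(dot / denom if denom > 0 else 0.0)
        sim.append(row)
    return sim


# Toy inputs, chosen only to make the result easy to check by hand.
acoustic_frames = [[1.0, 0.0], [0.0, 1.0]]
linguistic_tokens = [[1.0, 0.0], [1.0, 1.0]]
S = cosine_similarity_matrix(acoustic_frames, linguistic_tokens)
```

In a sketch like this, each row of `S` describes how one acoustic frame relates to every linguistic token, which is one plausible form of the cross-modal interaction the parallel path is described as providing. How the paper actually builds, pools, or fuses such similarities with the serial ASR+STYLE path is not stated in the abstract.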

