AS, CL, LG, SD · Sep 18, 2025

Breathing and Semantic Pause Detection and Exertion-Level Classification in Post-Exercise Speech

Yuyu Wang, Wuyue Xia, Huaxiu Yao, Jingping Nie

arXiv:2509.15473v1 · 2.32 citations · h-index: 8 · Proceedings of the 3rd ACM International Workshop on Intelligent Acoustic Systems and Applications

Originality: Synthesis-oriented

AI Analysis

This work addresses a gap in research on pause detection in post-exercise speech, a task relevant to assessing recovery and lung function. It is incremental, however, because it builds on an existing dataset and established models.

The paper tackles the detection of breathing and semantic pauses and the classification of exertion levels in post-exercise speech, reaching up to 89% accuracy for semantic-pause detection and 90.5% for exertion-level classification.

Post-exercise speech contains rich physiological and linguistic cues, often marked by semantic pauses, breathing pauses, and combined breathing-semantic pauses. Detecting these events enables assessment of recovery rate, lung function, and exertion-related abnormalities. However, existing work on identifying and distinguishing different types of pauses in this context is limited. In this work, building on a recently released dataset with synchronized audio and respiration signals, we provide systematic annotations of pause types. Using these annotations, we conduct exploratory breathing and semantic pause detection and exertion-level classification across deep learning models (GRU, 1D CNN-LSTM, AlexNet, VGG16), acoustic features (MFCC, MFB), and layer-stratified Wav2Vec2 representations. We evaluate three setups (single feature, feature fusion, and a two-stage detection-classification cascade) under both classification and regression formulations. Results show per-type detection accuracy of up to 89% for semantic, 55% for breathing, and 86% for combined pauses, and 73% overall, while exertion-level classification achieves 90.5% accuracy, outperforming prior work.
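The abstract names a "two-stage detection-classification cascade" and reports per-type and overall accuracy, but does not spell out either. The sketch below shows one plausible reading, under stated assumptions. Stage 1 decides whether a segment is a pause at all; stage 2 assigns a pause type only to segments that stage 1 flagged. Per-type accuracy is taken here as the fraction of reference pauses of each type that receive the correct label. The `Segment` fields (`energy`, `breath_noise`, `clause_boundary`), the thresholds, and all function names are hypothetical. The simple threshold rules stand in for the learned models (GRU, CNN-LSTM, etc.) the paper actually uses, and this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

PAUSE_TYPES = ("semantic", "breathing", "combined")


@dataclass
class Segment:
    # Hypothetical per-segment features; a real system would derive these from
    # MFCC/MFB frames or Wav2Vec2 embeddings rather than hand-set scalars.
    energy: float          # mean frame energy (low during any pause)
    breath_noise: float    # hypothetical score for audible inhalation
    clause_boundary: bool  # hypothetical flag: segment ends a linguistic unit


def detect_pause(seg: Segment, energy_thresh: float = 0.1) -> bool:
    """Stage 1 (hypothetical rule): flag a pause when energy falls below a threshold."""
    return seg.energy < energy_thresh


def classify_pause(seg: Segment, breath_thresh: float = 0.5) -> str:
    """Stage 2 (hypothetical rule): assign a pause type to a flagged segment."""
    breathing = seg.breath_noise >= breath_thresh
    if breathing and seg.clause_boundary:
        return "combined"
    if breathing:
        return "breathing"
    return "semantic"


def cascade(seg: Segment) -> Optional[str]:
    """Run stage 2 only if stage 1 fires; None means 'no pause detected'."""
    return classify_pause(seg) if detect_pause(seg) else None


def per_type_accuracy(
    preds: Sequence[Optional[str]], labels: Sequence[Optional[str]]
) -> Dict[str, float]:
    """Share of reference pauses of each type labelled correctly, plus an overall score."""
    scores: Dict[str, float] = {}
    correct_total, pause_total = 0, 0
    for t in PAUSE_TYPES:
        idx = [i for i, y in enumerate(labels) if y == t]
        if idx:
            correct = sum(preds[i] == t for i in idx)
            scores[t] = correct / len(idx)
            correct_total += correct
            pause_total += len(idx)
    if pause_total:
        scores["overall"] = correct_total / pause_total
    return scores


# Toy usage with made-up segments (not data from the paper).
segments = [
    Segment(0.05, 0.8, True),   # combined pause
    Segment(0.04, 0.9, False),  # breathing pause
    Segment(0.06, 0.1, True),   # semantic pause
    Segment(0.50, 0.0, False),  # ordinary speech, no pause
    Segment(0.30, 0.7, False),  # breathing pause that stage 1 misses
]
labels = ["combined", "breathing", "semantic", None, "breathing"]
preds = [cascade(s) for s in segments]
print(preds)                            # ['combined', 'breathing', 'semantic', None, None]
print(per_type_accuracy(preds, labels))
```

The last toy segment illustrates the main trade-off of a cascade. A pause that stage 1 misses can never be recovered by stage 2, so a detection error carries straight through to the per-type score (here, breathing accuracy drops to 0.5 and overall to 0.75).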

