AS CL Jul 21, 2023

Topic Identification For Spontaneous Speech: Enriching Audio Features With Embedded Linguistic Information

Dejan Porjazovski, Tamás Grósz, Mikko Kurimo

arXiv:2307.11450v11.21 citations h-index: 35 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of degraded performance in topic identification for spontaneous speech in low-resource settings, offering incremental improvements over traditional text-based methods.

The paper tackled topic identification from spontaneous speech in low-resource scenarios by comparing audio-only and hybrid models, finding that hybrid multi-modal solutions achieved the best results while audio-only methods were viable when ASR was unavailable.

Traditional topic identification solutions from audio rely on an automatic speech recognition (ASR) system to produce transcripts used as input to a text-based model. These approaches work well in high-resource scenarios, where there are sufficient data to train both components of the pipeline. However, in low-resource situations, the ASR system, even if available, produces low-quality transcripts, which in turn lead to a poor text-based classifier. Moreover, spontaneous speech containing hesitations can further degrade the performance of the ASR model. In this paper, we investigate alternatives to the standard text-only solutions by comparing audio-only and hybrid techniques that jointly utilise text and audio features. The models, evaluated on spontaneous Finnish speech, demonstrate that purely audio-based solutions are a viable option when ASR components are not available, while the hybrid multi-modal solutions achieve the best results.
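To make the idea of jointly utilising text and audio features concrete, the following is a minimal sketch of one common fusion strategy: concatenating an utterance-level audio embedding with a text embedding and passing the result to a linear classifier. The abstract does not state how the paper combines the two modalities, so concatenation is an assumption here, and every name (`fuse_features`, `classify`), the embedding sizes, the number of topics, and the random weights are hypothetical placeholders rather than the authors' implementation.

```python
import math
import random


def fuse_features(audio_vec, text_vec):
    """Concatenate an audio embedding with a text embedding (assumed fusion)."""
    return list(audio_vec) + list(text_vec)


def softmax(scores):
    """Turn raw scores into probabilities, shifting by the max for stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases):
    """Apply one linear layer followed by softmax over the topic classes."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Placeholder embeddings; in practice these would come from an audio encoder
# and from a text encoder applied to the ASR transcript.
rng = random.Random(0)
audio_vec = [rng.gauss(0, 1) for _ in range(4)]
text_vec = [rng.gauss(0, 1) for _ in range(3)]

fused = fuse_features(audio_vec, text_vec)

# Untrained, randomly initialised classifier over a hypothetical 5 topics.
num_topics = 5
weights = [[rng.gauss(0, 0.1) for _ in fused] for _ in range(num_topics)]
biases = [0.0] * num_topics

probs = classify(fused, weights, biases)
predicted_topic = max(range(num_topics), key=probs.__getitem__)
```

In an audio-only setting, where no ASR transcript is available, the same kind of classifier could be trained on `audio_vec` alone, with the weight rows sized to the audio embedding.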
