CL SD AS Apr 12, 2021

CNN Encoding of Acoustic Parameters for Prominence Detection

arXiv:2104.05488v3, 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses prominence detection for evaluating children's reading fluency, but it appears incremental as it builds on existing methods with new architectures.

The paper addresses the detection of prominent words in children's oral reading using acoustic-prosodic and lexico-syntactic features. It replaces a random forest predictor with an RNN sequence classifier and uses deep learning to extract word-level features from low-level acoustic contours. No concrete performance numbers are provided.

Expressive reading, considered the defining attribute of oral reading fluency, comprises the prosodic realization of phrasing and prominence. In the context of evaluating oral reading, it helps to establish the speaker's comprehension of the text. We consider a labeled dataset of children's reading recordings for the speaker-independent detection of prominent words using acoustic-prosodic and lexico-syntactic features. A previous well-tuned random forest ensemble predictor is replaced by an RNN sequence classifier to exploit potential context dependency across the longer utterance. Further, deep learning is applied to obtain word-level features from low-level acoustic contours of fundamental frequency, intensity and spectral shape in an end-to-end fashion. Performance comparisons are presented across the different feature types and across different feature learning architectures for prominent word prediction to draw insights wherever possible.
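
The pipeline described in the abstract (low-level contours encoded into word-level features, then a sequence classifier over the words of an utterance) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. The filter count, kernel width, hidden size, max-over-time pooling, the plain Elman-style recurrence, and the names `encode_word` and `rnn_prominence` are all assumptions chosen for illustration. The weights are random and untrained, so the outputs carry no meaning beyond showing the data flow and shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_word(contours, kernels):
    """Encode one word's contours into a fixed-size vector (hypothetical helper).

    contours: (n_frames, n_channels) array, e.g. F0, intensity, spectral shape.
    kernels:  (n_filters, width, n_channels) 1-D convolution filters.
    Returns a (n_filters,) vector: ReLU convolution followed by max over time.
    """
    n_filters, width, n_channels = kernels.shape
    if contours.shape[0] < width:
        # Zero-pad very short words so at least one window exists.
        pad = np.zeros((width - contours.shape[0], n_channels))
        contours = np.vstack([contours, pad])
    n_windows = contours.shape[0] - width + 1
    windows = np.stack([contours[t:t + width] for t in range(n_windows)])
    activations = np.maximum(0.0, np.einsum("twc,fwc->tf", windows, kernels))
    return activations.max(axis=0)

def rnn_prominence(word_vectors, W_in, W_rec, w_out):
    """Run a simple recurrent pass over the words; return one score per word."""
    h = np.zeros(W_rec.shape[0])
    probs = []
    for x in word_vectors:
        h = np.tanh(W_in @ x + W_rec @ h)
        probs.append(1.0 / (1.0 + np.exp(-(w_out @ h))))
    return np.array(probs)

# Toy utterance: 4 words of varying duration, 3 contour channels (assumed).
frame_counts = [12, 30, 7, 21]
utterance = [rng.normal(size=(n, 3)) for n in frame_counts]

n_filters, width, hidden = 8, 5, 16  # illustrative sizes, not from the paper
kernels = rng.normal(scale=0.1, size=(n_filters, width, 3))
W_in = rng.normal(scale=0.1, size=(hidden, n_filters))
W_rec = rng.normal(scale=0.1, size=(hidden, hidden))
w_out = rng.normal(scale=0.1, size=hidden)

word_vectors = np.stack([encode_word(w, kernels) for w in utterance])
probs = rnn_prominence(word_vectors, W_in, W_rec, w_out)
print(word_vectors.shape, probs.shape)
```

Pooling over time is one way to map words of different durations to vectors of equal size, which is what lets the recurrent step treat each word as a single sequence element.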

