CL · SD · AS · Mar 2, 2018

Age Group Classification with Speech and Metadata Multimodality Fusion

arXiv:1803.00721v1
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of customizing TV experiences for children. It appears incremental, since it builds on existing methods by adding user metadata.

The paper tackles the problem of identifying children in TV audiences from short audio commands by combining speech with user metadata. It reports a 9.2% absolute improvement over the baseline, which the authors describe as state-of-the-art performance.

Children comprise a significant proportion of TV viewers, and it is worthwhile to customize the experience for them. However, identifying who in the audience is a child can be challenging. Identifying gender and age from audio commands is a well-studied problem, but good accuracy remains very difficult to achieve when utterances are typically only a couple of seconds long. We present initial studies of a novel method that combines utterances with user metadata. In particular, we develop an ensemble of different machine learning techniques on different subsets of data to improve child detection. Our initial results show a 9.2% absolute improvement over the baseline, leading to state-of-the-art performance.
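The abstract does not specify how the ensemble combines its members. A minimal sketch, assuming a simple late-fusion scheme (not necessarily the authors' method), is for each base model, such as one built on speech features and one on user metadata, to output a probability that the speaker is a child, and for those scores to be combined by a weighted average and thresholded. The model names, weights, scores, and threshold below are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModelOutput:
    """Score from one hypothetical base model for a single utterance."""
    name: str
    p_child: float  # probability that the speaker is a child, in [0, 1]
    weight: float = 1.0  # hypothetical trust placed in this model


def fuse_late(outputs: Sequence[ModelOutput]) -> float:
    """Late fusion: weighted average of the per-model child probabilities."""
    total_weight = sum(o.weight for o in outputs)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(o.weight * o.p_child for o in outputs) / total_weight


def is_child(outputs: Sequence[ModelOutput], threshold: float = 0.5) -> bool:
    """Binary child/adult decision from the fused score."""
    return fuse_late(outputs) >= threshold


# Hypothetical scores for one short voice command: the speech model is
# unsure (short utterance), while the metadata model leans towards "child".
outputs = [
    ModelOutput("speech_model", p_child=0.25, weight=1.0),
    ModelOutput("metadata_model", p_child=0.75, weight=3.0),
]
print(fuse_late(outputs))  # 0.625
print(is_child(outputs))   # True
```

Late fusion keeps each model independent, so a metadata model can be added or reweighted without retraining the speech model. Feature-level (early) fusion, which concatenates speech and metadata features into a single classifier, is the usual alternative.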
