Infant Cry Detection Using Causal Temporal Representation
This work addresses data scarcity in infant cry detection for infant care applications, but the contribution is incremental because it builds on existing acoustic event detection methods.
The paper tackles infant cry detection in noisy environments with two contributions: an annotated dataset for cry segmentation, which enables supervised models to reach state-of-the-art performance, and an unsupervised method, CRSTC, which addresses data scarcity. Integrating the detected cry segments improves downstream cry classification.
This paper addresses a major challenge in acoustic event detection, in particular infant cry detection in the presence of other sounds and background noise: the lack of precisely annotated data. We present two contributions for supervised and unsupervised infant cry detection. The first is an annotated dataset for cry segmentation, which enables supervised models to achieve state-of-the-art performance. Additionally, we propose a novel unsupervised method, Causal Representation Sparse Transition Clustering (CRSTC), based on causal temporal representation, which helps address the issue of data scarcity more generally. By integrating the detected cry segments, we significantly improve the performance of downstream infant cry classification, highlighting the potential of this approach for infant care applications.