SD | AI | Dec 18, 2025

Domain-Agnostic Causal-Aware Audio Transformer for Infant Cry Classification

arXiv:2512.16271v1
h-index: 21
2025 8th International Conference of Computer and Informatics Engineering (IC2IE)
Originality: Incremental advance
AI Analysis

This work addresses the need for reliable and interpretable infant cry analysis for neonatal distress detection, offering incremental improvements in robustness and generalization for clinical applications.

The paper tackles robust infant cry classification by proposing DACH-TIC, a domain-agnostic, causal-aware hierarchical audio transformer. It improves accuracy by 2.6 percent and macro-F1 by 2.2 points over state-of-the-art baselines, and it keeps the domain performance gap on unseen acoustic environments to 2.4 percent.
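As a reminder of what the two reported metrics measure, here is a minimal pure-Python sketch of accuracy and macro-F1. The class labels in the usage note are illustrative only and are not taken from the paper's label set. Macro-F1 averages per-class F1 with equal weight, so rare cry classes count as much as common ones.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with hypothetical labels `y_true = ["hunger", "hunger", "pain", "pain"]` and `y_pred = ["hunger", "pain", "pain", "pain"]`, the per-class F1 scores are 2/3 and 4/5, so macro-F1 is 11/15 while accuracy is 3/4.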

Accurate and interpretable classification of infant cry paralinguistics is essential for early detection of neonatal distress and clinical decision support. However, many existing deep learning methods rely on correlation-driven acoustic representations, which makes them vulnerable to noise, spurious cues, and domain shifts across recording environments. We propose DACH-TIC, a Domain-Agnostic Causal-Aware Hierarchical Audio Transformer for robust infant cry classification. The model integrates causal attention, hierarchical representation learning, multi-task supervision, and adversarial domain generalization within a unified framework. DACH-TIC employs a structured transformer backbone with local token-level and global semantic encoders, augmented by causal attention masking and controlled perturbation training to approximate counterfactual acoustic variations. A domain-adversarial objective promotes environment-invariant representations, while multi-task learning jointly optimizes cry type recognition, distress intensity estimation, and causal relevance prediction. The model is evaluated on the Baby Chillanto and Donate-a-Cry datasets, with ESC-50 environmental noise overlays for domain augmentation. Experimental results show that DACH-TIC outperforms state-of-the-art baselines, including HTS-AT and SE-ResNet Transformer, achieving improvements of 2.6 percent in accuracy and 2.2 points in macro-F1 score, alongside enhanced causal fidelity. The model generalizes effectively to unseen acoustic environments, with a domain performance gap of only 2.4 percent, demonstrating its suitability for real-world neonatal acoustic monitoring systems.
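The abstract mentions overlaying ESC-50 environmental noise on cry recordings for domain augmentation but does not state how the mixing is done. One common approach, shown here as a sketch under that assumption, scales the noise so the mixture hits a target signal-to-noise ratio (SNR). The function names, the tiling of short noise clips, and the choice of SNR are hypothetical and not taken from the paper.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Overlay noise on signal so the result has the requested SNR in dB.

    Assumption: noise shorter than the signal is tiled, longer noise is cropped.
    """
    repeats = len(signal) // len(noise) + 1
    noise = (list(noise) * repeats)[: len(signal)]

    p_signal = mean_power(signal)
    p_noise = mean_power(noise)
    if p_noise == 0:
        return list(signal)  # silent noise clip: nothing to add

    # Solve p_signal / (scale^2 * p_noise) = 10^(snr_db / 10) for scale.
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]
```

Sampling `snr_db` from a range during training would expose the model to a spread of simulated recording environments; the specific range is a design choice that the source does not report.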

