SD · AI · MM · AS · Oct 28, 2024

Knowledge Distillation for Real-Time Classification of Early Media in Voice Communications

arXiv:2410.21478v1 · 1 citation · h-index: 1 · MASCOTS
Originality: Incremental advance
AI Analysis

The paper addresses the industrial need for efficient early media classification in voice communications, but the advance is incremental: it builds on existing knowledge distillation and gradient-boosting techniques.

The paper tackles real-time classification of early media in voice calls. It proposes gradient-boosted trees trained with knowledge distillation and class aggregation, and reports accuracy comparable to existing audio tagging models with substantially better runtime performance. Results are given for a proprietary dataset and a publicly available dataset, along with a case study at a regional data center in India.

This paper investigates the industrial setting of real-time classification of early media exchanged during the initialization phase of voice calls. We explore the application of state-of-the-art audio tagging models and highlight some limitations when applied to the classification of early media. While most existing approaches leverage convolutional neural networks, we propose a novel approach for low-resource requirements based on gradient-boosted trees. Our approach not only demonstrates a substantial improvement in runtime performance, but also exhibits comparable accuracy. We show that leveraging knowledge distillation and class aggregation techniques to train a simpler and smaller model accelerates the classification of early media in voice calls. We provide a detailed analysis of the results on a proprietary dataset and a publicly available dataset, regarding accuracy and runtime performance. We additionally report a case study of the achieved performance improvements at a regional data center in India.
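
The combination of class aggregation and knowledge distillation described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the class names, the `AGGREGATION` mapping, the toy one-dimensional features, and the teacher probabilities are all hypothetical. The student is a tiny gradient-boosted ensemble of decision stumps fitted with squared loss on the teacher's aggregated soft labels; a production system would more plausibly use a dedicated gradient-boosting library and real audio features.

```python
# Hypothetical mapping from fine-grained teacher classes to coarse student classes.
AGGREGATION = {
    "ringback_tone": "ringing",
    "music": "ringing",
    "busy_tone": "not_reachable",
    "voicemail_announcement": "not_reachable",
}

def aggregate(teacher_probs):
    """Class aggregation: sum teacher probabilities within each coarse class."""
    coarse = {}
    for fine_class, p in teacher_probs.items():
        key = AGGREGATION[fine_class]
        coarse[key] = coarse.get(key, 0.0) + p
    return coarse

def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimizes squared error.

    Assumes at least one feature takes two or more distinct values.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    return best[1:]

def fit_boosted(X, y, rounds=20, lr=0.5):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        j, thr, lval, rval = fit_stump(X, residuals)
        stumps.append((j, thr, lval, rval))
        pred = [p + lr * (lval if row[j] <= thr else rval)
                for row, p in zip(X, pred)]
    return base, lr, stumps

def predict(model, row):
    """Student score for the 'ringing' class (approximates the teacher's soft label)."""
    base, lr, stumps = model
    return base + sum(lr * (lval if row[j] <= thr else rval)
                      for j, thr, lval, rval in stumps)

# Toy data: one made-up feature per sample and made-up teacher outputs.
X = [[0.1], [0.2], [0.8], [0.9]]
teacher_outputs = [
    {"ringback_tone": 0.7, "music": 0.2, "busy_tone": 0.05, "voicemail_announcement": 0.05},
    {"ringback_tone": 0.5, "music": 0.3, "busy_tone": 0.1, "voicemail_announcement": 0.1},
    {"ringback_tone": 0.1, "music": 0.1, "busy_tone": 0.5, "voicemail_announcement": 0.3},
    {"ringback_tone": 0.05, "music": 0.05, "busy_tone": 0.6, "voicemail_announcement": 0.3},
]

# Distillation targets: the teacher's aggregated probability of "ringing".
targets = [aggregate(probs)["ringing"] for probs in teacher_outputs]
model = fit_boosted(X, targets)
```

Training on the teacher's soft probabilities rather than hard labels is what makes this distillation: the student inherits the teacher's confidence on ambiguous samples, while aggregation shrinks the label space so that a much smaller model can cover it.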

