Kemal Altwlkany

h-index10
2papers

2 Papers

SDOct 28, 2024
Knowledge Distillation for Real-Time Classification of Early Media in Voice Communications

Kemal Altwlkany, Hadžem Hadžić, Amar Kurić et al.

This paper investigates the industrial setting of real-time classification of early media exchanged during the initialization phase of voice calls. We explore the application of state-of-the-art audio tagging models and highlight some limitations when applied to the classification of early media. While most existing approaches leverage convolutional neural networks, we propose a novel approach for low-resource requirements based on gradient-boosted trees. Our approach not only demonstrates a substantial improvement in runtime performance, but also exhibits a comparable accuracy. We show that leveraging knowledge distillation and class aggregation techniques to train a simpler and smaller model accelerates the classification of early media in voice calls. We provide a detailed analysis of the results on a proprietary and publicly available dataset, regarding accuracy and runtime performance. We additionally report a case study of the achieved performance improvements at a regional data center in India.

SDAug 15, 2025
Pretrained Conformers for Audio Fingerprinting and Retrieval

Kemal Altwlkany, Elmedin Selmanovic, Sead Delalic

Conformers have shown great results in speech processing due to their ability to capture both local and global interactions. In this work, we utilize a self-supervised contrastive learning framework to train conformer-based encoders that are capable of generating unique embeddings for small segments of audio, generalizing well to previously unseen data. We achieve state-of-the-art results for audio retrieval tasks while using only 3 seconds of audio to generate embeddings. Our models are almost completely immune to temporal misalignments and achieve state-of-the-art results in cases of other audio distortions such as noise, reverb or extreme temporal stretching. Code and models are made publicly available and the results are easy to reproduce as we train and test using popular and freely available datasets of different sizes.