MM · CR · Apr 30

RoboKA: KAN Informed Multimodal Learning for RoboCall Surveillance System

Nitin Choudhury, Nikhil Kumar, Aditya Kumar Sinha, Abhijeet Anand, Hossein Salemi, Orchid Chetia Phukan, Hemant Purohit, Arun Balaji Buduru

arXiv:2605.00156 · 24.3 · h-index: 12

Predicted impact: top 87% in MM · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the lack of public datasets for robocall surveillance and provides a novel method for detecting adversarial robocalls, but the results are on synthetic data only.

The authors created Robo-SAr, a synthetic robocall dataset with ~1400 samples across three adversarial axes, and proposed RoboKA, a KAN-based multimodal fusion framework that outperforms baselines in recall and F1-score for robocall surveillance.

Broad exploration of robocall surveillance research is hindered by limited access to public datasets, owing to privacy concerns. In this work, we first curate Robo-SAr, a synthetic robocall dataset designed for robocall surveillance research. Robo-SAr comprises ~200 unwanted and ~1200 legitimate synthetic robocall samples across three realistic adversarial axes: psycholinguistics-manipulated transcripts, emotion-eliciting speech, and cloned voices. We further propose RoboKA, a Kolmogorov-Arnold Network (KAN)-based multimodal fusion framework designed to model structured nonlinear interactions between acoustic and linguistic cues that characterize diverse adversarial robocall strategies. RoboKA first leverages cross-modal contrastive learning to align latent modality representations and then feeds the resulting embeddings to a KAN projection head for final classification. We benchmark RoboKA against strong unimodal and multimodal baselines in both in-domain and out-of-domain setups and find that RoboKA surpasses all baselines in recall and F1-score.
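The cross-modal alignment step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which contrastive objective RoboKA uses, so the code below assumes a common symmetric InfoNCE (CLIP-style) loss over paired audio and text embeddings. The function names, the temperature value, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax_row(row):
    # Numerically stable log-softmax over one row of similarities.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def symmetric_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    # Assumed objective: symmetric InfoNCE, where audio_emb[i] and
    # text_emb[i] come from the same call and form the positive pair.
    a = [l2_normalize(v) for v in audio_emb]
    t = [l2_normalize(v) for v in text_emb]
    n = len(a)
    # sim[i][j]: temperature-scaled cosine similarity of audio i and text j.
    sim = [[sum(x * y for x, y in zip(a[i], t[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Audio-to-text direction: each audio clip should pick its own transcript.
    a2t = -sum(log_softmax_row(sim[i])[i] for i in range(n)) / n
    # Text-to-audio direction: same computation on the transposed matrix.
    sim_t = [list(col) for col in zip(*sim)]
    t2a = -sum(log_softmax_row(sim_t[j])[j] for j in range(n)) / n
    return 0.5 * (a2t + t2a)

# Toy embeddings (hypothetical): matched pairs versus swapped pairs.
audio = [[1.0, 0.0], [0.0, 1.0]]
text_matched = [[1.0, 0.0], [0.0, 1.0]]
text_swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = symmetric_contrastive_loss(audio, text_matched)
loss_swapped = symmetric_contrastive_loss(audio, text_swapped)
```

Under this assumed objective, the loss is small when each audio embedding is closest to its own transcript embedding and large when the pairs are swapped. In a pipeline like the one the abstract describes, the aligned embeddings would then be passed to the classification head.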
