SDAI | Nov 14, 2025

MSMT-FN: Multi-segment Multi-task Fusion Network for Marketing Audio Classification

arXiv:2511.11006v1 | h-index: 8 | ADMA
Originality: Incremental advance
AI Analysis

The paper addresses a business demand for analyzing customer attitudes in marketing phone calls, but it appears incremental because it builds on existing audio classification methods.

The paper tackles the problem of efficiently categorizing customer purchasing propensity from marketing audio data. It proposes the MSMT-FN model, which is reported to consistently outperform or match state-of-the-art methods on the proprietary MarketCalls dataset and on established benchmarks.

Audio classification plays an essential role in sentiment analysis and emotion recognition, especially for analyzing customer attitudes in marketing phone calls. Efficiently categorizing customer purchasing propensity from large volumes of audio data remains challenging. In this work, we propose a novel Multi-Segment Multi-Task Fusion Network (MSMT-FN) designed specifically to address this business demand. Evaluations on our proprietary MarketCalls dataset, as well as on established benchmarks (CMU-MOSI, CMU-MOSEI, and MELD), show that MSMT-FN consistently outperforms or matches state-of-the-art methods. Additionally, our newly curated MarketCalls dataset will be available upon request, and the code base is available in the MSMT-FN GitHub repository, to facilitate further research and advancements in the audio classification domain.
