SD AI LG AS · Nov 4, 2023

TACNET: Temporal Audio Source Counting Network

Amirreza Ahmadnejad, Ahmad Mahmmodian Darviishani, Mohmmad Mehrdad Asadi, Sajjad Saffariyeh, Pedram Yousef, Emad Fatemizadeh

arXiv:2311.02369v1 · 5.84 citations · h-index: 3

Originality: Incremental advance

AI Analysis

The paper presents what it describes as a state-of-the-art solution for audio source counting, with cross-lingual applications, though the advance appears incremental because it builds on existing methods for a known bottleneck.

The paper tackles the problem of counting audio sources in real time by introducing TaCNet, which operates directly on raw audio without preprocessing and reports an average accuracy of 74.18% across 11 classes on the LibriCount dataset.

In this paper, we introduce the Temporal Audio Source Counting Network (TaCNet), an innovative architecture that addresses limitations in audio source counting tasks. TaCNet operates directly on raw audio inputs, eliminating complex preprocessing steps and simplifying the workflow. Notably, it excels in real-time speaker counting, even with truncated input windows. Our extensive evaluation, conducted using the LibriCount dataset, underscores TaCNet's exceptional performance, positioning it as a state-of-the-art solution for audio source counting tasks. With an average accuracy of 74.18% over 11 classes, TaCNet demonstrates its effectiveness across diverse scenarios, including applications involving Chinese and Persian languages. This cross-lingual adaptability highlights its versatility and potential impact.
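
The abstract does not say how the "average accuracy over 11 classes" is computed. One plausible reading is the mean of the per-class accuracies, with each class being a speaker count. The sketch below illustrates that reading only. The function names are hypothetical, the 0 to 10 label range is an assumption about LibriCount, and none of this is taken from the authors' evaluation code.

```python
# Minimal sketch: mean of per-class accuracies for a speaker-counting task.
# Assumption: labels are integer speaker counts 0..10 (11 classes).

NUM_CLASSES = 11  # assumed label range 0..10


def per_class_accuracy(y_true, y_pred, num_classes=NUM_CLASSES):
    """Return accuracy for each class; None for classes absent from y_true."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return [c / n if n else None for c, n in zip(correct, total)]


def average_accuracy(y_true, y_pred, num_classes=NUM_CLASSES):
    """Average the per-class accuracies over the classes that occur in y_true."""
    accs = [a for a in per_class_accuracy(y_true, y_pred, num_classes)
            if a is not None]
    return sum(accs) / len(accs)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    y_true = [0, 0, 1, 1, 2]
    y_pred = [0, 1, 1, 1, 2]
    print(f"average accuracy: {average_accuracy(y_true, y_pred):.4f}")
```

If the reported figure is instead the overall fraction of correctly classified clips, the two numbers coincide only when every class has the same number of test examples.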
