Mar 5, 2023

Heterogeneous Graph Learning for Acoustic Event Classification

arXiv:2303.02665v2 | 6 citations | h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses the difficulty of modeling audiovisual data with heterogeneous graphs for acoustic event classification, offering an incremental improvement over existing methods.

The paper replaces manual graph construction for audiovisual data with two components: a parametric construction strategy for intra-modal edges and learned crossmodal edges. The resulting model reports a state-of-the-art 0.53 mean average precision on AudioSet.

Heterogeneous graphs provide a compact, efficient, and scalable way to model data involving multiple disparate modalities. This makes modeling audiovisual data using heterogeneous graphs an attractive option. However, graph structure does not appear naturally in audiovisual data. Graphs for audiovisual data are constructed manually, which is both difficult and sub-optimal. In this work, we address this problem by (i) proposing a parametric graph construction strategy for the intra-modal edges, and (ii) learning the crossmodal edges. To this end, we develop a new model, the heterogeneous graph crossmodal network (HGCN), which learns the crossmodal edges. Our proposed model can adapt to various spatial and temporal scales owing to its parametric construction, while the learnable crossmodal edges effectively connect the relevant nodes across modalities. Experiments on a large benchmark dataset (AudioSet) show that our model is state-of-the-art (0.53 mean average precision), outperforming transformer-based models and other graph-based models.
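The abstract does not give the construction rules, so the following is only a minimal sketch of the two ideas under stated assumptions, not the paper's HGCN. It assumes each modality is a sequence of segment nodes, that intra-modal edges are controlled by two hypothetical parameters (`span` and `dilation`), and that crossmodal edge weights can be represented as a row-wise softmax over scaled dot-product similarities. In a trained model those similarities would come from learnable projections; here the features are random placeholders.

```python
import numpy as np

def intra_modal_edges(num_nodes, span, dilation=1):
    """Link node i to nodes i + k*dilation for k = 1..span, in both directions.

    `span` and `dilation` are hypothetical parameters chosen for illustration.
    """
    edges = []
    for i in range(num_nodes):
        for k in range(1, span + 1):
            j = i + k * dilation
            if j < num_nodes:
                edges.append((i, j))
                edges.append((j, i))
    return edges

def soft_crossmodal_adjacency(audio_feats, video_feats):
    """Return an (n_audio, n_video) matrix of edge weights; each row sums to 1.

    Assumption: weights are a softmax over scaled dot-product similarities.
    """
    scores = audio_feats @ video_feats.T / np.sqrt(audio_feats.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

# Placeholder features: 6 audio nodes and 4 video nodes, 8 dimensions each.
rng = np.random.default_rng(0)
audio = rng.normal(size=(6, 8))
video = rng.normal(size=(4, 8))

audio_edges = intra_modal_edges(6, span=2, dilation=1)
cross = soft_crossmodal_adjacency(audio, video)
```

Changing `span` or `dilation` changes the temporal scale the intra-modal graph covers without hand-editing edges, which is the kind of adaptability the abstract attributes to a parametric construction.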

Code Implementations: 1 repo