SDLGAS | Jul 28, 2025

Combolutional Neural Networks

arXiv:2507.21202v1 | h-index: 46 | WASPAA
Originality: Incremental advance
AI Analysis

The paper addresses the need for efficient, interpretable audio frontends in machine learning, particularly for tasks that require precise harmonic analysis. The contribution appears incremental, however, because it builds on existing convolutional methods.

The paper tackles the problem of selecting inductive biases for audio machine learning by proposing the combolutional layer: a learned-delay IIR comb filter with a fused envelope detector that extracts harmonic features in the time domain. The results show that it is an effective replacement for convolutional layers in tasks such as piano transcription, speaker classification, and key detection. Additional benefits include a low parameter count, efficient CPU inference, strictly real-valued computation, and improved interpretability.
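As background, here is standard comb-filter theory. It is not taken from the paper, whose exact parameterisation may differ. A feedback (IIR) comb filter with delay D samples and feedback gain alpha resonates at every integer multiple of f_s / D. This is why tuning a single delay picks out a fundamental frequency together with all of its harmonics:

```latex
y[n] = x[n] + \alpha\, y[n - D], \qquad
H(z) = \frac{1}{1 - \alpha z^{-D}}, \qquad |\alpha| < 1

% For 0 < \alpha < 1, the magnitude response peaks where e^{-j\omega D} = 1:
\omega_k = \frac{2\pi k}{D}
\;\Longrightarrow\;
f_k = k\,\frac{f_s}{D}, \quad k = 0, 1, 2, \dots
\qquad
\left|H\!\left(e^{j\omega_k}\right)\right| = \frac{1}{1 - \alpha}
```

Under this model, learning D amounts to learning which fundamental frequency a channel responds to. For example, f_s = 16 kHz and D = 80 samples give peaks at 200 Hz, 400 Hz, 600 Hz, and so on.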

Selecting appropriate inductive biases is an essential step in the design of machine learning models, especially when working with audio, where even short clips may contain millions of samples. To this end, we propose the combolutional layer: a learned-delay IIR comb filter and fused envelope detector, which extracts harmonic features in the time domain. We demonstrate the efficacy of the combolutional layer on three information retrieval tasks, evaluate its computational cost relative to other audio frontends, and provide efficient implementations for training. We find that the combolutional layer is an effective replacement for convolutional layers in audio tasks where precise harmonic analysis is important, e.g., piano transcription, speaker classification, and key detection. Additionally, the combolutional layer has several other key benefits over existing frontends, namely: low parameter count, efficient CPU inference, strictly real-valued computations, and improved interpretability.
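The following is a minimal, hypothetical sketch of one combolutional channel in pure Python. It assumes a feedback comb filter y[n] = x[n] + alpha * y[n - delay], using linear interpolation to support fractional (learnable) delays. The envelope detector is assumed to be full-wave rectification followed by a one-pole low-pass. The names `comb_filter`, `envelope`, and `combolutional_channel` and the smoothing coefficient `beta` are illustrative assumptions, not the paper's code. The paper's filter form, interpolation scheme, and fused detector may all differ, and this sketch runs the two stages as separate passes for readability rather than fusing them.

```python
import math
from typing import List, Sequence


def comb_filter(x: Sequence[float], delay: float, alpha: float) -> List[float]:
    """Feedback (IIR) comb filter: y[n] = x[n] + alpha * y[n - delay].

    Assumption: a non-integer `delay` is realised by linearly interpolating
    between the two neighbouring past outputs, so the output varies smoothly
    with `delay`. Keep |alpha| < 1 for stability.
    """
    if delay < 1.0:
        raise ValueError("delay must be at least one sample")
    d_int = int(math.floor(delay))
    frac = delay - d_int
    y: List[float] = []
    for n, xn in enumerate(x):
        # Outputs before the start of the signal are treated as zero.
        y_a = y[n - d_int] if n - d_int >= 0 else 0.0
        y_b = y[n - d_int - 1] if n - d_int - 1 >= 0 else 0.0
        delayed = (1.0 - frac) * y_a + frac * y_b
        y.append(xn + alpha * delayed)
    return y


def envelope(y: Sequence[float], beta: float = 0.99) -> List[float]:
    """Assumed envelope detector: full-wave rectify, then one-pole low-pass."""
    env: List[float] = []
    state = 0.0
    for v in y:
        state = beta * state + (1.0 - beta) * abs(v)
        env.append(state)
    return env


def combolutional_channel(
    x: Sequence[float], delay: float, alpha: float, beta: float = 0.99
) -> List[float]:
    """Hypothetical single channel: comb filter followed by envelope detector."""
    return envelope(comb_filter(x, delay, alpha), beta)


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
    # delay = fs / f0 = 40 samples tunes the comb to 200 Hz and its harmonics.
    on_tune = combolutional_channel(tone, delay=40.0, alpha=0.9)
    # A comb tuned to 300 Hz (fractional delay of about 26.67 samples).
    off_tune = combolutional_channel(tone, delay=fs / 300, alpha=0.9)
    print(on_tune[-1], off_tune[-1])
```

Here a 200 Hz tone passed through a comb tuned to 200 Hz produces a much larger envelope than the same tone through a comb tuned to 300 Hz. This harmonic selectivity is what the layer relies on. In a trainable version, `delay` and `alpha` would be parameters updated by gradient descent, which the linear interpolation is meant to make possible.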
