AS | LG | SD | May 6, 2021

Point Cloud Audio Processing

arXiv:2105.02469v2 | 5 citations
AI Analysis

This work addresses the fixed-input-size bottleneck in audio processing, enabling more flexible and more compact models for researchers and practitioners. The contribution is incremental, since it adapts existing point cloud methods to audio.

The paper tackles the problem of audio machine learning models being constrained by fixed-dimensional input representations, which limits their adaptability to different sampling rates or representations. It introduces a point cloud approach that achieves invariance to representation parameters and results in smaller models with minimal performance loss from input subsampling.

Most audio processing pipelines involve transformations that act on fixed-dimensional input representations of audio. For example, when using the Short Time Fourier Transform (STFT), the DFT size specifies a fixed dimension for the input representation. As a consequence, most audio machine learning models are designed to process fixed-size vector inputs, which often prohibits repurposing learned models on audio with different sampling rates or alternative representations. We note, however, that the intrinsic spectral information in the audio signal is invariant to the choice of the input representation or the sampling rate. Motivated by this, we introduce a novel way of processing audio signals by treating them as a collection of points in feature space, and we use point cloud machine learning models that give us invariance to the choice of representation parameters, such as the DFT size or the sampling rate. Additionally, we observe that these methods result in smaller models and allow us to significantly subsample the input representation with minimal effect on a trained model's performance.
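To make the "collection of points in feature space" idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the helper names `stft_point_cloud` and `subsample` are hypothetical, and the choice of (time in seconds, frequency in Hz, magnitude) as point coordinates is an assumption made for illustration. It shows that the same tone, analysed at two sampling rates with two DFT sizes, yields point sets of different sizes that live in the same physical coordinate space, and that a cloud can be randomly subsampled.

```python
import numpy as np

def stft_point_cloud(signal, sample_rate, dft_size, hop=None):
    """Return an (N, 3) array of points: (time [s], frequency [Hz], magnitude).

    Hypothetical helper: the point layout is an illustrative assumption.
    """
    hop = hop or dft_size // 4
    window = np.hanning(dft_size)
    n_frames = 1 + (len(signal) - dft_size) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + dft_size] * window
                       for i in range(n_frames)])
    mags = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, dft_size // 2 + 1)
    # Express bin and frame indices in physical units (Hz, seconds), so the
    # coordinates do not depend on the DFT size or the sampling rate.
    times = (np.arange(n_frames) * hop + dft_size / 2) / sample_rate
    freqs = np.fft.rfftfreq(dft_size, d=1.0 / sample_rate)
    t, f = np.meshgrid(times, freqs, indexing="ij")
    return np.stack([t.ravel(), f.ravel(), mags.ravel()], axis=1)

def subsample(points, keep, rng):
    """Keep a random subset of `keep` points (no replacement)."""
    idx = rng.choice(len(points), size=keep, replace=False)
    return points[idx]

def tone(freq_hz, sample_rate, seconds=1.0):
    """Generate a unit-amplitude sinusoid."""
    n = np.arange(int(sample_rate * seconds))
    return np.sin(2 * np.pi * freq_hz * n / sample_rate)

# The same 440 Hz tone under two different representation settings.
cloud_a = stft_point_cloud(tone(440.0, 16000), sample_rate=16000, dft_size=512)
cloud_b = stft_point_cloud(tone(440.0, 8000), sample_rate=8000, dft_size=1024)

# Frequency coordinate of the strongest point in each cloud.
peak_a = cloud_a[np.argmax(cloud_a[:, 2]), 1]
peak_b = cloud_b[np.argmax(cloud_b[:, 2]), 1]

# Randomly drop most of the points from the first cloud.
sparse_a = subsample(cloud_a, keep=2048, rng=np.random.default_rng(0))

print(cloud_a.shape, cloud_b.shape, sparse_a.shape)
print(peak_a, peak_b)
```

Both clouds place their strongest point near 440 Hz even though they contain different numbers of points. A model that consumes an unordered set of such points, rather than a fixed-size vector, can therefore in principle accept either one. The magnitudes here are left unnormalised, so comparing them across DFT sizes would require an additional scaling step.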
