SD · AS · Jun 3

nnAudio 2: Overcoming Dynamic Compilation Barriers and Transform Inconsistencies

Abhinaba Roy, Junyi Liang, Dorien Herremans

arXiv:2606.05394 · 38.2 · Has Code

Predicted impact: top 21% in SD · last 90 days
Originality: Synthesis-oriented

AI Analysis

This is an incremental update for researchers using nnAudio, resolving compatibility issues with current software stacks.

nnAudio, an audio feature extraction toolbox, was updated to fix TorchScript compilation failures in STFT/iSTFT and inverse-transform inconsistencies, ensuring robust differentiable audio analysis in modern PyTorch environments.

nnAudio is an open-source audio feature extraction toolbox for deep learning, but its use in current environments is hindered by TorchScript incompatibilities, inverse-transform edge cases, and dependency drift. We present a targeted update for current PyTorch and scientific Python stacks. We resolve TorchScript compilation failures in STFT and iSTFT by removing dynamic state mutation and module construction from scripted code paths and by tightening argument handling in inverse-related helpers. We clarify inverse-STFT behavior by restricting reliable inversion to the uniform-bin setting (freq_scale='no') and raising explicit runtime errors for unsupported frequency scales, which prevents silently degraded reconstructions. We restore CFP compatibility with modern SciPy and ensure that VQT reduces to CQT when gamma = 0. Regression tests cover the new STFT/iSTFT behaviors, and the updated codebase passes the full repository test suite in a modern Python environment. These improvements provide a more robust foundation for differentiable audio analysis in research and deployment.
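The claim that VQT reduces to CQT at gamma = 0 can be illustrated with a minimal sketch. It assumes the common variable-Q bandwidth definition B_k = alpha * f_k + gamma with alpha = 2^(1/b) - 1, which is not taken from the nnAudio source. The function names `vqt_bandwidths` and `q_factors` and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def vqt_bandwidths(fmin, n_bins, bins_per_octave, gamma):
    """Return (center_freqs, bandwidths) using B_k = alpha * f_k + gamma.

    Assumed definition for illustration, not nnAudio's implementation.
    """
    alpha = 2.0 ** (1.0 / bins_per_octave) - 1.0
    # Geometrically spaced center frequencies, as in a constant-Q layout.
    freqs = [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
    bandwidths = [alpha * f + gamma for f in freqs]
    return freqs, bandwidths


def q_factors(freqs, bandwidths):
    """Q factor per bin: center frequency divided by bandwidth."""
    return [f / b for f, b in zip(freqs, bandwidths)]


# gamma = 0: every bin shares the same Q, which is the constant-Q case.
freqs, bw_cqt = vqt_bandwidths(fmin=32.70, n_bins=12, bins_per_octave=12, gamma=0.0)
q_cqt = q_factors(freqs, bw_cqt)

# gamma > 0: Q is lower than in the constant-Q case and grows with frequency.
_, bw_vqt = vqt_bandwidths(fmin=32.70, n_bins=12, bins_per_octave=12, gamma=5.0)
q_vqt = q_factors(freqs, bw_vqt)
```

Under this definition the gamma term only widens the bandwidth additively, so setting it to zero leaves the constant-Q filter bank unchanged. A regression test for the real transforms would compare VQT and CQT outputs numerically rather than the bandwidths alone.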
