A Perceptually Motivated Filter Bank with Perfect Reconstruction for Audio Signal Processing
This work addresses a specific bottleneck in audio analysis and synthesis applications, offering an incremental improvement in filter bank design for better signal fidelity.
The paper tackles the problem that perceptually motivated filter banks (FBs) in audio signal processing often reconstruct signals poorly and are not robust to sub-channel processing. It introduces an oversampled FB that enables perfect reconstruction, adaptable redundancy, and efficient design, and that achieves lower reconstruction errors and better resistance to sub-channel processing than the gammatone FB, especially at low redundancies.
Many audio applications rely on filter banks (FBs) to analyze, process, and re-synthesize sounds. To approximate the auditory frequency resolution in the signal chain, some applications rely on perceptually motivated FBs, the gammatone FB being a popular example. However, most perceptually motivated FBs only allow partial signal reconstruction at high redundancies and/or do not have good resistance to sub-channel processing. This paper introduces an oversampled perceptually motivated FB enabling perfect reconstruction, efficient FB design, and adaptable redundancy. The filters are directly constructed in the frequency domain and linearly distributed on a perceptual frequency scale (e.g., the ERB, Bark, or Mel scale). The proposed design allows for various filter shapes, uniform or non-uniform FB settings, and large down-sampling factors. For redundancies $\geq 3$, perfect reconstruction is achieved by computing the canonical dual FB analytically. For lower redundancies, perfect reconstruction is achieved using an iterative method. Experiments show performance improvements of the proposed approach when compared to the gammatone FB in terms of reconstruction error and resistance to sub-channel processing, especially at low redundancies.
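To make "linearly distributed on a perceptual frequency scale" concrete, the following minimal sketch places filter center frequencies at equal steps on the ERB-number scale. It is an illustration only, not the paper's implementation: it assumes the Glasberg and Moore (1990) ERB formulas, and the function names, frequency range (50 Hz to 8 kHz), and channel count (32) are hypothetical choices made for the example.

```python
import math


def hz_to_erb(f_hz):
    """Map frequency in Hz to the ERB-number scale (Glasberg & Moore, 1990)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_to_hz(erb):
    """Inverse of hz_to_erb: map an ERB-number back to frequency in Hz."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at center frequency f_hz."""
    return 24.7 * (0.00437 * f_hz + 1.0)


def erb_center_frequencies(f_min, f_max, num_channels):
    """Center frequencies (Hz) equally spaced on the ERB-number scale.

    Hypothetical helper for illustration; the endpoints f_min and f_max
    are included as the first and last center frequencies.
    """
    if num_channels < 2:
        raise ValueError("num_channels must be at least 2")
    e_min, e_max = hz_to_erb(f_min), hz_to_erb(f_max)
    step = (e_max - e_min) / (num_channels - 1)
    return [erb_to_hz(e_min + k * step) for k in range(num_channels)]


# Assumed example parameters: 32 channels between 50 Hz and 8 kHz.
centers = erb_center_frequencies(50.0, 8000.0, 32)
bandwidths = [erb_bandwidth(fc) for fc in centers]
```

The spacing is constant in ERB units but grows in Hz, so the filters are narrow and dense at low frequencies and wide and sparse at high frequencies. Swapping `hz_to_erb` and `erb_to_hz` for Bark or Mel mappings would give the analogous layout on those scales.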