Independent Vector Analysis via Log-Quadratically Penalized Quadratic Minimization
This is an incremental improvement for blind source separation tasks, particularly in speech processing, offering faster convergence without sacrificing performance.
The paper tackles blind source separation by proposing a new independent vector analysis algorithm called AuxIVA-IPA, which updates the demixing filters by solving a log-quadratically penalized quadratic minimization problem, achieving up to 8.5 times faster runtime than the next best AuxIVA-based method.
We propose a new algorithm for blind source separation (BSS) using independent vector analysis (IVA). It improves on the popular auxiliary-function-based IVA (AuxIVA) with iterative projection (IP) or iterative source steering (ISS). We introduce iterative projection with adjustment (IPA), where we update one demixing filter and jointly adjust all the other sources along its current direction. Each update involves solving a non-convex minimization problem, which we term log-quadratically penalized quadratic minimization (LQPQM) and which we believe is of interest beyond this work. In the general case, we show that its global minimum corresponds to the largest root of a univariate function, reminiscent of modified eigenvalue problems. We propose a simple procedure based on Newton-Raphson to compute it efficiently. Numerical experiments demonstrate the effectiveness of the proposed method. First, we show that it efficiently decreases the value of the surrogate function. In further experiments on synthetic mixtures, we study the probability of finding the true demixing matrix and the convergence speed. We show that the proposed method combines a high success rate with fast convergence. Finally, we validate the performance on a reverberant blind speech separation task. We find that all the AuxIVA-based methods perform similarly in terms of acoustic BSS metrics. However, AuxIVA-IPA converges faster. We measure a speed-up of up to 8.5 times in runtime compared to the next best AuxIVA-based method, depending on the number of channels and the signal-to-noise ratio (SNR).
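To give a feel for the "largest root of a univariate function" step, the following is a minimal sketch of a Newton-Raphson root search. The abstract does not state the LQPQM function itself, so this sketch substitutes, purely as an assumed stand-in, the classic secular equation of a rank-one modified symmetric eigenvalue problem, g(λ) = 1 − ρ Σ_i z_i² / (λ − d_i) with ρ > 0, whose largest root is the largest eigenvalue of diag(d) + ρ z zᵀ. The function name `largest_secular_root` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def largest_secular_root(d, z, rho, tol=1e-12, max_iter=100):
    """Largest root of g(lam) = 1 - rho * sum(z**2 / (lam - d)), rho > 0.

    Illustrative stand-in only: this is the secular equation of the
    rank-one update diag(d) + rho * z z^T, NOT the LQPQM function
    from the paper (which the abstract does not specify).
    """
    d = np.asarray(d, dtype=float)
    z2 = np.asarray(z, dtype=float) ** 2
    k = int(np.argmax(d))
    if rho <= 0 or z2[k] == 0:
        raise ValueError("sketch assumes rho > 0 and z[argmax(d)] != 0")

    # Start just to the right of the largest pole, where g(lam) <= 0.
    # On (max(d), inf) g is increasing and concave, so Newton iterates
    # started left of the root increase monotonically toward it and
    # never cross a pole.
    lam = d[k] + rho * z2[k]
    for _ in range(max_iter):
        gap = lam - d                      # all entries are positive
        g = 1.0 - rho * np.sum(z2 / gap)   # function value
        dg = rho * np.sum(z2 / gap**2)     # derivative, always > 0
        step = g / dg
        lam -= step
        if abs(step) <= tol * max(1.0, abs(lam)):
            break
    return lam
```

For this stand-in problem, the result can be checked against `np.linalg.eigvalsh(np.diag(d) + rho * np.outer(z, z)).max()`. The choice of starting point is what makes plain Newton-Raphson safe here; whether the same argument carries over to the LQPQM function depends on its structure as derived in the paper.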