High-Performance FPGA Implementation of Equivariant Adaptive Separation via Independence Algorithm for Independent Component Analysis
This work addresses the hardware inefficiency of adaptive ICA for machine learning applications such as Bayesian neural networks, offering a domain-specific improvement.
The paper tackles two problems: the slow convergence of adaptive Independent Component Analysis (ICA) algorithms and the low clock frequency of their existing hardware implementations. It proposes a new algorithm that enables efficient hardware implementation. On FPGA, this yields at least a 10x improvement in clock frequency and at least a 100x improvement in throughput.
Independent Component Analysis (ICA) is a dimensionality reduction technique that can boost the efficiency of machine learning models that deal with probability density functions, e.g. Bayesian neural networks. Algorithms that implement adaptive ICA converge more slowly than their nonadaptive counterparts; however, they can track changes in the underlying distributions of input features. This intrinsically slow convergence of adaptive methods, combined with existing hardware implementations that operate at very low clock frequencies, necessitates fundamental improvements in both algorithm and hardware design. This paper presents an algorithm that allows efficient hardware implementation of ICA. Compared to previous work, our FPGA implementation of adaptive ICA improves clock frequency by at least one order of magnitude and throughput by at least two orders of magnitude. Our proposed algorithm is not limited to ICA and can be used in various machine learning problems optimized with stochastic gradient descent.
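For readers unfamiliar with the adaptive algorithm named in the title, the following is a minimal floating-point sketch of the standard Equivariant Adaptive Separation via Independence (EASI) stochastic gradient update, W <- W - mu * H(y) W with H(y) = (y y^T - I) + (g(y) y^T - y g(y)^T). It is not the hardware-oriented algorithm proposed in this paper, whose details are not given here. Several choices are assumptions for illustration only: the cubic nonlinearity g(y) = y^3 (suited to sub-Gaussian sources), the step size mu = 0.002, the normalization of each term by 1 + mu*|...| to bound single-sample steps, the uniform sources, and the hypothetical 2x2 mixing matrix `A`. The function name `easi_separate` is likewise hypothetical.

```python
import numpy as np


def easi_separate(x, mu=0.002):
    """One pass of a normalized EASI update over the columns of x.

    x has shape (n_sources, n_samples). Uses the (assumed) cubic
    nonlinearity g(y) = y**3, appropriate for sub-Gaussian sources.
    Returns the learned separating matrix W.
    """
    n, n_samples = x.shape
    W = np.eye(n)
    I = np.eye(n)
    for t in range(n_samples):
        y = W @ x[:, t]              # current estimate of the sources
        g = y ** 3                   # nonlinearity g(y)
        yyT = np.outer(y, y)
        gyT = np.outer(g, y)         # g(y) y^T; its transpose is y g(y)^T
        # Symmetric (whitening) term, normalized to bound single-sample steps
        sym = (yyT - I) / (1.0 + mu * (y @ y))
        # Skew-symmetric (higher-order independence) term, also normalized
        skew = (gyT - gyT.T) / (1.0 + mu * abs(y @ g))
        # Relative (equivariant) stochastic gradient step: W <- W - mu * H(y) W
        W = W - mu * (sym + skew) @ W
    return W


rng = np.random.default_rng(0)
n_samples = 50_000
# Two independent, zero-mean, unit-variance uniform (sub-Gaussian) sources
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n_samples))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])           # hypothetical mixing matrix
x = A @ s
W = easi_separate(x)
C = W @ A                            # global system; ideally a scaled permutation
print(np.round(C, 2))
```

Each iteration needs the W produced by the previous one. This loop-carried dependency is inherent to sample-by-sample stochastic gradient updates, and a successful run leaves each row of `C` dominated by a single entry of magnitude close to 1.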