Efficient Majority Voting in Digital Hardware
This work addresses the computational inefficiency of majority voting in real-time systems, offering a domain-specific improvement for hardware accelerators in ensemble learning.
The paper tackles the bottleneck of majority voting in hardware-accelerated ensemble learning by introducing a novel architecture that obtains a majority decision in a number of clock cycles that is logarithmic in the number of inputs. A random forest processing engine using this architecture on an FPGA classifies more than seven million images per second in handwritten digit recognition.
In recent years, machine learning methods have become increasingly important for a wide range of applications. However, they often suffer from high computational requirements that impair their efficient use in real-time systems, even when dedicated hardware accelerators are employed. Ensemble learning methods are especially suitable for hardware acceleration since they can be constructed from individual learners of low complexity and thus offer large parallelization potential. For classification, the outputs of these learners are typically combined by majority voting, which often represents the bottleneck of a hardware accelerator for ensemble inference. In this work, we present a novel architecture that obtains a majority decision in a number of clock cycles that is logarithmic in the number of inputs. We show that, for the example application of handwritten digit recognition, a random forest processing engine that employs this majority decision architecture and is implemented on an FPGA classifies more than seven million images per second.
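To illustrate how a majority decision can take a number of steps that is logarithmic in the number of inputs, the following Python sketch simulates a generic pairwise reduction tree. It is not the architecture proposed in this work, whose details are not given here. It assumes one-hot encoded votes, an adder tree for the per-class counts, and a comparator tree for the final selection, with ties resolved toward the lower class index. The names `tree_reduce` and `majority_vote` are hypothetical, and each tree level stands in for one clock cycle of a pipelined hardware stage.

```python
def tree_reduce(values, combine):
    """Reduce a list pairwise, level by level.

    Returns (result, levels), where levels is the number of tree levels,
    i.e. ceil(log2(len(values))) for a non-empty list.
    """
    if not values:
        raise ValueError("tree_reduce needs at least one value")
    levels = 0
    while len(values) > 1:
        # Combine neighbours (0,1), (2,3), ...; all pairs are independent,
        # so hardware could evaluate one whole level in parallel.
        nxt = [combine(values[i], values[i + 1])
               for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            nxt.append(values[-1])  # odd element passes through unchanged
        values = nxt
        levels += 1
    return values[0], levels


def majority_vote(votes, num_classes):
    """Return (winning_class, adder_levels, comparator_levels)."""
    # One-hot encode every vote so that counting becomes vector addition.
    one_hot = [[1 if v == c else 0 for c in range(num_classes)]
               for v in votes]
    counts, add_levels = tree_reduce(
        one_hot, lambda a, b: [x + y for x, y in zip(a, b)])

    # Comparator tree over (count, class) pairs; on a tie the left operand
    # (lower class index) is kept.
    pairs = [(counts[c], c) for c in range(num_classes)]
    (_, best_class), cmp_levels = tree_reduce(
        pairs, lambda a, b: a if a[0] >= b[0] else b)
    return best_class, add_levels, cmp_levels


if __name__ == "__main__":
    # Eight hypothetical learner outputs for a ten-class problem (digits 0-9).
    winner, add_levels, cmp_levels = majority_vote([3, 1, 3, 7, 3, 1, 3, 2], 10)
    print(winner, add_levels, cmp_levels)
```

In this sketch, doubling the number of voters adds only one level to the adder tree, whereas a sequential counter would need one step per vote. The comparator tree depth depends on the number of classes, not on the number of voters.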