Exploring Parallelism in FPGA-Based Accelerators for Machine Learning Applications
This work addresses the need for faster training in machine learning applications, particularly with hardware acceleration on FPGAs in mind. Its contribution is incremental, however: it applies an existing technique to a specific dataset and platform.
The paper tackles the problem of accelerating neural network training by implementing speculative backpropagation on the MNIST dataset using OpenMP. It achieves a maximum speedup of 24% in overall execution time and 35% in per-step execution time, while keeping accuracy within 3-4% of the baseline.
Speculative backpropagation has emerged as a promising technique for accelerating neural network training by overlapping the forward and backward passes. It applies speculative weight updates whenever the error gradients fall within a specified threshold, which reduces training time without substantially compromising accuracy. In this work, we implement speculative backpropagation on the MNIST dataset using OpenMP as the parallel programming platform. OpenMP's multi-threading capabilities let the forward pass and the speculative backward pass execute simultaneously, which improves training speed. We plan to synthesize the application on a state-of-the-art FPGA to demonstrate its potential for hardware acceleration. Our CPU-based experimental results show that speculative backpropagation achieves a maximum speedup of 24% in execution time with a threshold of 0.25, while accuracy remains within 3-4% of the baseline across various epochs. Additionally, a comparison of individual step execution times shows a maximum speedup of 35% over the baseline, demonstrating the effectiveness of overlapping the forward and backward passes.
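The overlap of the forward pass with a speculative backward pass can be sketched in C with OpenMP `sections`. This is a minimal sketch under stated assumptions, not the authors' implementation. The model is a hypothetical single linear layer with squared-error loss. The speculated output for a sample is assumed to be the most recent output observed for a sample of the same label. The threshold is assumed to bound the largest per-output deviation between the actual and speculated outputs, which for this loss equals the deviation between the actual and speculated error terms. The names `Model`, `train_step`, `forward`, and `backward` are hypothetical. Compiled without `-fopenmp`, the pragmas are ignored and the code runs serially with identical results.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy sizes for illustration (MNIST would be 784 -> 10). */
#define N_IN  4
#define N_OUT 3

typedef struct {
    double w[N_OUT][N_IN];        /* single linear layer: y = W x        */
    double cached[N_OUT][N_OUT];  /* last output observed for each label */
    int    has_cache[N_OUT];      /* 1 once a label has a cached output  */
    int    hits, recomputes;      /* speculation statistics              */
} Model;

/* Forward pass: reads the weights, writes only y. */
static void forward(const Model *m, const double *x, double *y) {
    for (int o = 0; o < N_OUT; o++) {
        y[o] = 0.0;
        for (int i = 0; i < N_IN; i++) y[o] += m->w[o][i] * x[i];
    }
}

/* Backward pass for squared error 0.5*||y - t||^2 with one-hot target t:
   error term delta = y - t, gradient dL/dW = delta * x^T. */
static void backward(const double *x, const double *y, int label,
                     double g[N_OUT][N_IN]) {
    for (int o = 0; o < N_OUT; o++) {
        double delta = y[o] - (o == label ? 1.0 : 0.0);
        for (int i = 0; i < N_IN; i++) g[o][i] = delta * x[i];
    }
}

static double absdiff(double a, double b) { return a > b ? a - b : b - a; }

/* One SGD step with (assumed) label-based speculative backpropagation. */
void train_step(Model *m, const double *x, int label,
                double lr, double threshold) {
    assert(label >= 0 && label < N_OUT);
    double y[N_OUT], g[N_OUT][N_IN];
    int speculate = m->has_cache[label];

    /* Overlap the forward pass on x with a backward pass that uses the
       speculated output. Neither section writes data the other reads,
       so they can safely run on separate OpenMP threads. */
    #pragma omp parallel sections if (speculate)
    {
        #pragma omp section
        { forward(m, x, y); }
        #pragma omp section
        { if (speculate) backward(x, m->cached[label], label, g); }
    }

    /* Same target, so |delta - delta_spec| = |y - y_spec| per output. */
    int accept = 0;
    if (speculate) {
        double worst = 0.0;
        for (int o = 0; o < N_OUT; o++) {
            double d = absdiff(y[o], m->cached[label][o]);
            if (d > worst) worst = d;
        }
        accept = worst <= threshold;
    }
    if (accept) {
        m->hits++;                   /* keep the speculative gradient */
    } else {
        backward(x, y, label, g);    /* no cache or misprediction: recompute */
        m->recomputes++;
    }

    for (int o = 0; o < N_OUT; o++)
        for (int i = 0; i < N_IN; i++) m->w[o][i] -= lr * g[o][i];
    memcpy(m->cached[label], y, sizeof y);
    m->has_cache[label] = 1;
}
```

In this sketch, a rejected speculation costs roughly one ordinary backward pass after the forward pass. Time is saved only on accepted steps, which is why a looser threshold trades some accuracy for speed.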