QForce-RL: Quantized FPGA-Optimized Reinforcement Learning Compute Engine
This work addresses resource-constrained FPGA deployment for reinforcement learning, offering incremental improvements in efficiency for hardware acceleration.
The paper tackles the problem of high resource costs in FPGA deployment for reinforcement learning by proposing QForce-RL, which uses quantization and a lightweight architecture to achieve up to 2.3x performance enhancement and 2.6x better FPS compared to state-of-the-art methods without significant performance degradation.
Reinforcement Learning (RL) has outperformed alternative approaches in sequential decision-making and dynamic environment control. However, FPGA deployment is significantly resource-expensive, because training agents on high-quality images requires a large number of computations, and this poses new challenges. In this work, we propose QForce-RL, which uses quantization to enhance throughput and reduce the energy footprint with a lightweight RL architecture, without significant performance degradation. QForce-RL builds on E2HRL to reduce the overall number of RL actions needed to learn the desired policy, and on QuaRL for quantization-based SIMD hardware acceleration. We also provide a detailed analysis for different RL environments, with emphasis on model size, parameters, and accelerated compute ops. The architecture is scalable to resource-constrained devices and provides parameterized, efficient deployment with flexibility in latency, throughput, power, and energy efficiency. The proposed QForce-RL provides a performance enhancement of up to 2.3x and 2.6x better FPS compared to SoTA works.
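To make the quantization idea above concrete, the following is a minimal sketch of symmetric per-tensor quantization of policy weights to low-bit integers, together with the resulting model-size estimate. It is not the QForce-RL or QuaRL scheme: the 8-bit width, the single shared scale, and the function names (quantize_symmetric, dequantize, model_size_bytes) are assumptions chosen only for illustration.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], num_bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed integers using one shared scale (illustrative assumption)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid division by zero for an all-zero (or empty) tensor.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer levels."""
    return [q * scale for q in quantized]


def model_size_bytes(num_params: int, num_bits: int) -> int:
    """Storage for the weights alone, ignoring scales and other metadata."""
    return num_params * num_bits // 8


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]  # hypothetical policy weights
    q, scale = quantize_symmetric(weights)
    print("integer levels:", q)
    print("reconstructed:", dequantize(q, scale))
    print("fp32 bytes:", model_size_bytes(1_000_000, 32))
    print("int8 bytes:", model_size_bytes(1_000_000, 8))
```

Under these assumptions, the reconstruction error per weight is bounded by half the scale, and moving from 32-bit to 8-bit weights cuts weight storage by a factor of four. Narrower integer operands are also what allow several values to be packed into one SIMD lane or FPGA DSP slice.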