Learning Machines Implemented on Non-Deterministic Hardware
This work addresses hardware/software co-design for machine learning practitioners, but it appears incremental: it revisits existing stochastic circuit methods without demonstrating a clear advance over the state of the art.
The paper tackles the problem of improving speed and energy efficiency in large-scale machine learning systems by deploying compute-intensive kernels onto non-deterministic hardware, using digital stochastic circuits to approximate matrix computations. As a proof-of-concept, it demonstrates training deep neural networks for image recognition with a stochastic hardware simulator, though no concrete performance numbers are provided.
This paper highlights new opportunities for designing large-scale machine learning systems as a consequence of blurring traditional boundaries that have allowed algorithm designers and application-level practitioners to stay, for the most part, oblivious to the details of the underlying hardware-level implementations. The hardware/software co-design methodology advocated here hinges on the deployment of compute-intensive machine learning kernels onto compute platforms that trade off determinism in the computation for improvements in speed and/or energy efficiency. To achieve this, we revisit digital stochastic circuits for approximating matrix computations that are ubiquitous in machine learning algorithms. Theoretical and empirical evaluation is undertaken to assess the impact of the hardware-induced computational noise on algorithm performance. As a proof-of-concept, a stochastic hardware simulator is employed for training deep neural networks for image recognition problems.
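To make the idea of approximating matrix computations with digital stochastic circuits concrete, here is a minimal software sketch of the classic unipolar stochastic-computing encoding. It may differ from the circuits the paper actually uses, which the abstract does not specify. Each value in [0, 1] is represented by a random bitstream whose fraction of ones equals the value, and a single AND gate on two independent streams then approximates their product. The function names `to_bitstream` and `stochastic_dot`, the stream length, and the seed are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def to_bitstream(p, length, rng):
    """Encode values p in [0, 1] as Bernoulli bitstreams (assumed unipolar encoding)."""
    p = np.asarray(p, dtype=float)
    # Each bit is 1 with probability p, so the stream mean estimates p.
    return rng.random(p.shape + (length,)) < p[..., None]

def stochastic_dot(x, w, length=4096, seed=0):
    """Estimate the dot product of x and w (entries in [0, 1]) from bitstreams."""
    rng = np.random.default_rng(seed)
    xs = to_bitstream(x, length, rng)
    ws = to_bitstream(w, length, rng)
    # AND of two independent streams is 1 with probability x_i * w_i.
    products = xs & ws
    # Average each product stream, then accumulate across vector entries.
    return products.mean(axis=-1).sum()

x = np.array([0.2, 0.5, 0.9])
w = np.array([0.7, 0.1, 0.4])
exact = float(x @ w)
approx = float(stochastic_dot(x, w))
print(f"exact={exact:.4f} approx={approx:.4f} error={abs(approx - exact):.4f}")
```

The estimate is noisy by construction: its standard deviation shrinks roughly as one over the square root of the stream length, which is the kind of hardware-induced computational noise whose effect on learning the paper sets out to assess.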