Mobile Machine Learning Hardware at ARM: A Systems-on-Chip (SoC) Perspective
The paper addresses efficiency issues facing mobile hardware designers in applications such as AR/VR and ADAS, but its contribution is incremental because it builds on existing SoC optimization concepts.
The paper tackles suboptimal efficiency in mobile machine learning by arguing that hardware architects should optimize across the entire System-on-Chip (SoC) rather than focusing only on ML accelerators. It demonstrates this through a case study in continuous computer vision in which the camera sensor, image signal processor (ISP), memory, and NN accelerator are co-designed for system-level efficiency.
Machine learning is playing an increasingly significant role in emerging mobile application domains such as AR/VR and ADAS. Accordingly, hardware architects have designed customized hardware for machine learning algorithms, especially neural networks, to improve compute efficiency. However, machine learning is typically just one processing stage in complex end-to-end applications that involve multiple components in a mobile System-on-Chip (SoC). Focusing only on ML accelerators misses larger optimization opportunities at the system (SoC) level. This paper argues that hardware architects should expand the optimization scope to the entire SoC. We demonstrate one particular case study in the domain of continuous computer vision, where the camera sensor, image signal processor (ISP), memory, and NN accelerator are synergistically co-designed to achieve optimal system-level efficiency.
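The argument that accelerator-only optimization leaves system-level gains on the table can be illustrated with an Amdahl-style calculation. The sketch below is not taken from the paper: the function name and the energy fractions are hypothetical, chosen only to show how the overall gain is capped by the stages that are left unoptimized.

```python
def system_energy_gain(stage_fraction: float, stage_gain: float) -> float:
    """Overall energy-efficiency gain of a pipeline when a single stage,
    which accounts for `stage_fraction` of total energy, becomes
    `stage_gain` times more efficient (Amdahl-style bound)."""
    if not 0.0 <= stage_fraction <= 1.0:
        raise ValueError("stage_fraction must be in [0, 1]")
    if stage_gain <= 0.0:
        raise ValueError("stage_gain must be positive")
    # Energy of the untouched stages plus the reduced energy of the improved stage.
    remaining = (1.0 - stage_fraction) + stage_fraction / stage_gain
    return 1.0 / remaining


if __name__ == "__main__":
    # Hypothetical split: the NN accelerator uses half of the SoC energy;
    # sensor, ISP, and memory traffic use the other half.
    for gain in (2.0, 10.0, 100.0):
        overall = system_energy_gain(stage_fraction=0.5, stage_gain=gain)
        print(f"accelerator {gain:>5.0f}x better -> system {overall:.2f}x better")
```

Under this assumed 50/50 split, even an arbitrarily efficient accelerator cannot improve the whole system by more than 2x, which is why the other SoC components matter for end-to-end efficiency.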