CV IV | Jun 12, 2019

Visual Wake Words Dataset

Aakanksha Chowdhery, Pete Warden, Jonathon Shlens, Andrew Howard, Rocky Rhodes

arXiv:1906.05721v1 | 20.2122 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of enabling intelligent IoT applications on resource-constrained microcontrollers. Its contribution is incremental: it provides a new dataset rather than a novel method.

The authors address the challenge of deploying computer vision on microcontrollers with limited memory by introducing the Visual Wake Words dataset, a benchmark for tiny vision models that classify whether a person is present in an image. Within a 250 KB memory footprint, several state-of-the-art mobile models reach 85-90% accuracy on the dataset.

The emergence of Internet of Things (IoT) applications requires intelligence on the edge. Microcontrollers provide a low-cost compute platform for deploying intelligent IoT applications using machine learning at scale, but they have extremely limited on-chip memory and compute capability. To deploy computer vision on such devices, we need tiny vision models that fit within a few hundred kilobytes of memory footprint, in terms of both peak memory usage and model size in on-device storage. To facilitate the development of microcontroller-friendly models, we present a new dataset, Visual Wake Words, that represents a common microcontroller vision use case, identifying whether a person is present in the image or not, and provides a realistic benchmark for tiny vision models. Within a limited memory footprint of 250 KB, several state-of-the-art mobile models achieve accuracy of 85-90% on the Visual Wake Words dataset. We anticipate that the proposed dataset will advance research on tiny vision models that can push the Pareto-optimal boundary in terms of accuracy versus memory usage for microcontroller applications.
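The memory constraint described in the abstract can be illustrated with a rough budget check. This is a minimal sketch under stated assumptions, not the authors' measurement procedure: the `Layer` type, the toy layer sizes, the one-byte-per-value (8-bit) default, the convention that 1 KB is 1024 bytes, and the choice to compare model size and peak activation memory against the budget separately are all hypothetical. Peak usage is approximated as the largest input-plus-output activation size of any single layer.

```python
from dataclasses import dataclass
from typing import List

# Assumption: 250 KB is taken as 250 * 1024 bytes.
BUDGET_BYTES = 250 * 1024


@dataclass
class Layer:
    """Hypothetical per-layer summary used only for this illustration."""
    name: str
    num_params: int     # number of weights stored on the device
    input_elems: int    # number of values in the input activation
    output_elems: int   # number of values in the output activation


def model_size_bytes(layers: List[Layer], bytes_per_value: int = 1) -> int:
    """Storage needed for all weights (1 byte per value assumes 8-bit weights)."""
    return sum(layer.num_params for layer in layers) * bytes_per_value


def peak_activation_bytes(layers: List[Layer], bytes_per_value: int = 1) -> int:
    """Rough peak memory: the largest input + output buffer pair of any layer."""
    largest = max(
        (layer.input_elems + layer.output_elems for layer in layers),
        default=0,
    )
    return largest * bytes_per_value


def fits_budget(layers: List[Layer], budget: int = BUDGET_BYTES,
                bytes_per_value: int = 1) -> bool:
    """True if both model size and peak activation memory stay within budget."""
    return (model_size_bytes(layers, bytes_per_value) <= budget
            and peak_activation_bytes(layers, bytes_per_value) <= budget)


# Toy network with made-up sizes (not taken from the paper).
toy_layers = [
    Layer("conv1", num_params=216, input_elems=96 * 96 * 3, output_elems=48 * 48 * 8),
    Layer("conv2", num_params=1152, input_elems=48 * 48 * 8, output_elems=24 * 24 * 16),
    Layer("fc", num_params=18432, input_elems=24 * 24 * 16, output_elems=2),
]

if __name__ == "__main__":
    print("model size (bytes):", model_size_bytes(toy_layers))
    print("peak activations (bytes):", peak_activation_bytes(toy_layers))
    print("fits 250 KB budget:", fits_budget(toy_layers))
```

Under these assumptions, raising `bytes_per_value` shows how quickly a wider numeric format can push the same toy network past the budget.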
