Satyam: Democratizing Groundtruth for Machine Vision
This work addresses the groundtruth-collection bottleneck for machine vision applications such as autonomous driving and surveillance and makes the process more accessible, although its contribution is incremental, since it largely automates existing crowdsourcing methods.
The paper simplifies groundtruth collection for machine vision systems by introducing Satyam, a system that lets laypersons launch tasks on Amazon Mechanical Turk with automated quality control and pricing. It shows that the collected groundtruth is comparable to expert-labeled data and yields matching ML performance on benchmark datasets.
The democratization of machine learning (ML) has led to ML-based machine vision systems for autonomous driving, traffic monitoring, and video surveillance. However, true democratization cannot be achieved without greatly simplifying the process of collecting groundtruth for training and testing these systems. Collecting groundtruth is necessary to ensure good performance under varying conditions. In this paper, we present the design and evaluation of Satyam, a first-of-its-kind system that enables a layperson to launch groundtruth collection tasks for machine vision with minimal effort. Satyam leverages a crowdtasking platform, Amazon Mechanical Turk, and automates several challenging aspects of groundtruth collection: creating and launching custom web-UI tasks for obtaining the desired groundtruth, controlling result quality in the face of spammers and untrained workers, adapting prices to match task complexity, filtering out spammers and poorly performing workers, and processing worker payments. We validate Satyam using several popular benchmark vision datasets and demonstrate that groundtruth obtained by Satyam is comparable to that obtained from trained experts and provides matching ML performance when used for training.
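The abstract does not describe how Satyam controls result quality or identifies spammers. As an illustration only, one common crowdsourcing technique is to collect several independent labels per item, accept a label only when a strict majority of workers agree, and then flag workers whose answers rarely match the accepted consensus. The sketch below assumes image-level classification labels; the function names, the `(worker_id, image_id, label)` tuple format, and the `min_votes` and `threshold` parameters are hypothetical and are not taken from Satyam.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical response format: (worker_id, image_id, label)
Response = Tuple[str, str, str]


def aggregate_labels(responses: List[Response], min_votes: int = 3) -> Dict[str, str]:
    """Return a consensus label per image when a strict majority of at least
    `min_votes` responses agree; images without such a majority are omitted."""
    by_image: Dict[str, List[str]] = defaultdict(list)
    for _worker, image, label in responses:
        by_image[image].append(label)

    consensus: Dict[str, str] = {}
    for image, labels in by_image.items():
        if len(labels) < min_votes:
            continue  # too few votes to trust any answer yet
        label, count = Counter(labels).most_common(1)[0]
        if count * 2 > len(labels):  # strict majority required
            consensus[image] = label
    return consensus


def worker_agreement(responses: List[Response], consensus: Dict[str, str]) -> Dict[str, float]:
    """Fraction of each worker's answers (on images with a consensus) that match it."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for worker, image, label in responses:
        if image in consensus:
            totals[worker] += 1
            hits[worker] += int(label == consensus[image])
    return {w: hits[w] / totals[w] for w in totals}


def flag_spammers(agreement: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Workers whose agreement rate falls below `threshold` (hypothetical cutoff)."""
    return sorted(w for w, rate in agreement.items() if rate < threshold)


if __name__ == "__main__":
    responses = [
        ("w1", "img1", "car"), ("w2", "img1", "car"), ("w3", "img1", "truck"),
        ("w1", "img2", "person"), ("w2", "img2", "person"), ("w3", "img2", "car"),
    ]
    consensus = aggregate_labels(responses)
    agreement = worker_agreement(responses, consensus)
    print(consensus)                 # {'img1': 'car', 'img2': 'person'}
    print(agreement)                 # {'w1': 1.0, 'w2': 1.0, 'w3': 0.0}
    print(flag_spammers(agreement))  # ['w3']
```

Requiring a strict majority rather than a plurality keeps ambiguous items out of the groundtruth until more votes arrive, at the cost of launching additional tasks. A real system would also need handling for structured outputs such as bounding boxes or segmentation masks, which simple vote counting does not cover.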