Scanner: Efficient Video Analysis at Scale
This work addresses the problem of efficient and productive video analysis at scale for researchers and developers in visual computing, enabling new applications that use big video data. The contribution is incremental in that it builds on existing systems and hardware optimizations.
The authors tackle the challenge of scaling video analysis applications to large datasets by developing Scanner, a system that organizes video collections in an optimized data store and executes pixel processing computations on heterogeneous hardware. Applications built on Scanner achieve near-expert performance on a single machine and scale efficiently to hundreds of machines, reducing formerly long-running tasks to minutes or hours.
A growing number of visual computing applications depend on the analysis of large video collections. The challenge is that scaling applications to operate on these datasets requires efficient systems for pixel data access and parallel processing across large numbers of machines. Few programmers have the capability to operate efficiently at these scales, limiting the field's ability to explore new applications that leverage big video data. In response, we have created Scanner, a system for productive and efficient video analysis at scale. Scanner organizes video collections as tables in a data store optimized for sampling frames from compressed video, and executes pixel processing computations, expressed as dataflow graphs, on these frames. Scanner schedules video analysis applications expressed using these abstractions onto heterogeneous throughput computing hardware, such as multi-core CPUs, GPUs, and media processing ASICs, for high-throughput pixel processing. We demonstrate the productivity of Scanner by authoring a variety of video processing applications including the synthesis of stereo VR video streams from multi-camera rigs, markerless 3D human pose reconstruction from video, and data-mining big video datasets such as hundreds of feature-length films or over 70,000 hours of TV news. These applications achieve near-expert performance on a single machine and scale efficiently to hundreds of machines, enabling formerly long-running big video data analysis tasks to be carried out in minutes to hours.
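To make the two abstractions in the abstract concrete, the following is a minimal sketch and not Scanner's actual API. It assumes a toy model in which a compressed video is decodable only from keyframes, and in which a dataflow graph is a list of named operations applied to each sampled frame. The names `frames_to_decode` and `Graph`, the keyframe positions, and the toy "frames" (flat lists of pixel intensities) are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Sequence, Tuple


def frames_to_decode(keyframes: Sequence[int], wanted: Sequence[int]) -> List[int]:
    """Return the sorted frame indices that must be decoded to produce `wanted`.

    Assumption: every frame depends on all frames back to the nearest
    preceding keyframe, and `keyframes` is sorted and starts at 0.
    """
    needed = set()
    for f in wanted:
        # Nearest keyframe at or before frame f.
        k = keyframes[bisect_right(keyframes, f) - 1]
        needed.update(range(k, f + 1))
    return sorted(needed)


class Graph:
    """A tiny dataflow graph: ops run in insertion order on one frame."""

    def __init__(self) -> None:
        self.ops: List[Tuple[str, Callable, Tuple[str, ...]]] = []

    def add(self, name: str, fn: Callable, inputs: Sequence[str]) -> str:
        # Ops must be added after the ops they consume (topological order).
        self.ops.append((name, fn, tuple(inputs)))
        return name

    def run(self, frame) -> Dict[str, object]:
        values: Dict[str, object] = {"frame": frame}
        for name, fn, inputs in self.ops:
            values[name] = fn(*(values[i] for i in inputs))
        return values


if __name__ == "__main__":
    # Sample every 6th frame of a 20-frame video with keyframes every 8 frames.
    wanted = list(range(0, 20, 6))
    decoded = frames_to_decode([0, 8, 16], wanted)
    print(f"requested {len(wanted)} frames, decoded {len(decoded)}")

    # Per-frame graph: mean brightness, then a darkness flag.
    g = Graph()
    g.add("mean", lambda px: sum(px) / len(px), ["frame"])
    g.add("dark", lambda m: m < 50, ["mean"])
    print(g.run([10, 20, 30]))
```

The sketch illustrates why sparse sampling from compressed video is not free: requesting a few frames can force decoding many more, which is the kind of cost a data store optimized for frame sampling would aim to reduce. Scheduling the graph across CPUs, GPUs, or multiple machines is outside the scope of this example.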