Playing for Benchmarks
This benchmark gives computer vision researchers a standardized dataset for evaluating and comparing state-of-the-art methods across diverse tasks. The contribution is incremental: it builds on existing benchmark concepts, adding new data and a new collection technique.
The authors address the lack of comprehensive benchmarks for visual perception by creating a benchmark suite based on more than 250K annotated video frames from a virtual world, covering tasks such as optical flow and object detection. They validate the realism of the data through statistical analyses and perceptual experiments.
We present a benchmark suite for visual perception. The benchmark is based on more than 250K high-resolution video frames, all annotated with ground-truth data for both low-level and high-level vision tasks, including optical flow, semantic instance segmentation, object detection and tracking, object-level 3D scene layout, and visual odometry. Ground-truth data for all tasks is available for every frame. The data was collected while driving, riding, and walking a total of 184 kilometers in diverse ambient conditions in a realistic virtual world. To create the benchmark, we have developed a new approach to collecting ground-truth data from simulated worlds without access to their source code or content. We conduct statistical analyses that show that the composition of the scenes in the benchmark closely matches the composition of corresponding physical environments. The realism of the collected data is further validated via perceptual experiments. We analyze the performance of state-of-the-art methods for multiple tasks, providing reference baselines and highlighting challenges for future research. The supplementary video can be viewed at https://youtu.be/T9OybWv923Y