Learning from THEODORE: A Synthetic Omnidirectional Top-View Indoor Dataset for Deep Transfer Learning
This work addresses data scarcity in omnidirectional indoor object detection and is relevant to researchers and practitioners in computer vision. It is incremental, since it builds on existing synthetic-data methods.
The authors tackled the lack of large-scale synthetic omnidirectional indoor datasets by introducing THEODORE, a dataset with 100,000 high-resolution fisheye images and 14 classes, and demonstrated its effectiveness by achieving an AP of up to 0.84 for person detection on a real-world dataset through fine-tuning.
Recent work on synthetic indoor datasets with perspective views has shown significant improvements in object detection results with Convolutional Neural Networks (CNNs). In this paper, we introduce THEODORE: a novel, large-scale indoor dataset containing 100,000 high-resolution, diversified fisheye images with 14 classes. To this end, we create 3D virtual environments of living rooms, a variety of human characters, and interior textures. Besides capturing fisheye images from the virtual environments, we create annotations for semantic segmentation, instance masks, and bounding boxes for object detection tasks. We compare our synthetic dataset to state-of-the-art real-world datasets of omnidirectional images. Starting from MS COCO weights, we show that our dataset is well suited for fine-tuning CNNs for object detection. Owing to the strong generalization our models gain from image synthesis and domain randomization, we reach an AP of up to 0.84 for the class person on the High-Definition Analytics dataset.
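The abstract says bounding boxes are produced alongside instance masks but does not say how. A minimal sketch, assuming each instance mask stores one integer instance ID per pixel with 0 as background (a hypothetical encoding, not THEODORE's documented format), derives a tight axis-aligned box per instance. It also computes intersection over union (IoU), the overlap measure that detection AP evaluation typically uses to match predictions to ground truth; the exact matching threshold behind the reported AP is not stated here. The function names `boxes_from_instance_mask` and `iou` are illustrative only.

```python
from typing import Dict, List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; max coordinates are exclusive
Box = Tuple[int, int, int, int]


def boxes_from_instance_mask(mask: List[List[int]], background: int = 0) -> Dict[int, Box]:
    """Return a tight box for every instance ID found in a label mask.

    Hypothetical encoding: each pixel holds an integer instance ID and
    `background` marks pixels that belong to no object.
    """
    boxes: Dict[int, List[int]] = {}
    for y, row in enumerate(mask):
        for x, inst in enumerate(row):
            if inst == background:
                continue
            if inst not in boxes:
                # First pixel of this instance: a 1x1 box around it
                boxes[inst] = [x, y, x + 1, y + 1]
            else:
                # Grow the existing box so it still covers this pixel
                b = boxes[inst]
                b[0] = min(b[0], x)
                b[1] = min(b[1], y)
                b[2] = max(b[2], x + 1)
                b[3] = max(b[3], y + 1)
    return {inst: tuple(b) for inst, b in boxes.items()}


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes with exclusive max coordinates."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 2],
        [0, 0, 0, 2],
    ]
    print(boxes_from_instance_mask(mask))  # {1: (1, 1, 3, 3), 2: (3, 2, 4, 4)}
```

Exclusive max coordinates keep box area equal to width times height, which simplifies the IoU arithmetic. In fisheye top views, people appear rotated toward the image center, so an axis-aligned box derived this way can enclose considerable background; this sketch does not address that.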