CV, LG · Jun 28, 2021

Efficient Realistic Data Generation Framework leveraging Deep Learning-based Human Digitization

C. Symeonidis, P. Nousi, P. Tosidis, K. Tsampazis, N. Passalis, A. Tefas, N. Nikolaidis

arXiv:2106.15409v2 · Has Code

Originality: Incremental advance

AI Analysis

The framework addresses data scarcity and privacy issues for computer vision researchers and developers. It is an incremental advance, as it builds on existing synthetic data generation methods.

The paper tackles the problem of costly and privacy-restricted data collection for human-centric computer vision tasks. It proposes a framework that automatically generates realistic synthetic data with annotations for person detection, face recognition, and human pose estimation. Benchmarks on these tasks show that synthetic data can effectively supplement real data.

The performance of supervised deep learning algorithms depends significantly on the scale, quality and diversity of the data used for their training. Collecting and manually annotating large amounts of data can be both time-consuming and costly. In the case of tasks related to visual human-centric perception, the collection and distribution of such data may also face restrictions due to legislation regarding privacy. In addition, the design and testing of complex systems, e.g., robots, which often employ deep learning-based perception models, may face severe difficulties, as even state-of-the-art methods trained on real and large-scale datasets cannot always perform adequately due to not having been adapted to the visual differences between virtual and real-world data. As an attempt to tackle and mitigate the effect of these issues, we present a method that automatically generates realistic synthetic data with annotations for a) person detection, b) face recognition, and c) human pose estimation. The proposed method takes as input real background images and populates them with human figures in various poses. Instead of using hand-made 3D human models, we propose the use of models generated through deep learning methods, further reducing the dataset creation costs while maintaining a high level of realism. In addition, we provide open-source and easy-to-use tools that implement the proposed pipeline, allowing for the generation of highly realistic synthetic datasets for a variety of tasks. Benchmarking and evaluation on the corresponding tasks show that synthetic data can be effectively used as a supplement to real data.
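To make the core idea concrete (placing a human figure on a real background and getting the annotation for free), here is a minimal sketch. It is not the authors' pipeline, which renders deep-learning-generated 3D human models in various poses. It assumes a pre-rendered RGBA cutout of a figure and uses plain nested lists in place of an image library. The function name `paste_figure` and the inclusive box format are hypothetical choices for illustration.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def paste_figure(
    background: List[List[RGB]],
    figure: List[List[RGBA]],
    left: int,
    top: int,
) -> Tuple[List[List[RGB]], Optional[Box]]:
    """Alpha-blend `figure` onto a copy of `background`.

    Returns the composited image and the bounding box of the pasted,
    non-transparent pixels (None if nothing landed inside the image).
    Hypothetical helper, not part of the paper's released tools.
    """
    height, width = len(background), len(background[0])
    out = [list(row) for row in background]  # leave the input untouched
    xs: List[int] = []
    ys: List[int] = []

    for fy, row in enumerate(figure):
        for fx, (r, g, b, a) in enumerate(row):
            x, y = left + fx, top + fy
            # Skip fully transparent pixels and pixels outside the image.
            if a == 0 or not (0 <= x < width and 0 <= y < height):
                continue
            alpha = a / 255.0
            br, bg, bb = out[y][x]
            out[y][x] = (
                round(alpha * r + (1 - alpha) * br),
                round(alpha * g + (1 - alpha) * bg),
                round(alpha * b + (1 - alpha) * bb),
            )
            xs.append(x)
            ys.append(y)

    # The annotation comes directly from where the figure was placed.
    box = (min(xs), min(ys), max(xs), max(ys)) if xs else None
    return out, box


# Usage: a 4x5 black background and a 2x2 red figure with one transparent pixel.
background = [[(0, 0, 0)] * 5 for _ in range(4)]
figure = [
    [(255, 0, 0, 255), (0, 0, 0, 0)],
    [(255, 0, 0, 255), (255, 0, 0, 255)],
]
image, box = paste_figure(background, figure, left=2, top=1)
```

Because the generator decides where each figure goes, the person-detection label is known exactly and needs no manual annotation. The same reasoning would extend to pose keypoints if the rendered model exposes its joint positions, though that step is not shown here.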
