CV · Dec 15, 2020

Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation

arXiv:2012.08055v2 · Has Code
AI Analysis

This work addresses a critical gap in autonomous driving perception by enabling fine-grained understanding of vehicle part states, which is an incremental but important step for ensuring the safety and interaction capabilities of self-driving vehicles.

This paper tackles the problem of fine-grained vehicle perception, specifically understanding the dynamics and states of vehicle parts such as doors, the trunk, and the bonnet, which are crucial for autonomous driving safety. The authors propose an automatic 3D part-guided visual data augmentation method to generate a large dataset of vehicles in uncommon states (VUS) and human-vehicle interaction (VHI) scenarios. Their approach improves 2D detection and instance segmentation by over 8% compared to baseline methods.

Holistically understanding an object and its 3D movable parts through visual perception models is essential for enabling an autonomous agent to interact with the world. For autonomous driving, the dynamics and states of vehicle parts such as doors, the trunk, and the bonnet can provide meaningful semantic information and interaction states, which are essential to ensuring the safety of the self-driving vehicle. Existing visual perception models mainly focus on coarse parsing such as object bounding box detection or pose estimation and rarely tackle these situations. In this paper, we address this important autonomous driving problem by solving three critical issues. First, to deal with data scarcity, we propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images before reconstructing human-vehicle interaction (VHI) scenarios. Our approach is fully automatic, requires no human interaction, and can generate a large number of vehicles in uncommon states (VUS) for training deep neural networks (DNNs). Second, to perform fine-grained vehicle perception, we present a multi-task network for VUS parsing and a multi-stream network for VHI parsing. Third, to quantitatively evaluate the effectiveness of our data augmentation approach, we build the first VUS dataset in real traffic scenarios (e.g., getting in/out or placing/removing luggage). Experimental results show that our approach outperforms baseline methods in 2D detection and instance segmentation by a large margin (over 8%). In addition, our network yields large improvements in discovering and understanding these uncommon cases. Moreover, we have released the source code, the dataset, and the trained model on GitHub (https://github.com/zongdai/EditingForDNN).
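The abstract describes fitting a 3D car model with dynamic parts and then editing those parts to synthesize uncommon states. As a rough illustration of the geometric idea only, the sketch below swings a door's vertices about a hinge axis (using Rodrigues' rotation formula) and projects them with a pinhole camera. It is not the authors' implementation: the function names, the hinge position and axis, the door geometry, and the camera intrinsics are all hypothetical values chosen for the example.

```python
import math

def rotate_about_axis(point, pivot, axis, angle_rad):
    """Rotate a 3D point about the line through `pivot` with direction `axis`
    (Rodrigues' rotation formula)."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]                  # unit hinge axis
    v = [p - q for p, q in zip(point, pivot)]     # point relative to the hinge
    cos_t, sin_t = math.cos(angle_rad), math.sin(angle_rad)
    cross = [k[1] * v[2] - k[2] * v[1],           # k x v
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0]]
    dot = sum(a * b for a, b in zip(k, v))        # k . v
    rotated = [v[i] * cos_t + cross[i] * sin_t + k[i] * dot * (1.0 - cos_t)
               for i in range(3)]
    return [r + q for r, q in zip(rotated, pivot)]

def open_part(vertices, hinge_pivot, hinge_axis, angle_deg):
    """Apply the same hinge rotation to every vertex of a movable part."""
    angle = math.radians(angle_deg)
    return [rotate_about_axis(v, hinge_pivot, hinge_axis, angle) for v in vertices]

def project_pinhole(point, focal, cx, cy):
    """Project a camera-frame 3D point (z > 0) to pixel coordinates."""
    x, y, z = point
    return (focal * x / z + cx, focal * y / z + cy)

# Hypothetical door: a 1 m x 1 m quad, 10 m in front of the camera (camera frame, y down).
door = [[1.0, -0.5, 10.0], [2.0, -0.5, 10.0], [2.0, 0.5, 10.0], [1.0, 0.5, 10.0]]
hinge_pivot = [1.0, 0.0, 10.0]   # assumed hinge location on the door's front edge
hinge_axis = [0.0, 1.0, 0.0]     # assumed vertical hinge

opened = open_part(door, hinge_pivot, hinge_axis, 45.0)
pixels = [project_pinhole(v, focal=1000.0, cx=640.0, cy=360.0) for v in opened]
print(pixels)
```

In a full pipeline, the projected part outline would still need to be textured and composited into the real image, and the annotations (boxes, masks, part states) would be derived from the edited 3D model; none of that is shown here.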
