CVMar 2, 2017

BoxCars: Improving Fine-Grained Recognition of Vehicles using 3-D Bounding Boxes in Traffic Surveillance

Jakub Sochor, Jakub Špaňhel, Adam Herout

arXiv:1703.00686v311.1140 citationsHas Code

Originality Incremental advance

AI Analysis

This addresses the problem of identifying specific vehicle models from varied viewpoints in surveillance, offering a domain-specific improvement.

The paper tackles fine-grained vehicle recognition in traffic surveillance by using 3-D bounding boxes to normalize viewpoints and augmenting training data, resulting in up to a 12% increase in accuracy and 50% error reduction compared to baseline CNNs.

In this paper, we focus on fine-grained recognition of vehicles mainly in traffic surveillance applications. We propose an approach that is orthogonal to recent advancements in fine-grained recognition (automatic part discovery and bilinear pooling). In addition, in contrast to other methods focused on fine-grained recognition of vehicles, we do not limit ourselves to a frontal/rear viewpoint, but allow the vehicles to be seen from any viewpoint. Our approach is based on 3-D bounding boxes built around the vehicles. The bounding box can be automatically constructed from traffic surveillance data. For scenarios where it is not possible to use precise construction, we propose a method for an estimation of the 3-D bounding box. The 3-D bounding box is used to normalize the image viewpoint by "unpacking" the image into a plane. We also propose to randomly alter the color of the image and add a rectangle with random noise to a random position in the image during the training of convolutional neural networks (CNNs). We have collected a large fine-grained vehicle data set BoxCars116k, with 116k images of vehicles from various viewpoints taken by numerous surveillance cameras. We performed a number of experiments, which show that our proposed method significantly improves CNN classification accuracy (the accuracy is increased by up to 12% points and the error is reduced by up to 50% compared with CNNs without the proposed modifications). We also show that our method outperforms the state-of-the-art methods for fine-grained recognition.

View on arXiv PDF Code

Similar