CV · Apr 27, 2020

Localizing Grouped Instances for Efficient Detection in Low-Resource Scenarios

arXiv:2004.12623v12.31 citations · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses computational efficiency for object detection in resource-constrained applications such as remote sensing. The contribution is incremental, as it builds on existing cascade and grouping ideas.

The paper tackles object detection in low-resource scenarios such as remote sensing, where large images contain only a few small objects of a single class scattered unevenly across the scene. It proposes a multi-stage detection scheme in which each stage can predict groups of objects as well as individuals, which saves computation. On two aerial image datasets, the method is reported to be as accurate as standard single-shot detectors while being more efficient, consistently across three backbone architectures.

State-of-the-art detection systems are generally evaluated on their ability to exhaustively retrieve objects densely distributed in the image, across a wide variety of appearances and semantic categories. Orthogonal to this, many real-life object detection applications, for example in remote sensing, instead require dealing with large images that contain only a few small objects of a single class, scattered heterogeneously across the space. In addition, they are often subject to strict computational constraints, such as limited battery capacity and computing power. To tackle these more practical scenarios, we propose a novel flexible detection scheme that efficiently adapts to variable object sizes and densities: we rely on a sequence of detection stages, each of which has the ability to predict groups of objects as well as individuals. Similar to a detection cascade, this multi-stage architecture spares computational effort by discarding large irrelevant regions of the image early during the detection process. The ability to group objects provides further computational and memory savings, as it allows working with lower image resolutions in early stages, where groups are more easily detected than individuals, as they are more salient. We report experimental results on two aerial image datasets, and show that the proposed method is as accurate as, yet computationally more efficient than, standard single-shot detectors, consistently across three different backbone architectures.
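
The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Detection`, `run_cascade`, `coarse_stage`, and `fine_stage`, the score threshold, and the hard-coded toy detections are all assumptions made for illustration. Each stage may return individuals, which are kept as final outputs, or groups, which are handed to the next stage for refinement; regions with no confident detection are never revisited.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (x, y, width, height) in full-image pixels; a hypothetical convention.
Region = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Detection:
    region: Region
    is_group: bool  # True: a cluster to refine later; False: a single object
    score: float


# A stage maps one region to the detections it finds inside that region.
Stage = Callable[[Region], List[Detection]]


def run_cascade(image: Region, stages: List[Stage], threshold: float = 0.5):
    """Return (final detections, number of regions processed)."""
    final: List[Detection] = []
    regions = [image]
    processed = 0
    for i, stage in enumerate(stages):
        is_last = i == len(stages) - 1
        next_regions: List[Region] = []
        for region in regions:
            processed += 1
            for det in stage(region):
                if det.score < threshold:
                    continue  # low-confidence area is discarded early
                if det.is_group and not is_last:
                    next_regions.append(det.region)  # refine in the next stage
                else:
                    final.append(det)  # individuals (or groups left at the end)
        regions = next_regions
        if not regions:
            break  # nothing left to refine
    return final, processed


# Toy stages standing in for real detectors (assumed outputs, not real data).
def coarse_stage(region: Region) -> List[Detection]:
    return [
        Detection((100, 100, 200, 200), True, 0.9),   # confident group
        Detection((800, 800, 20, 20), False, 0.8),    # isolated individual
        Detection((500, 500, 50, 50), True, 0.1),     # rejected by threshold
    ]


def fine_stage(region: Region) -> List[Detection]:
    x, y, w, h = region
    return [
        Detection((x + 10, y + 10, 20, 20), False, 0.9),
        Detection((x + w - 30, y + h - 30, 20, 20), False, 0.7),
    ]


detections, processed = run_cascade((0, 0, 1000, 1000), [coarse_stage, fine_stage])
```

In this toy run the second stage only looks at the single group region kept by the first stage, rather than at the whole image, which is the source of the savings the abstract describes. A real system would also crop and rescale each region before passing it to the next stage; that step is omitted here.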
