CV | Apr 25, 2024

Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection

arXiv:2404.16944v1 | 2 citations | h-index: 8
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for curated data to benchmark object detection in high-altitude urban scenarios, but its contribution is incremental: it focuses on dataset creation and the evaluation of existing methods.

The paper tackles the detection of small objects, such as pedestrians, in dense urban streetscapes viewed from high-elevation cameras. It introduces the Constellation dataset of 13K images and finds that state-of-the-art methods score about 10% lower in average precision (AP) for pedestrians than for vehicles. The best model achieves 92.0% pedestrian AP and 95.4% mAP.

We introduce Constellation, a dataset of 13K images suitable for research on detection of objects in dense urban streetscapes observed from high-elevation cameras, collected for a variety of temporal conditions. The dataset addresses the need for curated data to explore problems in small object detection exemplified by the limited pixel footprint of pedestrians observed tens of meters from above. It enables the testing of object detection models for variations in lighting, building shadows, weather, and scene dynamics. We evaluate contemporary object detection architectures on the dataset, observing that state-of-the-art methods have lower performance in detecting small pedestrians compared to vehicles, corresponding to a 10% difference in average precision (AP). Using structurally similar datasets for pretraining the models results in an increase of 1.8% mean AP (mAP). We further find that incorporating domain-specific data augmentations helps improve model performance. Using pseudo-labeled data, obtained from inference outcomes of the best-performing models, improves the performance of the models. Finally, comparing the models trained using the data collected in two different time intervals, we find a performance drift in models due to the changes in intersection conditions over time. The best-performing model achieves a pedestrian AP of 92.0% with 11.5 ms inference time on NVIDIA A100 GPUs, and an mAP of 95.4%.
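
As a rough illustration of the pseudo-labeling step mentioned in the abstract, the sketch below keeps only high-confidence detections from a model's inference output so they can be reused as training labels. The abstract does not describe how pseudo-labels were selected, so the `Detection` type, the `select_pseudo_labels` helper, the 0.8 confidence threshold, and the sample predictions are all assumptions made for illustration, not the authors' procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One predicted box (hypothetical structure, not taken from the paper)."""
    label: str
    score: float  # model confidence in [0, 1]
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def select_pseudo_labels(detections, min_score=0.8):
    """Keep detections confident enough to reuse as training labels.

    The 0.8 default is an assumed value for illustration only.
    """
    return [d for d in detections if d.score >= min_score]


# Made-up inference output for a single image.
predictions = [
    Detection("pedestrian", 0.93, (104.0, 220.0, 112.0, 238.0)),
    Detection("pedestrian", 0.41, (300.0, 95.0, 307.0, 110.0)),
    Detection("vehicle", 0.88, (50.0, 60.0, 95.0, 84.0)),
]

pseudo_labels = select_pseudo_labels(predictions)
print([(d.label, d.score) for d in pseudo_labels])
```

A stricter threshold trades label quantity for label quality. For small objects such as pedestrians, where confidence scores tend to be lower, a per-class threshold might be a reasonable variation.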

Code Implementations: 1 repo
