CV, Dec 1, 2016

TorontoCity: Seeing the World with a Million Eyes

arXiv:1612.00423v1, 185 citations
Originality: Synthesis-oriented
AI Analysis

This paper provides a comprehensive benchmark for computer vision research in urban environments, but its contribution is incremental: it builds on existing datasets by scaling them up and integrating multiple data sources.

The authors introduced the TorontoCity benchmark, a large-scale dataset covering 712.5 km² of land with 400,000 buildings, to tackle urban scene understanding tasks like building height estimation and road extraction, and found that modern convolutional neural networks still struggle with most of these tasks.

In this paper we introduce the TorontoCity benchmark, which covers the full greater Toronto area (GTA) with 712.5 km$^2$ of land, 8439 km of road and around 400,000 buildings. Our benchmark provides different perspectives of the world captured from airplanes, drones and cars driving around the city. Manually labeling such a large scale dataset is infeasible. Instead, we propose to utilize different sources of high-precision maps to create our ground truth. Towards this goal, we develop algorithms that allow us to align all data sources with the maps while requiring minimal human supervision. We have designed a wide variety of tasks including building height estimation (reconstruction), road centerline and curb extraction, building instance segmentation, building contour extraction (reorganization), semantic labeling and scene type classification (recognition). Our pilot study shows that most of these tasks are still difficult for modern convolutional neural networks.

