MapAnything: Mapping Urban Assets using Single Street-View Images
MapAnything addresses the need for up-to-date urban condition databases in city administrations. The contribution is incremental, since it builds on existing metric depth estimation models.
The paper tackles the problem of automatically mapping urban assets, such as traffic signs and road damage, from single street-view images in order to reduce manual database maintenance. Its estimated distances are validated against LiDAR point clouds across distance intervals and semantic areas.
To maintain an overview of urban conditions, city administrations manage databases of objects like traffic signs and trees, complete with their geocoordinates. Incidents such as graffiti or road damage are also relevant. As digitization increases, so does the need for more data and up-to-date databases, requiring significant manual effort. This paper introduces MapAnything, a module that automatically determines the geocoordinates of objects using individual images. Utilizing advanced Metric Depth Estimation models, MapAnything calculates geocoordinates based on the object's distance from the camera, geometric principles, and camera specifications. We detail and validate the module, providing recommendations for automating urban object and incident mapping. Our evaluation measures the accuracy of estimated distances against LiDAR point clouds in urban environments, analyzing performance across distance intervals and semantic areas like roads and vegetation. The module's effectiveness is demonstrated through practical use cases involving traffic signs and road damage.
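The geometric step described above (object distance, camera specifications, and geometric principles yielding a geocoordinate) can be illustrated with a minimal sketch. It is not the module's actual implementation. It assumes an ideal pinhole camera with a known horizontal field of view, a known compass heading for the image center, a distance that is already the horizontal ground range to the object, and a spherical Earth. The function names `pixel_to_bearing` and `destination` and all parameters are hypothetical, chosen only for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def pixel_to_bearing(px_x, image_width, hfov_deg, heading_deg):
    """Compass bearing (degrees) of the image column px_x.

    Assumes a pinhole camera whose optical axis points along heading_deg
    (clockwise from north) and passes through the image center.
    """
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    # Horizontal angle between the optical axis and the ray through px_x.
    offset_deg = math.degrees(math.atan((px_x - image_width / 2) / focal_px))
    return (heading_deg + offset_deg) % 360.0


def destination(lat_deg, lon_deg, bearing_deg, distance_m):
    """Point reached from (lat, lon) after distance_m along bearing_deg.

    Uses the standard great-circle destination formula on a sphere.
    """
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    brg = math.radians(bearing_deg)
    ang = distance_m / EARTH_RADIUS_M  # angular distance in radians

    lat2 = math.asin(
        math.sin(lat1) * math.cos(ang)
        + math.cos(lat1) * math.sin(ang) * math.cos(brg)
    )
    lon2 = lon1 + math.atan2(
        math.sin(brg) * math.sin(ang) * math.cos(lat1),
        math.cos(ang) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


# Hypothetical example: an object detected at column 1400 of a 1920-pixel-wide
# image with a 90 degree field of view, camera heading 45 degrees, and an
# estimated ground distance of 12.5 m from the camera position.
bearing = pixel_to_bearing(1400, 1920, 90.0, 45.0)
obj_lat, obj_lon = destination(52.0, 13.0, bearing, 12.5)
```

In practice, a metric depth model often returns depth along the optical axis rather than ground range, so a conversion step would be needed before calling `destination`; the sketch leaves that out to keep the geometry visible.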