Holistic Multi-View Building Analysis in the Wild with Projection Pooling
This work addresses remote building analysis for urban planning and monitoring. Its contribution is incremental: it builds on existing datasets and methods and adds one novel layer.
The paper tackles the problem of fine-grained building attribute classification from multi-view images by introducing a new projection pooling layer that creates a unified top-view representation, improving classification accuracy compared to baseline models.
We address six classification tasks related to fine-grained building attributes: construction type, number of floors, roof pitch, roof geometry, facade material, and occupancy class. Tackling such a remote building analysis problem has become possible only recently, thanks to the growth of large-scale datasets of urban scenes. To this end, we introduce a new benchmark dataset consisting of 49,426 images (top-view and street-view) of 9,674 buildings. These images are assembled together with their geometric metadata. The dataset showcases various real-world challenges, such as occlusions, blur, partially visible objects, and a broad spectrum of buildings. We propose a new projection pooling layer that creates a unified top-view representation of the top view and the side views in a high-dimensional space. It allows us to use the building and imagery metadata seamlessly. Introducing this layer improves classification accuracy compared to highly tuned baseline models, indicating its suitability for building analysis.
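To make the idea of pooling side views into a top-view representation concrete, the following is a minimal NumPy sketch and not the layer as defined in the paper. It assumes that each street-view feature map is collapsed along its vertical axis, and that each image column is then scattered along its viewing ray on a top-view grid using the camera position, heading, and field of view from the metadata. The function names `projection_pool` and `fuse_views`, all parameters, and the choice of max-pooling are hypothetical and chosen only for illustration.

```python
import numpy as np


def projection_pool(side_feat, cam_xy, heading_rad, fov_rad,
                    grid_size, cell_m, max_range_m=30.0, n_steps=64):
    """Scatter one street-view feature map onto a top-view grid (illustrative only).

    side_feat   : (C, H, W) feature map, assumed non-negative (e.g. post-ReLU).
    cam_xy      : camera position in metres relative to the grid centre
                  (x to the east, y to the north).
    heading_rad : direction of the image centre, counter-clockwise from +x.
    fov_rad     : horizontal field of view of the image.
    """
    C, H, W = side_feat.shape
    # Collapse the vertical axis: one feature vector per image column.
    col_feat = side_feat.max(axis=1)                      # (C, W)
    top = np.zeros((C, grid_size, grid_size), dtype=side_feat.dtype)

    # Assumption: the leftmost column looks towards heading + fov/2.
    angles = heading_rad + np.linspace(fov_rad / 2, -fov_rad / 2, W)
    dists = np.linspace(0.0, max_range_m, n_steps)

    for w, ang in enumerate(angles):
        # Sample points along the ground-plane ray of this image column.
        xs = cam_xy[0] + dists * np.cos(ang)
        ys = cam_xy[1] + dists * np.sin(ang)
        cols = np.floor(xs / cell_m + grid_size / 2).astype(int)
        rows = np.floor(grid_size / 2 - ys / cell_m).astype(int)
        ok = (cols >= 0) & (cols < grid_size) & (rows >= 0) & (rows < grid_size)
        r, c = rows[ok], cols[ok]
        # Max-pool the column feature into every grid cell the ray crosses.
        top[:, r, c] = np.maximum(top[:, r, c], col_feat[:, w, None])
    return top


def fuse_views(top_feat, projected_sides):
    """Concatenate top-view features with the max over projected side views."""
    side = np.stack(projected_sides).max(axis=0)          # (C, G, G)
    return np.concatenate([top_feat, side], axis=0)       # (C_top + C, G, G)


if __name__ == "__main__":
    feat = np.ones((2, 3, 4))                             # toy street-view features
    top = projection_pool(feat, (0.0, -5.0), np.pi / 2, np.pi / 3,
                          grid_size=16, cell_m=1.0)
    fused = fuse_views(np.zeros((5, 16, 16)), [top, top])
    print(top.shape, fused.shape)
```

In this sketch the fused tensor could be passed to an ordinary convolutional classifier with one output head per attribute. Whether the actual layer uses max-pooling, ray sampling, or a different geometric mapping is not stated in the abstract.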