CV | AI | Feb 23

HeatPrompt: Zero-Shot Vision-Language Modeling of Urban Heat Demand from Satellite Images

arXiv:2602.20066v1 | h-index: 2
Originality: Highly original
AI Analysis

This work addresses the challenge of decarbonizing space heating for municipalities in data-scarce regions by providing a lightweight tool for heat planning.

The paper tackles the problem of estimating urban heat demand without detailed building-level data by introducing HeatPrompt, a zero-shot vision-language framework that uses satellite images and domain-specific prompts to extract visual attributes for thermal load prediction. The method achieves a 93.7% $R^2$ uplift and reduces mean absolute error (MAE) by 30% compared to a baseline model.

Accurate heat-demand maps play a crucial role in decarbonizing space heating, yet most municipalities lack the detailed building-level data needed to calculate them. We introduce HeatPrompt, a zero-shot vision-language energy modeling framework that estimates annual heat demand using semantic features extracted from satellite images, basic Geographic Information System (GIS) data, and building-level features. We give pretrained Large Vision Language Models (VLMs) a domain-specific prompt to act as an energy planner and extract visual attributes that correspond to the thermal load, such as roof age and building density, from the RGB satellite image. A Multi-Layer Perceptron (MLP) regressor trained on these captions shows an $R^2$ uplift of 93.7% and reduces the mean absolute error (MAE) by 30% compared to the baseline model. Qualitative analysis shows that high-impact tokens align with high-demand zones, offering lightweight support for heat planning in data-scarce regions.
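
The two headline numbers are relative changes against a baseline, so it may help to see how such figures could be computed. The sketch below is an illustration only: it assumes "uplift" means the relative increase in $R^2$ over the baseline, which the abstract does not state explicitly, and every value in it is made up for the example rather than taken from the paper. The names `baseline_pred` and `augmented_pred` are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_change(new, old):
    """Signed percentage change of `new` relative to `old`."""
    return 100.0 * (new - old) / abs(old)


# Made-up annual heat demand values (arbitrary units), NOT data from the paper.
y_true = [100.0, 200.0, 300.0, 400.0]
baseline_pred = [200.0, 200.0, 300.0, 300.0]   # hypothetical baseline model
augmented_pred = [130.0, 200.0, 300.0, 360.0]  # hypothetical model with extra features

r2_base = r2_score(y_true, baseline_pred)
r2_aug = r2_score(y_true, augmented_pred)
mae_base = mean_absolute_error(y_true, baseline_pred)
mae_aug = mean_absolute_error(y_true, augmented_pred)

# Assumed definitions: uplift is the relative gain in R^2,
# reduction is the relative drop in MAE (sign flipped to be positive).
r2_uplift = relative_change(r2_aug, r2_base)
mae_reduction = -relative_change(mae_aug, mae_base)

print(f"R^2: {r2_base:.3f} -> {r2_aug:.3f} (uplift {r2_uplift:.1f}%)")
print(f"MAE: {mae_base:.1f} -> {mae_aug:.1f} (reduction {mae_reduction:.1f}%)")
```

Note that a relative $R^2$ uplift depends strongly on how low the baseline $R^2$ is, so under this reading the percentage is best interpreted alongside the absolute scores.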

