Towards Temporal Change Explanations from Bi-Temporal Satellite Images
This work addresses the cost of manual dataset construction for urban planning and environmental monitoring applications, and represents an incremental improvement in human-AI collaboration for satellite image analysis.
The paper tackles the problem of explaining temporal changes between bi-temporal satellite images. It investigates the ability of Large-scale Vision-Language Models (LVLMs) on this task and proposes three prompting methods; human evaluation shows that step-by-step reasoning prompting is effective.
Explaining temporal changes between satellite images taken at different times is important for urban planning and environmental monitoring. However, manual dataset construction for the task is costly, so human-AI collaboration is promising. Toward this direction, in this paper, we investigate the ability of Large-scale Vision-Language Models (LVLMs) to explain temporal changes between satellite images. While LVLMs are known to generate good image captions, they receive only a single image as input. To deal with a pair of satellite images as input, we propose three prompting methods. Through human evaluation, we found that our step-by-step reasoning based prompting is effective.
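
To make the idea of step-by-step prompting over an image pair concrete, the following is a minimal sketch of one plausible arrangement: caption each image separately with a single-image model, then ask for a comparison of the two captions. The abstract does not specify the three prompting methods, so this is an illustrative assumption rather than the paper's procedure. The names `describe`, `reason`, `build_comparison_prompt`, and `explain_change_step_by_step`, as well as the prompt wording, are hypothetical.

```python
from typing import Callable

# Hypothetical interfaces (assumptions, not from the paper):
# a single-image model maps (image_path, prompt) to text,
# and a text-only step maps a prompt to text.
SingleImageModel = Callable[[str, str], str]
TextModel = Callable[[str], str]

# Assumed prompt wording, chosen only for illustration.
CAPTION_PROMPT = "Describe the land use and visible structures in this satellite image."


def build_comparison_prompt(before_caption: str, after_caption: str) -> str:
    """Combine two single-image captions into one comparison prompt."""
    return (
        "Two descriptions of the same location at different times follow.\n"
        f"Earlier image: {before_caption}\n"
        f"Later image: {after_caption}\n"
        "Explain step by step what changed between the two times."
    )


def explain_change_step_by_step(
    before_image: str,
    after_image: str,
    describe: SingleImageModel,
    reason: TextModel,
) -> str:
    """Caption each image on its own, then reason over the two captions."""
    # Step 1: the model sees one image at a time, matching the single-image constraint.
    before_caption = describe(before_image, CAPTION_PROMPT)
    after_caption = describe(after_image, CAPTION_PROMPT)
    # Step 2: the temporal comparison happens in text, over the two captions.
    return reason(build_comparison_prompt(before_caption, after_caption))
```

Passing the models in as callables keeps the sketch independent of any particular LVLM API; in practice `describe` and `reason` could wrap the same underlying model.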