A crowdsourced dataset of aerial images with annotated solar photovoltaic arrays and installation metadata
This dataset addresses the challenge of transferring PV mapping models across regions and image sources, a problem relevant to public authorities and researchers. Its contribution is incremental, as it builds on existing data collection efforts.
The authors tackle domain shift in machine learning models that map solar photovoltaic (PV) arrays from aerial images by creating a crowdsourced dataset with annotations and metadata. They provide installation metadata for more than 28,000 installations, ground truth segmentation masks for 13,000 installations (7,000 of them annotated on images from two different providers), and metadata matched to the annotations for more than 8,000 installations, in order to support robust PV mapping pipelines.
Photovoltaic (PV) energy generation plays a crucial role in the energy transition. Small-scale PV installations are deployed at an unprecedented pace, and their integration into the grid can be challenging since public authorities often lack quality data about them. Overhead imagery is increasingly used to improve the knowledge of residential PV installations, with machine learning models capable of automatically mapping these installations. However, these models cannot be easily transferred from one region or data source to another due to differences in image acquisition. To address this issue, known as domain shift, and to foster the development of PV array mapping pipelines, we propose a dataset containing aerial images, annotations, and segmentation masks. We provide installation metadata for more than 28,000 installations. We provide ground truth segmentation masks for 13,000 installations, including 7,000 with annotations for two different image providers. Finally, we provide installation metadata that matches the annotation for more than 8,000 installations. Dataset applications include end-to-end PV registry construction, robust PV installation mapping, and analysis of crowdsourced datasets.
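One way masks from two image providers could be used to quantify domain shift is to score a mapping model separately on each provider and compare the results. The sketch below is a minimal illustration under stated assumptions, not the authors' evaluation protocol: the record layout, the provider names (provider_a, provider_b), and the helper functions are hypothetical, and the toy masks are invented rather than taken from the dataset. It uses intersection over union (IoU), a common segmentation metric, and only the Python standard library.

```python
from typing import Dict, List

Mask = List[List[int]]  # binary mask: 1 = PV panel pixel, 0 = background


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two same-shaped binary masks."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0


def mean_iou_by_provider(samples: List[dict]) -> Dict[str, float]:
    """Average IoU per provider; each sample has 'provider', 'pred', 'truth'."""
    scores: Dict[str, List[float]] = {}
    for s in samples:
        scores.setdefault(s["provider"], []).append(iou(s["pred"], s["truth"]))
    return {name: sum(v) / len(v) for name, v in scores.items()}


# Toy data (invented for illustration; hypothetical record layout).
truth = [[0, 1, 1], [0, 1, 1]]
samples = [
    {"provider": "provider_a", "pred": [[0, 1, 1], [0, 1, 1]], "truth": truth},
    {"provider": "provider_b", "pred": [[1, 1, 0], [0, 1, 0]], "truth": truth},
]
result = mean_iou_by_provider(samples)
# A large gap between providers would suggest the model transfers poorly.
gap = result["provider_a"] - result["provider_b"]
```

In this toy case the model reproduces the ground truth exactly on one provider and only partially on the other, so the per-provider averages differ. On real data, the same comparison would be run over all installations that have annotations for both providers.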