CVJul 15, 2022Code
Towards unsupervised assessment with open-source data of the accuracy of deep learning-based distributed PV mappingGabriel Kasmi, Laurent Dubus, Philippe Blanc et al.
Photovoltaic (PV) energy is rapidly growing and key to mitigating the energy crisis. However, distributed PV generation, which amounts to half of the PV installed capacity, is typically unavailable to transmission system operators (TSOs), making it increasingly difficult to balance the load and supply and avoid grid congestions. To assess distributed PV generation, TSOs need precise knowledge regarding the metadata of distributed PV installations. Many remote sensing-based approaches have been proposed to map these installations in recent years. However, to use these methods in industrial processes, assessing their accuracy over the mapping area, i.e., the area covered by the model during deployment, is necessary. We define the downstream task accuracy (DTA) as the accuracy over the mapping area, automatically computed using publicly available data sources and the model's outputs and expressed in an interpretable way for operators. We benchmark existing models for distributed PV mapping and show how they perform in terms of DTA. We show that the accuracy computed on the test set overestimates by about 30 percentage points the accuracy on the mapping area. Our approach paves the way for safer integration of deep-learning-based pipelines for remote PV mapping. Code is available at https://github.com/gabrielkasmi/deeppvmapper.
CVSep 8, 2022
A crowdsourced dataset of aerial images with annotated solar photovoltaic arrays and installation metadataGabriel Kasmi, Yves-Marie Saint-Drenan, David Trebosc et al.
Photovoltaic (PV) energy generation plays a crucial role in the energy transition. Small-scale PV installations are deployed at an unprecedented pace, and their integration into the grid can be challenging since public authorities often lack quality data about them. Overhead imagery is increasingly used to improve the knowledge of residential PV installations with machine learning models capable of automatically mapping these installations. However, these models cannot be easily transferred from one region or data source to another due to differences in image acquisition. To address this issue known as domain shift and foster the development of PV array mapping pipelines, we propose a dataset containing aerial images, annotations, and segmentation masks. We provide installation metadata for more than 28,000 installations. We provide ground truth segmentation masks for 13,000 installations, including 7,000 with annotations for two different image providers. Finally, we provide installation metadata that matches the annotation for more than 8,000 installations. Dataset applications include end-to-end PV registry construction, robust PV installations mapping, and analysis of crowdsourced datasets.
CVSep 21, 2023
Can We Reliably Improve the Robustness to Image Acquisition of Remote Sensing of PV Systems?Gabriel Kasmi, Laurent Dubus, Yves-Marie Saint-Drenan et al.
Photovoltaic (PV) energy is crucial for the decarbonization of energy systems. Due to the lack of centralized data, remote sensing of rooftop PV installations is the best option to monitor the evolution of the rooftop PV installed fleet at a regional scale. However, current techniques lack reliability and are notably sensitive to shifts in the acquisition conditions. To overcome this, we leverage the wavelet scale attribution method (WCAM), which decomposes a model's prediction in the space-scale domain. The WCAM enables us to assess on which scales the representation of a PV model rests and provides insights to derive methods that improve the robustness to acquisition conditions, thus increasing trust in deep learning systems to encourage their use for the safe integration of clean energy in electric systems.
LGMar 6, 2025
On the Importance of Clearsky Model in Short-Term Solar Radiation ForecastingCyril Voyant, Milan Despotovic, Gilles Notton et al.
Clearsky models are widely used in solar energy for many applications such as quality control, resource assessment, satellite-base irradiance estimation and forecasting. However, their use in forecasting and nowcasting is associated with a number of challenges. Synchronization errors, reliance on the Clearsky index (ratio of the global horizontal irradiance to its cloud-free counterpart) and high sensitivity of the clearsky model to errors in aerosol optical depth at low solar elevation limit their added value in real-time applications. This paper explores the feasibility of short-term forecasting without relying on a clearsky model. We propose a Clearsky-Free forecasting approach using Extreme Learning Machine (ELM) models. ELM learns daily periodicity and local variability directly from raw Global Horizontal Irradiance (GHI) data. It eliminates the need for Clearsky normalization, simplifying the forecasting process and improving scalability. Our approach is a non-linear adaptative statistical method that implicitely learns the irradiance in cloud-free conditions removing the need for an clear-sky model and the related operational issues. Deterministic and probabilistic results are compared to traditional benchmarks, including ARMA with McClear-generated Clearsky data and quantile regression for probabilistic forecasts. ELM matches or outperforms these methods, providing accurate predictions and robust uncertainty quantification. This approach offers a simple, efficient solution for real-time solar forecasting. By overcoming the stationarization process limitations based on usual multiplicative scheme Clearsky models, it provides a flexible and reliable framework for modern energy systems.
LGAug 18, 2025
Short-Term Forecasting of Energy Production and Consumption Using Extreme Learning Machine: A Comprehensive MIMO based ELM ApproachCyril Voyant, Milan Despotovic, Luis Garcia-Gutierrez et al.
A novel methodology for short-term energy forecasting using an Extreme Learning Machine ($\mathtt{ELM}$) is proposed. Using six years of hourly data collected in Corsica (France) from multiple energy sources (solar, wind, hydro, thermal, bioenergy, and imported electricity), our approach predicts both individual energy outputs and total production (including imports, which closely follow energy demand, modulo losses) through a Multi-Input Multi-Output ($\mathtt{MIMO}$) architecture. To address non-stationarity and seasonal variability, sliding window techniques and cyclic time encoding are incorporated, enabling dynamic adaptation to fluctuations. The $\mathtt{ELM}$ model significantly outperforms persistence-based forecasting, particularly for solar and thermal energy, achieving an $\mathtt{nRMSE}$ of $17.9\%$ and $5.1\%$, respectively, with $\mathtt{R^2} > 0.98$ (1-hour horizon). The model maintains high accuracy up to five hours ahead, beyond which renewable energy sources become increasingly volatile. While $\mathtt{MIMO}$ provides marginal gains over Single-Input Single-Output ($\mathtt{SISO}$) architectures and offers key advantages over deep learning methods such as $\mathtt{LSTM}$, it provides a closed-form solution with lower computational demands, making it well-suited for real-time applications, including online learning. Beyond predictive accuracy, the proposed methodology is adaptable to various contexts and datasets, as it can be tuned to local constraints such as resource availability, grid characteristics, and market structures.
SPDec 9, 2019
Machine learning models show similar performance to Renewables.ninja for generation of long-term wind power time series even without location informationJohann Baumgartner, Katharina Gruber, Sofia Simoes et al.
Driven by climatic processes, wind power generation is inherently variable. Long-term simulated wind power time series are therefore an essential component for understanding the temporal availability of wind power and its integration into future renewable energy systems. In the recent past, mainly power curve based models such as Renewables.ninja (RN) have been used for deriving synthetic time series for wind power generation despite their need for accurate location information as well as for bias correction, and their insufficient replication of extreme events and short-term power ramps. We assess how time series generated by machine learning models (MLM) compare to RN in terms of their ability to replicate the characteristics of observed nationally aggregated wind power generation for Germany. Hence, we apply neural networks to one MERRA2 reanalysis wind speed input dataset with no location information and one with basic location information. The resulting time series and the RN time series are compared with actual generation. Both MLM time series feature equal or even better time series quality than RN depending on the characteristics considered. We conclude that MLM models can, even when reducing information on turbine locations and turbine types, produce time series of at least equal quality to RN.