Joan Lasenby

h-index: 36

30 papers

1,480 citations

Novelty: 47%

AI Score: 51

Ranked #40,334 of 201,326 authors (top 20%) · #15,674 in CV (top 27%)

30 Papers

CV · Nov 3, 2022 · Code

Sky-image-based solar forecasting using deep learning with multi-location data: training models locally, globally or via transfer learning?

Yuhao Nie, Quentin Paletta, Andea Scott et al. · cambridge

Solar forecasting from ground-based sky images has shown great promise in reducing the uncertainty in solar power generation. With more and more sky image datasets open sourced in recent years, the development of accurate and reliable deep learning-based solar forecasting methods has seen a huge growth in potential. In this study, we explore three different training strategies for solar forecasting models by leveraging three heterogeneous datasets collected globally with different climate patterns. Specifically, we compare the performance of local models trained individually based on single datasets and global models trained jointly based on the fusion of multiple datasets, and further examine the knowledge transfer from pre-trained solar forecasting models to a new dataset of interest. The results suggest that the local models work well when deployed locally, but significant errors are observed when applied offsite. The global model can adapt well to individual locations at the cost of a potential increase in training efforts. Pre-training models on a large and diversified source dataset and transferring to a target dataset generally achieves superior performance over the other two strategies. With 80% less training data, it can achieve performance comparable to that of the local baseline trained using the entire dataset.

LG · Jun 16, 2022

Evaluating Self-Supervised Learning for Molecular Graph Embeddings

Hanchen Wang, Jean Kaddour, Shengchao Liu et al. · stanford

Graph Self-Supervised Learning (GSSL) provides a robust pathway for acquiring embeddings without expert labelling, a capability that carries profound implications for molecular graphs due to the staggering number of potential molecules and the high cost of obtaining labels. However, GSSL methods are designed not for optimisation within a specific domain but rather for transferability across a variety of downstream tasks. This broad applicability complicates their evaluation. Addressing this challenge, we present "Molecular Graph Representation Evaluation" (MOLGRAPHEVAL), generating detailed profiles of molecular graph embeddings with interpretable and diversified attributes. MOLGRAPHEVAL offers a suite of probing tasks grouped into three categories: (i) generic graph, (ii) molecular substructure, and (iii) embedding space properties. By leveraging MOLGRAPHEVAL to benchmark existing GSSL methods against both current downstream datasets and our suite of tasks, we uncover significant inconsistencies between inferences drawn solely from existing datasets and those derived from more nuanced probing. These findings suggest that current evaluation methodologies fail to capture the entirety of the landscape.

CV · Jun 7, 2022

Omnivision forecasting: combining satellite observations with sky images for improved intra-hour solar energy predictions

Quentin Paletta, Guillaume Arbod, Joan Lasenby · cambridge

Integration of intermittent renewable energy sources into electric grids in large proportions is challenging. A well-established approach aimed at addressing this difficulty involves the anticipation of the upcoming energy supply variability to adapt the response of the grid. In solar energy, short-term changes in electricity production caused by occluding clouds can be predicted at different time scales from all-sky cameras (up to 30-min ahead) and satellite observations (up to 6h ahead). In this study, we integrate these two complementary points of view on the cloud cover in a single machine learning framework to improve intra-hour (up to 60-min ahead) irradiance forecasting. Both deterministic and probabilistic predictions are evaluated in different weather conditions (clear-sky, cloudy, overcast) and with different input configurations (sky images, satellite observations and/or past irradiance values). Our results show that the hybrid model benefits predictions in clear-sky conditions and improves longer-term forecasting. This study lays the groundwork for future novel approaches of combining sky images and satellite observations in a single learning framework to advance solar nowcasting.

CV · Feb 10, 2023

CGA-PoseNet: Camera Pose Regression via a 1D-Up Approach to Conformal Geometric Algebra

Alberto Pepe, Joan Lasenby

We introduce CGA-PoseNet, which uses the 1D-Up approach to Conformal Geometric Algebra (CGA) to represent rotations and translations with a single mathematical object, the motor, for camera pose regression. We do so starting from PoseNet, which successfully predicts camera poses from small datasets of RGB frames. State-of-the-art methods, however, require expensive tuning to balance the orientational and translational components of the camera pose. This is usually done through a complex, ad-hoc loss function to be minimized, and in some cases also requires 3D points as well as images. Our approach has the advantage of unifying the camera position and orientation through the motor. Consequently, the network searches for a single object which lives in a well-behaved 4D space with a Euclidean signature. This means that we can address the case of image-only datasets and work efficiently with a simple loss function, namely the mean squared error (MSE) between the predicted and ground truth motors. We show that it is possible to achieve high accuracy camera pose regression with a significantly simpler problem formulation. This 1D-Up approach to CGA can be employed to overcome the dichotomy between translational and orientational components in camera pose regression in a compact and elegant way.

CV · Dec 12, 2025

Particulate: Feed-Forward 3D Object Articulation

Ruining Li, Yuxin Yao, Chuanxia Zheng et al. · oxford

We present Particulate, a feed-forward approach that, given a single static 3D mesh of an everyday object, directly infers all attributes of the underlying articulated structure, including its 3D parts, kinematic structure, and motion constraints. At its core is a transformer network, Part Articulation Transformer, which processes a point cloud of the input mesh using a flexible and scalable architecture to predict all the aforementioned attributes with native multi-joint support. We train the network end-to-end on a diverse collection of articulated 3D assets from public datasets. During inference, Particulate lifts the network's feed-forward prediction to the input mesh, yielding a fully articulated 3D model in seconds, much faster than prior approaches that require per-object optimization. Particulate can also accurately infer the articulated structure of AI-generated 3D assets, enabling full-fledged extraction of articulated 3D objects from a single (real or synthetic) image when combined with an off-the-shelf image-to-3D generator. We further introduce a new challenging benchmark for 3D articulation estimation curated from high-quality public 3D assets, and redesign the evaluation protocol to be more consistent with human preferences. Quantitative and qualitative results show that Particulate significantly outperforms state-of-the-art approaches.

CV · Dec 31, 2025 · Code

SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time

Zhening Huang, Hyeonho Jeong, Xuelin Chen et al.

We present SpaceTimePilot, a video diffusion model that disentangles space and time for controllable generative rendering. Given a monocular video, SpaceTimePilot can independently alter the camera viewpoint and the motion sequence within the generative process, re-rendering the scene for continuous and arbitrary exploration across space and time. To achieve this, we introduce an effective animation time-embedding mechanism in the diffusion process, allowing explicit control of the output video's motion sequence with respect to that of the source video. As no datasets provide paired videos of the same dynamic scene with continuous temporal variations, we propose a simple yet effective temporal-warping training scheme that repurposes existing multi-view datasets to mimic temporal differences. This strategy effectively supervises the model to learn temporal control and achieve robust space-time disentanglement. To further enhance the precision of dual control, we introduce two additional components: an improved camera-conditioning mechanism that allows altering the camera from the first frame, and CamxTime, the first synthetic space-and-time full-coverage rendering dataset that provides fully free space-time video trajectories within a scene. Joint training on the temporal-warping scheme and the CamxTime dataset yields more precise temporal control. We evaluate SpaceTimePilot on both real-world and synthetic data, demonstrating clear space-time disentanglement and strong results compared to prior work. Project page: https://zheninghuang.github.io/Space-Time-Pilot/ Code: https://github.com/ZheningHuang/spacetimepilot

LG · Aug 24, 2024

STAResNet: a Network in Spacetime Algebra to solve Maxwell's PDEs

Alberto Pepe, Sven Buchholz, Joan Lasenby

We introduce STAResNet, a ResNet architecture in Spacetime Algebra (STA) to solve Maxwell's partial differential equations (PDEs). Recently, networks in Geometric Algebra (GA) have been demonstrated to be an asset for truly geometric machine learning. In Brandstetter et al. (2022), GA networks were employed for the first time to solve PDEs, demonstrating an increased accuracy over real-valued networks. In this work we solve Maxwell's PDEs both in GA and STA employing the same ResNet architecture and dataset, to discuss the impact that the choice of the right algebra has on the accuracy of GA networks. Our study on STAResNet shows how the correct geometric embedding in Clifford Networks gives a mean square error (MSE), between ground truth and estimated fields, up to 2.6 times lower than that obtained with a standard Clifford ResNet, with 6 times fewer trainable parameters. STAResNet demonstrates consistently lower MSE and higher correlation regardless of scenario. The scenarios tested are: sampling period of the dataset; presence of obstacles with either seen or unseen configurations; the number of channels in the ResNet architecture; the number of rollout steps; whether the field is in 2D or 3D space. This demonstrates how choosing the right algebra in Clifford networks is a crucial factor for more compact, accurate, descriptive and better generalising pipelines.

CV · Oct 2, 2020 · Code

Unsupervised Point Cloud Pre-Training via Occlusion Completion

Hanchen Wang, Qi Liu, Xiangyu Yue et al.

We describe a simple pre-training approach for point clouds. It works in three steps: 1. Mask all points occluded in a camera view; 2. Learn an encoder-decoder model to reconstruct the occluded points; 3. Use the encoder weights as initialisation for downstream point cloud tasks. We find that even when we construct a single pre-training dataset (from ModelNet40), this pre-training method improves accuracy across different datasets and encoders, on a wide range of downstream tasks. Specifically, we show that our method outperforms previous pre-training methods in object classification, and both part-based and semantic segmentation tasks. We study the pre-trained features and find that they lead to wide downstream minima, have high transformation invariance, and have activations that are highly correlated with part labels. Code and data are available at: https://github.com/hansen7/OcCo
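Step 1 of the recipe above (masking the points occluded in a camera view) can be sketched as a simple z-buffer. This is only an illustration under stated assumptions, not the authors' implementation (the linked repository is the reference for that): the camera is assumed orthographic and looking along +z, and the function name `split_visible_occluded` and the `resolution` parameter are hypothetical.

```python
import math


def split_visible_occluded(points, resolution=0.1):
    """Partition 3D points into visible and occluded sets.

    Assumption for illustration: an orthographic camera looks along +z
    (smaller z is closer to the camera), and a point is visible only if
    it is the nearest point falling into its (x, y) pixel cell.
    """
    nearest = {}  # pixel cell -> index of the closest point seen so far
    for i, (x, y, z) in enumerate(points):
        cell = (math.floor(x / resolution), math.floor(y / resolution))
        if cell not in nearest or z < points[nearest[cell]][2]:
            nearest[cell] = i
    visible_idx = set(nearest.values())
    visible = [p for i, p in enumerate(points) if i in visible_idx]
    occluded = [p for i, p in enumerate(points) if i not in visible_idx]
    return visible, occluded
```

In a pipeline of this kind, the visible points would be the encoder-decoder input and the occluded points the reconstruction target of step 2.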

LG · Nov 18, 2019 · Code

Neural Random Subspace

Yun-Hao Cao, Jianxin Wu, Hanchen Wang et al.

The random subspace method, known as the pillar of random forests, is good at making precise and robust predictions. However, there is not a straightforward way yet to combine it with deep learning. In this paper, we therefore propose Neural Random Subspace (NRS), a novel deep learning based random subspace method. In contrast to previous forest methods, NRS enjoys the benefits of end-to-end, data-driven representation learning, as well as pervasive support from deep learning software and hardware platforms, hence achieving faster inference speed and higher accuracy. Furthermore, as a non-linear component to be encoded into Convolutional Neural Networks (CNNs), NRS learns non-linear feature representations in CNNs more efficiently than previous higher-order pooling methods, producing good results with negligible increase in parameters, floating point operations (FLOPs) and real running time. Compared with random subspaces, random forests and gradient boosting decision trees (GBDTs), NRS achieves superior performance on 35 machine learning datasets. Moreover, on both 2D image and 3D point cloud recognition tasks, integration of NRS with CNN architectures achieves consistent improvements with minor extra cost. Code is available at https://github.com/CupidJay/NRS_pytorch.

CV · Jan 14

Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering

Jieying Chen, Jeffrey Hu, Joan Lasenby et al.

Modern video generative models based on diffusion models can produce very realistic clips, but they are computationally inefficient, often requiring minutes of GPU time for just a few seconds of video. This inefficiency poses a critical barrier to deploying generative video in applications that require real-time interactions, such as embodied AI and VR/AR. This paper explores a new strategy for camera-conditioned video generation of static scenes: using diffusion-based generative models to generate a sparse set of keyframes, and then synthesizing the full video through 3D reconstruction and rendering. By lifting keyframes into a 3D representation and rendering intermediate views, our approach amortizes the generation cost across hundreds of frames while enforcing geometric consistency. We further introduce a model that predicts the optimal number of keyframes for a given camera trajectory, allowing the system to adaptively allocate computation. Our final method, SRENDER, uses very sparse keyframes for simple trajectories and denser ones for complex camera motion. This results in video generation that is more than 40 times faster than the diffusion-based baseline in generating 20 seconds of video, while maintaining high visual fidelity and temporal stability, offering a practical path toward efficient and controllable video synthesis.

CV · Feb 23, 2024

OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

Francis Engelmann, Ayca Takmaz, Jonas Schult et al.

This report provides an overview of the challenge hosted at the OpenSUN3D Workshop on Open-Vocabulary 3D Scene Understanding held in conjunction with ICCV 2023. The goal of this workshop series is to provide a platform for exploration and discussion of open-vocabulary 3D scene understanding tasks, including but not limited to segmentation, detection and mapping. We provide an overview of the challenge hosted at the workshop, present the challenge dataset, the evaluation methodology, and brief descriptions of the winning methods. For additional details, please see https://opensun3d.github.io/index_iccv23.html.

CV · Jul 3, 2025

LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans

Zhening Huang, Xiaoyang Wu, Fangcheng Zhong et al.

We propose LiteReality, a novel pipeline that converts RGB-D scans of indoor environments into compact, realistic, and interactive 3D virtual replicas. LiteReality not only reconstructs scenes that visually resemble reality but also supports key features essential for graphics pipelines -- such as object individuality, articulation, high-quality physically based rendering materials, and physically based interaction. At its core, LiteReality first performs scene understanding and parses the results into a coherent 3D layout and objects with the help of a structured scene graph. It then reconstructs the scene by retrieving the most visually similar 3D artist-crafted models from a curated asset database. Next, the Material Painting module enhances realism by recovering high-quality, spatially varying materials. Finally, the reconstructed scene is integrated into a simulation engine with basic physical properties to enable interactive behavior. The resulting scenes are compact, editable, and fully compatible with standard graphics pipelines, making them suitable for applications in AR/VR, gaming, robotics, and digital twins. In addition, LiteReality introduces a training-free object retrieval module that achieves state-of-the-art similarity performance on the Scan2CAD benchmark, along with a robust material painting module capable of transferring appearances from images of any style to 3D assets -- even under severe misalignment, occlusion, and poor lighting. We demonstrate the effectiveness of LiteReality on both real-life scans and public datasets. Project page: https://litereality.github.io; Video: https://www.youtube.com/watch?v=ecK9m3LXg2c

CV · Apr 22, 2025

SmallGS: Gaussian Splatting-based Camera Pose Estimation for Small-Baseline Videos

Yuxin Yao, Yan Zhang, Zhening Huang et al.

Dynamic videos with small baseline motions are ubiquitous in daily life, especially on social media. However, these videos present a challenge to existing pose estimation frameworks due to ambiguous features, drift accumulation, and insufficient triangulation constraints. Gaussian splatting, which maintains an explicit representation for scenes, provides a reliable novel view rasterization when the viewpoint change is small. Inspired by this, we propose SmallGS, a camera pose estimation framework that is specifically designed for small-baseline videos. SmallGS optimizes sequential camera poses using Gaussian splatting, which reconstructs the scene from the first frame in each video segment to provide a stable reference for the rest. The temporal consistency of Gaussian splatting within limited viewpoint differences reduces the requirement of sufficient depth variations in traditional camera pose estimation. We further incorporate pretrained robust visual features, e.g. DINOv2, into Gaussian splatting, where high-dimensional feature map rendering enhances the robustness of camera pose estimation. By freezing the Gaussian splatting and optimizing camera viewpoints based on rasterized features, SmallGS effectively learns camera poses without requiring explicit feature correspondences or strong parallax motion. We verify the effectiveness of SmallGS in small-baseline videos in TUM-Dynamics sequences, which achieves impressive accuracy in camera pose estimation compared to MonST3R and DROID-SLAM for small-baseline videos in dynamic scenes. Our project page is at: https://yuxinyao620.github.io/SmallGS

CV · Sep 1, 2023

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen et al.

In this work, we introduce OpenIns3D, a new 3D-input-only framework for 3D open-vocabulary scene understanding. The OpenIns3D framework employs a "Mask-Snap-Lookup" scheme. The "Mask" module learns class-agnostic mask proposals in 3D point clouds, the "Snap" module generates synthetic scene-level images at multiple scales and leverages 2D vision-language models to extract interesting objects, and the "Lookup" module searches through the outcomes of "Snap" to assign category names to the proposed masks. This approach, though simple, achieves state-of-the-art performance across a wide range of 3D open-vocabulary tasks, including recognition, object detection, and instance segmentation, on both indoor and outdoor datasets. Moreover, OpenIns3D facilitates effortless switching between different 2D detectors without requiring retraining. When integrated with powerful 2D open-world models, it achieves excellent results in scene understanding tasks. Furthermore, when combined with LLM-powered 2D models, OpenIns3D exhibits an impressive capability to comprehend and process highly complex text queries that demand intricate reasoning and real-world knowledge. Project page: https://zheninghuang.github.io/OpenIns3D/

CV · Nov 29, 2021

SPIN: Simplifying Polar Invariance for Neural networks Application to vision-based irradiance forecasting

Quentin Paletta, Anthony Hu, Guillaume Arbod et al.

Translational invariance induced by pooling operations is an inherent property of convolutional neural networks, which facilitates numerous computer vision tasks such as classification. Yet for rotationally invariant tasks, convolutional architectures require specific rotationally invariant layers or extensive data augmentation to learn from diverse rotated versions of a given spatial configuration. Unwrapping the image into its polar coordinates provides a more explicit representation to train a convolutional architecture as the rotational invariance becomes translational, hence the visually distinct but otherwise equivalent rotated versions of a given scene can be learnt from a single image. We show with two common vision-based solar irradiance forecasting challenges (i.e. using ground-taken sky images or satellite images), that this preprocessing step significantly improves prediction results by standardising the scene representation, while decreasing training time by a factor of 4 compared to augmenting data with rotations. In addition, this transformation magnifies the area surrounding the centre of the rotation, leading to more accurate short-term irradiance predictions.
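The polar unwrapping described above can be sketched in a few lines. This is a minimal illustration, assuming nearest-neighbour sampling and a rotation centre at the image centre; the function name `polar_unwrap` and its parameters are hypothetical and are not taken from the paper.

```python
import math


def polar_unwrap(image, n_radii, n_angles):
    """Resample a 2D image (list of rows) onto a (radius, angle) grid.

    Assumptions for illustration: the rotation centre is the image
    centre, radii are spaced evenly from 0 to the largest radius that
    fits inside the image, and pixels are looked up by nearest neighbour.
    """
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    max_radius = min(cy, cx)
    polar = []
    for i in range(n_radii):
        radius = max_radius * i / max(n_radii - 1, 1)
        row = []
        for j in range(n_angles):
            theta = 2 * math.pi * j / n_angles
            y = round(cy + radius * math.sin(theta))
            x = round(cx + radius * math.cos(theta))
            row.append(image[y][x])
        polar.append(row)
    return polar
```

With this layout each output row is one ring of the image, so rotating the input by 90 degrees about its centre shifts every row circularly by `n_angles / 4` columns: the rotation has become a translation along the angle axis, which is the property the abstract relies on.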

AI · Nov 18, 2021

Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence

Xiang Bai, Hanchen Wang, Liya Ma et al.

Artificial intelligence (AI) provides a promising substitution for streamlining COVID-19 diagnoses. However, concerns surrounding security and trustworthiness impede the collection of large-scale representative medical data, posing a considerable challenge for training a well-generalised model in clinical practices. To address this, we launch the Unified CT-COVID AI Diagnostic Initiative (UCADI), where the AI model can be distributedly trained and independently executed at each host institution under a federated learning (FL) framework without data sharing. Here we show that our FL model outperformed all the local models by a large margin (test sensitivity/specificity in China: 0.973/0.951, in the UK: 0.730/0.942), achieving comparable performance with a panel of professional radiologists. We further evaluated the model on the hold-out (collected from another two hospitals leaving out the FL) and heterogeneous (acquired with contrast materials) data, provided visual explanations for decisions made by the model, and analysed the trade-offs between the model performance and the communication costs in the federated training process. Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected from 23 hospitals located in China and the UK. Collectively, our work advanced the prospects of utilising federated learning for privacy-preserving AI in digital health.

LG · Oct 7, 2021

Pre-training Molecular Graph Representation with 3D Geometry

Shengchao Liu, Hanchen Wang, Weiyang Liu et al.

Molecular graph representation learning is a fundamental problem in modern drug and material discovery. Molecular graphs are typically modeled by their 2D topological structures, but it has been recently discovered that 3D geometric information plays a more vital role in predicting molecular functionalities. However, the lack of 3D information in real-world scenarios has significantly impeded the learning of geometric graph representation. To cope with this challenge, we propose the Graph Multi-View Pre-training (GraphMVP) framework where self-supervised learning (SSL) is performed by leveraging the correspondence and consistency between 2D topological structures and 3D geometric views. GraphMVP effectively learns a 2D molecular graph encoder that is enhanced by richer and more discriminative 3D geometry. We further provide theoretical insights to justify the effectiveness of GraphMVP. Finally, comprehensive experiments show that GraphMVP can consistently outperform existing graph SSL methods.

RO · Sep 26, 2021

Singularities of serial robots: Identification and distance computation using geometric algebra

Isiah Zaplana, Hugo Hadfield, Joan Lasenby

The singularities of serial robotic manipulators are those configurations in which the robot loses the ability to move in at least one direction. Hence, their identification is fundamental to enhance the performance of current control and motion planning strategies. While classical approaches entail the computation of the determinant of either a 6×n or n×n matrix for an n degrees of freedom serial robot, this work addresses a novel singularity identification method based on modelling the twists defined by the joint axes of the robot as vectors of the six-dimensional and three-dimensional geometric algebras. In particular, it consists of identifying which configurations cause the exterior product of these twists to vanish. In addition, since rotors represent rotations in geometric algebra, once these singularities have been identified, a distance function is defined in the configuration space C such that its restriction to the set of singular configurations S allows us to compute the distance of any configuration to a given singularity. This distance function is used to enhance how the singularities are handled in three different scenarios, namely motion planning, motion control and bilateral teleoperation.
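The vanishing-exterior-product criterion can be illustrated on a toy case. The sketch below assumes a planar two-link arm with revolute joints and works with the two columns of its 2×2 positional Jacobian rather than the six-dimensional twists used in the paper; all function names and the link lengths `l1`, `l2` are hypothetical. In two dimensions the exterior product of two vectors has a single coefficient, which here equals the classical Jacobian determinant, `l1 * l2 * sin(q2)`.

```python
import math


def jacobian_columns(q1, q2, l1=1.0, l2=1.0):
    """Columns of the 2x2 positional Jacobian of a planar 2R arm."""
    j1 = (-l1 * math.sin(q1) - l2 * math.sin(q1 + q2),
          l1 * math.cos(q1) + l2 * math.cos(q1 + q2))
    j2 = (-l2 * math.sin(q1 + q2), l2 * math.cos(q1 + q2))
    return j1, j2


def wedge_2d(a, b):
    """Coefficient of e1^e2 in the exterior product a ^ b of 2D vectors."""
    return a[0] * b[1] - a[1] * b[0]


def is_singular(q1, q2, tol=1e-9):
    """A configuration is singular when the exterior product vanishes."""
    return abs(wedge_2d(*jacobian_columns(q1, q2))) < tol
```

For this arm the product vanishes exactly when the elbow is fully stretched (`q2 = 0`) or fully folded (`q2 = pi`), regardless of `q1`.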

RO · Sep 25, 2021

Closed-form solutions for the inverse kinematics of serial robots using conformal geometric algebra

Isiah Zaplana, Hugo Hadfield, Joan Lasenby

This work addresses the inverse kinematics of serial robots using conformal geometric algebra. Classical approaches include either the use of homogeneous matrices, which entails high computational cost and execution time, or the development of particular geometric strategies that cannot be generalized to arbitrary serial robots. In this work, we present a compact, elegant and intuitive formulation of robot kinematics based on conformal geometric algebra that provides a suitable framework for the closed-form resolution of the inverse kinematic problem for manipulators with a spherical wrist. For serial robots of this kind, the inverse kinematics problem can be split into two subproblems: the position and orientation problems. The latter is solved by appropriately splitting the rotor that defines the target orientation into three simpler rotors, while the former is solved by developing a geometric strategy for each combination of prismatic and revolute joints that forms the position part of the robot. Finally, the inverse kinematics of 7 DoF redundant manipulators with a spherical wrist is solved by extending the geometric solutions obtained in the non-redundant case.

IV · Aug 5, 2021

Rotaflip: A New CNN Layer for Regularization and Rotational Invariance in Medical Images

Juan P. Vigueras-Guillén, Joan Lasenby, Frank Seeliger

Regularization in convolutional neural networks (CNNs) is usually addressed with dropout layers. However, dropout is sometimes detrimental in the convolutional part of a CNN as it simply sets to zero a percentage of pixels in the feature maps, adding unrepresentative examples during training. Here, we propose a CNN layer that performs regularization by applying random rotations or reflections to a small percentage of feature maps after every convolutional layer. We prove how this concept is beneficial for images with orientational symmetries, such as in medical images, as it provides a certain degree of rotational invariance. We tested this method in two datasets, a patch-based set of histopathology images (PatchCamelyon) to perform classification using a generic DenseNet, and a set of specular microscopy images of the corneal endothelium to perform segmentation using a tailored U-net, improving the performance in both cases.
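The layer described above can be sketched as follows. The abstract does not specify the exact set of transformations or how feature maps are selected, so this sketch makes assumptions: each square feature map is picked independently with probability `fraction`, rotations are multiples of 90 degrees, and the reflection is a horizontal flip. The name `rotaflip_like` is hypothetical and the code is not the authors' implementation.

```python
import random


def rot90(fmap):
    """Rotate a 2D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*fmap)][::-1]


def flip(fmap):
    """Mirror a 2D list left to right."""
    return [row[::-1] for row in fmap]


def rotaflip_like(feature_maps, fraction=0.1, rng=random):
    """Randomly rotate and/or reflect a fraction of the feature maps.

    Assumption for illustration: selected maps get a random number of
    90-degree rotations (0 to 3) followed by a flip with probability 0.5.
    The input is not modified; a new list of maps is returned.
    """
    out = []
    for fmap in feature_maps:
        if rng.random() < fraction:
            for _ in range(rng.randrange(4)):
                fmap = rot90(fmap)
            if rng.random() < 0.5:
                fmap = flip(fmap)
        out.append([list(row) for row in fmap])
    return out
```

As with dropout, a layer of this kind would be active during training only and act as the identity at inference time.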

CV · Aug 5, 2021

Redesigning Fully Convolutional DenseUNets for Large Histopathology Images

Juan P. Vigueras-Guillén, Joan Lasenby, Frank Seeliger

The automated segmentation of cancer tissue in histopathology images can help clinicians to detect, diagnose, and analyze such disease. Different from other natural images used in many convolutional networks for benchmark, histopathology images can be extremely large, and the cancerous patterns can reach beyond 1000 pixels. Therefore, the well-known networks in the literature were never conceived to handle these peculiarities. In this work, we propose a Fully Convolutional DenseUNet that is particularly designed to solve histopathology problems. We evaluated our network in two public pathology datasets published as challenges in the recent MICCAI 2019: binary segmentation in colon cancer images (DigestPath2019), and multi-class segmentation in prostate cancer images (Gleason2019), achieving similar and better results than the winners of the challenges, respectively. Furthermore, we discussed some good practices in the training setup to yield the best performance and the main challenges in these histopathology datasets.

CV · Apr 26, 2021

ECLIPSE: Envisioning CLoud Induced Perturbations in Solar Energy

Quentin Paletta, Anthony Hu, Guillaume Arbod et al.

Efficient integration of solar energy into the electricity mix depends on a reliable anticipation of its intermittency. A promising approach to forecast the temporal variability of solar irradiance resulting from the cloud cover dynamics is based on the analysis of sequences of ground-taken sky images or satellite observations. Despite encouraging results, a recurrent limitation of existing deep learning approaches lies in the ubiquitous tendency of reacting to past observations rather than actively anticipating future events. This leads to a frequent temporal lag and limited ability to predict sudden events. To address this challenge, we introduce ECLIPSE, a spatio-temporal neural network architecture that models cloud motion from sky images to not only predict future irradiance levels and associated uncertainties, but also segmented images, which provide richer information on the local irradiance map. We show that ECLIPSE anticipates critical events and reduces temporal delay while generating visually realistic futures. The model characteristics and properties are investigated with an ablation study and a comparative study on the benefits and different ways to integrate auxiliary data into the modelling. The model predictions are also interpreted through an analysis of the principal spatio-temporal components learned during network training.

CV · Feb 1, 2021

Benchmarking of Deep Learning Irradiance Forecasting Models from Sky Images -- an in-depth Analysis

Quentin Paletta, Guillaume Arbod, Joan Lasenby

A number of industrial applications, such as smart grids, power plant operation, hybrid system management or energy trading, could benefit from improved short-term solar forecasting, addressing the intermittent energy production from solar panels. However, current approaches to modelling the cloud cover dynamics from sky images still lack precision regarding the spatial configuration of clouds, their temporal dynamics and physical interactions with solar radiation. Benefiting from a growing number of large datasets, data driven methods are being developed to address these limitations with promising results. In this study, we compare four commonly used Deep Learning architectures trained to forecast solar irradiance from sequences of hemispherical sky images and exogenous variables. To assess the relative performance of each model, we used the Forecast Skill metric based on the smart persistence model, as well as ramp and time distortion metrics. The results show that encoding spatiotemporal aspects of the sequence of sky images greatly improved the predictions with 10 min ahead Forecast Skill reaching 20.4% on the test year. However, based on the experimental data, we conclude that, with a common setup, Deep Learning models tend to behave just as a 'very smart persistence model', temporally aligned with the persistence model while mitigating its most penalising errors. Thus, despite being captured by the sky cameras, models often miss fundamental events causing large irradiance changes such as clouds obscuring the sun. We hope that our work will contribute to a shift of this approach to irradiance forecasting, from reactive to anticipatory.

CVDec 2, 2020

A Temporally Consistent Image-based Sun Tracking Algorithm for Solar Energy Forecasting Applications

Quentin Paletta, Joan Lasenby

Improving irradiance forecasting is critical to further increase the share of solar in the energy mix. On a short time scale, fish-eye cameras on the ground are used to capture cloud displacements causing the local variability of the electricity production. As most of the solar radiation comes directly from the Sun, current forecasting approaches use its position in the image as a reference to interpret the cloud cover dynamics. However, existing Sun tracking methods rely on external data and a calibration of the camera, which requires access to the device. To address these limitations, this study introduces an image-based Sun tracking algorithm to localise the Sun in the image when it is visible and interpolate its daily trajectory from past observations. We validate the method on a set of sky images collected over a year at SIRTA's lab. Experimental results show that the proposed method provides robust smooth Sun trajectories with a mean absolute error below 1% of the image size.

CVMay 22, 2020

Convolutional Neural Networks applied to sky images for short-term solar irradiance forecasting

Quentin Paletta, Joan Lasenby

Despite the advances in the field of solar energy, improvements of solar forecasting techniques, addressing the intermittent electricity production, remain essential for securing its future integration into a wider energy supply. A promising approach to anticipate irradiance changes consists of modeling the cloud cover dynamics from ground-taken or satellite images. This work presents preliminary results on the application of deep Convolutional Neural Networks for 2 to 20 min irradiance forecasting using hemispherical sky images and exogenous variables. We evaluate the models on a set of irradiance measurements and corresponding sky images collected in Palaiseau (France) over 8 months with a temporal resolution of 2 min. To outline the learning of neural networks in the context of short-term irradiance forecasting, we implemented visualisation techniques revealing the types of patterns recognised by trained algorithms in sky images. In addition, we show that training models with past samples of the same day improves their forecast skill, relative to the smart persistence model based on the Mean Square Error, by around 10% on a 10-min-ahead prediction. These results emphasise the benefit of integrating previous same-day data in short-term forecasting. This, in turn, can be achieved through model fine-tuning or using recurrent units to facilitate the extraction of relevant temporal features from past data.

NEApr 13, 2018

The unreasonable effectiveness of the forget gate

Jos van der Westhuizen, Joan Lasenby

Given the success of the gated recurrent unit, a natural question is whether all the gates of the long short-term memory (LSTM) network are necessary. Previous research has shown that the forget gate is one of the most important gates in the LSTM. Here we show that a forget-gate-only version of the LSTM with chrono-initialized biases not only provides computational savings but also outperforms the standard LSTM on multiple benchmark datasets and competes with some of the best contemporary models. Our proposed network, the JANET, achieves accuracies of 99% and 92.5% on the MNIST and pMNIST datasets, outperforming the standard LSTM which yields accuracies of 98.5% and 91%.
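To make the idea of a forget-gate-only recurrent cell concrete, here is a minimal NumPy sketch. It is a simplified illustration, not the paper's exact formulation: the state update is assumed to be a convex combination of the previous state and a tanh candidate, the hidden state is taken to equal the cell state, and the forget-gate bias is drawn as the log of a uniform sample on [1, T_max - 1], which is how chrono initialization is usually described. All parameter names (`W_f`, `U_f`, `t_max`, and so on) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, t_max, rng):
    """Random weights plus a chrono-style forget bias (assumed scheme)."""
    scale = 1.0 / np.sqrt(n_hidden)
    return {
        "W_f": rng.normal(0.0, scale, (n_hidden, n_in)),
        "U_f": rng.normal(0.0, scale, (n_hidden, n_hidden)),
        # log of U(1, t_max - 1): biases the gate towards remembering
        "b_f": np.log(rng.uniform(1.0, t_max - 1.0, n_hidden)),
        "W_c": rng.normal(0.0, scale, (n_hidden, n_in)),
        "U_c": rng.normal(0.0, scale, (n_hidden, n_hidden)),
        "b_c": np.zeros(n_hidden),
    }

def step(p, x_t, h_prev):
    """One time step: the single forget gate f blends old state and candidate."""
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])
    candidate = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])
    return f * h_prev + (1.0 - f) * candidate

def run(p, xs):
    """Process a sequence of shape (T, n_in) and return the final state."""
    h = np.zeros(p["b_c"].shape[0])
    for x_t in xs:
        h = step(p, x_t, h)
    return h
```

Compared with a standard LSTM cell, this sketch has two weight blocks instead of four, which is where the computational saving mentioned in the abstract would come from.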

MLJun 5, 2017

Bayesian LSTMs in medicine

Jos van der Westhuizen, Joan Lasenby

The medical field stands to see significant benefits from the recent advances in deep learning. Knowing the uncertainty in the decision made by any machine learning algorithm is of utmost importance for medical practitioners. This study demonstrates the utility of using Bayesian LSTMs for classification of medical time series. Four medical time series datasets are used to show the accuracy improvement Bayesian LSTMs provide over standard LSTMs. Moreover, we show cherry-picked examples of confident and uncertain classifications of the medical time series. With simple modifications of the common practice for deep learning, significant improvements can be made for the medical practitioner and patient.

MLMay 23, 2017

Techniques for visualizing LSTMs applied to electrocardiograms

Jos van der Westhuizen, Joan Lasenby

This paper explores four different visualization techniques for long short-term memory (LSTM) networks applied to continuous-valued time series. On the datasets analysed, we find that the best visualization technique is to learn an input deletion mask that optimally reduces the true class score. With a specific focus on single-lead electrocardiograms from the MIT-BIH arrhythmia dataset, we show that salient input features for the LSTM classifier align well with medical theory.

CVMay 20, 2014

Single camera pose estimation using Bayesian filtering and Kinect motion priors

Michael Burke, Joan Lasenby

Traditional approaches to upper body pose estimation using monocular vision rely on complex body models and a large variety of geometric constraints. We argue that this is not ideal and somewhat inelegant as it results in large processing burdens, and instead attempt to incorporate these constraints through priors obtained directly from training data. A prior distribution covering the probability of a human pose occurring is used to incorporate likely human poses. This distribution is obtained offline, by fitting a Gaussian mixture model to a large dataset of recorded human body poses, tracked using a Kinect sensor. We combine this prior information with a random walk transition model to obtain an upper body model, suitable for use within a recursive Bayesian filtering framework. Our model can be viewed as a mixture of discrete Ornstein-Uhlenbeck processes, in that states behave as random walks, but drift towards a set of typically observed poses. This model is combined with measurements of the human head and hand positions, using recursive Bayesian estimation to incorporate temporal information. Measurements are obtained using face detection and a simple skin colour hand detector, trained using the detected face. The suggested model is designed with analytical tractability in mind and we show that the pose tracking can be Rao-Blackwellised using the mixture Kalman filter, allowing for computational efficiency while still incorporating bio-mechanical properties of the upper body. In addition, the use of the proposed upper body model allows reliable three-dimensional pose estimates to be obtained indirectly for a number of joints that are often difficult to detect using traditional object recognition strategies. Comparisons with Kinect sensor results and the state of the art in 2D pose estimation highlight the efficacy of the proposed approach.
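The transition model described in this abstract (a random walk that drifts towards typical poses) can be illustrated with a single discrete Ornstein-Uhlenbeck-style update. The sketch below covers only one mixture component; in the setting described, each Gaussian mixture component would presumably contribute its own mean. The parameter names `rate` and `noise_std` are hypothetical and the update form is an assumption, not the authors' implementation.

```python
import numpy as np

def ou_step(x, mu, rate, noise_std, rng):
    """Random-walk step with a pull of strength `rate` towards the mean `mu`."""
    noise = noise_std * rng.standard_normal(np.shape(x))
    return x + rate * (mu - x) + noise

def simulate(x0, mu, rate, noise_std, n_steps, rng):
    """Iterate the update and return the trajectory, shape (n_steps + 1, dim)."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        xs.append(ou_step(xs[-1], mu, rate, noise_std, rng))
    return np.stack(xs)
```

With `rate = 0` this reduces to a plain random walk, and with `noise_std = 0` the state decays geometrically towards `mu`, so the two parameters separate the diffusion and drift behaviours.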

CVJan 23, 2013

ChESS - Quick and Robust Detection of Chess-board Features

Stuart Bennett, Joan Lasenby

Localization of chess-board vertices is a common task in computer vision, underpinning many applications, but relatively little work focusses on designing a specific feature detector that is fast, accurate and robust. In this paper the 'Chess-board Extraction by Subtraction and Summation' (ChESS) feature detector, designed to exclusively respond to chess-board vertices, is presented. The method proposed is robust against noise, poor lighting and poor contrast, requires no prior knowledge of the extent of the chess-board pattern, is computationally very efficient, and provides a strength measure of detected features. Such a detector has significant application both in the key field of camera calibration and in Structured Light 3D reconstruction. Evidence is presented showing its robustness, accuracy, and efficiency in comparison to other commonly used detectors both under simulation and in experimental 3D reconstruction of flat plate and cylindrical objects.