AO-PH Aug 1, 2022
Probabilistic forecasts of extreme heatwaves using convolutional neural networks in a regime of lack of data
George Miloshevich, Bastien Cozian, Patrice Abry et al.
Understanding extreme events and their probability is key for the study of climate change impacts, risk assessment, adaptation, and the protection of living beings. Forecasting the occurrence probability of extreme heatwaves is a primary challenge for risk assessment and attribution, but also for fundamental studies about processes, dataset and model validation, and climate change studies. In this work we develop a methodology to build forecasting models which are based on convolutional neural networks, trained on extremely long climate model outputs. We demonstrate that neural networks have positive predictive skill, with respect to random climatological forecasts, for the occurrence of long-lasting 14-day heatwaves over France, up to 15 days ahead of time for fast dynamical drivers (500 hPa geopotential height fields), and also at much longer lead times for slow physical drivers (soil moisture). This forecast is made seamlessly in time and space, for fast hemispheric and slow local drivers. We find that the neural network selects extreme heatwaves associated with a Northern Hemisphere wavenumber-3 pattern. The main scientific message is that most of the time, training neural networks for predicting extreme heatwaves occurs in a regime of lack of data. We suggest that this is likely to be the case for most other applications to large-scale atmosphere and climate phenomena. For instance, using training sets one hundred years long, a regime of drastic lack of data, leads to severely lower predictive skill and a general inability to extract the useful information available in the 500 hPa geopotential height field at a hemispheric scale, in contrast to using a dataset several thousand years long. We discuss perspectives for dealing with the lack-of-data regime, for instance rare event simulations, and how transfer learning may play a role in this latter task.
LG Mar 17, 2022
Covid19 Reproduction Number: Credibility Intervals by Blockwise Proximal Monte Carlo Samplers
Gersende Fort, Barbara Pascal, Patrice Abry et al.
Monitoring the Covid19 pandemic constitutes a critical societal stake that has received considerable research effort. The intensity of the pandemic on a given territory is efficiently measured by the reproduction number, quantifying the rate of growth of daily new infections. Recently, estimates for the time evolution of the reproduction number were produced using an inverse problem formulation with a nonsmooth functional minimization. While it was designed to be robust to the limited quality of the Covid19 data (outliers, missing counts), the procedure lacks the ability to output credibility interval based estimates. This remains a severe limitation for practical use in actual pandemic monitoring by epidemiologists, which the present work aims to overcome by use of Monte Carlo sampling. After interpretation of the nonsmooth functional in a Bayesian framework, several sampling schemes are tailored to accommodate the nonsmooth nature of the resulting posterior distribution. The originality of the devised algorithms stems from combining a Langevin Monte Carlo sampling scheme with proximal operators. The performance of the new algorithms in producing relevant credibility intervals for the reproduction number estimates and denoised counts is compared. Assessment is conducted on real daily new infection counts made available by the Johns Hopkins University. The interest of the devised monitoring tools is illustrated on Covid19 data from several different countries.
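To give a feel for how a Langevin scheme can be combined with a proximal operator, here is a minimal Python sketch. It is not the blockwise sampler of the paper: it assumes a toy posterior with a quadratic data term and an l1 penalty, and uses a generic Moreau-Yosida regularized unadjusted Langevin step. The names `soft_threshold`, `proximal_langevin`, `alpha`, `lam` and `step` are illustrative choices, not taken from the abstract.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_langevin(y, alpha, lam, step, n_iter, rng):
    """Approximate samples from p(x) proportional to
    exp(-0.5 * ||x - y||^2 - alpha * ||x||_1)  (toy posterior, an assumption).

    The nonsmooth l1 term is replaced by its Moreau-Yosida envelope with
    parameter lam, whose gradient is (x - prox(x)) / lam.
    """
    x = y.copy()
    samples = np.empty((n_iter, y.size))
    for k in range(n_iter):
        grad_smooth = x - y  # gradient of 0.5 * ||x - y||^2
        grad_envelope = (x - soft_threshold(x, lam * alpha)) / lam
        noise = rng.standard_normal(y.size)
        x = x - step * (grad_smooth + grad_envelope) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples

rng = np.random.default_rng(0)
y = np.array([3.0, 0.1, -2.0])
samples = proximal_langevin(y, alpha=1.0, lam=0.1, step=0.05, n_iter=5000, rng=rng)

# Componentwise 95% credibility intervals, discarding the first 1000 draws.
lower, upper = np.percentile(samples[1000:], [2.5, 97.5], axis=0)
```

The step size is kept below 1 / (1 + 1 / lam), the usual stability bound for this kind of scheme with a unit-Lipschitz smooth gradient. The intervals `lower` and `upper` play the role that credibility intervals play in the abstract, here for a toy model only.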
NA May 4, 2017
Finite resolution effects in p-leader multifractal analysis
Roberto Leonarduzzi, Herwig Wendt, Patrice Abry et al.
Multifractal analysis has become a standard signal processing tool, for which a promising new formulation, the p-leader multifractal formalism, has recently been proposed. It relies on novel multiscale quantities, the p-leaders, defined as local l^p norms of sets of wavelet coefficients located at infinitely many fine scales. Computing such infinite sums from actual finite-resolution data requires truncation at the finest available scale, which results in biased p-leaders and thus in inaccurate estimates of multifractal properties. A systematic study of such finite-resolution effects leads us to conjecture an explicit and universal closed-form correction that permits an accurate estimation of scaling exponents. This conjecture is formulated from the theoretical study of a particular class of models for multifractal processes, the wavelet-based cascades. The relevance and generality of the proposed conjecture are assessed by numerical simulations conducted over a large variety of multifractal processes. Finally, the relevance of the proposed corrected estimators is demonstrated on the analysis of heart rate variability data.
AS Sep 3, 2024
Equivariance-based self-supervised learning for audio signal recovery from clipped measurements
Victor Sechaud, Laurent Jacques, Patrice Abry et al.
In numerous inverse problems, state-of-the-art solving strategies involve training neural networks from ground truth and associated measurement datasets that, however, may be expensive or impossible to collect. Recently, self-supervised learning techniques have emerged, with the major advantage of no longer requiring ground truth data. Most theoretical and experimental results on self-supervised learning focus on linear inverse problems. The present work aims to study self-supervised learning for the non-linear inverse problem of recovering audio signals from clipped measurements. An equivariance-based self-supervised loss is proposed and studied. Performance is assessed on simulated clipped measurements with controlled and varied levels of clipping, and further reported on standard real music signals. We show that the performance of the proposed equivariance-based self-supervised declipping strategy compares favorably to fully supervised learning while requiring only clipped measurements for training.
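The non-linear measurement model behind declipping can be sketched in a few lines of Python. The example below assumes symmetric hard clipping at a threshold `mu` and a plain mean-squared measurement-consistency term; both are illustrative assumptions, and the equivariance-based loss of the paper is not reproduced here. The names `clip`, `measurement_consistency_loss` and `x_alt` are hypothetical.

```python
import numpy as np

def clip(x, mu):
    """Hard clipping of a signal to the interval [-mu, mu]."""
    return np.clip(x, -mu, mu)

def measurement_consistency_loss(x_hat, y, mu):
    """Mean squared mismatch between the re-clipped estimate and the data."""
    return float(np.mean((clip(x_hat, mu) - y) ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)  # stand-in for a clean audio frame
mu = 1.0
y = clip(x, mu)                # clipped measurements

# A different signal that agrees with x wherever x is not saturated.
x_alt = np.where(np.abs(x) >= mu, 5.0 * np.sign(x), x)

loss_true = measurement_consistency_loss(x, y, mu)
loss_alt = measurement_consistency_loss(x_alt, y, mu)
```

Both `loss_true` and `loss_alt` are zero although `x_alt` differs from `x` on every saturated sample, so measurement consistency alone cannot identify the clean signal. This is the gap that an additional prior, such as the equivariance assumption mentioned in the abstract, is meant to close.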
IV Feb 25
Learning to reconstruct from saturated data: audio declipping and high-dynamic range imaging
Victor Sechaud, Laurent Jacques, Patrice Abry et al.
Learning-based methods are now ubiquitous for solving inverse problems, but their deployment in real-world applications is often hindered by the lack of ground truth references for training. Recent self-supervised learning strategies offer a promising alternative, avoiding the need for ground truth. However, most existing methods are limited to linear inverse problems. This work extends self-supervised learning to the non-linear problem of recovering audio and images from clipped measurements, by assuming that the signal distribution is approximately invariant to changes in amplitude. We provide sufficient conditions for learning to reconstruct from saturated signals alone and a self-supervised loss that can be used to train reconstruction networks. Experiments on both audio and image data show that the proposed approach is almost as effective as fully supervised approaches, despite relying solely on clipped measurements for training.
IV Dec 18, 2023
Scale-Equivariant Imaging: Self-Supervised Learning for Image Super-Resolution and Deblurring
Jérémy Scanvic, Mike Davies, Patrice Abry et al.
Self-supervised methods have recently proved to be nearly as effective as supervised ones in various imaging inverse problems, paving the way for learning-based approaches in scientific and medical imaging applications where ground truth data is hard or expensive to obtain. These methods critically rely on invariance to translations and/or rotations of the image distribution to learn from incomplete measurement data alone. However, existing approaches fail to obtain competitive performance in the problems of image super-resolution and deblurring, which play a key role in most imaging systems. In this work, we show that invariance to roto-translations is insufficient to learn from measurements that only contain low-frequency information. Instead, we propose scale-equivariant imaging, a new self-supervised approach that leverages the fact that many image distributions are approximately scale-invariant, enabling the recovery of high-frequency information lost in the measurement process. We demonstrate through a series of experiments on real datasets that the proposed method outperforms other self-supervised approaches, and obtains performance on par with fully supervised learning.
CV Oct 1, 2025
Equivariant Splitting: Self-supervised learning from incomplete data
Victor Sechaud, Jérémy Scanvic, Quentin Barthélemy et al.
Self-supervised learning for inverse problems allows training a reconstruction network from noisy and/or incomplete data alone. These methods have the potential to enable learning-based solutions when obtaining ground-truth references for training is expensive or even impossible. In this paper, we propose a new self-supervised learning strategy devised for the challenging setting where measurements are observed via a single incomplete observation model. We introduce a new definition of equivariance in the context of reconstruction networks, and show that the combination of self-supervised splitting losses and equivariant reconstruction networks results in the same minimizer in expectation as that of a supervised loss. Through a series of experiments on image inpainting, accelerated magnetic resonance imaging, and compressive sensing, we demonstrate that the proposed loss achieves state-of-the-art performance in settings with highly rank-deficient forward models.
IR Sep 30, 2025
Self-supervised learning for phase retrieval
Victor Sechaud, Patrice Abry, Laurent Jacques et al.
In recent years, deep neural networks have emerged as a solution for inverse imaging problems. These networks are generally trained using pairs of images: one degraded and the other of high quality, the latter being called 'ground truth'. However, in medical and scientific imaging, the lack of fully sampled data limits supervised learning. Recent advances have made it possible to reconstruct images from measurement data alone, eliminating the need for references. However, these methods remain limited to linear problems, excluding non-linear problems such as phase retrieval. We propose a self-supervised method that overcomes this limitation in the case of phase retrieval by using the natural invariance of images to translations.
ME Jan 30, 2025
A spectral clustering-type algorithm for the consistent estimation of the Hurst distribution in moderately high dimensions
Patrice Abry, Gustavo Didier, Oliver Orejola et al.
Scale invariance (fractality) is a prominent feature of the large-scale behavior of many stochastic systems. In this work, we construct an algorithm for the statistical identification of the Hurst distribution (in particular, the scaling exponents) undergirding a high-dimensional fractal system. The algorithm is based on wavelet random matrices, modified spectral clustering and a model selection step for picking the value of the clustering precision hyperparameter. In a moderately high-dimensional regime where the dimension, the sample size and the scale go to infinity, we show that the algorithm consistently estimates the Hurst distribution. Monte Carlo simulations show that the proposed methodology is efficient for realistic sample sizes and outperforms another popular clustering method based on mixed-Gaussian modeling. We apply the algorithm in the analysis of real-world macroeconomic time series to unveil evidence for cointegration.
LG Mar 17, 2021
Deep Learning-based Extreme Heatwave Forecast
Valérian Jacques-Dumas, Francesco Ragone, Pierre Borgnat et al.
Because of the impact of extreme heat waves and heat domes on society and biodiversity, their study is a key challenge. We specifically study long-lasting extreme heat waves, which are among the most important for climate impacts. Physics-driven weather forecast systems or climate models can be used to forecast their occurrence or predict their probability. The present work explores the use of deep learning architectures, trained using outputs of a climate model, as an alternative strategy to forecast the occurrence of extreme long-lasting heatwaves. This new approach will be useful for several key scientific goals, which include studying climate model statistics, building a quantitative proxy for resampling rare events in climate models, and studying the impact of climate change, and it should eventually be useful for forecasting. Fulfilling these important goals implies addressing issues such as the class-size imbalance that is intrinsically associated with rare event prediction, and assessing the potential benefits of transfer learning to address the nested nature of extreme events (naturally included in less extreme ones). We train a Convolutional Neural Network, using 1000 years of climate model outputs, with large-class undersampling and transfer learning. From the observed snapshots of the surface temperature and the 500 hPa geopotential height fields, the trained network achieves significant performance in forecasting the occurrence of long-lasting extreme heatwaves. We are able to predict them at three different levels of intensity, and as early as 15 days ahead of the start of the event (30 days ahead of the end of the event).
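Large-class undersampling, mentioned above as a remedy for class-size imbalance, can be sketched as follows in Python. The sketch assumes binary labels and undersamples down to a perfectly balanced set; the abstract does not state the ratio actually used, and the function name `undersample_majority`, the 5% event rate and the feature dimension are illustrative assumptions.

```python
import numpy as np

def undersample_majority(X, y, rng):
    """Randomly drop majority-class rows until both classes have equal size."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.choice(majority, size=len(minority), replace=False)
    idx = np.sort(np.concatenate([minority, kept]))
    return X[idx], y[idx]

rng = np.random.default_rng(0)
n_samples = 10_000
y = (rng.random(n_samples) < 0.05).astype(int)  # rare "heatwave" label (synthetic)
X = rng.standard_normal((n_samples, 8))         # stand-in for input fields

X_bal, y_bal = undersample_majority(X, y, rng)
```

After the call, `y_bal` contains as many non-events as events, at the cost of discarding most of the majority class. Undersampling changes the class prior seen during training, so predicted probabilities generally need to be recalibrated before being read as occurrence probabilities.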
LG Oct 30, 2020
Multiview Variational Graph Autoencoders for Canonical Correlation Analysis
Yacouba Kaloga, Pierre Borgnat, Sundeep Prabhakar Chepuri et al.
We present a novel multiview canonical correlation analysis model based on a variational approach. This is the first nonlinear model that takes into account the available graph-based geometric constraints while being scalable for processing large scale datasets with multiple views. It is based on an autoencoder architecture with graph convolutional neural network layers. We experiment with our approach on classification, clustering, and recommendation tasks on real datasets. The algorithm is competitive with state-of-the-art multiview representation learning techniques.
ML Apr 20, 2020
Automated data-driven selection of the hyperparameters for Total-Variation based texture segmentation
Barbara Pascal, Samuel Vaiter, Nelly Pustelnik et al.
Penalized Least Squares are widely used in signal and image processing. Yet, they suffer from a major limitation since they require fine-tuning of the regularization parameters. Under assumptions on the noise probability distribution, Stein-based approaches provide an unbiased estimator of the quadratic risk. The Generalized Stein Unbiased Risk Estimator is revisited to handle correlated Gaussian noise without requiring inversion of the covariance matrix. Then, in order to avoid an expensive grid search, it is necessary to design an algorithmic scheme minimizing the quadratic risk with respect to regularization parameters. This work extends the Stein's Unbiased GrAdient estimator of the Risk of Deledalle et al. to the case of correlated Gaussian noise, deriving a general automatic tuning of regularization parameters. First, the theoretical asymptotic unbiasedness of the gradient estimator is demonstrated in the case of general correlated Gaussian noise. Then, the proposed parameter selection strategy is particularized to fractal texture segmentation, where the problem formulation naturally entails inter-scale and spatially correlated noise. Numerical assessment is provided, as well as a discussion of the practical issues.
SI Mar 11, 2019
$L^γ$-PageRank for Semi-Supervised Learning
Esteban Bautista, Patrice Abry, Paulo Gonçalves
PageRank for Semi-Supervised Learning has been shown to leverage data structures and limited tagged examples to yield meaningful classification. Despite successes, classification performance can still be improved, particularly in cases of fuzzy graphs or unbalanced labeled data. To address such limitations, a novel approach based on powers of the Laplacian matrix $L^γ$ ($γ > 0$), referred to as $L^γ$-PageRank, is proposed. Its theoretical study shows that it operates on signed graphs, where nodes belonging to the same class are more likely to share positive edges while nodes from different classes are more likely to be connected by negative edges. It is shown that by selecting an optimal $γ$, classification performance can be significantly enhanced. A procedure for the automated estimation of the optimal $γ$, from a single observation of data, is devised and assessed. Experiments on several datasets demonstrate the effectiveness of both $L^γ$-PageRank classification and the optimal $γ$ estimation.
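The claim that powers of the Laplacian implicitly define a signed graph can be checked numerically on a tiny example. The Python sketch below assumes the combinatorial Laplacian of an unweighted path graph and computes $L^γ$ by eigendecomposition; it only illustrates the sign pattern and does not implement the $L^γ$-PageRank classifier itself. The helper name `laplacian_power` is hypothetical.

```python
import numpy as np

def laplacian_power(L, gamma):
    """Return L**gamma for a symmetric positive semidefinite Laplacian L."""
    eigvals, eigvecs = np.linalg.eigh(L)
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    return (eigvecs * eigvals**gamma) @ eigvecs.T

# Combinatorial Laplacian of the path graph 0 - 1 - 2 (unit edge weights).
L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])

L2 = laplacian_power(L, 2.0)

# Reading -L2[i, j] (i != j) as the weight of edge (i, j), a positive
# off-diagonal entry of L2 corresponds to a negative edge.
implied_weights = -(L2 - np.diag(np.diag(L2)))
```

For this graph, `L2[0, 2]` is positive, so the two end nodes, which are not adjacent in the original graph, become linked by a negative edge in the graph implied by $L^2$, while adjacent nodes keep positive weights.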
LG Aug 27, 2016
Bayesian selection for the l2-Potts model regularization parameter: 1D piecewise constant signal denoising
Jordan Frecon, Nelly Pustelnik, Nicolas Dobigeon et al.
Piecewise constant denoising can be solved either by deterministic optimization approaches, based on the Potts model, or by stochastic Bayesian procedures. The former lead to low computational time but require the selection of a regularization parameter, whose value significantly impacts the achieved solution, and whose automated selection remains an involved and challenging problem. Conversely, fully Bayesian formalisms encapsulate the regularization parameter selection into hierarchical models, at the price of high computational costs. This contribution proposes an operational strategy that combines hierarchical Bayesian and Potts model formulations, with the double aim of automatically tuning the regularization parameter and of maintaining computational efficiency. The proposed procedure relies on formally connecting a Bayesian framework to an l2-Potts functional. Behaviors and performance for the proposed piecewise constant denoising and regularization parameter tuning techniques are studied qualitatively and assessed quantitatively, and shown to compare favorably against those of a fully Bayesian hierarchical procedure, both in accuracy and in computational load.
LG Apr 22, 2015
On-the-fly Approximation of Multivariate Total Variation Minimization
Jordan Frecon, Nelly Pustelnik, Patrice Abry et al.
In the context of change-point detection, addressed by Total Variation minimization strategies, an efficient on-the-fly algorithm has been designed leading to exact solutions for univariate data. In this contribution, an extension of such an on-the-fly strategy to multivariate data is investigated. The proposed algorithm relies on the local validation of the Karush-Kuhn-Tucker conditions on the dual problem. Showing that the non-local nature of the multivariate setting precludes obtaining an exact on-the-fly solution, we devise an on-the-fly algorithm delivering an approximate solution, whose quality is controlled by a practitioner-tunable parameter, acting as a trade-off between quality and computational cost. Performance assessment shows that high-quality solutions are obtained on-the-fly while benefiting from computational costs several orders of magnitude lower than those of standard iterative procedures. The proposed algorithm thus provides practitioners with an efficient on-the-fly multivariate change-point detection procedure.
CV Apr 22, 2015
Combining local regularity estimation and total variation optimization for scale-free texture segmentation
Nelly Pustelnik, Herwig Wendt, Patrice Abry et al.
Texture segmentation constitutes a standard image processing task, crucial to many applications. The present contribution focuses on the particular subset of scale-free textures and its originality resides in the combination of three key ingredients: first, texture characterization relies on the concept of local regularity; second, estimation of local regularity is based on new multiscale quantities referred to as wavelet leaders; third, segmentation from local regularity faces a fundamental bias-variance trade-off: by nature, local regularity estimation shows high variability that impairs the detection of changes, while a posteriori smoothing of regularity estimates prevents changes from being located correctly. Instead, the present contribution proposes several variational problem formulations based on total variation and proximal resolutions that effectively circumvent this trade-off. Estimation and segmentation performance for the proposed procedures are quantified and compared on synthetic as well as on real-world textures.
DATA-AN Oct 17, 2014
Bayesian estimation of the multifractality parameter for image texture using a Whittle approximation
Sébastien Combrexelle, Herwig Wendt, Nicolas Dobigeon et al.
Texture characterization is a central element in many image processing applications. Multifractal analysis is a useful signal and image processing tool, yet, the accurate estimation of multifractal parameters for image texture remains a challenge. This is due in the main to the fact that current estimation procedures consist of performing linear regressions across frequency scales of the two-dimensional (2D) dyadic wavelet transform, for which only a few such scales are computable for images. The strongly non-Gaussian nature of multifractal processes, combined with their complicated dependence structure, makes it difficult to develop suitable models for parameter estimation. Here, we propose a Bayesian procedure that addresses the difficulties in the estimation of the multifractality parameter. The originality of the procedure is threefold: The construction of a generic semi-parametric statistical model for the logarithm of wavelet leaders; the formulation of Bayesian estimators that are associated with this model and the set of parameter values admitted by multifractal theory; the exploitation of a suitable Whittle approximation within the Bayesian model which enables the otherwise infeasible evaluation of the posterior distribution associated with the model. Performance is assessed numerically for several 2D multifractal processes, for several image sizes and a large range of process parameters. The procedure yields significant benefits over current benchmark estimators in terms of estimation performance and ability to discriminate between the two most commonly used classes of multifractal process models. The gains in performance are particularly pronounced for small image sizes, notably enabling for the first time the analysis of image patches as small as 64x64 pixels.