Riccardo Finotello

HEP-TH
9 papers
105 citations
Novelty 42%
AI Score 45

9 Papers

59.7 DATA-AN Apr 15
Functional Renormalization for Signal Detection: Dimensional Analysis and Dimensional Phase Transition for Nearly Continuous Spectra Effective Field Theory

Riccardo Finotello, Vincent Lahoche, Dine Ousmane Samary

Signal detection in high dimensions is a critical challenge in data science. While standard methods based on random matrix theory provide sharp detection thresholds for finite-rank perturbations, such as the known Baik-Ben Arous-Péché (BBP) transition, they are often insufficient for realistic data exhibiting nearly continuous (extensive-rank) signal distributions that merge with the noise bulk. In this regime, typically associated with real-world scenarios such as images for computer vision tasks, the signal does not manifest as a clear outlier but as a deformation of the spectral density's geometry. We use the functional renormalisation group (FRG) framework to probe these subtle spectral deformations. Treating the empirical spectrum as an effective field theory, we define a scale-dependent "canonical dimension" that acts as a sensitive order parameter for the spectral geometry. We show that this dimension undergoes a sharp crossover, interpreted as a "dimensional phase transition", at signal-to-noise ratios significantly lower than the standard BBP threshold. This dimensional instability is shown to correlate with a spontaneous symmetry breaking in the effective potential and a deviation of eigenvector statistics from the universal Porter-Thomas distribution, confirming the consistency of the method. Such behaviour aligns with recent theoretical results on the "extensive spike model", where signal information persists inside the noise bulk before any spectral gap opens. We validate our approach on realistic datasets, demonstrating that the FRG flow consistently detects the onset of this bulk deformation. Finally, we explore a formalisation of this methodology for analysing nearly continuous spectra, proposing a heuristic criterion for signal detection and a method to estimate the number of independent noise components based on the stability of these canonical dimensions.
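For orientation, the finite-rank baseline that the abstract contrasts with can be written down in a few lines. The sketch below uses the standard textbook results for a single spike in a sample covariance matrix with unit noise variance: the Marchenko-Pastur bulk edges and the BBP threshold. It is not the FRG method of the paper, and the function names `mp_edges` and `bbp_outlier` are hypothetical.

```python
import math


def mp_edges(gamma, sigma2=1.0):
    """Edges of the Marchenko-Pastur bulk for aspect ratio gamma = p / n (gamma <= 1)."""
    root = math.sqrt(gamma)
    return sigma2 * (1.0 - root) ** 2, sigma2 * (1.0 + root) ** 2


def bbp_outlier(ell, gamma):
    """Limiting position of the top sample eigenvalue for one population spike `ell`.

    Assumes unit noise variance. Below the BBP threshold 1 + sqrt(gamma) the
    spike does not separate from the bulk, and the function returns None.
    """
    if ell <= 1.0 + math.sqrt(gamma):
        return None
    return ell * (1.0 + gamma / (ell - 1.0))


if __name__ == "__main__":
    gamma = 0.25
    print("bulk edges:", mp_edges(gamma))
    for ell in (1.2, 1.5, 2.0):
        print("spike", ell, "-> outlier at", bbp_outlier(ell, gamma))
```

A spike below the threshold leaves no isolated eigenvalue, which is the regime where the abstract argues that a deformation of the bulk itself has to be probed instead.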

APP-PH Oct 7, 2022
Trustworthiness of Laser-Induced Breakdown Spectroscopy Predictions via Simulation-based Synthetic Data Augmentation and Multitask Learning

Riccardo Finotello, Daniel L'Hermite, Celine Quéré et al.

We consider quantitative analyses of spectral data using laser-induced breakdown spectroscopy. We address the small size of the available training data and the validation of predictions during inference on unknown data. For this purpose, we build robust calibration models using deep convolutional multitask learning architectures to predict the concentration of the analyte, alongside additional spectral information as auxiliary outputs. These secondary predictions can be used to validate the trustworthiness of the model by taking advantage of the mutual dependencies of the parameters of the multitask neural networks. Because experimental training samples are scarce, we introduce a simulation-based data augmentation process to synthesise an arbitrary number of spectra, statistically representative of the experimental data. Given the nature of the deep learning model, no dimensionality reduction or data selection processes are required. The procedure is an end-to-end pipeline including the process of synthetic data augmentation, the construction of a suitable robust, homoscedastic, deep learning model, and the validation of its predictions. In the article, we compare the performance of the multitask model with traditional univariate and multivariate analyses, to highlight the separate contributions of each element introduced in the process.

HEP-TH Nov 20, 2023
Deep learning complete intersection Calabi-Yau manifolds

Harold Erbin, Riccardo Finotello

We review advancements in deep learning techniques for complete intersection Calabi-Yau (CICY) 3- and 4-folds, with the aim of better understanding how to handle algebraic topological data with machine learning. We first discuss methodological aspects and data analysis, before describing neural network architectures. Then, we describe the state-of-the-art accuracy in predicting Hodge numbers. We include new results on extrapolating predictions from low to high Hodge numbers, and conversely.

LG Feb 25
Learning Complex Physical Regimes via Coverage-oriented Uncertainty Quantification: An application to the Critical Heat Flux

Michele Cazzola, Alberto Ghione, Lucia Sargentini et al.

A central challenge in scientific machine learning (ML) is the correct representation of physical systems governed by multi-regime behaviours. In these scenarios, standard data analysis techniques often fail to capture the nature of the data, as the system's response varies significantly across the state space due to its stochasticity and the different physical regimes. Uncertainty quantification (UQ) should thus not be viewed merely as a safety assessment, but as a support to the learning task itself, guiding the model to internalise the behaviour of the data. We address this by focusing on the Critical Heat Flux (CHF) benchmark and dataset presented by the OECD/NEA Expert Group on Reactor Systems Multi-Physics. This case study represents a test for scientific ML due to the non-linear dependence of CHF on the inputs and the existence of distinct microscopic physical regimes. These regimes exhibit diverse statistical profiles, a complexity that requires UQ techniques to internalise the data behaviour and ensure reliable predictions. In this work, we conduct a comparative analysis of UQ methodologies to determine their impact on physical representation. We contrast post-hoc methods, specifically conformal prediction, against end-to-end coverage-oriented pipelines, including (Bayesian) heteroscedastic regression and quality-driven losses. These approaches treat uncertainty not as a final metric, but as an active component of the optimisation process, modelling the prediction and its behaviour simultaneously. We show that while post-hoc methods ensure statistical calibration, coverage-oriented learning effectively reshapes the model's representation to match the complex physical regimes. The result is a model that delivers not only high predictive accuracy but also a physically consistent uncertainty estimation that adapts dynamically to the intrinsic variability of the CHF.
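The post-hoc baseline mentioned in the abstract, conformal prediction, can be illustrated with a minimal split-conformal sketch for regression. It assumes absolute residuals as the nonconformity score and a held-out calibration set; the names `conformal_radius` and `conformal_interval` are hypothetical, and this is not the pipeline used in the paper.

```python
import math


def conformal_radius(cal_residuals, alpha):
    """Half-width giving marginal coverage >= 1 - alpha under exchangeability.

    Uses the k-th smallest absolute calibration residual, with
    k = ceil((n + 1) * (1 - alpha)). If k exceeds n, the calibration set is
    too small for the requested level and the radius is infinite.
    """
    scores = sorted(abs(r) for r in cal_residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf
    return scores[k - 1]


def conformal_interval(y_pred, cal_residuals, alpha=0.1):
    """Symmetric prediction interval around a point prediction."""
    q = conformal_radius(cal_residuals, alpha)
    return y_pred - q, y_pred + q


if __name__ == "__main__":
    residuals = [-4, 1, -2, 3, 5, -6, 7, -8, 9]  # toy calibration residuals
    print(conformal_interval(10.0, residuals, alpha=0.5))
```

The interval width here is the same for every input, which is why such a wrapper calibrates coverage without changing what the underlying model has learned about the different regimes.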

81.6 STAT-MECH May 11
Field Theory of Data: Anomaly Detection via the Functional Renormalization Group. The 2D Ising Model as a Benchmark

Riccardo Finotello, Vincent Lahoche, Parham Radpay et al.

We establish a correspondence between anomaly detection in high-noise regimes and the renormalization group flow of non-equilibrium field theories. We provide a physical grounding for this framework by proving that the detection of phase transitions in interacting non-equilibrium systems maps to the study of an effective equilibrium field theory near its Gaussian fixed point, which we identify with the universal Marchenko-Pastur distribution. Applying the Functional Renormalization Group to the two-dimensional Model A, we demonstrate that the noise-to-signal ratio acts as a physical temperature, where the signal emerges as ordered domains within a thermalized background of fluctuations. Using the exact Onsager solution as a benchmark, we show that this approach identifies critical thresholds with an error below 4%, significantly outperforming standard information-theoretic metrics such as the Kullback-Leibler divergence. Our results provide a universal strategy for resolving structures in complex datasets near criticality, bridging the gap between statistical mechanics and statistical inference.

APP-PH Nov 30, 2021
HyperPCA: a Powerful Tool to Extract Elemental Maps from Noisy Data Obtained in LIBS Mapping of Materials

Riccardo Finotello, Mohamed Tamaazousti, Jean-Baptiste Sirven

Laser-induced breakdown spectroscopy (LIBS) is a preferred technique for fast and direct multi-elemental mapping of samples under ambient pressure, without any limitation on the targeted element. However, LIBS mapping data have two peculiarities: an intrinsically low signal-to-noise ratio due to single-shot measurements, and a high dimensionality due to the high number of spectra acquired for imaging. This is all the more true as lateral resolution increases: in this case, the ablation spot diameter is reduced, as well as the ablated mass and the emission signal, while the number of spectra for a given surface increases. Therefore, efficient extraction of physico-chemical information from a noisy and large dataset is a major issue. Multivariate approaches were introduced by several authors as a means to cope with such data, particularly Principal Component Analysis (PCA). This technique is useful to analyse correlations between different elements, but its usefulness is limited at low signal-to-noise ratios. In this paper, we introduce HyperPCA, a new analysis tool for hyperspectral images based on a sparse representation of the data using Discrete Wavelet Transform and kernel-based sparse PCA to reduce the impact of noise on the data and to consistently extract the spectroscopic signal, with a particular emphasis on LIBS data. The method is first illustrated using simulated LIBS mapping datasets to emphasise its performance with an extremely low shot-to-shot signal-to-noise ratio, and with a variable degree of spectral interference. Comparisons to standard PCA and to traditional univariate data analyses are provided. Finally, it is used to process real data in two cases that clearly illustrate the potential of the proposed algorithm. We show that the method presents advantages both in quantity and quality of the information recovered, thus improving the physico-chemical characterization of analysed surfaces.

HEP-TH Aug 4, 2021
Deep multi-task mining Calabi-Yau four-folds

Harold Erbin, Riccardo Finotello, Robin Schneider et al.

We continue earlier efforts in computing the dimensions of tangent space cohomologies of Calabi-Yau manifolds using deep learning. In this paper, we consider the dataset of all Calabi-Yau four-folds constructed as complete intersections in products of projective spaces. Employing neural networks inspired by state-of-the-art computer vision architectures, we improve earlier benchmarks and demonstrate that all four non-trivial Hodge numbers can be learned at the same time using a multi-task architecture. With 30% (80%) training ratio, we reach an accuracy of 100% for $h^{(1,1)}$ and 97% for $h^{(2,1)}$ (100% for both), 81% (96%) for $h^{(3,1)}$, and 49% (83%) for $h^{(2,2)}$. Assuming that the Euler number is known, as it is easy to compute, and taking into account the linear constraint arising from index computations, we get 100% total accuracy.
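The last sentence of the abstract can be made concrete with the standard relations for Calabi-Yau four-folds: the index constraint $h^{(2,2)} = 2\,(22 + 2h^{(1,1)} + 2h^{(3,1)} - h^{(2,1)})$ and the Euler number $\chi = 6\,(8 + h^{(1,1)} + h^{(3,1)} - h^{(2,1)})$. The sketch below assumes that $h^{(1,1)}$ and $h^{(2,1)}$ are predicted correctly and that $\chi$ is known; the function name `complete_hodge_numbers` is hypothetical, and the authors may apply the constraints differently.

```python
def complete_hodge_numbers(h11, h21, euler):
    """Recover (h31, h22) of a Calabi-Yau four-fold from h11, h21 and the Euler number.

    Uses chi = 6 * (8 + h11 + h31 - h21) to solve for h31, then the linear
    constraint h22 = 2 * (22 + 2*h11 + 2*h31 - h21).
    """
    quotient, remainder = divmod(euler, 6)
    if remainder != 0:
        raise ValueError("Euler number of a Calabi-Yau four-fold must be divisible by 6")
    h31 = quotient - 8 - h11 + h21
    h22 = 2 * (22 + 2 * h11 + 2 * h31 - h21)
    return h31, h22


if __name__ == "__main__":
    # Sextic hypersurface in P^5: h11 = 1, h21 = 0, chi = 2610.
    print(complete_hodge_numbers(1, 0, 2610))
```

With these two relations only two of the four non-trivial Hodge numbers need to be learned, which is consistent with the 100% total accuracy quoted once $h^{(1,1)}$ and $h^{(2,1)}$ are both predicted exactly.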

HEP-TH Jul 30, 2020
Machine learning for complete intersection Calabi-Yau manifolds: a methodological study

Harold Erbin, Riccardo Finotello

We revisit the question of predicting both Hodge numbers $h^{1,1}$ and $h^{2,1}$ of complete intersection Calabi-Yau (CICY) 3-folds using machine learning (ML), considering both the old and new datasets built respectively by Candelas-Dale-Lutken-Schimmrigk / Green-Hübsch-Lutken and by Anderson-Gao-Gray-Lee. In real-world applications, implementing an ML system rarely reduces to feeding the raw data to the algorithm. Instead, the typical workflow starts with an exploratory data analysis (EDA) which aims at better understanding the input data and finding an optimal representation. It is followed by the design of a validation procedure and a baseline model. Finally, several ML models are compared and combined, often involving neural networks with a topology more complicated than the sequential models typically used in physics. By following this procedure, we improve the accuracy of ML computations for Hodge numbers with respect to the existing literature. First, we obtain 97% (resp. 99%) accuracy for $h^{1,1}$ using a neural network inspired by the Inception model for the old dataset, using only 30% (resp. 70%) of the data for training. For the new one, a simple linear regression leads to almost 100% accuracy with 30% of the data for training. The computation of $h^{2,1}$ is less successful as we manage to reach only 50% accuracy for both datasets, but this is still better than the 16% obtained with a simple neural network (an SVM with Gaussian kernel and feature engineering, and a sequential convolutional network, reach at best 36%). This serves as a proof of concept that neural networks can be valuable to study the properties of geometries appearing in string theory.

HEP-TH Jul 27, 2020
Inception Neural Network for Complete Intersection Calabi-Yau 3-folds

Harold Erbin, Riccardo Finotello

We introduce a neural network inspired by Google's Inception model to compute the Hodge number $h^{1,1}$ of complete intersection Calabi-Yau (CICY) 3-folds. This architecture largely improves the accuracy of the predictions over existing results, already giving 97% accuracy with just 30% of the data for training. Moreover, accuracy climbs to 99% when using 80% of the data for training. This proves that neural networks are a valuable resource to study geometric aspects in both pure mathematics and string theory.