Erwan Allys

LG
4papers
28citations
Novelty53%
AI Score39

4 Papers

DATA-ANJun 29, 2023
Scattering Spectra Models for Physics

Sihao Cheng, Rudy Morel, Erwan Allys et al.

Physicists routinely need probabilistic models for a number of tasks such as parameter inference or the generation of new realizations of a field. Establishing such models for highly non-Gaussian fields is a challenge, especially when the number of samples is limited. In this paper, we introduce scattering spectra models for stationary fields and we show that they provide accurate and robust statistical descriptions of a wide range of fields encountered in physics. These models are based on covariances of scattering coefficients, i.e. wavelet decomposition of a field coupled with a point-wise modulus. After introducing useful dimension reductions taking advantage of the regularity of a field under rotation and scaling, we validate these models on various multi-scale physical fields and demonstrate that they reproduce standard statistics, including spatial moments up to 4th order. These scattering spectra provide us with a low-dimensional structured representation that captures key properties encountered in a wide range of physical fields. These generic models can be used for data exploration, classification, parameter inference, symmetry detection, and component separation.

LGJan 27, 2023
Unearthing InSights into Mars: Unsupervised Source Separation with Limited Data

Ali Siahkoohi, Rudy Morel, Maarten V. de Hoop et al.

Source separation involves the ill-posed problem of retrieving a set of source signals that have been observed through a mixing operator. Solving this problem requires prior knowledge, which is commonly incorporated by imposing regularity conditions on the source signals, or implicitly learned through supervised or unsupervised methods from existing data. While data-driven methods have shown great promise in source separation, they often require large amounts of data, which rarely exists in planetary space missions. To address this challenge, we propose an unsupervised source separation scheme for domains with limited data access that involves solving an optimization problem in the wavelet scattering covariance representation space$\unicode{x2014}$an interpretable, low-dimensional representation of stationary processes. We present a real-data example in which we remove transient, thermally-induced microtilts$\unicode{x2014}$known as glitches$\unicode{x2014}$from data recorded by a seismometer during NASA's InSight mission on Mars. Thanks to the wavelet scattering covariances' ability to capture non-Gaussian properties of stochastic processes, we are able to separate glitches using only a few glitch-free data snippets.

79.3AIMar 18
Competing with AI Scientists: Agent-Driven Approach to Astrophysics Research

Thomas Borrett, Licong Xu, Andy Nilipour et al.

We present an agent-driven approach to the construction of parameter inference pipelines for scientific data analysis. Our method leverages a multi-agent system, Cmbagent (the analysis system of the AI scientist Denario), in which specialized agents collaborate to generate research ideas, write and execute code, evaluate results, and iteratively refine the overall pipeline. As a case study, we apply this approach to the FAIR Universe Weak Lensing Uncertainty Challenge, a competition under time constraints focused on robust cosmological parameter inference with realistic observational uncertainties. While the fully autonomous exploration initially did not reach expert-level performance, the integration of human intervention enabled our agent-driven workflow to achieve a first-place result in the challenge. This demonstrates that semi-autonomous agentic systems can compete with, and in some cases surpass, expert solutions. We describe our workflow in detail, including both the autonomous and semi-autonomous exploration by Cmbagent. Our final inference pipeline utilizes parameter-efficient convolutional neural networks, likelihood calibration over a known parameter grid, and multiple regularization techniques. Our results suggest that agent-driven research workflows can provide a scalable framework to rapidly explore and construct pipelines for inference problems.

LGMay 25, 2023
Multi-scale clustering and source separation of InSight mission seismic data

Ali Siahkoohi, Rudy Morel, Randall Balestriero et al.

Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, with limited prior knowledge about the sources, and only access to a dataset of signal mixtures. This problem is inherently ill-posed and is further challenged by the variety of timescales exhibited by sources in time series data from planetary space missions. As such, a systematic multi-scale unsupervised approach is needed to identify and separate sources at different timescales. Existing methods typically rely on a preselected window size that determines their operating timescale, limiting their capacity to handle multi-scale sources. To address this issue, we propose an unsupervised multi-scale clustering and source separation framework by leveraging wavelet scattering spectra that provide a low-dimensional representation of stochastic processes, capable of distinguishing between different non-Gaussian stochastic processes. Nested within this representation space, we develop a factorial variational autoencoder that is trained to probabilistically cluster sources at different timescales. To perform source separation, we use samples from clusters at multiple timescales obtained via the factorial variational autoencoder as prior information and formulate an optimization problem in the wavelet scattering spectra representation space. When applied to the entire seismic dataset recorded during the NASA InSight mission on Mars, containing sources varying greatly in timescale, our approach disentangles such different sources, e.g., minute-long transient one-sided pulses (known as "glitches") and structured ambient noises resulting from atmospheric activities that typically last for tens of minutes, and provides an opportunity to conduct further investigations into the isolated sources.