Cora Dvorkin

h-index 101

7 papers

86 citations

Novelty 41%

AI Score 33

Ranked #132,533 of 205,806 authors (top 64%) · #61 in CO (top 40%)

7 Papers

HEP-PH · Mar 15, 2022

Machine Learning and Cosmology

Cora Dvorkin, Siddharth Mishra-Sharma, Brian Nord et al.

Methods based on machine learning have recently made substantial inroads in many corners of cosmology. Through this process, new computational tools, new perspectives on data collection, model development, analysis, and discovery, as well as new communities and educational pathways have emerged. Despite rapid progress, substantial potential at the intersection of cosmology and machine learning remains untapped. In this white paper, we summarize current and ongoing developments relating to the application of machine learning within cosmology and provide a set of recommendations aimed at maximizing the scientific impact of these burgeoning tools over the coming decade through both technical development as well as the fostering of emerging communities.

CO · Aug 29, 2022

Inferring subhalo effective density slopes from strong lensing observations with neural likelihood-ratio estimation

Gemma Zhang, Siddharth Mishra-Sharma, Cora Dvorkin

Strong gravitational lensing has emerged as a promising approach for probing dark matter models on sub-galactic scales. Recent work has proposed the subhalo effective density slope as a more reliable observable than the commonly used subhalo mass function. The subhalo effective density slope is a measurement independent of assumptions about the underlying density profile and can be inferred for individual subhalos through traditional sampling methods. To go beyond individual subhalo measurements, we leverage recent advances in machine learning and introduce a neural likelihood-ratio estimator to infer an effective density slope for populations of subhalos. We demonstrate that our method is capable of harnessing the statistical power of multiple subhalos (within and across multiple images) to distinguish between characteristics of different subhalo populations. The computational efficiency afforded by the neural likelihood-ratio estimator, compared with traditional sampling, enables statistical studies of dark matter perturbers and will be particularly useful given the influx of strong lensing systems expected from upcoming surveys.
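The core idea behind neural likelihood-ratio estimation is the "likelihood-ratio trick": a classifier trained to tell apart samples drawn under two hypotheses has a logit that approximates their log likelihood ratio, and the log-ratios of independent observations add. Below is a minimal sketch of the simplest two-hypothesis form. It assumes a toy 1-D Gaussian observable in place of lensing images and a logistic regression in place of a neural network; both are our simplifications, not the authors' setup. The hypothesis means `MU0`/`MU1` and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_ratio_classifier(xs0, xs1, lr=1.0, steps=300):
    """Fit s(x) = sigmoid(w*x + b) to separate H1 samples (label 1) from H0 samples (label 0)."""
    data = [(x, 0.0) for x in xs0] + [(x, 1.0) for x in xs1]
    n = len(data)
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # gradient of cross-entropy w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def log_ratio(x, w, b):
    # With balanced classes, logit s(x) = log[s/(1-s)] estimates log p(x|H1) - log p(x|H0).
    return w * x + b


rng = random.Random(0)
MU0, MU1 = 0.0, 1.0  # hypothetical population parameters for the two toy hypotheses
xs0 = [rng.gauss(MU0, 1.0) for _ in range(2000)]
xs1 = [rng.gauss(MU1, 1.0) for _ in range(2000)]
w, b = train_ratio_classifier(xs0, xs1)

# Exact answer for unit-variance Gaussians: (MU1 - MU0) * x - (MU1**2 - MU0**2) / 2
print(f"learned w={w:.2f}, b={b:.2f} (exact: w=1.00, b=-0.50)")

# Independent observations (e.g. many subhalos or images): log-ratios simply add.
obs = [rng.gauss(MU1, 1.0) for _ in range(200)]
total = sum(log_ratio(x, w, b) for x in obs)
print(f"summed log-ratio over {len(obs)} observations: {total:.1f}")
```

Because the toy densities are known, the learned `w` and `b` can be checked against the exact log-ratio. Summing over the 200 draws shows how evidence accumulates across many independent perturbers.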

CO · Aug 18, 2023

Data Compression and Inference in Cosmology with Self-Supervised Machine Learning

Aizhan Akhmetzhanova, Siddharth Mishra-Sharma, Cora Dvorkin

The influx of massive amounts of data from current and upcoming cosmological surveys necessitates compression schemes that can efficiently summarize the data with minimal loss of information. We introduce a method that leverages the paradigm of self-supervised machine learning in a novel manner to construct representative summaries of massive datasets using simulation-based augmentations. Deploying the method on hydrodynamical cosmological simulations, we show that it can deliver highly informative summaries, which can be used for a variety of downstream tasks, including precise and accurate parameter inference. We demonstrate how this paradigm can be used to construct summary representations that are insensitive to prescribed systematic effects, such as the influence of baryonic physics. Our results indicate that self-supervised machine learning techniques offer a promising new approach for the compression of cosmological data as well as for its analysis.
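Self-supervised compression can be illustrated with a VICReg-style objective (invariance, variance and covariance terms), one common self-supervised loss. We do not claim it matches the paper's exact loss or architecture. In this sketch, which is an assumption made purely for illustration, the two "augmented views" are two noise realizations simulated at the same hypothetical parameter `theta`. Instead of training a network, we score three hand-written compressors. A summary that stays stable across views while still varying with `theta` should get the lowest loss.

```python
import math
import random
import statistics


def var_cov_terms(z, gamma, eps):
    """Variance hinge and off-diagonal covariance penalty for a batch z (n rows, d columns)."""
    n, d = len(z), len(z[0])
    cols = [[row[j] for row in z] for j in range(d)]
    means = [sum(c) / n for c in cols]
    var_term = sum(max(0.0, gamma - math.sqrt(statistics.variance(c) + eps)) for c in cols) / d
    cov_term = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cij = sum((cols[i][k] - means[i]) * (cols[j][k] - means[j]) for k in range(n)) / (n - 1)
                cov_term += cij ** 2
    return var_term, cov_term / d


def vicreg_loss(za, zb, lam=25.0, mu=25.0, nu=1.0, gamma=1.0, eps=1e-4):
    """VICReg-style loss for two batches of paired summaries (lists of equal-length lists)."""
    n, d = len(za), len(za[0])
    # Invariance: mean squared difference between the two views' summaries.
    inv = sum((a - b) ** 2 for ra, rb in zip(za, zb) for a, b in zip(ra, rb)) / (n * d)
    va, ca = var_cov_terms(za, gamma, eps)
    vb, cb = var_cov_terms(zb, gamma, eps)
    return lam * inv + mu * (va + vb) + nu * (ca + cb)


def simulate(theta, rng, n_pix=100):
    # Hypothetical toy "simulation": each pixel is theta plus unit Gaussian noise.
    return [theta + rng.gauss(0.0, 1.0) for _ in range(n_pix)]


rng = random.Random(1)
thetas = [rng.uniform(0.0, 4.0) for _ in range(256)]
# Two augmented views per parameter value: independent noise, same theta.
view_a = [simulate(t, rng) for t in thetas]
view_b = [simulate(t, rng) for t in thetas]

compressors = {
    "pixel mean": lambda x: [sum(x) / len(x)],  # averages the noise away, tracks theta
    "first pixel": lambda x: [x[0]],            # tracks theta but is noisy across views
    "constant": lambda x: [0.0],                # collapsed: invariant but uninformative
}
losses = {}
for name, f in compressors.items():
    losses[name] = vicreg_loss([f(x) for x in view_a], [f(x) for x in view_b])
    print(f"{name:11s} loss = {losses[name]:.2f}")
```

The variance term is what stops the trivially invariant "constant" summary from winning. In a real pipeline, a network would be trained to minimize this loss rather than picked from a fixed list.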

AI · Sep 2, 2025

The Future of Artificial Intelligence and the Mathematical and Physical Sciences (AI+MPS)

Andrew Ferguson, Marisa LaFleur, Lars Ruthotto et al. · Stanford

This community paper developed out of the NSF Workshop on the Future of Artificial Intelligence (AI) and the Mathematical and Physical Sciences (MPS), which was held in March 2025 with the goal of understanding how the MPS domains (Astronomy, Chemistry, Materials Research, Mathematical Sciences, and Physics) can best capitalize on, and contribute to, the future of AI. We present here a summary and snapshot of the MPS community's perspective, as of Spring/Summer 2025, in a rapidly developing field. The link between AI and MPS is becoming increasingly inextricable; now is a crucial moment to strengthen the link between AI and Science by pursuing a strategy that proactively and thoughtfully leverages the potential of AI for scientific discovery and optimizes opportunities to impact the development of AI by applying concepts from fundamental science. To achieve this, we propose activities and strategic priorities that: (1) enable AI+MPS research in both directions; (2) build up an interdisciplinary community of AI+MPS researchers; and (3) foster education and workforce development in AI for MPS researchers and students. We conclude with a summary of suggested priorities for funding agencies, educational institutions, and individual researchers to help position the MPS community to be a leader in, and take full advantage of, the transformative potential of AI+MPS.

CO · Sep 14, 2020

Extracting the Subhalo Mass Function from Strong Lens Images with Image Segmentation

Bryan Ostdiek, Ana Diaz Rivero, Cora Dvorkin

Detecting substructure within strongly lensed images is a promising route to shed light on the nature of dark matter. However, it is a challenging task, which traditionally requires detailed lens modeling and source reconstruction, taking weeks to analyze each system. We use machine learning to circumvent the need for lens and source modeling and develop a neural network that both locates subhalos in an image and determines their masses using the technique of image segmentation. The network is trained on images with a single subhalo located near the Einstein ring across a wide range of apparent source magnitudes. The network is then able to resolve subhalos with masses $m\gtrsim 10^{8.5} M_{\odot}$. Training in this way allows the network to learn the gravitational lensing of light, and remarkably, it is then able to detect entire populations of substructure, even at locations farther from the Einstein ring than those used in training. Over a wide range of apparent source magnitudes, the false-positive rate is around three false subhalos per 100 images, coming mostly from the lightest detectable subhalo for that signal-to-noise ratio. With good accuracy and a low false-positive rate, counting the number of pixels assigned to each subhalo class over multiple images allows for a measurement of the subhalo mass function (SMF). When measured over three mass bins from $10^{9}$ to $10^{10}\,M_{\odot}$, the SMF slope is recovered with an error of 36% for 50 images, improving to 10% for 1000 images with Hubble Space Telescope-like noise.
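The final step described above, turning per-class pixel tallies into an SMF slope, can be sketched as pixel counting followed by a least-squares fit of $\log_{10}(dN/dm)$ against $\log_{10} m$. This sketch assumes, hypothetically, that each mass class's pixel tally is proportional to its subhalo count with one common constant, which leaves the fitted slope unchanged. The segmentation map, the function names and the slope value `-1.9` are illustrative only and are not outputs of the paper's network.

```python
import math
from collections import Counter


def count_class_pixels(seg_maps):
    """Tally pixels per class label over 2-D segmentation maps (label 0 = background)."""
    counts = Counter()
    for seg in seg_maps:
        for row in seg:
            counts.update(row)
    counts.pop(0, None)  # drop the background class
    return counts


def fit_smf_slope(bin_edges, counts):
    """Least-squares slope of log10(dN/dm) against log10(m) across mass bins."""
    xs, ys = [], []
    for lo, hi, n in zip(bin_edges[:-1], bin_edges[1:], counts):
        xs.append(math.log10(math.sqrt(lo * hi)))  # geometric bin centre
        ys.append(math.log10(n / (hi - lo)))       # differential abundance dN/dm
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den


# Tiny hypothetical segmentation map: 0 = no subhalo, 1-3 = mass classes.
seg = [[0, 0, 1, 1],
       [0, 3, 1, 0],
       [2, 2, 0, 0]]
tally = count_class_pixels([seg, seg])
print(dict(sorted(tally.items())))  # {1: 6, 2: 4, 3: 2}

# Three log-spaced bins from 1e9 to 1e10 solar masses.
edges = [10 ** (9 + k / 3) for k in range(4)]
alpha_true = -1.9  # illustrative slope, not a value from the paper
# Mock per-bin tallies from an exact power law dN/dm proportional to m**alpha_true.
mock = [(hi ** (alpha_true + 1) - lo ** (alpha_true + 1)) / (alpha_true + 1)
        for lo, hi in zip(edges[:-1], edges[1:])]
slope = fit_smf_slope(edges, mock)
print(f"recovered slope: {slope:.3f}")  # -1.900
```

With real network outputs, the per-class entries of `tally`, ordered by mass bin, would take the place of `mock`. Finite image sets make those tallies noisy, so the recovered slope carries an error rather than matching exactly as it does here.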

CO · Jul 10, 2020

Flow-Based Likelihoods for Non-Gaussian Inference

Ana Diaz Rivero, Cora Dvorkin

We investigate the use of data-driven likelihoods to bypass a key assumption made in many scientific analyses: that the true likelihood of the data is Gaussian. In particular, we suggest using the optimization targets of flow-based generative models, a class of models that can capture complex distributions by transforming a simple base distribution through layers of nonlinearities. We call these flow-based likelihoods (FBLs). We analyze the accuracy and precision of the reconstructed likelihoods on mock Gaussian data, and show that simply gauging the quality of samples drawn from the trained model is not a sufficient indicator that the true likelihood has been learned. We nevertheless demonstrate that the likelihood can be reconstructed to a precision equal to the sampling error expected from a finite sample size. We then apply FBLs to mock weak lensing convergence power spectra, a cosmological observable that is significantly non-Gaussian (NG). We find that the FBL captures the NG signatures in the data extremely well, while other commonly used data-driven likelihoods, such as Gaussian mixture models and independent component analysis, fail to do so. This suggests that studies that have found small posterior shifts in NG data with data-driven likelihoods such as these could be underestimating the impact of non-Gaussianity on parameter constraints. By introducing a suite of tests that can capture different levels of NG in the data, we show that the success or failure of traditional data-driven likelihoods can be tied back to the structure of the NG in the data. Unlike other methods, the FBL is flexible enough to tackle different types of NG simultaneously. Because of this flexibility, and the broad applicability across datasets and domains that it is therefore likely to have, we encourage the use of FBLs for inference when sufficient mock data are available for training.
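A flow-based likelihood rests on the change-of-variables formula. If a bijection $z=f(x)$ maps data to a standard-normal base, then $\log p(x) = \log \mathcal{N}(f(x);0,1) + \log\lvert f'(x)\rvert$, and training maximizes this quantity. The sketch below is deliberately tiny. It assumes a single hand-chosen bijection $f(x)=(\log x-\mu)/\sigma$, whereas real flows stack many learned neural layers, and it uses log-normal draws as a hypothetical stand-in for a skewed, non-Gaussian observable. It fits both this flow and a plain Gaussian by maximum likelihood and then compares their held-out log-likelihoods.

```python
import math
import random
import statistics

LOG_2PI = math.log(2.0 * math.pi)


def std_normal_logpdf(z):
    return -0.5 * (z * z + LOG_2PI)


def flow_logpdf(x, mu, sigma):
    """log p(x) under the one-layer flow z = (log x - mu) / sigma with a N(0, 1) base.

    Change of variables: log p(x) = log N(z; 0, 1) + log|dz/dx|, where dz/dx = 1 / (sigma * x).
    """
    z = (math.log(x) - mu) / sigma
    return std_normal_logpdf(z) - math.log(sigma * x)


def gaussian_logpdf(x, m, s):
    return std_normal_logpdf((x - m) / s) - math.log(s)


rng = random.Random(2)
# Hypothetical skewed observable: log-normal draws standing in for non-Gaussian data.
train = [rng.lognormvariate(0.0, 0.8) for _ in range(5000)]
held_out = [rng.lognormvariate(0.0, 0.8) for _ in range(5000)]

# Maximum-likelihood fits; for this particular flow the optimum is closed-form.
logs = [math.log(x) for x in train]
mu_hat, sig_hat = statistics.fmean(logs), statistics.pstdev(logs)
m_hat, s_hat = statistics.fmean(train), statistics.pstdev(train)

flow_ll = statistics.fmean(flow_logpdf(x, mu_hat, sig_hat) for x in held_out)
gauss_ll = statistics.fmean(gaussian_logpdf(x, m_hat, s_hat) for x in held_out)
print(f"mean held-out log-likelihood: flow {flow_ll:.3f}, Gaussian {gauss_ll:.3f}")

# Sanity check: the flow density integrates to ~1 (trapezoid rule on [h, 50]).
h = 1e-3
dens = [math.exp(flow_logpdf(h * k, mu_hat, sig_hat)) for k in range(1, 50_001)]
area = h * (sum(dens) - 0.5 * (dens[0] + dens[-1]))
print(f"integral of flow density: {area:.4f}")
```

The Jacobian term `-log(sigma * x)` is what keeps the transformed density normalized. Dropping it gives a "likelihood" that no longer integrates to one.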

CO · Oct 17, 2019

A Novel CMB Component Separation Method: Hierarchical Generalized Morphological Component Analysis

Sebastian Wagner-Carena, Max Hopkins, Ana Diaz Rivero et al.

We present a novel technique for Cosmic Microwave Background (CMB) foreground subtraction based on the framework of blind source separation. Inspired by previous work incorporating local variation into Generalized Morphological Component Analysis (GMCA), we introduce Hierarchical GMCA (HGMCA), a Bayesian hierarchical graphical model for source separation. We test our method on $N_{\rm side}=256$ simulated sky maps that include dust, synchrotron, free-free and anomalous microwave emission, and show that HGMCA reduces foreground contamination by $25\%$ relative to GMCA in both the regions included in and excluded by the Planck UT78 mask, decreases the error in the measurement of the CMB temperature power spectrum to the level of $0.02\%$ to $0.03\%$ at $\ell>200$ (and $<0.26\%$ for all $\ell$), and reduces the correlation with all of the foregrounds. We find equivalent or improved performance when compared to state-of-the-art Internal Linear Combination (ILC)-type algorithms on these simulations, suggesting that HGMCA may be a competitive alternative to foreground separation techniques previously applied to observed CMB data. Additionally, we show that our performance does not degrade when we perturb model parameters or alter the CMB realization, which suggests that our algorithm generalizes well beyond our simplified simulations. Our results open a new avenue for constructing CMB maps through Bayesian hierarchical analysis.
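HGMCA itself (sparse, hierarchical, wavelet-based) does not fit in a short example. The ILC-type baselines it is compared against do. In a minimal ILC, the CMB has unit response $a=(1,\dots,1)$ in every channel, and the weights $w = C^{-1}a/(a^{\mathsf T}C^{-1}a)$ minimize the variance of the combined map while preserving the CMB. The three channels, their dust scalings and the Gaussian toy maps are hypothetical and are not the simulations used in the paper.

```python
import math
import random


def solve(mat, vec):
    """Solve mat @ y = vec by Gauss-Jordan elimination with partial pivoting."""
    n = len(vec)
    m = [row[:] + [v] for row, v in zip(mat, vec)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ilc_weights(maps):
    """ILC weights w = C^-1 a / (a^T C^-1 a), a = (1, ..., 1): unit CMB response, minimum variance."""
    nch, npix = len(maps), len(maps[0])
    means = [sum(mp) / npix for mp in maps]
    cov = [[sum((maps[i][p] - means[i]) * (maps[j][p] - means[j]) for p in range(npix)) / (npix - 1)
            for j in range(nch)] for i in range(nch)]
    y = solve(cov, [1.0] * nch)  # y = C^-1 a
    norm = sum(y)                # a^T C^-1 a
    return [yi / norm for yi in y]


rng = random.Random(3)
NPIX = 4000
dust_scaling = [0.5, 1.0, 2.0]  # hypothetical foreground amplitude per frequency channel
cmb = [rng.gauss(0.0, 1.0) for _ in range(NPIX)]
dust = [rng.gauss(0.0, 3.0) for _ in range(NPIX)]
# Each channel sees the same CMB, a scaled copy of the dust, and independent noise.
maps = [[c + s * d + rng.gauss(0.0, 0.1) for c, d in zip(cmb, dust)] for s in dust_scaling]

w = ilc_weights(maps)
ilc_map = [sum(wi * mp[p] for wi, mp in zip(w, maps)) for p in range(NPIX)]
avg_map = [sum(mp[p] for mp in maps) / len(maps) for p in range(NPIX)]


def rms_error(est):
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(est, cmb)) / NPIX)


rms_ilc, rms_avg = rms_error(ilc_map), rms_error(avg_map)
print("weights:", [round(x, 3) for x in w])
print(f"rms error vs true CMB: ILC {rms_ilc:.3f}, plain average {rms_avg:.3f}")
```

Because the weights sum to one, the CMB passes through unchanged, while the foreground, whose channel scaling differs from the CMB's, is largely cancelled. A plain average keeps the full mean dust contribution.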