CO Aug 4, 2023
Learning from Topology: Cosmological Parameter Estimation from the Large-scale Structure
Jacky H. T. Yip, Adam Rouhiainen, Gary Shiu
The topology of the large-scale structure of the universe contains valuable information on the underlying cosmological parameters. While persistent homology can extract this topological information, the optimal method for parameter estimation from the tool remains an open question. To address this, we propose a neural network model to map persistence images to cosmological parameters. Through a parameter recovery test, we demonstrate that our model makes accurate and precise estimates, considerably outperforming conventional Bayesian inference approaches.
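As a rough illustration of the input representation mentioned above, the following sketch rasterizes a persistence diagram into a persistence image. It is a minimal pure-Python version under stated assumptions, not the authors' pipeline: the function name `persistence_image`, the grid resolution, the extent, the Gaussian width `sigma`, and the linear persistence weighting are all illustrative choices, and the Gaussian is evaluated at pixel centres rather than integrated over each pixel.

```python
import math

def persistence_image(pairs, resolution=8, extent=(0.0, 1.0, 0.0, 1.0), sigma=0.1):
    """Rasterize (birth, death) pairs into a resolution x resolution grid.

    Each pair is mapped to (birth, persistence = death - birth), weighted
    linearly by its persistence, and smoothed with an isotropic Gaussian.
    All defaults are illustrative assumptions, not values from the paper.
    """
    b_min, b_max, p_min, p_max = extent
    # Pixel-centre coordinates along the birth and persistence axes.
    b_centres = [b_min + (i + 0.5) * (b_max - b_min) / resolution
                 for i in range(resolution)]
    p_centres = [p_min + (j + 0.5) * (p_max - p_min) / resolution
                 for j in range(resolution)]
    image = [[0.0] * resolution for _ in range(resolution)]
    for birth, death in pairs:
        pers = death - birth
        if pers <= 0:
            continue  # points on or below the diagonal carry no weight
        for j, pc in enumerate(p_centres):
            for i, bc in enumerate(b_centres):
                d2 = (bc - birth) ** 2 + (pc - pers) ** 2
                image[j][i] += pers * math.exp(-d2 / (2 * sigma ** 2))
    return image

# Hypothetical diagram with two features; rows index persistence, columns birth.
image = persistence_image([(0.2, 0.65), (0.5, 0.6)])
```

The resulting fixed-size grid is the kind of array a regression network could take as input; the network itself is not sketched here.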
HEP-TH Jul 4, 2025
Transforming Calabi-Yau Constructions: Generating New Calabi-Yau Manifolds with Transformers
Jacky H. T. Yip, Charles Arnal, François Charton et al.
Fine, regular, and star triangulations (FRSTs) of four-dimensional reflexive polytopes give rise to toric varieties, within which generic anticanonical hypersurfaces yield smooth Calabi-Yau threefolds. We introduce CYTransformer, a deep learning model based on the transformer architecture, to automate the generation of FRSTs. We demonstrate that CYTransformer efficiently and unbiasedly samples FRSTs for polytopes across a range of sizes, and can self-improve through retraining on its own output. These results lay the foundation for AICY: a community-driven platform designed to combine self-improving machine learning models with a continuously expanding database to explore and catalog the Calabi-Yau landscape.
CO Dec 19, 2024
Cosmology with Persistent Homology: Parameter Inference via Machine Learning
Juan Calles, Jacky H. T. Yip, Gabriella Contardo et al.
Building upon [2308.02636], we investigate the constraining power of persistent homology on cosmological parameters and primordial non-Gaussianity in a likelihood-free inference pipeline utilizing machine learning. We evaluate the ability of Persistence Images (PIs) to infer parameters, comparing them to the combined Power Spectrum and Bispectrum (PS/BS). We also compare two classes of models: neural-based and tree-based. PIs consistently lead to better predictions than the combined PS/BS for the parameters that can be constrained, i.e., for $\{Ω_{\rm m}, σ_8, n_{\rm s}, f_{\rm NL}^{\rm loc}\}$. PIs perform particularly well for $f_{\rm NL}^{\rm loc}$, highlighting the potential of persistent homology for constraining primordial non-Gaussianity. Combining PIs with PS/BS provides only marginal gains, indicating that the PS/BS contains little information that is additional or complementary to the PIs. Finally, we provide a visualization of the most important topological features for $f_{\rm NL}^{\rm loc}$ and for $Ω_{\rm m}$. This reveals that clusters and voids (0-cycles and 2-cycles) are most informative for $Ω_{\rm m}$, while $f_{\rm NL}^{\rm loc}$ is additionally informed by filaments (1-cycles).
CO Oct 17, 2019
From Dark Matter to Galaxies with Convolutional Neural Networks
Jacky H. T. Yip, Xinyue Zhang, Yanfang Wang et al.
Cosmological simulations play an important role in the interpretation of astronomical data, in particular in comparing observed data to our theoretical expectations. However, to compare data with these simulations, the simulations in principle need to include gravity, magneto-hydrodynamics, radiative transfer, etc. These ideal large-volume (gravo-magneto-hydrodynamical) simulations are extremely computationally expensive and can cost tens of millions of CPU hours to run. In this paper, we propose a deep learning approach to map from the computationally cheaper dark-matter-only simulation to the galaxy distribution from the much costlier cosmological simulation. The main challenge of this task is the high sparsity of the target galaxy distribution: space is mainly empty. We propose a cascade architecture composed of a classification filter followed by a regression procedure. We show that our result outperforms a state-of-the-art model used in the astronomical community and provides a good trade-off between computational cost and prediction accuracy.
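The cascade idea described above (a classification filter followed by a regression step) can be sketched in a few lines. This is only a toy illustration under stated assumptions: `cascade_predict`, `toy_classifier`, and `toy_regressor` are hypothetical names, the stand-in models are hand-written functions rather than trained convolutional networks, and the inputs are scalar densities rather than 3D simulation volumes.

```python
import math

def cascade_predict(densities, classifier, regressor, threshold=0.5):
    """Two-stage prediction: filter voxels first, regress only on the survivors.

    classifier(rho) returns a probability that the voxel hosts any galaxy;
    regressor(rho) returns a galaxy count estimate for a non-empty voxel.
    """
    counts = []
    for rho in densities:
        if classifier(rho) < threshold:
            counts.append(0.0)  # stage 1: voxel flagged as empty
        else:
            counts.append(max(0.0, regressor(rho)))  # stage 2: count estimate
    return counts

# Hypothetical stand-ins for the two trained models.
def toy_classifier(rho):
    return 1.0 / (1.0 + math.exp(-(rho - 5.0)))

def toy_regressor(rho):
    return 0.5 * rho - 1.0

# Mostly low-density voxels, mimicking the sparsity of the target field.
dark_matter = [0.1, 0.3, 8.0, 0.2, 12.0]
galaxy_counts = cascade_predict(dark_matter, toy_classifier, toy_regressor)
print(galaxy_counts)
```

Splitting the task this way means the regression stage only has to fit the few non-empty voxels, which is one plausible way to cope with a target that is mostly zeros.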