Maven: A Multimodal Foundation Model for Supernova Science
This work addresses the data imbalance in time-domain astrophysics and helps astronomers analyze supernovae with multimodal datasets. It is incremental, though, since it applies existing contrastive learning methods to a new domain.
The paper tackles the problem of analyzing supernovae when high-quality observations are scarce. It develops Maven, a multimodal foundation model that uses contrastive learning to align photometric and spectroscopic data. After fine-tuning on 4,702 observed supernovae, Maven achieves state-of-the-art performance on classification and redshift estimation.
A common setting in astronomy is the availability of a small number of high-quality observations, and larger amounts of either lower-quality observations or synthetic data from simplified models. Time-domain astrophysics is a canonical example of this imbalance, with the number of supernovae observed photometrically outpacing the number observed spectroscopically by multiple orders of magnitude. At the same time, no data-driven models exist to understand these photometric and spectroscopic observables in a common context. Contrastive learning objectives, which have grown in popularity for aligning distinct data modalities in a shared embedding space, offer a potential way to extract information from these modalities. We present Maven, the first foundation model for supernova science. To construct Maven, we first pre-train our model to align photometry and spectroscopy from 0.5M synthetic supernovae using a contrastive objective. We then fine-tune the model on 4,702 observed supernovae from the Zwicky Transient Facility. Maven reaches state-of-the-art performance on both classification and redshift estimation, despite the embeddings not being explicitly optimized for these tasks. Through ablation studies, we show that pre-training with synthetic data improves overall performance. In the upcoming era of the Vera C. Rubin Observatory, Maven serves as a Rosetta Stone for leveraging large, unlabeled, multimodal time-domain datasets.
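The abstract does not spell out the exact contrastive objective. The NumPy code below is a minimal sketch that assumes a CLIP-style symmetric InfoNCE loss. That choice is an assumption and is not confirmed as Maven's actual objective. The sketch shows how the photometric and spectroscopic embeddings of the same supernova (row i of each matrix) are pulled together, while mismatched pairs in the batch are pushed apart. The function names and the temperature of 0.07 are hypothetical, and the encoders that would produce these embeddings are not shown.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Project each embedding onto the unit sphere so dot products are cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(phot_emb, spec_emb, temperature=0.07):
    """Hypothetical symmetric InfoNCE loss (CLIP-style, an assumption).

    phot_emb, spec_emb: arrays of shape (N, D); row i of each comes from the
    same supernova, so the N diagonal pairs are the positives.
    """
    p = l2_normalize(phot_emb)
    s = l2_normalize(spec_emb)
    logits = p @ s.T / temperature  # (N, N) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    # Photometry -> spectroscopy: each row should pick out its own spectrum.
    loss_p2s = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Spectroscopy -> photometry: each column should pick out its own light curve.
    loss_s2p = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_p2s + loss_s2p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phot = rng.normal(size=(8, 16))                  # stand-in photometric embeddings
    spec = phot + 0.05 * rng.normal(size=(8, 16))    # nearly aligned spectroscopic embeddings
    aligned = clip_contrastive_loss(phot, spec)
    mismatched = clip_contrastive_loss(phot, np.roll(spec, 1, axis=0))  # every pair wrong
    print(f"aligned loss: {aligned:.4f}, mismatched loss: {mismatched:.4f}")
```

In this sketch, correctly paired embeddings yield a much lower loss than a batch where every spectrum is shifted to the wrong light curve. Minimizing the loss therefore drives the two modalities into a shared embedding space. Averaging the two directions keeps the objective symmetric, so neither modality is privileged.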