IM · May 18
Hyrax: An Extensible Framework for Rapid ML Experimentation and Unsupervised Discovery in the Era of Rubin, Roman, and Euclid
Aritra Ghosh, Drew Oldag, Michael Tauraso et al.
The NSF-DOE Vera C. Rubin Observatory, Roman Space Telescope, Euclid, and other next-generation surveys will deliver imaging, spectroscopic, and time-domain data at scales that increasingly shift the bottleneck in astronomical machine learning (ML) projects from model design to infrastructure. We present Hyrax, an open-source, modular, GPU-enabled Python framework that supports the full ML lifecycle in astronomy, from data acquisition and training to inference and experiment comparison. Its capabilities include multimodal dataset support, integrated vector databases for similarity search, and interactive two- and three-dimensional latent-space exploration for unsupervised discovery. We demonstrate Hyrax's versatility through five representative applications on real survey data: (i) unsupervised representation learning on ~4×10⁵ Rubin Legacy Survey of Space and Time (LSST) Data Preview 1 (DP1) galaxies, which surfaces new merger and low-surface-brightness candidates missing from reference Euclid and Dark Energy Survey catalogs and also isolates imaging artifacts, all without labeled training data; (ii) hybrid density-based clustering for identifying cluster-scale gravitational lens candidates in DP1 data; (iii) multimodal early-time transient classification in the Zwicky Transient Facility using light curves, spectra, images, and metadata; (iv) supervised false-positive filtering in shift-and-stack searches for distant solar system objects in the Dark Energy Camera Ecliptic Exploration Project survey; and (v) supervised detection of semi-resolved dwarf galaxies in Hyper Suprime-Cam and LSST-like imaging using synthetic source injection. Together, these results show that Hyrax provides astronomy-specific ML infrastructure that enables systematic discovery and rapid methodological iteration across next-generation astronomical surveys.
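Hyrax integrates vector databases for similarity search over learned latent representations. As a rough illustration of the underlying operation only (this is not Hyrax's API), the sketch below ranks objects by cosine similarity to a query embedding. The function names `cosine_similarity` and `nearest_neighbors` and the toy 3-D latent vectors are hypothetical, and a real vector database would answer the query from an index rather than a linear scan.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    k: int = 3,
    exclude: Optional[str] = None,
) -> List[Tuple[str, float]]:
    """Return up to k (object_id, similarity) pairs, most similar first.

    Brute-force linear scan (hypothetical helper); a vector database would
    answer the same query from an index instead.
    """
    scored = [
        (obj_id, cosine_similarity(query, vec))
        for obj_id, vec in embeddings.items()
        if obj_id != exclude
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical 3-D latent vectors for a few image cutouts.
latents = {
    "gal_001": [0.9, 0.1, 0.0],
    "gal_002": [0.8, 0.2, 0.1],
    "gal_003": [0.0, 1.0, 0.0],
    "artifact_01": [-0.7, 0.0, 0.7],
}

# Rank the other cutouts by similarity to gal_001.
for obj_id, score in nearest_neighbors(latents["gal_001"], latents, k=2, exclude="gal_001"):
    print(f"{obj_id}: {score:.3f}")
```

Cosine similarity is used here because it compares the direction of embeddings rather than their magnitude; other distance metrics are equally plausible for a latent-space search.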
EP · May 7
You Only Stack Once (YOSO): A Motion-Filtered, Deep-Learning Framework for Detecting Faint Moving Sources
Nitya Pandey, César Fuentes, Pedro Bernardinelli et al.
We present You Only Stack Once (YOSO), an automated pipeline designed to detect faint, slow-moving Solar System objects in wide-field astronomical surveys. The pipeline integrates a novel Gaussian Motion Filter (GMoF) that operates at the pixel level to enhance the signal-to-noise ratio of objects exhibiting a range of apparent rates of motion. Unlike conventional shift-and-stack methods, which rely on discrete velocity trials, GMoF amplifies trails while suppressing random noise and static background features. Applied to a subset of DEEP observations from the Dark Energy Camera, YOSO recovered 45 of 73 previously detected objects and found 11 new TNOs. It also discovered 216 objects in the near Solar System. Although alternative shift-and-stack methods are sensitive to objects about 0.88 magnitudes fainter, YOSO has an extremely low false-positive rate, because it detects only sources that exhibit a trail and are consistent with a point source when shifted at the correct rate. We show how this method can be deployed on large surveys such as LSST and adapted for other domains that require motion-based signal enhancement, including exoplanet imaging through Angular Differential Imaging (ADI) and near-Earth object (NEO) detection for missions such as NEO Surveyor. YOSO thus provides a versatile, scalable approach for extracting faint, motion-dependent signals in the era of data-intensive astronomy.
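The abstract contrasts GMoF with conventional shift-and-stack searches, which co-add exposures along a discrete grid of trial velocities. As a point of reference only, here is a minimal pure-Python sketch of that conventional baseline, not of GMoF, whose internals are not described here. It assumes linear motion, integer-pixel shifts, and noise-free synthetic frames; `shift_and_stack`, `best_velocity`, and `make_frames` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[float]]  # row-major: image[y][x]


def shift_and_stack(frames: Sequence[Image], times: Sequence[float], vx: float, vy: float) -> Image:
    """Sum frames after undoing one trial linear motion (vx, vy), in pixels per time unit.

    Stack pixel (x, y) adds frame pixel (x + round(vx*t), y + round(vy*t)) from each
    frame; samples that fall outside a frame are skipped.
    """
    height, width = len(frames[0]), len(frames[0][0])
    stack = [[0.0] * width for _ in range(height)]
    for frame, t in zip(frames, times):
        dx, dy = round(vx * t), round(vy * t)
        for y in range(height):
            for x in range(width):
                sx, sy = x + dx, y + dy
                if 0 <= sx < width and 0 <= sy < height:
                    stack[y][x] += frame[sy][sx]
    return stack


def best_velocity(
    frames: Sequence[Image],
    times: Sequence[float],
    trial_velocities: Sequence[Tuple[int, int]],
) -> Tuple[float, Tuple[int, int], Tuple[int, int]]:
    """Grid search over discrete velocity trials; return (peak, (vx, vy), (x, y))."""
    best: Optional[Tuple[float, Tuple[int, int], Tuple[int, int]]] = None
    for vx, vy in trial_velocities:
        stacked = shift_and_stack(frames, times, vx, vy)
        for y, row in enumerate(stacked):
            for x, value in enumerate(row):
                if best is None or value > best[0]:
                    best = (value, (vx, vy), (x, y))
    if best is None:
        raise ValueError("need at least one trial velocity")
    return best


def make_frames(size: int, start: Tuple[int, int], velocity: Tuple[int, int],
                times: Sequence[float]) -> List[Image]:
    """Hypothetical noise-free frames with one unit-flux source moving linearly."""
    frames = []
    for t in times:
        frame = [[0.0] * size for _ in range(size)]
        x = start[0] + round(velocity[0] * t)
        y = start[1] + round(velocity[1] * t)
        frame[y][x] = 1.0
        frames.append(frame)
    return frames


times = [0.0, 1.0, 2.0, 3.0]
frames = make_frames(size=8, start=(1, 2), velocity=(1, 1), times=times)
trials = [(vx, vy) for vx in range(-2, 3) for vy in range(-2, 3)]
peak, velocity, position = best_velocity(frames, times, trials)
print(peak, velocity, position)
```

Each trial velocity costs a full pass over every frame, so the search grows with the size of the velocity grid; this discrete-trial structure is what the abstract sets GMoF's pixel-level filtering against.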
CV · Nov 26, 2024
Selfish Evolution: Making Discoveries in Extreme Label Noise with the Help of Overfitting Dynamics
Nima Sedaghat, Tanawan Chatchadanoraset, Colin Orion Chandler et al.
Motivated by the scarcity of proper labels in an astrophysical application, we have developed a novel technique, called Selfish Evolution, which allows for the detection and correction of corrupted labels in a weakly supervised fashion. Unlike methods based on early stopping, we let the model train on the noisy dataset. Only then do we intervene and allow the model to overfit to individual samples. The "evolution" of the model during this process reveals patterns that carry enough information to indicate whether a label is noisy and what its correct version is. We train a secondary network on these spatiotemporal "evolution cubes" to correct potentially corrupted labels. We incorporate the technique in a closed-loop fashion, allowing automatic convergence toward a mostly clean dataset without presumptions about the state of the network at which we intervene. We evaluate the method on the main task of the Supernova-hunting dataset and also demonstrate its effectiveness on the more standard MNIST dataset.
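The abstract describes training on noisy labels first and then letting the model overfit to individual samples. The sketch below is a deliberately simplified, hypothetical illustration of that intervention step only: a pure-Python logistic-regression model stands in for the network, and the "evolution" is reduced to a 1-D trajectory of the probability assigned to a sample's given label, rather than the spatiotemporal evolution cubes and secondary correction network the authors use. All function names, hyperparameters, and the toy data are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, given 0/1 label, possibly wrong)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """P(label = 1 | x) under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def prob_of_label(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """Probability the model assigns to the given label y."""
    p = predict(w, b, x)
    return p if y == 1 else 1.0 - p


def train(data: Sequence[Sample], epochs: int = 300, lr: float = 0.5) -> Tuple[List[float], float]:
    """Full-batch gradient descent on mean binary cross-entropy, noisy labels included."""
    n = len(data)
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in data:
            err = predict(w, b, x) - y  # derivative of the loss w.r.t. the logit
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def overfit_trajectory(
    w: Sequence[float], b: float, sample: Sample, steps: int = 10, lr: float = 0.5
) -> List[float]:
    """Overfit a copy of the trained model to one sample, recording the probability
    of the sample's given label before and after each single-sample gradient step."""
    x, y = sample
    w = list(w)  # copy so the trained model is left untouched
    trajectory = [prob_of_label(w, b, x, y)]
    for _ in range(steps):
        err = predict(w, b, x) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b -= lr * err
        trajectory.append(prob_of_label(w, b, x, y))
    return trajectory


# Toy 1-D data: positives near +2.5, negatives near -2.5; index 4 is deliberately mislabeled.
data: List[Sample] = [
    ([1.5], 1), ([2.0], 1), ([3.0], 1), ([3.5], 1), ([2.5], 0),
    ([-1.5], 0), ([-2.0], 0), ([-3.0], 0), ([-3.5], 0),
]
w, b = train(data)
noisy_traj = overfit_trajectory(w, b, data[4])
clean_traj = overfit_trajectory(w, b, data[3])
print("mislabeled:", [round(p, 3) for p in noisy_traj])
print("clean:     ", [round(p, 3) for p in clean_traj])
```

Each single-sample step moves the logit toward the given label, so every trajectory is non-decreasing. In this toy setup the mislabeled sample starts with a low probability for its given label while the clean sample starts high; the method described above learns to read such patterns with a secondary network instead of a hand-set rule.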