Matthew J. Graham

IM
4papers
67citations
Novelty41%
AI Score37

4 Papers

IMDec 12, 2025
Pre-training vision models for the classification of alerts from wide-field time-domain surveys

Nabeel Rehemtulla, Adam A. Miller, Mike Walmsley et al.

Modern wide-field time-domain surveys facilitate the study of transient, variable and moving phenomena by conducting image differencing and relaying alerts to their communities. Machine learning tools have been used on data from these surveys and their precursors for more than a decade, and convolutional neural networks (CNNs), which make predictions directly from input images, saw particularly broad adoption through the 2010s. Since then, continually rapid advances in computer vision have transformed the standard practices around using such models. It is now commonplace to use standardized architectures pre-trained on large corpora of everyday images (e.g., ImageNet). In contrast, time-domain astronomy studies still typically design custom CNN architectures and train them from scratch. Here, we explore the affects of adopting various pre-training regimens and standardized model architectures on the performance of alert classification. We find that the resulting models match or outperform a custom, specialized CNN like what is typically used for filtering alerts. Moreover, our results show that pre-training on galaxy images from Galaxy Zoo tends to yield better performance than pre-training on ImageNet or training from scratch. We observe that the design of standardized architectures are much better optimized than the custom CNN baseline, requiring significantly less time and memory for inference despite having more trainable parameters. On the eve of the Legacy Survey of Space and Time and other image-differencing surveys, these findings advocate for a paradigm shift in the creation of vision models for alerts, demonstrating that greater performance and efficiency, in time and in data, can be achieved by adopting the latest practices from the computer vision field.

IMFeb 26, 2021Code
Tails: Chasing Comets with the Zwicky Transient Facility and Deep Learning

Dmitry A. Duev, Bryce T. Bolin, Matthew J. Graham et al.

We present Tails, an open-source deep-learning framework for the identification and localization of comets in the image data of the Zwicky Transient Facility (ZTF), a robotic optical time-domain survey currently in operation at the Palomar Observatory in California, USA. Tails employs a custom EfficientDet-based architecture and is capable of finding comets in single images in near real time, rather than requiring multiple epochs as with traditional methods. The system achieves state-of-the-art performance with 99% recall, 0.01% false positive rate, and 1-2 pixel root mean square error in the predicted position. We report the initial results of the Tails efficiency evaluation in a production setting on the data of the ZTF Twilight survey, including the first AI-assisted discovery of a comet (C/2020 T2) and the recovery of a comet (P/2016 J3 = P/2021 A3).

CYMay 8, 2016
Software search is not a science, even among scientists: A survey of how scientists and engineers find software

Michael Hucka, Matthew J. Graham

Improved software discovery is a prerequisite for greater software reuse: after all, if someone cannot find software for a particular task, they cannot reuse it. Understanding people's approaches and preferences when they look for software could help improve facilities for software discovery. We surveyed people working in several scientific and engineering fields to better understand their approaches and selection criteria. We found that even among highly-trained people, the rudimentary approaches of relying on general Web searches, the opinions of colleagues, and the literature were still the most commonly used. However, those who were involved in software development differed from nondevelopers in their use of social help sites, software project repositories, software catalogs, and organization-specific mailing lists or forums. For example, software developers in our sample were more likely to search in community sites such as Stack Overflow even when seeking ready-to-run software rather than source code, and likewise, asking colleagues was significantly more important when looking for ready-to-run software. Our survey also provides insight into the criteria that matter most to people when they are searching for ready-to-run software. Finally, our survey also identifies some factors that can prevent people from finding software.

IMOct 8, 2013
Feature Selection Strategies for Classifying High Dimensional Astronomical Data Sets

Ciro Donalek, Arun Kumar A., S. G. Djorgovski et al.

The amount of collected data in many scientific fields is increasing, all of them requiring a common task: extract knowledge from massive, multi parametric data sets, as rapidly and efficiently possible. This is especially true in astronomy where synoptic sky surveys are enabling new research frontiers in the time domain astronomy and posing several new object classification challenges in multi dimensional spaces; given the high number of parameters available for each object, feature selection is quickly becoming a crucial task in analyzing astronomical data sets. Using data sets extracted from the ongoing Catalina Real-Time Transient Surveys (CRTS) and the Kepler Mission we illustrate a variety of feature selection strategies used to identify the subsets that give the most information and the results achieved applying these techniques to three major astronomical problems.