Suzan van der Lee

h-index95
2papers

2 Papers

LGMar 3, 2025
Building Machine Learning Challenges for Anomaly Detection in Science

Elizabeth G. Campolongo, Yuan-Tang Chou, Ekaterina Govorkova et al.

Scientific discoveries are often made by finding a pattern or object that was not predicted by the known rules of science. Oftentimes, these anomalous events or objects that do not conform to the norms are an indication that the rules of science governing the data are incomplete, and something new needs to be present to explain these unexpected outliers. The challenge of finding anomalies can be confounding since it requires codifying a complete knowledge of the known scientific behaviors and then projecting these known behaviors on the data to look for deviations. When utilizing machine learning, this presents a particular challenge since we require that the model not only understands scientific data perfectly but also recognizes when the data is inconsistent and out of the scope of its trained behavior. In this paper, we present three datasets aimed at developing machine learning-based anomaly detection for disparate scientific domains covering astrophysics, genomics, and polar science. We present the different datasets along with a scheme to make machine learning challenges around the three datasets findable, accessible, interoperable, and reusable (FAIR). Furthermore, we present an approach that generalizes to future machine learning challenges, enabling the possibility of large, more compute-intensive challenges that can ultimately lead to scientific discovery.

GEO-PHNov 5, 2020
Applying Machine Learning to Crowd-sourced Data from Earthquake Detective

Omkar Ranadive, Suzan van der Lee, Vivian Tang et al.

Dynamically triggered earthquakes and tremor generate two classes of weak seismic signals whose detection, identification, and authentication traditionally call for laborious analyses. Machine learning (ML) has grown in recent years to be a powerful efficiency-boosting tool in geophysical analyses, including the detection of specific signals in time series. However, detecting weak signals that are buried in noise challenges ML algorithms, in part because ubiquitous training data is not always available. Under these circumstances, ML can be as ineffective as human experts are inefficient. At this intersection of effectiveness and efficiency, we leverage a third tool that has grown in popularity over the past decade: Citizen science. Citizen science project Earthquake Detective leverages the eyes and ears of volunteers to detect and classify weak signals in seismograms from potentially dynamically triggered (PDT) events. Here, we present the Earthquake Detective data set - A crowd-sourced set of labels on PDT earthquakes and tremor. We apply Machine Learning to classify these PDT seismic events and explore the challenges faced in segregating and classifying such weak signals. We confirm that with an image- and wavelet-based algorithm, machine learning can detect signals from small earthquakes. In addition, we report that our ML algorithm can also detect signals from PDT tremor, which has not been previously demonstrated. The citizen science data set of classifications and ML code are available online.