LGFeb 15, 2024
An Evaluation of Real-time Adaptive Sampling Change Point Detection Algorithm using KCUSUMVijayalakshmi Saravanan, Perry Siehien, Shinjae Yoo et al.
Detecting abrupt changes in real-time data streams from scientific simulations presents a challenging task, demanding the deployment of accurate and efficient algorithms. Identifying change points in live data stream involves continuous scrutiny of incoming observations for deviations in their statistical characteristics, particularly in high-volume data scenarios. Maintaining a balance between sudden change detection and minimizing false alarms is vital. Many existing algorithms for this purpose rely on known probability distributions, limiting their feasibility. In this study, we introduce the Kernel-based Cumulative Sum (KCUSUM) algorithm, a non-parametric extension of the traditional Cumulative Sum (CUSUM) method, which has gained prominence for its efficacy in online change point detection under less restrictive conditions. KCUSUM splits itself by comparing incoming samples directly with reference samples and computes a statistic grounded in the Maximum Mean Discrepancy (MMD) non-parametric framework. This approach extends KCUSUM's pertinence to scenarios where only reference samples are available, such as atomic trajectories of proteins in vacuum, facilitating the detection of deviations from the reference sample without prior knowledge of the data's underlying distribution. Furthermore, by harnessing MMD's inherent random-walk structure, we can theoretically analyze KCUSUM's performance across various use cases, including metrics like expected delay and mean runtime to false alarms. Finally, we discuss real-world use cases from scientific simulations such as NWChem CODAR and protein folding data, demonstrating KCUSUM's practical effectiveness in online change point detection.
BMMay 28, 2020
Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data ReleaseYadu Babuji, Ben Blaiszik, Tom Brettin et al.
Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules. As a contribution to that effort, we are aggregating numerous small molecules from a variety of sources, using high-performance computing (HPC) to computer diverse properties of those molecules, using the computed properties to train ML/AI models, and then using the resulting models for screening. In this first data release, we make available 23 datasets collected from community sources representing over 4.2 B molecules enriched with pre-computed: 1) molecular fingerprints to aid similarity searches, 2) 2D images of molecules to enable exploration and application of image-based deep learning methods, and 3) 2D and 3D molecular descriptors to speed development of machine learning models. This data release encompasses structural information on the 4.2 B molecules and 60 TB of pre-computed data. Future releases will expand the data to include more detailed molecular simulations, computed models, and other products.
SEOct 22, 2019
Theory-Software Translation: Research Challenges and Future DirectionsCaroline Jay, Robert Haines, Daniel S. Katz et al.
The Theory-Software Translation Workshop, held in New Orleans in February 2019, explored in depth the process of both instantiating theory in software - for example, implementing a mathematical model in code as part of a simulation - and using the outputs of software - such as the behavior of a simulation - to advance knowledge. As computation within research is now ubiquitous, the workshop provided a timely opportunity to reflect on the particular challenges of research software engineering - the process of developing and maintaining software for scientific discovery. In addition to the general challenges common to all software development projects, research software additionally must represent, manipulate, and provide data for complex theoretical constructs. Ensuring this process is robust is essential to maintaining the integrity of the science resulting from it, and the workshop highlighted a number of areas where the current approach to research software engineering would benefit from an evidence base that could be used to inform best practice. The workshop brought together expert research software engineers and academics to discuss the challenges of Theory-Software Translation over a two-day period. This report provides an overview of the workshop activities, and a synthesises of the discussion that was recorded. The body of the report presents a thematic analysis of the challenges of Theory-Software Translation as identified by workshop participants, summarises these into a set of research areas, and provides recommendations for the future direction of this work.