LG · Apr 20, 2022

fairDMS: Rapid Model Training by Data and Model Reuse

arXiv:2204.09805v3 · 4 citations · h-index: 36
Originality: Incremental advance
AI Analysis

This work addresses timely event detection and error correction for researchers using scientific instruments. It appears incremental, since it builds on existing ML and data storage concepts.

The paper tackles the challenge of rapidly extracting information from high-data-rate instruments such as LCLS-II and APS-U, where conventional physics-based methods are too slow and ML models degrade when the instrument or sample changes. It presents a data and model reuse architecture that achieves up to 100x data labeling speedup compared to the current state-of-the-art, 200x improvement in training speed, and 92x speedup in end-to-end model updating time.

Extracting actionable information rapidly from data produced by instruments such as the Linac Coherent Light Source (LCLS-II) and Advanced Photon Source Upgrade (APS-U) is becoming ever more challenging due to high (up to TB/s) data rates. Conventional physics-based information retrieval methods are hard-pressed to detect interesting events fast enough to enable timely focusing on a rare event or correction of an error. Machine learning (ML) methods that learn cheap surrogate classifiers present a promising alternative, but can fail catastrophically when changes in instrument or sample result in degradation in ML performance. To overcome such difficulties, we present a new data storage and ML model training architecture designed to organize large volumes of data and models so that when model degradation is detected, prior models and/or data can be queried rapidly and a more suitable model retrieved and fine-tuned for new conditions. We show that our approach can achieve up to 100x data labelling speedup compared to the current state-of-the-art, 200x improvement in training speed, and 92x speedup in terms of end-to-end model updating time.
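
The retrieve-then-fine-tune idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how models are indexed or how degradation is detected, so the `ModelZoo` class, the per-feature-mean signature, the Euclidean nearest-neighbor lookup, and the error threshold below are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from math import dist
from typing import List, Optional, Sequence, Tuple


@dataclass
class ModelZoo:
    """Hypothetical index of stored models, keyed by a summary of their training data."""
    entries: List[Tuple[str, Tuple[float, ...]]] = field(default_factory=list)

    def register(self, model_id: str, signature: Sequence[float]) -> None:
        # Store the model identifier next to its data signature.
        self.entries.append((model_id, tuple(signature)))

    def closest(self, signature: Sequence[float]) -> Optional[str]:
        # Return the model whose training data looks most like the new data
        # (assumption: Euclidean distance between signatures).
        if not self.entries:
            return None
        return min(self.entries, key=lambda e: dist(e[1], signature))[0]


def summarize(samples: Sequence[Sequence[float]]) -> Tuple[float, ...]:
    """Assumed signature: the per-feature mean of a batch of samples."""
    n = len(samples)
    return tuple(sum(column) / n for column in zip(*samples))


def needs_update(recent_errors: Sequence[float], threshold: float = 0.2) -> bool:
    """Assumed degradation check: mean recent error exceeds a fixed threshold."""
    return sum(recent_errors) / len(recent_errors) > threshold


zoo = ModelZoo()
zoo.register("model_a", summarize([[0.0, 0.0], [0.2, 0.0]]))
zoo.register("model_b", summarize([[5.0, 5.0], [5.2, 4.8]]))

new_batch = [[4.9, 5.1], [5.1, 5.0]]
start_from = None
if needs_update([0.4, 0.5, 0.3]):
    # Pick the stored model to fine-tune instead of training from scratch.
    start_from = zoo.closest(summarize(new_batch))
print(start_from)
```

Here the new batch resembles the data behind `model_b`, so that model would be the starting point for fine-tuning. A real system would replace the mean-vector signature with a richer description of the data and would persist both models and data in a queryable store.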

Code Implementations: 1 repo