CVAug 2, 2023
DLSIA: Deep Learning for Scientific Image AnalysisEric J Roberts, Tanny Chavez, Alexander Hexemer et al.
We introduce DLSIA (Deep Learning for Scientific Image Analysis), a Python-based machine learning library that empowers scientists and researchers across diverse scientific domains with a range of customizable convolutional neural network (CNN) architectures for a wide variety of tasks in image analysis to be used in downstream data processing, or for experiment-in-the-loop computing scenarios. DLSIA features easy-to-use architectures such as autoencoders, tunable U-Nets, and parameter-lean mixed-scale dense networks (MSDNets). Additionally, we introduce sparse mixed-scale networks (SMSNets), generated using random graphs and sparse connections. As experimental data continues to grow in scale and complexity, DLSIA provides accessible CNN construction and abstracts CNN complexities, allowing scientists to tailor their machine learning approaches, accelerate discoveries, foster interdisciplinary collaboration, and advance research in scientific image analysis.
LGAug 20, 2022
MLExchange: A web-based platform enabling exchangeable machine learning workflows for scientific studiesZhuowen Zhao, Tanny Chavez, Elizabeth A. Holman et al.
Machine learning (ML) algorithms are showing a growing trend in helping the scientific communities across different disciplines and institutions to address large and diverse data problems. However, many available ML tools are programmatically demanding and computationally costly. The MLExchange project aims to build a collaborative platform equipped with enabling tools that allow scientists and facility users who do not have a profound ML background to use ML and computational resources in scientific discovery. At the high level, we are targeting a full user experience where managing and exchanging ML algorithms, workflows, and data are readily available through web applications. Since each component is an independent container, the whole platform or its individual service(s) can be easily deployed at servers of different scales, ranging from a personal device (laptop, smart phone, etc.) to high performance clusters (HPC) accessed (simultaneously) by many users. Thus, MLExchange renders flexible using scenarios -- users could either access the services and resources from a remote server or run the whole platform or its individual service(s) within their local network.
54.5LGMay 1
Machine Learning-Augmented Acceleration of Iterative Ptychographic ReconstructionBowen Zheng, Katayun Kamdin, David Shapiro et al.
Iterative ptychographic reconstruction algorithms are widely used for coherent diffractive imaging but can exhibit slow convergence under realistic experimental conditions. We propose a machine learning-augmented approach that accelerates iterative ptychographic reconstruction by introducing a learned fast-forward operator applied during reconstruction. Following an initial warm-up using standard iterations, the fast-forward operator advances the reconstruction toward a more converged state, after which conventional iterative updates are resumed. This strategy preserves the physical consistency and flexibility of established ptychographic solvers while reducing the number of iterations required for convergence. The model is trained on diverse ptychographic datasets and evaluated on experimental data acquired in a different year, demonstrating robustness and temporal generalization. Compared with conventional iterative solvers, the machine learning-augmented method achieves comparable reconstruction quality while converging faster in terms of Poisson negative log-likelihood, yielding over a two-fold reduction in wall-clock time. The approach has been integrated into an existing reconstruction pipeline and deployed in production at a synchrotron beamline, demonstrating practicality for real-time experimental operation.
MEFeb 20
Conformal Tradeoffs: Guarantees Beyond CoveragePetrus H. Zwart
Deployed conformal predictors are long-lived decision infrastructure operating over finite operational windows. The real-world question is not only ``Does the true label lie in the prediction set at the target rate?'' (marginal coverage), but ``How often does the system commit versus defer? What error exposure does it induce when it acts? How do these rates trade off?'' Marginal coverage does not determine these deployment-facing quantities: the same calibrated thresholds can yield different operational profiles depending on score geometry. We provide a framework for operational certification and planning beyond coverage with three contributions. (1) Small-Sample Beta Correction (SSBC): we invert the exact finite-sample Beta/rank law for split conformal to map a user request $(α^\star,δ)$ to a calibrated grid point with PAC-style semantics, yielding explicit finite-window coverage guarantees. (2) Calibrate-and-Audit: since no distribution-free pivot exists for rates beyond coverage, we introduce a two-stage design in which an independent audit set produces a reusable region -- label table and certified finite-window envelopes (Binomial/Beta-Binomial) for operational quantities -- commitment frequency, deferral, decisive error exposure, and commit purity -- via linear projection. (3) Geometric characterization: we describe feasibility constraints, regime boundaries (hedging vs.\ rejection), and cost-coherence conditions induced by a fixed conformal partition, explaining why operational rates are coupled and how calibration navigates their trade-offs. The output is an auditable operational menu: for a fixed scoring model, we trace attainable operational profiles across calibration settings and attach finite-window uncertainty envelopes. We demonstrate the approach on Tox21 toxicity prediction (12 endpoints) and aqueous solubility screening using AquaSolDB.
LGSep 18, 2025
Probabilistic Conformal Coverage Guarantees in Small-Data SettingsPetrus H. Zwart
Conformal prediction provides distribution-free prediction sets with guaranteed marginal coverage. However, in split conformal prediction this guarantee is training-conditional only in expectation: across many calibration draws, the average coverage equals the nominal level, but the realized coverage for a single calibration set may vary substantially. This variance undermines effective risk control in practical applications. Here we introduce the Small Sample Beta Correction (SSBC), a plug-and-play adjustment to the conformal significance level that leverages the exact finite-sample distribution of conformal coverage to provide probabilistic guarantees, ensuring that with user-defined probability over the calibration draw, the deployed predictor achieves at least the desired coverage.
AIMay 13, 2025
Behind the Noise: Conformal Quantile Regression Reveals Emergent RepresentationsPetrus H. Zwart, Tamas Varga, Odeta Qafoku et al.
Scientific imaging often involves long acquisition times to obtain high-quality data, especially when probing complex, heterogeneous systems. However, reducing acquisition time to increase throughput inevitably introduces significant noise into the measurements. We present a machine learning approach that not only denoises low-quality measurements with calibrated uncertainty bounds, but also reveals emergent structure in the latent space. By using ensembles of lightweight, randomly structured neural networks trained via conformal quantile regression, our method performs reliable denoising while uncovering interpretable spatial and chemical features -- without requiring labels or segmentation. Unlike conventional approaches focused solely on image restoration, our framework leverages the denoising process itself to drive the emergence of meaningful representations. We validate the approach on real-world geobiochemical imaging data, showing how it supports confident interpretation and guides experimental design under resource constraints.