EvalBlocks: A Modular Pipeline for Rapidly Evaluating Foundation Models in Medical Imaging
EvalBlocks reduces the burden on medical imaging researchers by automating evaluation logistics, although the contribution is incremental because it builds on existing tools such as Snakemake.
The authors address slow and error-prone evaluation workflows for foundation models in medical imaging by introducing EvalBlocks, a modular framework that tracks experiments centrally and makes them reproducible. They demonstrate it on five foundation models and three medical imaging classification tasks.
Developing foundation models in medical imaging requires continuous monitoring of downstream performance. Researchers are burdened with tracking numerous experiments, design choices, and their effects on performance, often relying on ad-hoc, manual workflows that are inherently slow and error-prone. We introduce EvalBlocks, a modular, plug-and-play framework for efficient evaluation of foundation models during development. Built on Snakemake, EvalBlocks supports seamless integration of new datasets, foundation models, aggregation methods, and evaluation strategies. All experiments and results are tracked centrally and are reproducible with a single command, while efficient caching and parallel execution enable scalable use on shared compute infrastructure. Demonstrated on five state-of-the-art foundation models and three medical imaging classification tasks, EvalBlocks streamlines model evaluation, enabling researchers to iterate faster and focus on model innovation rather than evaluation logistics. The framework is released as open source software at https://github.com/DIAGNijmegen/eval-blocks.