A Proper Scoring Rule for Virtual Staining
This work gives researchers in high-throughput screening a cell-wise method for evaluating virtual staining models, which matters when downstream analysis depends on accurate biological feature predictions.
This paper introduces Information Gain (IG) as a cell-wise evaluation framework for generative virtual staining (VS) models used in high-throughput screening (HTS). IG directly assesses predicted posteriors, unlike existing methods that only check marginal distributions, and reveals substantial performance differences between diffusion- and GAN-based models that other metrics miss.
Generative virtual staining (VS) models for high-throughput screening (HTS) can provide an estimated posterior distribution of possible biological feature values for each input and cell. However, when evaluating a VS model, the true posterior is unavailable. Existing evaluation protocols only check the accuracy of the marginal distribution over the dataset rather than the predicted posteriors. We introduce information gain (IG) as a cell-wise evaluation framework that enables direct assessment of predicted posteriors. IG is a strictly proper scoring rule with a sound theoretical motivation, which makes its values interpretable and comparable across models and features. We evaluate diffusion- and GAN-based models on an extensive HTS dataset using IG and other metrics and show that IG can reveal substantial performance differences that other metrics cannot.
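The distinction between checking marginals and checking posteriors can be illustrated with a small synthetic sketch. It is not the paper's implementation: the exact definition of IG is not given in this text, so the code assumes one common form, the per-cell difference in log score between the model's predicted density and a dataset-level marginal baseline, reported in bits. The Gaussian densities, the variances, and the names `gaussian_logpdf` and `information_gain_bits` are all hypothetical choices made for illustration. The log score is strictly proper, meaning its expected value is maximized only when the predicted distribution equals the true one.

```python
import numpy as np

def gaussian_logpdf(y, mean, std):
    """Natural-log density of N(mean, std^2) evaluated at y."""
    return -0.5 * np.log(2 * np.pi * std**2) - (y - mean) ** 2 / (2 * std**2)

def information_gain_bits(logp_model, logp_baseline):
    """Per-cell log-score difference against a baseline, converted to bits.

    Assumed form of IG for this sketch; the paper's definition may differ.
    """
    return (logp_model - logp_baseline) / np.log(2)

rng = np.random.default_rng(0)
n_cells = 20_000

# Synthetic stand-in for HTS data: x is the per-cell input summary,
# y is the true biological feature, with y | x ~ N(x, 0.5^2).
x = rng.normal(0.0, 1.0, n_cells)
y = x + rng.normal(0.0, 0.5, n_cells)

# Baseline: a single Gaussian fitted to the marginal distribution of y.
marginal_mean, marginal_std = y.mean(), y.std()
logp_baseline = gaussian_logpdf(y, marginal_mean, marginal_std)

# Hypothetical model A: predicts a per-cell posterior that depends on x.
logp_conditional = gaussian_logpdf(y, x, 0.5)

# Hypothetical model B: ignores x and always predicts the marginal.
# Its samples would match the dataset marginal, so a marginal-only
# check could not tell it apart from a well-calibrated model.
logp_marginal_only = gaussian_logpdf(y, marginal_mean, marginal_std)

ig_conditional = information_gain_bits(logp_conditional, logp_baseline).mean()
ig_marginal_only = information_gain_bits(logp_marginal_only, logp_baseline).mean()

print(f"IG, conditional model:   {ig_conditional:.3f} bits per cell")
print(f"IG, marginal-only model: {ig_marginal_only:.3f} bits per cell")
```

In this setup the marginal-only model scores 0 bits by construction, while the conditional model should land near 0.5 · log2(5) ≈ 1.16 bits per cell, the mutual information between x and y under the assumed variances. Both models reproduce the marginal of y equally well, which is the kind of difference a marginal-only evaluation cannot detect.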