A Standardized Framework For Evaluating Gene Expression Generative Models
This work addresses inconsistent and incomparable benchmarking, a problem for researchers in computational biology and single-cell genomics. The contribution is incremental: it standardizes existing evaluation practices rather than introducing new methods.
The authors tackled the lack of standardized evaluation for generative models of single-cell gene expression data by developing GGE, an open-source framework that provides consistent metrics and biologically grounded analysis, and they demonstrate that metric values vary substantially with implementation choices.
The rapid development of generative models for single-cell gene expression data has created an urgent need for standardized evaluation frameworks. Current evaluation practices suffer from inconsistent metric implementations, incomparable hyperparameter choices, and a lack of biologically grounded metrics. We present Generated Genetic Expression Evaluator (GGE), an open-source Python framework that addresses these challenges. GGE provides a comprehensive suite of distributional metrics with explicit computation space options, together with biologically motivated evaluation through differentially expressed gene (DEG)-focused analysis and perturbation-effect correlation, enabling standardized reporting and reproducible benchmarking. Through extensive analysis of the single-cell generative modeling literature, we find that no standardized evaluation protocol exists: methods report incomparable metrics computed in different spaces with different hyperparameters. We demonstrate that metric values vary substantially depending on implementation choices, highlighting the critical need for standardization. GGE enables fair comparison across generative approaches and accelerates progress in perturbation response prediction, cellular identity modeling, and counterfactual inference.
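To make the dependence on implementation choices concrete, the following minimal sketch computes one distributional metric on the same pair of datasets under several configurations. It is an illustration only and does not use or reproduce the GGE API. The function names (`rbf_mmd2`, `pca_project`), the synthetic Gaussian data standing in for expression matrices, and the chosen bandwidths and component count are all assumptions made for this example.

```python
import numpy as np

def rbf_mmd2(x, y, gamma):
    """Biased estimate of squared MMD with kernel exp(-gamma * ||a - b||^2).

    Hypothetical helper for illustration; not part of GGE.
    """
    def kernel(a, b):
        # Pairwise squared Euclidean distances, clipped at 0 for numerical safety.
        sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
        return np.exp(-gamma * np.maximum(sq, 0.0))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()

def pca_project(reference, other, n_components):
    """Fit PCA on `reference` via SVD and project both matrices onto it."""
    mean = reference.mean(axis=0)
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    components = vt[:n_components].T
    return (reference - mean) @ components, (other - mean) @ components

# Synthetic stand-ins for "real" and "generated" cells (assumption: Gaussian toy data).
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 50
real = rng.normal(0.0, 1.0, size=(n_cells, n_genes))
generated = rng.normal(0.3, 1.0, size=(n_cells, n_genes))  # small mean shift per gene

# Two computation spaces: full gene space and a 10-component PCA space.
real_pc, generated_pc = pca_project(real, generated, n_components=10)
spaces = {"gene": (real, generated), "pca10": (real_pc, generated_pc)}

# The same metric under different spaces and kernel bandwidths.
results = {}
for space_name, (a, b) in spaces.items():
    for gamma in (0.01, 0.1, 1.0):
        results[(space_name, gamma)] = rbf_mmd2(a, b, gamma)

for (space_name, gamma), value in results.items():
    print(f"space={space_name:6s} gamma={gamma:<5} MMD^2={value:.6f}")
```

The data and the metric definition are identical in every row of the printout. Only the computation space and the kernel bandwidth change, yet the reported values differ, so a score is hard to interpret or compare unless those choices are reported alongside it.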