Position: Bayesian Statistics Facilitates Stakeholder Participation in Evaluation of Generative AI
This position paper addresses the need for more robust evaluation methods for Generative AI in public policy and decision-making, though the contribution appears incremental because it applies existing Bayesian methods to a new domain.
The paper tackles the problem of evaluating Generative AI systems by proposing Bayesian statistics as a framework to incorporate stakeholder perspectives and quantify uncertainty, aiming to enhance fairness, transparency, and reliability in assessments.
The evaluation of Generative AI (GenAI) systems plays a critical role in public policy and decision-making, yet existing methods are often limited by reliance on benchmark-driven, point-estimate comparisons that fail to capture uncertainty and broader societal impacts. This paper argues for the use of Bayesian statistics as a principled framework to address these challenges. Bayesian methods enable the integration of domain expertise through prior elicitation, allow for continuous learning from new data, and provide robust uncertainty quantification via posterior inference. We demonstrate how Bayesian inference can be applied to GenAI evaluation, particularly in incorporating stakeholder perspectives to enhance fairness, transparency, and reliability. Furthermore, we discuss Bayesian workflows as an iterative process for model validation and refinement, ensuring robust assessments of GenAI systems in dynamic, real-world contexts.
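To make the three ingredients named in the abstract concrete (prior elicitation, continuous learning, and posterior uncertainty), here is a minimal sketch using a conjugate Beta-Binomial model. It is an illustration under stated assumptions, not the paper's method: the Beta(8, 2) stakeholder prior, the pass/fail counts, and the helper names `update_beta`, `beta_mean`, and `prob_a_better` are all hypothetical.

```python
import random


def update_beta(alpha, beta, successes, failures):
    """Conjugate Beta-Binomial update: return the posterior (alpha, beta)."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def prob_a_better(post_a, post_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_A > rate_B) under two Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_a) > rng.betavariate(*post_b)
        for _ in range(draws)
    )
    return wins / draws


# Hypothetical elicited prior: stakeholders expect roughly 80% of outputs
# to be acceptable, with the weight of about ten pseudo-observations.
prior = (8, 2)

# Hypothetical evaluation data: (acceptable, unacceptable) outputs per system.
post_a = update_beta(*prior, 45, 5)
post_b = update_beta(*prior, 40, 10)

# Continuous learning: a later batch for system A updates the previous
# posterior, which gives the same result as pooling all of A's data at once.
post_a_later = update_beta(*post_a, 18, 2)

print("Posterior mean, A:", round(beta_mean(*post_a), 3))
print("Posterior mean, B:", round(beta_mean(*post_b), 3))
print("Estimated P(A better than B):", prob_a_better(post_a, post_b))
```

Instead of reporting that one system beats another on a point estimate, the comparison is expressed as a posterior probability, and different stakeholder groups could supply different priors to see how much the conclusion depends on them.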