HCAI | Feb 6, 2024

GenLens: A Systematic Evaluation of Visual GenAI Model Outputs

arXiv:2402.03700v1 | 1 citation | h-index: 17 | PacificVis
Originality: Incremental advance
AI Analysis

This work addresses the subjective, hard-to-scale evaluation practices that GenAI developers currently rely on. It is an incremental advance, since it builds on existing tools for dataset quality assurance and model explainability.

The paper tackles the lack of systematic evaluation tools for visual generative AI model outputs during early model development. It introduces GenLens, a visual analytic interface for quantifiable failure-case annotation and multi-user collaboration. A user study with model developers reports high satisfaction and a strong intent to adopt the tool.

The rapid development of generative AI (GenAI) models in computer vision necessitates effective evaluation methods to ensure their quality and fairness. Existing tools primarily focus on dataset quality assurance and model explainability, leaving a significant gap in GenAI output evaluation during model development. Current practices often depend on developers' subjective visual assessments, which may lack scalability and generalizability. This paper bridges this gap by conducting a formative study with GenAI model developers in an industrial setting. Our findings led to the development of GenLens, a visual analytic interface designed for the systematic evaluation of GenAI model outputs during the early stages of model development. GenLens offers a quantifiable approach for overviewing and annotating failure cases, customizing issue tags and classifications, and aggregating annotations from multiple users to enhance collaboration. A user study with model developers reveals that GenLens effectively enhances their workflow, evidenced by high satisfaction rates and a strong intent to integrate it into their practices. This research underscores the importance of robust early-stage evaluation tools in GenAI development, contributing to the advancement of fair and high-quality GenAI models.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
