AI · Apr 26

MetaGAI: A Large-Scale and High-Quality Benchmark for Generative AI Model and Data Card Generation

arXiv:2604.23539 · 92.4 · Has Code
Predicted impact: top 23% in AI · last 90 days · Originality: Incremental advance
AI Analysis

For researchers and practitioners in generative AI governance, MetaGAI provides a large-scale, high-fidelity benchmark for systematically evaluating automated documentation methods.

MetaGAI introduces a benchmark of 2,541 verified document triplets for automated Model and Data Card generation, using a multi-agent framework and human-in-the-loop validation. It reveals that sparse Mixture-of-Experts architectures offer superior cost-quality efficiency, and identifies a trade-off between faithfulness and completeness.

The rapid proliferation of Generative AI necessitates rigorous documentation standards for transparency and governance. However, manual creation of Model and Data Cards is not scalable, while automated approaches lack large-scale, high-fidelity benchmarks for systematic evaluation. We introduce MetaGAI, a comprehensive benchmark comprising 2,541 verified document triplets constructed through semantic triangulation of academic papers, GitHub repositories, and Hugging Face artifacts. Unlike prior single-source datasets, MetaGAI employs a multi-agent framework with specialized Retriever, Generator, and Editor agents, validated through four-dimensional human-in-the-loop assessment, including human evaluation of editor-refined ground truth. We establish a robust evaluation protocol combining automated metrics with validated LLM-as-a-Judge frameworks. Extensive analysis reveals that sparse Mixture-of-Experts architectures achieve superior cost-quality efficiency, while a fundamental trade-off exists between faithfulness and completeness. MetaGAI provides a foundational testbed for benchmarking, training, and analyzing automated Model and Data Card generation methods at scale. Our data and code are available at: https://github.com/haoxuan-unt2024/MetaGAI-Benchmark.
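The faithfulness versus completeness trade-off mentioned in the abstract can be illustrated with a toy scoring function. This is a minimal sketch under stated assumptions, not the benchmark's evaluation protocol: cards are modelled as flat field-to-value dictionaries, a field counts as supported only on an exact match with the reference, and the names `CardScore`, `score_card`, and the example fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardScore:
    faithfulness: float  # share of generated fields that match the reference
    completeness: float  # share of reference fields reproduced correctly


def score_card(generated: dict[str, str], reference: dict[str, str]) -> CardScore:
    """Score a generated card against a reference card, field by field.

    Assumption: exact string match stands in for a real judge
    (automated metric or LLM-as-a-Judge), which this sketch does not model.
    """
    supported = sum(1 for key, value in generated.items() if reference.get(key) == value)
    faithfulness = supported / len(generated) if generated else 1.0
    completeness = supported / len(reference) if reference else 1.0
    return CardScore(faithfulness, completeness)


# Hypothetical reference card and two hypothetical generator outputs.
reference = {
    "license": "apache-2.0",
    "params": "7B",
    "training_data": "web text",
    "languages": "en",
}
# A cautious generator fills in only what it is sure about.
cautious = {"license": "apache-2.0", "params": "7B"}
# An eager generator fills in every field but gets one wrong.
eager = {**reference, "languages": "en, fr"}

print(score_card(cautious, reference))  # CardScore(faithfulness=1.0, completeness=0.5)
print(score_card(eager, reference))     # CardScore(faithfulness=0.75, completeness=0.75)
```

In this toy setting the cautious card is fully faithful but covers half of the reference, while the eager card covers more at the cost of an unsupported field, which is the shape of the trade-off the abstract describes.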

