Learning from AVA: Early Lessons from a Curated and Trustworthy Generative AI for Policy and Development Research
For development and policy researchers, AVA demonstrates a practical approach to building trustworthy AI for specialized domains. Its novelty is incremental, however, because it applies known techniques (multi-agent pipelines, citation tracing) to a new curated dataset.
AVA, a generative AI platform built on a curated library of over 4,000 World Bank Reports, aims to reduce misinformation risks for policy experts by providing evidence-based syntheses with citation verifiability and reasoned abstention. In an evaluation with over 2,200 users, sustained engagement was associated with 2.4-3.9 hours saved weekly.
General-purpose LLMs pose misinformation risks for development and policy experts because they lack the epistemic humility needed to produce verifiable outputs. We present AVA (AI + Verified Analysis), a GenAI platform built on a curated library of over 4,000 World Bank Reports with multilingual capabilities. AVA's multi-agent pipeline lets users query the library and receive evidence-based syntheses. It operationalizes epistemic humility through two mechanisms: citation verifiability (tracing claims to sources) and reasoned abstention (declining unsupported queries with justification and redirection). We conducted an in-the-wild evaluation with over 2,200 individuals from heterogeneous organisations and roles in 116 countries, using log analysis, surveys, and 20 interviews. Difference-in-Differences estimates associate sustained engagement with 2.4-3.9 hours saved weekly. Qualitatively, participants used AVA as a specialized "evidence engine"; reasoned abstention clarified scope boundaries, and trust was calibrated through institutional provenance and page-anchored citations. We contribute design guidelines for specialized AI and articulate a vision for "ecosystem-aware" Humble AI.
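For readers unfamiliar with the Difference-in-Differences estimate mentioned in the abstract, the basic two-group, two-period form can be sketched as follows. This is only an illustration of the general technique: the abstract does not give the authors' actual specification (outcome definition, covariates, or fixed effects), and the function name `did_estimate` and all numbers below are hypothetical, not data from the study.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Basic 2x2 Difference-in-Differences estimate.

    Returns (change in treated group) minus (change in control group).
    The control group's change stands in for what would have happened
    to the treated group without the intervention (parallel-trends
    assumption).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical weekly hours spent on evidence gathering (NOT study data).
treated_pre = [10.0, 12.0, 11.0]   # sustained users, before adoption
treated_post = [7.0, 9.0, 8.0]     # sustained users, after adoption
control_pre = [10.0, 11.0, 12.0]   # comparison group, same earlier period
control_post = [10.0, 10.5, 11.0]  # comparison group, same later period

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
hours_saved = -effect  # a negative change in hours spent is time saved
print(f"DiD estimate: {effect:+.2f} hours/week (saved: {hours_saved:.2f})")
```

In this toy example the sustained users' hours fall by 3.0 and the comparison group's by 0.5, so the estimate attributes the remaining 2.5 hours per week to sustained engagement, provided the two groups would otherwise have followed parallel trends.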