CL AI CY HC | Dec 5, 2025

Collective Narrative Grounding: Community-Coordinated Data Contributions to Improve Local AI Systems

Zihan Gao, Mohsin Y. K. Yousufi, Jacob Thebault-Spieker

arXiv:2601.04201v1 | 0.6

Originality: Incremental advance

AI Analysis

This work addresses epistemic injustice and the marginalization of local communities in AI systems and offers a novel approach to local question answering, though applying participatory methods to a known bottleneck makes it an incremental advance.

The paper tackles the failure of large language models on community-specific queries, which creates knowledge blind spots and marginalizes local voices. It introduces Collective Narrative Grounding, a participatory protocol that transforms community stories into structured narrative units for integration into AI systems. On a participatory QA set, a state-of-the-art LLM answered fewer than 21% of questions correctly without added context.
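To make "structured narrative unit" concrete, here is a minimal sketch of one possible record shape, assuming a story is kept as free text alongside extracted entity, place, and time fields plus provenance and consent metadata. The class name, field names, and readiness checks are hypothetical illustrations, not the paper's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class NarrativeUnit:
    """Hypothetical schema: one community story kept in the contributor's
    own words, plus extracted entity, place, and time fields."""
    unit_id: str
    text: str                      # the story as told, preserving narrative richness
    contributor: str               # provenance: who contributed the story
    consent_to_share: bool         # privacy/consent flag set by the contributor
    entities: List[str] = field(default_factory=list)
    places: List[str] = field(default_factory=list)
    start: Optional[date] = None   # time span the story refers to, if known
    end: Optional[date] = None
    validated_by: List[str] = field(default_factory=list)  # community reviewers


def validation_problems(unit: NarrativeUnit) -> List[str]:
    """Return reasons the unit is not ready for integration (empty = ready)."""
    problems = []
    if not unit.text.strip():
        problems.append("empty narrative text")
    if not unit.contributor:
        problems.append("missing provenance")
    if not unit.consent_to_share:
        problems.append("no consent to share")
    if not (unit.entities or unit.places):
        problems.append("no entity or place extracted")
    if unit.start and unit.end and unit.start > unit.end:
        problems.append("time span ends before it starts")
    if not unit.validated_by:
        problems.append("not yet reviewed by the community")
    return problems
```

Keeping the original text next to the extracted fields is the design choice here: extraction supports lookup and validation, while the story itself stays available for display and attribution.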

Large language model (LLM) question-answering systems often fail on community-specific queries, creating "knowledge blind spots" that marginalize local voices and reinforce epistemic injustice. We present Collective Narrative Grounding, a participatory protocol that transforms community stories into structured narrative units and integrates them into AI systems under community governance. Learning from three participatory mapping workshops with N=24 community members, we designed elicitation methods and a schema that retain narrative richness while enabling entity, time, and place extraction, validation, and provenance control. To scope the problem, we audit a county-level benchmark of 14,782 local information QA pairs, where factual gaps, cultural misunderstandings, geographic confusions, and temporal misalignments account for 76.7% of errors. On a participatory QA set derived from our workshops, a state-of-the-art LLM answered fewer than 21% of questions correctly without added context, underscoring the need for local grounding. The missing facts often appear in the collected narratives, suggesting a direct path to closing the dominant error modes for narrative items. Beyond the protocol and pilot, we articulate key design tensions, such as representation and power, governance and control, and privacy and consent, providing concrete requirements for retrieval-first, provenance-visible, locally governed QA systems. Together, our taxonomy, protocol, and participatory evaluation offer a rigorous foundation for building community-grounded AI that better answers local questions.
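The abstract's requirements for retrieval-first, provenance-visible, locally governed QA can be sketched as follows, assuming a toy word-overlap retriever in place of a real one. The dict fields (`text`, `provenance`, `approved`), the stopword list, and the function names are hypothetical; the sketch only shows the control flow: filter to community-approved units, retrieve before answering, return sources with every answer, and abstain when nothing is found.

```python
import re
from typing import Dict, List, Set

# Hypothetical minimal stopword list so that words like "the" do not count as evidence.
STOPWORDS = frozenset({"the", "a", "an", "is", "in", "to", "of", "did", "when", "where", "what", "who", "now"})


def tokens(s: str) -> Set[str]:
    """Lowercased word tokens minus stopwords; a crude stand-in for a real retriever."""
    return set(re.findall(r"[a-z0-9]+", s.lower())) - STOPWORDS


def retrieve(question: str, units: List[Dict], k: int = 2) -> List[Dict]:
    """Rank community-approved units by word overlap with the question."""
    q = tokens(question)
    scored = []
    for u in units:
        if not u.get("approved"):  # local governance: skip units the community has not approved
            continue
        overlap = len(q & tokens(u["text"]))
        if overlap > 0:
            scored.append((overlap, u))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep store order
    return [u for _, u in scored[:k]]


def answer(question: str, units: List[Dict]) -> Dict:
    """Retrieve first; return evidence with visible provenance, or abstain."""
    hits = retrieve(question, units)
    if not hits:
        return {"answer": None, "sources": [], "note": "no community source; abstaining"}
    return {
        "answer": hits[0]["text"],  # a real system would pass hits to an LLM as context
        "sources": [u["provenance"] for u in hits],
        "note": "grounded in community narratives",
    }
```

For example, with one approved unit about a farmers market and one unapproved unit about a mill, a question about the market returns the approved story and its provenance, while a question about the mill abstains rather than drawing on unapproved material.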

