CLNov 13, 2025
LocalBench: Benchmarking LLMs on County-Level Local Knowledge and ReasoningZihan Gao, Yifei Xu, Jacob Thebault-Spieker
Large language models (LLMs) have been widely evaluated on macro-scale geographic tasks, such as global factual recall, event summarization, and regional reasoning. Yet, their ability to handle hyper-local knowledge remains poorly understood. This gap is increasingly consequential as real-world applications, from civic platforms to community journalism, demand AI systems that can reason about neighborhood-specific dynamics, cultural narratives, and local governance. Existing benchmarks fall short in capturing this complexity, often relying on coarse-grained data or isolated references. We present LocalBench, the first benchmark designed to systematically evaluate LLMs on county-level local knowledge across the United States. Grounded in the Localness Conceptual Framework, LocalBench includes 14,782 validated question-answer pairs across 526 U.S. counties in 49 states, integrating diverse sources such as Census statistics, local subreddit discourse, and regional news. It spans physical, cognitive, and relational dimensions of locality. Using LocalBench, we evaluate 13 state-of-the-art LLMs under both closed-book and web-augmented settings. Our findings reveal critical limitations: even the best-performing models reach only 56.8% accuracy on narrative-style questions and perform below 15.5% on numerical reasoning. Moreover, larger model size and web augmentation do not guarantee better performance, for example, search improves Gemini's accuracy by +13.6%, but reduces GPT-series performance by -11.4%. These results underscore the urgent need for language models that can support equitable, place-aware AI systems: capable of engaging with the diverse, fine-grained realities of local communities across geographic and cultural contexts.
CLDec 5, 2025
Collective Narrative Grounding: Community-Coordinated Data Contributions to Improve Local AI SystemsZihan Gao, Mohsin Y. K. Yousufi, Jacob Thebault-Spieker
Large language model (LLM) question-answering systems often fail on community-specific queries, creating "knowledge blind spots" that marginalize local voices and reinforce epistemic injustice. We present Collective Narrative Grounding, a participatory protocol that transforms community stories into structured narrative units and integrates them into AI systems under community governance. Learning from three participatory mapping workshops with N=24 community members, we designed elicitation methods and a schema that retain narrative richness while enabling entity, time, and place extraction, validation, and provenance control. To scope the problem, we audit a county-level benchmark of 14,782 local information QA pairs, where factual gaps, cultural misunderstandings, geographic confusions, and temporal misalignments account for 76.7% of errors. On a participatory QA set derived from our workshops, a state-of-the-art LLM answered fewer than 21% of questions correctly without added context, underscoring the need for local grounding. The missing facts often appear in the collected narratives, suggesting a direct path to closing the dominant error modes for narrative items. Beyond the protocol and pilot, we articulate key design tensions, such as representation and power, governance and control, and privacy and consent, providing concrete requirements for retrieval-first, provenance-visible, locally governed QA systems. Together, our taxonomy, protocol, and participatory evaluation offer a rigorous foundation for building community-grounded AI that better answers local questions.
HCMar 28, 2019
The Geography of Pokémon GO: Beneficial and Problematic Effects on Places and MovementAshley Colley, Jacob Thebault-Spieker, Allen Yilun Lin et al.
The widespread popularity of Pokémon GO presents the first opportunity to observe the geographic effects of location-based gaming at scale. This paper reports the results of a mixed methods study of the geography of Pokémon GO that includes a five-country field survey of 375 Pokémon GO players and a large scale geostatistical analysis of game elements. Focusing on the key geographic themes of places and movement, we find that the design of Pokémon GO reinforces existing geographically-linked biases (e.g. the game advantages urban areas and neighborhoods with smaller minority populations), that Pokémon GO may have instigated a relatively rare large-scale shift in global human mobility patterns, and that Pokémon GO has geographically-linked safety risks, but not those typically emphasized by the media. Our results point to geographic design implications for future systems in this space such as a means through which the geographic biases present in Pokémon GO may be counteracted.
HCNov 11, 2014
User Session Identification Based on Strong Regularities in Inter-activity TimeAaron Halfaker, Os Keyes, Daniel Kluver et al.
Session identification is a common strategy used to develop metrics for web analytics and behavioral analyses of user-facing systems. Past work has argued that session identification strategies based on an inactivity threshold is inherently arbitrary or advocated that thresholds be set at about 30 minutes. In this work, we demonstrate a strong regularity in the temporal rhythms of user initiated events across several different domains of online activity (incl. video gaming, search, page views and volunteer contributions). We describe a methodology for identifying clusters of user activity and argue that regularity with which these activity clusters appear implies a good rule-of-thumb inactivity threshold of about 1 hour. We conclude with implications that these temporal rhythms may have for system design based on our observations and theories of goal-directed human activity.