AI · CL · Jun 4

Humans' ALMANAC: A Human Collaboration Dataset of Action-Level Mental Model Annotations for Agent Collaboration

arXiv:2606.06388 · 90.7
Predicted impact top 7% in AI · last 90 days
Originality: Incremental advance
AI Analysis

For researchers developing LLM agents for human collaboration, this dataset provides a novel resource to train and evaluate models on process-level collaborative competence, addressing a gap in existing data.

The authors introduce ALMANAC, a dataset of 2,987 collaboration actions with action-level mental model annotations from the Map Task, and benchmark six LLMs on predicting human next-turn behavior and mental models, demonstrating the dataset's utility for evaluating collaborative AI.

Recent advances in LLM agents have enabled complex cognitive capabilities, such as multi-step reasoning, planning, and tool use, that increasingly position these agents as human collaborators. Effective collaboration, however, requires collaborators to continuously maintain and align mental models of their own reasoning, partners' intentions, and shared goals during the collaborative process. Today's agents rarely develop such capabilities since they are primarily optimized for task completion, and the community lacks authentic human collaboration data with action-level mental model annotations that could guide agents toward process-level collaborative competence. To bridge this gap, we present ALMANAC, a dataset of Action-Level Mental model ANnotations for Agent Collaboration built from the Map Task, a classic dyadic routing task from social science. ALMANAC contains 2,987 collaboration actions, each paired with theory-informed mental model annotations that record the participants' self-reasoning, perceived partner intent, and perceived team goal. We benchmark six LLMs on predicting humans' next-turn behavior and mental models. Our results demonstrate ALMANAC's utility in evaluating models' ability to simulate human collaborative behaviors and infer their underlying mental models.
