94.3HCMay 21Code
XARP Tools: An Extended Reality Platform for Humans and AI AgentsArthur Caetano, Radha Kumaran, Kelvin Jou et al.
Building XR-AI research prototypes requires navigating two largely separate ecosystems. Mainstream XR development relies on C#/C++ and game engines, while AI development is centered on Python. This toolchain fragmentation slows down contributions to human-AI spatial interaction research. To broaden access to XR development in the Python ecosystem, we present XARP (XR Agent-ready Remote Procedures), a toolkit for rapid XR-AI prototyping in Python. XARP application logic runs on a Python server and controls a Unity client through WebSocket messages. This architecture enables compatibility with multiple client platforms and live reloading of application code without client redeployment. XARP is available to humans as a library and to AI agents as callable tools and through Model Context Protocol. We designed XARP through formative case studies and refined it through an early acceptance evaluation with 24 XR and AI developers and a six-week longitudinal study with two developers building an independent research project. Potential users expected the toolkit to improve their performance and facilitate development. Sustained use confirmed faster iteration and easier setup compared to conventional XR workflows, with asset-intensive and performance-critical projects emerging as the clearest limitations. Technical benchmarks show that hand and head tracking data streaming was close to the device refresh rate of 72 FPS, and that AI agents using XARP consumed 19% fewer tokens than those writing equivalent C# Unity code. Beyond broadening access to XR development, XARP reduces engineering friction in spatial computing research and opens new pathways for AI agents to participate in XR application development. XARP is open source and available at https://github.com/hal-ucsb/xarp.
92.7HCMay 10
MiXR: Harvesting and Recomposing Geometry from Real-World Objects for In-Situ 3D DesignFaraz Faruqi, Demircan Tas, Arthur Caetano et al.
Recent developments in 3D generative AI enable users to create bespoke 3D models from text or image prompts. However, these approaches provide limited control over spatial structure, making them ill suited for tasks requiring precise geometric composition. We present MiXR, an XR system for in-situ compositional modeling that enables users to create new 3D models by harvesting geometry from their environment. Users extract segments from captured objects and assemble new artifacts through direct 3D manipulation, while generative AI synthesizes a coherent model from the user-defined composition. This hybrid workflow allows users to define spatial structure explicitly while delegating geometric refinement to generative models, enabling them to specify spatial intent that is difficult to express through verbal prompts alone. In a controlled user study ($N=12$), participants using MiXR rated their designs as significantly closer to the target, felt more in control, and experienced lower cognitive workload compared to a generative composition baseline.
HCSep 25, 2025
Understanding Mode Switching in Human-AI Collaboration: Behavioral Insights and Predictive ModelingAvinash Ajit Nargund, Arthur Caetano, Kevin Yang et al.
Human-AI collaboration is typically offered in one of two of user control levels: guidance, where the AI provides suggestions and the human makes the final decision, and delegation, where the AI acts autonomously within user-defined constraints. Systems that integrate both modes, common in robotic surgery or driving assistance, often overlook shifts in user preferences within a task in response to factors like evolving trust, decision complexity, and perceived control. In this work, we investigate how users dynamically switch between higher and lower levels of control during a sequential decision-making task. Using a hand-and-brain chess setup, participants either selected a piece and the AI decided how it moved (brain mode), or the AI selected a piece and the participant decided how it moved (hand mode). We collected over 400 mode-switching decisions from eight participants, along with gaze, emotional state, and subtask difficulty data. Statistical analysis revealed significant differences in gaze patterns and subtask complexity prior to a switch and in the quality of the subsequent move. Based on these results, we engineered behavioral and task-specific features to train a lightweight model that predicted control level switches ($F1 = 0.65$). The model performance suggests that real-time behavioral signals can serve as a complementary input alongside system-driven mode-switching mechanisms currently used. We complement our quantitative results with qualitative factors that influence switching including perceived AI ability, decision complexity, and level of control, identified from post-game interview analysis. The combined behavioral and modeling insights can help inform the design of shared autonomy systems that need dynamic, subtask-level control switches aligned with user intent and evolving task demands.
AIJun 20, 2024
Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract TheoryGordon Dai, Weijia Zhang, Jinhan Li et al.
The emergence of Large Language Models (LLMs) and advancements in Artificial Intelligence (AI) offer an opportunity for computational social science research at scale. Building upon prior explorations of LLM agent design, our work introduces a simulated agent society where complex social relationships dynamically form and evolve over time. Agents are imbued with psychological drives and placed in a sandbox survival environment. We conduct an evaluation of the agent society through the lens of Thomas Hobbes's seminal Social Contract Theory (SCT). We analyze whether, as the theory postulates, agents seek to escape a brutish "state of nature" by surrendering rights to an absolute sovereign in exchange for order and security. Our experiments unveil an alignment: Initially, agents engage in unrestrained conflict, mirroring Hobbes's depiction of the state of nature. However, as the simulation progresses, social contracts emerge, leading to the authorization of an absolute sovereign and the establishment of a peaceful commonwealth founded on mutual cooperation. This congruence between our LLM agent society's evolutionary trajectory and Hobbes's theoretical account indicates LLMs' capability to model intricate social dynamics and potentially replicate forces that shape human societies. By enabling such insights into group behavior and emergent societal phenomena, LLM-driven multi-agent simulations, while unable to simulate all the nuances of human behavior, may hold potential for advancing our understanding of social structures, group dynamics, and complex human systems.