HCNov 4, 2025Code
SigmaCollab: An Application-Driven Dataset for Physically Situated CollaborationDan Bohus, Sean Andrist, Ann Paradiso et al.
We introduce SigmaCollab, a dataset enabling research on physically situated human-AI collaboration. The dataset consists of a set of 85 sessions in which untrained participants were guided by a mixed-reality assistive AI agent in performing procedural tasks in the physical world. SigmaCollab includes a set of rich, multimodal data streams, such as the participant and system audio, egocentric camera views from the head-mounted device, depth maps, head, hand and gaze tracking information, as well as additional annotations performed post-hoc. While the dataset is relatively small in size (~ 14 hours), its application-driven and interactive nature brings to the fore novel research challenges for human-AI collaboration, and provides more realistic testing grounds for various AI models operating in this space. In future work, we plan to use the dataset to construct a set of benchmarks for physically situated collaboration in mixed-reality task assistive scenarios. SigmaCollab is available at https://github.com/microsoft/SigmaCollab.
HCMay 16, 2024Code
SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance ResearchDan Bohus, Sean Andrist, Nick Saw et al.
We introduce an open-source system called SIGMA (short for "Situated Interactive Guidance, Monitoring, and Assistance") as a platform for conducting research on task-assistive agents in mixed-reality scenarios. The system leverages the sensing and rendering affordances of a head-mounted mixed-reality device in conjunction with large language and vision models to guide users step by step through procedural tasks. We present the system's core capabilities, discuss its overall design and implementation, and outline directions for future research enabled by the system. SIGMA is easily extensible and provides a useful basis for future research at the intersection of mixed reality and AI. By open-sourcing an end-to-end implementation, we aim to lower the barrier to entry, accelerate research in this space, and chart a path towards community-driven end-to-end evaluation of large language, vision, and multimodal models in the context of real-world interactive applications.
AIMar 29, 2021Code
Platform for Situated IntelligenceDan Bohus, Sean Andrist, Ashley Feniello et al.
We introduce Platform for Situated Intelligence, an open-source framework created to support the rapid development and study of multimodal, integrative-AI systems. The framework provides infrastructure for sensing, fusing, and making inferences from temporal streams of data across different modalities, a set of tools that enable visualization and debugging, and an ecosystem of components that encapsulate a variety of perception and processing technologies. These assets jointly provide the means for rapidly constructing and refining multimodal, integrative-AI systems, while retaining the efficiency and performance characteristics required for deployment in open-world settings.