HC AI CL MA MM | Feb 9, 2021

Hallmarks of Human-Machine Collaboration: A framework for assessment in the DARPA Communicating with Computers Program

arXiv:2102.04958v1 | 14 citations
Originality: Synthesis-oriented
AI Analysis

This framework helps researchers who build systems for complex, open-ended tasks assess human-machine collaboration where traditional metrics are insufficient. It is an incremental contribution to evaluation methodology.

This paper presents a framework for evaluating human-machine collaboration systems in open-ended, complex tasks without a single correct answer. The framework identifies 'Key Properties' and 'Hallmarks' of success, which have been applied to assess creative collaborations in story and music generation, interactive block building, and cancer research.

There is a growing desire to create computer systems that can communicate effectively to collaborate with humans on complex, open-ended activities. Assessing these systems presents significant challenges. We describe a framework for evaluating systems engaged in open-ended complex scenarios where evaluators do not have the luxury of comparing performance to a single right answer. This framework has been used to evaluate human-machine creative collaborations across story and music generation, interactive block building, and exploration of molecular mechanisms in cancer. These activities are fundamentally different from the more constrained tasks performed by most contemporary personal assistants as they are generally open-ended, with no single correct solution, and often no obvious completion criteria. We identified the Key Properties that must be exhibited by successful systems. From there we identified "Hallmarks" of success -- capabilities and features that evaluators can observe that would be indicative of progress toward achieving a Key Property. In addition to being a framework for assessment, the Key Properties and Hallmarks are intended to serve as goals in guiding research direction.

