HC · Jul 22, 2024
Building Machines that Learn and Think with People
Katherine M. Collins, Ilia Sucholutsky, Umang Bhatt et al. · mit
What do we want from machine intelligence? We envision machines that are not just tools for thought, but partners in thought: reasonable, insightful, knowledgeable, reliable, and trustworthy systems that think with us. Current artificial intelligence (AI) systems satisfy some of these criteria, some of the time. In this Perspective, we show how the science of collaborative cognition can be put to work to engineer systems that really can be called "thought partners," systems built to meet our expectations and complement our limitations. We lay out several modes of collaborative thought in which humans and AI thought partners can engage and propose desiderata for human-compatible thought partnerships. Drawing on motifs from computational cognitive science, we motivate an alternative scaling path for the design of thought partners and ecosystems around their use through a Bayesian lens, whereby the partners we construct actively build and reason over models of the human and world.
LG · Jul 12, 2023
Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation
Andi Peng, Aviv Netanyahu, Mark Ho et al.
Policies often fail due to distribution shift -- changes in the state and reward that occur when a policy is deployed in new environments. Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation. However, designers don't know which concepts are irrelevant a priori, especially when different end users have different preferences about how the task is performed. We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts. Our key idea is to generate counterfactual demonstrations that allow users to quickly identify possible task-relevant and irrelevant concepts. The knowledge of task-irrelevant concepts is then used to perform data augmentation and thus obtain a policy adapted to personalized user objectives. We present experiments validating our framework on discrete and continuous control tasks with real human users. Our method (1) enables users to better understand agent failure, (2) reduces the number of demonstrations required for fine-tuning, and (3) aligns the agent to individual user task preferences.
AI · Oct 30, 2023
Concept Alignment as a Prerequisite for Value Alignment
Sunayana Rane, Mark Ho, Ilia Sucholutsky et al.
Value alignment is essential for building AI systems that can safely and reliably interact with people. However, what a person values -- and is even capable of valuing -- depends on the concepts that they are currently using to understand and evaluate what happens in the world. The dependence of values on concepts means that concept alignment is a prerequisite for value alignment -- agents need to align their representation of a situation with that of humans in order to successfully align their values. Here, we formally analyze the concept alignment problem in the inverse reinforcement learning setting, show how neglecting concept alignment can lead to systematic value misalignment, and describe an approach that helps minimize such failure modes by jointly reasoning about a person's concepts and values. Additionally, we report experimental results with human participants showing that humans reason about the concepts used by an agent when acting intentionally, in line with our joint reasoning model.
AR · Jul 2, 2025
Multimodal Chip Physical Design Engineer Assistant
Yun-Da Tsai, Chang-Yu Chao, Liang-Yeh Shen et al.
Modern chip physical design relies heavily on Electronic Design Automation (EDA) tools, which often struggle to provide interpretable feedback or actionable guidance for improving routing congestion. In this work, we introduce a Multimodal Large Language Model Assistant (MLLMA) that bridges this gap by not only predicting congestion but also delivering human-interpretable design suggestions. Our method combines automated feature generation through MLLM-guided genetic prompting with an interpretable preference learning framework that models congestion-relevant tradeoffs across visual, tabular, and textual inputs. We compile these insights into a "Design Suggestion Deck" that surfaces the most influential layout features and proposes targeted optimizations. Experiments on the CircuitNet benchmark demonstrate that our approach outperforms existing models on both accuracy and explainability. Additionally, our design suggestion guidance case study and qualitative analyses confirm that the learned preferences align with real-world design principles and are actionable for engineers. This work highlights the potential of MLLMs as interactive assistants for interpretable and context-aware physical design optimization.
LG · Jun 6, 2024
Representational Alignment Supports Effective Machine Teaching
Ilia Sucholutsky, Katherine M. Collins, Maya Malaviya et al.
A good teacher should not only be knowledgeable, but should also be able to communicate in a way that the student understands -- to share the student's representation of the world. In this work, we introduce a new controlled experimental setting, GRADE, to study pedagogy and representational alignment. We use GRADE through a series of machine-machine and machine-human teaching experiments to characterize a utility curve defining a relationship between representational alignment, teacher expertise, and student learning outcomes. We find that improved representational alignment with a student improves student learning outcomes (i.e., task accuracy), but that this effect is moderated by the size and representational diversity of the class being taught. We use these insights to design a preliminary classroom matching procedure, GRADE-Match, that optimizes the assignment of students to teachers. When designing machine teachers, our results suggest that it is important to focus not only on accuracy, but also on representational alignment with human learners.