Henry Bell

CL
3papers
1citation
Novelty40%
AI Score38

3 Papers

LGJan 26
Beyond Preferences: Learning Alignment Principles Grounded in Human Reasons and Values

Henry Bell, Lara Neubauer da Costa Schertel, Bochu Ding et al.

A crucial consideration when developing and deploying Large Language Models (LLMs) is the human values to which these models are aligned. In the constitutional framework of alignment models are aligned to a set of principles (the constitution) specified in natural language. However, it is unclear how to fairly determine this constitution with widespread stakeholder input. In this work we propose Grounded Constitutional AI (GCAI), a unified framework for generating constitutions of principles that are representative of both users' general expectations toward AI (general principles) and their interaction-time preferences (contextual principles). We extend the Inverse Constitutional AI (ICAI) approach to generate contextual principles from human preference annotation data by leveraging human-provided \textit{reasons} for their preferences. We supplement these contextual principles with general principles surfaced from user statements of \textit{values} regarding AI. We show that a constitution generated by GCAI is preferred by humans over one generated through ICAI both personally, and for widespread use in governing AI behavior. Additionally participants consider the GCAI constitution to be more morally grounded, coherent, and pluralistic.

CLJan 26
Reflect: Transparent Principle-Guided Reasoning for Constitutional Alignment at Scale

Henry Bell, Caroline Zhang, Mohammed Mobasserul Haque et al.

The constitutional framework of alignment aims to align large language models (LLMs) with value-laden principles written in natural language (such as to avoid using biased language). Prior work has focused on parameter fine-tuning techniques, such as reinforcement learning from human feedback (RLHF), to instill these principles. However, these approaches are computationally demanding, require careful engineering and tuning, and often require difficult-to-obtain human annotation data. We propose \textsc{reflect}, an inference-time framework for constitutional alignment that does not require any training or data, providing a plug-and-play approach for aligning an instruction-tuned model to a set of principles. \textsc{reflect} operates entirely in-context, combining a (i) constitution-conditioned base response with post-generation (ii) self-evaluation, (iii)(a) self-critique, and (iii)(b) final revision. \textsc{reflect}'s technique of explicit in-context reasoning over principles during post-generation outperforms standard few-shot prompting and provides transparent reasoning traces. Our results demonstrate that \textsc{reflect} significantly improves LLM conformance to diverse and complex principles, including principles quite distinct from those emphasized in the model's original parameter fine-tuning, without sacrificing factual reasoning. \textsc{reflect} is particularly effective at reducing the rate of rare but significant violations of principles, thereby improving safety and robustness in the tail end of the distribution of generations. Finally, we show that \textsc{reflect} naturally generates useful training data for traditional parameter fine-tuning techniques, allowing for efficient scaling and the reduction of inference-time computational overhead in long-term deployment scenarios.

CYJul 14, 2025
"Is it always watching? Is it always listening?" Exploring Contextual Privacy and Security Concerns Toward Domestic Social Robots

Henry Bell, Jabari Kwesi, Hiba Laabadli et al.

Equipped with artificial intelligence (AI) and advanced sensing capabilities, social robots are gaining interest among consumers in the United States. These robots seem like a natural evolution of traditional smart home devices. However, their extensive data collection capabilities, anthropomorphic features, and capacity to interact with their environment make social robots a more significant security and privacy threat. Increased risks include data linkage, unauthorized data sharing, and the physical safety of users and their homes. It is critical to investigate U.S. users' security and privacy needs and concerns to guide the design of social robots while these devices are still in the early stages of commercialization in the U.S. market. Through 19 semi-structured interviews, we identified significant security and privacy concerns, highlighting the need for transparency, usability, and robust privacy controls to support adoption. For educational applications, participants worried most about misinformation, and in medical use cases, they worried about the reliability of these devices. Participants were also concerned with the data inference that social robots could enable. We found that participants expect tangible privacy controls, indicators of data collection, and context-appropriate functionality.