NA · Jul 12, 2018
Low-Rank Kernel Matrix Approximation Using Skeletonized Interpolation With Endo- or Exo-Vertices
Zixi Xu, Léopold Cambier, François-Henry Rouet et al. · stanford
The efficient compression of kernel matrices, for instance the off-diagonal blocks of discretized integral equations, is a crucial step in many algorithms. In this paper, we study the application of Skeletonized Interpolation to construct such factorizations. In particular, we study four different strategies for selecting the initial candidate pivots of the algorithm: Chebyshev grids, points on a sphere, maximally-dispersed vertices, and random vertices. Among them, the first two introduce new interpolation points (exo-vertices) while the last two are subsets of the given clusters (endo-vertices). We perform experiments using three real-world problems coming from the multiphysics code LS-DYNA. The pivot selection strategies are compared in terms of quality (final rank) and efficiency (size of the initial grid). These benchmarks demonstrate that, overall, maximally-dispersed vertices provide an accurate and efficient set of pivots for most applications. They make it possible to reach near-optimal ranks while starting with relatively small sets of vertices, compared to other strategies.
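As a rough illustration of how maximally-dispersed endo-vertices might be picked from a cluster, the sketch below uses greedy farthest-point sampling. This is an assumed stand-in heuristic, not necessarily the selection procedure used in the paper; the function name `farthest_point_sample`, the `start` parameter, and the sample points are all hypothetical.

```python
import math


def farthest_point_sample(points, k, start=0):
    """Greedily pick k indices so that each new point is as far as
    possible from the points already chosen (an assumed heuristic)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # dist[i] = distance from points[i] to its nearest chosen point
    dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # Next pivot candidate: the point farthest from the chosen set
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


# Hypothetical cluster of vertices on a line
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(farthest_point_sample(cluster, 3))
```

Because the candidates are a subset of the cluster itself, they are endo-vertices in the abstract's terminology; a Chebyshev grid or points on a sphere would instead introduce new points.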
HC · Apr 13
From Words to Widgets for Controllable LLM Generation
Chao Zhang, Yiren Liu, Lunyiu Nie et al. · allen-ai
Natural language remains the predominant way people interact with large language models (LLMs). However, users often struggle to precisely express and control subjective preferences (e.g., tone, style, and emphasis) through prompting. We propose Malleable Prompting, a new interactive prompting technique for controllable LLM generation. It reifies preference expressions in natural language prompts into GUI widgets (e.g., sliders, dropdowns, and toggles) that users can directly configure to steer generation, while visualizing each control's influence on the output to support attribution and comparison across iterations. To enable this interaction, we introduce an LLM decoding algorithm that modulates the token probability distribution during generation based on preference expressions and their widget values. Through a user study, we show that Malleable Prompting helps participants achieve target preferences more precisely and is perceived as more controllable and transparent than natural language prompting alone.
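To make the idea of modulating a token distribution from a widget value concrete, here is a minimal sketch that adds a slider-scaled bias to each token's logit before applying softmax. It is an assumption for illustration only, not the decoding algorithm from the paper; `steer`, `preference_scores`, and the toy vocabulary are hypothetical.

```python
import math


def softmax(logits):
    """Turn a token -> logit mapping into a probability distribution."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


def steer(logits, preference_scores, slider):
    """Shift each logit by slider * preference score (hypothetical scheme).

    slider = 0 leaves the model's distribution unchanged; larger values
    push probability toward tokens with positive preference scores.
    """
    adjusted = {
        tok: v + slider * preference_scores.get(tok, 0.0)
        for tok, v in logits.items()
    }
    return softmax(adjusted)


# Toy next-token logits and a made-up "formality" preference
logits = {"hello": 1.0, "hey": 1.0, "greetings": 1.0}
formality = {"greetings": 1.0, "hey": -1.0}
print(steer(logits, formality, slider=0.0))
print(steer(logits, formality, slider=2.0))
```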
HC · Sep 19, 2024
PersonaFlow: Designing LLM-Simulated Expert Perspectives for Enhanced Research Ideation
Yiren Liu, Pranav Sharma, Mehul Jitendra Oswal et al.
Generating interdisciplinary research ideas requires diverse domain expertise, but access to timely feedback is often limited by the availability of experts. In this paper, we introduce PersonaFlow, a novel system designed to provide multiple perspectives by using LLMs to simulate domain-specific experts. Our user studies showed that the new design 1) increased the perceived relevance and creativity of ideated research directions, and 2) promoted users' critical thinking activities (e.g., interpretation, analysis, evaluation, inference, and self-regulation), without increasing their perceived cognitive load. Moreover, users' ability to customize expert profiles significantly improved their sense of agency, which can potentially mitigate their over-reliance on AI. This work contributes to the design of intelligent systems that augment creativity and collaboration, and provides design implications of using customizable AI-simulated personas in domains within and beyond research ideation.
HC · Oct 23, 2023
Synergizing Human-AI Agency: A Guide of 23 Heuristics for Service Co-Creation with LLM-Based Agents
Qingxiao Zheng, Zhongwei Xu, Abhinav Choudhry et al.
This empirical study serves as a primer for interested service providers to determine if and how Large Language Model (LLM) technology will be integrated for their practitioners and the broader community. We investigate the mutual learning journey of non-AI experts and AI through CoAGent, a service co-creation tool with LLM-based agents. Engaging in a three-stage participatory design process, we work with 23 domain experts from public libraries across the U.S., uncovering their fundamental challenges of integrating AI into human workflows. Our findings provide 23 actionable "heuristics for service co-creation with AI", highlighting the nuanced shared responsibilities between humans and AI. We further exemplify 9 foundational agency aspects for AI, emphasizing essentials like ownership, fair treatment, and freedom of expression. Our innovative approach enriches the participatory design model by incorporating AI as crucial stakeholders and utilizing AI-AI interaction to identify blind spots. Collectively, these insights pave the way for synergistic and ethical human-AI co-creation in service contexts, preparing for workforce ecosystems where AI coexists.
HC · Oct 23, 2023
Learning Through AI-Clones: Enhancing Self-Perception and Presentation Performance
Qingxiao Zheng, Zhuoer Chen, Yun Huang
This study examines the impact of AI-generated digital clones with self-images on enhancing perceptions and skills in online presentations. A mixed-design experiment with 44 international students compared self-recording videos (self-recording group) to AI-clone videos (AI-clone group) for online English presentation practice. AI-clone videos were generated using voice cloning, face swapping, lip-syncing, and body-language simulation, refining the repetition, filler words, and pronunciation of participants' original presentations. Through the lens of social comparison theory, the results showed that AI clones functioned as positive "role models" for facilitating social comparisons. When comparing the effects on self-perceptions, speech qualities, and self-kindness, the self-recording group showed an increase in pronunciation satisfaction. However, the AI-clone group exhibited greater self-kindness, broader observational coverage, and a meaningful transition from a corrective to an enhancive approach in self-critique. Moreover, machine-rated scores revealed immediate performance gains only within the AI-clone group. Considering individual differences, aligning interventions with participants' regulatory focus significantly enhanced their learning experience. These findings highlight the theoretical, practical, and ethical implications of AI clones in supporting emotional and cognitive skill development.
HC · Sep 15, 2024
ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs
Hua Shen, Tiffany Knearem, Reshmi Ghosh et al.
As AI systems become more advanced, ensuring their alignment with a diverse range of individuals and societal values becomes increasingly critical. But how can we capture fundamental human values and assess the degree to which AI systems align with them? We introduce ValueCompass, a framework of fundamental values, grounded in psychological theory and a systematic review, to identify and evaluate human-AI alignment. We apply ValueCompass to measure the value alignment of humans and large language models (LLMs) across four real-world scenarios: collaborative writing, education, public sectors, and healthcare. Our findings reveal concerning misalignments between humans and LLMs; for example, humans frequently endorsed values like "National Security" that were largely rejected by LLMs. We also observe that values differ across scenarios, highlighting the need for context-aware AI alignment strategies. This work provides valuable insights into the design space of human-AI alignment, laying the foundations for developing AI systems that responsibly reflect societal values and ethics.
CV · Sep 15, 2024
Aligning AI with Public Values: Deliberation and Decision-Making for Governing Multimodal LLMs in Political Video Analysis
Tanusree Sharma, Yujin Potter, Zachary Kilhoffer et al.
How AI models should deal with political topics has been discussed, but it remains challenging and requires better governance. This paper examines the governance of large language models through individual and collective deliberation, focusing on politically sensitive videos. We conducted a two-step study: interviews with 10 journalists established a baseline understanding of expert video interpretation; 114 individuals then took part in deliberation using InclusiveAI, a platform that facilitates democratic decision-making through decentralized autonomous organization (DAO) mechanisms. Our findings reveal distinct differences in interpretative priorities: while experts emphasized emotion and narrative, the general public prioritized factual clarity, objectivity, and emotional neutrality. Furthermore, we examined how different governance mechanisms (quadratic vs. weighted voting and equal vs. 20/80 voting power) shape users' decision-making regarding AI behavior. Results indicate that voting methods significantly influence outcomes, with quadratic voting reinforcing perceptions of liberal democracy and political equality. Our study underscores the necessity of selecting appropriate governance mechanisms to better capture user perspectives and suggests decentralized AI governance as a potential way to facilitate broader public engagement in AI development, ensuring that varied perspectives meaningfully inform design decisions.
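For readers unfamiliar with quadratic voting, the sketch below shows its usual cost rule: casting n votes on one option costs n² voice credits, so concentrating influence becomes progressively more expensive. How InclusiveAI actually implements voting is not stated in the abstract; the function names and the credit budgets here are hypothetical.

```python
import math


def quadratic_cost(votes: int) -> int:
    """Credits needed to cast `votes` votes on a single option."""
    return votes * votes


def max_votes(credits: int) -> int:
    """Largest number of votes on one option affordable with `credits`."""
    return math.isqrt(credits)


def is_within_budget(ballot: dict, credits: int) -> bool:
    """Check a ballot (option -> non-negative votes) against a credit budget."""
    return sum(quadratic_cost(v) for v in ballot.values()) <= credits


# Hypothetical ballot: spreading votes is cheaper than concentrating them
print(quadratic_cost(3))
print(max_votes(10))
print(is_within_budget({"option_a": 2, "option_b": 2}, credits=8))
```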
CL · Feb 5
Copyright Detective: A Forensic System to Evidence LLMs Flickering Copyright Leakage Risks
Guangwei Zhang, Jianing Zhu, Cheng Qian et al.
We present Copyright Detective, the first interactive forensic system for detecting, analyzing, and visualizing potential copyright risks in LLM outputs. The system treats copyright infringement versus compliance as an evidence discovery process rather than a static classification task due to the complex nature of copyright law. It integrates multiple detection paradigms, including content recall testing, paraphrase-level similarity analysis, persuasive jailbreak probing, and unlearning verification, within a unified and extensible framework. Through interactive prompting, response collection, and iterative workflows, our system enables systematic auditing of verbatim memorization and paraphrase-level leakage, supporting responsible deployment and transparent evaluation of LLM copyright risks even with black-box access.
HC · Sep 24, 2024
Improving Emotional Support Delivery in Text-Based Community Safety Reporting Using Large Language Models
Yiren Liu, Yerong Li, Ryan Mayfield et al.
Emotional support is a crucial aspect of communication between community members and police dispatchers during incident reporting. However, there is a lack of understanding about how emotional support is delivered through text-based systems, especially in various non-emergency contexts. In this study, we analyzed two years of chat logs comprising 57,114 messages across 8,239 incidents from 130 higher education institutions. Our empirical findings revealed significant variations in the emotional support provided by dispatchers, influenced by the type of incident and service time, as well as a noticeable decline in support over time across multiple organizations. To improve the consistency and quality of emotional support, we developed and implemented a fine-tuned Large Language Model (LLM), named dispatcherLLM. We evaluated dispatcherLLM by comparing its generated responses to those of human dispatchers and other off-the-shelf models using real chat messages. Additionally, we conducted a human evaluation to assess the perceived effectiveness of the support provided by dispatcherLLM. This study not only contributes new empirical understandings of emotional support in text-based dispatch systems but also demonstrates the significant potential of generative AI in improving service delivery.
HC · Apr 4
YT-Pilot: Turning YouTube into Structured Learning Pathways with Context-Aware AI Support
Dina Albassam, Kexin Quan, Mengke Wu et al.
YouTube is widely used for informal learning, where learners explore lectures and tutorials without a predefined curriculum. However, learning across videos remains fragmented: learners must decide what to watch, how videos relate, and how knowledge builds. Existing tools provide partial support but treat planning and learning as separate activities, lacking a persistent interaction structure that connects them. Grounded in self-regulated learning theory (SRLT), we introduce YT-Pilot, a pathway-aware learning system that operationalizes the learning pathway as a persistent, user-facing interaction structure spanning planning and learning. The pathway coordinates goal setting, planning, navigation, progress tracking, and cross-video assistance. Through a within-subjects study ($N=20$), we show that YT-Pilot significantly improves perceived goal clarity, pathway coherence, and progress tracking, while shifting interaction toward pathway-level reasoning across multiple resources.
CHEM-PH · May 11
Physical probes expose and alleviate chemical-environment collapse in molecular representations
Jiebin Fang, Zidi Yan, Churu Mao et al.
Nuclear magnetic resonance (NMR) spectroscopy provides an experimental readout of local chemical environments, but its use in molecular representation learning has been constrained by heterogeneous data and incomplete atom-level assignments. Here we construct complementary high-fidelity experimental and computational 13C NMR resources, which reveal a recurrent form of representational collapse: atoms that are equivalent in molecular topology can remain experimentally distinct in their real chemical environments, whereas explicit 3D descriptions are further limited by static conformations in dynamic regimes. To alleviate this bottleneck, we develop CLAIM (Contrastive Learning for Atom-to-molecule Inference of Molecular NMR), a framework that aligns efficient topological molecular inputs with atom-resolved NMR observables. Through hierarchical chemical priors and cross-level contrastive learning, CLAIM restores lost chemical resolution and markedly improves atom-level molecule-spectrum retrieval. CLAIM remains robust in flexible and tautomeric systems for 13C NMR prediction, improves stereoisomer discrimination without explicit 3D modelling, and transfers to broader molecular property tasks including ADMET prediction and fluorescence estimation. These results establish physically grounded spectral alignment as an effective strategy for alleviating chemical-environment collapse and for guiding experimentally grounded molecular representation learning.
CL · Mar 18, 2024
Reference-based Metrics Disprove Themselves in Question Generation
Bang Nguyen, Mengxia Yu, Yun Huang et al.
Reference-based metrics such as BLEU and BERTScore are widely used to evaluate question generation (QG). In this study, on QG benchmarks such as SQuAD and HotpotQA, we find that using human-written references cannot guarantee the effectiveness of the reference-based metrics. Most QG benchmarks have only one reference; we replicate the annotation process and collect another reference. A good metric is expected to grade a human-validated question no worse than generated questions. However, the results of reference-based metrics on our newly collected reference disproved the metrics themselves. We propose a reference-free metric consisting of multi-dimensional criteria such as naturalness, answerability, and complexity, utilizing large language models. These criteria are not constrained to the syntax or semantics of a single reference question, and the metric does not require a diverse set of references. Experiments reveal that our metric accurately distinguishes between high-quality questions and flawed ones, and achieves state-of-the-art alignment with human judgment.
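The weakness of scoring against a single reference can be seen with a deliberately simplified overlap score. The sketch below computes clipped unigram precision, which is only a crude stand-in for BLEU (it is neither BLEU nor BERTScore), and the example questions are invented for illustration rather than taken from SQuAD or HotpotQA.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_precision(candidate, reference):
    """Fraction of candidate tokens that also appear in the reference,
    with counts clipped to the reference counts."""
    cand = Counter(tokens(candidate))
    ref = Counter(tokens(reference))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[tok]) for tok, count in cand.items())
    return overlap / total


reference = "Who wrote Hamlet?"
# A second, equally valid human-style question about the same answer
paraphrase = "Which author penned Hamlet?"
print(unigram_precision(reference, reference))
print(unigram_precision(paraphrase, reference))
```

The paraphrase asks the same question yet shares only one token with the reference, so an overlap-based score penalizes it heavily.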
CL · May 1, 2024
Social Life Simulation for Non-Cognitive Skills Learning
Zihan Yan, Yaohong Xiang, Yun Huang
Non-cognitive skills are crucial for personal and social life well-being, and such skill development can be supported by narrative-based (e.g., storytelling) technologies. While generative AI enables interactive and role-playing storytelling, little is known about how users engage with and perceive the use of AI in social life simulation for non-cognitive skills learning. Additionally, the benefits of AI mentorship on self-reflection awareness and ability in this context remain largely underexplored. To this end, we introduced Simulife++, an interactive platform enabled by a large language model (LLM). The system allows users to act as protagonists, creating stories with one or multiple AI-based characters in diverse social scenarios. In particular, we expanded the Human-AI interaction to a Human-AI-AI collaboration by including a Sage Agent, who acts as a bystander, providing users with some perspectives and guidance on their choices and conversations in terms of non-cognitive skills to promote reflection. In a within-subject user study, our quantitative results reveal that, when accompanied by Sage Agent, users exhibit significantly higher levels of reflection on motivation, self-perceptions, and resilience & coping, along with an enhanced experience of narrative transportation. Additionally, our qualitative findings suggest that Sage Agent plays a crucial role in promoting reflection on non-cognitive skills, enhancing social communication and decision-making performance, and improving overall user experience within Simulife++. Multiple supportive relationships between Sage Agent and users were also reported. We offer design implications for the application of generative AI in narrative solutions and the future potential of Sage Agent for non-cognitive skill development in broader social contexts.
CY · Jan 17, 2025
An Integrated Platform for Studying Learning with Intelligent Tutoring Systems: CTAT+TutorShop
Vincent Aleven, Conrad Borchers, Yun Huang et al.
Intelligent tutoring systems (ITSs) are effective in helping students learn; further research could make them even more effective. Particularly desirable is research into how students learn with these systems, how these systems best support student learning, and what learning sciences principles are key in ITSs. CTAT+Tutorshop provides a full-stack integrated platform that facilitates a complete research lifecycle with ITSs, which includes using ITS data to discover learner challenges, to identify opportunities for system improvements, and to conduct experimental studies. The platform includes authoring tools to support and accelerate the development of ITSs, which provide automatic data logging in a format compatible with DataShop, an independent site that supports the analysis of ed tech log data to study student learning. Among the many technology platforms that exist to support learning sciences research, CTAT+Tutorshop may be the only one that offers researchers the possibility to author elements of ITSs, or whole ITSs, as part of designing studies. This platform has been used to develop and conduct an estimated 147 research studies which have run in a wide variety of laboratory and real-world educational settings, including K-12 and higher education, and have addressed a wide range of research questions. This paper presents five case studies of research conducted on the CTAT+Tutorshop platform, and summarizes what has been accomplished and what is possible for future researchers. We reflect on the distinctive elements of this platform that have made it so effective in facilitating a wide range of ITS research.
HC · Sep 24, 2025
Perspectra: Choosing Your Experts Enhances Critical Thinking in Multi-Agent Research Ideation
Yiren Liu, Viraj Shah, Sangho Suh et al. · allen-ai
Recent advances in multi-agent systems (MAS) enable tools for information search and ideation by assigning personas to agents. However, how users can effectively control, steer, and critically evaluate collaboration among multiple domain-expert agents remains underexplored. We present Perspectra, an interactive MAS that visualizes and structures deliberation among LLM agents via a forum-style interface, supporting @-mentions to invite targeted agents, threading for parallel exploration, and a real-time mind map for visualizing arguments and rationales. In a within-subjects study with 18 participants, we compared Perspectra to a group-chat baseline as they developed research proposals. Our findings show that Perspectra significantly increased the frequency and depth of critical-thinking behaviors, elicited more interdisciplinary replies, and led to more frequent proposal revisions than the group-chat condition. We discuss implications for designing multi-agent tools that scaffold critical thinking by supporting user control over multi-agent adversarial discourse.
CL · Feb 20, 2025
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
M-A-P Team, Xinrun Du, Yifan Yao et al.
Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields, particularly in light industry, agriculture, and service-oriented disciplines, remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model DeepSeek-R1 achieved the highest accuracy of 61.82% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.
CL · Jan 6, 2025
VicSim: Enhancing Victim Simulation with Emotional and Linguistic Fidelity
Yerong Li, Yiren Liu, Yun Huang
Scenario-based training has been widely adopted in many public service sectors. Recent advancements in Large Language Models (LLMs) have shown promise in simulating diverse personas to create these training scenarios. However, little is known about how LLMs can be developed to simulate victims for scenario-based training purposes. In this paper, we introduce VicSim (victim simulator), a novel model that addresses three key dimensions of user simulation: informational faithfulness, emotional dynamics, and language style (e.g., grammar usage). We pioneer the integration of scenario-based victim modeling with a GAN-based training workflow and key-information-based prompting, aiming to enhance the realism of simulated victims. Our adversarial training approach teaches the discriminator to recognize grammar and emotional cues as reliable indicators of synthetic content. According to evaluations by human raters, the VicSim model outperforms GPT-4 in terms of human-likeness.
HC · Feb 20, 2022
UX Research on Conversational Human-AI Interaction: A Literature Review of the ACM Digital Library
Qingxiao Zheng, Yiliu Tang, Yiren Liu et al.
Early conversational agents (CAs) focused on dyadic human-AI interaction between humans and the CAs, followed by the increasing popularity of polyadic human-AI interaction, in which CAs are designed to mediate human-human interactions. CAs for polyadic interactions are unique because they encompass hybrid social interactions, i.e., human-CA, human-to-human, and human-to-group behaviors. However, research on polyadic CAs is scattered across different fields, making it challenging to identify, compare, and accumulate existing knowledge. To promote the future design of CA systems, we conducted a literature review of ACM publications and identified a set of works that conducted UX (user experience) research. We qualitatively synthesized the effects of polyadic CAs into four aspects of human-human interactions, i.e., communication, engagement, connection, and relationship maintenance. Through a mixed-method analysis of the selected polyadic and dyadic CA studies, we developed a suite of evaluation measurements on the effects. Our findings show that designing with social boundaries, such as privacy, disclosure, and identification, is crucial for ethical polyadic CAs. Future research should also advance usability testing methods and trust-building guidelines for conversational AI.
CY · Jan 15, 2022
"It's A Blessing and A Curse": Unpacking Creators' Practices with Non-Fungible Tokens (NFTs) and Their Communities
Tanusree Sharma, Zhixuan Zhou, Yun Huang et al.
NFTs (Non-Fungible Tokens) are blockchain-based cryptographic tokens that represent ownership of unique content such as images, videos, or 3D objects. Despite NFTs' increasing popularity and skyrocketing trading prices, little is known about people's perceptions of and experiences with NFTs. In this work, we focus on NFT creators and present results of an exploratory qualitative study in which we interviewed 15 NFT creators from nine different countries. Our participants had nuanced feelings about NFTs and their communities. We found that most of our participants were enthusiastic about the underlying technologies and how they empower individuals to express their creativity and pursue new business models of content creation. Our participants also gave kudos to the NFT communities that have supported them to learn, collaborate, and grow in their NFT endeavors. However, these positives were juxtaposed with their accounts of the many challenges that they encountered and the thorny issues that the NFT ecosystem is grappling with around ownership of digital content, low-quality NFTs, scams, possible money laundering, and regulations. We discuss how the built-in properties (e.g., decentralization) of blockchains and NFTs might have contributed to some of these issues. We present design implications on how to improve the NFT ecosystem (e.g., making NFTs even more accessible to newcomers and the broader population).