CLAug 29, 2024Code
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in ActionYijia Shao, Tianshi Li, Weiyan Shi et al. · harvard, stanford
As language models (LMs) are widely utilized in personalized communication scenarios (e.g., sending emails, writing social media posts) and endowed with a certain level of agency, ensuring they act in accordance with the contextual privacy norms becomes increasingly critical. However, quantifying the privacy norm awareness of LMs and the emerging privacy risk in LM-mediated communication is challenging due to (1) the contextual and long-tailed nature of privacy-sensitive cases, and (2) the lack of evaluation approaches that capture realistic application scenarios. To address these challenges, we propose PrivacyLens, a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories, enabling multi-level evaluation of privacy leakage in LM agents' actions. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. Using this dataset, we reveal a discrepancy between LM performance in answering probing questions and their actual behavior when executing user instructions in an agent setup. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions. We also demonstrate the dynamic nature of PrivacyLens by extending each seed into multiple trajectories to red-team LM privacy leakage risk. Dataset and code are available at https://github.com/SALT-NLP/PrivacyLens.
CLOct 28, 2022
When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good LabelsWeiyan Shi, Emily Dinan, Kurt Shuster et al. · meta-ai
Deployed dialogue agents have the potential to integrate human feedback to continuously improve themselves. However, humans may not always provide explicit signals when the chatbot makes mistakes during interactions. In this work, we propose Juicer, a framework to make use of both binary and free-form textual human feedback. It works by: (i) extending sparse binary feedback by training a satisfaction classifier to label the unlabeled data; and (ii) training a reply corrector to map the bad replies to good ones. We find that augmenting training with model-corrected replies improves the final dialogue model, and we can further improve performance by using both positive and negative replies through the recently proposed Director model.
CLNov 22, 2022
AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative RepliesWeiyan Shi, Emily Dinan, Adi Renduchintala et al. · cmu
Existing approaches built separate classifiers to detect nonsense in dialogues. In this paper, we show that without external classifiers, dialogue models can detect errors in their own messages introspectively, by calculating the likelihood of replies that are indicative of poor messages. For example, if an agent believes its partner is likely to respond "I don't understand" to a candidate message, that message may not make sense, so an alternative message should be chosen. We evaluate our approach on a dataset from the game Diplomacy, which contains long dialogues richly grounded in the game state, on which existing models make many errors. We first show that hand-crafted replies can be effective for the task of detecting nonsense in applications as complex as Diplomacy. We then design AutoReply, an algorithm to search for such discriminative replies automatically, given a small number of annotated dialogue examples. We find that AutoReply-generated replies outperform handcrafted replies and perform on par with carefully fine-tuned large supervised models. Results also show that one single reply without much computation overheads can also detect dialogue nonsense reasonably well.
CLOct 11, 2022
Social Influence Dialogue Systems: A Survey of Datasets and Models For Social Influence TasksKushal Chawla, Weiyan Shi, Jingwen Zhang et al.
Dialogue systems capable of social influence such as persuasion, negotiation, and therapy, are essential for extending the use of technology to numerous realistic scenarios. However, existing research primarily focuses on either task-oriented or open-domain scenarios, a categorization that has been inadequate for capturing influence skills systematically. There exists no formal definition or category for dialogue systems with these skills and data-driven efforts in this direction are highly limited. In this work, we formally define and introduce the category of social influence dialogue systems that influence users' cognitive and emotional responses, leading to changes in thoughts, opinions, and behaviors through natural conversations. We present a survey of various tasks, datasets, and methods, compiling the progress across seven diverse domains. We discuss the commonalities and differences between the examined systems, identify limitations, and recommend future directions. This study serves as a comprehensive reference for social influence dialogue systems to inspire more dedicated research and discussion in this emerging area.
CLApr 15, 2022
Just Fine-tune Twice: Selective Differential Privacy for Large Language ModelsWeiyan Shi, Ryan Shea, Si Chen et al.
Protecting large language models from privacy leakage is becoming increasingly crucial with their wide adoption in real-world products. Yet applying differential privacy (DP), a canonical notion with provable privacy guarantees for machine learning models, to those models remains challenging due to the trade-off between model utility and privacy loss. Utilizing the fact that sensitive information in language data tends to be sparse, Shi et al. (2021) formalized a DP notion extension called Selective Differential Privacy (SDP) to protect only the sensitive tokens defined by a policy function. However, their algorithm only works for RNN-based models. In this paper, we develop a novel framework, Just Fine-tune Twice (JFT), that achieves SDP for state-of-the-art large transformer-based models. Our method is easy to implement: it first fine-tunes the model with redacted in-domain data, and then fine-tunes it again with the original in-domain data using a private training mechanism. Furthermore, we study the scenario of imperfect implementation of policy functions that misses sensitive tokens and develop systematic methods to handle it. Experiments show that our method achieves strong utility compared to previous baselines. We also analyze the SDP privacy guarantee empirically with the canary insertion attack.
CLJul 4, 2024
Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information RetrievalKazuaki Furumai, Roberto Legaspi, Julio Vizcarra et al. · stanford
Persuasion plays a pivotal role in a wide range of applications from health intervention to the promotion of social good. Persuasive chatbots employed responsibly for social good can be an enabler of positive individual and social change. Existing methods rely on fine-tuning persuasive chatbots with task-specific training data which is costly, if not infeasible, to collect. Furthermore, they employ only a handful of pre-defined persuasion strategies. We propose PersuaBot, a zero-shot chatbot based on Large Language Models (LLMs) that is factual and more persuasive by leveraging many more nuanced strategies. PersuaBot uses an LLM to first generate natural responses, from which the strategies used are extracted. To combat hallucination of LLMs, Persuabot replace any unsubstantiated claims in the response with retrieved facts supporting the extracted strategies. We applied our chatbot, PersuaBot, to three significantly different domains needing persuasion skills: donation solicitation, recommendations, and health intervention. Our experiments on simulated and human conversations show that our zero-shot approach is more persuasive than prior work, while achieving factual accuracy surpassing state-of-the-art knowledge-oriented chatbots.
CLMar 15, 2022
Seamlessly Integrating Factual Information and Social Content with Persuasive DialogueMaximillian Chen, Weiyan Shi, Feifan Yan et al.
Complex conversation settings such as persuasion involve communicating changes in attitude or behavior, so users' perspectives need to be addressed, even when not directly related to the topic. In this work, we contribute a novel modular dialogue system framework that seamlessly integrates factual information and social content into persuasive dialogue. Our framework is generalizable to any dialogue tasks that have mixed social and task contents. We conducted a study that compared user evaluations of our framework versus a baseline end-to-end generation model. We found our framework was evaluated more favorably in all dimensions including competence and friendliness, compared to the end-to-end model which does not explicitly handle social content or factual questions.
CLNov 16, 2023
Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational ApproachYanchen Liu, Mingyu Derek Ma, Wenna Qin et al. · gatech, harvard
Susceptibility to misinformation describes the degree of belief in unverifiable claims, a latent aspect of individuals' mental processes that is not observable. Existing susceptibility studies heavily rely on self-reported beliefs, which can be subject to bias, expensive to collect, and challenging to scale for downstream applications. To address these limitations, in this work, we propose a computational approach to model users' latent susceptibility levels. As shown in previous research, susceptibility is influenced by various factors (e.g., demographic factors, political ideology), and directly influences people's reposting behavior on social media. To represent the underlying mental process, our susceptibility modeling incorporates these factors as inputs, guided by the supervision of people's sharing behavior. Using COVID-19 as a testbed domain, our experiments demonstrate a significant alignment between the susceptibility scores estimated by our computational modeling and human judgments, confirming the effectiveness of this latent modeling approach. Furthermore, we apply our model to annotate susceptibility scores on a large-scale dataset and analyze the relationships between susceptibility with various factors. Our analysis reveals that political leanings and psychological factors exhibit varying degrees of association with susceptibility to COVID-19 misinformation.
CRFeb 13Code
Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using AgentsXu Li, Simon Yu, Minzhou Pan et al.
LLM-based agents are becoming increasingly capable, yet their safety lags behind. This creates a gap between what agents can do and should do. This gap widens as agents engage in multi-turn interactions and employ diverse tools, introducing new risks overlooked by existing benchmarks. To systematically scale safety testing into multi-turn, tool-realistic settings, we propose a principled taxonomy that transforms single-turn harmful tasks into multi-turn attack sequences. Using this taxonomy, we construct MT-AgentRisk (Multi-Turn Agent Risk Benchmark), the first benchmark to evaluate multi-turn tool-using agent safety. Our experiments reveal substantial safety degradation: the Attack Success Rate (ASR) increases by 16% on average across open and closed models in multi-turn settings. To close this gap, we propose ToolShield, a training-free, tool-agnostic, self-exploration defense: when encountering a new tool, the agent autonomously generates test cases, executes them to observe downstream effects, and distills safety experiences for deployment. Experiments show that ToolShield effectively reduces ASR by 30% on average in multi-turn interactions. Our code is available at https://github.com/CHATS-lab/ToolShield.
CLFeb 23
Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red TeamingIan Steenstra, Paola Pedrelli, Weiyan Shi et al.
Large Language Models (LLMs) are increasingly utilized for mental health support; however, current safety benchmarks often fail to detect the complex, longitudinal risks inherent in therapeutic dialogue. We introduce an evaluation framework that pairs AI psychotherapists with simulated patient agents equipped with dynamic cognitive-affective models and assesses therapy session simulations against a comprehensive quality of care and risk ontology. We apply this framework to a high-impact test case, Alcohol Use Disorder, evaluating six AI agents (including ChatGPT, Gemini, and Character.AI) against a clinically-validated cohort of 15 patient personas representing diverse clinical phenotypes. Our large-scale simulation (N=369 sessions) reveals critical safety gaps in the use of AI for mental health support. We identify specific iatrogenic risks, including the validation of patient delusions ("AI Psychosis") and failure to de-escalate suicide risk. Finally, we validate an interactive data visualization dashboard with diverse stakeholders, including AI engineers and red teamers, mental health professionals, and policy experts (N=9), demonstrating that this framework effectively enables stakeholders to audit the "black box" of AI psychotherapy. These findings underscore the critical safety risks of AI-provided mental health support and the necessity of simulation-based clinical red teaming before deployment.
AIMay 11Code
Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution TraceSimon Yu, Derek Chong, Ananjan Nandi et al.
We introduce Shepherd, a functional programming model that formalizes meta-agent operations on target agents as functions, with core operations mechanized in Lean. Shepherd records every agent-environment interaction as a typed event in a Git-like execution trace, enabling any past state to be forked and replayed. The system forks the agent process and its filesystem $5\times$ faster than Docker, achieving $>95\%$ prompt-cache reuse on replay. We demonstrate the model through three applications. First, in runtime intervention, a live supervisor increases pair coding pass rates from 28.8% to 54.7% on CooperBench. Second, in counterfactual meta-optimization, branching exploration outperforms baselines across four benchmarks by up to 11 points while reducing wall-clock time by up to 58%. Third, in Tree-RL training, forking rollouts at selected turns improves TerminalBench-2 performance from 34.2% to 39.4%. These results establish Shepherd as an efficient infrastructure for programming meta-agents. We open-source the system to support future research.
HCApr 13
When Drawing Is Not Enough: Exploring Spontaneous Speech with Sketch for Intent Alignment in Multimodal LLMsWeiyan Shi, Dorien Herremans, Kenny Tsu Wei Choo
Early-stage design ideation often relies on rough sketches created under time pressure, leaving much of the designer's intent implicit. In practice, designers frequently speak while sketching, verbally articulating functional goals and ideas that are difficult to express visually. We introduce TalkSketchD, a sketch-while-speaking dataset that captures spontaneous speech temporally aligned with freehand sketches during early-stage toaster ideation. To examine the dataset's value, we conduct a sketch-to-image generation study comparing sketch-only inputs with sketches augmented by concurrent speech transcripts using multimodal large language models (MLLMs). Generated images are evaluated against designers' self-reported intent using a reasoning MLLM as a judge. Quantitative results show that incorporating spontaneous speech significantly improves judged intent alignment of generated design images across form, function, experience, and overall intent. These findings demonstrate that temporally aligned sketch-and-speech data can enhance MLLMs' ability to interpret user intent in early-stage design ideation.
CLApr 23, 2024Code
CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language TechnologiesWeiyan Shi, Ryan Li, Yutong Zhang et al.
To enhance language models' cultural awareness, we design a generalizable pipeline to construct cultural knowledge bases from different online communities on a massive scale. With the pipeline, we construct CultureBank, a knowledge base built upon users' self-narratives with 12K cultural descriptors sourced from TikTok and 11K from Reddit. Unlike previous cultural knowledge resources, CultureBank contains diverse views on cultural descriptors to allow flexible interpretation of cultural knowledge, and contextualized cultural scenarios to help grounded evaluation. With CultureBank, we evaluate different LLMs' cultural awareness, and identify areas for improvement. We also fine-tune a language model on CultureBank: experiments show that it achieves better performances on two downstream cultural tasks in a zero-shot setting. Finally, we offer recommendations based on our findings for future culturally aware language technologies. The project page is https://culturebank.github.io . The code and model is at https://github.com/SALT-NLP/CultureBank . The released CultureBank dataset is at https://huggingface.co/datasets/SALT-NLP/CultureBank .
CLSep 21, 2024
Contrastive Learning for Knowledge-Based Question Generation in Large Language ModelsZhenhong Zhang, Jiajing Chen, Weiyan Shi et al.
With the rapid development of artificial intelligence technology, especially the increasingly widespread application of question-and-answer systems, high-quality question generation has become a key component in supporting the development of these systems. This article focuses on knowledge-based question generation technology, which aims to enable computers to simulate the human questioning process based on understanding specific texts or knowledge bases. In light of the issues of hallucination and knowledge gaps present in large-scale language models when applied to knowledge-intensive tasks, this paper proposes an enhanced question generation method that incorporates contrastive learning. This method utilizes multiple models to jointly mine domain knowledge and uses contrastive learning to guide the model in reducing noise and hallucinations in generation. Experimental results show that by designing prompts containing contrasting examples, the model's performance in question generation improves considerably, particularly when contrasting instructions and examples are used simultaneously, leading to the highest quality of generated questions and improved accuracy. These results demonstrate that the method proposed in this study, which combines contrasting context and chain-of-thought prompts, can effectively improve both the quality and the practicality of question generation.
LGFeb 14, 2024Code
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward PassesMyeongseob Ko, Feiyang Kang, Weiyan Shi et al.
Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current influence estimation techniques involve computing gradients for every training point or repeated training on different subsets. These approaches face obvious computational challenges when scaled up to large datasets and models. In this paper, we introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem: assessing how the predictions for training samples would be altered if the model were trained on specific test samples. Through both empirical and theoretical validations, we demonstrate the wide applicability of our hypothesis. Inspired by this, we introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point. This approach can capitalize on the common asymmetry in scenarios where the number of test samples under concurrent examination is much smaller than the scale of the training dataset, thus gaining a significant improvement in efficiency compared to existing approaches. We demonstrate the applicability of our method across a range of scenarios, including data attribution in diffusion models, data leakage detection, analysis of memorization, mislabeled data detection, and tracing behavior in language models. Our code will be made available at https://github.com/ruoxi-jia-group/Forward-INF.
CVSep 18, 2024
Exploring Gaze Pattern Differences Between Autistic and Neurotypical Children: Clustering, Visualisation, and PredictionWeiyan Shi, Haihong Zhang, Wei Wang et al.
Autism Spectrum Disorder (ASD) affects children's social and communication abilities, with eye-tracking widely used to identify atypical gaze patterns. While unsupervised clustering can automate the creation of areas of interest for gaze feature extraction, the use of internal cluster validity indices, like Silhouette Coefficient, to distinguish gaze pattern differences between ASD and typically developing (TD) children remains underexplored. We explore whether internal cluster validity indices can distinguish ASD from TD children. Specifically, we apply seven clustering algorithms to gaze points and extract 63 internal cluster validity indices to reveal correlations with ASD diagnosis. Using these indices, we train predictive models for ASD diagnosis. Experiments on three datasets demonstrate high predictive accuracy (81\% AUC), validating the effectiveness of these indices.
CLJan 12, 2024
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMsYi Zeng, Hongpeng Lin, Jingwen Zhang et al.
Most traditional AI safety research has approached AI models as machines and centered on algorithm-focused attacks developed by security experts. As large language models (LLMs) become increasingly common and competent, non-expert users can also impose risks during daily interactions. This paper introduces a new perspective to jailbreak LLMs as human-like communicators, to explore this overlooked intersection between everyday language interaction and AI safety. Specifically, we study how to persuade LLMs to jailbreak them. First, we propose a persuasion taxonomy derived from decades of social science research. Then, we apply the taxonomy to automatically generate interpretable persuasive adversarial prompts (PAP) to jailbreak LLMs. Results show that persuasion significantly increases the jailbreak performance across all risk categories: PAP consistently achieves an attack success rate of over $92\%$ on Llama 2-7b Chat, GPT-3.5, and GPT-4 in $10$ trials, surpassing recent algorithm-focused attacks. On the defense side, we explore various mechanisms against PAP and, found a significant gap in existing defenses, and advocate for more fundamental mitigation for highly interactive LLMs
AIMay 13
How to Interpret Agent BehaviorJie Gao, Kaiser Sun, Jen-tse Huang et al.
Autonomous agents such as Claude Code and Codex now operate for hours or even days. Understanding their runtime behavior has become critical for downstream tasks such as diagnosing inefficiencies, fixing bugs, and ensuring better oversight. A primary way to gain this understanding is analyzing the reasoning trajectories and execution traces these agents generate. Yet such data remains in unstructured natural-language form, making it difficult for humans to interpret at scale. We introduce ACT*ONOMY (a combination of Action and Taxonomy), a taxonomy for describing and analyzing agent behavior at runtime. ACT*ONOMY has two components: (1) the taxonomy itself, developed through Grounded Theory and structured as a three-level hierarchy of 10 actions, 46 subactions, and 120 leaf categories; and (2) an open repository that hosts the living taxonomy, provides an automated analysis pipeline that applies it to agent trajectories analysis, and defines an extension protocol for customization and growth. Our experiments show that ACTONOMY can compare behavioral profiles across agents and characterize a single agent's behavior across diverse trajectories, surfacing patterns indicative of failure modes. By providing a shared vocabulary, ACT*ONOMY helps researchers, agent designers, and end users interpret agent behavior more consistently, enabling better oversight and control.
LGOct 1, 2025Code
GEM: A Gym for Agentic LLMsZichen Liu, Anya Sims, Keyu Duan et al.
The training paradigm for large language models (LLMs) is moving from static datasets to experience-based learning, where agents acquire skills via interacting with complex environments. To facilitate this transition we introduce GEM (General Experience Maker), an open-source environment simulator designed for the age of LLMs. Analogous to OpenAI-Gym for traditional reinforcement learning (RL), GEM provides a standardized framework for the environment-agent interface, including asynchronous vectorized execution for high throughput, and flexible wrappers for easy extensibility. GEM also features a diverse suite of environments, robust integrated tools, and single-file example scripts demonstrating using GEM with five popular RL training frameworks. Along with this, we also provide a set of baselines across 24 environments using REINFORCE with Return Batch Normalization (ReBN), which -- unlike GRPO -- is compatible with the full RL setting of dense per-turn rewards and offers better credit assignment. We further conduct apple-to-apple benchmarking of PPO, GRPO and REINFORCE in both single- and multi-turn settings using GEM to shed light on the algorithmic designs. Lastly, GEM also functions as a convenient evaluation toolkit besides a training environment. We hope this framework can help accelerate future agentic LLM research.
CLDec 8, 2024Code
7B Fully Open Source Moxin-LLM/VLM -- From Pretraining to GRPO-based Reinforcement Learning EnhancementPu Zhao, Xuan Shen, Zhenglun Kong et al. · harvard
Recently, Large Language Models (LLMs) have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. Leading this evolution are proprietary LLMs like GPT-4 and GPT-o1, which have captured widespread attention in the AI community due to their remarkable performance and versatility. Simultaneously, open-source LLMs, such as LLaMA, have made great contributions to the ever-increasing popularity of LLMs due to the ease to customize and deploy the models across diverse applications. Although open-source LLMs present unprecedented opportunities for innovation and research, the commercialization of LLMs has raised concerns about transparency, reproducibility, and safety. Many open-source LLMs fail to meet fundamental transparency requirements by withholding essential components like training code and data, which may hinder further innovations on LLMs. To mitigate this issue, we introduce Moxin 7B, a fully open-source LLM developed, adhering to principles of open science, open source, open data, and open access. We release the pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints, aiming to make continuous commitments to fully open-source LLMs. After pre-training the base model, we finetune the Moxin Base model with SOTA post-training framework and instruction data to obtain Moxin Instruct model. To improve the reasoning capability, we further finetune our Instruct model with chain-of-thought data distilled from DeepSeek R1, and then use Group Relative Policy Optimization (GRPO) following DeepSeek R1 to finetune our model, leading to the Moxin Reasoning model. Moreover, we develop our vision language model based on our Moxin model. Experiments show that our models achieve superior performance in various evaluations such as zero-shot evaluation, few-shot evaluation, and CoT evaluation.
CLAug 30, 2021Code
Selective Differential Privacy for Language ModelingWeiyan Shi, Aiqi Cui, Evan Li et al.
With the increasing applications of language models, it has become crucial to protect these models from leaking private information. Previous work has attempted to tackle this challenge by training RNN-based language models with differential privacy guarantees. However, applying classical differential privacy to language models leads to poor model performance as the underlying privacy notion is over-pessimistic and provides undifferentiated protection for all tokens in the data. Given that the private information in natural language is sparse (for example, the bulk of an email might not carry personally identifiable information), we propose a new privacy notion, selective differential privacy, to provide rigorous privacy guarantees on the sensitive portion of the data to improve model utility. To realize such a new notion, we develop a corresponding privacy mechanism, Selective-DPSGD, for RNN-based language models. Besides language modeling, we also apply the method to a more concrete application--dialog systems. Experiments on both language modeling and dialog system building show that the proposed privacy-preserving mechanism achieves better utilities while remaining safe under various privacy attacks compared to the baselines. The data and code are released at https://github.com/wyshi/lm_privacy to facilitate future research .
AIMay 5, 2021Code
LEGOEval: An Open-Source Toolkit for Dialogue System Evaluation via CrowdsourcingYu Li, Josh Arnold, Feifan Yan et al.
We present LEGOEval, an open-source toolkit that enables researchers to easily evaluate dialogue systems in a few lines of code using the online crowdsource platform, Amazon Mechanical Turk. Compared to existing toolkits, LEGOEval features a flexible task design by providing a Python API that maps to commonly used React.js interface components. Researchers can personalize their evaluation procedures easily with our built-in pages as if playing with LEGO blocks. Thus, LEGOEval provides a fast, consistent method for reproducing human evaluation results. Besides the flexible task design, LEGOEval also offers an easy API to review collected data.
CLDec 14, 2023
The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive ConversationRongwu Xu, Brian S. Lin, Shujian Yang et al. · uw
Large language models (LLMs) encapsulate vast amounts of knowledge but still remain vulnerable to external misinformation. Existing research mainly studied this susceptibility behavior in a single-turn setting. However, belief can change during a multi-turn conversation, especially a persuasive one. Therefore, in this study, we delve into LLMs' susceptibility to persuasive conversations, particularly on factual questions that they can answer correctly. We first curate the Farm (i.e., Fact to Misinform) dataset, which contains factual questions paired with systematically generated persuasive misinformation. Then, we develop a testing framework to track LLMs' belief changes in a persuasive dialogue. Through extensive experiments, we find that LLMs' correct beliefs on factual knowledge can be easily manipulated by various persuasive strategies.
AIMar 7, 2024
A Safe Harbor for AI Evaluation and Red TeamingShayne Longpre, Sayash Kapoor, Kevin Klyman et al.
Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems. However, the terms of service and enforcement strategies used by prominent AI companies to deter model misuse have disincentives on good faith safety evaluations. This causes some researchers to fear that conducting such research or releasing their findings will result in account suspensions or legal reprisal. Although some companies offer researcher access programs, they are an inadequate substitute for independent research access, as they have limited community representation, receive inadequate funding, and lack independence from corporate incentives. We propose that major AI developers commit to providing a legal and technical safe harbor, indemnifying public interest safety research and protecting it from the threat of account suspensions or legal reprisal. These proposals emerged from our collective experience conducting safety, privacy, and trustworthiness research on generative AI systems, where norms and incentives could be better aligned with public interests, without exacerbating model misuse. We believe these commitments are a necessary step towards more inclusive and unimpeded community efforts to tackle the risks of generative AI.
HCDec 31, 2024
Proactive Conversational Agents with Inner ThoughtsXingyu Bruce Liu, Shitao Fang, Weiyan Shi et al.
One of the long-standing aspirations in conversational AI is to allow them to autonomously take initiatives in conversations, i.e., being proactive. This is especially challenging for multi-party conversations. Prior NLP research focused mainly on predicting the next speaker from contexts like preceding conversations. In this paper, we demonstrate the limitations of such methods and rethink what it means for AI to be proactive in multi-party, human-AI conversations. We propose that just like humans, rather than merely reacting to turn-taking cues, a proactive AI formulates its own inner thoughts during a conversation, and seeks the right moment to contribute. Through a formative study with 24 participants and inspiration from linguistics and cognitive psychology, we introduce the Inner Thoughts framework. Our framework equips AI with a continuous, covert train of thoughts in parallel to the overt communication process, which enables it to proactively engage by modeling its intrinsic motivation to express these thoughts. We instantiated this framework into two real-time systems: an AI playground web app and a chatbot. Through a technical evaluation and user studies with human participants, our framework significantly surpasses existing baselines on aspects like anthropomorphism, coherence, intelligence, and turn-taking appropriateness.
AIJun 30, 2025
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement LearningBo Liu, Leon Guertler, Simon Yu et al. · pku
Recent advances in reinforcement learning have shown that language models can develop sophisticated reasoning through training on tasks with verifiable rewards, but these approaches depend on human-curated problem-answer pairs and domain-specific reward engineering. We introduce SPIRAL, a self-play framework where models learn by playing multi-turn, zero-sum games against continuously improving versions of themselves, eliminating the need for human supervision. Through self-play, SPIRAL generates an infinite curriculum of progressively challenging problems as models must constantly adapt to stronger opponents. To enable this self-play training at scale, We implement a fully online, multi-turn, multi-agent reinforcement learning system for LLMs and propose role-conditioned advantage estimation (RAE) to stabilize multi-agent training. Using SPIRAL, self-play on zero-sum games produces reasoning capabilities that transfer broadly. Training Qwen3-4B-Base on Kuhn Poker alone achieves 8.6% improvement on math and 8.4% on general reasoning, outperforming SFT on 25,000 expert game trajectories. Analysis reveals that this transfer occurs through three cognitive patterns: systematic decomposition, expected value calculation, and case-by-case analysis. Multi-game training (TicTacToe, Kuhn Poker, Simple Negotiation) further enhances performance as each game develops distinct reasoning strengths. Applying SPIRAL to a strong reasoning model (DeepSeek-R1-Distill-Qwen-7B) can still lead to 2.0% average improvement. These results demonstrate that zero-sum games naturally develop transferable reasoning capabilities, highlighting a promising direction for autonomous reasoning development.
CLApr 3
Train Yourself as an LLM: Exploring Effects of AI Literacy on Persuasion via Role-playing LLM TrainingQihui Fan, Min Ge, Chenyan Jia et al.
As large language models (LLMs) become increasingly persuasive, there is concern that people's opinions and decisions may be influenced across various contexts at scale. Prior mitigation (e.g., AI detectors and disclaimers) largely treats people as passive recipients of AI-generated information. To provide a more proactive intervention against persuasive AI, we introduce $\textbf{LLMimic}$, a role-play-based, interactive, gamified AI literacy tutorial, where participants assume the role of an LLM and progress through three key stages of the training pipeline (pretraining, SFT, and RLHF). We conducted a $2 \times 3$ between-subjects study ($N = 274$) where participants either (1) watched an AI history video (control) or (2) interacted with LLMimic (treatment), and then engaged in one of three realistic AI persuasion scenarios: (a) charity donation persuasion, (b) malicious money solicitation, or (c) hotel recommendation. Our results show that LLMimic significantly improved participants' AI literacy ($p < .001$), reduced persuasion success across scenarios ($p < .05$), and enhanced truthfulness and social responsibility levels ($p<0.01$) in the hotel scenario. These findings suggest that LLMimic offers a scalable, human-centered approach to improving AI literacy and supporting more informed interactions with persuasive AI.
CYFeb 28, 2024
Social Intelligence Data Infrastructure: Structuring the Present and Navigating the FutureMinzhi Li, Weiyan Shi, Caleb Ziems et al.
As Natural Language Processing (NLP) systems become increasingly integrated into human social life, these technologies will need to increasingly rely on social intelligence. Although there are many valuable datasets that benchmark isolated dimensions of social intelligence, there does not yet exist any body of work to join these threads into a cohesive subfield in which researchers can quickly identify research gaps and future directions. Towards this goal, we build a Social AI Data Infrastructure, which consists of a comprehensive social AI taxonomy and a data library of 480 NLP datasets. Our infrastructure allows us to analyze existing dataset efforts, and also evaluate language models' performance in different social intelligence aspects. Our analyses demonstrate its utility in enabling a thorough understanding of current data landscape and providing a holistic perspective on potential directions for future dataset development. We show there is a need for multifaceted datasets, increased diversity in language and culture, more long-tailed social situations, and more interactive data in future social intelligence data efforts.
CLJul 16, 2025
LLMs Encode Harmfulness and Refusal SeparatelyJiachen Zhao, Jing Huang, Zhengxuan Wu et al. · stanford
LLMs are trained to refuse harmful instructions, but do they truly understand harmfulness beyond just refusing? Prior work has shown that LLMs' refusal behaviors can be mediated by a one-dimensional subspace, i.e., a refusal direction. In this work, we identify a new dimension to analyze safety mechanisms in LLMs, i.e., harmfulness, which is encoded internally as a separate concept from refusal. There exists a harmfulness direction that is distinct from the refusal direction. As causal evidence, steering along the harmfulness direction can lead LLMs to interpret harmless instructions as harmful, but steering along the refusal direction tends to elicit refusal responses directly without reversing the model's judgment on harmfulness. Furthermore, using our identified harmfulness concept, we find that certain jailbreak methods work by reducing the refusal signals without reversing the model's internal belief of harmfulness. We also find that adversarially finetuning models to accept harmful instructions has minimal impact on the model's internal belief of harmfulness. These insights lead to a practical safety application: The model's latent harmfulness representation can serve as an intrinsic safeguard (Latent Guard) for detecting unsafe inputs and reducing over-refusals that is robust to finetuning attacks. For instance, our Latent Guard achieves performance comparable to or better than Llama Guard 3 8B, a dedicated finetuned safeguard model, across different jailbreak methods. Our findings suggest that LLMs' internal understanding of harmfulness is more robust than their refusal decision to diverse input instructions, offering a new perspective to study AI safety.
CYApr 14, 2025
Labeling Messages as AI-Generated Does Not Reduce Their Persuasive EffectsIsabel O. Gallegos, Chen Shani, Weiyan Shi et al.
As generative artificial intelligence (AI) enables the creation and dissemination of information at massive scale and speed, it is increasingly important to understand how people perceive AI-generated content. One prominent policy proposal requires explicitly labeling AI-generated content to increase transparency and encourage critical thinking about the information, but prior research has not yet tested the effects of such labels. To address this gap, we conducted a survey experiment (N=1601) on a diverse sample of Americans, presenting participants with an AI-generated message about several public policies (e.g., allowing colleges to pay student-athletes), randomly assigning whether participants were told the message was generated by (a) an expert AI model, (b) a human policy expert, or (c) no label. We found that messages were generally persuasive, influencing participants' views of the policies by 9.74 percentage points on average. However, while 94.6% of participants assigned to the AI and human label conditions believed the authorship labels, labels had no significant effects on participants' attitude change toward the policies, judgments of message accuracy, nor intentions to share the message with others. These patterns were robust across a variety of participant characteristics, including prior knowledge of the policy, prior experience with AI, political party, education level, or age. Taken together, these results imply that, while authorship labels would likely enhance transparency, they are unlikely to substantially affect the persuasiveness of the labeled content, highlighting the need for alternative strategies to address challenges posed by AI-generated information.
CLNov 21, 2024
NewsInterview: a Dataset and a Playground to Evaluate LLMs' Ground Gap via Informational InterviewsMichael Lu, Hyundong Justin Cho, Weiyan Shi et al.
Large Language Models (LLMs) have demonstrated impressive capabilities in generating coherent text but often struggle with grounding language and strategic dialogue. To address this gap, we focus on journalistic interviews, a domain rich in grounding communication and abundant in data. We curate a dataset of 40,000 two-person informational interviews from NPR and CNN, and reveal that LLMs are significantly less likely than human interviewers to use acknowledgements and to pivot to higher-level questions. Realizing that a fundamental deficit exists in multi-turn planning and strategic thinking, we develop a realistic simulated environment, incorporating source personas and persuasive elements, in order to facilitate the development of agents with longer-horizon rewards. Our experiments show that while source LLMs mimic human behavior in information sharing, interviewer LLMs struggle with recognizing when questions are answered and engaging persuasively, leading to suboptimal information extraction across model size and capability. These findings underscore the need for enhancing LLMs' strategic dialogue capabilities.
CLOct 1, 2025
Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM DiversityJiayi Zhang, Simon Yu, Derek Chong et al.
Post-training alignment often reduces LLM diversity, leading to a phenomenon known as mode collapse. Unlike prior work that attributes this effect to algorithmic limitations, we identify a fundamental, pervasive data-level driver: typicality bias in preference data, whereby annotators systematically favor familiar text as a result of well-established findings in cognitive psychology. We formalize this bias theoretically, verify it on preference datasets empirically, and show that it plays a central role in mode collapse. Motivated by this analysis, we introduce Verbalized Sampling, a simple, training-free prompting strategy to circumvent mode collapse. VS prompts the model to verbalize a probability distribution over a set of responses (e.g., "Generate 5 jokes about coffee and their corresponding probabilities"). Comprehensive experiments show that VS significantly improves performance across creative writing (poems, stories, jokes), dialogue simulation, open-ended QA, and synthetic data generation, without sacrificing factual accuracy and safety. For instance, in creative writing, VS increases diversity by 1.6-2.1x over direct prompting. We further observe an emergent trend that more capable models benefit more from VS. In sum, our work provides a new data-centric perspective on mode collapse and a practical inference-time remedy that helps unlock pre-trained generative diversity.
HCJun 8, 2025
Sword and Shield: Uses and Strategies of LLMs in Navigating DisinformationGionnieve Lim, Bryan Chen Zhengyu Tan, Kellie Yu Hui Sim et al.
The emergence of Large Language Models (LLMs) presents a dual challenge in the fight against disinformation. These powerful tools, capable of generating human-like text at scale, can be weaponised to produce sophisticated and persuasive disinformation, yet they also hold promise for enhancing detection and mitigation strategies. This paper investigates the complex dynamics between LLMs and disinformation through a communication game that simulates online forums, inspired by the game Werewolf, with 25 participants. We analyse how Disinformers, Moderators, and Users leverage LLMs to advance their goals, revealing both the potential for misuse and combating disinformation. Our findings highlight the varying uses of LLMs depending on the participants' roles and strategies, underscoring the importance of understanding their effectiveness in this context. We conclude by discussing implications for future LLM development and online platform design, advocating for a balanced approach that empowers users and fosters trust while mitigating the risks of LLM-assisted disinformation.
CVNov 30, 2025
SocialFusion: Addressing Social Degradation in Pre-trained Vision-Language ModelsHamza Tahboub, Weiyan Shi, Gang Hua et al.
Understanding social interactions from visual cues is a fundamental challenge for a socially competent AI. While powerful pre-trained vision-language models (VLMs) have shown remarkable general capabilities, they surprisingly struggle to unify and learn multiple social perception tasks simultaneously, often exhibiting negative transfer. We identify that this negative transfer stems from a critical issue we term "social degradation," whereby the general visual-linguistic pre-training process of VLMs impairs the visual encoder's ability to represent nuanced social information. We investigate this behavior further under two lenses: decodability through linear representation probing and compatibility through gradient conflict analysis, revealing that both play a role in the degradation, especially the former, which is significantly compromised in the VLM pre-training process. To address these issues, we propose SocialFusion, a unified framework that learns a minimal connection between a frozen visual encoder and a language model. Compared with existing VLMs, it exhibits positive transfer across all five social tasks, leveraging synergies between them to enhance overall performance and achieves comparable performance to task-specific state-of-the-art models on various benchmarks. Our findings suggest that current VLM pre-training strategies may be detrimental to acquiring general social competence and highlight the need for more socially-aware training paradigms.
HCMar 7
More Than 1v1: Human-AI Alignment in Early Developmental Communities with Multimodal LLMsWeiyan Shi, Kenny Tsu Wei Choo
In early developmental contexts, particularly in parent-child interaction analysis, alignment involves families and professionals such as speech-language pathologists (SLPs) who interpret children's everyday interactions from different roles. When multimodal large language models (MLLMs) are introduced to support this process, alignment becomes a question of how authority, responsibility, and emotional risk are distributed across stakeholders. Through a three-part study with five families and three SLPs, we trace how MLLM-generated outputs move from expert-facing analysis to parent-facing feedback. We propose layered community alignment: grounding representations in expert-aligned structures, mediating translation through professional guardrails, and enabling family-level adaptation within those boundaries. We argue that alignment in developmental settings should be treated as a community-governed process rather than an individual optimisation problem.
LGOct 28, 2025
Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps in Chain-of-ThoughtJiachen Zhao, Yiyou Sun, Weiyan Shi et al.
Recent large language models (LLMs) can generate long Chain-of-Thought (CoT) at test time, enabling them to solve complex tasks. These reasoning steps in CoT are often assumed as a faithful reflection of the model's internal thinking process, and used to monitor unsafe intentions. However, we find many reasoning steps don't truly contribute to LLMs' prediction. We measure the step-wise causal influence of each reasoning step on the model's final prediction with a proposed True Thinking Score (TTS). We reveal that LLMs often interleave between true-thinking steps (which are genuinely used to produce the final output) and decorative-thinking steps (which only give the appearance of reasoning but have minimal causal impact). Notably, only a small subset of the total reasoning steps have a high TTS that causally drive the model's prediction: e.g., for the AIME dataset, only an average of 2.3% of reasoning steps in CoT have a TTS >= 0.7 (range: 0-1) under the Qwen-2.5 model. Furthermore, we identify a TrueThinking direction in the latent space of LLMs. By steering along or against this direction, we can force the model to perform or disregard certain CoT steps when computing the final result. Finally, we highlight that self-verification steps in CoT (i.e., aha moments) can also be decorative, where LLMs do not truly verify their solution. Steering along the TrueThinking direction can force internal reasoning over these steps, resulting in a change in the final results. Overall, our work reveals that LLMs often verbalize reasoning steps without actually performing them internally, which undermines both the efficiency of LLM reasoning and the trustworthiness of CoT.
CLOct 17, 2025
PolySkill: Learning Generalizable Skills Through Polymorphic AbstractionSimon Yu, Gang Li, Weiyan Shi et al.
Large language models (LLMs) are moving beyond static uses and are now powering agents that learn continually during their interaction with external environments. For example, agents can learn reusable skills while navigating web pages or toggling new tools. However, existing methods for skill learning often create skills that are over-specialized to a single website and fail to generalize. We introduce PolySkill, a new framework that enables agents to learn generalizable and compositional skills. The core idea, inspired by polymorphism in software engineering, is to decouple a skill's abstract goal (what it accomplishes) and its concrete implementation (how it is executed). Experiments show that our method (1) improves skill reuse by 1.7x on seen websites and (2) boosts success rates by up to 9.4% on Mind2Web and 13.9% on unseen websites, while reducing steps by over 20%. (3) In self-exploration settings without specified tasks, our framework improves the quality of proposed tasks and enables agents to learn generalizable skills that work across different sites. By enabling the agent to identify and refine its own goals, the PolySkill enhances the agent's ability to learn a better curriculum, leading to the acquisition of more generalizable skills compared to baseline methods. This work provides a practical path toward building agents capable of continual learning in adaptive environments. Our findings show that separating a skill's goal from its execution is a crucial step toward developing autonomous agents that can learn and generalize across the open web continuously.
CLMay 6, 2023
Controllable Mixed-Initiative Dialogue Generation through PromptingMaximillian Chen, Xiao Yu, Weiyan Shi et al.
Mixed-initiative dialogue tasks involve repeated exchanges of information and conversational control. Conversational agents gain control by generating responses that follow particular dialogue intents or strategies, prescribed by a policy planner. The standard approach has been fine-tuning pre-trained language models to perform generation conditioned on these intents. However, these supervised generation models are limited by the cost and quality of data annotation. We instead prompt large language models as a drop-in replacement to fine-tuning on conditional generation. We formalize prompt construction for controllable mixed-initiative dialogue. Our findings show improvements over fine-tuning and ground truth responses according to human evaluation and automatic metrics for two tasks: PersuasionForGood and Emotional Support Conversations.
CLJun 3, 2021
Dialoging Resonance: How Users Perceive, Reciprocate and React to Chatbot's Self-Disclosure in Conversational RecommendationsKai-Hui Liang, Weiyan Shi, Yoojung Oh et al.
Using chatbots to deliver recommendations is increasingly popular. The design of recommendation chatbots has primarily been taking an information-centric approach by focusing on the recommended content per se. Limited attention is on how social connection and relational strategies, such as self-disclosure from a chatbot, may influence users' perception and acceptance of the recommendation. In this work, we designed, implemented, and evaluated a social chatbot capable of performing three different levels of self-disclosure: factual information (low), cognitive opinions (medium), and emotions (high). In the evaluation, we recruited 372 participants to converse with the chatbot on two topics: movies and COVID-19 experiences. In each topic, the chatbot performed small talks and made recommendations relevant to the topic. Participants were randomly assigned to four experimental conditions where the chatbot used factual, cognitive, emotional, and adaptive strategies to perform self-disclosures. By training a text classifier to identify users' level of self-disclosure in real-time, the adaptive chatbot can dynamically match its self-disclosure to the level of disclosure exhibited by the users. Our results show that users reciprocate with higher-level self-disclosure when a recommendation chatbot consistently displays emotions throughout the conversation. Chatbot's emotional disclosure also led to increased interactional enjoyment and more positive interpersonal perception towards the bot, fostering a stronger human-chatbot relationship and thus leading to increased recommendation effectiveness, including a higher tendency to accept the recommendation. We discuss the understandings obtained and implications to future design.
CLApr 16, 2021
DEUX: An Attribute-Guided Framework for Sociable Recommendation Dialog SystemsYu Li, Shirley Anugrah Hayati, Weiyan Shi et al.
It is important for sociable recommendation dialog systems to perform as both on-task content and social content to engage users and gain their favor. In addition to understand the user preferences and provide a satisfying recommendation, such systems must be able to generate coherent and natural social conversations to the user. Traditional dialog state tracking cannot be applied to such systems because it does not track the attributes in the social content. To address this challenge, we propose DEUX, a novel attribute-guided framework to create better user experiences while accomplishing a movie recommendation task. DEUX has a module that keeps track of the movie attributes (e.g., favorite genres, actors,etc.) in both user utterances and system responses. This allows the system to introduce new movie attributes in its social content. Then, DEUX has multiple values for the same attribute type which suits the recommendation task since a user may like multiple genres, for instance. Experiments suggest that DEUX outperforms all the baselines on being more consistent, fitting the user preferences better, and providing a more engaging chat experience. Our approach can be used for any similar problems of sociable task-oriented dialog system.
CLMar 12, 2021
Towards Socially Intelligent Agents with Mental State Transition and Human UtilityLiang Qiu, Yizhou Zhao, Yuan Liang et al.
Building a socially intelligent agent involves many challenges. One of which is to track the agent's mental state transition and teach the agent to make decisions guided by its value like a human. Towards this end, we propose to incorporate mental state simulation and value modeling into dialogue agents. First, we build a hybrid mental state parser that extracts information from both the dialogue and event observations and maintains a graphical representation of the agent's mind; Meanwhile, the transformer-based value model learns human preferences from the human value dataset, ValueNet. Empirical results show that the proposed model attains state-of-the-art performance on the dialogue/action/emotion prediction task in the fantasy text-adventure game dataset, LIGHT. We also show example cases to demonstrate: (i) how the proposed mental state parser can assist the agent's decision by grounding on the context like locations and objects, and (ii) how the value model can help the agent make decisions based on its personal priorities.
CLDec 31, 2020
Refine and Imitate: Reducing Repetition and Inconsistency in Persuasion Dialogues via Reinforcement Learning and Human DemonstrationWeiyan Shi, Yu Li, Saurav Sahay et al.
Persuasion dialogue systems reflect the machine's ability to make strategic moves beyond verbal communication, and therefore differentiate themselves from task-oriented or open-domain dialogue systems and have their own unique values. However, the repetition and inconsistency problems still persist in dialogue response generation and could substantially impact user experience and impede the persuasion outcome. Besides, although reinforcement learning (RL) approaches have achieved big success in strategic tasks such as games, they require a sophisticated user simulator to provide real-time feedback to the dialogue system, which limits the application of RL on persuasion dialogues. To address these issues towards a better persuasion dialogue system, we apply RL to refine a language model baseline without user simulators, and distill sentence-level information about repetition, inconsistency, and task relevance through rewards. Moreover, to better accomplish the persuasion task, the model learns from human demonstration to imitate human persuasion behavior and selects the most persuasive responses. Experiments show that our model outperforms previous state-of-the-art dialogue models on both automatic metrics and human evaluation results on a donation persuasion task, and generates more diverse, consistent and persuasive conversations according to the user feedback.
CLSep 29, 2020
INSPIRED: Toward Sociable Recommendation Dialog SystemsShirley Anugrah Hayati, Dongyeop Kang, Qingxiaoyang Zhu et al.
In recommendation dialogs, humans commonly disclose their preference and make recommendations in a friendly manner. However, this is a challenge when developing a sociable recommendation dialog system, due to the lack of dialog dataset annotated with such sociable strategies. Therefore, we present INSPIRED, a new dataset of 1,001 human-human dialogs for movie recommendation with measures for successful recommendations. To better understand how humans make recommendations in communication, we design an annotation scheme related to recommendation strategies based on social science theories and annotate these dialogs. Our analysis shows that sociable recommendation strategies, such as sharing personal opinions or communicating with encouragement, more frequently lead to successful recommendations. Based on our dataset, we train end-to-end recommendation dialog systems with and without our strategy labels. In both automatic and human evaluation, our model with strategy incorporation outperforms the baseline model. This work is a first step for building sociable recommendation dialog systems with a basis of social science theories.
CLSep 17, 2020
Structured Attention for Unsupervised Dialogue Structure InductionLiang Qiu, Yizhou Zhao, Weiyan Shi et al.
Inducing a meaningful structural representation from one or a set of dialogues is a crucial but challenging task in computational linguistics. Advancement made in this area is critical for dialogue system design and discourse analysis. It can also be extended to solve grammatical inference. In this work, we propose to incorporate structured attention layers into a Variational Recurrent Neural Network (VRNN) model with discrete latent states to learn dialogue structure in an unsupervised fashion. Compared to a vanilla VRNN, structured attention enables a model to focus on different parts of the source sentence embeddings while enforcing a structural inductive bias. Experiments show that on two-party dialogue datasets, VRNN with structured attention learns semantic structures that are similar to templates used to generate this dialogue corpus. While on multi-party dialogue datasets, our model learns an interactive structure demonstrating its capability of distinguishing speakers or addresses, automatically disentangling dialogues without explicit human annotation.
CLApr 24, 2020
A Tailored Pre-Training Model for Task-Oriented Dialog GenerationJing Gu, Qingyang Wu, Chongruo Wu et al.
The recent success of large pre-trained language models such as BERT and GPT-2 has suggested the effectiveness of incorporating language priors in downstream dialog generation tasks. However, the performance of pre-trained models on the dialog task is not as optimal as expected. In this paper, we propose a Pre-trained Role Alternating Language model (PRAL), designed specifically for task-oriented conversational systems. We adopted (Wu et al., 2019) that models two speakers separately. We also design several techniques, such as start position randomization, knowledge distillation, and history discount to improve pre-training performance. We introduce a task-oriented dialog pretraining dataset by cleaning 13 existing data sets. We test PRAL on three different downstream tasks. The results show that PRAL performs better or on par with state-of-the-art methods.
HCJan 13, 2020
Effects of Persuasive Dialogues: Testing Bot Identities and Inquiry StrategiesWeiyan Shi, Xuewei Wang, Yoo Jung Oh et al.
Intelligent conversational agents, or chatbots, can take on various identities and are increasingly engaging in more human-centered conversations with persuasive goals. However, little is known about how identities and inquiry strategies influence the conversation's effectiveness. We conducted an online study involving 790 participants to be persuaded by a chatbot for charity donation. We designed a two by four factorial experiment (two chatbot identities and four inquiry strategies) where participants were randomly assigned to different conditions. Findings showed that the perceived identity of the chatbot had significant effects on the persuasion outcome (i.e., donation) and interpersonal perceptions (i.e., competence, confidence, warmth, and sincerity). Further, we identified interaction effects among perceived identities and inquiry strategies. We discuss the findings for theoretical and practical implications for developing ethical and effective persuasive chatbots. Our published data, codes, and analyses serve as the first step towards building competent ethical persuasive chatbots.
CLNov 25, 2019
End-to-End Trainable Non-Collaborative Dialog SystemYu Li, Kun Qian, Weiyan Shi et al.
End-to-end task-oriented dialog models have achieved promising performance on collaborative tasks where users willingly coordinate with the system to complete a given task. While in non-collaborative settings, for example, negotiation and persuasion, users and systems do not share a common goal. As a result, compared to collaborate tasks, people use social content to build rapport and trust in these non-collaborative settings in order to advance their goals. To handle social content, we introduce a hierarchical intent annotation scheme, which can be generalized to different non-collaborative dialog tasks. Building upon TransferTransfo (Wolf et al. 2019), we propose an end-to-end neural network model to generate diverse coherent responses. Our model utilizes intent and semantic slots as the intermediate sentence representation to guide the generation process. In addition, we design a filter to select appropriate responses based on whether these intermediate representations fit the designed task and conversation constraints. Our non-collaborative dialog model guides users to complete the task while simultaneously keeps them engaged. We test our approach on our newly proposed ANTISCAM dataset and an existing PERSUASIONFORGOOD dataset. Both automatic and human evaluations suggest that our model outperforms multiple baselines in these two non-collaborative tasks.
CLSep 3, 2019
How to Build User Simulators to Train RL-based Dialog SystemsWeiyan Shi, Kun Qian, Xuewei Wang et al.
User simulators are essential for training reinforcement learning (RL) based dialog models. The performance of the simulator directly impacts the RL policy. However, building a good user simulator that models real user behaviors is challenging. We propose a method of standardizing user simulator building that can be used by the community to compare dialog system quality using the same set of user simulators fairly. We present implementations of six user simulators trained with different dialog planning and generation methods. We then calculate a set of automatic metrics to evaluate the quality of these simulators both directly and indirectly. We also ask human users to assess the simulators directly and indirectly by rating the simulated dialogs and interacting with the trained systems. This paper presents a comprehensive evaluation framework for user simulator study and provides a better understanding of the pros and cons of different user simulators, as well as their impacts on the trained systems.
CLJun 16, 2019
Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social GoodXuewei Wang, Weiyan Shi, Richard Kim et al.
Developing intelligent persuasive conversational agents to change people's opinions and actions for social good is the frontier in advancing the ethical development of automated dialogue systems. To do so, the first step is to understand the intricate organization of strategic disclosures and appeals employed in human persuasion conversations. We designed an online persuasion task where one participant was asked to persuade the other to donate to a specific charity. We collected a large dataset with 1,017 dialogues and annotated emerging persuasion strategies from a subset. Based on the annotation, we built a baseline classifier with context information and sentence-level features to predict the 10 persuasion strategies used in the corpus. Furthermore, to develop an understanding of personalized persuasion processes, we analyzed the relationships between individuals' demographic and psychological backgrounds including personality, morality, value systems, and their willingness for donation. Then, we analyzed which types of persuasion strategies led to a greater amount of donation depending on the individuals' personal backgrounds. This work lays the ground for developing a personalized persuasive dialogue system.
CLApr 7, 2019
Unsupervised Dialog Structure LearningWeiyan Shi, Tiancheng Zhao, Zhou Yu
Learning a shared dialog structure from a set of task-oriented dialogs is an important challenge in computational linguistics. The learned dialog structure can shed light on how to analyze human dialogs, and more importantly contribute to the design and evaluation of dialog systems. We propose to extract dialog structures using a modified VRNN model with discrete latent vectors. Different from existing HMM-based models, our model is based on variational-autoencoder (VAE). Such model is able to capture more dynamics in dialogs beyond the surface forms of the language. We find that qualitatively, our method extracts meaningful dialog structure, and quantitatively, outperforms previous models on the ability to predict unseen data. We further evaluate the model's effectiveness in a downstream task, the dialog system building task. Experiments show that, by integrating the learned dialog structure into the reward function design, the model converges faster and to a better outcome in a reinforcement learning setting.