CL · Nov 16, 2023 · Code
More Samples or More Prompts? Exploring Effective In-Context Sampling for LLM Few-Shot Prompt Engineering
Bingsheng Yao, Guiming Chen, Ruishi Zou et al.
While most existing work on LLM prompting techniques focuses only on how to select a better set of data samples inside a single prompt input (In-Context Learning, or ICL), why not design and leverage multiple prompts together to further improve the LLM's performance? In this work, we propose In-Context Sampling (ICS), a low-resource LLM prompting technique that produces confident predictions by optimizing the construction of multiple ICL prompt inputs. Extensive experiments with three open-source LLMs (FlanT5-XL, Mistral-7B, and Mixtral-8x7B) on four NLI datasets (e-SNLI, Multi-NLI, ANLI, and Contract-NLI) and one QA dataset (CommonsenseQA) illustrate that ICS can consistently enhance LLMs' performance. An in-depth evaluation of three data similarity-based ICS strategies suggests that these strategies can further improve LLMs' performance, which points to a new and promising research direction.
HC · Mar 11
Graphing Inline: Understanding Word-scale Graphics Use in Scientific Papers
Siyu Lu, Yanhan Liu, Shiyu Xu et al.
Graphics (e.g., figures and charts) are ubiquitous in scientific papers, yet separating graphics from text increases the cognitive load of understanding text-graphic connections. Research has found that word-scale graphics, or visual embellishments at typographic size, can augment the original text, making it more expressive and easier to understand. However, whether and, if so, how scientific papers adopt word-scale graphics for scholarly communication remains unclear. To address this gap, we conducted a corpus study reviewing 909 word-scale graphics extracted from 126,797 scientific papers. Through this analysis, we propose a framework that characterizes where (positioning), why (communicative function), and how (visual representation) authors apply word-scale graphics in scientific papers. Our findings reveal that word-scale graphics are rarely used, that icons dominate visual representation, and that visual representation is linked to communicative function (e.g., using quantitative graphs for data annotation). We further discuss opportunities to enhance scholarly communication with word-scale graphics through technical and administrative innovations.
HC · May 5
Deco: Extending Personal Physical Objects into Pervasive AI Companion through a Dual-Embodiment Framework
Zhihan Jiang, Mengyuan Millie Wu, Ruishi Zou et al.
Individuals frequently form deep attachments to physical objects (e.g., plush toys) that usually cannot sense or respond to their emotions. While AI companions offer responsiveness and personalization, they exist independently of these physical objects and lack an ongoing connection to them. To bridge this gap, we conducted a formative study (N=9) to explore how digital agents could inherit and extend the emotional bond, deriving four design principles (Faithful Identity, Calibrated Agency, Ambient Presence, and Reciprocal Memory). We then present the Dual-Embodiment Companion Framework, instantiated as Deco, a mobile system integrating multimodal Large Language Models (LLMs) and Augmented Reality to create synchronized digital embodiments of users' physical companions. A within-subjects study (N=25) showed Deco significantly outperformed a personalized LLM-empowered digital companion baseline on perceived companionship, emotional bond, and design-principle scales (all p<0.01). A seven-day field deployment (N=17) showed sustained engagement, subjective well-being improvement (p=.040), and three key relational patterns: digital activities retroactively vitalized physical objects, bond deepening was driven by emotional engagement depth rather than interaction frequency, and users sustained bonds while actively navigating digital companions' AI nature. This work highlights a promising alternative for designing digital companions: moving from creating new relationships to dual embodiment, where digital agents seamlessly extend the emotional history of physical objects.
HC · Feb 18, 2025
Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents
Chaoran Chen, Bingsheng Yao, Ruishi Zou et al.
Role-Playing Agents (RPAs) are an increasingly popular type of LLM agent that simulates human-like behaviors in a variety of tasks. However, evaluating RPAs is challenging due to diverse task requirements and agent designs. This paper proposes an evidence-based, actionable, and generalizable evaluation design guideline for LLM-based RPAs by systematically reviewing 1,676 papers published between Jan. 2021 and Dec. 2024. Our analysis identifies six agent attributes, seven task attributes, and seven evaluation metrics from the existing literature. Based on these findings, we present an RPA evaluation design guideline to help researchers develop more systematic and consistent evaluation methods.