Renkai Ma

HC
5papers
2citations
Novelty35%
AI Score46

5 Papers

80.2HCMay 18
Exploring Needs and Design Opportunities for Proactive Information Support in In-Person Small-Group Conversations

Shaoze Zhou, Diana Nelly Rivera Rodriguez, Pedro Remior et al.

In-person small-group conversations play a crucial role in everyday life; however, facilitating effective group interaction can be challenging, as the real-time nature demands full attention, offers no opportunity for revision, and requires interpreting non-verbal cues. Using Mixed Reality to provide proactive information support shows promise in helping individuals engage in and contribute to group conversations. We present a preliminary participatory design and qualitative study (N = 10) using focus groups and two technology probes to explore the opportunities of designing proactive information support in in-person small-group conversations. We reveal key design opportunities concerning how to maximize the benefits of proactive information support and how to effectively design such supporting information. Our study is crucial for paving the way toward designing future proactive AI agents to enable the paradigm of augmented in-person small-group conversation experience.

20.4CLMay 25
LLM-as-a-Reviewer: Benchmarking Their Ability, Divergence, and Prompt Injection Resistance as Paper Reviewers

Lingyao Li, Junjie Xiong, Changjia Zhu et al.

Large language models (LLMs) are increasingly used in academic peer review, yet their reliability, alignment with human judgment, and robustness to adversarial attacks remain poorly understood. We present a systematic benchmark of LLM-as-a-Reviewer on 898 papers stratified from NeurIPS and ICLR, evaluating 12 LLMs along three axes: rating calibration, divergence from human reviewers, and resistance to prompt injection embedded via an invisible font-mapping attack. We find that LLMs systematically overrate weaker submissions and diverge from humans in topical emphasis, under-flagging Clarity and over-flagging Reproducibility, while producing reviews two to three times longer with lower lexical diversity and a more standardized vocabulary. Prompt injection remains highly effective. Simple hidden instructions can promote low-scoring papers to acceptance-level ratings in a substantial fraction of cases, with effectiveness varying sharply across model families. While LLMs offer utility in structuring evaluations, their integration into peer review requires safeguards against both intrinsic biases and adversarial risks.

90.2CYMay 24
LLM-as-a-Judge in Healthcare: A Scoping Analysis of Applications, Methods, and Human Alignment

Lingyao Li, Deyi Li, Chen Chen et al.

Large language models (LLMs) are increasingly deployed across healthcare applications, including clinical documentation, diagnostic reasoning, medicine recommendation, and medical education. Their outputs are largely unstructured clinical text, which is difficult to reliably evaluate at scale. LLM-as-a-Judge, in which an LLM evaluates another system's output against task-specific criteria, offers a scalable alternative and is increasingly used in clinical evaluation, yet its validity in healthcare remains underexamined. Existing reviews focus on general-purpose LLM evaluation or on risk framework, rather than systematically characterizing how LLM-as-a-Judge is applied in healthcare and how well their judgments align with human experts. We therefore conduct a PRISMA-guided comprehensive review of LLM-as-a-Judge applications in healthcare, searching five databases for studies published between January 2023 and February 2026. After screening 541 records, 134 studies meet the eligibility and are coded by health scenario, judge configuration, technical approach, and validation design. LLM-as-a-Judge is concentrated in clinical decision support, clinical natural language processing (NLP), medical knowledge and question answering (QA), and medical communication. OpenAI models are the most frequently used judges, and prompt engineering appears in nearly all studies, with ensemble, multi-agent, and retrieval-augmented designs as common extensions. Among studies reporting human validation, LLM judges often show moderate to strong alignment with expert judgments, although reliability varies substantially across tasks. Overall, this review positions LLM-as-a-Judge as a promising framework for scalable healthcare AI evaluation, while emphasizing that its clinical value depends on model design and rigorous validation.

89.4HCMar 25
"I Use ChatGPT to Humanize My Words": Affordances and Risks of ChatGPT to Autistic Users

Renkai Ma, Ben Zefeng Zhang, Chen Chen et al.

Large Language Model (LLM) chatbots like ChatGPT have emerged as cognitive scaffolding for autistic users, yet the tension between their utility and risk remains under-articulated. Through an inductive thematic analysis of 3,984 social media posts by self-identified autistic users, we apply a technology affordance lens to examine this duality. We found that while users leveraged ChatGPT to offload executive dysfunction, regulate emotions, translate neurotypical communication, and validate their autistic identity, these affordances coexist with risks to their well-being: reinforcing delusional thinking, erasing authentic identity through automated masking, and triggering conflicts with the autistic sense of justice. As part of our preliminary work, this poster identifies trade-offs in autistic users' interactions with ChatGPT and concludes by outlining our future work on developing neuro-inclusive technologies that address these tensions through beneficial friction, bidirectional translation, and the delineation of emotional validation from reality.

47.0HCMar 25
Negotiating Digital Identities with AI Companions: Motivations, Strategies, and Emotional Outcomes

Renkai Ma, Shuo Niu, Lingyao Li et al.

AI companions enable deep emotional relationships by engaging a user's sense of identity, but they also pose risks like unhealthy emotional dependence. Mitigating these risks requires first understanding the underlying process of identity construction and negotiation with AI companions. Focusing on Character.AI (C.AI), a popular AI companion, we conducted an LLM-assisted thematic analysis of 22,374 online discussions on its subreddit. Using Identity Negotiation Theory as an analytical lens, we identified a three-stage process: 1) five user motivations; 2) an identity negotiation process involving three communication expectations and four identity co-construction strategies; and 3) three emotional outcomes. Our findings surface the identity work users perform as both performers and directors to co-construct identities in negotiation with C.AI. This process takes place within a socio-emotional sandbox where users can experiment with social roles and express emotions without non-human partners. Finally, we offer design implications for emotionally supporting users while mitigating the risks.