Mingming Fan

HC
h-index: 18
Papers: 18
Citations: 424
Novelty: 39%
AI Score: 52

18 Papers

HC · Mar 7, 2023
Collaboration with Conversational AI Assistants for UX Evaluation: Questions and How to Ask them (Voice vs. Text)

Emily Kuang, Ehsan Jahangirzadeh Soure, Mingming Fan et al.

AI is promising in assisting UX evaluators with analyzing usability tests, but its judgments are typically presented as non-interactive visualizations. Evaluators may have questions about test recordings, but have no way of asking them. Interactive conversational assistants provide a Q&A dynamic that may improve analysis efficiency and evaluator autonomy. To understand the full range of analysis-related questions, we conducted a Wizard-of-Oz design probe study with 20 participants who interacted with simulated AI assistants via text or voice. We found that participants asked for five categories of information: user actions, user mental model, help from the AI assistant, product and task information, and user demographics. Those who used the text assistant asked more questions, but the question lengths were similar. The text assistant was perceived as significantly more efficient, but both were rated equally in satisfaction and trust. We also provide design considerations for future conversational AI assistants for UX evaluation.

HC · Mar 14
"It Became My Buddy, But I'm Not Afraid to Disagree": A Multi-Session Study of UX Evaluators Collaborating with Conversational AI Assistants

Emily Kuang, Ehsan Jahangirzadeh Soure, Luyao Shen et al.

AI-assisted usability analysis can potentially reduce the time and effort of finding usability problems, yet little is known about how AI's perceived expertise influences evaluators' analytic strategies and perceptions over time. We ran a within-subjects, five-session study (six hours per participant) with 12 professional UX evaluators who worked with two conversational assistants designed to appear novice- or expert-like (differing in suggestion quantity and response accuracy). We logged behavioral measures (number of passes, suggestion acceptance rate), collected subjective ratings (trust, perceived efficiency), and conducted semi-structured interviews. Participants experienced an initial novelty effect and a subsequent dip in trust that recovered over time. Their efficiency improved as they shifted from a two-pass to a one-pass video inspection approach. Evaluators ultimately rated the experienced CA as significantly more efficient, trustworthy, and comprehensive, despite not perceiving expertise differences early on. We conclude with design implications for adapting AI expertise to enable calibrated human-AI collaboration.

GR · Mar 8, 2024
SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

Zhijing Shao, Zhaolong Wang, Zhuang Li et al.

We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders over 300 FPS on a modern GPU and 30 FPS on a mobile device. We disentangle the motion and appearance of a virtual human with explicit mesh geometry and implicit appearance modeling with Gaussian Splatting. The Gaussians are defined by barycentric coordinates and displacement on a triangle mesh as Phong surfaces. We extend lifted optimization to simultaneously optimize the parameters of the Gaussians while walking on the triangle mesh. SplattingAvatar is a hybrid representation of virtual humans where the mesh represents low-frequency motion and surface deformation, while the Gaussians take over the high-frequency geometry and detailed appearance. Unlike existing deformation methods that rely on an MLP-based linear blend skinning (LBS) field for motion, we control the rotation and translation of the Gaussians directly by mesh, which empowers its compatibility with various animation techniques, e.g., skeletal animation, blend shapes, and mesh editing. Trainable from monocular videos for both full-body and head avatars, SplattingAvatar shows state-of-the-art rendering quality across multiple datasets.
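The mesh embedding described in this abstract can be illustrated with a small sketch. It assumes each Gaussian center is stored as barycentric coordinates on one triangle plus a scalar displacement along the interpolated (Phong-style) vertex normal. The function names and the toy triangle are hypothetical and are not taken from the SplattingAvatar code.

```python
import math


def _normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def embedded_position(verts, normals, bary, displacement):
    """World-space point for a mesh-embedded primitive.

    verts, normals: three 3-tuples each (triangle corners and their normals).
    bary: barycentric weights (u, v, w) that sum to 1.
    displacement: signed offset along the interpolated normal.
    """
    u, v, w = bary
    # Interpolate the surface point from the triangle corners.
    base = tuple(u * a + v * b + w * c for a, b, c in zip(*verts))
    # Interpolate and renormalize the vertex normals (Phong-style).
    n = _normalize(tuple(u * a + v * b + w * c for a, b, c in zip(*normals)))
    return tuple(p + displacement * k for p, k in zip(base, n))


# Toy triangle in the z = 0 plane with upward-facing normals (illustrative only).
tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
nrm = ((0.0, 0.0, 1.0),) * 3
center = embedded_position(tri, nrm, (1 / 3, 1 / 3, 1 / 3), 0.5)

# Deforming the mesh moves the embedded point with it, with no extra motion field.
moved = tuple((x + 1.0, y, z) for x, y, z in tri)
center_moved = embedded_position(moved, nrm, (1 / 3, 1 / 3, 1 / 3), 0.5)
```

Because the point is a function of the triangle's vertices only, any animation technique that moves the mesh (skeletal animation, blend shapes, manual edits) moves the embedded primitive as well, which is the compatibility the abstract highlights.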

CV · Dec 1, 2025
SpriteHand: Real-Time Versatile Hand-Object Interaction with Autoregressive Video Generation

Zisu Li, Hengye Lyu, Jiaxin Shi et al.

Modeling and synthesizing complex hand-object interactions remains a significant challenge, even for state-of-the-art physics engines. Conventional simulation-based approaches rely on explicitly defined rigid object models and pre-scripted hand gestures, making them inadequate for capturing dynamic interactions with non-rigid or articulated entities such as deformable fabrics, elastic materials, hinge-based structures, furry surfaces, or even living creatures. In this paper, we present SpriteHand, an autoregressive video generation framework for real-time synthesis of versatile hand-object interaction videos across a wide range of object types and motion patterns. SpriteHand takes as input a static object image and a video stream in which the hands are imagined to interact with the virtual object embedded in a real-world scene, and generates corresponding hand-object interaction effects in real time. Our model employs a causal inference architecture for autoregressive generation and leverages a hybrid post-training approach to enhance visual realism and temporal coherence. Our 1.3B model supports real-time streaming generation at around 18 FPS and 640x368 resolution, with an approximate 150 ms latency on a single NVIDIA RTX 5090 GPU, and more than a minute of continuous output. Experiments demonstrate superior visual quality, physical plausibility, and interaction fidelity compared to both generative and engine-based baselines.

HC · Apr 3
Beyond Compliance: How AI Could Help Creative Writers by Refusing Them

Hua Xuan Qin, Guangzhi Zhu, Mingming Fan et al.

Mainstream creativity support design prioritizes compliant AI for seamless writing interactions, but concerns over inappropriate AI reliance highlight the need for designs fostering reflection on balanced AI and non-AI resource use. Theoretically, intentional AI non-compliance, refusals (saying "no" to requests), could introduce such reflection through friction stronger than other bypass-able solutions. Practically, refusal content/language characteristics lead to nuanced reactions. However, little research empirically focuses on nuances beyond mandatory ethical/technical constraints, on turning refusals into strategic friction for 'innocuous' requests. We address this through a qualitative study with 22 creative writers, exploring reactions to refusals to common requests across writing stages (planning, translating, reviewing). Findings suggest that reflective potential depends on heterogeneous preference alignment along situational (e.g., convergent/divergent thinking phases), cognitive (e.g., domain beliefs), and relational (e.g., AI roles) dimensions. We discuss implications for creativity support, broader issues (e.g., AI addiction), and frictional/seamful AI design (e.g., integrating different compliance levels).

AI · Oct 11, 2025
SwarmSys: Decentralized Swarm-Inspired Agents for Scalable and Adaptive Reasoning

Ruohao Li, Hongjun Liu, Leyi Zhao et al.

Large language model (LLM) agents have shown remarkable reasoning abilities. However, existing multi-agent frameworks often rely on fixed roles or centralized control, limiting scalability and adaptability in long-horizon reasoning. We introduce SwarmSys, a closed-loop framework for distributed multi-agent reasoning inspired by swarm intelligence. Coordination in SwarmSys emerges through iterative interactions among three specialized roles, Explorers, Workers, and Validators, that continuously cycle through exploration, exploitation, and validation. To enable scalable and adaptive collaboration, we integrate adaptive agent and event profiles, embedding-based probabilistic matching, and a pheromone-inspired reinforcement mechanism, supporting dynamic task allocation and self-organizing convergence without global supervision. Across symbolic reasoning, research synthesis, and scientific programming tasks, SwarmSys consistently outperforms baselines, improving both accuracy and reasoning stability. These findings highlight swarm-inspired coordination as a promising paradigm for scalable, robust, and adaptive multi-agent reasoning, suggesting that coordination scaling may rival model scaling in advancing LLM intelligence.

CL · Aug 11, 2025
Exploring Safety Alignment Evaluation of LLMs in Chinese Mental Health Dialogues via LLM-as-Judge

Yunna Cai, Fan Wang, Haowei Wang et al.

Evaluating the safety alignment of LLM responses in high-risk mental health dialogues is particularly difficult due to missing gold-standard answers and the ethically sensitive nature of these interactions. To address this challenge, we propose PsyCrisis-Bench, a reference-free evaluation benchmark based on real-world Chinese mental health dialogues. It evaluates whether the model responses align with the safety principles defined by experts. Specifically designed for settings without standard references, our method adopts a prompt-based LLM-as-Judge approach that conducts in-context evaluation using expert-defined reasoning chains grounded in psychological intervention principles. We employ binary point-wise scoring across multiple safety dimensions to enhance the explainability and traceability of the evaluation. Additionally, we present a manually curated, high-quality Chinese-language dataset covering self-harm, suicidal ideation, and existential distress, derived from real-world online discourse. Experiments on 3600 judgments show that our method achieves the highest agreement with expert assessments and produces more interpretable evaluation rationales compared to existing approaches. Our dataset and evaluation tool are publicly available to facilitate further research.

GR · Mar 14, 2024
HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation

Duotun Wang, Hengyu Meng, Zeyu Cai et al.

Current text-to-avatar methods often rely on implicit representations (e.g., NeRF, SDF, and DMTet), leading to 3D content that artists cannot easily edit and animate in graphics software. This paper introduces a novel framework for generating stylized head avatars from text guidance, which leverages locally learnable mesh deformation and 2D diffusion priors to achieve high-quality digital assets for attribute-preserving manipulation. Given a template mesh, our method represents mesh deformation with per-face Jacobians and adaptively modulates local deformation using a learnable vector field. This vector field enables anisotropic scaling while preserving the rotation of vertices, which can better express identity and geometric details. We employ landmark- and contour-based regularization terms to balance the expressiveness and plausibility of generated avatars from multiple views without relying on any specific shape prior. Our framework can generate realistic shapes and textures that can be further edited via text, while supporting seamless editing using the preserved attributes from the template mesh, such as 3DMM parameters, blendshapes, and UV coordinates. Extensive experiments demonstrate that our framework can generate diverse and expressive head avatars with high-quality meshes that artists can easily manipulate in graphics software, facilitating downstream applications such as efficient asset creation and animation with preserved attributes.

HC · Feb 23, 2022
Understanding How Older Adults Comprehend COVID-19 Interactive Visualizations via Think-Aloud Protocol

Mingming Fan, Yiwen Wang, Yuni Xie et al.

Older adults have been hit disproportionately hard by the COVID-19 pandemic. One critical way for older adults to minimize the negative impact of COVID-19 and future pandemics is to stay up to date with the latest information, which has been increasingly presented through online interactive visualizations (e.g., live dashboards and websites). Thus, it is imperative to understand how older adults interact with and comprehend online COVID-19 interactive visualizations and what challenges they might encounter, in order to make such visualizations more accessible to older adults. We adopted a user-centered approach by inviting older adults to interact with COVID-19 interactive visualizations while at the same time verbalizing their thought processes using a think-aloud protocol. By analyzing their think-aloud verbalizations, we identified four types of thought processes representing how older adults comprehended the visualizations and uncovered the challenges they encountered. Furthermore, we also identified the challenges they encountered with seven common types of interaction techniques adopted by the visualizations. Based on the findings, we present design guidelines for making interactive visualizations more accessible to older adults.

HC · Feb 12, 2022
"I Don't Want People to Look At Me Differently": Designing User-Defined Above-the-Neck Gestures for People with Upper Body Motor Impairments

Xuan Zhao, Mingming Fan, Teng Han

Recent research proposed eyelid gestures for people with upper-body motor impairments (UMI) to interact with smartphones without finger touch. However, such eyelid gestures were designed by researchers. It remains unknown what eyelid gestures people with UMI would want and be able to perform. Moreover, other above-the-neck body parts (e.g., mouth, head) could be used to form more gestures. We conducted a user study in which 17 people with UMI designed above-the-neck gestures for 26 common commands on smartphones. We collected a total of 442 user-defined gestures involving the eyes, the mouth, and the head. Participants were more likely to make gestures with their eyes and preferred gestures that were simple, easy-to-remember, and less likely to draw attention from others. We further conducted a survey (N=24) to validate the usability and acceptance of these user-defined gestures. Results show that user-defined gestures were acceptable to both people with and without motor impairments.

HC · Feb 7, 2022
Think-Aloud Verbalizations for Identifying User Experience Problems: Effects of Language Proficiency with Chinese Non-Native English Speakers

Mingming Fan, Lingyun Zhu

Subtle patterns in users' think-aloud (TA) verbalizations (i.e., utterances) are shown to be telltale signs of user experience (UX) problems and used to build artificial intelligence (AI) models or AI-assisted tools to help UX evaluators identify UX problems automatically or semi-automatically. Despite the potential of such verbalization patterns, they were uncovered with native English speakers. As most people who speak English are non-native speakers, it is important to investigate whether similar patterns exist in non-native English speakers' TA verbalizations. As a first step to answer this question, we conducted think-aloud usability testing with Chinese non-native English speakers and native English speakers using three common TA protocols. We compared their verbalizations and UX problems that they encountered to understand the effects of language and TA protocols. Our findings show that both language groups had similar amounts and proportions of verbalization categories, encountered similar problems, and had similar verbalization patterns that indicate UX problems. Furthermore, TA protocols did not significantly affect the correlations between verbalizations and problems. Based on the findings, we present three design implications for UX practitioners and the design of AI-assisted analysis tools.

HC · Feb 6, 2022
From 'Wow' to 'Why': Guidelines for Creating the Opening of a Data Video with Cinematic Styles

Xian Xu, Leni Yang, David Yip et al.

Data videos are an increasingly popular storytelling form. The opening of a data video critically influences its success as the opening either attracts the audience to continue watching or bores them to abandon watching. However, little is known about how to create an attractive opening. We draw inspiration from the openings of famous films to facilitate designing data video openings. First, by analyzing over 200 films from several sources, we derived six primary cinematic opening styles adaptable to data videos. Then, we consulted eight experts from the film industry to formulate 28 guidelines. To validate the usability and effectiveness of the guidelines, we asked participants to create data video openings with and without the guidelines, which were then evaluated by experts and the general public. Results showed that the openings designed with the guidelines were perceived to be more attractive, and the guidelines were praised for clarity and inspiration.

HC · Feb 6, 2022
"I Shake The Package To Check If It's Mine": A Study of Package Fetching Practices and Challenges of Blind and Low Vision People in China

Wentao Lei, Mingming Fan, Juliann Thang

With about 230 million packages delivered per day in 2020, fetching packages has become a routine for many city dwellers in China. When fetching packages, people usually need to go to collection sites of their apartment complexes or a KuaiDiGui, an increasingly popular type of self-service package pickup machine. However, little is known about whether such processes are accessible to blind and low vision (BLV) city dwellers. We interviewed BLV people (N=20) living in a large metropolitan area in China to understand their practices and challenges of fetching packages. Our findings show that participants encountered difficulties in finding the collection site and localizing and recognizing their packages. When fetching packages from KuaiDiGuis, they had difficulty in identifying the correct KuaiDiGui, interacting with its touch screen, navigating the complex on-screen workflow, and opening the target compartment. We discuss design considerations to make the package fetching process more accessible to the BLV community.

HC · Dec 23, 2021
Human-AI Collaboration for UX Evaluation: Effects of Explanation and Synchronization

Mingming Fan, Xianyou Yang, Tsz Tung Yu et al.

Analyzing usability test videos is arduous. Although recent research showed the promise of AI in assisting with such tasks, it remains largely unknown how AI should be designed to facilitate effective collaboration between user experience (UX) evaluators and AI. Inspired by the concepts of agency and work context in human and AI collaboration literature, we studied two corresponding design factors for AI-assisted UX evaluation: explanations and synchronization. Explanations allow AI to further inform humans how it identifies UX problems from a usability test session; synchronization refers to the two ways humans and AI collaborate: synchronously and asynchronously. We iteratively designed a tool, AI Assistant, with four versions of UIs corresponding to the two levels of explanations (with/without) and synchronization (sync/async). By adopting a hybrid wizard-of-oz approach to simulating an AI with reasonable performance, we conducted a mixed-method study with 24 UX evaluators identifying UX problems from usability test videos using AI Assistant. Our quantitative and qualitative results show that AI with explanations, regardless of being presented synchronously or asynchronously, provided better support for UX evaluators' analysis and was perceived more positively; when without explanations, synchronous AI better improved UX evaluators' performance and engagement compared to the asynchronous AI. Lastly, we present the design implications for AI-assisted UX evaluation and facilitating more effective human-AI collaboration.

HC · Feb 21, 2021
EvoK: Connecting loved ones through Heart Rate sharing

Esha Shandilya, Yiwen Wang, Xuan Zhao et al.

In this work, we present EvoK, a new way of sharing one's heart rate with feedback from their close contacts to alleviate social isolation and loneliness. EvoK consists of a pair of wearable prototype devices (i.e., sender and receiver). The sender is designed as a headband enabling continuous sensing of heart rate with aesthetic designs to maximize social acceptance. The receiver is designed as a wristwatch enabling unobtrusive receiving of the loved one's continuous heart rate with multi-modal notification systems.

HC · Jan 22, 2021
"I Choose Assistive Devices That Save My Face" A Study on Perceptions of Accessibility and Assistive Technology Use Conducted in China

Franklin Mingzhe Li, Di Laura Chen, Mingming Fan et al.

Despite the potential benefits of assistive technologies (ATs) for people with various disabilities, only around 7% of Chinese with disabilities have had an opportunity to use ATs. Even for those who have used ATs, the abandonment rate was high. Although China has the world's largest population with disabilities, prior research exploring how ATs are used and perceived, and why ATs are abandoned, has been conducted primarily in North America and Europe. In this paper, we present an interview study conducted in China with 26 people with various disabilities to understand their practices, challenges, perceptions, and misperceptions of using ATs. From the study, we learned about factors that influence AT adoption practices (e.g., misuse of accessible infrastructure, issues with replicating existing commercial ATs), challenges using ATs in social interactions (e.g., Chinese stigma), and misperceptions about ATs (e.g., ATs should overcome inaccessible social infrastructures). Informed by the findings, we derive a set of design considerations to bridge the existing gaps in AT design (e.g., manual vs. electronic ATs) and to improve ATs' social acceptability in China.

HC · Oct 12, 2017
An empirical study of touch-based authentication methods on smartwatches

Yue Zhao, Zhongtian Qiu, Yiqing Yang et al.

The emergence of smartwatches poses new challenges to information security. Although there are mature touch-based authentication methods for smartphones, the effectiveness of using these methods on smartwatches is still unclear. We conducted a user study (n=16) to evaluate how authentication methods (PIN and Pattern), UIs (Square and Circular), and display sizes (38mm and 42mm) affect authentication accuracy, speed, and security. Circular UIs are tailored to smartwatches with fewer UI elements. Results show that 1) PIN is more accurate and secure than Pattern; 2) Pattern is much faster than PIN; 3) Square UIs are more secure but less accurate than Circular UIs; 4) display size does not affect accuracy or speed, but does affect security; 5) Square PIN is the most secure method of all. The study also reveals a security concern: participants' favorite method is not the best on any of the measures. We finally discuss implications for future touch-based smartwatch authentication design.

IR · Jan 5, 2014
Predicting a Business Star in Yelp from Its Reviews Text Alone

Mingming Fan, Maryam Khademi

Yelp online reviews are an invaluable source of information for users to choose where to visit or what to eat among numerous available options. But due to the overwhelming number of reviews, it is almost impossible for users to go through all reviews and find the information they are looking for. To provide a business overview, one solution is to give the business a rating of 1-5 stars. This rating can be subjective and biased toward users' personalities. In this paper, we predict a business rating based on user-generated review texts alone. This not only provides an overview of plentiful long review texts but also cancels out subjectivity. Selecting the restaurant category from the Yelp Dataset Challenge, we use a combination of three feature generation methods as well as four machine learning models to find the best prediction result. Our approach is to create a bag of words from the top frequent words in all raw text reviews, or top frequent words/adjectives from the results of Part-of-Speech (POS) analysis. Our results show a Root Mean Square Error (RMSE) of 0.6 for the combination of Linear Regression with either the top frequent words from raw data or the top frequent adjectives after POS analysis.
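The feature step and error metric named in this abstract can be sketched in a few lines. This is only an illustration under simple assumptions: whitespace tokenization, raw term counts, and made-up toy reviews and ratings. The helper names (`top_words`, `bag_of_words`, `rmse`) are hypothetical, and the regression model itself is omitted.

```python
from collections import Counter
import math


def top_words(reviews, k):
    """Return the k most frequent lowercase tokens across all reviews."""
    counts = Counter(word for text in reviews for word in text.lower().split())
    return [word for word, _ in counts.most_common(k)]


def bag_of_words(text, vocab):
    """Count how often each vocabulary word occurs in one review."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def rmse(predicted, actual):
    """Root mean square error between predicted and true star ratings."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Toy data for illustration only; not drawn from the Yelp dataset.
reviews = ["great food great service", "bad food"]
vocab = top_words(reviews, 2)
features = dict(zip(vocab, bag_of_words(reviews[0], vocab)))
error = rmse([4.0, 2.0], [5.0, 2.0])
```

Each review becomes a fixed-length count vector over the shared vocabulary, which any regression model can consume. RMSE then reports the typical prediction error in stars.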