Alice Gao

HC
h-index 65
7 papers
306 citations
Novelty 37%
AI Score 48

7 Papers

CL · Aug 16, 2024
Risks and NLP Design: A Case Study on Procedural Document QA

Nikita Haduong, Alice Gao, Noah A. Smith · allen-ai, uw

As NLP systems are increasingly deployed at scale, concerns about their potential negative impacts have attracted the attention of the research community, yet discussions of risk have mostly been at an abstract level and focused on generic AI or NLP applications. We argue that clearer assessments of risks and harms to users--and concrete strategies to mitigate them--will be possible when we specialize the analysis to more concrete applications and their plausible users. As an illustration, this paper is grounded in cooking recipe procedural document question answering (ProcDocQA), where there are well-defined risks to users such as injuries or allergic reactions. Our case study shows that an existing language model, applied in "zero-shot" mode, quantitatively answers real-world questions about recipes as well as or better than the humans who have answered the questions on the web. Using a novel questionnaire informed by theoretical work on AI risk, we conduct a risk-oriented error analysis that could then inform the design of a future system to be deployed with lower risk of harm and better performance.

97.5 · HC · May 19
Framing an AI with Values Reduces AI Reliance in AI-supported Writing Tasks

Alice Gao, Andrew N. Meltzoff, Maarten Sap et al.

Despite a global user base adopting large language models (LLMs) for daily writing tasks, model suggestions tend to align with Western values. Research has shown users commonly accept a high fraction of these AI suggestions, homogenizing writing styles and rendering outputs more "Western" than intended. While this suggests a need to reduce AI reliance, it remains unknown what kind of interventions could achieve this. Can framing the AI with specific values, and comparing it to one's own, make users less susceptible to overreliance and support more unique writing? We tested this hypothesis in a between-subjects online experiment with Indian and American participants (n=149) in which they were asked to perform AI-supported writing tasks, either 1) without an intervention, 2) after seeing an overview of the AI's framed values, or 3) after seeing an overview of the AI's framed values compared to their own. Our results show that seeing the AI's framed values reduces AI reliance, i.e., the proportion of the final essay generated by the AI, by an average of 20%. Additionally, when participants saw an overview of the AI's framed values (without comparison to their own values), the final essays contained more unique text than without intervention. Our findings emphasize the importance of educating users about potential value biases in AI, showing that raising awareness with a simple overview of values encourages users to personalize their writing.

16.5 · CY · Mar 10
Systematic Review of Academic Procrastination Interventions in Computing Higher Education

Daniel Cheng, Oscar Heath, Daniyaal Farooqi et al.

Academic procrastination is a persistent challenge in computing education, yet evidence on the effectiveness of course-level interventions remains fragmented across diverse designs and contexts. We present a systematic literature review of studies published in the past decade that empirically examine interventions to reduce academic procrastination among post-secondary computing students. Evidence from 19 articles examines interventions that target procrastination through structural, feedback-based, motivational, and self-regulatory mechanisms. Our findings suggest that interventions introducing clear temporal structure consistently promote earlier starts and more distributed work, which act as key mediators of performance gains. The magnitude of these gains depends strongly on task structure, with greater benefits for long-horizon, multi-step assignments than for short, routine tasks. Moreover, supportive designs reliably outperform punitive or restrictive schemes, while uniform interventions yield uneven benefits across students. This review highlights the importance of designing structured, supportive, and personalized interventions to address procrastination in computing education.

85.5 · HC · Mar 13
Interrogating Design Homogenization in Web Vibe Coding

Donghoon Shin, Alice Gao, Rock Yuren Pang et al.

Generative AI is known for its tendency to homogenize, often reproducing dominant style conventions found in training data. However, it remains unclear how these homogenizing effects extend to complex structural tasks like web design. As lay creators increasingly turn to LLMs to 'vibe-code' websites -- prompting for aesthetic and functional goals rather than writing code -- they may inadvertently narrow the diversity of their designs, and limit creative expression throughout the internet. In this paper, we interrogate the possibility of design homogenization in web vibe coding. We first characterize the vibe coding lifecycle, pinpointing stages where homogenization risks may arise. We then conduct a sociotechnical risk analysis unpacking the potential harms of web vibe coding and their interaction with design homogenization. We identify that the push for frictionless generation can exacerbate homogenization and its harms. Finally, we propose a mitigation framework centered on the idea of productive friction. Through case studies at the micro, meso, and macro levels, we show how centering productive friction can empower creators to challenge default outputs and preserve diverse expression in AI-mediated web design.

84.2 · CL · Apr 28
Training Computer Use Agents to Assess the Usability of Graphical User Interfaces

Alice Gao, Weixi Tong, Rishab Vempati et al.

Usability testing with experts and potential users can assess the effectiveness, efficiency, and user satisfaction of graphical user interfaces (GUIs), but doing so remains a costly and time-intensive process. Prior work has used computer use agents (CUAs) and other generative agents that can simulate user interactions and preferences, but we show that agents still struggle to provide accurate usability assessments. In this work, we present a novel machine learning method that operationalizes a computational definition of usability to train CUAs to assess GUI usability by i) prioritizing important interaction flows, ii) executing them through human-like interactions, and iii) predicting a learned numerical usability score. We train a computer use agent, uxCUA, with our algorithm on a large-scale dataset of fully interactive user interfaces (UIs) paired with usability labels and human preferences. We show that uxCUA outperforms larger models in accurate usability assessments and produces realistic critiques of both synthetic and real UIs. More broadly, our work aims to build a principled, data-driven foundation for automated usability assessment in HCI.

HC · Apr 26, 2024
Don't Look at the Camera: Achieving Perceived Eye Contact

Alice Gao, Samyukta Jayakumar, Marcello Maniglia et al. · uw

We consider the question of how to best achieve the perception of eye contact when a person is captured by camera and then rendered on a 2D display. For single subjects photographed by a camera, conventional wisdom tells us that looking directly into the camera achieves eye contact. Through empirical user studies, we show that it is instead preferable to look just below the camera lens. We quantitatively assess where subjects should direct their gaze relative to a camera lens to optimize the perception that they are making eye contact.

IV · Jul 4, 2021
COVID-VIT: Classification of COVID-19 from CT chest images based on vision transformer models

Xiaohong Gao, Yu Qian, Alice Gao

This paper responds to the MIA-COV19 challenge to classify COVID from non-COVID based on CT lung images. The COVID-19 virus has devastated the world in the last eighteen months by infecting more than 182 million people and causing over 3.9 million deaths. The overarching aim is to predict the diagnosis of the COVID-19 virus from chest radiographs, through the development of explainable vision transformer deep learning techniques, leading to population screening in a more rapid, accurate and transparent way. In this competition, there are 5381 three-dimensional (3D) datasets in total, including 1552 for training, 374 for evaluation and 3455 for testing. While most of the data volumes are in axial view, the data for a number of subjects are in coronal or sagittal views, with 1 or 2 slices in axial view. Hence, while 3D data-based classification is investigated, 2D images remain the main focus in this competition. Two deep learning methods are studied: a vision transformer (ViT) based on attention models, and DenseNet, which is built upon a conventional convolutional neural network (CNN). Initial evaluation results on the validation datasets, for which the ground truth is known, indicate that ViT performs better than DenseNet, with F1 scores of 0.76 and 0.72 respectively. Code is available on GitHub at <https://github/xiaohong1/COVID-ViT>.