MLDec 7, 2022
Metric Elicitation; Moving from Theory to PracticeSafinah Ali, Sohini Upadhyay, Gaurush Hiranandani et al. · harvard
Metric Elicitation (ME) is a framework for eliciting classification metrics that better align with implicit user preferences based on the task and context. The existing ME strategy so far is based on the assumption that users can most easily provide preference feedback over classifier statistics such as confusion matrices. This work examines ME, by providing a first ever implementation of the ME strategy. Specifically, we create a web-based ME interface and conduct a user study that elicits users' preferred metrics in a binary classification setting. We discuss the study findings and present guidelines for future research in this direction.
HCNov 22, 2023
Studying Artist Sentiments around AI-generated ArtworkSafinah Ali, Cynthia Breazeal
Art created using generated Artificial Intelligence has taken the world by storm and generated excitement for many digital creators and technologists. However, the reception and reaction from artists have been mixed. Concerns about plagiarizing their artworks and styles for datasets and uncertainty around the future of digital art sparked movements in artist communities shunning the use of AI for generating art and protecting artists' rights. Collaborating with these tools for novel creative use cases also sparked hope from some creators. Artists are an integral stakeholder in the rapidly evolving digital creativity industry and understanding their concerns and hopes inform responsible development and use of creativity support tools. In this work, we study artists' sentiments about AI-generated art. We interviewed 7 artists and analyzed public posts from artists on social media platforms Reddit, Twitter and Artstation. We report artists' main concerns and hopes around AI-generated artwork, informing a way forward for inclusive development of these tools.
AIApr 24, 2025
Auditing the Ethical Logic of Generative AI ModelsW. Russell Neuman, Chad Coleman, Ali Dasdan et al.
As generative AI models become increasingly integrated into high-stakes domains, the need for robust methods to evaluate their ethical reasoning becomes increasingly important. This paper introduces a five-dimensional audit model -- assessing Analytic Quality, Breadth of Ethical Considerations, Depth of Explanation, Consistency, and Decisiveness -- to evaluate the ethical logic of leading large language models (LLMs). Drawing on traditions from applied ethics and higher-order thinking, we present a multi-battery prompt approach, including novel ethical dilemmas, to probe the models' reasoning across diverse contexts. We benchmark seven major LLMs finding that while models generally converge on ethical decisions, they vary in explanatory rigor and moral prioritization. Chain-of-Thought prompting and reasoning-optimized models significantly enhance performance on our audit metrics. This study introduces a scalable methodology for ethical benchmarking of AI systems and highlights the potential for AI to complement human moral reasoning in complex decision-making contexts.
CLMay 20, 2025
MAATS: A Multi-Agent Automated Translation System Based on MQM EvaluationGeorge Wang, Jiaqian Hu, Safinah Ali
We present MAATS, a Multi Agent Automated Translation System that leverages the Multidimensional Quality Metrics (MQM) framework as a fine-grained signal for error detection and refinement. MAATS employs multiple specialized AI agents, each focused on a distinct MQM category (e.g., Accuracy, Fluency, Style, Terminology), followed by a synthesis agent that integrates the annotations to iteratively refine translations. This design contrasts with conventional single-agent methods that rely on self-correction. Evaluated across diverse language pairs and Large Language Models (LLMs), MAATS outperforms zero-shot and single-agent baselines with statistically significant gains in both automatic metrics and human assessments. It excels particularly in semantic accuracy, locale adaptation, and linguistically distant language pairs. Qualitative analysis highlights its strengths in multi-layered error diagnosis, omission detection across perspectives, and context-aware refinement. By aligning modular agent roles with interpretable MQM dimensions, MAATS narrows the gap between black-box LLMs and human translation workflows, shifting focus from surface fluency to deeper semantic and contextual fidelity.
AIApr 27, 2025
The Convergent Ethics of AI? Analyzing Moral Foundation Priorities in Large Language Models with a Multi-Framework ApproachChad Coleman, W. Russell Neuman, Ali Dasdan et al.
As large language models (LLMs) are increasingly deployed in consequential decision-making contexts, systematically assessing their ethical reasoning capabilities becomes a critical imperative. This paper introduces the Priorities in Reasoning and Intrinsic Moral Evaluation (PRIME) framework--a comprehensive methodology for analyzing moral priorities across foundational ethical dimensions including consequentialist-deontological reasoning, moral foundations theory, and Kohlberg's developmental stages. We apply this framework to six leading LLMs through a dual-protocol approach combining direct questioning and response analysis to established ethical dilemmas. Our analysis reveals striking patterns of convergence: all evaluated models demonstrate strong prioritization of care/harm and fairness/cheating foundations while consistently underweighting authority, loyalty, and sanctity dimensions. Through detailed examination of confidence metrics, response reluctance patterns, and reasoning consistency, we establish that contemporary LLMs (1) produce decisive ethical judgments, (2) demonstrate notable cross-model alignment in moral decision-making, and (3) generally correspond with empirically established human moral preferences. This research contributes a scalable, extensible methodology for ethical benchmarking while highlighting both the promising capabilities and systematic limitations in current AI moral reasoning architectures--insights critical for responsible development as these systems assume increasingly significant societal roles.
CLJul 8, 2025
"Amazing, They All Lean Left" -- Analyzing the Political Temperaments of Current LLMsW. Russell Neuman, Chad Coleman, Ali Dasdan et al.
Recent studies have revealed a consistent liberal orientation in the ethical and political responses generated by most commercial large language models (LLMs), yet the underlying causes and resulting implications remain unclear. This paper systematically investigates the political temperament of seven prominent LLMs - OpenAI's GPT-4o, Anthropic's Claude Sonnet 4, Perplexity (Sonar Large), Google's Gemini 2.5 Flash, Meta AI's Llama 4, Mistral 7b Le Chat and High-Flyer's DeepSeek R1 -- using a multi-pronged approach that includes Moral Foundations Theory, a dozen established political ideology scales and a new index of current political controversies. We find strong and consistent prioritization of liberal-leaning values, particularly care and fairness, across most models. Further analysis attributes this trend to four overlapping factors: Liberal-leaning training corpora, reinforcement learning from human feedback (RLHF), the dominance of liberal frameworks in academic ethical discourse and safety-driven fine-tuning practices. We also distinguish between political "bias" and legitimate epistemic differences, cautioning against conflating the two. A comparison of base and fine-tuned model pairs reveals that fine-tuning generally increases liberal lean, an effect confirmed through both self-report and empirical testing. We argue that this "liberal tilt" is not a programming error or the personal preference of programmers but an emergent property of training on democratic rights-focused discourse. Finally, we propose that LLMs may indirectly echo John Rawls' famous veil-of ignorance philosophical aspiration, reflecting a moral stance unanchored to personal identity or interest. Rather than undermining democratic discourse, this pattern may offer a new lens through which to examine collective reasoning.
AIMay 5, 2025
Evaluating the Impact of AI-Powered Audiovisual Personalization on Learner Emotion, Focus, and Learning OutcomesGeorge Xi Wang, Jingying Deng, Safinah Ali
Independent learners often struggle with sustaining focus and emotional regulation in unstructured or distracting settings. Although some rely on ambient aids such as music, ASMR, or visual backgrounds to support concentration, these tools are rarely integrated into cohesive, learner-centered systems. Moreover, existing educational technologies focus primarily on content adaptation and feedback, overlooking the emotional and sensory context in which learning takes place. Large language models have demonstrated powerful multimodal capabilities including the ability to generate and adapt text, audio, and visual content. Educational research has yet to fully explore their potential in creating personalized audiovisual learning environments. To address this gap, we introduce an AI-powered system that uses LLMs to generate personalized multisensory study environments. Users select or generate customized visual themes (e.g., abstract vs. realistic, static vs. animated) and auditory elements (e.g., white noise, ambient ASMR, familiar vs. novel sounds) to create immersive settings aimed at reducing distraction and enhancing emotional stability. Our primary research question investigates how combinations of personalized audiovisual elements affect learner cognitive load and engagement. Using a mixed-methods design that incorporates biometric measures and performance outcomes, this study evaluates the effectiveness of LLM-driven sensory personalization. The findings aim to advance emotionally responsive educational technologies and extend the application of multimodal LLMs into the sensory dimension of self-directed learning.
CYMay 29, 2023
AI Audit: A Card Game to Reflect on Everyday AI SystemsSafinah Ali, Vishesh Kumar, Cynthia Breazeal
An essential element of K-12 AI literacy is educating learners about the ethical and societal implications of AI systems. Previous work in AI ethics literacy have developed curriculum and classroom activities that engage learners in reflecting on the ethical implications of AI systems and developing responsible AI. There is little work in using game-based learning methods in AI literacy. Games are known to be compelling media to teach children about complex STEM concepts. In this work, we developed a competitive card game for middle and high school students called "AI Audit" where they play as AI start-up founders building novel AI-powered technology. Players can challenge other players with potential harms of their technology or defend their own businesses by features that mitigate these harms. The game mechanics reward systems that are ethically developed or that take steps to mitigate potential harms. In this paper, we present the game design, teacher resources for classroom deployment and early playtesting results. We discuss our reflections about using games as teaching tools for AI literacy in K-12 classrooms.
HCMay 19, 2023
Constructing Dreams using Generative AISafinah Ali, Daniella DiPaola, Randi Williams et al.
Generative AI tools introduce new and accessible forms of media creation for youth. They also raise ethical concerns about the generation of fake media, data protection, privacy and ownership of AI-generated art. Since generative AI is already being used in products used by youth, it is critical that they understand how these tools work and how they can be used or misused. In this work, we facilitated students' generative AI learning through expression of their imagined future identities. We designed a learning workshop - Dreaming with AI - where students learned about the inner workings of generative AI tools, used text-to-image generation algorithms to create their imaged future dreams, reflected on the potential benefits and harms of generative AI tools and voiced their opinions about policies for the use of these tools in classrooms. In this paper, we present the learning activities and experiences of 34 high school students who engaged in our workshops. Students reached creative learning objectives by using prompt engineering to create their future dreams, gained technical knowledge by learning the abilities, limitations, text-visual mappings and applications of generative AI, and identified most potential societal benefits and harms of generative AI.
CYNov 13, 2021
Introducing Variational Autoencoders to High School StudentsZhuoyue Lyu, Safinah Ali, Cynthia Breazeal
Generative Artificial Intelligence (AI) models are a compelling way to introduce K-12 students to AI education using an artistic medium, and hence have drawn attention from K-12 AI educators. Previous Creative AI curricula mainly focus on Generative Adversarial Networks (GANs) while paying less attention to Autoregressive Models, Variational Autoencoders (VAEs), or other generative models, which have since become common in the field of generative AI. VAEs' latent-space structure and interpolation ability could effectively ground the interdisciplinary learning of AI, creative arts, and philosophy. Thus, we designed a lesson to teach high school students about VAEs. We developed a web-based game and used Plato's cave, a philosophical metaphor, to introduce how VAEs work. We used a Google Colab notebook for students to re-train VAEs with their hand-written digits to consolidate their understandings. Finally, we guided the exploration of creative VAE tools such as SketchRNN and MusicVAE to draw the connection between what they learned and real-world applications. This paper describes the lesson design and shares insights from the pilot studies with 22 students. We found that our approach was effective in teaching students about a novel AI concept.
HCOct 27, 2021
Telling Creative Stories Using Generative Visual AidsSafinah Ali, Devi Parikh
Can visual artworks created using generative visual algorithms inspire human creativity in storytelling? We asked writers to write creative stories from a starting prompt, and provided them with visuals created by generative AI models from the same prompt. Compared to a control group, writers who used the visuals as story writing aid wrote significantly more creative, original, complete and visualizable stories, and found the task more fun. Of the generative algorithms used (BigGAN, VQGAN, DALL-E, CLIPDraw), VQGAN was the most preferred. The control group that did not view the visuals did significantly better in integrating the starting prompts. Findings indicate that cross modality inputs by AI can benefit divergent aspects of creativity in human-AI co-creation, but hinders convergent thinking.
ROMay 1, 2021
Designing Games for Enabling Co-creation with Social AgentsSafinah Ali, Nisha Devasia, Cynthia Breazeal
Digital tools have long been used for supporting children's creativity. Digital games that allow children to create artifacts and express themselves in a playful environment serve as efficient Creativity Support Tools (or CSTs). Creativity is also scaffolded by social interactions with others in their environment. In our work, we explore the use of game-based interactions with a social agent to scaffold children's creative expression as game players. We designed three collaborative games and play-tested with 146 5-10 year old children played with the social robot Jibo, which affords three different kinds of creativity: verbal creativity, figural creativity and divergent thinking during creative problem solving. In this paper, we reflect on game mechanic practices that we incorporated to design for stimulating creativity in children. These strategies may be valuable to game designers and HCI researchers designing games and social agents for supporting children's creativity.