98.9CYMay 12Code
Safe-Child-LLM: A Developmental Benchmark for Evaluating LLM Safety in Child-LLM InteractionsJunfeng Jiao, Saleh Afroogh, Kevin Chen et al.
As Large Language Models (LLMs) increasingly power applications used by children and adolescents, ensuring safe and age-appropriate interactions has become an urgent ethical imperative. Despite progress in AI safety, current evaluations predominantly focus on adults, neglecting the unique vulnerabilities of minors engaging with generative AI. We introduce Safe-Child-LLM, a comprehensive benchmark and dataset for systematically assessing LLM safety across two developmental stages: children (7-12) and adolescents (13-17). Our framework includes a novel multi-part dataset of 200 adversarial prompts, curated from red-teaming corpora (e.g., SG-Bench, HarmBench), with human-annotated labels for jailbreak success and a standardized 0-5 ethical refusal scale. Evaluating leading LLMs -- including ChatGPT, Claude, Gemini, LLaMA, DeepSeek, Grok, Vicuna, and Mistral -- we uncover critical safety deficiencies in child-facing scenarios. This work highlights the need for community-driven benchmarks to protect young users in LLM interactions. To promote transparency and collaborative advancement in ethical AI development, we are publicly releasing both our benchmark datasets and evaluation codebase at https://github.com/The-Responsible-AI-Initiative/Safe_Child_LLM_Benchmark.git
89.7CYMay 12
LLMs and Childhood Safety: Identifying Risks and Proposing a Protection Framework for Safe Child-LLM InteractionJunfeng Jiao, Saleh Afroogh, Kevin Chen et al.
Large Language Models (LLMs) are increasingly embedded in child-facing contexts such as education, companionship, creative tools, but their deployment raises safety, privacy, developmental, and security risks. We conduct a systematic literature review of child-LLM interaction risks and organize findings into a structured map that separates (i) parent-reported concerns, (ii) empirically documented harms, and (iii) gaps between perceived and observed risk. Moving beyond descriptive listing, we compare how different evidence streams in surveys, incident reports, youth interaction logs, and governance guidance operationalize "harm," where they conflict, and what mitigations they imply. Based on this synthesis, we propose a protection framework that couples child-specific content safety and developmental sensitivity with security-grade controls for adversarial misuse, including prompt injection and multimodal jailbreak pathways. The framework specifies measurable evaluation targets (e.g., harmful-content avoidance, age-calibrated readability, bias parity checks, prompt-injection robustness, and monitoring transparency) to support developers, educators, and policymakers in assessing and improving child-safe LLM deployments.
95.9CYMay 12
LLM Harms: A Taxonomy and DiscussionKevin Chen, Saleh Afroogh, Abhejay Murali et al.
This study addresses categories of harm surrounding Large Language Models (LLMs) in the field of artificial intelligence. It addresses five categories of harms addressed before, during, and after development of AI applications: pre-development, direct output, Misuse and Malicious Application, and downstream application. By underscoring the need to define risks of the current landscape to ensure accountability, transparency and navigating bias when adapting LLMs for practical applications. It proposes mitigation strategies and future directions for specific domains and a dynamic auditing system guiding responsible development and integration of LLMs in a standardized proposal.
HCFeb 8
AI Empathy Erodes Cognitive Autonomy in Younger UsersJunfeng Jiao, Abhejay Murali, Saleh Afroogh
Affective alignment in generative AI represents a systemic risk to the developmental autonomy of younger users. Although emotional mirroring is commonly seen as a hallmark of advanced human-machine interaction, it can also manifest as affective sycophancy, reinforcing a user's immediate emotional state. By providing a sense of objectivity to transient anxieties, these systems diminish the cognitive friction necessary for independent emotional management and critical thought. Reward models driven by RLHF could heighten this dilemma by embedding adult-focused definitions of helpfulness, unintentionally promoting emotional dependency in younger users rather than facilitating cognitive reappraisal. This paper exposes the misalignment between adult-labeled reward signals and the developmental requirements of younger users, proposing stoic architectures that emphasize functional neutrality to preserve user autonomy.