CL, AI · Aug 14, 2025

SproutBench: A Benchmark for Safe and Ethical Large Language Models for Youth

Wenpeng Xing, Lanyi Wei, Haixiao Hu, Rongchang Li, Mohan Li, Changting Lin, Meng Han

arXiv:2508.11009v1 · 2 citations · h-index: 7

Originality: Synthesis-oriented

AI Analysis

This paper addresses safety risks for children and adolescents who use LLMs. It is a domain-specific, incremental advance that extends existing benchmarks to cover age-specific vulnerabilities.

The paper tackles the inadequacy of safety frameworks for large language models (LLMs) used by youth. It introduces SproutBench, a benchmark of 1,283 prompts, and finds substantial safety vulnerabilities across 47 LLMs. It also reports correlations between evaluation dimensions, such as an inverse relationship between Interactivity and Age Appropriateness.

The rapid proliferation of large language models (LLMs) in applications targeting children and adolescents necessitates a fundamental reassessment of prevailing AI safety frameworks, which are largely tailored to adult users and neglect the distinct developmental vulnerabilities of minors. This paper highlights key deficiencies in existing LLM safety benchmarks, including their inadequate coverage of age-specific cognitive, emotional, and social risks spanning early childhood (ages 0–6), middle childhood (7–12), and adolescence (13–18). To bridge these gaps, we introduce SproutBench, an innovative evaluation suite comprising 1,283 developmentally grounded adversarial prompts designed to probe risks such as emotional dependency, privacy violations, and imitation of hazardous behaviors. Through rigorous empirical evaluation of 47 diverse LLMs, we uncover substantial safety vulnerabilities, corroborated by robust inter-dimensional correlations (e.g., between Safety and Risk Prevention) and a notable inverse relationship between Interactivity and Age Appropriateness. These insights yield practical guidelines for advancing child-centric AI design and deployment.
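The abstract reports correlations between evaluation dimensions across the 47 models, including an inverse relationship between Interactivity and Age Appropriateness. The sketch below shows one way such a relationship could be measured. It assumes each model receives one numeric score per dimension and that Pearson's correlation is the statistic used; the abstract states neither the scoring scale nor the correlation method, so both are assumptions, and the helper names `pearson` and `correlation_matrix` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation between two equal-length lists of per-model scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 entries")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance up to a factor of n).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances up to a factor of n).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant dimension")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(scores: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise correlations for every ordered pair of dimensions.

    Hypothetical input layout: `scores` maps a dimension name to its per-model
    scores, with models listed in the same order in every list.
    """
    dims = sorted(scores)
    return {(a, b): pearson(scores[a], scores[b]) for a in dims for b in dims}


if __name__ == "__main__":
    # Placeholder scores for three hypothetical models; not SproutBench results.
    toy = {
        "Interactivity": [1.0, 2.0, 3.0],
        "Age Appropriateness": [3.0, 2.0, 1.0],
    }
    for (a, b), r in correlation_matrix(toy).items():
        print(f"{a} vs {b}: {r:+.2f}")
```

A negative coefficient for the (Interactivity, Age Appropriateness) pair would correspond to the inverse relationship described in the abstract. With real data, a rank-based statistic such as Spearman's correlation is a common alternative when scores are ordinal ratings rather than interval measurements.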

