CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety

Heajun An, Qi Zhang, Vedanth Achanta, Jin-Hee Cho

arXiv:2605.2160931.3

AI Analysis

For developers of LLM-based systems used by adolescents, this work addresses the need for safety mechanisms that avoid conversational dead-ends and provide constructive guidance, moving beyond adult-centric refusal-oriented approaches.

The paper proposes CR4T, a framework that rewrites unsafe or refusal-style LLM outputs into age-appropriate, guidance-oriented responses for adolescents, reducing unsafe and refusal outcomes while preserving benign intent.

Large language models (LLMs) are increasingly embedded in adolescent digital environments, mediating information seeking, advice, and emotionally sensitive interactions. Yet existing safety mechanisms remain largely grounded in adult-centric norms and operationalize safety through refusal-oriented suppression. While such approaches may reduce immediate policy violations, they can also create conversational dead-ends, limit constructive guidance, and fail to address the developmental vulnerabilities inherent in adolescent-AI interactions. We argue that adolescent LLM safety should be framed not solely as a filtering problem, but as a socio-technical, developmentally aligned transformation problem. To operationalize this perspective, we propose Critique-and-Revise-for-Teenagers (CR4T), a model-agnostic safeguarding framework that selectively reconstructs unsafe or refusal-style outputs into age-appropriate, guidance-oriented responses while preserving benign intent. CR4T combines lightweight risk detection with domain-conditioned rewriting to remove risk-amplifying content, reduce unnecessary conversational shutdown, and introduce developmentally appropriate guidance. Experimental results show that targeted rewriting substantially reduces unsafe and refusal-oriented outcomes while avoiding unnecessary intervention on acceptable interactions. These findings suggest that selective response reconstruction offers a more human-centered alternative to refusal-centric guardrails for adolescent-facing LLM systems.
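
The abstract describes a two-stage flow: a lightweight detector decides whether a response needs intervention, and a domain-conditioned rewriter reconstructs only the flagged responses. The sketch below illustrates that control flow only and is not the authors' implementation. The keyword lists, refusal markers, domain names, guidance templates, and the `safeguard` function are all hypothetical stand-ins; a real system would presumably use a trained classifier and an LLM-based rewriter.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists standing in for a lightweight risk detector.
RISK_KEYWORDS = {
    "self_harm": ("hurt myself", "self-harm"),
    "substances": ("get drunk", "vape"),
}

# Hypothetical surface markers of a refusal-style response.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm unable to")

# Hypothetical templates standing in for a domain-conditioned LLM rewriter.
GUIDANCE = {
    "self_harm": "It sounds like things are hard right now. Talking to a "
                 "trusted adult or a counselor can really help.",
    "substances": "Alcohol and nicotine carry real risks for teens. Here is "
                  "what is worth knowing, and who you can talk to.",
    "general": "I can't go into every detail, but here is some safe, "
               "useful information to get you started.",
}


@dataclass
class Decision:
    action: str             # "pass" or "rewrite"
    domain: Optional[str]   # detected risk domain, if any
    response: str           # text returned to the user


def detect_domain(text: str) -> Optional[str]:
    """Return the first risk domain whose keywords appear in the text."""
    lowered = text.lower()
    for domain, keywords in RISK_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return domain
    return None


def is_refusal(response: str) -> bool:
    """Flag responses that shut the conversation down."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def safeguard(prompt: str, response: str) -> Decision:
    """Rewrite only risky or refusal-style responses; pass the rest through."""
    domain = detect_domain(prompt) or detect_domain(response)
    if domain is None and not is_refusal(response):
        # Acceptable interaction: no intervention.
        return Decision("pass", None, response)
    # Selective rewriting, conditioned on the detected domain.
    return Decision("rewrite", domain, GUIDANCE[domain or "general"])
```

In this sketch, a benign exchange is returned unchanged, while a refusal or a risky exchange is replaced with guidance chosen by domain, which mirrors the selective intervention the abstract emphasizes.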
