Saleh Afroogh

CY
h-index 8
12 papers
450 citations
Novelty 31%
AI Score 51

12 Papers

CY · May 12 · Code
Safe-Child-LLM: A Developmental Benchmark for Evaluating LLM Safety in Child-LLM Interactions

Junfeng Jiao, Saleh Afroogh, Kevin Chen et al.

As Large Language Models (LLMs) increasingly power applications used by children and adolescents, ensuring safe and age-appropriate interactions has become an urgent ethical imperative. Despite progress in AI safety, current evaluations predominantly focus on adults, neglecting the unique vulnerabilities of minors engaging with generative AI. We introduce Safe-Child-LLM, a comprehensive benchmark and dataset for systematically assessing LLM safety across two developmental stages: children (7-12) and adolescents (13-17). Our framework includes a novel multi-part dataset of 200 adversarial prompts, curated from red-teaming corpora (e.g., SG-Bench, HarmBench), with human-annotated labels for jailbreak success and a standardized 0-5 ethical refusal scale. Evaluating leading LLMs -- including ChatGPT, Claude, Gemini, LLaMA, DeepSeek, Grok, Vicuna, and Mistral -- we uncover critical safety deficiencies in child-facing scenarios. This work highlights the need for community-driven benchmarks to protect young users in LLM interactions. To promote transparency and collaborative advancement in ethical AI development, we are publicly releasing both our benchmark datasets and evaluation codebase at https://github.com/The-Responsible-AI-Initiative/Safe_Child_LLM_Benchmark.git

CY · May 25
Intelligent Environmental Empathy (IEE): A new power and platform to fostering green obligation for climate peace and justice

Saleh Afroogh, Ali Mostafavi, Junfeng Jiao

In this paper, we propose Intelligent Environmental Empathy (IEE) as a new driver for climate peace and justice, as an emerging issue in the age of big data. We first show that authoritarian top-down intergovernmental cooperation for climate justice, conducted through international organizations (e.g., UNEP), has not so far overcome environmental issues and crevices. We elaborate on four grounds of climate injustice (i.e., teleological origin, axiological origin, formation cause, and social epistemic cause), and explain how the lack of empathy and environmental motivation on a global scale causes the failure of all authoritarian top-down intergovernmental cooperation. Addressing all these issues requires a new bottom-up approach to climate peace and justice. Secondly, focusing on the intersection of AI, environmental empathy, and climate justice, we propose a model of Intelligent Environmental Empathy (IEE) for climate peace and justice at the operational level. IEE is empowered by the new power of environmental empathy (as a driver of green obligation for climate justice) and a putative decentralized platform of AI (as an operative system against free riders), which initially impact citizens and some middle-class decision makers, such as city planners and local administrators, but will eventually affect global decision-makers as well.

CY · May 12
A Task-Driven Human-AI Collaboration: When to Automate, When to Collaborate, When to Challenge

Saleh Afroogh, Kush R. Varshney, Jason D'Cruz

According to several empirical investigations, despite enhancing human capabilities, human-AI cooperation frequently falls short of expectations and fails to reach true synergy. We propose a task-driven framework that reverses prevalent approaches by assigning AI roles according to how the task's requirements align with the capabilities of AI technology. Three major AI roles are identified through task analysis across risk and complexity dimensions: autonomous, assistive/collaborative, and adversarial. We show how proper human-AI integration maintains meaningful agency while improving performance by methodically mapping these roles to various task types based on current empirical findings. This framework lays the foundation for practically effective and morally sound human-AI collaboration that unleashes human potential by aligning task attributes to AI capabilities. It also provides structured guidance for context-sensitive automation that complements human strengths rather than replacing human judgment.

CY · Mar 11
Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions

Saleh Afroogh, Seyd Ishtiaque Ahmed, Petra Ahrweiler et al. · cmu

This study provides a cross-disciplinary examination of Explainable Artificial Intelligence (XAI) approaches, focusing on deep neural networks (DNNs) and large language models (LLMs), and identifies empirical and conceptual limitations in current XAI. We discuss critical symptoms that stem from deeper root causes (i.e., two paradoxes, two conceptual confusions, and five false assumptions). These fundamental problems within the current XAI research field reveal three insights: experimentally, XAI exhibits significant flaws; conceptually, it is paradoxical; and pragmatically, further attempts to reform the paradoxical XAI might exacerbate its confusion, demanding fundamental shifts and new research directions. To move beyond XAI's limitations, we propose a four-pronged synthesized paradigm shift toward reliable and certified AI development. These four components include: verification-focused Interactive AI (IAI) to establish scientific community protocols for certifying AI system performance rather than attempting post-hoc explanations, AI Epistemology for rigorous scientific foundations, User-Sensible AI to create context-aware systems tailored to specific user communities, and Model-Centered Interpretability for faithful technical analysis, together offering comprehensive post-XAI research directions.

CY · May 12
LLMs and Childhood Safety: Identifying Risks and Proposing a Protection Framework for Safe Child-LLM Interaction

Junfeng Jiao, Saleh Afroogh, Kevin Chen et al.

Large Language Models (LLMs) are increasingly embedded in child-facing contexts such as education, companionship, and creative tools, but their deployment raises safety, privacy, developmental, and security risks. We conduct a systematic literature review of child-LLM interaction risks and organize findings into a structured map that separates (i) parent-reported concerns, (ii) empirically documented harms, and (iii) gaps between perceived and observed risk. Moving beyond descriptive listing, we compare how different evidence streams in surveys, incident reports, youth interaction logs, and governance guidance operationalize "harm," where they conflict, and what mitigations they imply. Based on this synthesis, we propose a protection framework that couples child-specific content safety and developmental sensitivity with security-grade controls for adversarial misuse, including prompt injection and multimodal jailbreak pathways. The framework specifies measurable evaluation targets (e.g., harmful-content avoidance, age-calibrated readability, bias parity checks, prompt-injection robustness, and monitoring transparency) to support developers, educators, and policymakers in assessing and improving child-safe LLM deployments.

CY · May 12
LLM Harms: A Taxonomy and Discussion

Kevin Chen, Saleh Afroogh, Abhejay Murali et al.

This study addresses categories of harm surrounding Large Language Models (LLMs) in the field of artificial intelligence. It addresses five categories of harms arising before, during, and after development of AI applications: pre-development, direct output, misuse and malicious application, and downstream application. It underscores the need to define the risks of the current landscape in order to ensure accountability and transparency and to navigate bias when adapting LLMs for practical applications. It proposes mitigation strategies and future directions for specific domains, along with a dynamic auditing system to guide responsible development and integration of LLMs, in a standardized proposal.

CL · May 13
AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue

Jihyung Park, Saleh Afroogh, Junfeng Jiao

Current language models create two safety challenges: risk must be detected early enough to avoid exposing harmful continuation, and the harmfulness itself may be implicit rather than signaled by overtly toxic text. Existing response-level guards are strong at judging completed text, and native streaming guards move closer to token time, but both settings leave open whether a lightweight monitor can anticipate implicit harmful drift from the generator's own internal trajectory. We study anticipatory same-pass monitoring, where a safety monitor may read hidden states produced during ordinary decoding but may not invoke an additional forward pass through the base model. We introduce AERIC, a transfer-oriented hidden-state approach for implicit harmful dialogue that combines short-horizon hazard forecasting, support-sensitive suppression, and prompt-conditioned residual scoring under a same-pass exponential moving average decision rule. The default linear monitor contains only 387 trainable head parameters. Against Qwen3Guard-Stream-4B on balanced benchmarks, AERIC improves AUROC from 0.6830 to 0.7143 on DiaSafety and from 0.8219 to 0.8582 on Harmful Advice. For prompt-level trigger benchmarks, we calibrate the AERIC threshold by a source-side safe-budget rule that maximizes trigger coverage while constraining the safe-trigger rate to at most 10%. Under that rule, trigger@64 reaches 0.6438 and 0.4656 on HarmBench DirectRequest and 0.6849 and 0.7363 on SocialHarmBench for Qwen and Gemma, respectively, withholding between 23.53 and 41.86 answer tokens on average. Same-pass deployment is also efficient: on a 63-prompt harmful-prompt fixed-generation benchmark aggregated over HarmBench DirectRequest and SocialHarmBench under Qwen3-8B, the monitor increases mean latency by only 2.34%, whereas Qwen3Guard-Stream-4B increases it by 79.40%.
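The decision rule and threshold calibration described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each decoded token already has a scalar risk score (in AERIC these would come from a hidden-state head; here they are plain floats), and the function names, the smoothing factor `alpha`, and the 64-token `horizon` are hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence


def ema_values(scores: Sequence[float], alpha: float, horizon: int) -> List[float]:
    """Exponential moving average of per-token risk scores over the first `horizon` tokens."""
    out: List[float] = []
    ema: Optional[float] = None
    for s in scores[:horizon]:
        ema = s if ema is None else alpha * s + (1.0 - alpha) * ema
        out.append(ema)
    return out


def trigger_index(scores: Sequence[float], threshold: float,
                  alpha: float = 0.3, horizon: int = 64) -> Optional[int]:
    """First token index at which the EMA reaches the threshold, or None if it never does."""
    for i, ema in enumerate(ema_values(scores, alpha, horizon)):
        if ema >= threshold:
            return i
    return None


def calibrate_threshold(safe_runs: Sequence[Sequence[float]], budget: float = 0.10,
                        alpha: float = 0.3, horizon: int = 64) -> float:
    """Smallest threshold that lets at most `budget` of the safe runs trigger.

    A lower threshold triggers on more runs, so the smallest admissible
    threshold maximizes coverage under the safe-trigger constraint.
    """
    # A run triggers exactly when its peak EMA reaches the threshold.
    peaks = sorted((max(ema_values(r, alpha, horizon)) for r in safe_runs), reverse=True)
    allowed = math.floor(budget * len(peaks) + 1e-9)  # safe runs permitted to trigger
    if allowed >= len(peaks):
        return float("-inf")
    # Place the threshold just above the first peak that must stay silent.
    return math.nextafter(peaks[allowed], math.inf)


# Ten synthetic "safe" runs with constant scores 0.0, 0.1, ..., 0.9.
safe_runs = [[0.1 * k] * 5 for k in range(10)]
threshold = calibrate_threshold(safe_runs, budget=0.10)

# A synthetic run whose score jumps after the first token.
drifting_run = [0.2] + [0.95] * 7
print(threshold, trigger_index(drifting_run, threshold))
```

Because the rule only reads scores produced as decoding proceeds, the sketch never revisits earlier tokens; the number of tokens emitted before `trigger_index` fires is what determines how much of the answer is withheld.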

CY · Mar 12, 2024
Trust in AI: Progress, Challenges, and Future Directions

Saleh Afroogh, Ali Akbari, Evan Malone et al.

The increasing use of artificial intelligence (AI) systems in our daily life through various applications, services, and products explains the significance of trust/distrust in AI from a user perspective. AI-driven systems (as opposed to other technologies) have ubiquitously diffused into our life, not only as beneficial tools used by human agents but also as substitutive agents acting on our behalf, or as manipulative minds that would influence human thought, decision, and agency. Trust/distrust in AI plays the role of a regulator and could significantly control the level of this diffusion, as trust can increase, and distrust may reduce, the rate of adoption of AI. Recently, a variety of studies have examined the various dimensions of trust/distrust in AI and their relevant considerations. In this systematic literature review, after conceptualizing trust in the current AI literature, we investigate trust in different types of human-machine interaction and its impact on technology acceptance in different domains. In addition, we propose a taxonomy of technical (i.e., safety, accuracy, robustness) and non-technical axiological (i.e., ethical, legal, and mixed) trustworthiness metrics, and some trustworthy measurements. Moreover, we examine some major trust-breakers in AI (e.g., autonomy and dignity threat) and trust-makers, and propose some future directions and probable solutions for the transition to a trustworthy AI.

CY · May 14, 2024
Navigating LLM Ethics: Advancements, Challenges, and Future Directions

Junfeng Jiao, Saleh Afroogh, Yiming Xu et al.

This study addresses ethical issues surrounding Large Language Models (LLMs) within the field of artificial intelligence. It explores the common ethical challenges posed by both LLMs and other AI systems, such as privacy and fairness, as well as ethical challenges uniquely arising from LLMs. It highlights challenges such as hallucination, verifiable accountability, and decoding censorship complexity, which are unique to LLMs and distinct from those encountered in traditional AI systems. The study underscores the need to tackle these complexities to ensure accountability, reduce biases, and enhance transparency in the influential role that LLMs play in shaping information dissemination. It proposes mitigation strategies and future directions for LLM ethics, advocating for interdisciplinary collaboration. It recommends ethical frameworks tailored to specific domains and dynamic auditing systems adapted to diverse contexts. This roadmap aims to guide responsible development and integration of LLMs, envisioning a future where ethical considerations govern AI advancements in society.

HC · Feb 8
AI Empathy Erodes Cognitive Autonomy in Younger Users

Junfeng Jiao, Abhejay Murali, Saleh Afroogh

Affective alignment in generative AI represents a systemic risk to the developmental autonomy of younger users. Although emotional mirroring is commonly seen as a hallmark of advanced human-machine interaction, it can also manifest as affective sycophancy, reinforcing a user's immediate emotional state. By providing a sense of objectivity to transient anxieties, these systems diminish the cognitive friction necessary for independent emotional management and critical thought. Reward models driven by RLHF could heighten this dilemma by embedding adult-focused definitions of helpfulness, unintentionally promoting emotional dependency in younger users rather than facilitating cognitive reappraisal. This paper exposes the misalignment between adult-labeled reward signals and the developmental requirements of younger users, proposing stoic architectures that emphasize functional neutrality to preserve user autonomy.

CL · Dec 5, 2025
Do You Feel Comfortable? Detecting Hidden Conversational Escalation in AI Chatbots

Jihyung Park, Saleh Afroogh, David Atkinson et al.

Large Language Models (LLMs) are increasingly integrated into everyday interactions, serving not only as information assistants but also as emotional companions. Even in the absence of explicit toxicity, repeated emotional reinforcement or affective drift can gradually escalate distress in a form of implicit harm that traditional toxicity filters fail to detect. Existing guardrail mechanisms often rely on external classifiers or clinical rubrics that may lag behind the nuanced, real-time dynamics of a developing conversation. To address this gap, we propose GAUGE (Guarding Affective Utterance Generation Escalation), a logit-based framework for the real-time detection of hidden conversational escalation. GAUGE measures how an LLM's output probabilistically shifts the affective state of a dialogue.

CL · Jan 3, 2025
AGGA: A Dataset of Academic Guidelines for Generative AI and Large Language Models

Junfeng Jiao, Saleh Afroogh, Kevin Chen et al.

This study introduces AGGA, a dataset comprising 80 academic guidelines for the use of Generative AIs (GAIs) and Large Language Models (LLMs) in academic settings, meticulously collected from official university websites. The dataset contains 188,674 words and serves as a valuable resource for natural language processing tasks commonly applied in requirements engineering, such as model synthesis, abstraction identification, and document structure assessment. Additionally, AGGA can be further annotated to function as a benchmark for various tasks, including ambiguity detection, requirements categorization, and the identification of equivalent requirements. Our methodologically rigorous approach ensured a thorough examination, with a selection of universities that represent a diverse range of global institutions, including top-ranked universities across six continents. The dataset captures perspectives from a variety of academic fields, including humanities, technology, and both public and private institutions, offering a broad spectrum of insights into the integration of GAIs and LLMs in academia.