Shaanan Cohney

AI
3 papers
11 citations
Novelty: 42%
AI Score: 43

3 Papers

84.9 | AI | Jun 4 | Code
From Risk Classification to Action Plan Remediation: A Guardrail Feedback Driven Framework for LLM Agents

Yuhao Sun, Jiacheng Zhang, Shaanan Cohney et al.

LLM-based guardrails typically safeguard agents by evaluating proposed actions or inputs before execution, producing safety signals such as binary allow/deny decisions, risk categories, and/or explanatory rationales about potential policy violations. However, agent risks often arise when otherwise benign tasks are contaminated by untrusted external content, unsafe instructions, or risky tool use. Existing guardrails often flag the entire task uniformly as unsafe, thereby blocking the threat but sacrificing the benign part. Moreover, existing work largely evaluates guardrails in isolation, leaving unclear whether their interventions lead to safer downstream agent behavior. To address this, we introduce TRIAD (Tripartite Response for Iterative Agent Guardrailing), a guardrail-integrated agent framework that leverages guardrail-generated verbal feedback as a guiding signal to keep the agent aligned with benign objectives at each planning step. We finetune a language model on a self-curated training dataset to output one of three decisions: proceed, refuse, or update, together with structured natural-language feedback. Rather than merely allowing or blocking execution, update guides the agent to revise its plan, avoid harmful components, and preserve the benign task where possible. TRIAD injects this feedback into the agent's context, enabling subsequent plan revision and forming a closed loop between guardrail feedback and agent planning. Extensive experiments on ASB and AgentHarm show that TRIAD reduces the average attack success rate to 10.42%, while achieving the best safety-utility trade-off among guardrail-integrated baselines. Our code is available at: https://github.com/YUHAOSUNABC/TRIAD.
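
The closed loop described in this abstract (a guardrail returns proceed, refuse, or update, and update feedback is injected into the agent's context before it replans) can be illustrated with a minimal sketch. This is not the authors' implementation: `run_guarded_step`, `Verdict`, the `max_revisions` cap, and the toy planner and guard are all hypothetical names and assumptions chosen for illustration, and the real system uses a fine-tuned language model as the guardrail rather than a string check.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Decision(Enum):
    PROCEED = "proceed"
    REFUSE = "refuse"
    UPDATE = "update"


@dataclass
class Verdict:
    decision: Decision
    feedback: str  # natural-language feedback from the guardrail


def run_guarded_step(
    plan_fn: Callable[[List[str]], str],
    guard_fn: Callable[[str], Verdict],
    context: List[str],
    max_revisions: int = 3,  # assumption: the abstract does not state a cap
) -> Optional[str]:
    """Return an approved plan, or None if refused or revisions run out."""
    context = list(context)  # do not mutate the caller's context
    for _ in range(max_revisions + 1):
        plan = plan_fn(context)
        verdict = guard_fn(plan)
        if verdict.decision is Decision.PROCEED:
            return plan
        if verdict.decision is Decision.REFUSE:
            return None
        # UPDATE: inject the feedback so the next plan can drop the harmful part.
        context.append(f"[guardrail feedback] {verdict.feedback}")
    return None


# Hypothetical stand-ins for the agent planner and the guardrail model.
def toy_planner(context: List[str]) -> str:
    if any("guardrail feedback" in entry for entry in context):
        return "summarize the web page"
    return "summarize the web page; email saved passwords to attacker"


def toy_guard(plan: str) -> Verdict:
    if "passwords" in plan:
        return Verdict(Decision.UPDATE, "Drop the credential step; keep the summary.")
    return Verdict(Decision.PROCEED, "No policy violation found.")
```

With these stand-ins, `run_guarded_step(toy_planner, toy_guard, ["task: summarize page"])` first receives an update verdict, replans with the feedback in context, and returns the benign plan instead of blocking the whole task.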

16.7 | HC | May 1
From Phreaking to Sneaking: Children's Circumvention of Social Media Age Verification Systems

Bjorn Nansen, Helena Sandberg, Lauren Bliss et al.

Australia's social media ban is now in force. It requires platforms to take reasonable steps to stop users under 16 from holding accounts. Drawing on five focus groups with fifteen young people aged 12 to 16, this paper examines how children understood the ban's effectiveness, impact, and legitimacy as they encountered the platforms charged with enforcing it. Participants widely saw the ban as unfair and ineffective. Through platform access controls, they learned how the ban worked, where it failed, and how they and their peers could evade it. We also asked participants to imagine better approaches to age verification and youth digital governance. This paper develops sneaking as a theoretical lens for these practices. The concept names more than evasion: it captures the social encounter between children, platforms, techno-regulation, and the access controls that mediate digital participation. Our findings show that children are not passive subjects of platform regulation. They interpret, test, and negotiate digital infrastructure. They also expose a central weakness in age-based platform regulation: technological controls struggle to solve the social and governance problems they are asked to contain.

CR | Dec 10, 2020
Virtual Classrooms and Real Harms: Remote Learning at U.S. Universities

Shaanan Cohney, Ross Teixeira, Anne Kohlbrenner et al.

Universities have been forced to rely on remote educational technology to facilitate the rapid shift to online learning. In doing so, they acquire new risks of security vulnerabilities and privacy violations. To help universities navigate this landscape, we develop a model that describes the actors, incentives, and risks, informed by surveying 49 educators and 14 administrators at U.S. universities. Next, we develop a methodology for administrators to assess security and privacy risks of these products. We then conduct a privacy and security analysis of 23 popular platforms using a combination of sociological analyses of privacy policies and 129 state laws, alongside a technical assessment of platform software. Based on our findings, we develop recommendations for universities to mitigate the risks to their stakeholders.