CL, Oct 9, 2025

Role-Conditioned Refusals: Evaluating Access Control Reasoning in Large Language Models

arXiv:2510.07642v1, 3 citations, h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses access control, a core secure-computing concern, for users who rely on LLMs. The advance is incremental because it builds on existing datasets and methods.

The paper tackles the problem of large language models (LLMs) producing unrestricted responses that blur role boundaries in access control. It evaluates role-conditioned refusals: whether a model answers when a role is authorized and refuses when it is not. Explicit verification improved refusal precision and lowered false permits, fine-tuning achieved a stronger balance between safety and utility, and longer, more complex policies reduced reliability across all systems.

Access control is a cornerstone of secure computing, yet large language models often blur role boundaries by producing unrestricted responses. We study role-conditioned refusals, focusing on the LLM's ability to adhere to access control policies by answering when authorized and refusing when not. To evaluate this behavior, we created a novel dataset that extends the Spider and BIRD text-to-SQL datasets, both of which have been modified with realistic PostgreSQL role-based policies at the table and column levels. We compare three designs: (i) zero- or few-shot prompting, (ii) a two-step generator-verifier pipeline that checks SQL against policy, and (iii) LoRA fine-tuned models that learn permission awareness directly. Across multiple model families, explicit verification (the two-step framework) improves refusal precision and lowers false permits. At the same time, fine-tuning achieves a stronger balance between safety and utility (i.e., when considering execution accuracy). Longer and more complex policies consistently reduce the reliability of all systems. We release RBAC-augmented datasets and code.
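
To make the verifier step and the two headline metrics concrete, here is a minimal sketch in Python. It is not the paper's implementation. It assumes the (table, column) pairs referenced by a generated SQL query have already been extracted (in practice a SQL parser would do this), and it assumes the following definitions: refusal precision is the share of refusals that were issued for genuinely unauthorized requests, and the false permit rate is the share of unauthorized requests that were answered anyway. The names `RolePolicy`, `verify`, and `refusal_metrics` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RolePolicy:
    """Hypothetical role policy: which columns of which tables a role may read."""
    role: str
    # table name -> set of readable column names; "*" grants every column
    grants: dict = field(default_factory=dict)

    def allows(self, table: str, column: str) -> bool:
        cols = self.grants.get(table)
        if cols is None:  # no table-level grant at all
            return False
        return "*" in cols or column in cols


def verify(policy: RolePolicy, referenced: list) -> tuple:
    """Check the (table, column) pairs a query touches against the policy.

    Returns ("permit", []) when every pair is granted, otherwise
    ("refuse", violations) listing the pairs that are not granted.
    """
    violations = [(t, c) for t, c in referenced if not policy.allows(t, c)]
    return ("refuse", violations) if violations else ("permit", [])


def refusal_metrics(decisions: list, authorized: list) -> dict:
    """Compute refusal precision and false permit rate (assumed definitions).

    decisions:  "permit" or "refuse" per example, as produced by the system
    authorized: True if the role was actually allowed to run that query
    """
    pairs = list(zip(decisions, authorized))
    refusals = sum(1 for d, _ in pairs if d == "refuse")
    correct_refusals = sum(1 for d, a in pairs if d == "refuse" and not a)
    unauthorized = sum(1 for _, a in pairs if not a)
    false_permits = sum(1 for d, a in pairs if d == "permit" and not a)
    return {
        "refusal_precision": correct_refusals / refusals if refusals else 0.0,
        "false_permit_rate": false_permits / unauthorized if unauthorized else 0.0,
    }


# Illustrative policy: a nurse may read two columns of patients, nothing else.
nurse = RolePolicy("nurse", {"patients": {"name", "ward"}})
ok_decision = verify(nurse, [("patients", "name")])
bad_decision = verify(nurse, [("patients", "ssn"), ("billing", "amount")])
metrics = refusal_metrics(
    ["permit", "refuse", "refuse", "permit"],
    [True, False, True, False],
)
```

In this sketch a missing table grant and a missing column grant both lead to refusal, mirroring the table-level and column-level policies described above. A real verifier would also need to resolve aliases, wildcards such as `SELECT *`, and subqueries before checking.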

