AI | CY | Jan 5

COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs

arXiv:2601.01836v1 | 1 citation | h-index: 22
Originality: Incremental advance
AI Analysis

This paper addresses the need for policy alignment in enterprise AI deployments, such as healthcare and finance, by introducing a systematic evaluation framework. The advance is incremental because it builds on existing safety evaluation concepts.

The researchers tackled the problem of ensuring large language models adhere to organization-specific policies in high-stakes applications, revealing that models reliably handle legitimate requests (>95% accuracy) but catastrophically fail at enforcing prohibitions, refusing only 13-40% of adversarial denylist violations.

As large language models are deployed in high-stakes enterprise applications, from healthcare to finance, ensuring adherence to organization-specific policies has become essential. Yet existing safety evaluations focus exclusively on universal harms. We present COMPASS (Company/Organization Policy Alignment Assessment), the first systematic framework for evaluating whether LLMs comply with organizational allowlist and denylist policies. We apply COMPASS to eight diverse industry scenarios, generating and validating 5,920 queries that test both routine compliance and adversarial robustness through strategically designed edge cases. Evaluating seven state-of-the-art models, we uncover a fundamental asymmetry: models reliably handle legitimate requests (>95% accuracy) but catastrophically fail at enforcing prohibitions, refusing only 13-40% of adversarial denylist violations. These results demonstrate that current LLMs lack the robustness required for policy-critical deployments, establishing COMPASS as an essential evaluation framework for organizational AI safety.

