SE · Jun 3

Beyond Single-Policy: Evaluating Composed Organization-Specific Policy Alignment in LLM Chatbots

arXiv:2606.04394 · 74.1
Predicted impact: top 21% in SE · last 90 days · Originality: Incremental advance
AI Analysis

For organizations deploying LLM chatbots in high-stakes domains, this work provides a method to detect composed-policy failures, which are common but previously unaddressed.

The paper identifies that composed-policy violations are prevalent in LLM chatbots but overlooked by existing benchmarks, and introduces COPAL, an automated tool that generates queries to evaluate such alignment. Across 9 models, COPAL queries yield a 33.1% average error rate, highlighting a significant gap in policy alignment.

Large language model (LLM) chatbots are increasingly deployed in organizational settings such as healthcare, finance, and public services. Evaluating policy alignment is therefore critical to reliable chatbot deployment. By analyzing real-world user queries, we find that composed-policy violations are prevalent across chatbots but overlooked by existing benchmarks. This paper presents COPAL, an automated tool for evaluating composed-policy alignment in chatbots. COPAL efficiently generates queries that trigger composed-policy failures via empirically derived interaction patterns and explicit handling contracts. Queries generated by COPAL expose substantial query-handling failures: across 9 served models, composed-policy queries yield a 33.1% error rate on average, indicating that composed-policy alignment warrants further investigation.
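The abstract reports an error rate "on average" across models without saying how the average is taken. As a minimal sketch, assuming an unweighted (macro) average over per-model error rates, the computation could look like the following. The function names and the toy outcomes are hypothetical illustrations, not COPAL's implementation or the paper's data.

```python
from statistics import mean


def error_rate(outcomes):
    """Fraction of queries handled incorrectly.

    `outcomes` is a list of booleans where True marks a handling failure.
    """
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(outcomes) / len(outcomes)


def macro_average_error_rate(results_by_model):
    """Unweighted mean of per-model error rates (an assumed averaging scheme)."""
    return mean(error_rate(outcomes) for outcomes in results_by_model.values())


# Toy, made-up outcomes for three hypothetical models (not results from the paper).
toy_results = {
    "model_a": [True, False, False, False],   # 1 failure in 4 queries
    "model_b": [True, True, False, False],    # 2 failures in 4 queries
    "model_c": [False, False, False, False],  # no failures
}

average = macro_average_error_rate(toy_results)
print(f"{average:.1%}")  # prints 25.0%
```

A macro average weights every model equally regardless of how many queries it received. Pooling all queries before dividing (a micro average) would give a different number when query counts differ across models.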

