Neuro-Symbolic Verification on Instruction Following of LLMs
This addresses the critical issue of unreliable instruction-following in LLMs for applications like agentic workflows, where violations can cause failures, though it is an incremental improvement over existing verification methods.
The paper tackles the problem of verifying whether Large Language Models (LLMs) follow instructions by introducing NSVIF, a neuro-symbolic framework that models instructions as constraints and solves them with a unified solver. Experiments show NSVIF significantly outperforms LLM-based approaches on a new benchmark, VIFBENCH, and provides interpretable feedback that improves LLMs' instruction-following without post-training.
A fundamental problem of applying Large Language Models (LLMs) to important applications is that LLMs do not always follow instructions, and violations are often hard to observe or check. In LLM-based agentic workflows, such violations can propagate and amplify along reasoning chains, causing task failures and system incidents. This paper presents NSVIF, a neuro-symbolic framework for verifying whether an LLM's output follows the instructions used to prompt the LLM. NSVIF is a universal, general-purpose verifier; it makes no assumption about the instruction or the LLM. NSVIF formulates instruction-following verification as a constraint-satisfaction problem by modeling user instructions as constraints. NSVIF models both logical and semantic constraints; constraint solving is done by a unified solver that orchestrates logical reasoning and semantic analysis. To evaluate NSVIF, we develop VIFBENCH, a new benchmark for instruction-following verifiers with fine-grained data labels. Experiments show that NSVIF significantly outperforms LLM-based approaches and provides interpretable feedback. We also show that feedback from NSVIF helps improve LLMs' instruction-following capability without post-training.