AISE Mar 18, 2025

Inference-Time Intervention in Large Language Models for Reliable Requirement Verification

arXiv:2503.14130v1
h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses reliable requirement verification for systems engineers. It offers a dynamic control method that is an incremental advance over existing fine-tuning and prompting approaches.

The paper tackles the challenge of steering Large Language Models (LLMs) with the precision that engineering applications require, using inference-time intervention to automate requirement verification in Model-Based Systems Engineering (MBSE). Combined with self-consistency, the method produces robust, reliable outputs and achieves perfect precision on a holdout test set.
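To make the self-consistency step concrete, here is a minimal sketch of sampling several verdicts and accepting one only when enough samples agree. It is an illustration under stated assumptions, not the paper's procedure: the function name `self_consistent_verdict`, the `sample_verdict` callable (standing in for one LLM call), the verdict strings, and the `min_agreement` threshold are all hypothetical, since this summary does not describe the voting rule used.

```python
from collections import Counter
from typing import Callable


def self_consistent_verdict(
    sample_verdict: Callable[[], str],
    n_samples: int = 5,
    min_agreement: float = 1.0,
) -> str:
    """Sample several verdicts and return the majority one if agreement is high enough.

    sample_verdict is a hypothetical stand-in for one stochastic LLM call that
    returns a verdict string such as "fulfilled" or "not fulfilled".
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")

    # Count how often each verdict appears across the independent samples.
    votes = Counter(sample_verdict() for _ in range(n_samples))
    verdict, count = votes.most_common(1)[0]

    # Accept the majority verdict only when enough samples agree; otherwise abstain.
    if count / n_samples >= min_agreement:
        return verdict
    return "undetermined"
```

Requiring unanimous agreement (`min_agreement=1.0`) trades coverage for precision: the sketch abstains whenever the samples disagree, which is one plausible way a voting step could favor precision.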

Steering the behavior of Large Language Models (LLMs) remains a challenge, particularly in engineering applications where precision and reliability are critical. While fine-tuning and prompting methods can modify model behavior, they lack the dynamic and exact control necessary for engineering applications. Inference-time intervention techniques provide a promising alternative, allowing targeted adjustments to LLM outputs. In this work, we demonstrate how interventions enable fine-grained control for automating the usually time-intensive requirement verification process in Model-Based Systems Engineering (MBSE). Using two early-stage Capella SysML models of space missions with associated requirements, we apply the intervened LLMs to reason over a graph representation of the model to determine whether a requirement is fulfilled. Our method achieves robust and reliable outputs, significantly improving over both a baseline model and a fine-tuning approach. By identifying and modifying as few as one to three specialised attention heads, we can significantly change the model's behavior. When combined with self-consistency, this allows us to achieve perfect precision on our holdout test set.
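The idea of modifying a small number of attention heads can be illustrated with a toy activation shift. The sketch below assumes the intervention adds a scaled steering direction to the output of each selected head, which is one common form of inference-time intervention; the abstract does not state the exact mechanism, how heads are identified, or how directions are computed, so `intervene_on_heads`, `directions`, and `alpha` are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def intervene_on_heads(
    head_outputs: Sequence[Vector],
    directions: Dict[int, Vector],
    alpha: float,
) -> List[Vector]:
    """Return a copy of the per-head outputs with selected heads shifted.

    head_outputs: one output vector per attention head (toy data, not a real model).
    directions:   assumed steering direction for each head index to modify.
    alpha:        assumed intervention strength.
    """
    shifted: List[Vector] = []
    for index, output in enumerate(head_outputs):
        if index in directions:
            direction = directions[index]
            if len(direction) != len(output):
                raise ValueError("direction and head output must have the same length")
            # Shift this head's output along its steering direction.
            shifted.append([o + alpha * d for o, d in zip(output, direction)])
        else:
            # Heads that are not selected pass through unchanged.
            shifted.append(list(output))
    return shifted


# Toy example: three heads, only head 1 is modified.
heads = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
steered = intervene_on_heads(heads, {1: [1.0, -1.0]}, alpha=2.0)
```

In a real model the shift would be applied inside the forward pass, for example through a hook on the attention layer, and only to the one to three heads the abstract mentions; the toy version keeps just the arithmetic.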

Code Implementations: 1 repo