Robust Planning with Compound LLM Architectures: An LLM-Modulo Approach
This work addresses the robustness of LLM-based planning in applications that require reliable outputs, though it is an incremental improvement over existing compound architectures.
The paper tackles unreliable LLM outputs in planning tasks by introducing the LLM-Modulo framework, which pairs an LLM with sound verifiers so that every output the system returns has been checked for correctness, and it reports significant performance gains across four scheduling domains.
Previous work has attempted to boost Large Language Model (LLM) performance on planning and scheduling tasks through a variety of prompt engineering techniques. While these methods can work within the distributions tested, they are neither robust nor predictable. This limitation can be addressed through compound LLM architectures, in which LLMs work in conjunction with other components to ensure reliability. In this paper, we present a technical evaluation of one such compound architecture, the LLM-Modulo framework. In this framework, an LLM is paired with a complete set of sound verifiers that validate its output, and the LLM is re-prompted whenever a candidate fails verification. This approach ensures that the system never emits an incorrect output, so every output it does produce is guaranteed correct, something previous techniques have not been able to claim. Our results, evaluated across four scheduling domains, demonstrate significant performance gains with the LLM-Modulo framework across a range of models. Additionally, we explore modifications to the base configuration of the framework and assess their impact on overall system performance.
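
The generate, verify, re-prompt loop described above can be illustrated with a minimal sketch. Nothing here is taken from the paper's implementation: the names `llm_modulo`, `make_stub_llm`, `within_hours`, and `no_conflict` are hypothetical, the LLM is replaced by a stub that replays canned answers, and feeding the verifiers' critiques back into the next prompt, as well as returning `None` once a round budget is exhausted, are assumptions made for illustration.

```python
from typing import Callable, List, Optional

# Assumption: a verifier returns a list of critiques; an empty list means "accepted".
Verifier = Callable[[str], List[str]]

BUSY_HOURS = {14}  # hypothetical existing meeting at 14:00


def within_hours(candidate: str) -> List[str]:
    """Toy verifier: the meeting must start between 10:00 and 15:59."""
    hour = int(candidate.split(":")[0])
    return [] if 10 <= hour < 16 else [f"{candidate} is outside 10:00-16:00"]


def no_conflict(candidate: str) -> List[str]:
    """Toy verifier: the meeting must not start in an already busy hour."""
    hour = int(candidate.split(":")[0])
    return [f"{candidate} clashes with an existing meeting"] if hour in BUSY_HOURS else []


def make_stub_llm(answers: List[str]) -> Callable[[str], str]:
    """Stand-in for a real LLM call: replays canned answers and ignores the prompt."""
    remaining = iter(answers)

    def generate(prompt: str) -> str:
        return next(remaining)

    return generate


def llm_modulo(
    prompt: str,
    generate: Callable[[str], str],
    verifiers: List[Verifier],
    max_rounds: int = 10,
) -> Optional[str]:
    """Return a candidate only if every verifier accepts it, else None."""
    current_prompt = prompt
    for _ in range(max_rounds):
        candidate = generate(current_prompt)
        # Collect critiques from all verifiers for this candidate.
        critiques = [c for verify in verifiers for c in verify(candidate)]
        if not critiques:
            return candidate  # verified, so safe to emit
        # Assumption: critiques are appended to the original prompt for the retry.
        current_prompt = (
            f"{prompt}\nPrevious attempt: {candidate}\nProblems found:\n"
            + "\n".join(critiques)
        )
    return None  # budget exhausted: emit nothing rather than an unverified answer


if __name__ == "__main__":
    stub = make_stub_llm(["09:00", "14:00", "11:00"])
    print(llm_modulo("Pick a meeting time", stub, [within_hours, no_conflict]))
```

In this sketch the guarantee comes entirely from the verifiers: the loop can fail to produce an answer, but anything it returns has passed every check. How strong that guarantee is in practice depends on the verifiers being sound and jointly covering all constraints, which the toy checks here do only for this toy problem.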