CL · AI · Oct 16, 2023

NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails

arXiv:2310.10501v1 · 369 citations · h-index: 20 · Has Code
Originality: Incremental advance
AI Analysis

This toolkit addresses the need of developers and providers for controllable and safe LLM applications. Its runtime approach is an incremental advance that builds on existing dialogue management techniques.

The paper tackles the problem of controlling LLM-based conversational systems and keeping them safe. It introduces NeMo Guardrails, an open-source toolkit that lets developers add programmable, interpretable guardrails that are independent of the underlying LLM. Initial results demonstrate its applicability across multiple LLM providers.

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems. Guardrails (or rails for short) are a specific way of controlling the output of an LLM, such as not talking about topics considered harmful, following a predefined dialogue path, using a particular language style, and more. There are several mechanisms that allow LLM providers and developers to add guardrails that are embedded into a specific model at training, e.g. using model alignment. In contrast, NeMo Guardrails uses a runtime inspired by dialogue management that allows developers to add programmable rails to LLM applications; these rails are user-defined, independent of the underlying LLM, and interpretable. Our initial results show that the proposed approach can be used with several LLM providers to develop controllable and safe LLM applications using programmable rails.
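
To make the idea of runtime rails concrete, here is a minimal sketch in plain Python of checks that sit outside the model and can wrap any LLM. It is an illustration of the general pattern only, not the NeMo Guardrails API: the names `Rail`, `block_topics`, `GuardedLLM`, and `echo_llm` are hypothetical, and the simple keyword match stands in for whatever logic a real rail would use.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical types for illustration; these are NOT the NeMo Guardrails API.
LLM = Callable[[str], str]             # any model: prompt in, reply out
Rail = Callable[[str], Optional[str]]  # returns a refusal message, or None to allow


def block_topics(topics: List[str], refusal: str) -> Rail:
    """Build a rail that refuses any text mentioning one of the given topics."""
    lowered = [t.lower() for t in topics]

    def rail(text: str) -> Optional[str]:
        # Assumption: a case-insensitive substring match is enough for the sketch.
        if any(t in text.lower() for t in lowered):
            return refusal
        return None

    return rail


@dataclass
class GuardedLLM:
    """Wraps an arbitrary LLM callable with user-defined input and output rails."""
    llm: LLM
    input_rails: List[Rail]
    output_rails: List[Rail]

    def generate(self, prompt: str) -> str:
        # Input rails run before the model is called.
        for rail in self.input_rails:
            verdict = rail(prompt)
            if verdict is not None:
                return verdict
        reply = self.llm(prompt)
        # Output rails run on the model's reply before it reaches the user.
        for rail in self.output_rails:
            verdict = rail(reply)
            if verdict is not None:
                return verdict
        return reply


def echo_llm(prompt: str) -> str:
    """Stand-in model so the sketch runs without any LLM provider."""
    return f"You said: {prompt}"
```

Because `GuardedLLM` only needs a callable that maps a prompt to a reply, the same rails could be reused when `echo_llm` is swapped for a client of any provider, which mirrors the independence from the underlying LLM described above. Each rail is an ordinary function, so its behaviour can be read and tested directly.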

Code Implementations: 1 repo
