CL AI · Oct 15, 2025

Protect: Towards Robust Guardrailing Stack for Trustworthy Enterprise LLM Systems

Karthik Avinash, Nikhil Pareek, Rishav Hada

arXiv:2510.13351v1 · 3 citations · h-index: 9

Originality: Highly original

AI Analysis

This addresses the need for robust, multi-modal safety systems in regulated enterprise environments, representing a strong specific gain rather than a foundational advancement.

The paper tackled the problem of ensuring safety, reliability, and compliance in enterprise LLM systems by introducing Protect, a multi-modal guardrailing model that achieved state-of-the-art performance across toxicity, sexism, data privacy, and prompt injection dimensions, surpassing models like WildGuard, LlamaGuard-4, and GPT-4.1.

The increasing deployment of Large Language Models (LLMs) across enterprise and mission-critical domains has underscored the urgent need for robust guardrailing systems that ensure safety, reliability, and compliance. Existing solutions often struggle with real-time oversight, multi-modal data handling, and explainability, limitations that hinder their adoption in regulated environments. Existing guardrails largely operate in isolation and focus on text alone, making them inadequate for multi-modal, production-scale environments. We introduce Protect, a natively multi-modal guardrailing model designed for enterprise-grade deployment that operates seamlessly across text, image, and audio inputs. Protect integrates fine-tuned, category-specific adapters trained via Low-Rank Adaptation (LoRA) on an extensive, multi-modal dataset covering four safety dimensions: toxicity, sexism, data privacy, and prompt injection. Our teacher-assisted annotation pipeline leverages reasoning and explanation traces to generate high-fidelity, context-aware labels across modalities. Experimental results demonstrate state-of-the-art performance across all safety dimensions, surpassing existing open and proprietary models such as WildGuard, LlamaGuard-4, and GPT-4.1. Protect establishes a strong foundation for trustworthy, auditable, and production-ready safety systems capable of operating across text, image, and audio modalities.
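To make the idea of category-specific LoRA adapters concrete, the following is a minimal sketch of the general LoRA technique, not the paper's implementation. It keeps one frozen base weight matrix and one low-rank pair (A, B) per safety category, and computes W x + (alpha / r) B A x. The dimensions, the `alpha` value, and all function and variable names are assumptions chosen for illustration.

```python
import random

# The four safety dimensions named in the abstract.
CATEGORIES = ["toxicity", "sexism", "data_privacy", "prompt_injection"]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def make_adapter(d_out, d_in, rank, rng, alpha=16.0):
    # Common LoRA initialisation: A is small random, B is zero,
    # so a freshly created adapter leaves the base output unchanged.
    A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    B = [[0.0] * rank for _ in range(d_out)]
    return {"A": A, "B": B, "alpha": alpha, "rank": rank}


def lora_forward(W, adapter, x):
    """Return W x + (alpha / rank) * B (A x) for one adapter."""
    base = matvec(W, x)
    low = matvec(adapter["A"], x)        # project down to `rank` dims
    delta = matvec(adapter["B"], low)    # project back up to d_out dims
    scale = adapter["alpha"] / adapter["rank"]
    return [b + scale * d for b, d in zip(base, delta)]


# Hypothetical toy sizes; real models use far larger dimensions.
rng = random.Random(0)
d_in, d_out, rank = 4, 3, 2
W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]

# One adapter per category, all sharing the same frozen base weights W.
adapters = {c: make_adapter(d_out, d_in, rank, rng) for c in CATEGORIES}
x = [1.0, 0.5, -0.5, 2.0]
```

In this sketch, selecting a safety dimension amounts to choosing which adapter to pass to `lora_forward`; only the small A and B matrices would be trained per category while `W` stays fixed.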
