CL AI, Aug 25, 2025

Backprompting: Leveraging Synthetic Production Data for Health Advice Guardrails

Kellen Tan Cheng, Anna Lisa Gentile, Chad DeLuca, Guang-Jie Ren

arXiv:2508.18384v1, 1 citation, h-index: 3

Originality: Incremental advance

AI Analysis

The paper addresses data scarcity in building enterprise LLM safety guardrails for health advice. The advance is incremental because it builds on existing guardrail methods.

The paper tackles the challenge of acquiring production-quality labeled data for guardrails that filter LLM outputs, focusing on health advice. It proposes backprompting to generate synthetic data and a sparse human-in-the-loop clustering technique to label that data. The resulting detector outperforms GPT-4o by up to 3.73% with 400x fewer parameters.

The pervasiveness of large language models (LLMs) in enterprise settings has also brought a significant number of risks associated with their usage. Guardrail technologies aim to mitigate these risks by filtering LLMs' input/output text through various detectors. However, developing and maintaining robust detectors faces many challenges, one of which is the difficulty of acquiring production-quality labeled data on real LLM outputs prior to deployment. In this work, we propose backprompting, a simple yet intuitive solution to generate production-like labeled data for health advice guardrail development. Furthermore, we pair our backprompting method with a sparse human-in-the-loop clustering technique to label the generated data. Our aim is to construct a parallel corpus roughly representative of the original dataset yet resembling real LLM output. We then infuse existing datasets with our synthetic examples to produce robust training data for our detector. We test our technique on one of the most difficult and nuanced guardrails, the identification of health advice in LLM output, and demonstrate improvement over other solutions. Our detector outperforms GPT-4o by up to 3.73%, despite having 400x fewer parameters.
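
The two steps named in the abstract, backprompting and sparse human-in-the-loop labeling over clusters, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the prompt wording, the clustering algorithm, or the labeling rule. The `llm` callable, the prompt template, the function names `backprompt` and `propagate_labels`, and the majority-vote rule are all hypothetical, and cluster assignments are assumed to come from any off-the-shelf clustering step.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Optional, Sequence


def backprompt(texts: Sequence[str], llm: Callable[[str], str]) -> List[str]:
    """Build a parallel corpus of LLM-style outputs from existing texts.

    Assumption: for each source text we ask a (hypothetical) LLM for a
    prompt that could have produced it, then ask the LLM to answer that
    prompt. The template below is illustrative only.
    """
    synthetic = []
    for text in texts:
        prompt = llm(
            "Write a user question that the following text would answer:\n"
            + text
        )
        synthetic.append(llm(prompt))
    return synthetic


def propagate_labels(
    cluster_ids: Sequence[int],
    human_labels: Dict[int, str],
) -> List[Optional[str]]:
    """Spread a sparse set of human labels to whole clusters.

    cluster_ids[i] is the cluster of example i (from any clustering step).
    human_labels maps the few reviewed example indices to their labels.
    Assumption: each cluster takes the majority label of its reviewed
    members; clusters with no reviewed member stay unlabeled (None).
    """
    votes: Dict[int, Counter] = defaultdict(Counter)
    for idx, label in human_labels.items():
        votes[cluster_ids[idx]][label] += 1
    return [
        votes[c].most_common(1)[0][0] if c in votes else None
        for c in cluster_ids
    ]
```

For example, with five synthetic texts in clusters `[0, 0, 1, 1, 2]` and a reviewer labeling only examples 0 and 3, `propagate_labels` fills in the labels for clusters 0 and 1 and leaves the unreviewed cluster 2 for a later review pass.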
