Expert-guided Clinical Text Augmentation via Query-Based Model Collaboration
The proposed framework addresses safety concerns in healthcare applications, where incorrect synthetic data could have serious consequences.
The paper tackles the risk that LLMs generate clinically incorrect information when used for data augmentation in healthcare. It proposes a query-based model collaboration framework that integrates expert knowledge. Experiments show that this approach outperforms existing LLM augmentation methods while reducing factual errors.
Data augmentation is a widely used strategy to improve model robustness and generalization by enriching training datasets with synthetic examples. While large language models (LLMs) have demonstrated strong generative capabilities for this purpose, their application in high-stakes domains like healthcare presents unique challenges due to the risk of generating clinically incorrect or misleading information. In this work, we propose a novel query-based model collaboration framework that integrates expert-level domain knowledge to guide the augmentation process so that critical medical information is preserved. Experiments on clinical prediction tasks demonstrate that our lightweight collaboration-based approach consistently outperforms existing LLM augmentation methods while improving safety through reduced factual errors. This framework addresses the gap between the potential of LLM-based augmentation and the safety requirements of specialized domains.