Scalable multilingual PII annotation for responsible AI in LLMs
This work addresses the need for responsible AI in LLMs by improving PII annotation for underrepresented locales. The contribution is incremental: it builds on existing annotation methodologies, with a focus on scalability and quality.
The paper tackles the problem of reliable handling of Personally Identifiable Information (PII) in Large Language Models (LLMs) across diverse regulatory contexts. It introduces a scalable multilingual data curation framework for high-quality PII annotation across 13 underrepresented locales, and reports substantial improvements in recall and false positive rates over the course of the pilot, training, and production phases.
As Large Language Models (LLMs) gain wider adoption, ensuring their reliable handling of Personally Identifiable Information (PII) across diverse regulatory contexts has become essential. This work introduces a scalable multilingual data curation framework designed for high-quality PII annotation across 13 underrepresented locales, covering approximately 336 locale-specific PII types. Our phased, human-in-the-loop annotation methodology combines linguistic expertise with rigorous quality assurance, leading to substantial improvements in recall and false positive rates over the course of the pilot, training, and production phases. By leveraging inter-annotator agreement metrics and root-cause analysis, the framework systematically uncovers and resolves annotation inconsistencies, resulting in high-fidelity datasets suitable for supervised LLM fine-tuning. Beyond reporting empirical gains, we highlight common annotator challenges in multilingual PII labeling and demonstrate how iterative, analytics-driven pipelines can enhance both annotation quality and downstream model reliability.
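The quality signals named in the abstract (recall, false positive rate, and inter-annotator agreement) can be illustrated with a minimal sketch. The abstract does not specify how these are computed, so everything here is an assumption: annotations are treated as token-level label sequences, `"O"` marks non-PII tokens, Cohen's kappa stands in for the unspecified agreement metric, and the function names `recall_and_fpr` and `cohen_kappa` are hypothetical.

```python
from collections import Counter


def recall_and_fpr(gold, pred, negative="O"):
    """Token-level recall and false positive rate against a gold labeling.

    Assumption: a PII token counts as recalled only if the predicted
    label matches the gold PII type exactly.
    """
    tp = fp = fn = tn = 0
    for g, p in zip(gold, pred):
        if g != negative and p == g:
            tp += 1          # PII token labeled with the correct type
        elif g != negative:
            fn += 1          # PII token missed or given the wrong type
        elif p != negative:
            fp += 1          # non-PII token flagged as PII
        else:
            tn += 1          # non-PII token correctly left alone
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators over the same tokens."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(count_a[l] * count_b[l] for l in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical labels for a six-token sentence.
    gold = ["O", "NAME", "NAME", "O", "PHONE", "O"]
    annotator = ["O", "NAME", "O", "EMAIL", "PHONE", "O"]
    print(recall_and_fpr(gold, annotator))
    print(cohen_kappa(gold, annotator))
```

Tracking these numbers per locale and per phase is one way disagreements could be surfaced for root-cause analysis. A span-level or type-weighted variant may be closer to what the authors actually used.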