LG AI CR | Jan 21, 2025

Adaptive PII Mitigation Framework for Large Language Models

Shubhi Asthana, Ruchi Mahindru, Bing Zhang, Jorge Sanz

arXiv:2501.12465v1 | 11.47 citations | h-index: 10

Originality: Incremental advance

AI Analysis

This paper addresses privacy compliance problems for enterprises using LLMs and offers a scalable solution, though it appears incremental because it applies existing NLP techniques to a specific regulatory bottleneck.

The paper tackles the challenge of ensuring regulatory compliance for Large Language Models (LLMs) regarding personal data protection under laws like GDPR and CCPA, by introducing an adaptive system for mitigating Personally Identifiable Information (PII) and Sensitive Personal Information (SPI). The system achieved an F1 score of 0.95 for Passport Numbers, outperforming existing tools, and an average user trust score of 4.6/5 in evaluations.

Artificial Intelligence (AI) faces growing challenges from evolving data protection laws and enforcement practices worldwide. Regulations like GDPR and CCPA impose strict compliance requirements on Machine Learning (ML) models, especially concerning personal data use. These laws grant individuals rights such as data correction and deletion, complicating the training and deployment of Large Language Models (LLMs) that rely on extensive datasets. Public data availability does not guarantee its lawful use for ML, amplifying these challenges. This paper introduces an adaptive system for mitigating risk of Personally Identifiable Information (PII) and Sensitive Personal Information (SPI) in LLMs. It dynamically aligns with diverse regulatory frameworks and integrates seamlessly into Governance, Risk, and Compliance (GRC) systems. The system uses advanced NLP techniques, context-aware analysis, and policy-driven masking to ensure regulatory compliance. Benchmarks highlight the system's effectiveness, with an F1 score of 0.95 for Passport Numbers, outperforming tools like Microsoft Presidio (0.33) and Amazon Comprehend (0.54). In human evaluations, the system achieved an average user trust score of 4.6/5, with participants acknowledging its accuracy and transparency. Observations demonstrate stricter anonymization under GDPR compared to CCPA, which permits pseudonymization and user opt-outs. These results validate the system as a scalable and robust solution for enterprise privacy compliance.
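The abstract does not say how policy-driven masking is implemented, so the following is only a minimal sketch of the general idea, not the authors' system. It assumes a hypothetical passport pattern (one letter followed by eight digits), a hypothetical `POLICIES` table, and plain regex matching in place of the NLP and context-aware analysis the paper describes. The mapping of GDPR to irreversible redaction and CCPA to pseudonymization follows the observation reported in the abstract.

```python
import hashlib
import re

# Hypothetical pattern: one letter followed by eight digits. Real passport
# formats vary by country; this is only for illustration.
PASSPORT_RE = re.compile(r"\b[A-Z]\d{8}\b")

# Hypothetical policy table: which masking action each regulation triggers.
POLICIES = {
    "GDPR": "anonymize",     # irreversible redaction
    "CCPA": "pseudonymize",  # stable token in place of the raw value
}


def pseudonym(value: str, salt: str = "demo-salt") -> str:
    """Return a stable token derived from a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return f"<PASSPORT_{digest[:8]}>"


def mask_passports(text: str, regulation: str) -> str:
    """Mask passport-like tokens according to the selected regulation."""
    action = POLICIES[regulation]
    if action == "anonymize":
        # Every match collapses to the same placeholder.
        return PASSPORT_RE.sub("<PASSPORT>", text)
    # Each distinct value maps to its own repeatable token.
    return PASSPORT_RE.sub(lambda m: pseudonym(m.group(0)), text)
```

Calling `mask_passports(text, "GDPR")` replaces every match with the same `<PASSPORT>` placeholder, while `mask_passports(text, "CCPA")` gives each distinct value its own repeatable token, so records can still be linked without exposing the raw number. A production system would need a properly managed secret in place of the fixed demo salt.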
