CR, LG · Apr 17

SafeLM: Unified Privacy-Aware Optimization for Trustworthy Federated Large Language Models

arXiv:2604.16606 · 32.5 · h-index: 1

AI Analysis

Provides a unified framework for deploying trustworthy LLMs in high-stakes domains, addressing multiple safety challenges simultaneously.

SafeLM jointly addresses privacy, security, misinformation, and adversarial robustness in LLMs, achieving 98.0% harmful content detection accuracy, 96.9% communication reduction, and lowering gradient inversion PSNR from 31.7 dB to 15.1 dB.
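To put the reported numbers in context, the sketch below works through the standard PSNR formula and two back-of-the-envelope conversions. The function names are hypothetical, and the 1-bit-per-32-bit-float reading of the communication figure is an assumption, not something the summary states.

```python
import math

def psnr(original, reconstructed, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(peak ** 2 / mse)

def mse_ratio(psnr_before_db, psnr_after_db):
    """Factor by which reconstruction MSE grows when PSNR drops (same peak value)."""
    return 10 ** ((psnr_before_db - psnr_after_db) / 10)

def one_bit_reduction(float_bits=32):
    """Fraction of traffic saved if each float coordinate is replaced by 1 bit.

    Assumption for illustration only; the source does not say how its
    communication reduction is computed.
    """
    return 1 - 1 / float_bits

print(round(mse_ratio(31.7, 15.1), 1))      # MSE growth factor for a 16.6 dB drop
print(round(one_bit_reduction() * 100, 3))  # percent saved under the 1-bit assumption
```

With the peak value held fixed, a drop from 31.7 dB to 15.1 dB corresponds to the attacker's reconstruction error (MSE) growing by a factor of roughly 46. Sending one bit in place of each 32-bit float would save 1 - 1/32 = 96.875% of traffic, which rounds to the reported 96.9%, although the summary does not confirm that this is how the figure was obtained.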

Large language models (LLMs) are increasingly deployed in high-stakes domains, yet a unified treatment of their overlapping safety challenges remains lacking. We present SafeLM, a framework that jointly addresses four pillars of LLM safety: privacy, security, misinformation, and adversarial robustness. SafeLM combines federated training with gradient smartification and Paillier encryption for privacy, integrates defenses against training-time and inference-time attacks, employs contrastive grounding with calibrated decoding to reduce hallucinations, and introduces alignment-aware binarized aggregation to enhance robustness while maintaining bounded reconstruction quality. Across benchmarks on factuality, toxicity, and membership inference, SafeLM achieves 98.0% harmful content detection accuracy, reduces communication by 96.9%, and lowers gradient inversion PSNR from 31.7 dB to 15.1 dB. Ablations show that each component contributes independently, whereas their integration yields a strong privacy-utility-efficiency trade-off for deploying trustworthy LLMs.
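The abstract names Paillier encryption as part of the federated privacy layer but gives no details. The toy sketch below shows only the general idea that makes Paillier useful for aggregation: ciphertexts can be multiplied so that the server obtains an encrypted sum without seeing individual updates. The tiny primes, the fixed-point scale, and all function names are assumptions chosen for illustration; this is not SafeLM's implementation and is not secure as written.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier keys with g = n + 1 (primes far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # inverse of L(g^lam mod n^2), which equals lam when g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (taken mod n) with fresh randomness r."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 simplifies to 1 + m*n
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def add(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds plaintexts."""
    return c1 * c2 % (n * n)

def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

def secure_sum(client_grads, scale=1000):
    """Sum per-coordinate gradients under encryption using fixed-point encoding."""
    n, priv = keygen()
    dim = len(client_grads[0])
    totals = []
    for i in range(dim):
        acc = encrypt(n, 0)
        for grad in client_grads:
            acc = add(n, acc, encrypt(n, round(grad[i] * scale)))
        v = decrypt(n, priv, acc)
        if v > n // 2:  # map large residues back to negative values
            v -= n
        totals.append(v / scale)
    return totals

print(secure_sum([[0.25, -0.5], [0.5, 0.125]]))
```

In a real deployment the clients would encrypt locally and only the key holder would decrypt the aggregate; here everything runs in one process purely to show the arithmetic.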
