CL, LG | Sep 23, 2025

Silent Tokens, Loud Effects: Padding in LLMs

arXiv:2510.01238v2 | 3 citations | h-index: 6 | Has Code
Originality: Incremental advance
AI Analysis

This research highlights a robustness risk in LLM deployment: padding that is not fully masked can compromise model reliability and safety for both developers and users. The contribution is incremental, since it focuses on one specific implementation detail.

The study investigates how padding tokens, often assumed to be neutral, can affect large language models (LLMs) when they are not fully masked. Padding shifts activations, degrades generation quality, alters bias, and weakens safety measures. Even small amounts cause measurable effects across the Llama, Gemma, and Qwen model families.

Padding tokens are widely used in large language models (LLMs) to equalize sequence lengths during batched inference. While they should be fully masked, implementation errors can cause them to influence computation, and the extent of this influence is not well understood. We systematically study this effect across three open-source model families (Llama, Gemma, Qwen), inserting controlled amounts of padding and evaluating outcomes along four axes: activations, generation quality, bias, and safety. Even small amounts of padding shift hidden representations, degrade quality in smaller models, alter bias in unpredictable ways, and weaken safety guardrails. These findings demonstrate that padding is not a harmless detail but a robustness risk that must be carefully handled in deployment.
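To make the masking point concrete, here is a minimal toy sketch in pure Python. It is not the paper's experimental setup: the vectors, the `attend` and `max_shift` helpers, and the left-padding layout are all assumptions chosen for illustration. It shows a single attention query whose output is unchanged when padding positions are masked, and which shifts when the mask is omitted.

```python
import math

def softmax(scores):
    # Numerically stable softmax; exp(-inf) evaluates to 0.0 for masked slots.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values, mask=None):
    """Single-query scaled dot-product attention (hypothetical helper).

    mask[i] == False hides position i from the query.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    if mask is not None:
        scores = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def max_shift(a, b):
    # Largest absolute per-dimension difference between two output vectors.
    return max(abs(x - y) for x, y in zip(a, b))

# Toy embeddings (assumed values): two real tokens and one padding vector.
real = [[1.0, 0.0], [0.0, 1.0]]
pad = [0.5, 0.5]
query = real[-1]  # the last real token attends over the sequence

# Reference output with no padding at all.
baseline = attend(query, real, real)

# Left-pad with two padding tokens, as is common in batched generation.
padded = [pad, pad] + real
masked = attend(query, padded, padded, mask=[False, False, True, True])
unmasked = attend(query, padded, padded)  # the implementation error: no mask

print("shift with mask:   ", max_shift(baseline, masked))
print("shift without mask:", max_shift(baseline, unmasked))
```

In this sketch the masked output matches the unpadded baseline, while the unmasked output does not, because the padding vectors receive nonzero attention weight. Real models add positional encodings, multiple layers, and normalization, so the size of the effect in practice is an empirical question of the kind the paper measures.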

