CL, Jan 11, 2025

Scaling Down Semantic Leakage: Investigating Associative Bias in Smaller Language Models

arXiv:2501.06638v1 | 2 citations | h-index: 1 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the control of undesirable biases in language models, a concern for AI safety and fairness, but its contribution is incremental: it extends prior research on semantic leakage to smaller models.

The study investigates whether smaller language models (500M to 7B parameters) exhibit less semantic leakage, that is, unexpected associations carried over from training data, than larger models. It finds that smaller models generally leak less, though the trend is not strictly linear.

Semantic leakage is a phenomenon recently introduced by Gonen et al. (2024). It refers to a situation in which associations learnt from the training data emerge in language model generations in an unexpected and sometimes undesired way. Prior work has focused on leakage in large language models (7B+ parameters). In this study, I use the Qwen2.5 model family to explore whether smaller models, ranging from 500M to 7B parameters, demonstrate less semantic leakage due to their limited capacity for capturing complex associations. Building on the dataset from Gonen et al. (2024), I introduce a new dataset of color-focused prompts, categorized into specific types of semantic associations, to systematically evaluate the models' performance. Results indicate that smaller models exhibit less semantic leakage overall, although this trend is not strictly linear: medium-sized models sometimes leak more than larger ones. The dataset, the model generations, and the evaluation code are publicly available at https://github.com/smilni/semantic_leakage_project.
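
The abstract does not say how leakage is scored, so the following is only a minimal sketch of one plausible setup, not the paper's evaluation code. It assumes a leak-rate style comparison: for each prompt, a generation from a control prompt (without the concept, e.g. a color) is compared with a generation from a test prompt (with the concept), and a leak is counted when the test generation is more similar to the concept. The function names `leak_rate` and `token_overlap` are hypothetical, and the toy token-overlap similarity is a stand-in for whatever embedding-based score a real evaluation would use.

```python
from typing import Callable, Sequence


def leak_rate(
    concepts: Sequence[str],
    control_generations: Sequence[str],
    test_generations: Sequence[str],
    similarity: Callable[[str, str], float],
) -> float:
    """Percentage of prompts whose test generation is closer to the concept
    than the control generation is. Ties count as half a leak (an assumption)."""
    if not (len(concepts) == len(control_generations) == len(test_generations)):
        raise ValueError("inputs must have equal length")
    if not concepts:
        raise ValueError("need at least one example")

    score = 0.0
    for concept, control, test in zip(concepts, control_generations, test_generations):
        sim_test = similarity(concept, test)
        sim_control = similarity(concept, control)
        if sim_test > sim_control:
            score += 1.0
        elif sim_test == sim_control:
            score += 0.5
    return 100.0 * score / len(concepts)


def token_overlap(a: str, b: str) -> float:
    """Toy Jaccard similarity over lowercase whitespace tokens.
    Hypothetical stand-in for an embedding-based similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    # Made-up illustrative strings, not data from the paper.
    concepts = ["yellow", "red"]
    control = ["a salad", "a car"]
    test = ["a yellow banana", "a car"]
    print(leak_rate(concepts, control, test, token_overlap))
```

Under this convention a score near 50 would mean the concept has no systematic effect on generations, and higher scores would indicate more leakage, which is the kind of quantity one could compare across model sizes.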
