The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models
This work addresses the lack of clear paradigms for LLM alignment. It offers researchers and practitioners a conceptual tool for improving transparency and evaluation, though the contribution is incremental.
The paper tackles the vague concept of "alignment" in large language models by analysing it as an "empty signifier", a notion drawn from socio-political theory. It proposes a framework that clarifies how alignment is operationalised in datasets along two axes: which dimensions of model behaviour are considered important, and how those dimensions are defined and by whom.
In this paper, we examine the concept of "alignment" in large language models (LLMs) through the lens of post-structuralist socio-political theory, focusing on its parallels to empty signifiers. To establish a shared vocabulary for how abstract concepts of alignment are operationalised in empirical datasets, we propose a framework that demarcates: 1) which dimensions of model behaviour are considered important, and 2) how meanings and definitions are ascribed to these dimensions, and by whom. We situate the existing empirical literature within this framework and provide guidance on deciding which paradigm to follow. Through this framework, we aim to foster a culture of transparency and critical evaluation, aiding the community in navigating the complexities of aligning LLMs with human populations.