Concept Attractors in LLMs and their Applications
The approach offers an efficient, generalizable alternative to fine-tuning for practical LLM applications, though it appears incremental, since it builds on already known patterns in internal representations.
The paper explains why LLMs map semantically related prompts to similar internal representations by modeling the layers as an Iterated Function System with concept-specific attractors. It then develops training-free methods built on these attractors that match or exceed specialized baselines on tasks such as translation and hallucination reduction.
Large language models (LLMs) often map semantically related prompts to similar internal representations at specific layers, even when their surface forms differ widely. We show that this behavior can be explained through Iterated Function Systems (IFS), where layers act as contractive mappings toward concept-specific Attractors. We leverage this insight and develop simple, training-free methods that operate directly on these Attractors to solve a wide range of practical tasks, including language translation, hallucination reduction, guardrailing, and synthetic data generation. Despite their simplicity, these Attractor-based interventions match or exceed specialized baselines, offering an efficient alternative to heavy fine-tuning that generalizes to scenarios where baselines underperform.
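To make the contraction picture concrete, here is a minimal toy sketch in Python. It assumes a 2-D "representation space" and a single affine contraction applied once per layer. The helper names (`contract`, `run_layers`, `nearest_attractor`) and the two concept attractors are hypothetical illustrations, not the paper's method or data. The sketch shows that inputs starting far apart are pulled to the same fixed point. It then labels each result by its nearest attractor, loosely in the spirit of attractor-based interventions such as guardrailing.

```python
import math
import random


def contract(x, scale, center):
    """One toy 'layer' (hypothetical): the affine map x -> center + scale * (x - center).

    With 0 <= scale < 1 the map is contractive, and `center` is its unique
    fixed point, i.e. the attractor of repeated application.
    """
    return [c + scale * (xi - c) for xi, c in zip(x, center)]


def run_layers(x, scale, center, n_layers):
    """Apply the same contraction n_layers times (a toy stand-in for network depth)."""
    for _ in range(n_layers):
        x = contract(x, scale, center)
    return x


def nearest_attractor(x, attractors):
    """Return the name of the attractor closest to x in Euclidean distance."""
    return min(attractors, key=lambda name: math.dist(x, attractors[name]))


# Two hypothetical concept attractors in the toy 2-D space.
ATTRACTORS = {"greeting": [1.0, 2.0], "farewell": [-3.0, 0.5]}

random.seed(0)
# Different "surface forms" of one concept start at scattered points...
starts = [[random.uniform(-10, 10), random.uniform(-10, 10)] for _ in range(5)]
finals = [run_layers(s, 0.5, ATTRACTORS["greeting"], 30) for s in starts]

# ...but after many contractive layers they all sit near the same attractor.
spread = max(math.dist(a, b) for a in finals for b in finals)
print(f"max pairwise distance after 30 layers: {spread:.2e}")
print([nearest_attractor(f, ATTRACTORS) for f in finals])
```

Each step shrinks distances by the factor `scale`, so the gap between any two inputs falls as `scale**n_layers`. This is the Banach fixed-point argument in miniature. Real transformer layers are neither identical nor exactly affine, so the sketch illustrates only the idea of contraction toward an attractor.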