CL · May 12

StoicLLM: Preference Optimization for Philosophical Alignment in Small Language Models

Ishmam Khan, Sindhuja Thogarrati, Shuo Zhang

arXiv:2605.11483 · 71.4

Predicted impact: top 40% in CL · last 90 days
Originality: Synthesis-oriented

AI Analysis

For researchers in AI alignment and philosophy, this work demonstrates the feasibility of micro-dataset philosophical alignment while identifying a persistent representational bottleneck in small models.

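The paper shows that preference optimization on just 300 high-fidelity examples drawn from Stoic texts can align small language models with inward-facing Stoic virtues. However, all models, including the few-shot baselines, fail on outward-facing cosmopolitan duties, which points to a representational limitation of small models that micro-dataset adaptation alone does not overcome.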

While large language models excel at factual adaptation, their ability to internalize nuanced philosophical frameworks under severe data constraints remains underexplored. We investigate this by specializing small LLMs on micro-datasets of foundational Stoic texts using preference optimization (ORPO, AlphaPO). Evaluated via a multi-model critic bank, our results show that just 300 high-fidelity examples can induce strong alignment with inward-facing Stoic virtues, closely approaching few-shot prompting while freeing the context window. Critically, however, all models, including few-shot baselines, exhibit a persistent failure on Stoicism's outward-facing cosmopolitan duties, pointing to a representational limitation of small models that micro-dataset adaptation alone cannot overcome.
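For readers unfamiliar with ORPO, one of the two preference-optimization methods named in the abstract, the following is a minimal sketch of its per-example objective as generally described: the usual negative log-likelihood on the preferred response plus a weighted odds-ratio penalty against the rejected one. The function names, the scalar inputs, and the weight `lam=0.1` are illustrative assumptions; the abstract does not state the authors' implementation or hyperparameters, and AlphaPO is not covered here.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is the length-normalized (per-token average) log-probability
    of a response and must be strictly negative.
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO loss for one (chosen, rejected) pair.

    lam is a hypothetical weighting value, not taken from the paper.
    """
    # Supervised term: negative log-likelihood of the preferred response.
    nll = -avg_logp_chosen
    # Log odds ratio between the preferred and the rejected response.
    ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # -log(sigmoid(ratio)), written as a numerically stable softplus(-ratio).
    odds_term = max(-ratio, 0.0) + math.log1p(math.exp(-abs(ratio)))
    return nll + lam * odds_term


if __name__ == "__main__":
    # Example with made-up average log-probabilities for a Stoic-style
    # preferred answer and a generic rejected answer.
    print(orpo_loss(avg_logp_chosen=-0.5, avg_logp_rejected=-2.0))
```

The penalty shrinks as the model assigns higher odds to the preferred response than to the rejected one, so a single training pass both fits the preferred text and pushes the two responses apart, without a separate reference model.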
