Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text
For NLP practitioners needing interpretable yet high-performing text classifiers, eXTC offers a solution that balances accuracy, local reasoning traces, and global rule-based explanations.
eXTC introduces a three-stage framework combining structured prompt optimization, SOP-grounded reasoning distillation, and reinforcement learning to build an explainable text classifier that outperforms existing paradigms in both classification performance and explanation quality across diverse benchmarks.
LLMs have advanced text classification, yet existing paradigms face a trade-off: supervised (label-only) fine-tuning is scalable but offers limited reasoning on complex text and lacks broader model transparency, while discrete prompt optimization yields human-readable instructions but struggles with performance and scalability. We introduce eXTC (eXplainable Text Classifier), which has three progressive stages: (1) learning a Standard Operating Procedure (SOP, or rulebook) in natural language via a new Structured Prompt Optimization algorithm; (2) SOP-grounded reasoning distillation from a large teacher LLM into a compact LM; and (3) expanding reasoning capabilities beyond the initial SOP via reinforcement learning. This design enables eXTC to provide (i) fast inference via a compact LM and (ii) inference-time local reasoning traces alongside a global, modular explanation of its learned domain rules, while (iii) significantly outperforming existing paradigms across diverse benchmarks in both classification performance and explanation quality, with gains at each stage.
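To make the idea of a global, modular rulebook concrete, the following is a minimal sketch of how a natural-language SOP might be represented so that individual rules can be inspected, cited in a reasoning trace, and revised one at a time. It is not the Structured Prompt Optimization algorithm itself, which the abstract does not specify. The names `Rule`, `SOP`, `revise`, and `build_prompt`, the schema, and the example task and rule texts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rule:
    """One modular SOP entry (hypothetical schema, not from the paper)."""
    rule_id: str
    label: str
    text: str


@dataclass(frozen=True)
class SOP:
    """A natural-language rulebook made of independently editable rules."""
    task: str
    rules: tuple[Rule, ...] = ()

    def render(self) -> str:
        # Serialize the rulebook as plain text; the rendered text doubles
        # as the global, human-readable explanation.
        lines = [f"Task: {self.task}", "Rules:"]
        lines += [f"[{r.rule_id}] ({r.label}) {r.text}" for r in self.rules]
        return "\n".join(lines)

    def revise(self, rule_id: str, new_text: str) -> "SOP":
        # Return a new SOP with exactly one rule rewritten. An optimizer
        # could propose such local edits instead of rewriting the prompt.
        if all(r.rule_id != rule_id for r in self.rules):
            raise KeyError(rule_id)
        updated = tuple(
            replace(r, text=new_text) if r.rule_id == rule_id else r
            for r in self.rules
        )
        return replace(self, rules=updated)


def build_prompt(sop: SOP, document: str) -> str:
    # Ask for a trace that cites rule IDs, so each local explanation can be
    # tied back to the global rulebook (assumed prompt wording).
    return (
        f"{sop.render()}\n\nDocument:\n{document}\n\n"
        "Cite the rule IDs you apply, then give a label."
    )


# Hypothetical example task and rules.
sop = SOP(
    task="Classify support tickets",
    rules=(
        Rule("R1", "billing", "Mentions an invoice or a charge."),
        Rule("R2", "technical", "Describes an error or a crash."),
    ),
)
revised = sop.revise("R2", "Describes an error, a crash, or an outage.")
prompt = build_prompt(revised, "The app crashes when I open my invoice.")
```

Keeping each rule as a separate, identified unit is one way to make edits local: revising `R2` leaves `R1` untouched, and a reasoning trace that cites `R2` can be checked against the exact rule text it relied on.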