Controllable Mathematical Reasoning via Self-Optimizing Thought Vectors
This work addresses the problem of guiding the reasoning patterns of AI models without external annotations, which is relevant to researchers in mathematical AI. The contribution appears incremental, however, since it builds on existing large language models.
The paper tackles controllable mathematical reasoning by introducing self-optimizing thought vectors with entropy minimization, achieving 90.1% accuracy and a controllability score of 0.42 on GSM8K using Gemma-2-9B.
We present a novel approach for controllable mathematical reasoning that leverages self-optimizing thought vectors with entropy minimization. Our method introduces learnable thought vectors that dynamically modulate the internal reasoning process of large language models. Using Gemma-2-9B on GSM8K, we achieve 90.1% accuracy with a controllability score of 0.42, demonstrating that entropy-based rewards effectively guide focused reasoning patterns without requiring external reward annotations. Our analysis reveals distinct thought vector clusters and consistent low-entropy distributions across control conditions, validating our framework for controllable AI reasoning.
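The abstract does not specify how the entropy-based reward or the thought-vector modulation is computed, so the following is only a minimal sketch under stated assumptions. It assumes the reward is the negative mean Shannon entropy of the model's next-token distributions over a reasoning trace, and that a thought vector acts as a scaled additive offset to a hidden state. The helpers `token_entropy`, `entropy_reward`, and `apply_thought_vector` are hypothetical. The toy logits stand in for real Gemma-2-9B outputs.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single vector of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_reward(trace_logits: Sequence[Sequence[float]]) -> float:
    """Hypothetical reward: negative mean per-step entropy of a reasoning trace.

    Lower entropy (more focused predictions) gives a higher reward, and the
    signal comes from the model itself rather than from external annotations.
    """
    if not trace_logits:
        raise ValueError("trace_logits must contain at least one step")
    return -sum(token_entropy(step) for step in trace_logits) / len(trace_logits)


def apply_thought_vector(hidden: Sequence[float], thought: Sequence[float],
                         scale: float = 1.0) -> List[float]:
    """Hypothetical modulation: add a scaled learnable thought vector to a hidden state."""
    if len(hidden) != len(thought):
        raise ValueError("hidden state and thought vector must have the same size")
    return [h + scale * t for h, t in zip(hidden, thought)]


if __name__ == "__main__":
    focused = [[8.0, 0.0, 0.0, 0.0], [7.5, 0.5, 0.0, 0.0]]  # peaked distributions
    diffuse = [[1.0, 1.0, 1.0, 1.0], [0.9, 1.0, 1.1, 1.0]]  # near-uniform distributions
    print(entropy_reward(focused) > entropy_reward(diffuse))  # True
    print(apply_thought_vector([0.1, -0.2, 0.3], [1.0, 0.0, -1.0], scale=0.5))
```

In this sketch, a trace with peaked next-token distributions earns a higher reward than a near-uniform one. This is the sense in which entropy minimization favors focused reasoning. In a real setup the thought vector would be optimized against such a reward rather than fixed by hand.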