Pheromone-based Learning of Optimal Reasoning Paths
The work addresses inefficient reasoning-path discovery in LLMs on complex tasks, offering a biologically inspired solution that is incremental but shows strong gains.
The paper tackles the challenge of discovering effective reasoning methods for complex problems by introducing ACO-ToT, which combines Ant Colony Optimization with LLMs to find optimal reasoning paths. It reports significant performance improvements over existing approaches on three benchmarks: GSM8K, ARC-Challenge, and MATH.
Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities through chain-of-thought prompting, yet discovering effective reasoning methods for complex problems remains challenging due to the vast space of possible intermediate steps. We introduce Ant Colony Optimization-guided Tree of Thought (ACO-ToT), a novel algorithm that combines ACO with LLMs to discover optimal reasoning paths for complex problems efficiently. Drawing inspiration from Hebbian learning in neurological systems, our method employs a collection of distinctly fine-tuned LLM "ants" to traverse and lay pheromone trails through a centralized tree of thought, with each ant's movement governed by a weighted combination of existing pheromone trails and its own specialized expertise. The algorithm evaluates complete reasoning paths using a mixture-of-experts-based scoring function, with pheromones reinforcing productive reasoning paths across iterations. Experiments on three challenging reasoning tasks (GSM8K, ARC-Challenge, and MATH) demonstrate that ACO-ToT performs significantly better than existing chain-of-thought optimization approaches, suggesting that incorporating biologically inspired collective search mechanisms into LLM inference can substantially enhance reasoning capabilities.
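The abstract describes the search loop only at a high level. The sketch below shows one way the pieces could fit together, under loudly stated assumptions that are not taken from the paper. A fixed toy tree stands in for LLM-generated thoughts. Each ant's "expertise" is a hypothetical per-step preference table rather than a fine-tuned LLM. The "weighted combination" of pheromone and expertise is assumed to be a linear mix with weight `w`. The mixture-of-experts scorer is a fixed gated average of hypothetical expert ratings. Pheromone follows the standard ACO evaporate-then-deposit update with evaporation rate `rho` and deposit scale `q`. All names (`TREE`, `EXPERT_SCORES`, `GATE`, `ANTS`, `walk`, `aco_tot`) are illustrative.

```python
import random

# Toy tree of thoughts (hypothetical): each node lists candidate next reasoning steps.
# In ACO-ToT these steps would come from LLMs; here they are fixed labels.
TREE = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
}

# Hypothetical expert scorers, each rating a complete path by its final step in [0, 1].
EXPERT_SCORES = [
    {"a1": 0.2, "a2": 0.9, "b1": 0.4, "b2": 0.1},
    {"a1": 0.3, "a2": 0.8, "b1": 0.5, "b2": 0.2},
]
GATE = [0.6, 0.4]  # assumed fixed gating weights over the experts (sum to 1)

# Each ant's "expertise": an assumed per-step preference table standing in
# for a distinctly fine-tuned LLM.
ANTS = [
    {"a": 0.5, "b": 0.5, "a1": 0.5, "a2": 0.5, "b1": 0.5, "b2": 0.5},
    {"a": 0.3, "b": 0.7, "a1": 0.5, "a2": 0.5, "b1": 0.8, "b2": 0.2},
    {"a": 0.7, "b": 0.3, "a1": 0.6, "a2": 0.4, "b1": 0.5, "b2": 0.5},
]


def score_path(path):
    """Gated mixture-of-experts score for a complete root-to-leaf path."""
    leaf = path[-1]
    return sum(g * expert[leaf] for g, expert in zip(GATE, EXPERT_SCORES))


def walk(tau, expertise, w, rng):
    """One ant walks from the root to a leaf, mixing pheromone with its own preferences."""
    node, path = "root", ["root"]
    while node in TREE:
        children = TREE[node]
        tau_total = sum(tau[(node, c)] for c in children)
        exp_total = sum(expertise[c] for c in children)
        # Assumed linear mix: w * normalized pheromone + (1 - w) * normalized expertise.
        weights = [
            w * tau[(node, c)] / tau_total + (1 - w) * expertise[c] / exp_total
            for c in children
        ]
        node = rng.choices(children, weights=weights)[0]
        path.append(node)
    return path


def aco_tot(ants, iterations=20, w=0.5, rho=0.1, q=1.0, seed=0):
    """Run the colony; return the best path, its score, and the final pheromone table."""
    rng = random.Random(seed)
    # Every parent -> child edge starts with the same pheromone level.
    tau = {(p, c): 1.0 for p, kids in TREE.items() for c in kids}
    best_path, best_score = None, float("-inf")
    for _ in range(iterations):
        paths = [walk(tau, expertise, w, rng) for expertise in ants]
        # Evaporation: all trails decay by a factor (1 - rho).
        for edge in tau:
            tau[edge] *= 1 - rho
        # Deposit: each edge on a path gains pheromone proportional to the path's score.
        for path in paths:
            s = score_path(path)
            for edge in zip(path, path[1:]):
                tau[edge] += q * s
            if s > best_score:
                best_path, best_score = path, s
    return best_path, best_score, tau


if __name__ == "__main__":
    path, score, tau = aco_tot(ANTS)
    print(path, round(score, 2))
```

Classic ACO usually combines pheromone and heuristic desirability multiplicatively (pheromone raised to a power alpha times heuristic raised to a power beta); the linear mix here is only an assumption chosen to mirror the abstract's wording. Because every ant keeps a nonzero weight on its own preferences, low-pheromone branches are never fully abandoned, while high-scoring paths accumulate pheromone and attract more ants over iterations.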