CLNov 27, 2024Code
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTSJinyang Wu, Mingkuan Feng, Shuai Zhang et al.
In-context learning (ICL) enables large language models (LLMs) to perform downstream tasks through advanced prompting and high-quality demonstrations. However, traditional ICL paradigms encounter significant limitations in complex reasoning tasks, stemming primarily from their dependence on example quality and absence of explicit reasoning guidance. To address these challenges, we introduce HiAR-ICL, a **Hi**gh-level **A**utomated **R**easoning paradigm in **ICL** that shifts focus from specific examples to abstract reasoning patterns, thereby extending the conventional concept of "context" in ICL. Our approach begins by defining five atomic reasoning actions, upon which we employ Monte Carlo Tree Search to systematically construct high-level reasoning patterns. During inference, HiAR-ICL dynamically selects appropriate reasoning patterns based on problem attributes, providing explicit guidance for the model's reasoning process. Experiments demonstrate HiAR-ICL's effectiveness and efficiency: utilizing only 200 prior samples with Qwen2.5-7B-Instruct, our method achieves 80.6% accuracy on MATH and 62.5% on AMC, exceeding GPT-4o's 77.2% and 57.5%. Our approach enhances performance across models of varying sizes while generalizing effectively across domains. Further analysis reveals that HiAR-ICL can also serve as a plug-and-play inference method compatible with post-training techniques like GRPO. Code and data are available at https://github.com/jinyangwu/HiARICL.
CLFeb 4, 2025
Boosting Multimodal Reasoning with Automated Structured ThinkingJinyang Wu, Mingkuan Feng, Shuai Zhang et al.
Multimodal large language models excel across diverse domains but struggle with complex visual reasoning tasks. Current approaches aim to incorporate structured thinking via two strategies: explicit search methods and post-training techniques. However, both approaches face significant limitations: Search-based methods suffer from computational inefficiency due to extensive solution space exploration, while post-training methods require substantial data, computational resources, and often encounter training instability. To address these limitations, we propose AStar, an \textbf{A}utomated \textbf{S}tructured \textbf{t}hinking paradigm for multimod\textbf{a}l \textbf{r}easoning. Our method introduces "thought cards", a lightweight library of high-level reasoning patterns abstracted from 500 prior samples using Monte Carlo Tree Search. For each test problem, AStar adaptively retrieves the optimal thought cards and seamlessly integrates these external explicit guidelines with the model's internal implicit reasoning capabilities. Extensive experiments demonstrate AStar's effectiveness and efficiency: using only 500 prior samples and a 7B backbone, our training-free framework achieves 53.9$\%$ accuracy on MathVerse (surpassing GPT-4o's 50.2%) and 32.7% on MathVision (versus GPT-4o's 30.4%). Further analysis reveals that AStar generalizes beyond multimodal reasoning to visual perception and understanding domains, and serves as a plug-and-play test-time inference method compatible with mainstream post-training techniques like GRPO.