Towards Stepwise Domain Knowledge-Driven Reasoning Optimization and Reflection Improvement
This work aims to enhance logical reasoning on domain-specific tasks such as legal analysis, though it appears incremental because it builds on existing CoT and MCTS methods.
The paper tackles the challenge of applying stepwise supervision and Monte Carlo Tree Search (MCTS) to tasks that require domain-specific knowledge, such as legal problems. It proposes a framework for reasoning optimization and reflection improvement, and reports empirical results showing effectiveness on legal-domain tasks.
Stepwise supervision on Chains of Thought (CoTs), obtained with the help of Monte Carlo Tree Search (MCTS), has recently been shown to improve performance on logical reasoning tasks such as coding and math. However, its contribution to tasks that require domain-specific expertise and knowledge remains unexplored. Motivated by this gap, we identify several potential challenges that vanilla MCTS faces in this setting and propose the framework of Stepwise Domain Knowledge-Driven Reasoning Optimization, which employs the MCTS algorithm to develop step-level supervision for problems that require comprehension, reasoning, and specialized knowledge. We also introduce Preference Optimization towards Reflection Paths, which iteratively learns to self-reflect on reasoning thoughts from better perspectives. We conduct extensive experiments to evaluate the advantages of both methods, and the empirical results demonstrate their effectiveness on a variety of legal-domain problems. We also report a diverse set of findings, which we hope will encourage further research on domain-specific LLMs and MCTS.
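To make the idea of MCTS-derived step-level supervision concrete, the following is a minimal sketch of generic UCT search over reasoning steps, followed by extraction of (chosen, rejected) step pairs from sibling value estimates. It is not the paper's implementation. The step names, `toy_reward`, the depth limit, and the pairing rule (best versus worst sibling by mean value) are all assumptions made for illustration; a real system would sample steps from an LLM and score outcomes against reference answers.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical candidate steps for a toy legal question (illustrative only).
CANDIDATE_STEPS = ["cite_statute", "apply_facts", "guess"]
MAX_DEPTH = 2


@dataclass
class Node:
    path: tuple                          # reasoning steps taken so far
    parent: Optional["Node"] = None
    children: dict = field(default_factory=dict)
    visits: int = 0
    value_sum: float = 0.0

    @property
    def q(self) -> float:
        # Mean reward of all rollouts that passed through this node.
        return self.value_sum / self.visits if self.visits else 0.0


def toy_reward(path):
    # Assumed stand-in for an outcome check: avoiding "guess" scores higher.
    return sum(step != "guess" for step in path) / MAX_DEPTH


def uct_child(node, c=1.4):
    # Pick the child with the highest UCT score; unvisited children go first.
    def score(child):
        if child.visits == 0:
            return float("inf")
        return child.q + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=score)


def search(root, iterations, rng):
    for _ in range(iterations):
        node = root
        # Selection and expansion: descend until a new or terminal node.
        while len(node.path) < MAX_DEPTH:
            if not node.children:
                for step in CANDIDATE_STEPS:
                    node.children[step] = Node(node.path + (step,), parent=node)
            node = uct_child(node)
            if node.visits == 0:
                break
        # Rollout: complete the path with random steps and score it.
        rollout = list(node.path)
        while len(rollout) < MAX_DEPTH:
            rollout.append(rng.choice(CANDIDATE_STEPS))
        reward = toy_reward(rollout)
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent


def step_preference_pairs(root):
    # For each visited prefix, pair the best and worst next step by mean value.
    pairs = []
    stack = [root]
    while stack:
        node = stack.pop()
        visited = [ch for ch in node.children.values() if ch.visits > 0]
        if len(visited) >= 2:
            best = max(visited, key=lambda ch: ch.q)
            worst = min(visited, key=lambda ch: ch.q)
            if best.q > worst.q:
                pairs.append((node.path, best.path[-1], worst.path[-1]))
        stack.extend(visited)
    return pairs
```

Calling `search(Node(path=()), 200, random.Random(0))` and then `step_preference_pairs` on the root yields tuples of the form `(prefix, chosen_step, rejected_step)`. Pairs of this shape are the kind of step-level signal a preference-optimization objective could consume, although how the paper constructs and uses its supervision may differ.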