Explaining Puzzle Solutions in Natural Language: An Exploratory Study on 6x6 Sudoku
The findings highlight challenges for LLMs in human-AI collaboration, where clear explanations are crucial, but the work is incremental because it builds on existing puzzle-solving benchmarks.
The study evaluated five large language models (LLMs) on solving and explaining 6x6 Sudoku puzzles, finding that while one model had limited success in solving them, none could provide explanations reflecting strategic reasoning or intuitive problem-solving.
The success of Large Language Models (LLMs) in human-AI collaborative decision-making hinges on their ability to provide trustworthy, gradual, and tailored explanations. Solving complex puzzles, such as Sudoku, offers a canonical example of this collaboration, where clear and customized explanations often hold greater importance than the final solution. In this study, we evaluate the performance of five LLMs in solving and explaining 6x6 Sudoku puzzles. While one LLM demonstrates limited success in solving puzzles, none can explain the solution process in a manner that reflects strategic reasoning or intuitive problem-solving. These findings underscore significant challenges that must be addressed before LLMs can become effective partners in human-AI collaborative decision-making.