AI · Jul 4, 2025

Effects of structure on reasoning in instance-level Self-Discover

arXiv:2507.03347v1 · h-index: 2 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of optimizing LLM reasoning methods in compound systems and shows that unstructured formats can be more effective. The contribution is incremental, but it challenges current practice.

The paper examines the performance trade-offs between structured and unstructured reasoning in LLMs by introducing iSelf-Discover. It finds that unstructured reasoning consistently outperforms structured approaches, with a relative improvement of up to 18.90% on the MATH benchmark.
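The page does not define "relative improvement". Assuming the usual definition, where A denotes benchmark accuracy under each reasoning format, the figure would be computed as:

$$\Delta_{\mathrm{rel}} = \frac{A_{\mathrm{unstructured}} - A_{\mathrm{structured}}}{A_{\mathrm{structured}}} \times 100\%$$

The following numbers are hypothetical and come from neither the paper nor this page. A structured-plan accuracy of 50.0% combined with an 18.90% relative improvement would correspond to an unstructured accuracy of 50.0% × 1.189 = 59.45%, which is a gain of 9.45 percentage points. A relative improvement is therefore larger than the absolute gap whenever the baseline accuracy is below 100%.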

The drive for predictable LLM reasoning when integrating LLMs into compound systems has popularized structured outputs, yet concerns remain about their performance trade-offs compared with unconstrained natural language. At the same time, training on unconstrained Chain-of-Thought (CoT) traces has produced a new class of strong reasoning models, which nevertheless pose new challenges for compute budgets and faithfulness. This paper introduces iSelf-Discover, an instance-level adaptation of the Self-Discover framework, and uses it to compare dynamically generated structured JSON reasoning with its unstructured counterpart. Our empirical evaluation across diverse benchmarks with state-of-the-art open-source models supports a consistent advantage for unstructured reasoning. Notably, on the complex MATH benchmark, unstructured plans achieved relative performance improvements of up to 18.90% over structured approaches. Zero-shot unstructured iSelf-Discover variants also outperform their five-shot structured counterparts. This underscores the size of the gap, which persists even when structured plans are dynamically generated to ensure that reasoning precedes the final answer. We further show that the optimal granularity of plan generation (instance-level vs. task-level) is context-dependent. These findings invite a re-evaluation of the reliance on structured formats for complex problem-solving and of how compound systems should be organized.
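The two comparisons described in the abstract (structured vs. unstructured plans, and instance-level vs. task-level planning) can be illustrated with the minimal Python sketch below. It assumes the three-stage SELECT, ADAPT, IMPLEMENT pipeline of the original Self-Discover framework. It also assumes a hypothetical `llm` callable that maps a prompt string to generated text. The prompt wording, the `REASONING_MODULES` list, and the function names `discover_plan`, `solve`, `run_instance_level` and `run_task_level` are illustrative assumptions and are not taken from the paper's code.

The `structured` flag changes only the output format requested in the IMPLEMENT stage, switching between a JSON plan and a plain-language plan. The two runners differ only in whether a plan is generated for each problem or once for the whole task.

```python
from typing import Callable, List

# Hypothetical LLM interface (assumption): takes a prompt, returns generated text.
LLM = Callable[[str], str]

# Illustrative subset of reasoning modules (assumption); Self-Discover uses a larger seed set.
REASONING_MODULES = [
    "Break the problem into smaller sub-problems.",
    "Think step by step.",
    "Check each intermediate result critically.",
]


def discover_plan(llm: LLM, context: str, structured: bool) -> str:
    """Run SELECT, ADAPT and IMPLEMENT to produce a reasoning plan for `context`."""
    modules = "\n".join(REASONING_MODULES)
    selected = llm(f"SELECT the modules useful for:\n{context}\n\nModules:\n{modules}")
    adapted = llm(f"ADAPT the selected modules to:\n{context}\n\nSelected:\n{selected}")
    # The only difference between the two conditions: the output format of the plan.
    plan_format = (
        "a JSON object whose keys are reasoning steps"
        if structured
        else "a plain-language, step-by-step plan"
    )
    return llm(f"IMPLEMENT the adapted modules as {plan_format}:\n{adapted}")


def solve(llm: LLM, problem: str, plan: str) -> str:
    """Ask the model to follow the plan, reasoning before giving the final answer."""
    return llm(f"Follow this plan, then state the final answer.\n\nPlan:\n{plan}\n\nProblem:\n{problem}")


def run_instance_level(llm: LLM, problems: List[str], structured: bool) -> List[str]:
    # Instance level: a fresh plan is discovered for every problem.
    return [solve(llm, p, discover_plan(llm, p, structured)) for p in problems]


def run_task_level(llm: LLM, task: str, problems: List[str], structured: bool) -> List[str]:
    # Task level: one plan, derived from the task description, is reused for all problems.
    plan = discover_plan(llm, task, structured)
    return [solve(llm, p, plan) for p in problems]


if __name__ == "__main__":
    calls: List[str] = []

    def stub_llm(prompt: str) -> str:
        # Stand-in for a real model: records each prompt and returns a fixed reply.
        calls.append(prompt)
        return "..."

    problems = ["Problem 1", "Problem 2", "Problem 3"]
    run_instance_level(stub_llm, problems, structured=False)
    print("instance-level calls:", len(calls))  # 3 problems x (3 planning + 1 solving) = 12
    calls.clear()
    run_task_level(stub_llm, "Solve competition math problems.", problems, structured=True)
    print("task-level calls:", len(calls))  # 3 planning + 3 solving = 6
```

With the stub model, instance-level planning on three problems makes 12 model calls, while task-level planning makes 6. This illustrates the extra inference cost that instance-level planning pays to obtain a plan tailored to each problem.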

Code Implementations: 1 repo