LLMSR@XLLM25: An Empirical Study of LLM for Structural Reasoning
This work addresses structural reasoning in AI, but its contribution is incremental: it applies existing few-shot prompting to a new shared task, with no fine-tuning or novel techniques.
The authors address a shared task that evaluates large language models on producing fine-grained, controllable, and interpretable reasoning processes. Using only an off-the-shelf model and a few-shot prompt, they rank 5th, with macro F1 scores comparable to those of more complex methods.
We present Team asdfo123's submission to the LLMSR@XLLM25 shared task, which evaluates large language models on producing fine-grained, controllable, and interpretable reasoning processes. Systems must extract all problem conditions, decompose a chain of thought into statement-evidence pairs, and verify the logical validity of each pair. Leveraging only the off-the-shelf Meta-Llama-3-8B-Instruct, we craft a concise few-shot, multi-turn prompt that first enumerates all conditions and then guides the model to label, cite, and adjudicate every reasoning step. A lightweight post-processor based on regular expressions normalises spans and enforces the official JSON schema. Without fine-tuning, external retrieval, or ensembling, our method ranks 5th overall, achieving macro F1 scores on par with substantially more complex and resource-consuming pipelines. We conclude by analysing the strengths and limitations of our approach and outlining directions for future research in structural reasoning with LLMs. Our code is available at https://github.com/asdfo123/LLMSR-asdfo123.
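To make the regex-based post-processing step concrete, the following is a minimal Python sketch of how a raw model reply might be turned into a structured record. It is an illustration under stated assumptions, not the submission's code: the reply layout (`Statement:` / `Evidence:` / `Verification:` lines, numbered conditions) and the output keys (`conditions`, `steps`, `statement`, `evidence`, `verification`) are hypothetical, and the official JSON schema is not reproduced here.

```python
import json
import re

# ASSUMPTION: each reasoning step in the reply looks like
#   Statement: ... / Evidence: ... / Verification: true|false
STEP_RE = re.compile(
    r"statement\s*:\s*(?P<statement>.+?)\s*"
    r"evidence\s*:\s*(?P<evidence>.+?)\s*"
    r"verification\s*:\s*(?P<verdict>true|false)\b",
    re.IGNORECASE | re.DOTALL,
)
# ASSUMPTION: conditions are enumerated as "1. ...", "2) ...", "- ..." or "* ...".
CONDITION_RE = re.compile(r"\s*(?:\d+[.)]|[-*])\s+(.+)")


def normalise_span(text: str) -> str:
    """Collapse runs of whitespace and strip surrounding quotes."""
    text = re.sub(r"\s+", " ", text).strip()
    return text.strip("\"'")


def postprocess(conditions_reply: str, steps_reply: str) -> dict:
    """Build a record with hypothetical keys from two raw model replies."""
    conditions = []
    for line in conditions_reply.splitlines():
        match = CONDITION_RE.match(line)
        if match:
            conditions.append(normalise_span(match.group(1)))

    steps = []
    for match in STEP_RE.finditer(steps_reply):
        step = {
            "statement": normalise_span(match.group("statement")),
            "evidence": normalise_span(match.group("evidence")),
            "verification": match.group("verdict").lower(),
        }
        # Keep only steps in which every field is non-empty.
        if all(step.values()):
            steps.append(step)

    return {"conditions": conditions, "steps": steps}


if __name__ == "__main__":
    conditions_reply = "Conditions:\n1. A is taller than B.\n2)  B  is taller than C. \n"
    steps_reply = (
        "Statement: A is taller than C.\n"
        "Evidence:  A > B and   B > C.\n"
        "Verification: TRUE\n"
    )
    print(json.dumps(postprocess(conditions_reply, steps_reply), indent=2))
```

A pattern this simple will mis-split a step whose statement itself contains the word "evidence:"; a sturdier version would anchor each label to the start of a line.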