Reward Engineering for Generating Semi-structured Explanation
This work addresses the problem of improving explanation generation for verifying model reasoning. The contribution is incremental: it builds on existing RL methods and adds tailored rewards.
The paper tackles the challenge of generating semi-structured explanations to verify reasoning in language models, particularly for not-so-large models such as FLAN-T5-XXL. It introduces a reward engineering method in reinforcement learning that achieves new state-of-the-art results on the ExplaGraph and COPA-SSE benchmarks.
A semi-structured explanation depicts the implicit process of a reasoner with an explicit representation. Such an explanation highlights how the information available in a specific query is utilised and supplemented with information that the reasoner produces from its internal weights in order to generate an answer. Despite recent improvements in the generative capabilities of language models, producing structured explanations to verify a model's true reasoning capabilities remains a challenge. This issue is particularly pronounced for not-so-large LMs (e.g., FLAN-T5-XXL). In this work, we first underscore the limitations of supervised fine-tuning (SFT) in tackling this challenge, and then introduce a carefully crafted reward engineering method in reinforcement learning (RL) to better address this problem. We investigate multiple reward aggregation methods and provide a detailed discussion that sheds light on the potential of RL for future research. On two semi-structured explanation generation benchmarks (ExplaGraph and COPA-SSE), our proposed method achieves new state-of-the-art results.
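To make the idea of reward aggregation concrete, the following is a minimal sketch and not the paper's implementation. It assumes explanations are linearised as `(head; relation; tail)` triples and that two hypothetical reward signals are available: a format reward (is the output well-formed?) and an overlap reward (how many reference triples are recovered?). The function names, the two rewards, and the two aggregation rules (weighted sum and product) are illustrative assumptions only.

```python
import re
from typing import List, Sequence, Tuple

# Assumed linearised format: "(head; relation; tail)(head; relation; tail)..."
TRIPLE = re.compile(r"\(([^;()]+);([^;()]+);([^;()]+)\)")


def parse_triples(text: str) -> List[Tuple[str, ...]]:
    """Extract all well-formed triples from a generated string."""
    return [tuple(part.strip() for part in m.groups()) for m in TRIPLE.finditer(text)]


def format_reward(text: str) -> float:
    """Hypothetical reward: 1.0 if the output consists only of well-formed triples."""
    leftover = TRIPLE.sub("", text).strip()
    return 1.0 if parse_triples(text) and not leftover else 0.0


def overlap_reward(text: str, reference: str) -> float:
    """Hypothetical reward: fraction of reference triples present in the output."""
    ref = set(parse_triples(reference))
    if not ref:
        return 0.0
    return len(set(parse_triples(text)) & ref) / len(ref)


def aggregate_sum(rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted-sum aggregation: each signal contributes independently."""
    return sum(w * r for r, w in zip(rewards, weights))


def aggregate_product(rewards: Sequence[float]) -> float:
    """Product aggregation: any zero-valued signal zeroes the total reward."""
    total = 1.0
    for r in rewards:
        total *= r
    return total


generated = "(dogs; is a; pets)(pets; capable of; comfort)"
reference = "(dogs; is a; pets)(pets; desires; food)"
rewards = [format_reward(generated), overlap_reward(generated, reference)]
print(aggregate_sum(rewards, [0.5, 0.5]), aggregate_product(rewards))
```

In this sketch the choice of aggregation changes the training signal: a weighted sum still gives partial credit to a malformed output that happens to score on another signal, whereas a product gives it none. The scalar produced by either rule would then be passed to an RL algorithm as the reward for the sampled explanation.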