Review, Remask, Refine (R3): Process-Guided Block Diffusion for Text Generation
This work addresses error correction in text generation with pre-trained masked diffusion models. It is incremental: it builds on existing methods and introduces no new training.
The paper tackles the challenge of enabling iterative text generation models to efficiently identify and correct their own errors. It proposes the Review, Remask, Refine (R3) framework, which uses a Process Reward Model to guide targeted remasking and refinement and improves the final output without requiring additional model training.
A key challenge for iterative text generation is enabling models to efficiently identify and correct their own errors. We propose Review, Remask, Refine (R3), a relatively simple yet elegant framework that requires no additional model training and can be applied to any pre-trained masked text diffusion model (e.g., LLaDA or BD3-LM). In R3, a Process Reward Model (PRM) is used to Review intermediate generated blocks. The framework then translates these PRM scores into a Remask strategy: the lower a block's PRM score, which indicates potential mistakes, the larger the proportion of that block's tokens that is remasked. Finally, the model is compelled to Refine these targeted segments, concentrating its effort on the specific sub-optimal parts of its past generations, which leads to improved final output.
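The Review, Remask, Refine loop can be illustrated with a minimal Python sketch. The abstract only states that a lower PRM score leads to a larger remasked proportion, so everything else here is an assumption: the linear score-to-ratio mapping in the hypothetical `remask_ratio`, its `min_ratio`/`max_ratio` bounds, the choice to remask the lowest-confidence tokens, and the `prm`/`refine` callables (toy stand-ins for a real PRM and a real masked diffusion model). None of this is the paper's exact procedure or any library's API.

```python
import math
from typing import Callable, List, Sequence

MASK = "[MASK]"  # hypothetical placeholder for the model's mask token


def remask_ratio(prm_score: float, min_ratio: float = 0.0, max_ratio: float = 0.8) -> float:
    """Map a PRM score in [0, 1] to the fraction of a block to remask.

    Assumption: a linear, decreasing mapping, so a lower score means more remasking.
    """
    s = min(max(prm_score, 0.0), 1.0)  # clamp the score into [0, 1]
    return min_ratio + (max_ratio - min_ratio) * (1.0 - s)


def remask_block(tokens: Sequence[str], confidences: Sequence[float],
                 prm_score: float, **kw: float) -> List[str]:
    """Remask the lowest-confidence tokens of one block (assumed selection rule)."""
    # Nearest-integer token count (Python's round sends exact halves to even).
    n = round(remask_ratio(prm_score, **kw) * len(tokens))
    order = sorted(range(len(tokens)), key=lambda i: confidences[i])
    to_mask = set(order[:n])
    return [MASK if i in to_mask else t for i, t in enumerate(tokens)]


def r3_pass(blocks: List[List[str]], confidences: List[List[float]],
            prm: Callable[[List[List[str]]], float],
            refine: Callable[[List[str]], List[str]]) -> List[List[str]]:
    """One Review -> Remask -> Refine pass over previously generated blocks.

    prm(prefix) is a hypothetical scorer for the last block of `prefix`;
    refine(block) is a hypothetical call that fills the block's masks. A real
    diffusion model would also condition on the surrounding blocks.
    """
    revised: List[List[str]] = []
    for toks, conf in zip(blocks, confidences):
        score = prm(revised + [list(toks)])          # Review
        masked = remask_block(toks, conf, score)     # Remask
        revised.append(refine(masked) if MASK in masked else list(toks))  # Refine
    return revised


if __name__ == "__main__":
    blocks = [["x", "y"], ["p", "q"]]
    confs = [[0.9, 0.9], [0.2, 0.4]]
    toy_prm = lambda prefix: 1.0 if prefix[-1] == ["x", "y"] else 0.0
    toy_refine = lambda blk: [t if t != MASK else "z" for t in blk]
    print(r3_pass(blocks, confs, toy_prm, toy_refine))
```

Here the first block gets a perfect score and is left untouched, while the second scores 0.0, has its tokens remasked, and is refilled by the toy `refine`. Remasking by lowest per-token confidence is only one plausible way to choose *which* tokens to remask; random selection at the same ratio would fit the abstract's description equally well.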