CV, CL | Dec 10, 2025

Diffusion Is Your Friend in Show, Suggest and Tell

arXiv:2512.10038v1 | h-index: 15 | Has Code
Originality: Highly original
AI Analysis

This work addresses a bottleneck in generative AI for tasks like image captioning, offering a novel hybrid method that improves performance incrementally over existing approaches.

The paper addresses the tendency of diffusion models to underperform autoregressive models in discrete domains. It proposes a hybrid approach in which a diffusion model provides suggestions to the autoregressive generation instead of replacing it. The resulting model reaches 125.1 CIDEr-D on COCO without Reinforcement Learning, outperforming the autoregressive and diffusion state-of-the-art results in a similar setting by 1.5 and 2.5 points, respectively.

Denoising diffusion models have demonstrated impressive results across generative Computer Vision tasks, but they still fail to outperform standard autoregressive solutions in the discrete domain, matching them at best. In this work, we propose a different paradigm: adopting diffusion models to provide suggestions to the autoregressive generation rather than replacing it. By doing so, we combine the bidirectional and refining capabilities of the former with the strong linguistic structure provided by the latter. To showcase its effectiveness, we present Show, Suggest and Tell (SST), which achieves State-of-the-Art results on COCO among models in a similar setting. In particular, SST achieves 125.1 CIDEr-D on the COCO dataset without Reinforcement Learning, outperforming the autoregressive and diffusion model State-of-the-Art results by 1.5 and 2.5 points, respectively. Beyond these results, we perform extensive experiments to validate the proposal and analyze the impact of the suggestion module. Results demonstrate a positive correlation between suggestion and caption quality, indicating a currently underexplored but promising research direction. Code will be available at: https://github.com/jchenghu/show_suggest_tell.
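
The margins quoted in the abstract can be turned into implied baseline scores with simple arithmetic. The sketch below assumes the 1.5 and 2.5 point margins apply to the autoregressive and diffusion baselines in that order; the names `SST_CIDER_D`, `MARGINS`, and `implied_baselines` are illustrative, and the resulting baseline values are inferred from the abstract, not reported in it.

```python
# Score stated in the abstract (COCO, without Reinforcement Learning).
SST_CIDER_D = 125.1

# Assumption: margins are listed in the same order as the baselines.
MARGINS = {"autoregressive": 1.5, "diffusion": 2.5}


def implied_baselines(score, margins):
    """Return the baseline score implied by each margin (score - margin)."""
    # Round to one decimal to match the precision used in the abstract.
    return {name: round(score - margin, 1) for name, margin in margins.items()}


implied = implied_baselines(SST_CIDER_D, MARGINS)
for name, value in implied.items():
    print(f"{name}: {value} CIDEr-D (implied)")
```

These implied values are only as reliable as the ordering assumption; the paper's own result tables are the authoritative source for the baseline scores.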

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
