AI · Jan 28

Endogenous Reprompting: Self-Evolving Cognitive Alignment for Unified Multimodal Models

arXiv:2601.20305v1 · 1 citation · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses a key limitation of unified multimodal models by letting the model improve its own generation, though it appears incremental because it builds on existing reinforcement learning and training frameworks.

The paper tackles the cognitive gap in Unified Multimodal Models, where understanding fails to guide generation. It proposes Endogenous Reprompting to turn understanding into an explicit generative reasoning step, and reports improved evaluation accuracy, reprompting efficiency, and generation quality over state-of-the-art baselines.

Unified Multimodal Models (UMMs) exhibit strong understanding, yet this capability often fails to effectively guide generation. We identify this as a Cognitive Gap: the model lacks the understanding of how to enhance its own generation process. To bridge this gap, we propose Endogenous Reprompting, a mechanism that transforms the model's understanding from a passive encoding process into an explicit generative reasoning step by generating self-aligned descriptors during generation. To achieve this, we introduce SEER (Self-Evolving Evaluator and Reprompter), a training framework that establishes a two-stage endogenous loop using only 300 samples from a compact proxy task, Visual Instruction Elaboration. First, Reinforcement Learning with Verifiable Rewards (RLVR) activates the model's latent evaluation ability via curriculum learning, producing a high-fidelity endogenous reward signal. Second, Reinforcement Learning with Model-rewarded Thinking (RLMT) leverages this signal to optimize the generative reasoning policy. Experiments show that SEER consistently outperforms state-of-the-art baselines in evaluation accuracy, reprompting efficiency, and generation quality, without sacrificing general multimodal capabilities.
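The two reward sources described in the abstract can be illustrated with a toy sketch. This is not the SEER implementation: the `PairwiseSample` type, the word-overlap `toy_score` evaluator, and the mean-baseline advantage are all assumptions made here for illustration, and the abstract does not specify the policy-gradient algorithm used. The sketch only shows the contrast between a reward checked against a known label (RLVR-style) and a reward produced by the model's own evaluator (RLMT-style).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PairwiseSample:
    """Hypothetical training item: two candidates and a known better one."""
    instruction: str
    candidate_a: str
    candidate_b: str
    better: str  # "a" or "b"; the checkable label


def verifiable_reward(verdict: str, sample: PairwiseSample) -> float:
    """Stage 1 (RLVR-style): reward 1.0 only if the verdict matches the label."""
    return 1.0 if verdict == sample.better else 0.0


def model_reward(score_fn: Callable[[str, str], float],
                 instruction: str, reprompt: str) -> float:
    """Stage 2 (RLMT-style): the reward is the evaluator's own score."""
    return score_fn(instruction, reprompt)


def mean_baseline_advantages(rewards: List[float]) -> List[float]:
    """Centre rewards on their mean (an assumed, generic baseline)."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def toy_score(instruction: str, reprompt: str) -> float:
    """Stand-in evaluator: fraction of instruction words kept in the reprompt."""
    words = instruction.lower().split()
    reprompt_words = set(reprompt.lower().split())
    kept = sum(1 for w in words if w in reprompt_words)
    return kept / len(words)


sample = PairwiseSample("a red cube", "red cube on a table", "blue ball", better="a")

# Stage 1: the evaluator's verdicts are checked against the label.
stage1 = [verifiable_reward(v, sample) for v in ("a", "b")]

# Stage 2: candidate reprompts are scored by the evaluator, with no labels.
reprompts = ["a glossy red cube under studio lighting", "a sphere"]
stage2 = [model_reward(toy_score, sample.instruction, r) for r in reprompts]
advantages = mean_baseline_advantages(stage2)
```

In this toy setting the reprompt that preserves the instruction's content receives a positive advantage and the one that drops it receives a negative one. In a real system the scoring function would be the trained evaluator rather than word overlap.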

