CV | Feb 2

Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation

arXiv:2602.01756v1 | 7 citations | h-index: 12
Originality: Highly original
AI Analysis

This work targets complex knowledge reasoning and real-world adaptation in image generation, for users who need more accurate and up-to-date visual outputs. It is assessed as a novel approach rather than an incremental improvement.

The paper addresses text-to-image models that fail to grasp implicit user intentions or to adapt to real-world dynamics. It introduces Mind-Brush, a unified agentic framework that integrates cognitive search and reasoning. Reported gains include a zero-to-one capability leap for the Qwen-Image baseline on the proposed Mind-Bench benchmark.

While text-to-image generation has achieved unprecedented fidelity, the vast majority of existing models function fundamentally as static text-to-pixel decoders. Consequently, they often fail to grasp implicit user intentions. Although emerging unified understanding-generation models have improved intent comprehension, they still struggle to accomplish tasks involving complex knowledge reasoning within a single model. Moreover, constrained by static internal priors, these models remain unable to adapt to the evolving dynamics of the real world. To bridge these gaps, we introduce Mind-Brush, a unified agentic framework that transforms generation into a dynamic, knowledge-driven workflow. Simulating a human-like 'think-research-create' paradigm, Mind-Brush actively retrieves multimodal evidence to ground out-of-distribution concepts and employs reasoning tools to resolve implicit visual constraints. To rigorously evaluate these capabilities, we propose Mind-Bench, a comprehensive benchmark comprising 500 distinct samples spanning real-time news, emerging concepts, and domains such as mathematical and Geo-Reasoning. Extensive experiments demonstrate that Mind-Brush significantly enhances the capabilities of unified models, realizing a zero-to-one capability leap for the Qwen-Image baseline on Mind-Bench, while achieving superior results on established benchmarks like WISE and RISE.
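The 'think-research-create' paradigm described above can be pictured as a three-stage pipeline. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the `Plan` type, the `think_research_create` function, and all `toy_*` callables are hypothetical names introduced for illustration, and the toy stand-ins return placeholder strings in place of a real planner, retriever, reasoning tool, and image generator.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Plan:
    """Hypothetical result of the 'think' step: what to look up or work out."""
    search_queries: List[str] = field(default_factory=list)
    reasoning_tasks: List[str] = field(default_factory=list)


def think_research_create(
    prompt: str,
    think: Callable[[str], Plan],
    search: Callable[[str], str],
    reason: Callable[[str], str],
    create: Callable[[str], str],
) -> str:
    """Run one pass of a think -> research -> create workflow."""
    # Think: decide which concepts need evidence and which constraints need solving.
    plan = think(prompt)
    # Research: gather evidence and resolve implicit constraints with tools.
    evidence = [search(q) for q in plan.search_queries]
    constraints = [reason(t) for t in plan.reasoning_tasks]
    # Create: hand the generator a prompt grounded in what was found.
    grounded_prompt = "\n".join(
        [prompt]
        + [f"Evidence: {e}" for e in evidence]
        + [f"Constraint: {c}" for c in constraints]
    )
    return create(grounded_prompt)


# Toy stand-ins (assumptions): each returns a tagged string, not real output.
def toy_think(prompt: str) -> Plan:
    return Plan(
        search_queries=[f"what is {prompt}"],
        reasoning_tasks=[f"layout for {prompt}"],
    )


def toy_search(query: str) -> str:
    return f"retrieved<{query}>"


def toy_reason(task: str) -> str:
    return f"resolved<{task}>"


def toy_create(grounded_prompt: str) -> str:
    return f"IMAGE[{grounded_prompt}]"


result = think_research_create(
    "a new gadget", toy_think, toy_search, toy_reason, toy_create
)
print(result)
```

Passing the stages in as callables keeps the control flow separate from any particular search backend or generator, which is one plausible reading of "unified agentic framework"; the paper itself may organize these steps differently.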

