CV | May 29, 2025

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

arXiv:2505.23380v1 | 16 citations | h-index: 6 | Has Code
Originality: Incremental advance
AI Analysis

UniRL addresses the need for more efficient and balanced training of unified multimodal models. The advance is incremental, since it builds on existing models such as Show-o and Janus.

The paper tackles the high data and computation requirements of unified multimodal models by introducing UniRL, a self-improving post-training approach in which the model generates its own training data. It reports GenEval scores of 0.77 on Show-o and 0.65 on Janus.

Unified multimodal large language models such as Show-o and Janus have achieved strong performance across both generation and understanding tasks. However, these models typically rely on large-scale datasets and require substantial computation during the pretraining stage. In addition, several post-training methods have been proposed, but they often depend on external data or are limited to task-specific customization. In this work, we introduce UniRL, a self-improving post-training approach. Our approach enables the model to generate images from prompts and use them as training data in each iteration, without relying on any external image data. Moreover, it enables the two tasks to enhance each other: the generated images are used for understanding, and the understanding results are used to supervise generation. We explore supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO) to optimize the models. UniRL offers three key advantages: (1) it requires no external image data, as all training samples are generated by the model itself during training; (2) it not only improves individual task performance, but also reduces the imbalance between generation and understanding; and (3) it requires only several additional training steps during the post-training stage. We evaluate UniRL on top of Show-o and Janus, achieving a GenEval score of 0.77 for Show-o and 0.65 for Janus. Code and models will be released at https://github.com/showlab/UniRL.
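
To make the GRPO step mentioned in the abstract more concrete, here is a minimal sketch of the group-relative advantage that GRPO is generally built around: several samples are drawn for the same prompt, each receives a scalar reward, and the rewards are normalized within that group. This is not taken from the UniRL code. The function name, the `eps` constant, the use of the population standard deviation, and the example reward values are all assumptions for illustration; the abstract does not say how understanding results are converted into rewards.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Using the population std and eps=1e-6 is an assumption of this sketch.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four images generated from one prompt, e.g. 1.0 if
# the model's own understanding pass judged the image to match the prompt.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)

# A group in which every sample gets the same reward carries no learning signal.
flat = group_relative_advantages([0.7, 0.7, 0.7])
```

Samples that score above their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to provide a baseline. When every sample in a group receives the same reward, all advantages are zero and that group contributes nothing to the update.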

Code Implementations: 1 repo
