AI · CL · CV · Jan 29, 2025

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

arXiv:2501.17811v1 · 732 citations · h-index: 15
Originality: Synthesis-oriented
AI Analysis

This is an incremental improvement for researchers in multimodal AI, building on the earlier Janus model.

The authors improve multimodal understanding and text-to-image generation by optimizing the training strategy, expanding the training data, and scaling up model size. They report significant gains in both capabilities and more stable text-to-image generation.

In this work, we introduce Janus-Pro, an advanced version of the previous work Janus. Specifically, Janus-Pro incorporates (1) an optimized training strategy, (2) expanded training data, and (3) scaling to larger model size. With these improvements, Janus-Pro achieves significant advancements in both multimodal understanding and text-to-image instruction-following capabilities, while also enhancing the stability of text-to-image generation. We hope this work will inspire further exploration in the field. Code and models are publicly available.

Code Implementations · 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
