AI CL CV
Jan 29, 2025

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

Xiaokang Chen, Zhiyu Wu, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan

arXiv:2501.17811v1 60.5775 citations h-index: 20 Has Code

Originality: Synthesis-oriented

AI Analysis

This is an incremental improvement for researchers in multimodal AI, building on the earlier Janus model.

The authors improve multimodal understanding and text-to-image generation by scaling training data and model size, reporting significant gains in both capabilities and more stable image generation.

In this work, we introduce Janus-Pro, an advanced version of the previous work Janus. Specifically, Janus-Pro incorporates (1) an optimized training strategy, (2) expanded training data, and (3) scaling to larger model size. With these improvements, Janus-Pro achieves significant advancements in both multimodal understanding and text-to-image instruction-following capabilities, while also enhancing the stability of text-to-image generation. We hope this work will inspire further exploration in the field. Code and models are publicly available.
