IV, CV | May 25, 2025

MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation

arXiv:2505.19225v1 | 4 citations | h-index: 17 | Has Code
Originality: Highly original
AI Analysis

This work addresses the limited use of multimodal AI in medical imaging, a gap that affects both clinicians and researchers, and it proposes a novel method rather than an incremental improvement.

The authors tackled the lack of a unified visual tokenizer for medical imaging by developing MedITok, which encodes both structural details and clinical semantics, achieving state-of-the-art performance on over 30 datasets across 9 modalities and 4 tasks.

Advanced autoregressive models have reshaped multimodal AI. However, their transformative potential in medical imaging remains largely untapped due to the absence of a unified visual tokenizer, one capable of capturing fine-grained visual structures for faithful image reconstruction and realistic image synthesis, as well as rich semantics for accurate diagnosis and image interpretation. To this end, we present MedITok, the first unified tokenizer tailored for medical images, encoding both low-level structural details and high-level clinical semantics within a unified latent space. To balance these competing objectives, we introduce a novel two-stage training framework: a visual representation alignment stage that cold-starts the tokenizer reconstruction learning with a visual semantic constraint, followed by a textual semantic representation alignment stage that infuses detailed clinical semantics into the latent space. Trained on a meticulously collected large-scale dataset with over 30 million medical images and 2 million image-caption pairs, MedITok achieves state-of-the-art performance on more than 30 datasets across 9 imaging modalities and 4 different tasks. By providing a unified token space for autoregressive modeling, MedITok supports a wide range of tasks in clinical diagnostics and generative healthcare applications. Model and code will be made publicly available at: https://github.com/Masaaki-75/meditok.
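The two-stage framework described in the abstract can be pictured as a reconstruction loss combined with an alignment term whose target changes between stages. The sketch below is only an illustration under stated assumptions: the function names, the `weight` parameter, and the use of cosine distance as the alignment term are hypothetical stand-ins and are not taken from the MedITok paper or its repository.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length vectors (reconstruction term)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 when the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def stage_loss(stage, image, recon, latent, visual_feat, text_feat, weight=0.5):
    """Hypothetical per-sample objective: reconstruction plus stage-specific alignment.

    Stage 1 pulls the latent toward a visual feature (assumed to come from a
    frozen vision encoder); stage 2 pulls it toward a caption feature (assumed
    to come from a text encoder). Cosine distance is an assumed stand-in for
    whatever alignment loss the authors actually use.
    """
    if stage == 1:
        target = visual_feat
    elif stage == 2:
        target = text_feat
    else:
        raise ValueError("stage must be 1 or 2")
    return mse(image, recon) + weight * cosine_distance(latent, target)
```

In this toy form, the only difference between the stages is which semantic target the latent is aligned to, while the reconstruction term stays active in both. That mirrors the abstract's description of balancing structural detail against clinical semantics without claiming to reproduce the actual training recipe.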

Code Implementations: 1 repo
