CV · AI · Feb 14, 2025

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

arXiv:2502.09838v3 · 95 citations · h-index: 14 · Has Code · ICML
Originality: Incremental advance
AI Analysis

This work addresses the need for unified medical AI models that handle both comprehension and generation, which could benefit healthcare professionals and researchers. It appears incremental, however, because it builds on existing LLM adaptation techniques.

The authors integrate medical visual comprehension and generation in HealthGPT, a Medical Large Vision-Language Model, and report exceptional performance and scalability on unified medical visual tasks.

We present HealthGPT, a powerful Medical Large Vision-Language Model (Med-LVLM) that integrates medical visual comprehension and generation capabilities within a unified autoregressive paradigm. Our bootstrapping philosophy is to progressively adapt heterogeneous comprehension and generation knowledge to pre-trained large language models (LLMs). This is achieved through a novel heterogeneous low-rank adaptation (H-LoRA) technique, which is complemented by a tailored hierarchical visual perception approach and a three-stage learning strategy. To train HealthGPT effectively, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health. Experimental results demonstrate exceptional performance and scalability of HealthGPT in unified medical visual tasks. Our project can be accessed at https://github.com/DCDmllm/HealthGPT.
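The abstract names H-LoRA without describing its mechanics, so the sketch below is not the authors' method. It only illustrates the general idea that low-rank adaptation builds on: a frozen weight matrix W plus a small trainable update (alpha / r) * B * A, with one adapter kept per task so that comprehension and generation updates do not overwrite each other. The class `TaskLoRALinear`, the task names, and all sizes are hypothetical, and the per-task lookup is an assumption made for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def make_matrix(rows, cols, scale, rng):
    """Build a rows x cols matrix of Gaussian values with the given std."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class TaskLoRALinear:
    """Frozen linear layer with one plain LoRA adapter per task.

    Hypothetical illustration only; not the H-LoRA design from the paper.
    """

    def __init__(self, d_in, d_out, rank, tasks, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Stand-in for pre-trained weights; these stay frozen.
        self.W = make_matrix(d_out, d_in, 0.1, rng)
        self.scale = alpha / rank
        # Standard LoRA init: A is random, B is zero, so each adapter
        # starts out as a no-op on top of the frozen weights.
        self.adapters = {
            task: {
                "A": make_matrix(rank, d_in, 0.1, rng),
                "B": [[0.0] * rank for _ in range(d_out)],
            }
            for task in tasks
        }

    def forward(self, x, task):
        adapter = self.adapters[task]
        base = matvec(self.W, x)
        delta = matvec(adapter["B"], matvec(adapter["A"], x))
        return [b + self.scale * d for b, d in zip(base, delta)]


layer = TaskLoRALinear(d_in=4, d_out=3, rank=2,
                       tasks=("comprehension", "generation"))
x = [1.0, 0.5, -0.5, 2.0]
y_comprehension = layer.forward(x, "comprehension")
y_generation = layer.forward(x, "generation")
```

Because each task owns its own A and B matrices, changing the generation adapter leaves the comprehension output untouched, which is the separation of heterogeneous knowledge that this toy example is meant to show. How HealthGPT actually routes between or combines its adapters is described in the paper and repository, not here.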

Code Implementations: 1 repo
