CV · AI · Dec 18, 2023

UniDCP: Unifying Multiple Medical Vision-language Tasks via Dynamic Cross-modal Learnable Prompts

arXiv:2312.11171v1 · 17 citations · h-index: 5 · IEEE Transactions on Multimedia
Originality: Incremental advance
AI Analysis

UniDCP addresses the task-specific inflexibility of medical vision-language models, offering AI researchers and practitioners a single model that adapts to many tasks. The advance is incremental, since it builds on existing Med-VLP methods.

The paper tackles the inflexibility of medical vision-language pre-training models across multiple fine-tuning tasks by proposing UniDCP, a unified model with dynamic cross-modal learnable prompts, which achieves superior results on 8 medical tasks over 14 datasets.

Medical vision-language pre-training (Med-VLP) models have recently accelerated progress in fast-growing medical diagnostic applications. However, most Med-VLP models learn task-specific representations independently from scratch, which makes them inflexible when applied across multiple fine-tuning tasks. In this work, we propose UniDCP, a Unified medical vision-language model with Dynamic Cross-modal learnable Prompts, which can be flexibly applied to multiple medical vision-language tasks. Specifically, we explicitly construct a unified framework that harmonizes diverse inputs from multiple pre-training tasks through cross-modal prompts and can therefore accommodate heterogeneous medical fine-tuning tasks. Furthermore, we devise a dynamic cross-modal prompt optimizing strategy that optimizes the prompts within the shareable space to implicitly process the shareable clinical knowledge. UniDCP is the first Med-VLP model capable of performing all 8 medical uni-modal and cross-modal tasks over 14 corresponding datasets, consistently yielding superior results over diverse state-of-the-art methods.
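
The abstract does not spell out how the prompts are built or chosen, so the following is only an illustrative sketch of the general idea of shared, dynamically selected prompts, not the authors' implementation. It assumes a hypothetical shared pool in which each entry pairs a key vector with a prompt vector, selects the top-k entries by cosine similarity to the mean of the image and text token embeddings, and prepends the selected prompts to the joint sequence. The pool layout, the selection rule, and all names (`POOL`, `select_prompts`, `build_input`) are assumptions made for illustration. In a real model the keys and prompts would be trainable parameters rather than fixed lists.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(tokens):
    """Average a list of token embeddings into a single query vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def select_prompts(pool, query, k):
    """Return the k pool entries whose keys are most similar to the query."""
    ranked = sorted(pool, key=lambda e: cosine(e["key"], query), reverse=True)
    return ranked[:k]

def build_input(pool, image_tokens, text_tokens, k=2):
    """Prepend k selected prompt vectors to the joint image+text sequence."""
    query = mean_vector(image_tokens + text_tokens)
    chosen = select_prompts(pool, query, k)
    prompts = [e["prompt"] for e in chosen]
    names = [e["name"] for e in chosen]
    return prompts + image_tokens + text_tokens, names

# Hypothetical shared pool (assumption): fixed toy values stand in for
# parameters that would be learned during pre-training.
POOL = [
    {"name": "p0", "key": [1.0, 0.0, 0.0], "prompt": [0.1, 0.1, 0.1]},
    {"name": "p1", "key": [0.0, 1.0, 0.0], "prompt": [0.2, 0.2, 0.2]},
    {"name": "p2", "key": [0.0, 0.0, 1.0], "prompt": [0.3, 0.3, 0.3]},
]

# Toy 3-dimensional embeddings for two image patches and one text token.
image_tokens = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.1]]
text_tokens = [[0.1, 0.7, 0.0]]

sequence, names = build_input(POOL, image_tokens, text_tokens, k=2)
```

Because the same pool serves every task, inputs from different tasks differ only in which prompts are selected, which is one plausible reading of how prompts could live in a "shareable space".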

