CV · Aug 12, 2024

Learning Collaborative Knowledge with Multimodal Representation for Polyp Re-Identification

arXiv:2408.05914v3 · 1 citation · h-index: 9 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific problem in medical imaging for colorectal cancer diagnosis, offering an incremental improvement by incorporating multimodal data.

The paper tackles polyp re-identification in colonoscopy by proposing a deep multimodal collaborative learning framework that integrates visual and textual data, achieving improved retrieval performance over unimodal state-of-the-art models on standard benchmarks.

Colonoscopic polyp re-identification (ReID) aims to match a given polyp against a large gallery of images captured from different views by different cameras, and plays an important role in the prevention and treatment of colorectal cancer in computer-aided diagnosis. However, traditional object ReID methods that directly adopt CNN models trained on the ImageNet dataset usually produce unsatisfactory retrieval performance on colonoscopic datasets because of the large domain gap. Worse, these solutions typically learn unimodal representations from visual samples alone, and thus fail to exploit complementary information from other modalities. To address this challenge, we propose a novel Deep Multimodal Collaborative Learning framework, named DMCL, for polyp re-identification, which effectively encourages multimodal knowledge collaboration and reinforces generalization capability in medical scenarios. Building on this framework, a dynamic multimodal feature fusion strategy is introduced to leverage the optimized visual-text representations for multimodal fusion via end-to-end training. Experiments on the standard benchmarks show the benefits of the multimodal setting over state-of-the-art unimodal ReID models, especially when combined with the collaborative multimodal fusion strategy. The code is publicly available at https://github.com/JeremyXSC/DMCL.
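
To make the retrieval setting in the abstract concrete, here is a minimal sketch of gallery ranking over fused visual-text embeddings. It is not the DMCL implementation: the function names (`fuse`, `rank_gallery`), the fixed scalar `gate`, the use of cosine similarity, and the toy vectors are all assumptions made for illustration. The paper's fusion strategy is described as dynamic and trained end-to-end, which this fixed-weight version does not attempt to reproduce.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(visual, text, gate=0.5):
    """Hypothetical fusion: convex combination of normalized embeddings.

    `gate` is a fixed scalar here (an assumption); a learned,
    input-dependent weight would be needed for a truly dynamic fusion.
    """
    v, t = l2_normalize(visual), l2_normalize(text)
    return l2_normalize([gate * a + (1.0 - gate) * b for a, b in zip(v, t)])


def rank_gallery(query, gallery):
    """Return gallery ids sorted by descending cosine similarity to the query."""
    q = l2_normalize(query)
    scores = {
        pid: sum(a * b for a, b in zip(q, l2_normalize(emb)))
        for pid, emb in gallery.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings, made up for illustration only.
gallery = {
    "polyp_a": fuse([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    "polyp_b": fuse([0.0, 1.0, 0.0], [0.1, 0.9, 0.0]),
    "polyp_c": fuse([0.0, 0.0, 1.0], [0.0, 0.1, 0.9]),
}
query = fuse([0.9, 0.1, 0.0], [1.0, 0.0, 0.0])
ranking = rank_gallery(query, gallery)
```

In this toy setup the query is a slightly different view of `polyp_a`, so that identity should rank first. Real ReID evaluation would report metrics such as mAP or Rank-1 over a full benchmark gallery.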

Code Implementations: 1 repo
