CV · Jun 21, 2025

Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning

Shih-Wen Liu, Hsuan-Yu Fan, Wei-Ta Chu, Fu-En Yang, Yu-Chiang Frank Wang

arXiv:2506.17645v1 · 3.6 · h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses the problem of automating histopathology reporting for medical professionals, offering an incremental improvement through a novel in-context learning mechanism.

The paper tackles automated medical report generation from histopathology images. It proposes PathGenIC, a framework that uses multimodal in-context learning to retrieve similar image-report pairs and incorporate adaptive feedback. On the HistGen benchmark, it achieves state-of-the-art results, with significant improvements in BLEU, METEOR, and ROUGE-L.

Automating medical report generation from histopathology images is a critical challenge requiring effective visual representations and domain-specific knowledge. Inspired by the common practices of human experts, we propose an in-context learning framework called PathGenIC that integrates context derived from the training set with a multimodal in-context learning (ICL) mechanism. Our method dynamically retrieves semantically similar whole slide image (WSI)-report pairs and incorporates adaptive feedback to enhance contextual relevance and generation quality. Evaluated on the HistGen benchmark, the framework achieves state-of-the-art results, with significant improvements across BLEU, METEOR, and ROUGE-L metrics, and demonstrates robustness across diverse report lengths and disease categories. By maximizing training data utility and bridging vision and language with ICL, our work offers a solution for AI-driven histopathology reporting, setting a strong foundation for future advancements in multimodal clinical applications.
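The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each training WSI has already been reduced to a fixed-length embedding, uses cosine similarity as the similarity measure, and builds a text-only prompt. The names `retrieve_examples` and `build_prompt`, the toy embeddings, and the placeholder reports are all hypothetical, and the adaptive feedback component is omitted.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_examples(query_emb, bank, k=2):
    """Return the reports of the k training pairs most similar to the query.

    bank is a list of (embedding, report) pairs; both the embedding source
    and the similarity measure are assumptions made for illustration.
    """
    ranked = sorted(bank, key=lambda pair: cosine(query_emb, pair[0]), reverse=True)
    return [report for _, report in ranked[:k]]

def build_prompt(examples):
    """Assemble retrieved reports into a simple in-context prompt (hypothetical format)."""
    lines = [f"Example report {i}: {r}" for i, r in enumerate(examples, 1)]
    lines.append("Report for the query slide:")
    return "\n".join(lines)

# Toy training bank: 2-D embeddings paired with placeholder reports.
bank = [
    ([1.0, 0.0], "report A"),
    ([0.9, 0.1], "report B"),
    ([0.0, 1.0], "report C"),
]
examples = retrieve_examples([1.0, 0.0], bank, k=2)
prompt = build_prompt(examples)
print(prompt)
```

In a real multimodal setting the retrieved slides themselves, not only their reports, would presumably be supplied to the vision language model as context; the sketch keeps only the text side to stay self-contained.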
