ReinPath: A Multimodal Reinforcement Learning Approach for Pathology
This work addresses interpretability issues in computational pathology for researchers and clinicians, representing an incremental improvement with a new dataset and method.
The paper tackles the problem of limited interpretability in multimodal pathology methods by introducing a novel multimodal pathology large language model with strong reasoning capabilities, which outperforms state-of-the-art methods on a new VQA dataset even when trained with only 20% of the data.
Interpretability is important in computational pathology, which has motivated methods that integrate multimodal information from histopathological images and their corresponding text. However, existing multimodal methods offer limited interpretability, both because high-quality datasets that support explicit reasoning and inference are lacking and because their reasoning processes are simple. To address these problems, we introduce a novel multimodal pathology large language model with strong reasoning capabilities. To improve the generation of accurate and contextually relevant textual descriptions, we design a semantic reward strategy integrated with group relative policy optimization. We also construct a high-quality pathology visual question answering (VQA) dataset specifically designed to support complex reasoning tasks. Comprehensive experiments on this dataset demonstrate that our method outperforms state-of-the-art methods, even when trained with only 20% of the data. Our method also achieves performance comparable to CLIP on a downstream zero-shot image classification task.
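To make the combination of a semantic reward with group relative policy optimization more concrete, the following is a minimal sketch rather than the paper's implementation. It assumes the semantic reward is the cosine similarity between an embedding of each sampled answer and an embedding of the reference answer; the abstract does not specify the reward, so this choice is an assumption. The embeddings are toy vectors and all function names are hypothetical. The advantage follows the usual group-relative form, in which each reward is standardized against the mean and standard deviation of its own sampling group.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_rewards(
    candidate_embeddings: Sequence[Sequence[float]],
    reference_embedding: Sequence[float],
) -> List[float]:
    """Assumed reward: similarity of each sampled answer to the reference."""
    return [cosine_similarity(c, reference_embedding) for c in candidate_embeddings]


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of samples for the same prompt."""
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy group of three sampled answers for one question (embeddings are made up).
reference = [1.0, 0.0, 0.0]
candidates = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 1.0, 0.0]]
rewards = semantic_rewards(candidates, reference)
advantages = group_relative_advantages(rewards)
```

In this sketch, answers that are semantically closer to the reference than the group average receive positive advantages and the rest receive negative ones, so no separate value network is needed to form a baseline.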