CV | Jan 5

CTIS-QA: Clinical Template-Informed Slide-level Question Answering for Pathology

Hao Lu, Ziniu Qian, Yifu Li, Yang Zhou, Bingzheng Wei, Yan Xu

arXiv:2601.01769v1 | 1.5 | h-index: 9 | Has Code | BIBM

Originality: Incremental advance

AI Analysis

This work addresses the need for standardized, clinically grounded vision-language models in pathology. It offers an incremental improvement through a template-based data pipeline and a dual-stream model architecture.

The paper tackles two linked problems: extracting structured pathological information from pathology reports, and using that information for slide-level question answering in clinical diagnostics. It contributes two datasets, CTIS-Align (80k slide-description pairs) and CTIS-Bench (977 WSIs with 14,879 QA pairs), and a model that is reported to outperform state-of-the-art models across multiple metrics.

In this paper, we introduce a clinical diagnosis template-based pipeline to systematically collect and structure pathological information. In collaboration with pathologists and guided by the College of American Pathologists (CAP) Cancer Protocols, we design a Clinical Pathology Report Template (CPRT) that ensures comprehensive and standardized extraction of diagnostic elements from pathology reports. We validate the effectiveness of our pipeline on TCGA-BRCA. First, we extract pathological features from reports using CPRT. These features are then used to build CTIS-Align, a dataset of 80k slide-description pairs from 804 WSIs for vision-language alignment training, and CTIS-Bench, a rigorously curated VQA benchmark comprising 977 WSIs and 14,879 question-answer pairs. CTIS-Bench emphasizes clinically grounded, closed-ended questions (e.g., tumor grade, receptor status) that reflect real diagnostic workflows, minimize non-visual reasoning, and require genuine slide understanding. We further propose CTIS-QA, a slide-level question answering model featuring a dual-stream architecture that mimics pathologists' diagnostic approach. One stream captures global slide-level context via clustering-based feature aggregation, while the other focuses on salient local regions through an attention-guided patch perception module. Extensive experiments on WSI-VQA, CTIS-Bench, and slide-level diagnostic tasks show that CTIS-QA consistently outperforms existing state-of-the-art models across multiple metrics. Code and data are available at https://github.com/HLSvois/CTIS-QA.
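
To make the dual-stream idea in the abstract concrete, here is a minimal, standard-library-only sketch. It is not the authors' implementation. It assumes that patch features have already been extracted from a WSI, that "clustering-based feature aggregation" can be approximated by plain k-means centroids, and that "attention-guided patch perception" can be approximated by keeping the top-scoring patches under softmax dot-product attention against a query vector. All function names and parameters (`global_stream`, `local_stream`, `dual_stream_tokens`, `k`, `top_m`) are hypothetical.

```python
import math
import random


def kmeans(features, k, iters=10, seed=0):
    """Plain k-means over patch features; returns k centroid vectors."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k), key=lambda c: math.dist(f, centroids[c]))
            buckets[nearest].append(f)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def global_stream(features, k):
    """Global context (assumed form): summarize all patches as k centroids."""
    return kmeans(features, k)


def local_stream(features, query, top_m):
    """Local detail (assumed form): keep the top_m patches ranked by
    softmax dot-product attention against a query vector."""
    scale = math.sqrt(len(query))
    logits = [sum(a * b for a, b in zip(f, query)) / scale for f in features]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    scores = [w / total for w in weights]
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return [features[i] for i in ranked[:top_m]]


def dual_stream_tokens(features, query, k=4, top_m=8):
    """Concatenate global and local tokens into one sequence that a
    downstream language model could consume."""
    return global_stream(features, k) + local_stream(features, query, top_m)


# Usage with synthetic data: 200 patches with 16-dimensional features.
rng = random.Random(42)
patches = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(200)]
query = [rng.gauss(0, 1) for _ in range(16)]
tokens = dual_stream_tokens(patches, query)
print(len(tokens), len(tokens[0]))  # 12 16
```

The point of the sketch is the division of labor: the global stream compresses every patch into a few summary vectors, while the local stream passes a handful of individual patches through unchanged. In a real model the query and the selection would typically be learned, and the features would come from a pathology encoder.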
