CV · May 26

Clinically-Grounded Counterfactual Reasoning for Medical Video Diagnosis

Jianzhe Gao, Churan Wang, Weiyi Zhang, Jianghua Li, Li-An Li, Wenguan Wang, Yixin Zhu, Yizhou Wang

arXiv:2605.26483 · 44.1

AI Analysis

For medical video diagnosis, this work addresses the lack of clinical priors and counterfactual reasoning in existing methods, offering a clinically-grounded approach.

MedVCR introduces a counterfactual reasoning framework for medical video diagnosis that synthesizes tissue evolution under pathological states, achieving 2.6%-10.2% performance gains over baselines in colposcopy and colonoscopy tasks.

Medical video diagnosis involves inferring clinical decisions from dynamic tissue responses observed over the course of an examination. Existing methods rely on an end-to-end learning paradigm that i) focuses on appearance rather than pathology, ii) lacks clinical priors, and iii) reasons solely from observations, without counterfactual comparison. This work introduces MedVCR, a counterfactual reasoning framework that mimics clinical diagnostic thinking. MedVCR comprises three components: a Counterfactual Generator that uses a diffusion-based approach to synthesize tissue evolution under specified pathological states; a Counterfactual Representation Learning module that encodes diagnostic knowledge through clinical rules (i.e., temporal consistency, pathological separability, and counterfactual alignment); and a Dual Diagnostic Prediction strategy that integrates video-level assessment with frame-level counterfactual analysis. MedVCR is evaluated under both fully supervised (e.g., colposcopy) and weakly supervised (e.g., colonoscopy) video diagnosis settings, yielding 2.6%-10.2% performance gains over leading baselines. Comprehensive ablation studies further validate the effectiveness of each component. The code will be released.
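The abstract names three clinical rules and a dual prediction step but gives no formulas, so the following is only a minimal NumPy sketch of how such terms might look, not the authors' implementation. Every function name, the squared-difference smoothness term, the hinge margin, the cosine-based alignment, and the blending weight are assumptions introduced here for illustration.

```python
import numpy as np


def temporal_consistency(frames):
    """Assumed rule: consecutive frame embeddings (T, D) should change smoothly.

    Returns the mean squared L2 distance between neighbouring frames.
    """
    diffs = frames[1:] - frames[:-1]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))


def pathological_separability(video_embs, labels, margin=1.0):
    """Assumed rule: videos with different labels should sit at least
    `margin` apart. Returns the mean hinge penalty over such pairs."""
    penalties = []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:
                dist = np.linalg.norm(video_embs[i] - video_embs[j])
                penalties.append(max(0.0, margin - dist))
    return float(np.mean(penalties)) if penalties else 0.0


def counterfactual_alignment(real_frames, cf_frames):
    """Assumed rule: real frames should align with counterfactual frames
    synthesized under the same pathological state.

    Returns 1 minus the mean per-frame cosine similarity.
    """
    dots = np.sum(real_frames * cf_frames, axis=1)
    norms = (np.linalg.norm(real_frames, axis=1)
             * np.linalg.norm(cf_frames, axis=1))
    return float(1.0 - np.mean(dots / (norms + 1e-8)))


def dual_prediction(video_probs, frame_probs, weight=0.5):
    """Hypothetical fusion: blend video-level class probabilities (C,)
    with the average of frame-level probabilities (T, C)."""
    return weight * video_probs + (1.0 - weight) * frame_probs.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(8, 16))      # 8 frames, 16-dim embeddings
    cf = frames + 0.1 * rng.normal(size=frames.shape)
    print("temporal:", temporal_consistency(frames))
    print("alignment:", counterfactual_alignment(frames, cf))
```

In a training loop these terms would presumably be weighted and summed with the diagnostic loss; the abstract does not specify the weights, so none are suggested here.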
