CVDec 24, 2025
DGSAN: Dual-Graph Spatiotemporal Attention Network for Pulmonary Nodule Malignancy PredictionXiao Yu, Zhaojie Fang, Guanyu Zhou et al.
Lung cancer continues to be the leading cause of cancer-related deaths globally. Early detection and diagnosis of pulmonary nodules are essential for improving patient survival rates. Although previous research has integrated multimodal and multi-temporal information, outperforming single modality and single time point, the fusion methods are limited to inefficient vector concatenation and simple mutual attention, highlighting the need for more effective multimodal information fusion. To address these challenges, we introduce a Dual-Graph Spatiotemporal Attention Network, which leverages temporal variations and multimodal data to enhance the accuracy of predictions. Our methodology involves developing a Global-Local Feature Encoder to better capture the local, global, and fused characteristics of pulmonary nodules. Additionally, a Dual-Graph Construction method organizes multimodal features into inter-modal and intra-modal graphs. Furthermore, a Hierarchical Cross-Modal Graph Fusion Module is introduced to refine feature integration. We also compiled a novel multimodal dataset named the NLST-cmst dataset as a comprehensive source of support for related research. Our extensive experiments, conducted on both the NLST-cmst and curated CSTL-derived datasets, demonstrate that our DGSAN significantly outperforms state-of-the-art methods in classifying pulmonary nodules with exceptional computational efficiency.
CVFeb 2, 2025Code
TMI-CLNet: Triple-Modal Interaction Network for Chronic Liver Disease Prognosis From Imaging, Clinical, and Radiomic Data FusionLinglong Wu, Xuhao Shan, Ruiquan Ge et al.
Chronic liver disease represents a significant health challenge worldwide and accurate prognostic evaluations are essential for personalized treatment plans. Recent evidence suggests that integrating multimodal data, such as computed tomography imaging, radiomic features, and clinical information, can provide more comprehensive prognostic information. However, modalities have an inherent heterogeneity, and incorporating additional modalities may exacerbate the challenges of heterogeneous data fusion. Moreover, existing multimodal fusion methods often struggle to adapt to richer medical modalities, making it difficult to capture inter-modal relationships. To overcome these limitations, We present the Triple-Modal Interaction Chronic Liver Network (TMI-CLNet). Specifically, we develop an Intra-Modality Aggregation module and a Triple-Modal Cross-Attention Fusion module, which are designed to eliminate intra-modality redundancy and extract cross-modal information, respectively. Furthermore, we design a Triple-Modal Feature Fusion loss function to align feature representations across modalities. Extensive experiments on the liver prognosis dataset demonstrate that our approach significantly outperforms existing state-of-the-art unimodal models and other multi-modal techniques. Our code is available at https://github.com/Mysterwll/liver.git.
CVJan 4, 2021
Stereo Correspondence and Reconstruction of Endoscopic Data ChallengeMax Allan, Jonathan Mcleod, Congcong Wang et al.
The stereo correspondence and reconstruction of endoscopic data sub-challenge was organized during the Endovis challenge at MICCAI 2019 in Shenzhen, China. The task was to perform dense depth estimation using 7 training datasets and 2 test sets of structured light data captured using porcine cadavers. These were provided by a team at Intuitive Surgical. 10 teams participated in the challenge day. This paper contains 3 additional methods which were submitted after the challenge finished as well as a supplemental section from these teams on issues they found with the dataset.
CVFeb 18, 2019
2017 Robotic Instrument Segmentation ChallengeMax Allan, Alex Shvets, Thomas Kurmann et al.
In mainstream computer vision and machine learning, public datasets such as ImageNet, COCO and KITTI have helped drive enormous improvements by enabling researchers to understand the strengths and limitations of different algorithms via performance comparison. However, this type of approach has had limited translation to problems in robotic assisted surgery as this field has never established the same level of common datasets and benchmarking methods. In 2015 a sub-challenge was introduced at the EndoVis workshop where a set of robotic images were provided with automatically generated annotations from robot forward kinematics. However, there were issues with this dataset due to the limited background variation, lack of complex motion and inaccuracies in the annotation. In this work we present the results of the 2017 challenge on robotic instrument segmentation which involved 10 teams participating in binary, parts and type based segmentation of articulated da Vinci robotic instruments.