IVCVJul 6, 2025

ViTaL: A Multimodality Dataset and Benchmark for Multi-pathological Ovarian Tumor Recognition

arXiv:2507.04383v1h-index: 16Has CodeExpert syst appl
Originality Incremental advance
AI Analysis

This work addresses a critical gap in early detection of ovarian tumors for women's health by providing a new dataset and benchmark, though it is incremental as it builds on existing multimodal methods.

The authors tackled the problem of limited public datasets for ovarian tumor recognition by introducing ViTaL, a multimodal dataset with visual, tabular, and linguistic data from 496 patients across six pathological categories, and proposed ViTaL-Net, which achieved accuracies exceeding 90% on two common types and an overall performance of 85%.

Ovarian tumor, as a common gynecological disease, can rapidly deteriorate into serious health crises when undetected early, thus posing significant threats to the health of women. Deep neural networks have the potential to identify ovarian tumors, thereby reducing mortality rates, but limited public datasets hinder its progress. To address this gap, we introduce a vital ovarian tumor pathological recognition dataset called \textbf{ViTaL} that contains \textbf{V}isual, \textbf{T}abular and \textbf{L}inguistic modality data of 496 patients across six pathological categories. The ViTaL dataset comprises three subsets corresponding to different patient data modalities: visual data from 2216 two-dimensional ultrasound images, tabular data from medical examinations of 496 patients, and linguistic data from ultrasound reports of 496 patients. It is insufficient to merely distinguish between benign and malignant ovarian tumors in clinical practice. To enable multi-pathology classification of ovarian tumor, we propose a ViTaL-Net based on the Triplet Hierarchical Offset Attention Mechanism (THOAM) to minimize the loss incurred during feature fusion of multi-modal data. This mechanism could effectively enhance the relevance and complementarity between information from different modalities. ViTaL-Net serves as a benchmark for the task of multi-pathology, multi-modality classification of ovarian tumors. In our comprehensive experiments, the proposed method exhibited satisfactory performance, achieving accuracies exceeding 90\% on the two most common pathological types of ovarian tumor and an overall performance of 85\%. Our dataset and code are available at https://github.com/GGbond-study/vitalnet.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes