CV | Jan 20, 2025

A baseline for machine-learning-based hepatocellular carcinoma diagnosis using multi-modal clinical data

arXiv:2501.11535v1 | 1 citation | h-index: 16
Originality: Synthesis-oriented
AI Analysis

This provides a benchmark for researchers in medical AI working on liver cancer diagnosis, but it is incremental as it applies existing methods to a new dataset.

The paper addresses hepatocellular carcinoma diagnosis by establishing a baseline on a novel multimodal dataset that combines CT/MRI images with clinical tabular data. For TNM staging it reports a prediction accuracy of 0.89 ± 0.05 and an AUC of 0.93 ± 0.03.

The objective of this paper is to provide a baseline for multi-modal classification on a novel open multimodal dataset of hepatocellular carcinoma (HCC). The dataset includes both image data (contrast-enhanced CT and MRI images) and tabular data (clinical laboratory test data and case report forms). The classification task is TNM staging. Features are extracted from the vectorized, preprocessed tabular data, and radiomics features are extracted from the contrast-enhanced CT and MRI images. Feature selection is based on mutual information. An XGBoost classifier then predicts the TNM stage, reaching a prediction accuracy of $0.89 \pm 0.05$ and an AUC of $0.93 \pm 0.03$. The results show that this level of accuracy is obtained only when image and clinical laboratory data are combined, which makes this a good example of a case where multi-modal classification is required for accurate results.
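The mutual-information feature selection step can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say which mutual-information estimator, how many features, or which preprocessing were used. The sketch assumes a simple histogram estimate, a hypothetical synthetic dataset (`make_toy_data`) with one noise column and two informative columns standing in for a radiomics feature and a laboratory value, and hypothetical helper names (`mutual_information`, `select_top_k`). The XGBoost stage is left out to keep the example dependency-free.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys, bins=4):
    """Histogram estimate of I(X; Y) in nats.

    xs: continuous feature values, ys: discrete class labels.
    Assumption: equal-width binning; the paper does not specify an estimator.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # constant feature falls into a single bin
    binned = [min(int((x - lo) / width), bins - 1) for x in xs]
    n = len(xs)
    joint = Counter(zip(binned, ys))
    px = Counter(binned)
    py = Counter(ys)
    mi = 0.0
    for (b, y), c in joint.items():
        # p(b, y) * log( p(b, y) / (p(b) * p(y)) )
        mi += (c / n) * math.log(c * n / (px[b] * py[y]))
    return mi


def select_top_k(rows, labels, k, bins=4):
    """Return (indices of the k highest-scoring columns, all scores)."""
    n_features = len(rows[0])
    scores = [
        mutual_information([r[j] for r in rows], labels, bins)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


def make_toy_data(n=400, seed=0):
    """Hypothetical data: column 0 is noise, columns 1 and 2 depend on the label."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        noise = rng.gauss(0, 1)
        image_like = y + rng.gauss(0, 0.3)      # stand-in for a radiomics feature
        lab_like = 2 * y + rng.gauss(0, 0.5)    # stand-in for a laboratory value
        rows.append([noise, image_like, lab_like])
        labels.append(y)
    return rows, labels


rows, labels = make_toy_data()
selected, scores = select_top_k(rows, labels, k=2)
print("selected columns:", sorted(selected))
print("scores:", [round(s, 3) for s in scores])
```

In a setting closer to the paper's, the selected columns would then be passed to a gradient-boosted tree classifier. With scikit-learn and xgboost available, `sklearn.feature_selection.mutual_info_classif` and `xgboost.XGBClassifier` would be the more usual choices than the hand-written estimator above.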
