LG | Oct 14, 2025

Towards Cross-Modal Error Detection with Tables and Images

arXiv:2510.12383v1 | 9.42 citations | h-index: 4

Originality: Synthesis-oriented

AI Analysis

This work addresses data quality issues for organizations that handle multiple data modalities, but it is incremental: it benchmarks existing methods without introducing new techniques.

The paper tackles cross-modal error detection in tabular data by benchmarking five baseline approaches on four datasets. Cleanlab and DataScope, when paired with a strong AutoML framework, achieve the highest F1 scores.

Ensuring data quality at scale remains a persistent challenge for large organizations. Despite recent advances, maintaining accurate and consistent data is still complex, especially when dealing with multiple data modalities. Traditional error detection and correction methods tend to focus on a single modality, typically a table, and often miss cross-modal errors that are common in domains like e-Commerce and healthcare, where image, tabular, and text data co-exist. To address this gap, we take an initial step towards cross-modal error detection in tabular data, by benchmarking several methods. Our evaluation spans four datasets and five baseline approaches. Among them, Cleanlab, a label error detection framework, and DataScope, a data valuation method, perform the best when paired with a strong AutoML framework, achieving the highest F1 scores. Our findings indicate that current methods remain limited, particularly when applied to heavy-tailed real-world data, motivating further research in this area.
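Since the abstract ranks methods by F1 score, a minimal sketch of how such a score could be computed for an error detector may help. It assumes errors are identified at the cell level as (row, column) pairs. The function name `detection_f1`, the column names, and the sample data are all hypothetical and not taken from the paper, which may evaluate at a different granularity.

```python
def detection_f1(flagged, truly_erroneous):
    """Return (precision, recall, F1) for flagged cells vs. ground-truth erroneous cells."""
    flagged = set(flagged)
    truly_erroneous = set(truly_erroneous)

    # Cells that the detector flagged and that really are erroneous.
    true_positives = len(flagged & truly_erroneous)

    # Guard against division by zero when nothing is flagged or nothing is wrong.
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truly_erroneous) if truly_erroneous else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: cells are (row index, column name) pairs.
flagged = [(0, "color"), (2, "category"), (5, "color")]
truth = [(0, "color"), (5, "color"), (7, "category"), (9, "color")]

precision, recall, f1 = detection_f1(flagged, truth)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a detector that flags few cells can score well on precision yet still rank low if it misses most of the true errors.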
