CV · LG · Mar 1, 2023

Aligning benchmark datasets for table structure recognition

Brandon Smock, Rohith Pesala, Robin Abraham

arXiv:2303.00716v29.824 citations · h-index: 8 · Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses a data quality problem for researchers and practitioners in table structure recognition; its contribution is incremental, focused on benchmark refinement.

The paper tackles inconsistent annotations across table structure recognition benchmark datasets, which harm the performance of models trained and evaluated on them. By aligning these benchmarks (removing errors and reducing inconsistency between them), the authors report substantial gains in exact match accuracy on ICDAR-2013: from 65% to 75% when training on PubTables-1M, from 42% to 65% when training on FinTabNet, and from 69% to 81% when training on both combined.
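The size of each reported gain can be checked with simple arithmetic. The short Python sketch below uses only the percentages quoted in this summary; the dictionary and function names are illustrative and do not come from the paper or its code.

```python
# Exact match accuracy on ICDAR-2013 (percent), as quoted in the summary.
baseline = {"PubTables-1M": 65, "FinTabNet": 42, "combined": 69}
aligned = {"PubTables-1M": 75, "FinTabNet": 65, "combined": 81}


def point_gains(before, after):
    """Return the absolute gain in percentage points for each training set."""
    return {name: after[name] - before[name] for name in before}


gains = point_gains(baseline, aligned)
for name, gain in gains.items():
    print(f"{name}: {baseline[name]}% -> {aligned[name]}% (+{gain} points)")
```

By this calculation, the largest absolute gain is for FinTabNet (23 points), followed by the combined training set (12 points) and PubTables-1M (10 points).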

Benchmark datasets for table structure recognition (TSR) must be carefully processed to ensure they are annotated consistently. However, even if a dataset's annotations are self-consistent, there may be significant inconsistency across datasets, which can harm the performance of models trained and evaluated on them. In this work, we show that aligning these benchmarks (removing both errors and inconsistency between them) improves model performance significantly. We demonstrate this through a data-centric approach where we adopt one model architecture, the Table Transformer (TATR), that we hold fixed throughout. Baseline exact match accuracy for TATR evaluated on the ICDAR-2013 benchmark is 65% when trained on PubTables-1M, 42% when trained on FinTabNet, and 69% combined. After reducing annotation mistakes and inter-dataset inconsistency, performance of TATR evaluated on ICDAR-2013 increases substantially to 75% when trained on PubTables-1M, 65% when trained on FinTabNet, and 81% combined. We show through ablations over the modification steps that canonicalization of the table annotations has a significantly positive effect on performance, while other choices balance necessary trade-offs that arise when deciding a benchmark dataset's final composition. Overall we believe our work has significant implications for benchmark design for TSR and potentially other tasks as well. Dataset processing and training code will be released at https://github.com/microsoft/table-transformer.
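The abstract reports results as exact match accuracy but does not define the metric. As a minimal sketch, the Python below assumes that a table counts as correct only when its predicted grid of cell texts equals the ground-truth grid in every row and column. This is an assumption made for illustration, not the authors' evaluation code, and the names `Grid` and `exact_match_accuracy` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical representation: a table is a grid of cell strings, row by row.
Grid = Sequence[Sequence[str]]


def exact_match_accuracy(predicted: List[Grid], truth: List[Grid]) -> float:
    """Fraction of tables whose predicted grid equals the ground truth exactly.

    Assumption: a single differing cell, row count, or column count makes
    the whole table count as a miss.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must contain the same number of tables")
    if not truth:
        raise ValueError("at least one table is required")
    matches = sum(
        1
        for pred, gold in zip(predicted, truth)
        if [list(row) for row in pred] == [list(row) for row in gold]
    )
    return matches / len(truth)


# Usage: the first table matches exactly, the second differs in one cell.
predicted_tables = [[["Year", "Total"], ["2022", "10"]], [["A", "B"]]]
truth_tables = [[["Year", "Total"], ["2022", "10"]], [["A", "C"]]]
score = exact_match_accuracy(predicted_tables, truth_tables)
print(score)
```

Under this all-or-nothing definition, a single annotation inconsistency between training and evaluation data can turn an otherwise correct prediction into a miss, which may help explain why aligning annotations moves the metric so much.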
