CV · LG · Apr 20, 2024

Multi-Cell Decoder and Mutual Learning for Table Structure and Character Recognition

arXiv:2404.13268v2 · 8 citations · h-index: 5 · ICDAR
Originality: Incremental advance
AI Analysis

This work addresses the need to convert tables in documents such as scientific papers and financial reports into a format that large language models can process. It is an incremental improvement over existing methods.

The paper tackles table content extraction by improving end-to-end recognition of both table structure and cell contents. On two large datasets it achieves performance comparable to state-of-the-art models, including on long tables with hundreds of cells.

Extracting table contents from documents such as scientific papers and financial reports and converting them into a format that large language models can process is an important task in knowledge information processing. End-to-end approaches, which recognize cell contents as well as table structure, have achieved performance comparable to state-of-the-art models that rely on external character recognition systems, and they have potential for further improvement. In addition, with the introduction of local attention, these models can now recognize long tables with hundreds of cells. However, they recognize table structure in a single direction, from the header to the footer, and they recognize the content of each cell independently, so there is no opportunity to draw on useful information from neighboring cells. In this paper, we propose a multi-cell content decoder and a bidirectional mutual learning mechanism to improve the end-to-end approach. Their effectiveness is demonstrated on two large datasets, and the experimental results show performance comparable to state-of-the-art models, even for long tables with large numbers of cells.
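
The abstract credits local attention with making long tables tractable but does not spell out the mechanism. As a rough illustration only, and not the authors' implementation, the sketch below builds a windowed causal mask in which each decoding step may attend only to the most recent positions. The function names, the window size, and the causal restriction are all assumptions made for this example.

```python
import math


def local_causal_mask(seq_len, window):
    """Return mask[i][j] == True when step i may attend to step j.

    Assumption for this sketch: j must not lie in the future (j <= i)
    and must fall within the last `window` positions (i - j < window).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def masked_softmax(scores, mask_row):
    """Softmax over the allowed positions only; masked positions get weight 0."""
    allowed = [s for s, m in zip(scores, mask_row) if m]
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy setting: 6 decoding steps, window of 3.
mask = local_causal_mask(seq_len=6, window=3)

# With uniform scores, step 4 spreads its attention over steps 2, 3, and 4 only.
weights = masked_softmax([0.0] * 6, mask[4])
```

Because each row of the mask allows at most `window` positions, the attention cost per step stays bounded regardless of how many cells the table contains, which is the general reason windowed attention scales to long sequences.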

Code Implementations: 1 repo
