CL

Jul 11, 2025

DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout Structures

Benno Uthayasooriyar, Antoine Ly, Franck Vermet, Caio Corro

arXiv:2507.08606v3 | 2.7 | h-index: 1

Originality: Incremental advance

AI Analysis

This work targets document understanding and offers a more data-efficient pre-training approach. The advance is incremental, since it builds on existing BERT and layout-aware models.

The authors introduce DocPolarBERT, a layout-aware BERT model that encodes layout with relative polar coordinates instead of absolute 2D positional embeddings. The model achieves state-of-the-art results despite being pre-trained on a dataset more than six times smaller than the widely used IIT-CDIP corpus.

We introduce DocPolarBERT, a layout-aware BERT model for document understanding that eliminates the need for absolute 2D positional embeddings. We extend self-attention to take into account text block positions in a relative polar coordinate system rather than a Cartesian one. Despite being pre-trained on a dataset more than six times smaller than the widely used IIT-CDIP corpus, DocPolarBERT achieves state-of-the-art results. These results demonstrate that a carefully designed attention mechanism can compensate for reduced pre-training data, offering an efficient and effective alternative for document understanding.
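
To make the idea of relative polar coordinates between text blocks concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract does not say which point of a bounding box is used as the reference, how angles are discretized, or how the result enters self-attention, so the top-left anchor, the 8 angular bins, and the function names `relative_polar`, `angle_bin`, and `pairwise_polar` are all assumptions made for illustration.

```python
import math

def relative_polar(box_i, box_j):
    """Distance and angle from block i to block j.

    Boxes are (x0, y0, x1, y1). Using the top-left corner as the
    anchor point is an assumption, not something the abstract states.
    In image coordinates y usually grows downward, so angles are
    measured in that frame.
    """
    dx = box_j[0] - box_i[0]
    dy = box_j[1] - box_i[1]
    r = math.hypot(dx, dy)
    # Map atan2's (-pi, pi] range onto [0, 2*pi).
    theta = math.atan2(dy, dx) % (2 * math.pi)
    return r, theta

def angle_bin(theta, num_bins=8):
    """Discretize an angle into one of num_bins equal sectors (assumed scheme)."""
    width = 2 * math.pi / num_bins
    # The final modulo guards against theta rounding up to exactly 2*pi.
    return int(theta // width) % num_bins

def pairwise_polar(boxes, num_bins=8):
    """Return an n x n grid of (distance, angle_bin) pairs for all block pairs."""
    grid = []
    for a in boxes:
        row = []
        for b in boxes:
            r, theta = relative_polar(a, b)
            row.append((r, angle_bin(theta, num_bins)))
        grid.append(row)
    return grid

if __name__ == "__main__":
    # Hypothetical bounding boxes for three text blocks.
    blocks = [(0, 0, 10, 10), (3, 4, 13, 14), (10, -1, 20, 9)]
    for row in pairwise_polar(blocks):
        print(row)
```

In a layout-aware model, pairs like these could index learned embeddings that bias the attention scores between tokens of different blocks. Because only relative quantities are used, the representation does not depend on where the blocks sit on the page in absolute terms.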
