DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout Structures
This work targets document understanding with a more data-efficient pre-training approach, though the contribution is incremental because it builds on existing BERT and layout-aware models.
The authors introduce DocPolarBERT, a layout-aware BERT model that uses relative polar coordinate encoding instead of absolute 2D positional embeddings. It achieves state-of-the-art results despite being pre-trained on a dataset more than six times smaller than the widely used IIT-CDIP corpus.
We introduce DocPolarBERT, a layout-aware BERT model for document understanding that eliminates the need for absolute 2D positional embeddings. We extend self-attention to take into account the positions of text blocks in a relative polar coordinate system rather than a Cartesian one. Despite being pre-trained on a dataset more than six times smaller than the widely used IIT-CDIP corpus, DocPolarBERT achieves state-of-the-art results. These results demonstrate that a carefully designed attention mechanism can compensate for reduced pre-training data, offering an efficient and effective alternative for document understanding.
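
To make the idea of relative polar coordinates between text blocks concrete, the following is a minimal sketch, not the authors' implementation. It assumes each text block is represented by a bounding box `(x0, y0, x1, y1)`, that positions are compared through box centers, and that angles are quantized into a fixed number of bins. The function names `relative_polar` and `angle_bucket`, the use of centers, and the choice of 8 bins are all illustrative assumptions that do not come from the abstract.

```python
import math


def relative_polar(boxes):
    """Return pairwise (distance, angle) matrices between box centers.

    boxes: list of (x0, y0, x1, y1) tuples (assumed representation).
    angle[i][j] is the direction from block i to block j, in [0, 2*pi).
    """
    centers = [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in boxes]
    n = len(centers)
    dist = [[0.0] * n for _ in range(n)]
    angle = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(centers):
        for j, (xj, yj) in enumerate(centers):
            dx, dy = xj - xi, yj - yi
            dist[i][j] = math.hypot(dx, dy)
            # atan2 returns values in (-pi, pi]; shift into [0, 2*pi)
            angle[i][j] = math.atan2(dy, dx) % (2 * math.pi)
    return dist, angle


def angle_bucket(theta, num_bins=8):
    """Quantize an angle in [0, 2*pi) into one of num_bins sectors (assumed scheme)."""
    return int(theta / (2 * math.pi / num_bins)) % num_bins


# Three hypothetical blocks: one reference, one to its right, one below it
boxes = [(0, 0, 2, 2), (3, 0, 5, 2), (0, 4, 2, 6)]
dist, angle = relative_polar(boxes)
```

In a layout-aware attention layer, quantities like these could index learned bias terms added to the attention scores for each pair of tokens. How DocPolarBERT actually injects them into self-attention is not specified in the abstract.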