CL · Oct 20, 2021

Contrastive Document Representation Learning with Graph Attention Networks

arXiv:2110.10778v1662 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of handling long documents, which matters to NLP researchers and practitioners, but the advance is incremental because it builds on existing Transformer models.

The paper tackles the challenge of modeling long documents by proposing a graph attention network on top of pretrained Transformers to learn document embeddings, and it demonstrates effectiveness in document classification and retrieval tasks with empirical results.

Recent progress in pretrained Transformer-based language models has shown great success in learning contextual representations of text. However, due to the quadratic complexity of self-attention, most pretrained Transformer models can only handle relatively short text, and modeling very long documents remains a challenge. In this work, we propose to use a graph attention network on top of an available pretrained Transformer model to learn document embeddings. This graph attention network allows us to leverage the high-level semantic structure of the document. In addition, based on our graph document model, we design a simple contrastive learning strategy to pretrain our models on a large unlabeled corpus. Empirically, we demonstrate the effectiveness of our approaches in document classification and document retrieval tasks.
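To make the two ideas in the abstract concrete, here is a minimal sketch in plain Python, not the authors' implementation. It rests on assumptions the abstract does not state: graph nodes are taken to be sentence-level vectors (toy 2-d lists standing in for pretrained Transformer outputs), attention scores are plain dot products rather than the learned scoring of a real graph attention layer, the document embedding is a mean over updated nodes, and the contrastive objective is an InfoNCE-style loss with a truncated view of the same document as the positive. All function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_layer(nodes, edges):
    """One simplified graph-attention step (assumption: dot-product scores).

    Each node becomes a softmax-weighted average of itself and its
    neighbours. `edges` is a set of undirected (i, j) index pairs.
    """
    n = len(nodes)
    dim = len(nodes[0])
    updated = []
    for i in range(n):
        nbrs = [i] + [j for j in range(n) if (i, j) in edges or (j, i) in edges]
        weights = softmax([dot(nodes[i], nodes[j]) for j in nbrs])
        updated.append(
            [sum(w * nodes[j][d] for w, j in zip(weights, nbrs)) for d in range(dim)]
        )
    return updated


def document_embedding(nodes, edges):
    # Assumption: mean-pool the attended node vectors into one vector.
    updated = attention_layer(nodes, edges)
    dim = len(updated[0])
    return [sum(v[d] for v in updated) / len(updated) for d in range(dim)]


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)) + 1e-12)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: low when the anchor is closer to the positive
    than to any negative."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    return -math.log(softmax(logits)[0])


# Toy "sentence embeddings" for two documents (hypothetical values).
doc_a = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
doc_b = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
chain = {(0, 1), (1, 2)}  # link consecutive sentences

emb_a = document_embedding(doc_a, chain)
emb_a_view = document_embedding(doc_a[:2], {(0, 1)})  # second view of doc_a
emb_b = document_embedding(doc_b, chain)

loss_matched = info_nce(emb_a, emb_a_view, [emb_b])
loss_mismatched = info_nce(emb_a, emb_b, [emb_a_view])
```

In this toy setup the loss is lower when the positive is another view of the same document than when it is a different document, which is the signal a contrastive pretraining objective of this kind would optimize. Because the Transformer only ever sees one short segment at a time, the quadratic attention cost applies per segment rather than to the whole document.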

