AI · CL · Oct 23, 2020

Long Document Ranking with Query-Directed Sparse Transformer

Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang

arXiv:2010.12683v1 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of handling long documents in ranking tasks for information retrieval, offering an incremental improvement by integrating IR axioms into sparse attention.

The paper tackles the problem of ranking long documents with transformers by proposing Query-Directed Sparse attention, which incorporates IR principles to improve both efficiency and ranking performance. Results show consistent advantages over previous methods on TREC benchmarks, and the authors report that their TVM-based sparse attention is about twice as efficient as fully connected self-attention.

In document ranking tasks, the computing cost of transformer self-attention often forces long documents to be broken into pieces that fit pretrained models. In this paper, we design Query-Directed Sparse attention that induces IR-axiomatic structures in transformer self-attention. Our model, QDS-Transformer, enforces the principal properties desired in ranking: local contextualization, hierarchical representation, and query-oriented proximity matching, while also enjoying the efficiency of sparsity. Experiments on one fully supervised and three few-shot TREC document ranking benchmarks demonstrate the consistent and robust advantage of QDS-Transformer over previous approaches, which either retrofit long documents into BERT or use sparse attention without emphasizing IR principles. We further quantify the computing complexity and demonstrate that our sparse attention, with a TVM implementation, is twice as efficient as fully connected self-attention. All source code, the trained model, and the predictions of this work are available at https://github.com/hallogameboy/QDS-Transformer.
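The abstract names three structural properties but does not spell out the exact attention pattern. Below is a minimal Python sketch, assuming a hypothetical token layout (`[CLS]`, then the query tokens, then each sentence prefixed by one marker token, called `[SOS]` here) and three assumed rules: document tokens attend within a sliding window (local contextualization), each sentence marker attends to its own sentence and to every other marker (hierarchical representation), and `[CLS]` plus the query tokens attend globally (query-oriented matching). The names `qds_attention_mask`, `half_window`, and `attended_pairs` are illustrative only and are not taken from the QDS-Transformer repository. The sketch materializes a dense boolean mask purely to make the sparsity pattern visible; an efficient kernel, such as the TVM implementation the authors mention, would never build the full n×n matrix.

```python
from typing import List


def qds_attention_mask(query_len: int, sentence_lengths: List[int],
                       half_window: int) -> List[List[bool]]:
    """Return a symmetric boolean mask; mask[i][j] is True if i may attend to j.

    Hypothetical layout: [CLS], query tokens, then for each sentence one
    [SOS] marker followed by that sentence's tokens.
    """
    doc_start = 1 + query_len
    n = doc_start + sum(1 + length for length in sentence_lengths)
    mask = [[False] * n for _ in range(n)]

    def connect(i: int, j: int) -> None:
        # Attention edges are kept symmetric in this sketch.
        mask[i][j] = True
        mask[j][i] = True

    # Record where each sentence marker sits and which sentence owns each position.
    sos_positions: List[int] = []
    sentence_of = {}
    pos = doc_start
    for s, length in enumerate(sentence_lengths):
        sos_positions.append(pos)
        for p in range(pos, pos + 1 + length):
            sentence_of[p] = s
        pos += 1 + length

    # 1) Local contextualization: sliding window over document positions
    #    (includes each token attending to itself).
    for i in range(doc_start, n):
        for j in range(max(doc_start, i - half_window), min(n, i + half_window + 1)):
            connect(i, j)

    # 2) Hierarchical representation: a marker sees its own sentence
    #    and all other sentence markers.
    for s, sos in enumerate(sos_positions):
        for p, owner in sentence_of.items():
            if owner == s:
                connect(sos, p)
        for other in sos_positions:
            connect(sos, other)

    # 3) Query-directed global attention: [CLS] and query tokens see everything.
    for g in range(doc_start):
        for j in range(n):
            connect(g, j)

    return mask


def attended_pairs(mask: List[List[bool]]) -> int:
    """Count allowed (i, j) pairs, i.e. the nonzero entries of the mask."""
    return sum(row.count(True) for row in mask)


if __name__ == "__main__":
    mask = qds_attention_mask(query_len=3, sentence_lengths=[4, 5, 3], half_window=1)
    n = len(mask)
    print(f"sequence length {n}: {attended_pairs(mask)} of {n * n} pairs attended")
```

Comparing `attended_pairs(mask)` with `n * n` gives only a rough sense of how sparse the pattern is for one toy input. It is not the paper's efficiency measurement, which the abstract attributes to a TVM implementation.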

