CL · Apr 29, 2019

Semantic Matching of Documents from Heterogeneous Collections: A Simple and Transparent Method for Practical Applications

arXiv:1904.12550v1 · 1089 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses document matching for practical applications. Its contribution is incremental: it relies on standard resources without domain- or task-specific modifications.

The authors tackle the pairwise matching of documents from heterogeneous collections, specifically the Concept-Project binary classification task, and demonstrate that their simple, unsupervised method outperforms the more complex system of the original authors.

We present a very simple, unsupervised method for the pairwise matching of documents from heterogeneous collections. We demonstrate our method with the Concept-Project matching task, which is a binary classification task involving pairs of documents from heterogeneous collections. Although our method only employs standard resources without any domain- or task-specific modifications, it clearly outperforms the more complex system of the original authors. In addition, our method is transparent, because it provides explicit information about how a similarity score was computed, and efficient, because it is based on the aggregation of (pre-computable) word-level similarities.
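As a rough illustration of what "aggregation of (pre-computable) word-level similarities" could look like, the sketch below scores a document pair by averaging each word's best cosine match in the other document and returns the word alignments behind the score. This is not the authors' implementation: the toy vectors, the best-match averaging rule, and the names `TOY_VECTORS`, `precompute_word_similarities`, and `match_documents` are all assumptions made for the example, since the abstract does not specify the aggregation function.

```python
import math

# Toy 3-dimensional word vectors. Purely illustrative (assumption):
# a real setup would load standard pre-trained embeddings instead.
TOY_VECTORS = {
    "neural": (0.9, 0.1, 0.0),
    "network": (0.8, 0.2, 0.1),
    "learning": (0.7, 0.3, 0.0),
    "robot": (0.1, 0.9, 0.1),
    "garden": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def precompute_word_similarities(vectors):
    """Build the word-by-word similarity table once, ahead of any matching."""
    return {
        (a, b): cosine(va, vb)
        for a, va in vectors.items()
        for b, vb in vectors.items()
    }


def match_documents(doc_a, doc_b, word_sims):
    """Return (score, alignments) for two tokenized documents.

    Assumed aggregation rule: each known word of doc_a is aligned with its
    most similar known word in doc_b, and the score is the mean of those
    best-match similarities. The alignments show how the score arose.
    """
    vocab = {a for a, _ in word_sims}
    words_a = [w for w in doc_a if w in vocab]
    words_b = [w for w in doc_b if w in vocab]
    if not words_a or not words_b:
        return 0.0, []
    alignments = []
    for w in words_a:
        best = max(words_b, key=lambda x: word_sims[(w, x)])
        alignments.append((w, best, word_sims[(w, best)]))
    score = sum(s for _, _, s in alignments) / len(alignments)
    return score, alignments


WORD_SIMS = precompute_word_similarities(TOY_VECTORS)

if __name__ == "__main__":
    score, alignments = match_documents(
        ["neural", "network"], ["learning", "robot"], WORD_SIMS
    )
    print(f"score = {score:.3f}")
    for word, partner, sim in alignments:
        print(f"  {word} -> {partner} ({sim:.3f})")
```

A binary match decision, as in the Concept-Project task, would then compare the score against a threshold. How such a threshold is chosen is outside the scope of this sketch.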

Code Implementations: 1 repo