CL · SI · May 10, 2018

hyperdoc2vec: Distributed Representations of Hypertext Documents

arXiv:1805.03793v11095 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for better representations of hypertext documents in domains such as academic research, but the advance is incremental because it builds on existing embedding methods.

The paper tackled the problem of embedding hypertext documents like web pages and academic papers, which conventional text embedding methods fail to handle effectively, by proposing hyperdoc2vec and validating its superiority in tasks such as paper classification and citation recommendation.

Hypertext documents, such as web pages and academic papers, are of great importance in delivering information in our daily life. Although being effective on plain documents, conventional text embedding methods suffer from information loss if directly adapted to hyper-documents. In this paper, we propose a general embedding approach for hyper-documents, namely, hyperdoc2vec, along with four criteria characterizing necessary information that hyper-document embedding models should preserve. Systematic comparisons are conducted between hyperdoc2vec and several competitors on two tasks, i.e., paper classification and citation recommendation, in the academic paper domain. Analyses and experiments both validate the superiority of hyperdoc2vec to other models w.r.t. the four criteria.

Code Implementations: 1 repo
