IR CL LG

Dec 6, 2019

Document Network Embedding: Coping for Missing Content and Missing Links

Jean Dupuy, Adrien Guille, Julien Jacques

arXiv:1912.03048v1

1.7

Originality: Incremental advance

AI Analysis

This work tackles incomplete data in document networks used by information retrieval systems. The advance is incremental, since it builds on existing embedding and machine translation techniques.

The paper addresses the problem of learning representations for documents with missing content or links by proposing a linear transformation method that projects between content and node embeddings, improving neighborhood prediction and document retrieval.

Searching through networks of documents is an important task. A promising path to improve the performance of information retrieval systems in this context is to leverage dense node and content representations learned with embedding techniques. However, these techniques cannot learn representations for documents that are either isolated or whose content is missing. To tackle this issue, assuming that the topology of the network and the content of the documents correlate, we propose to estimate the missing node representations from the available content representations, and conversely. Inspired by recent advances in machine translation, we detail in this paper how to learn a linear transformation from a set of aligned content and node representations. The projection matrix is efficiently calculated in terms of the singular value decomposition. The usefulness of the proposed method is highlighted by the improved ability to predict the neighborhood of nodes whose links are unobserved based on the projected content representations, and to retrieve similar documents when content is missing, based on the projected node representations.
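The abstract does not spell out the exact objective, so the following is only a minimal sketch of one common way to obtain a projection matrix from an SVD: the orthogonal Procrustes solution used in machine-translation embedding alignment. It assumes that content and node embeddings share the same dimension and that the map is constrained to be orthogonal; the function name `fit_orthogonal_map` and the synthetic data are hypothetical and not taken from the paper.

```python
import numpy as np

def fit_orthogonal_map(X, Y):
    """Return the orthogonal W minimizing ||X @ W - Y||_F.

    X: (n, d) aligned content embeddings (assumed layout).
    Y: (n, d) aligned node embeddings (assumed layout).
    This is the orthogonal Procrustes solution, an assumption here,
    not necessarily the paper's exact formulation.
    """
    # SVD of the cross-covariance matrix between the two spaces.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Synthetic aligned pairs: node embeddings are a rotation of content embeddings.
rng = np.random.default_rng(0)
d = 8
content = rng.normal(size=(50, d))
true_rot, _ = np.linalg.qr(rng.normal(size=(d, d)))
node = content @ true_rot

W = fit_orthogonal_map(content, node)

# Estimate the node embedding of an isolated document from its content.
new_content = rng.normal(size=(1, d))
estimated_node = new_content @ W

# The reverse direction (content missing) uses the transpose, since W is orthogonal.
recovered_content = estimated_node @ W.T
```

Under the orthogonality assumption a single matrix serves both directions: `W` projects content embeddings into the node space, and `W.T` projects node embeddings back into the content space.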
