Representation Learning for Recommender Systems with Application to the Scientific Literature
This work addresses challenges in real-time scientific paper recommendation and expert search for researchers and institutions; however, it appears incremental, as it builds on existing attributed network embedding approaches.
The authors tackled the problem of extracting relevant content from the dynamic heterogeneous attributed network of scientific literature by learning node and attribute representations for recommendation tasks. They observe that existing methods handle textual attributes inadequately and cannot infer representations for new documents that lack network information.
The scientific literature is a large information network linking various actors (laboratories, companies, institutions, etc.). The vast amount of data generated by this network constitutes a dynamic heterogeneous attributed network (HAN), in which new information is constantly produced and from which it is increasingly difficult to extract content of interest. In this article, I present the first work of my thesis, carried out in partnership with an industrial company, Digital Scientific Research Technology. The latter offers a scientific watch tool, Peerus, which addresses various issues, such as the real-time recommendation of newly published papers or the search for active experts with whom to start new collaborations. To tackle this diversity of applications, a common approach consists in learning representations of the nodes and attributes of this HAN and using them as features for a variety of recommendation tasks. However, most works on attributed network embedding pay too little attention to textual attributes and do not take full advantage of recent natural language processing techniques. Moreover, existing methods that jointly learn node and document representations do not provide an effective way to infer representations for new documents for which network information is missing, a capability that is crucial in real-time recommender systems. Finally, the interplay between textual and graph data in text-attributed heterogeneous networks remains an open research direction.
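To make the cold-start issue concrete, the following is a minimal sketch, not the method developed in this work. It assumes that word vectors and user vectors already live in a shared embedding space, and it uses a plain average of word vectors as a stand-in for a learned text encoder. The names embed_document, rank_users, word_vectors and user_vectors, as well as the toy data, are hypothetical and chosen only for illustration.

```python
import numpy as np

def embed_document(tokens, word_vectors, dim):
    """Embed a document from its text alone (no network information).

    Assumption: the mean of the known word vectors stands in for a
    learned text encoder. Unknown tokens are ignored.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)

def rank_users(doc_vec, user_vectors):
    """Return user ids sorted by decreasing cosine similarity to doc_vec."""
    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom > 0 else 0.0
    scores = {u: cosine(doc_vec, v) for u, v in user_vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy embeddings in a shared 2-dimensional space.
word_vectors = {
    "graph": np.array([1.0, 0.0]),
    "embedding": np.array([0.8, 0.2]),
    "protein": np.array([0.0, 1.0]),
}
user_vectors = {
    "user_a": np.array([0.9, 0.1]),  # interested in graph methods
    "user_b": np.array([0.1, 0.9]),  # interested in biology
}

# A newly published paper: it has text but no citations or other links yet.
new_doc = ["graph", "embedding", "survey"]
doc_vec = embed_document(new_doc, word_vectors, dim=2)
ranking = rank_users(doc_vec, user_vectors)
print(ranking)
```

In this sketch the new paper can be scored against users as soon as its text is available, which is the property a real-time recommender needs; a model that requires network information at inference time could not produce doc_vec for such a paper.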