SI Oct 25, 2017
Early identification of important patents through network centrality
Manuel Sebastian Mariani, Matus Medo, François Lafond
One of the most challenging problems in technological forecasting is to identify as early as possible those technologies that have the potential to lead to radical changes in our society. In this paper, we use the US patent citation network (1926-2010) to test our ability to identify, early on, a list of historically significant patents through citation network analysis. We show that in order to effectively uncover these patents shortly after they are issued, we need to go beyond raw citation counts and take into account both the citation network topology and temporal information. In particular, an age-normalized measure of patent centrality, called rescaled PageRank, allows us to identify the significant patents earlier than citation count and PageRank score. In addition, we find that while high-impact patents tend to rely on other high-impact patents in a similar way as scientific papers, the patents' citation dynamics are significantly slower than those of papers, which makes the early identification of significant patents more challenging than that of significant papers.
SOC-PH Apr 26, 2017
Ranking in evolving complex networks
Hao Liao, Manuel Sebastian Mariani, Matus Medo et al.
Complex networks have emerged as a simple yet powerful framework to represent and analyze a wide range of complex systems. The problem of ranking the nodes and the edges in complex networks is critical for a broad range of real-world problems because it affects how we access online information and products, how success and talent are evaluated in human activities, and how scarce resources are allocated by companies and policymakers, among others. This calls for a deep understanding of how existing ranking algorithms perform, and of the possible biases that may impair their effectiveness. Well-established ranking algorithms (such as Google's popular PageRank) are static in nature and, as a consequence, they exhibit important shortcomings when applied to real networks that rapidly evolve in time. The recent advances in the understanding and modeling of evolving networks have enabled the development of a wide and diverse range of ranking algorithms that take the temporal dimension into account. The aim of this review is to survey the existing ranking algorithms, both static and time-aware, and their applications to evolving networks. We emphasize both the impact of network evolution on well-established static algorithms and the benefits from including the temporal dimension for tasks such as prediction of real network traffic, prediction of future links, and identification of highly significant nodes.
SOC-PH Mar 23, 2017
Quantifying and suppressing ranking bias in a large citation network
Giacomo Vaccario, Matus Medo, Nicolas Wider et al.
It is widely recognized that citation counts for papers from different fields cannot be directly compared because different scientific fields adopt different citation practices. Citation counts are also strongly biased by paper age since older papers had more time to attract citations. Various procedures aim at suppressing these biases and give rise to new normalized indicators, such as the relative citation count. We use a large citation dataset from Microsoft Academic Graph and a new statistical framework based on the Mahalanobis distance to show that the rankings by well known indicators, including the relative citation count and Google's PageRank score, are significantly biased by paper field and age. We propose a general normalization procedure motivated by the $z$-score which produces much less biased rankings when applied to citation count and PageRank score.
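The idea of a $z$-score-motivated normalization can be illustrated with a minimal sketch. It assumes that each paper is compared only with papers in the same group (here a hypothetical field label; the abstract does not specify how groups are formed, and the actual procedure may also condition on paper age). The function name and the toy data are illustrative, not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_by_group(scores, groups):
    """Return (score - group mean) / group std for every item.

    Assumption: items are comparable only within their own group.
    Groups with zero spread get a normalized score of 0.0.
    """
    by_group = defaultdict(list)
    for s, g in zip(scores, groups):
        by_group[g].append(s)
    # Mean and population standard deviation of each group.
    stats = {g: (mean(v), pstdev(v)) for g, v in by_group.items()}
    normalized = []
    for s, g in zip(scores, groups):
        mu, sigma = stats[g]
        normalized.append((s - mu) / sigma if sigma > 0 else 0.0)
    return normalized


# Toy data (hypothetical): one field cites ten times more than the other.
citations = [120, 80, 100, 12, 8, 10]
fields = ["bio", "bio", "bio", "math", "math", "math"]
z = zscore_by_group(citations, fields)
```

In this toy case the top paper of each field receives the same normalized score even though the raw counts differ by a factor of ten, which is the kind of field bias the normalization is meant to suppress.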
SOC-PH Aug 30, 2016
Identification of milestone papers through time-balanced network centrality
Manuel Sebastian Mariani, Matus Medo, Yi-Cheng Zhang
Citations between scientific papers and related bibliometric indices, such as the $h$-index for authors and the impact factor for journals, are being increasingly used - often in controversial ways - as quantitative tools for research evaluation. Yet, a fundamental research question remains open: to what extent do quantitative metrics capture the significance of scientific works? We analyze the network of citations among the 449,935 papers published by the American Physical Society (APS) journals between 1893 and 2009, and focus on the comparison of metrics built on the citation count with network-based metrics. We contrast five article-level metrics with respect to the rankings that they assign to a set of fundamental papers, called Milestone Letters, carefully selected by the APS editors for "making long-lived contributions to physics, either by announcing significant discoveries, or by initiating new areas of research". A new metric, which combines PageRank centrality with the explicit requirement that paper score is not biased by paper age, is the best-performing metric overall in identifying the Milestone Letters. The lack of time bias in the new metric also makes it possible to use it to compare papers of different ages on the same scale. We find that network-based metrics identify the Milestone Letters better than metrics based on the citation count, which suggests that the structure of the citation network contains information that can be used to improve the ranking of scientific publications. The methods and results presented here are relevant for all evolving systems where network centrality metrics are applied, for example the World Wide Web and online social networks. An interactive Web platform where it is possible to view the ranking of the APS papers by rescaled PageRank is available at http://www.sciencenow.info.
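One way to remove age bias from a centrality score can be sketched as follows. The sketch assumes that each paper's score is compared with the scores of papers published around the same time, using the mean and standard deviation over a window of neighbours in publication order. The window construction, the `window` parameter, and the toy scores are assumptions made for illustration; the abstract does not give the exact definition of the metric.

```python
from statistics import mean, pstdev


def rescale_by_age(scores, window=2):
    """Rescale scores against papers of similar age.

    `scores` must be ordered by publication time (oldest first).
    Each score is compared with the mean and population standard
    deviation of up to `window` papers on each side of it.
    """
    n = len(scores)
    rescaled = []
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        peers = scores[lo:hi]  # includes the paper itself
        mu, sigma = mean(peers), pstdev(peers)
        rescaled.append((scores[i] - mu) / sigma if sigma > 0 else 0.0)
    return rescaled


# Toy scores (hypothetical): older papers have had more time to accumulate score.
raw = [0.30, 0.25, 0.20, 0.10, 0.08, 0.12, 0.03, 0.02, 0.05]
r = rescale_by_age(raw, window=2)
flat = rescale_by_age([1.0] * 5)
```

The most recent paper has a low raw score in absolute terms but a positive rescaled score, because it outperforms the papers published around the same time.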
IR Jun 15, 2016
The essential role of time in network-based recommendation
Alexandre Vidmer, Matus Medo
Random walks on bipartite networks have been used extensively to design personalized recommendation methods. While aging has been identified as a key component in the growth of information networks, most research has focused on the networks' structural properties and neglected the often available time information. Time has been largely ignored both by the investigated recommendation methods as well as by the methodology used to evaluate them. We show that this time-unaware approach overestimates the methods' recommendation performance. Motivated by microscopic rules of network growth, we propose a time-aware modification of an existing recommendation method and show that by combining the temporal and structural aspects, it outperforms the existing methods. The performance improvements are particularly striking in systems with fast aging.
IR Nov 19, 2015
Network-based recommendation algorithms: A review
Fei Yu, An Zeng, Sebastien Gillard et al.
Recommender systems are a vital tool that helps us to overcome the information overload problem. They are being used by most e-commerce web sites and attract the interest of a broad scientific community. A recommender system uses data on users' past preferences to choose new items that might be appreciated by a given individual user. While many approaches to recommendation exist, the approach based on a network representation of the input data has gained considerable attention in the past. We review here a broad range of network-based recommendation algorithms and for the first time compare their performance on three distinct real datasets. We present recommendation topics that go beyond the mere question of which algorithm to use - such as the possible influence of recommendation on the evolution of systems that use it - and finally discuss open research directions and challenges.
SOC-PH Sep 3, 2015
Ranking nodes in growing networks: When PageRank fails
Manuel Sebastian Mariani, Matus Medo, Yi-Cheng Zhang
PageRank is arguably the most popular ranking algorithm and is applied in real systems ranging from information to biological and infrastructure networks. Despite its outstanding popularity and broad use in different areas of science, the relation between the algorithm's efficacy and properties of the network on which it acts has not yet been fully understood. We study here PageRank's performance on a network model supported by real data, and show that realistic temporal effects make PageRank fail to identify the most valuable nodes for a broad range of model parameters. Results on real data are in qualitative agreement with our model-based findings. This failure of PageRank reveals that the static approach to information filtering is inappropriate for a broad class of growing systems, and suggests that time-dependent algorithms that are based on the temporal linking patterns of these systems are needed to better rank the nodes.
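For reference, the static algorithm discussed here can be sketched as a plain power iteration. This is the textbook form of PageRank, not the network model studied in the paper. The damping value 0.85, the uniform handling of dangling nodes, and the toy edge list are conventional choices assumed for illustration.

```python
def pagerank(edges, n, alpha=0.85, tol=1e-12, max_iter=1000):
    """Standard PageRank by power iteration.

    edges: iterable of (source, target) pairs, nodes labeled 0..n-1.
    alpha: damping factor (0.85 is a common convention, assumed here).
    Dangling nodes (no outgoing links) spread their score uniformly.
    """
    out = [[] for _ in range(n)]
    for src, dst in edges:
        out[src].append(dst)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(p[i] for i in range(n) if not out[i])
        # Teleportation term plus the uniformly redistributed dangling mass.
        new = [(1.0 - alpha) / n + alpha * dangling / n] * n
        for i in range(n):
            if out[i]:
                share = alpha * p[i] / len(out[i])
                for j in out[i]:
                    new[j] += share
        if sum(abs(a - b) for a, b in zip(new, p)) < tol:
            return new
        p = new
    return p


# Toy citation network (hypothetical): node 0 is the oldest and is cited by all others.
edges = [(1, 0), (2, 0), (3, 0), (3, 2)]
scores = pagerank(edges, n=4)
```

Because links in a growing network can only point to nodes that already exist, the oldest node collects score from every later node, which is one simple way to see how a static ranking can favour old nodes.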
IR Aug 7, 2015
Modeling mutual feedback between users and recommender systems
An Zeng, Chi Ho Yeung, Matus Medo et al.
Recommender systems daily influence our decisions on the Internet. While considerable attention has been given to issues such as recommendation accuracy and user privacy, the long-term mutual feedback between a recommender system and the decisions of its users has been neglected so far. We propose here a model of network evolution which allows us to study the complex dynamics induced by this feedback, including the hysteresis effect which is typical for systems with non-linear dynamics. Despite the popular belief that recommendation helps users to discover new things, we find that the long-term use of recommendation can contribute to the rise of extremely popular items and thus ultimately narrow the user choice. These results are supported by measurements of the time evolution of item popularity inequality in real systems. We show that this adverse effect of recommendation can be tamed by sacrificing part of short-term recommendation accuracy.
SI Nov 13, 2013
Ranking users, papers and authors in online scientific communities
Hao Liao, Rui Xiao, Giulio Cimini et al.
The ever-increasing quantity and complexity of scientific production have made it difficult for researchers to keep track of advances in their own fields. This, together with growing popularity of online scientific communities, calls for the development of effective information filtering tools. We propose here a method to simultaneously compute reputation of users and quality of scientific artifacts in an online scientific community. Evaluation on artificially-generated data and real data from the Econophysics Forum is used to determine the method's best-performing variants. We show that when the method is extended by considering author credit, its performance improves on multiple levels. In particular, top papers have higher citation count and top authors have higher $h$-index than top papers and top authors chosen by other algorithms.
IR Aug 31, 2013
Information filtering via hybridization of similarity preferential diffusion processes
An Zeng, Alexandre Vidmer, Matus Medo et al.
The recommender system is one of the most promising ways to address the information overload problem in online systems. Based on the personal historical record, the recommender system can find interesting and relevant objects for the user within a huge information space. Many physical processes such as mass diffusion and heat conduction have been applied to design recommendation algorithms. The hybridization of these two algorithms has been shown to provide both accurate and diverse recommendation results. In this paper, we propose two similarity preferential diffusion processes. Extensive experimental analyses on two benchmark data sets demonstrate that both recommendation accuracy and diversity are improved due to the similarity preference in the diffusion. The hybridization of the similarity preferential diffusion processes is shown to significantly outperform the state-of-the-art recommendation algorithm. Finally, our analysis of network sparsity shows that there is a significant difference between dense and sparse systems, indicating that all the former conclusions on recommendation in the literature should be reexamined in sparse systems.
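The two baseline processes named above can be sketched on a small user-item bipartite network. This is a minimal illustration of plain mass diffusion and heat conduction as they are commonly described, not the similarity preferential variants or the hybrid proposed in the paper. The function names and the toy data are hypothetical.

```python
def item_degrees(user_items, n_items):
    """Number of users who collected each item."""
    deg = [0] * n_items
    for items in user_items:
        for j in items:
            deg[j] += 1
    return deg


def mass_diffusion(user_items, target, n_items):
    """Each collected item splits one unit of resource equally among its
    users; each user then splits what it received equally among its items."""
    deg = item_degrees(user_items, n_items)
    seed = user_items[target]
    user_res = [sum(1.0 / deg[j] for j in items & seed) for items in user_items]
    scores = [0.0] * n_items
    for u, items in enumerate(user_items):
        for j in items:
            scores[j] += user_res[u] / len(items)
    return scores


def heat_conduction(user_items, target, n_items):
    """Same two steps, but with averaging instead of splitting: a user's
    temperature is the mean over its items, and an item's score is the
    mean over its users."""
    deg = item_degrees(user_items, n_items)
    seed = user_items[target]
    user_temp = [len(items & seed) / len(items) if items else 0.0
                 for items in user_items]
    scores = [0.0] * n_items
    for u, items in enumerate(user_items):
        for j in items:
            scores[j] += user_temp[u] / deg[j]
    return scores


# Toy data (hypothetical): three users, four items; recommend for user 0.
user_items = [{0, 1}, {1, 2}, {0, 1, 2, 3}]
mass = mass_diffusion(user_items, target=0, n_items=4)
heat = heat_conduction(user_items, target=0, n_items=4)
```

Mass diffusion conserves the total resource and tends to favour popular items, while heat conduction gives low-degree items a relatively higher score. That contrast is why hybrids of the two are used to balance accuracy and diversity.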
SI Aug 22, 2012
Network-based information filtering algorithms: ranking and recommendation
Matus Medo
After the Internet and the World Wide Web became popular and widely available, the electronically stored online interactions of individuals quickly emerged as a challenge for researchers and, perhaps even faster, as a source of valuable information for entrepreneurs. We now have detailed records of informal friendship relations in social networks, purchases on e-commerce sites, various sorts of information being sent from one user to another, online collections of web bookmarks, and many other data sets that allow us to pose questions that are of interest from both academic and commercial points of view. For example, which other users of a social network might you want to befriend? Which other items might you be interested in purchasing? Who are the most influential users in a network? Which web page might you want to visit next? All these questions are not only interesting per se, but the answers to them may help entrepreneurs provide better service to their customers and, ultimately, increase their profits.
SOC-PH Feb 6, 2012
Recommender Systems
Linyuan Lü, Matus Medo, Chi Ho Yeung et al.
The ongoing rapid expansion of the Internet greatly increases the necessity of effective recommender systems for filtering the abundant information. Extensive research on recommender systems is conducted by a broad range of communities including social and computer scientists, physicists, and interdisciplinary researchers. Despite substantial theoretical and practical achievements, unification and comparison of different approaches are lacking, which impedes further advances. In this article, we review recent developments in recommender systems and discuss the major challenges. We compare and evaluate available algorithms and examine their roles in future developments. In addition to algorithms, physical aspects are described to illustrate the macroscopic behavior of recommender systems. Potential impacts and future directions are discussed. We emphasize that recommendation has great scientific depth and combines diverse research fields, which makes it of interest to physicists as well as interdisciplinary researchers.