Staša Milojević

DL
4papers
213citations
Novelty50%
AI Score25

4 Papers

LGDec 4, 2020
Unsupervised embedding of trajectories captures the latent structure of scientific migration

Dakota Murray, Jisung Yoon, Sadamori Kojaku et al.

Human migration and mobility drives major societal phenomena including epidemics, economies, innovation, and the diffusion of ideas. Although human mobility and migration have been heavily constrained by geographic distance throughout the history, advances and globalization are making other factors such as language and culture increasingly more important. Advances in neural embedding models, originally designed for natural language, provide an opportunity to tame this complexity and open new avenues for the study of migration. Here, we demonstrate the ability of the model word2vec to encode nuanced relationships between discrete locations from migration trajectories, producing an accurate, dense, continuous, and meaningful vector-space representation. The resulting representation provides a functional distance between locations, as well as a digital double that can be distributed, re-used, and itself interrogated to understand the many dimensions of migration. We show that the unique power of word2vec to encode migration patterns stems from its mathematical equivalence with the gravity model of mobility. Focusing on the case of scientific migration, we apply word2vec to a database of three million migration trajectories of scientists derived from the affiliations listed on their publication records. Using techniques that leverage its semantic structure, we demonstrate that embeddings can learn the rich structure that underpins scientific migration, such as cultural, linguistic, and prestige relationships at multiple levels of granularity. Our results provide a theoretical foundation and methodological framework for using neural embeddings to represent and understand migration both within and beyond science.

DLJan 8, 2020
Practical method to reclassify Web of Science articles into unique subject categories and broad disciplines

Staša Milojević

Classification of bibliographic items into subjects and disciplines in large databases is essential for many quantitative science studies. The Web of Science classification of journals into ~250 subject categories, which has served as a basis for many studies, is known to have some fundamental problems and several practical limitations that may affect the results from such studies. Here we present an easily reproducible method to perform reclassification of the Web of Science into existing subject categories and into 14 broad areas. Our reclassification is at a level of articles, so it preserves disciplinary differences that may exist among individual articles published in the same journal. Reclassification also eliminates ambiguous (multiple) categories that are found for 50% of items, and assigns a discipline/field category to all articles that come from broad-coverage journals such as Nature and Science. The correctness of the assigned subject categories is evaluated manually and is found to be ~95%.

DLNov 27, 2019
Recency predicts bursts in the evolution of author citations

Filipi Nascimento Silva, Aditya Tandon, Diego Raphael Amancio et al.

The citations process for scientific papers has been studied extensively. But while the citations accrued by authors are the sum of the citations of their papers, translating the dynamics of citation accumulation from the paper to the author level is not trivial. Here we conduct a systematic study of the evolution of author citations, and in particular their bursty dynamics. We find empirical evidence of a correlation between the number of citations most recently accrued by an author and the number of citations they receive in the future. Using a simple model where the probability for an author to receive new citations depends only on the number of citations collected in the previous 12-24 months, we are able to reproduce both the citation and burst size distributions of authors across multiple decades.

DLOct 30, 2015
Quantifying the Cognitive Extent of Science

Staša Milojević

While the modern science is characterized by an exponential growth in scientific literature, the increase in publication volume clearly does not reflect the expansion of the cognitive boundaries of science. Nevertheless, most of the metrics for assessing the vitality of science or for making funding and policy decisions are based on productivity. Similarly, the increasing level of knowledge production by large science teams, whose results often enjoy greater visibility, does not necessarily mean that "big science" leads to cognitive expansion. Here we present a novel, big-data method to quantify the extents of cognitive domains of different bodies of scientific literature independently from publication volume, and apply it to 20 million articles published over 60-130 years in physics, astronomy, and biomedicine. The method is based on the lexical diversity of titles of fixed quotas of research articles. Owing to large size of quotas, the method overcomes the inherent stochasticity of article titles to achieve <1% precision. We show that the periods of cognitive growth do not necessarily coincide with the trends in publication volume. Furthermore, we show that the articles produced by larger teams cover significantly smaller cognitive territory than (the same quota of) articles from smaller teams. Our findings provide a new perspective on the role of small teams and individual researchers in expanding the cognitive boundaries of science. The proposed method of quantifying the extent of the cognitive territory can also be applied to study many other aspects of "science of science."