DL · LG · Apr 26

Beyond coauthorship: semantic structure and phantom collaborators in transportation research, 1967–2025

arXiv:2604.23699
AI Analysis

For transportation researchers and bibliometricians, this work provides a method for predicting future collaborations from semantic similarity. The method is validated in a temporal hold-out, where predicted pairs become coauthors at a rate 16–33 times above baselines.

The authors built a semantic-structural atlas of 120,323 transportation research papers (1967–2025) and found that semantic and coauthor communities align only weakly (NMI 0.23). They introduced 'phantom collaborators' (pairs of authors who are top semantic neighbors but far apart in the coauthor graph) and showed, in a temporal hold-out covering 2020–2025, that these pairs become real coauthors at a rate 16–33 times above baselines.
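To make the NMI figure concrete, here is a minimal sketch of normalized mutual information between two community assignments of the same authors. The labels are toy data, and the arithmetic-mean normalization is an assumption: the summary does not say which NMI variant the authors used.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI with arithmetic-mean normalization: 2 * I(A;B) / (H(A) + H(B)).

    Assumption: the source does not state the normalization it uses.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    mutual_info = sum(
        (c / n) * math.log((c * n) / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    if h_a + h_b == 0:
        return 1.0  # both partitions put everyone in a single community
    return 2 * mutual_info / (h_a + h_b)


# Toy example: four authors, hypothetical topic and coauthor community labels.
topic_labels = ["t1", "t1", "t2", "t2"]
coauthor_labels = ["c1", "c2", "c1", "c2"]
nmi = normalized_mutual_information(topic_labels, coauthor_labels)
```

A value near 1 means the two partitions carry the same information; a value near 0 means knowing an author's topic community says little about their coauthor community. The reported 0.23 sits toward the low end of that range.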

We present a semantic-structural atlas of transportation research built from 120,323 papers across 34 peer-reviewed journals published between 1967 and 2025, roughly an order of magnitude larger than Sun and Rahwan's (2017) coauthorship study and extending a decade beyond it. We use OpenAlex and Crossref as open, CC0-licensed data sources, resolve author identity through OpenAlex author IDs, ORCID records, and manual alias resolution, and embed every paper with SPECTER2 (with Arora-style whitening), concatenated with concept TF-IDF and venue linear-discriminant projections. On this substrate we report three findings. First, Leiden on the author-level semantic k-nearest-neighbor graph yields 23 topic communities that agree only weakly with the 172 coauthor communities (normalized mutual information 0.23), opening room for a predictive layer that neither source encodes alone. Second, a multiplex Leiden partition combining both edge types recovers 181 communities and localizes where collaboration and topic structure decouple. Third, as the paper's core methodological contribution, we define phantom collaborators: pairs of authors who are top-K semantic neighbors yet ≥ 3 hops apart in the coauthor graph. Using a temporal hold-out (training cutoff 2019), we show that phantom pairs become real coauthors in 2020–2025 at a rate 16 to 33 times above random, popularity-weighted, and same-venue baselines, with a 68-fold monotone gradient between the highest- and lowest-similarity buckets. All artifacts are released as a live, reproducible web atlas at https://choi-seongjin.github.io/transport-atlas/.
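The phantom-collaborator definition can be sketched on toy data as follows. This is an illustration under stated assumptions, not the authors' implementation: the author names, two-dimensional embeddings, and helper functions are hypothetical; a pair qualifies if either author lists the other among its top-K neighbors; and authors with no connecting path are treated as more than 3 hops apart.

```python
import math
from collections import deque


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_k_neighbors(embeddings, k):
    """Map each author to the set of its k most cosine-similar other authors."""
    neighbors = {}
    for a, vec_a in embeddings.items():
        ranked = sorted(
            (b for b in embeddings if b != a),
            key=lambda b: cosine(vec_a, embeddings[b]),
            reverse=True,
        )
        neighbors[a] = set(ranked[:k])
    return neighbors


def within_hops(coauthors, source, target, max_hops):
    """True if target is reachable from source in at most max_hops coauthor edges."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node == target:
            return True
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for nxt in coauthors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


def phantom_pairs(embeddings, coauthors, k=1, min_hops=3):
    """Pairs that are top-k semantic neighbors yet at least min_hops apart.

    Assumptions: either-direction top-k membership counts, and unreachable
    pairs count as far apart.
    """
    pairs = set()
    for a, nbrs in top_k_neighbors(embeddings, k).items():
        for b in nbrs:
            if not within_hops(coauthors, a, b, min_hops - 1):
                pairs.add(tuple(sorted((a, b))))
    return pairs


# Hypothetical author embeddings and a coauthor chain: ana-cho-eli-dev-ben.
embeddings = {
    "ana": [1.0, 0.0], "ben": [0.9, 0.1], "cho": [0.0, 1.0],
    "dev": [0.1, 0.9], "eli": [0.8, 0.6],
}
coauthors = {
    "ana": {"cho"}, "cho": {"ana", "eli"}, "eli": {"cho", "dev"},
    "dev": {"eli", "ben"}, "ben": {"dev"},
}
pairs = phantom_pairs(embeddings, coauthors, k=1, min_hops=3)
print(pairs)  # {('ana', 'ben')}
```

In this toy graph, ana and ben are each other's nearest semantic neighbor but sit four coauthor hops apart, so they form the only phantom pair. A temporal hold-out in the spirit of the abstract would build both inputs from papers up to the training cutoff and then check which pairs appear as coauthors afterwards.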

