IM · IR · Apr 10

Predicting New Concept-Object Associations in Astronomy by Mining the Literature

arXiv:2602.14335 · 21.3 · h-index: 12
AI Analysis

For astronomers needing to prioritize follow-up targets for limited telescope time, this work demonstrates a predictive tool that outperforms simple heuristics, though the gains are modest and domain-specific.

The authors construct a concept-object knowledge graph from the astronomy literature and show that a matrix factorization model with concept-similarity smoothing predicts new associations 16.8% better on NDCG@100 and 19.8% better on Recall@100 than the strongest neighborhood baseline (KNN), indicating that the historical literature encodes predictive structure.
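For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how NDCG@k and Recall@k are commonly computed for a single ranked list. It assumes binary relevance (an object either does or does not become newly associated with the concept) and divides recall by the total number of relevant items; the paper may use a different convention. The function names and the toy data are illustrative only.

```python
import math

def recall_at_k(ranked, relevant, k=100):
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k=100):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    # Each hit at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example (hypothetical object names): two of four candidates are relevant.
ranked_objects = ["obj_a", "obj_b", "obj_c", "obj_d"]
new_associations = {"obj_a", "obj_c"}
print(recall_at_k(ranked_objects, new_associations, k=2))
print(ndcg_at_k(ranked_objects, new_associations, k=4))
```

In an evaluation like the one described, such per-concept scores would typically be averaged over all test concepts at each temporal cutoff.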

We construct a concept-object knowledge graph from the full astro-ph corpus through July 2025. Using an automated pipeline, we extract named astrophysical objects from OCR-processed papers, resolve them to SIMBAD identifiers, and link them to scientific concepts annotated in the source corpus. We then test whether historical graph structure can forecast new concept-object associations before they appear in print. Because the concepts are derived from clustering and therefore overlap semantically, we apply an inference-time concept-similarity smoothing step uniformly to all methods. Across four temporal cutoffs on a physically meaningful subset of concepts, an implicit-feedback matrix factorization model (alternating least squares, ALS) with smoothing outperforms the strongest neighborhood baseline (KNN using text-embedding concept similarity) by 16.8% on NDCG@100 (0.144 vs 0.123) and 19.8% on Recall@100 (0.175 vs 0.146), and exceeds the best recency heuristic by 96% and 88%, respectively. These results indicate that historical literature encodes predictive structure not captured by global heuristics or local neighborhood voting, suggesting a path toward tools that could help triage follow-up targets for scarce telescope time.
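The abstract does not spell out the smoothing formula, so the following is only a sketch of one plausible form of inference-time concept-similarity smoothing: each concept's object scores are blended with a similarity-weighted average of the scores of the other concepts. The blending weight `alpha`, the function name `smooth_scores`, and the choice to exclude self-similarity and clip negative similarities are all assumptions made for illustration, not the authors' method.

```python
def smooth_scores(scores, concept_sim, alpha=0.5):
    """Blend each concept's object scores with those of similar concepts.

    scores[c][o]      : raw score of object o for concept c (from any model)
    concept_sim[c][d] : similarity between concepts c and d
    alpha             : assumed blending weight in [0, 1]; 0 disables smoothing
    """
    n_concepts = len(scores)
    smoothed = []
    for c in range(n_concepts):
        # Weights to every other concept (self excluded, negatives clipped).
        weights = [
            max(concept_sim[c][d], 0.0) if d != c else 0.0
            for d in range(n_concepts)
        ]
        total = sum(weights)
        row = []
        for o in range(len(scores[c])):
            if total > 0:
                neighbour = sum(
                    weights[d] * scores[d][o] for d in range(n_concepts)
                ) / total
                row.append((1.0 - alpha) * scores[c][o] + alpha * neighbour)
            else:
                # No similar concepts: keep the raw score unchanged.
                row.append(scores[c][o])
        smoothed.append(row)
    return smoothed

# Toy example: two fully similar concepts with complementary raw scores.
raw = [[1.0, 0.0], [0.0, 1.0]]
sim = [[1.0, 1.0], [1.0, 1.0]]
print(smooth_scores(raw, sim, alpha=0.5))
```

Because the step operates only on a score matrix, it can be applied identically to the output of ALS, KNN, or a heuristic, which is consistent with the abstract's statement that smoothing is applied uniformly to all methods.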
