AI · May 13

Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning

arXiv:2605.13153 · 44.8

Predicted impact: top 60% in AI · last 90 days · Originality: Synthesis-oriented

AI Analysis

For researchers in temporal knowledge graph reasoning, this work provides a more rigorous evaluation framework that refocuses the field on predicting outstanding events, but it is an incremental contribution as it introduces a new evaluation metric rather than a new model or paradigm.

Current evaluation in Temporal Knowledge Graph Reasoning uniformly weights all events, ignoring that most are trivial repetitions, which overestimates true reasoning ability. The authors propose a strikingness-aware evaluation framework that weights events by their rarity, revealing that all models perform worse on high-strikingness events and that ensemble gains come from fitting trivial events rather than reasoning improvement.

Temporal Knowledge Graph Reasoning (TKGR) aims to infer missing (especially future) events from historical data. Current evaluation in TKGR weights all events uniformly, ignoring that most are trivial repetitions, and therefore overestimates true reasoning ability. The rare outstanding events, whose prediction demands deeper reasoning, should be distinguished and emphasized. To this end, we propose a strikingness-aware evaluation framework, which introduces a rule-based strikingness measuring framework (RSMF) to quantify event strikingness by comparing its expected occurrence with peer events derived from temporal rules. Strikingness is then integrated as a weighting factor into metrics such as weighted MRR and Hits@k. Experiments on four TKG benchmarks reveal that: 1) all representative models perform worse as event strikingness increases; 2) path-based methods excel on low-strikingness events, and representation-based ones on high-strikingness events; 3) the gains of an ensemble method we design stem from fitting trivial events rather than from improved reasoning. Our framework provides a more rigorous evaluation, refocusing the field on predicting outstanding events.
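
To make the weighting idea concrete, here is a minimal Python sketch of how a per-event strikingness score could enter MRR and Hits@k. The abstract does not give the exact formulas, so everything here is an assumption for illustration: the function names `weighted_mrr` and `weighted_hits_at_k` are hypothetical, the weights are taken as precomputed non-negative strikingness scores (RSMF itself is not reproduced), and normalizing by the sum of weights is one plausible choice rather than the authors' confirmed definition.

```python
from typing import Sequence


def weighted_mrr(ranks: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted mean reciprocal rank.

    ranks[i] is the 1-based rank of the correct answer for event i;
    weights[i] is that event's (assumed, precomputed) strikingness score.
    Normalizing by the weight sum is an assumption, not the paper's formula.
    """
    if len(ranks) != len(weights):
        raise ValueError("ranks and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w / r for r, w in zip(ranks, weights)) / total


def weighted_hits_at_k(ranks: Sequence[int], weights: Sequence[float], k: int) -> float:
    """Weighted Hits@k: share of total weight on events ranked within the top k."""
    if len(ranks) != len(weights):
        raise ValueError("ranks and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w for r, w in zip(ranks, weights) if r <= k) / total


# Toy data: two trivial events predicted well, one striking event predicted poorly.
ranks = [1, 2, 10]
strikingness = [0.1, 0.1, 0.8]  # illustrative values only
uniform = [1.0, 1.0, 1.0]

uniform_mrr = weighted_mrr(ranks, uniform)
striking_mrr = weighted_mrr(ranks, strikingness)
uniform_hits3 = weighted_hits_at_k(ranks, uniform, 3)
striking_hits3 = weighted_hits_at_k(ranks, strikingness, 3)
```

On this toy data, uniform weighting gives an MRR of about 0.53, while the strikingness-weighted MRR drops to 0.23, because the poorly ranked striking event now carries most of the weight. This mirrors the abstract's claim that uniform weighting can overstate reasoning ability when trivial events dominate.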
