Reevaluation of Inductive Link Prediction
This work exposes critical evaluation issues in inductive link prediction, impacting researchers and practitioners by revealing inflated performance claims and necessitating protocol revisions.
The paper identifies a major flaw in the evaluation protocol for inductive link prediction, where a simple rule-based baseline achieves state-of-the-art results due to limited negative sampling, and proposes an improved protocol that drastically changes evaluation outcomes.
In this paper, we show that the evaluation protocol currently used for inductive link prediction is heavily flawed, as it relies on ranking the true entity within a small set of randomly sampled negative entities. Because the set of negatives is so small, a simple rule-based baseline that merely ranks entities higher when their type is valid can achieve state-of-the-art results. As a consequence of these insights, we reevaluate current approaches for inductive link prediction on several benchmarks using the link prediction protocol usually applied in the transductive setting. As some inductive methods suffer from scalability issues when evaluated in this setting, we additionally propose and apply an improved sampling protocol that does not suffer from the problem mentioned above. The results of our evaluation differ drastically from the results reported so far.
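
To illustrate why a small set of random negatives can favor a type-only baseline, the following minimal sketch compares the rank of the true entity under a sampled protocol and under a ranking against all entities. Everything in it is a hypothetical toy setup rather than the paper's benchmarks or code: the entity and type counts, the names `type_score` and `rank_with_ties`, the choice of 50 sampled negatives, and the tie-handling rule (tied candidates share the average position) are assumptions made for illustration.

```python
import random

# Hypothetical toy setup (not from the paper): 10,000 entities, 100 types.
NUM_ENTITIES = 10_000
NUM_TYPES = 100
entity_type = {e: e % NUM_TYPES for e in range(NUM_ENTITIES)}

VALID_TYPES = {0}   # assumed: types allowed in the queried position of the relation
TRUE_ENTITY = 0     # the correct answer; its type (0) is valid


def type_score(entity):
    """Type-only baseline: 1.0 if the entity's type is valid, else 0.0."""
    return 1.0 if entity_type[entity] in VALID_TYPES else 0.0


def rank_with_ties(true_entity, negatives, score):
    """Rank of the true entity among negatives; ties share the average position."""
    s_true = score(true_entity)
    better = sum(1 for n in negatives if score(n) > s_true)
    equal = sum(1 for n in negatives if score(n) == s_true)
    return 1 + better + equal / 2


others = [e for e in range(NUM_ENTITIES) if e != TRUE_ENTITY]

# Sampled protocol: rank against a small random set of negatives (50 is assumed).
rng = random.Random(0)
sampled_negatives = rng.sample(others, 50)
sampled_rank = rank_with_ties(TRUE_ENTITY, sampled_negatives, type_score)

# Full protocol: rank against every other entity.
full_rank = rank_with_ties(TRUE_ENTITY, others, type_score)

print(f"rank vs. 50 sampled negatives: {sampled_rank}")
print(f"rank vs. all entities:         {full_rank}")
```

In this toy setup, the rank against all entities is 50.5, because 99 other entities share the valid type and tie with the true entity. With only 50 sampled negatives, the rank can never exceed 26, and it is usually close to 1 because few sampled negatives happen to have a valid type. The baseline knows nothing beyond types, yet it looks strong under the sampled protocol.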