LG

Dec 26, 2025

Decomposing Uncertainty in Probabilistic Knowledge Graph Embeddings: Why Entity Variance Is Not Enough

arXiv:2512.22318v2

h-index: 1

Originality: Highly original

AI Analysis

This work addresses a fundamental limitation in uncertainty quantification for knowledge graph embeddings, which matters for the reliability of AI applications such as question answering and recommendation systems. Its contribution is incremental in that it refines existing probabilistic methods.

The paper tackles the problem that probabilistic knowledge graph embeddings conflate two distinct out-of-distribution (OOD) phenomena: emerging entities and novel relational contexts. It proves an impossibility result for relation-agnostic uncertainty estimators and proposes a decomposition into semantic and structural uncertainty. The method (CAGP) achieves 0.94-0.99 AUROC on temporal OOD detection, a 60-80% relative improvement over relation-agnostic baselines, and reduces errors by 43% at an 85% answer rate in selective prediction.
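To make the selective-prediction figure concrete: a model answers only the queries it is most confident about and abstains on the rest, and the error rate is measured on the answered subset. The following is a minimal Python sketch of that metric, not the paper's evaluation code. The function name `selective_error`, the toy scores, and the rounding convention for the cutoff are assumptions made for illustration.

```python
import numpy as np

def selective_error(uncertainty, correct, answer_rate):
    """Error rate on the `answer_rate` fraction of least-uncertain predictions.

    uncertainty : 1-D array, higher means less confident (hypothetical scores)
    correct     : 1-D boolean array, True where the prediction was right
    answer_rate : fraction of queries the model chooses to answer, in (0, 1]
    """
    uncertainty = np.asarray(uncertainty, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    n_keep = int(round(answer_rate * len(uncertainty)))
    if n_keep == 0:
        raise ValueError("answer_rate is too small to keep any prediction")
    # Answer the n_keep queries with the lowest uncertainty, abstain on the rest.
    keep = np.argsort(uncertainty, kind="stable")[:n_keep]
    return float(1.0 - correct[keep].mean())

# Toy data: the two most uncertain predictions are the wrong ones.
unc = np.array([0.1, 0.2, 0.3, 0.9])
correct = np.array([True, True, False, False])

full_error = selective_error(unc, correct, 1.0)       # answer everything
partial_error = selective_error(unc, correct, 0.75)   # abstain on the least confident query
```

A useful uncertainty score makes the error fall as the answer rate is lowered, because the abstained queries are disproportionately the wrong ones.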

Probabilistic knowledge graph embeddings represent entities as distributions, using learned variances to quantify epistemic uncertainty. We identify a fundamental limitation: these variances are relation-agnostic, meaning an entity receives identical uncertainty regardless of relational context. This conflates two distinct out-of-distribution (OOD) phenomena that behave oppositely: emerging entities (rare, poorly learned) and novel relational contexts (familiar entities in unobserved relationships). We prove an impossibility result: any uncertainty estimator using only entity-level statistics independent of relation context achieves near-random OOD detection on novel contexts. We empirically validate this on three datasets (FB15k-237, WN18RR, YAGO3-10), finding that 100 percent of novel-context triples have frequency-matched in-distribution counterparts, that is, complete frequency overlap. This explains why existing probabilistic methods achieve 0.99 AUROC on random corruptions but only 0.52-0.64 on temporal distribution shift. We formalize uncertainty decomposition into complementary components: semantic uncertainty from entity embedding variance (detecting emerging entities) and structural uncertainty from entity-relation co-occurrence (detecting novel contexts). Our main theoretical result proves these signals are non-redundant, and that any convex combination strictly dominates either signal alone. Our method (CAGP) combines semantic and structural uncertainty via learned weights, achieving 0.94-0.99 AUROC on temporal OOD detection across multiple benchmarks, a 60-80 percent relative improvement over relation-agnostic baselines. On selective prediction, our method reduces errors by 43 percent at an 85 percent answer rate.
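The decomposition described in the abstract can be illustrated with a toy example. The sketch below is not the CAGP implementation. The scoring formulas, the function names, the fixed weight `alpha` (the paper learns its weights), and all data are assumptions chosen only to show why a relation-agnostic score cannot separate a familiar triple from a novel-context triple over the same entities, while a co-occurrence-based score can.

```python
import numpy as np

# Toy training triples (head, relation, tail); all IDs are illustrative.
train = [(0, 0, 1), (1, 0, 0), (0, 1, 2)]
n_entities, n_relations = 3, 2

# Hypothetical learned per-entity embedding variances (one row per entity).
entity_var = np.array([[0.1, 0.1], [0.2, 0.2], [0.9, 0.9]])

# Entity-relation co-occurrence counts gathered from the training triples.
cooc = np.zeros((n_entities, n_relations))
for h, r, t in train:
    cooc[h, r] += 1
    cooc[t, r] += 1

def semantic_uncertainty(head, tail):
    """Relation-agnostic: mean embedding variance of the two entities."""
    return 0.5 * (entity_var[head].mean() + entity_var[tail].mean())

def structural_uncertainty(head, rel, tail):
    """Relation-aware: high when the entities were rarely seen with `rel`."""
    seen = cooc[head, rel] + cooc[tail, rel]
    return 1.0 / (1.0 + seen)

def combined_uncertainty(head, rel, tail, alpha=0.5):
    """Convex combination of the two signals; alpha is fixed here, not learned."""
    sem = semantic_uncertainty(head, tail)
    struct = structural_uncertainty(head, rel, tail)
    return alpha * sem + (1.0 - alpha) * struct

familiar = (0, 0, 1)  # entities 0 and 1 under a relation seen in training
novel = (0, 1, 1)     # same entities, but entity 1 never appeared with relation 1

sem_familiar = semantic_uncertainty(familiar[0], familiar[2])
sem_novel = semantic_uncertainty(novel[0], novel[2])
struct_familiar = structural_uncertainty(*familiar)
struct_novel = structural_uncertainty(*novel)
```

In this toy setup the semantic score is identical for both triples because it never looks at the relation, whereas the structural score is higher for the novel context. A combined score therefore ranks the novel-context triple as more uncertain.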
