LG | AI | Jun 24, 2024

Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making

arXiv:2406.17098v2 | 44 citations
Originality: Incremental advance
AI Analysis

This paper addresses a foundational limitation in planning and reinforcement learning, enabling more efficient and generalizable decision-making, though it builds incrementally on prior work in contrastive learning and quasimetrics.

The paper tackles the problem of defining temporal distances that satisfy the triangle inequality in stochastic settings, which prior methods failed to do. It shows that contrastive successor features, after a change of variables, provide such a distance, enabling combinatorial generalization and, in some reinforcement learning experiments, faster learning.

Temporal distances lie at the heart of many algorithms for planning, control, and reinforcement learning that involve reaching goals, allowing one to estimate the transit time between two states. However, prior attempts to define such temporal distances in stochastic settings have been stymied by an important limitation: these prior approaches do not satisfy the triangle inequality. This is not merely a definitional concern, but translates to an inability to generalize and find shortest paths. In this paper, we build on prior work in contrastive learning and quasimetrics to show how successor features learned by contrastive learning (after a change of variables) form a temporal distance that does satisfy the triangle inequality, even in stochastic settings. Importantly, this temporal distance is computationally efficient to estimate, even in high-dimensional and stochastic settings. Experiments in controlled settings and benchmark suites demonstrate that an RL algorithm based on these new temporal distances exhibits combinatorial generalization (i.e., "stitching") and can sometimes learn more quickly than prior methods, including those based on quasimetrics.
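As a rough illustration of the triangle-inequality claim in the abstract, the sketch below builds a small random Markov chain, computes its discounted successor measure M = sum_t gamma^t P^t exactly (no learning involved), and checks the inequality on a log-ratio distance. The specific form d(x, y) = log(M[y][y] / M[x][y]) is an assumption chosen for illustration and is not stated in the text above; all function names are hypothetical, and the contrastive learning step the abstract describes is not reproduced here.

```python
import math
import random


def random_chain(n, rng):
    """Random row-stochastic matrix with strictly positive entries (hypothetical helper)."""
    rows = [[rng.random() + 1e-3 for _ in range(n)] for _ in range(n)]
    return [[v / sum(row) for v in row] for row in rows]


def successor_measure(P, gamma, iters=2000):
    """Approximate M = sum_{t>=0} gamma^t P^t by iterating M <- I + gamma * P M."""
    n = len(P)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in eye]
    for _ in range(iters):
        M = [[eye[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
              for j in range(n)]
             for i in range(n)]
    return M


def temporal_distance(M, x, y):
    """Assumed log-ratio distance: zero when x == y, generally asymmetric."""
    return math.log(M[y][y] / M[x][y])


def max_triangle_violation(M):
    """Largest value of d(x, z) - d(x, y) - d(y, z) over all state triples."""
    n = len(M)
    return max(
        temporal_distance(M, x, z)
        - temporal_distance(M, x, y)
        - temporal_distance(M, y, z)
        for x in range(n) for y in range(n) for z in range(n)
    )


rng = random.Random(0)
P = random_chain(5, rng)
M = successor_measure(P, gamma=0.9)
violation = max_triangle_violation(M)
print("max triangle violation:", violation)
```

Under these assumptions the violation should be zero up to floating-point error, because any discounted path from x to z that passes through y is already counted in M[x][z]. Note that d(x, y) and d(y, x) generally differ, which is why the abstract refers to quasimetrics.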

Code Implementations: 1 repo
