LG | AI | Jun 10, 2025

Policy-Based Trajectory Clustering in Offline Reinforcement Learning

arXiv:2506.09202v21 citations | h-index: 8
Originality: Incremental advance
AI Analysis

This paper addresses the need for policy-based trajectory clustering in offline RL and offers a framework with potential applications, though it appears incremental because it builds on existing clustering and RL techniques.

The paper tackles the problem of clustering trajectories in offline reinforcement learning datasets by associating clusters with policies, proposing two methods (PG-Kmeans and CAAE) that effectively partition trajectories into meaningful clusters on D4RL and GridWorld environments.

We introduce a novel task of clustering trajectories from offline reinforcement learning (RL) datasets, where each cluster center represents the policy that generated its trajectories. By leveraging the connection between the KL-divergence of offline trajectory distributions and a mixture of policy-induced distributions, we formulate a natural clustering objective. To solve this, we propose Policy-Guided K-means (PG-Kmeans) and Centroid-Attracted Autoencoder (CAAE). PG-Kmeans iteratively trains behavior cloning (BC) policies and assigns trajectories based on policy generation probabilities, while CAAE resembles the VQ-VAE framework by guiding the latent representations of trajectories toward the vicinity of specific codebook entries to achieve clustering. Theoretically, we prove the finite-step convergence of PG-Kmeans and identify a key challenge in offline trajectory clustering: the inherent ambiguity of optimal solutions due to policy-induced conflicts, which can result in multiple equally valid but structurally distinct clusterings. Experimentally, we validate our methods on the widely used D4RL dataset and custom GridWorld environments. Our results show that both PG-Kmeans and CAAE effectively partition trajectories into meaningful clusters. They offer a promising framework for policy-based trajectory clustering, with broad applications in offline RL and beyond.
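The alternating structure the abstract describes for PG-Kmeans (fit one behavior cloning policy per cluster, then reassign each trajectory to the policy most likely to have generated it) can be illustrated with a minimal tabular sketch. This is not the authors' implementation: it assumes discrete states and actions, replaces a learned BC policy with Laplace-smoothed action counts, and all function and parameter names (`fit_tabular_bc`, `pg_kmeans_sketch`, `alpha`, `init_assign`) are hypothetical.

```python
import numpy as np

def fit_tabular_bc(trajs, n_states, n_actions, alpha=1.0):
    """Assumed stand-in for BC: smoothed per-state action frequencies."""
    # alpha > 0 keeps every probability positive; an empty cluster
    # therefore falls back to a uniform policy.
    counts = np.full((n_states, n_actions), alpha, dtype=float)
    for traj in trajs:
        for s, a in traj:
            counts[s, a] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def log_likelihood(traj, policy):
    """Log-probability that `policy` picks the actions seen in `traj`."""
    return float(sum(np.log(policy[s, a]) for s, a in traj))

def pg_kmeans_sketch(trajs, k, n_states, n_actions,
                     n_iters=50, init_assign=None, seed=0):
    """Alternate between fitting per-cluster policies and reassigning."""
    if init_assign is None:
        rng = np.random.default_rng(seed)
        assign = rng.integers(k, size=len(trajs))
    else:
        assign = np.asarray(init_assign)
    policies = []
    for _ in range(n_iters):
        # Update step: one policy per cluster from its current members.
        policies = [
            fit_tabular_bc([t for t, c in zip(trajs, assign) if c == j],
                           n_states, n_actions)
            for j in range(k)
        ]
        # Assignment step: pick the policy with the highest log-likelihood
        # (ties go to the lowest cluster index).
        scores = np.array([[log_likelihood(t, p) for p in policies]
                           for t in trajs])
        new_assign = scores.argmax(axis=1)
        if np.array_equal(new_assign, assign):
            break  # assignments stopped changing
        assign = new_assign
    return assign, policies
```

Each trajectory here is a list of `(state, action)` index pairs. As with ordinary K-means, the result depends on the initial assignment, and a symmetric start can collapse every trajectory into one cluster, which is in the spirit of the ambiguity among equally valid clusterings that the abstract highlights.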

