LG · AI · Feb 1, 2022

CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery

arXiv:2202.00161v3 · 85 citations
Originality: Incremental advance
AI Analysis

This paper addresses efficient adaptation to downstream tasks in reinforcement learning, a problem relevant to both researchers and practitioners. It appears incremental because it builds on existing unsupervised skill discovery methods.

The paper tackles unsupervised skill discovery by introducing Contrastive Intrinsic Control (CIC), which uses contrastive learning and entropy maximization to learn diverse behaviors. In adaptation efficiency, CIC improves on prior skill discovery methods by 1.79x and on the next leading exploration algorithm by 1.18x.

We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery that maximizes the mutual information between state-transitions and latent skill vectors. CIC utilizes contrastive learning between state-transitions and skills to learn behavior embeddings and maximizes the entropy of these embeddings as an intrinsic reward to encourage behavioral diversity. We evaluate our algorithm on the Unsupervised Reinforcement Learning Benchmark, which consists of a long reward-free pre-training phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. CIC substantially improves over prior methods in terms of adaptation efficiency, outperforming prior unsupervised skill discovery methods by 1.79x and the next leading overall exploration algorithm by 1.18x.
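
The two ingredients named in the abstract, a contrastive objective between state-transition and skill embeddings and an entropy-based intrinsic reward, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The InfoNCE-style loss, the k-nearest-neighbour entropy proxy, the function names `info_nce_loss` and `knn_entropy_reward`, and the values of `temperature` and `k` are all assumptions chosen for illustration; the abstract does not specify the exact loss or entropy estimator.

```python
import numpy as np


def info_nce_loss(transition_emb, skill_emb, temperature=0.5):
    """InfoNCE-style contrastive loss (illustrative, not the paper's code).

    Row i of `transition_emb` and row i of `skill_emb` form a positive pair;
    every other row in the batch serves as a negative.
    """
    # L2-normalise so that dot products are cosine similarities.
    q = transition_emb / np.linalg.norm(transition_emb, axis=1, keepdims=True)
    k = skill_emb / np.linalg.norm(skill_emb, axis=1, keepdims=True)
    logits = q @ k.T / temperature  # shape (batch, batch)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return -np.mean(np.diag(log_probs))


def knn_entropy_reward(emb, k=3):
    """Entropy proxy used as an intrinsic reward (assumed estimator).

    Each embedding is rewarded by log(1 + mean distance to its k nearest
    neighbours), so points in sparsely visited regions score higher.
    Requires more than k rows in `emb`.
    """
    diffs = emb[:, None, :] - emb[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # pairwise distances, (batch, batch)
    np.fill_diagonal(dists, np.inf)         # exclude each point's distance to itself
    nearest = np.sort(dists, axis=1)[:, :k]
    return np.log(1.0 + nearest.mean(axis=1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    transitions = rng.normal(size=(8, 4))  # stand-in for encoded state-transitions
    skills = rng.normal(size=(8, 4))       # stand-in for encoded skill vectors
    print("contrastive loss:", info_nce_loss(transitions, skills))
    print("intrinsic rewards:", knn_entropy_reward(transitions))
```

In a full agent the embeddings would come from learned encoders and the reward would feed an RL algorithm during the reward-free pre-training phase; both are omitted here to keep the sketch self-contained.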

Code Implementations: 1 repo