LG, AI
Feb 11, 2025

Advancing Autonomous VLM Agents via Variational Subgoal-Conditioned Reinforcement Learning

arXiv:2502.07949v2
2 citations
h-index: 12
Originality: Highly original
AI Analysis

This paper addresses a bottleneck in autonomous VLM agents for real-world decision-making and proposes a novel method rather than an incremental improvement.

The paper tackles the inefficiency of reinforcement learning for vision-language model agents in complex tasks with sparse rewards. It proposes Variational Subgoal-Conditioned Reinforcement Learning (VSC-RL), which improves learning efficiency and outperforms state-of-the-art methods on benchmarks including mobile device and web control tasks.

State-of-the-art (SOTA) reinforcement learning (RL) methods have enabled vision-language model (VLM) agents to learn from interaction with online environments without human supervision. However, these methods often learn inefficiently when applied to complex, real-world decision-making tasks with sparse rewards and long-horizon dependencies. We propose a novel framework, Variational Subgoal-Conditioned Reinforcement Learning (VSC-RL), to advance VLM agents in solving challenging decision-making tasks. Fundamentally distinct from existing methods, VSC-RL reformulates the decision-making problem as a variational subgoal-conditioned RL problem with a newly derived optimization objective, the Subgoal Evidence Lower BOund (SGC-ELBO), which comprises two key components: (a) maximizing the subgoal-conditioned return, and (b) minimizing the divergence from a reference goal-conditioned policy. We demonstrate theoretically and empirically that VSC-RL improves learning efficiency without compromising performance guarantees. Across a diverse set of challenging benchmarks, including mobile device and web control tasks, VSC-RL consistently outperforms existing SOTA methods, achieving superior learning efficiency and performance.
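
The two components of the objective described above can be illustrated with a toy calculation. The sketch below is not the paper's SGC-ELBO: it assumes discrete action distributions, a KL divergence taken from the subgoal-conditioned policy to the reference goal-conditioned policy, and a hypothetical weighting coefficient `beta`. The function names and the averaging scheme are illustrative assumptions only.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same action set.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def sgc_elbo_sketch(subgoal_returns, subgoal_policy_probs,
                    reference_policy_probs, beta=1.0):
    """Toy surrogate with the two terms named in the abstract.

    (a) mean subgoal-conditioned return, to be maximized;
    (b) mean KL divergence from the reference goal-conditioned policy,
        to be minimized, weighted by beta (a hypothetical coefficient).
    """
    mean_return = sum(subgoal_returns) / len(subgoal_returns)
    kls = [
        kl_divergence(p, q)
        for p, q in zip(subgoal_policy_probs, reference_policy_probs)
    ]
    mean_kl = sum(kls) / len(kls)
    return mean_return - beta * mean_kl


# Usage: one subgoal segment with return 1.0; the subgoal-conditioned
# policy prefers the first action more strongly than the reference does.
value = sgc_elbo_sketch(
    subgoal_returns=[1.0],
    subgoal_policy_probs=[[0.9, 0.1]],
    reference_policy_probs=[[0.5, 0.5]],
    beta=0.1,
)
```

When the subgoal-conditioned policy matches the reference policy exactly, the divergence term vanishes and the surrogate reduces to the mean return. Any deviation from the reference lowers the value in proportion to `beta`.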
