LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning
This work addresses practical challenges in deep reinforcement learning (DRL) for applications that require adaptive and interpretable policies, though its contribution appears incremental: it integrates LLMs with existing symbolic planning approaches.
The paper tackles low data efficiency, lack of interpretability, and limited cross-environment transferability in Deep Reinforcement Learning. It introduces an LLM-driven framework that maps natural language instructions into executable rules and semantically annotates options. In experiments on the Office World and Montezuma's Revenge domains, the authors report superior data efficiency, constraint compliance, and cross-task transferability.
Despite achieving remarkable success in complex tasks, Deep Reinforcement Learning (DRL) still suffers from critical issues in practical applications, such as low data efficiency, lack of interpretability, and limited cross-environment transferability. Moreover, learned policies that generate actions directly from states are sensitive to environmental changes and struggle to guarantee behavioral safety and compliance. Recent research suggests that integrating Large Language Models (LLMs) with symbolic planning is a promising way to address these challenges. Inspired by this, we introduce a novel LLM-driven closed-loop framework that enables semantic-driven skill reuse and real-time constraint monitoring by mapping natural language instructions into executable rules and semantically annotating automatically created options. The proposed approach uses the general knowledge of LLMs to improve exploration efficiency and to adapt transferable options to similar environments, and it provides inherent interpretability through semantic annotations. To validate the effectiveness of this framework, we conduct experiments on two domains, Office World and Montezuma's Revenge. The results demonstrate superior performance in data efficiency, constraint compliance, and cross-task transferability.
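
To make the two mechanisms named in the abstract more concrete, the following minimal Python sketch shows one way that executable rules and semantically annotated options could be represented. It is an illustration under stated assumptions, not the authors' implementation: the names `Rule`, `Option`, `monitor`, and `reusable_options`, the tag-overlap matching, and the example propositions are all hypothetical, and the rule is written by hand where the framework would obtain it from an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

# Hypothetical symbolic state: proposition name -> truth value.
State = Dict[str, bool]


@dataclass(frozen=True)
class Rule:
    """An executable constraint paired with the instruction it came from."""
    instruction: str
    violated: Callable[[State], bool]


@dataclass(frozen=True)
class Option:
    """A learned skill with a semantic annotation (here, a set of tags)."""
    name: str
    tags: FrozenSet[str]


def monitor(state: State, rules: List[Rule]) -> List[str]:
    """Return the instructions whose rules the given state violates."""
    return [r.instruction for r in rules if r.violated(state)]


def reusable_options(library: List[Option], task_tags: FrozenSet[str]) -> List[str]:
    """Select options whose annotation overlaps the new task's description."""
    return [o.name for o in library if o.tags & task_tags]


# Hand-written stand-in for a rule an LLM might derive from an instruction.
rules = [
    Rule("never step on a decoration",
         lambda s: s.get("on_decoration", False)),
]

# Illustrative option library with assumed annotations.
library = [
    Option("get_coffee", frozenset({"coffee", "pickup"})),
    Option("get_mail", frozenset({"mail", "pickup"})),
    Option("go_to_office", frozenset({"office", "deliver"})),
]

violations = monitor({"on_decoration": True, "has_coffee": False}, rules)
candidates = reusable_options(library, frozenset({"coffee", "office"}))
print(violations)
print(candidates)
```

In this sketch, constraint monitoring reduces to evaluating each rule's predicate on the current symbolic state, and skill reuse reduces to a set intersection between option annotations and the tags of a new task. A real system would presumably use richer annotations and a learned or LLM-based matching step.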