AI GT · Feb 20, 2024

MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces

Tianyu Zheng, Ge Zhang, Xingwei Qu, Ming Kuang, Stephen W. Huang, Zhaofeng He

arXiv:2402.12845v133.482 citations · h-index: 28 · Has Code · LREC

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving offline RL performance and efficiency for AI systems, offering a novel perspective but with incremental methodological contributions.

The paper tackles offline reinforcement learning by transforming it into a supervised task using multimodal and pre-trained language models to align states and actions in a shared semantic space, resulting in significant performance improvements over baselines on Atari and OpenAI Gym environments.

Drawing upon the intuition that aligning different modalities to the same semantic embedding space would allow models to understand states and actions more easily, we propose a new perspective on the offline reinforcement learning (RL) challenge. More concretely, we transform it into a supervised learning task by integrating multimodal and pre-trained language models. Our approach incorporates state information derived from images and action-related data obtained from text, thereby bolstering RL training performance and promoting long-term strategic thinking. We emphasize the contextual understanding of language and demonstrate how decision-making in RL can benefit from aligning state and action representations with language representations. Our method significantly outperforms current baselines as evidenced by evaluations conducted on Atari and OpenAI Gym environments. This contributes to advancing offline RL performance and efficiency while providing a novel perspective on offline RL. Our code and data are available at https://github.com/Zheng0428/MORE_.
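The core idea in the abstract, scoring actions by how well their text embedding matches the state embedding in a shared space and training that score with a supervised loss, can be illustrated with a toy sketch. This is not the authors' implementation: the hand-written vectors stand in for the outputs of image and text encoders, and the function names, the cosine-similarity scoring, and the temperature value are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def action_distribution(state_emb, action_embs, temperature=0.1):
    """Softmax over state-to-action similarities in the shared space."""
    logits = [cosine(state_emb, a) / temperature for a in action_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_loss(state_emb, action_embs, target_index, temperature=0.1):
    """Cross-entropy against the action recorded in the offline dataset."""
    probs = action_distribution(state_emb, action_embs, temperature)
    return -math.log(probs[target_index])


# Hypothetical stand-ins for text embeddings of three action descriptions
# (for example "move left", "move right", "fire"); not real encoder outputs.
ACTION_EMBS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical stand-in for an image-derived state embedding.
state_emb = [0.1, 0.9, 0.2]

probs = action_distribution(state_emb, ACTION_EMBS)
predicted = max(range(len(probs)), key=probs.__getitem__)
loss = supervised_loss(state_emb, ACTION_EMBS, target_index=1)
```

In this sketch, choosing an action reduces to a classification over action descriptions, which is one way to read the abstract's "supervised learning task" framing; the actual model presumably learns the encoders rather than using fixed vectors.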

