LG · AI · Jun 28, 2023

DCT: Dual Channel Training of Action Embeddings for Reinforcement Learning with Large Discrete Action Spaces

arXiv:2306.15913v1 · 2 citations · h-index: 18
Originality: Incremental advance
AI Analysis

The paper addresses scalability and noise in reinforcement learning for applications such as robotics and recommendation systems. The advance is incremental because it builds on existing embedding and RL methods.

The paper tackles the challenge of learning robust policies in reinforcement learning with large discrete action spaces. It introduces a dual-channel training framework for action embeddings that balances action reconstruction against state prediction. The method outperforms two competitive baselines in a 2D maze with more than 4000 noisy discrete actions and in a product recommendation task built on real-world e-commerce transaction data, yielding cleaner embeddings and earlier policy convergence.

The ability to learn robust policies while generalizing over large discrete action spaces is an open challenge for intelligent systems, especially in noisy environments that face the curse of dimensionality. In this paper, we present a novel framework to efficiently learn action embeddings that simultaneously allow us to reconstruct the original action as well as to predict the expected future state. We describe an encoder-decoder architecture for action embeddings with a dual channel loss that balances between action reconstruction and state prediction accuracy. We use the trained decoder in conjunction with a standard reinforcement learning algorithm that produces actions in the embedding space. Our architecture is able to outperform two competitive baselines in two diverse environments: a 2D maze environment with more than 4000 discrete noisy actions, and a product recommendation task that uses real-world e-commerce transaction data. Empirical results show that the model results in cleaner action embeddings, and the improved representations help learn better policies with earlier convergence.
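To make the dual channel idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the reconstruction channel is scored with cross-entropy over the discrete actions, the state prediction channel with mean squared error, and that the two are combined with a single weight `alpha`. The abstract only says the loss "balances" the two objectives, so the function names, the weighting form, and the default value of `alpha` are all hypothetical. The `decode_action` helper illustrates one plausible way a trained decoder could map a policy's embedding-space output back to a discrete action.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruction_loss(action_logits, true_action):
    """Channel 1 (assumed form): cross-entropy between the decoder's
    logits over discrete actions and the index of the original action."""
    probs = softmax(action_logits)
    return -math.log(probs[true_action])


def state_prediction_loss(predicted_next_state, true_next_state):
    """Channel 2 (assumed form): mean squared error between the predicted
    and the observed next state."""
    n = len(true_next_state)
    return sum(
        (p - t) ** 2 for p, t in zip(predicted_next_state, true_next_state)
    ) / n


def dual_channel_loss(action_logits, true_action,
                      predicted_next_state, true_next_state, alpha=0.5):
    """Weighted sum of the two channels. `alpha` is a hypothetical
    balancing weight: 1.0 trains only reconstruction, 0.0 only prediction."""
    rec = reconstruction_loss(action_logits, true_action)
    pred = state_prediction_loss(predicted_next_state, true_next_state)
    return alpha * rec + (1.0 - alpha) * pred


def decode_action(action_logits):
    """Pick the discrete action with the highest decoder logit, as one
    way to turn an embedding-space policy output into an executable action."""
    return max(range(len(action_logits)), key=lambda i: action_logits[i])
```

In this sketch, an embedding that reconstructs the action perfectly but predicts the next state poorly is still penalized through the second term. That is the intuition behind training both channels together rather than using a plain autoencoder.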

