Scope Loss for Imbalanced Classification and RL Exploration
This work addresses performance degradation caused by over-exploitation in reinforcement learning and by dataset imbalance in classification. It is aimed at researchers and practitioners in both areas, though it appears incremental because it builds on known connections between these problems.
The paper tackles the exploration-exploitation trade-off in reinforcement learning and dataset imbalance in supervised classification by demonstrating their equivalence. From that equivalence it derives a novel loss function, Scope Loss, which requires no tuning and outperforms state-of-the-art loss functions on benchmark reinforcement learning tasks and a skewed classification dataset.
We demonstrate equivalence between the reinforcement learning problem and the supervised classification problem. We consequently equate the exploration-exploitation trade-off in reinforcement learning to the dataset imbalance problem in supervised classification, and find similarities in how they are addressed. From our analysis of these problems, we derive a novel loss function for reinforcement learning and supervised classification. Scope Loss, our new loss function, adjusts gradients to prevent performance losses from over-exploitation and dataset imbalances, without the need for any tuning. We test Scope Loss against state-of-the-art loss functions over a basket of benchmark reinforcement learning tasks and a skewed classification dataset, and show that Scope Loss outperforms other loss functions.
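
One familiar way to see the correspondence between the two problems is that a REINFORCE-style policy-gradient surrogate loss has the same form as a cross-entropy loss in which the sampled action plays the role of the class label and the advantage plays the role of a per-sample weight. The sketch below illustrates only that general correspondence. It is not the paper's derivation and it is not Scope Loss, whose form is not given in this text. The function names (`cross_entropy`, `policy_gradient_loss`) and the example numbers are hypothetical, chosen for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label, weight=1.0):
    """Weighted cross-entropy for one sample with an integer class label."""
    probs = softmax(logits)
    return -weight * math.log(probs[label])

def policy_gradient_loss(logits, action, advantage):
    """REINFORCE-style surrogate loss: -advantage * log pi(action | state)."""
    probs = softmax(logits)  # policy distribution over actions
    return -advantage * math.log(probs[action])

# Hypothetical example: three actions (or classes), one sample.
logits = [2.0, 0.5, -1.0]
action = 1        # sampled action, treated as the "label"
advantage = 0.5   # advantage estimate, treated as the sample weight

pg = policy_gradient_loss(logits, action, advantage)
ce = cross_entropy(logits, action, weight=advantage)
print(abs(pg - ce) < 1e-12)  # the two losses coincide for this sample
```

Under this reading, a policy that keeps sampling the same few actions produces training data skewed toward those actions, much as an imbalanced dataset is skewed toward its majority classes.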