The Curse of Diversity in Ensemble-Based Exploration
This work highlights an unexpected pitfall in ensemble-based exploration for deep reinforcement learning. It matters to researchers and practitioners who use such methods, although the contribution is incremental because it builds on existing ensemble strategies.
The paper shows that using a diverse ensemble of data-sharing agents for exploration in deep reinforcement learning can harm the performance of individual agents, a phenomenon it terms the curse of diversity. It proposes Cross-Ensemble Representation Learning (CERL) to mitigate the issue and demonstrates its effectiveness in discrete and continuous control domains.
We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents -- a well-established exploration strategy -- can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated data in the shared training data for each ensemble member, as well as the inefficiency of the individual ensemble members at learning from such highly off-policy data. We thus name this phenomenon the curse of diversity. We find that several intuitive solutions -- such as a larger replay buffer or a smaller ensemble size -- either fail to consistently mitigate the performance loss or undermine the advantages of ensembling. Finally, we demonstrate the potential of representation learning to counteract the curse of diversity with a novel method named Cross-Ensemble Representation Learning (CERL) in both discrete and continuous control domains. Our work offers valuable insights into an unexpected pitfall in ensemble-based exploration and raises important caveats for future applications of similar approaches.
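The "low proportion of self-generated data" can be made concrete with a small simulation. The sketch below is a toy model, not the paper's experimental setup: it assumes that one ensemble member is chosen uniformly at random to act in each episode and that all transitions go into a single shared first-in-first-out buffer. The function name and all parameters are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, deque


def self_generated_fraction(ensemble_size, buffer_capacity,
                            num_episodes, episode_length, seed=0):
    """Return, for each ensemble member, the share of the shared
    buffer that the member generated itself.

    Assumption (not from the source): the acting member is sampled
    uniformly at random once per episode.
    """
    rng = random.Random(seed)
    # The buffer stores only the index of the member that generated
    # each transition; a FIFO deque drops the oldest entries when full.
    buffer = deque(maxlen=buffer_capacity)
    for _ in range(num_episodes):
        actor = rng.randrange(ensemble_size)
        buffer.extend([actor] * episode_length)
    counts = Counter(buffer)
    total = len(buffer)
    return [counts[member] / total for member in range(ensemble_size)]


if __name__ == "__main__":
    for n in (1, 5, 10):
        shares = self_generated_fraction(
            ensemble_size=n, buffer_capacity=10_000,
            num_episodes=1_000, episode_length=10)
        print(n, [round(s, 3) for s in shares])
```

Under these assumptions each member's share is roughly 1/N for an ensemble of size N, so with ten members about nine tenths of every member's training data comes from other policies. In this toy model the expected share does not depend on the buffer capacity, which only changes how many transitions are retained.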