Deep Innovation Protection: Confronting the Credit Assignment Problem in Training Heterogeneous Neural Architectures
This work addresses a bottleneck in deep reinforcement learning for complex environments, though it appears incremental because it builds on prior genetic algorithm work.
The paper tackles the credit assignment problem in training heterogeneous neural architectures end-to-end, proposing Deep Innovation Protection (DIP) to enable successful training on a complex 3D task where a simple genetic algorithm previously failed.
Deep reinforcement learning approaches have shown impressive results in a variety of domains. However, more complex heterogeneous architectures such as world models require the different neural components to be trained separately instead of end-to-end. While a simple genetic algorithm recently showed that end-to-end training is possible, it failed to solve a more complex 3D task. This paper presents a method called Deep Innovation Protection (DIP) that addresses the credit assignment problem in training complex heterogeneous neural network models end-to-end for such environments. The main idea behind the approach is to employ multiobjective optimization to temporarily reduce the selection pressure on specific components in a multi-component network, allowing other components to adapt. We investigate the emergent representations of these evolved networks, which learn to predict properties important for the survival of the agent, without the need for a specific forward-prediction loss.
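To make the selection mechanism concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the second objective is an "age" counter that is reset to zero whenever an upstream component is mutated, which is one plausible way to reduce selection pressure for a limited time; the abstract itself does not specify this. The names Individual, UPSTREAM, child_age, dominates, and pareto_front are hypothetical.

```python
from dataclasses import dataclass

# Assumption: these are the components whose mutation triggers protection.
UPSTREAM = {"visual", "memory"}


@dataclass
class Individual:
    reward: float  # task performance, higher is better
    age: int       # generations since an upstream mutation, lower is better


def child_age(parent_age: int, mutated_component: str) -> int:
    """Reset age after an upstream mutation; otherwise the lineage keeps aging."""
    return 0 if mutated_component in UPSTREAM else parent_age + 1


def dominates(a: Individual, b: Individual) -> bool:
    """Pareto dominance: a is no worse on both objectives and better on one."""
    no_worse = a.reward >= b.reward and a.age <= b.age
    strictly_better = a.reward > b.reward or a.age < b.age
    return no_worse and strictly_better


def pareto_front(population: list[Individual]) -> list[Individual]:
    """Return the individuals that no other individual dominates."""
    return [
        p for p in population
        if not any(dominates(q, p) for q in population if q is not p)
    ]


# A champion, a child whose visual component was just mutated (reward drops,
# age resets), and a child whose controller was mutated (reward drops, age grows).
champion = Individual(reward=100.0, age=5)
innovated = Individual(reward=60.0, age=child_age(champion.age, "visual"))
tweaked = Individual(reward=60.0, age=child_age(champion.age, "controller"))

front = pareto_front([champion, innovated, tweaked])
```

In this toy population the freshly mutated individual stays on the Pareto front despite its lower reward, because its age of zero is not dominated, while the controller-only child with the same reward is dominated by the champion. Under these assumptions, that is the sense in which downstream components get time to adapt before the innovation is judged on reward alone.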