The streaming rollout of deep networks - towards fully model-parallel execution
This work addresses the need for real-time control in autonomous agents by improving temporal integration and inference efficiency, though the contribution is incremental because it builds on existing rollout methods.
The paper tackles the slow response times of recurrent neural networks during inference. It introduces a theoretical framework for describing different rollouts, proves that certain rollouts enable earlier and more frequent responses, shows empirically that these early responses perform better, and shows that the streaming rollout reduces runtime on massively parallel devices.
Deep neural networks, and in particular recurrent networks, are promising candidates to control autonomous agents that interact in real-time with the physical world. However, this requires a seamless integration of temporal features into the network's architecture. For training and inference, recurrent neural networks are usually rolled out over time, and different rollouts exist. Conventionally during inference, the layers of a network are computed in a sequential manner, resulting in sparse temporal integration of information and long response times. In this study, we present a theoretical framework to describe rollouts and the level of model-parallelization they induce, and demonstrate differences between rollouts in solving specific tasks. We prove that certain rollouts, also for networks with only skip and no recurrent connections, enable earlier and more frequent responses, and show empirically that these early responses have better performance. The streaming rollout maximizes these properties and enables a fully parallel execution of the network, reducing runtime on massively parallel devices. Finally, we provide an open-source toolbox to design, train, evaluate, and interact with streaming rollouts.
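
The contrast between a sequential and a streaming rollout can be illustrated with a toy sketch. It is not the authors' toolbox: the four-node graph, the node names, and both helper functions are hypothetical, and the sketch assumes that in the streaming rollout every edge bridges exactly one frame, so that all nodes update in parallel from the previous frame. Under that assumption the first input-dependent response arrives after as many parallel steps as the shortest input-to-output path, whereas a sequential rollout needs as many dependent steps per frame as the longest path.

```python
# Hypothetical toy graph: input -> h1 -> h2 -> out, plus a skip edge h1 -> out.
# NODES is listed in topological order.
NODES = ["input", "h1", "h2", "out"]
EDGES = [("input", "h1"), ("h1", "h2"), ("h2", "out"), ("h1", "out")]


def predecessors(node):
    """Return the source nodes of all edges that end in `node`."""
    return [src for src, dst in EDGES if dst == node]


def streaming_first_response(max_frames=10):
    """Frame at which 'out' first depends on the input.

    Assumption: every edge bridges one frame, so each node reads only
    the previous frame's values and all nodes can update in parallel.
    """
    # The input is presented at frame 0; nothing else carries information yet.
    informed = {n: n == "input" for n in NODES}
    for frame in range(1, max_frames + 1):
        prev = informed
        informed = {
            n: n == "input" or any(prev[p] for p in predecessors(n))
            for n in NODES
        }
        if informed["out"]:
            return frame
    return None


def sequential_depth():
    """Dependent update steps per frame when nodes are computed one
    after another within a frame (longest input-to-output path)."""
    depth = {"input": 0}
    for node in NODES[1:]:
        depth[node] = 1 + max(depth[p] for p in predecessors(node))
    return depth["out"]


if __name__ == "__main__":
    print("streaming: first response after", streaming_first_response(), "parallel steps")
    print("sequential:", sequential_depth(), "dependent steps per frame")
```

In this toy graph the skip edge lets the streaming rollout respond after two parallel steps, while the sequential rollout needs three dependent steps before its output is available. This is the kind of earlier response the abstract describes for networks with skip connections.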