Multi-Task Learning by a Top-Down Control Network
This work addresses the challenge of multi-task learning in vision systems, targeting better accuracy and scalability in applications where a single network must perform many different tasks.
The paper tackles the problem of executing multiple vision tasks accurately and efficiently in a single network. It introduces a top-down control network that modifies the activations of the main recognition network based on the selected task, the image content, and the spatial location, and it reports significantly better results than alternative state-of-the-art approaches on four datasets.
As the range of tasks performed by a general vision system expands, executing multiple tasks accurately and efficiently in a single network has become an important and still open problem. Recent computer vision approaches address this problem by branching networks, or by a channel-wise modulation of the network feature maps with task-specific vectors. We present a novel architecture that uses a dedicated top-down control network to modify the activation of all the units in the main recognition network in a manner that depends on the selected task, image content, and spatial location. We show the effectiveness of our scheme by achieving significantly better results than alternative state-of-the-art approaches on four datasets. We further demonstrate our advantages in terms of task selectivity, scaling with the number of tasks, and interpretability.
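The difference between channel-wise modulation and modulation of every unit can be illustrated with a minimal sketch in plain Python. This is not the implementation from the paper: the function names, the toy values, and the choice of multiplicative gating are assumptions made only for illustration, and the control tensor is written by hand here rather than produced by a top-down network from the task and the image.

```python
from typing import List

# A feature map indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def channel_wise_modulation(features: FeatureMap, task_vector: List[float]) -> FeatureMap:
    """Scale every unit in channel c by the same task-specific weight task_vector[c]."""
    if len(task_vector) != len(features):
        raise ValueError("task_vector needs one weight per channel")
    return [
        [[task_vector[c] * value for value in row] for row in channel]
        for c, channel in enumerate(features)
    ]


def unit_wise_modulation(features: FeatureMap, control: FeatureMap) -> FeatureMap:
    """Scale each unit by its own control value (multiplicative gating is an assumption)."""
    return [
        [
            [gate * value for gate, value in zip(gate_row, feature_row)]
            for gate_row, feature_row in zip(gate_channel, feature_channel)
        ]
        for gate_channel, feature_channel in zip(control, features)
    ]


# Toy feature map with 2 channels of size 2x2 (hypothetical values).
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]

# Channel-wise: one weight per channel, identical at every spatial location.
task_vector = [0.5, 2.0]

# Unit-wise: one value per channel and location. In the described architecture
# such values would depend on task, image content, and location; here they are
# hand-written constants.
control = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]

channel_out = channel_wise_modulation(features, task_vector)
unit_out = unit_wise_modulation(features, control)
```

In this toy case, channel-wise modulation rescales each channel uniformly, so it cannot favor one image location over another, while the unit-wise control tensor keeps only the top-left unit of the first channel and the bottom-right unit of the second.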