CV | ML | May 22, 2017

Learning multiple visual domains with residual adapters

arXiv:1705.08045v5 | 1058 citations
AI Analysis

This paper addresses the need for versatile visual representations in computer vision, enabling efficient multi-domain learning without sacrificing performance. The contribution is incremental in that it builds on prior work on networks that predict the parameters of another network.

The paper tackles the problem of learning a single visual representation that works across diverse image types, such as dog breeds, stop signs, and digits. It does so by developing a tunable deep network architecture with adapter residual modules, which achieves a high degree of parameter sharing while maintaining or improving accuracy relative to domain-specific representations.

There is a growing interest in learning data representations that work well for many different types of problems and data. In this paper, we look in particular at the task of learning a single visual representation that can be successfully utilized in the analysis of very different types of images, from dog breeds to stop signs and digits. Inspired by recent work on learning networks that predict the parameters of another, we develop a tunable deep network architecture that, by means of adapter residual modules, can be steered on the fly to diverse visual domains. Our method achieves a high degree of parameter sharing while maintaining or even improving the accuracy of domain-specific representations. We also introduce the Visual Decathlon Challenge, a benchmark that evaluates the ability of representations to capture simultaneously ten very different visual domains and measures their ability to recognize well uniformly.
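The adapter idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `ResidualAdapter` and `AdaptedLayer` are hypothetical, the shared filter is reduced to a 1x1 channel-mixing matrix for brevity, and normalization layers are omitted. The sketch assumes an adapter of the form y = x + alpha * x, where alpha is a small per-domain 1x1 filter bank applied on top of filters shared by all domains.

```python
import random


def matvec(W, x):
    """Multiply a C x C matrix by a length-C channel vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def conv1x1(W, fmap):
    """1x1 convolution: apply the same channel-mixing matrix at every pixel.

    fmap is a list of pixels, each pixel a list of C channel values.
    """
    return [matvec(W, px) for px in fmap]


class ResidualAdapter:
    """Hypothetical per-domain adapter computing y = x + alpha * x."""

    def __init__(self, channels):
        # Zero initialization (an assumption of this sketch) makes the
        # adapter start out as the identity map.
        self.alpha = [[0.0] * channels for _ in range(channels)]

    def __call__(self, fmap):
        delta = conv1x1(self.alpha, fmap)
        return [[x + d for x, d in zip(px, dpx)]
                for px, dpx in zip(fmap, delta)]


class AdaptedLayer:
    """Shared filters followed by an adapter selected by domain name."""

    def __init__(self, channels, domains, seed=0):
        rng = random.Random(seed)
        # Shared across all domains (simplified here to a 1x1 filter).
        self.shared = [[rng.uniform(-1, 1) for _ in range(channels)]
                       for _ in range(channels)]
        # One small adapter per domain; only these differ between domains.
        self.adapters = {d: ResidualAdapter(channels) for d in domains}

    def forward(self, fmap, domain):
        h = conv1x1(self.shared, fmap)
        return self.adapters[domain](h)


layer = AdaptedLayer(channels=2, domains=["digits", "signs"])
features = [[1.0, 2.0], [0.5, -1.0]]  # two pixels, two channels
digits_out = layer.forward(features, "digits")
signs_out = layer.forward(features, "signs")
```

Switching the `domain` argument is what "steering" the network amounts to in this sketch: the shared weights stay fixed and only the selected adapter changes. In this toy the shared and adapter matrices happen to be the same size; in a real convolutional network the shared filters would typically be spatial (for example 3x3), so each 1x1 adapter would add only a fraction of the shared parameter count.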

Code Implementations: 2 repos