CV, LG | Nov 11, 2019

Hierarchically Robust Representation Learning

arXiv:1911.04047v2 | 8 citations
AI Analysis

This work addresses the robustness of deep representations for transfer learning, but it appears incremental as it builds on existing distributionally robust optimization methods.

The paper tackles the problem of deep features being suboptimal when target task data distributions differ from the training set, proposing a hierarchically robust optimization method to learn more generic features, with experiments showing effectiveness on benchmark datasets.

With the tremendous success of deep learning in visual tasks, the representations extracted from intermediate layers of learned models, that is, deep features, have attracted much attention from researchers. Previous empirical analysis shows that those features can contain appropriate semantic information. Therefore, with a model trained on a large-scale benchmark data set (e.g., ImageNet), the extracted features can work well on other tasks. In this work, we investigate this phenomenon and demonstrate that deep features can be suboptimal because they are learned by minimizing the empirical risk. When the data distribution of the target task differs from that of the benchmark data set, the performance of deep features can degrade. Hence, we propose a hierarchically robust optimization method to learn more generic features. Considering example-level and concept-level robustness simultaneously, we formulate the problem as a distributionally robust optimization problem with Wasserstein ambiguity set constraints, and we propose an efficient algorithm that fits the conventional training pipeline. Experiments on benchmark data sets demonstrate the effectiveness of the robust deep representations.
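
The concept-level half of the idea above can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a generic minimax setup in which an adversary holds a probability vector over concepts and shifts weight toward the concepts with the highest current loss, using an exponentiated-gradient step. The Wasserstein ambiguity set and the example-level robustness are left out, and the function names, the step size, and the loss values are all hypothetical.

```python
import numpy as np


def update_concept_weights(weights, concept_losses, step_size=0.5):
    """One exponentiated-gradient ascent step on the probability simplex.

    Hypothetical helper: concepts with a higher loss receive more weight,
    which mimics an adversary choosing a worst-case mixture of concepts.
    """
    weights = np.asarray(weights, dtype=float)
    concept_losses = np.asarray(concept_losses, dtype=float)
    # Subtract the max loss before exponentiating for numerical stability;
    # the shift cancels out after normalization.
    scaled = weights * np.exp(step_size * (concept_losses - concept_losses.max()))
    return scaled / scaled.sum()


def robust_loss(weights, concept_losses):
    """Weighted loss that the model would minimize for fixed concept weights."""
    return float(np.dot(weights, concept_losses))


# Illustrative per-concept average losses (made-up numbers).
concept_losses = np.array([0.2, 0.9, 0.4])
weights = np.full(3, 1.0 / 3.0)  # empirical risk corresponds to uniform weights

new_weights = update_concept_weights(weights, concept_losses)
print("weights:", new_weights)
print("uniform loss:", robust_loss(weights, concept_losses))
print("reweighted loss:", robust_loss(new_weights, concept_losses))
```

In a training loop under these assumptions, the weight update would alternate with an ordinary gradient step on the model parameters using the reweighted loss, so the standard pipeline stays largely unchanged.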

