ML LG Mar 4, 2020

Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error

arXiv:2003.01889v1

Originality: Incremental advance

AI Analysis

This paper addresses a critical challenge in few-shot learning for AI systems that need to learn from small datasets, though it appears to be an incremental improvement over existing Bayesian meta-learning methods.

The paper tackles task ambiguity in few-shot learning by addressing the information preference problem in Bayesian meta-learning models, where the posterior distribution degenerates to a single point. The authors report that their approach, which combines a cyclical annealing schedule with a maximum mean discrepancy (MMD) criterion, substantially outperforms standard meta-learning algorithms.

The ability to learn new concepts from small amounts of data is a crucial aspect of intelligence that has proven challenging for deep learning methods. Meta-learning for few-shot learning offers a potential solution to this problem: by learning to learn across data from many previous tasks, few-shot learning algorithms can discover the structure among tasks to enable fast learning of new tasks. However, a critical challenge in few-shot learning is task ambiguity: even when a powerful prior can be meta-learned from a large number of prior tasks, a small dataset for a new task can simply be too ambiguous to determine a single model for that task. Bayesian meta-learning models can naturally resolve this problem by placing a sophisticated prior distribution and letting the posterior be well regularized through Bayesian decision theory. However, currently known Bayesian meta-learning procedures such as VERSA suffer from the so-called information preference problem: the posterior distribution degenerates to a single point and is far from the exact posterior. To address this challenge, we design a novel meta-regularization objective using a cyclical annealing schedule and the maximum mean discrepancy (MMD) criterion. The cyclical annealing schedule is quite effective at avoiding such degenerate solutions. This procedure involves a difficult KL-divergence estimation, which we avoid by employing MMD instead of the KL divergence. The experimental results show that our approach substantially outperforms standard meta-learning algorithms.
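The abstract names two ingredients, a cyclical annealing schedule and an MMD criterion, without giving their exact form. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes a linear ramp that restarts every cycle (one common form of cyclical annealing), an RBF kernel, and the biased sample estimate of squared MMD. The names `cyclical_beta`, `mmd_squared`, `n_cycles`, `ramp_fraction`, and `bandwidth` are hypothetical, as are the Gaussian samples standing in for posterior and prior draws.

```python
import math
import random


def cyclical_beta(step, total_steps, n_cycles=4, ramp_fraction=0.5):
    """Regularization weight in [0, 1]: linear ramp, hold at 1, restart each cycle.

    Assumed form; the abstract does not specify the schedule's shape.
    """
    cycle_len = total_steps / n_cycles
    tau = (step % cycle_len) / cycle_len  # position within the current cycle, in [0, 1)
    return min(1.0, tau / ramp_fraction)


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased sample estimate of squared MMD between two sets of samples."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical stand-ins for samples from an approximate posterior and a prior.
random.seed(0)
posterior_samples = [(random.gauss(0.5, 1.0),) for _ in range(64)]
prior_samples = [(random.gauss(0.0, 1.0),) for _ in range(64)]

discrepancy = mmd_squared(posterior_samples, prior_samples)
for step in (0, 6, 12, 24, 25):
    # The weighted penalty would be added to the task loss at this training step.
    penalty = cyclical_beta(step, total_steps=100) * discrepancy
    print(step, penalty)
```

MMD needs only samples from the two distributions, which is why it can stand in for a KL term whose density is hard to estimate. Under this schedule the weight returns to zero at the start of every cycle, which is meant to give the model repeated chances to move away from a degenerate posterior before the penalty is restored.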

