LG · AI · ML · Dec 9, 2016

Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks

arXiv:1612.02879v2 · 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of scaling learning systems to difficult tasks by reducing the need to relearn representations. The advance is incremental, since it builds on existing meta-gradient descent methods.

The paper tackles the problem of learning reusable feature representations in neural networks. It introduces crossprop, an incremental learning algorithm that learns the incoming weights of hidden units by meta-gradient descent, and reports experiments in which crossprop learns and reuses feature representations across new tasks, whereas backprop relearns them.

Representations are fundamental to artificial intelligence. The performance of a learning system depends on the type of representation used to represent the data. Typically, these representations are hand-engineered using domain knowledge. More recently, the trend is to learn these representations through stochastic gradient descent in multi-layer neural networks, which is called backprop. Learning the representations directly from the incoming data stream reduces the human labour involved in designing a learning system. More importantly, this allows a learning system to scale to difficult tasks. In this paper, we introduce a new incremental learning algorithm called crossprop, which learns incoming weights of hidden units based on the meta-gradient descent approach previously introduced by Sutton (1992) and Schraudolph (1999) for learning step-sizes. The final update equation introduces an additional memory parameter for each of these weights and generalizes the backprop update equation. Our experiments show that crossprop learns and reuses its feature representation while tackling new and unseen tasks, whereas backprop relearns a new feature representation.
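The meta-gradient idea that the abstract credits to Sutton (1992) can be illustrated with IDBD, the step-size learning rule from that work. The sketch below is not crossprop itself and is not taken from the paper. It only shows, for a single linear unit, how a per-weight memory trace `h` lets the log step-size `beta` be adjusted by gradient descent alongside the weights. The function names `idbd_step` and `run_demo`, the toy regression task, the initial step-size of 0.05, and the meta step-size `theta` are all assumptions chosen for illustration.

```python
import math
import random


def idbd_step(w, beta, h, x, target, theta):
    """Apply one IDBD update to a linear unit; w, beta and h are mutated in place."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    delta = target - y  # prediction error before the update
    for i, xi in enumerate(x):
        # Meta-gradient step on the log step-size, driven by the memory trace h[i].
        beta[i] += theta * delta * xi * h[i]
        alpha = math.exp(beta[i])
        # Ordinary LMS step using the per-weight step-size.
        w[i] += alpha * delta * xi
        # Decay the trace (clipped at zero), then add the most recent weight change.
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta


def run_demo(steps=5000, seed=0, theta=0.01):
    """Toy regression (an assumption, not from the paper): two relevant inputs, two irrelevant."""
    rng = random.Random(seed)
    true_w = [1.0, -2.0, 0.0, 0.0]
    n = len(true_w)
    w = [0.0] * n
    beta = [math.log(0.05)] * n  # hypothetical initial step-size of 0.05 per weight
    h = [0.0] * n
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        target = sum(tw * xi for tw, xi in zip(true_w, x))
        idbd_step(w, beta, h, x, target, theta)
    return w, beta
```

Calling `run_demo()` returns the learned weights and log step-sizes for the toy task. The trace `h` plays the role of the extra per-weight memory that the abstract describes; how crossprop carries that idea over to the incoming weights of hidden units is specified in the paper, not here.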

