NE · Feb 28, 2017

Improving the Neural GPU Architecture for Algorithm Learning

arXiv:1702.08727v2 · 43 citations
AI Analysis

This work addresses algorithm synthesis from examples, a core AI problem, but is incremental as it builds on the existing Neural GPU.

The authors tackled the problem of algorithm learning by improving the Neural GPU architecture, resulting in reduced training time, better generalization, and the first end-to-end learning of decimal multiplication.

Algorithm learning is a core problem in artificial intelligence, with significant implications for the level of automation that machines can achieve. Recently, deep learning methods have been emerging for synthesizing an algorithm from its input-output examples, the most successful being the Neural GPU, which is capable of learning multiplication. We present several improvements to the Neural GPU that substantially reduce training time and improve generalization. We introduce a new technique, hard nonlinearities with saturation costs, that has general applicability. We also introduce a technique of diagonal gates that can be applied to active-memory models. The proposed architecture is the first capable of learning decimal multiplication end-to-end.
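
The abstract names hard nonlinearities with saturation costs but does not define them. The sketch below shows one plausible reading, not the authors' implementation: activations are piecewise-linear clipped functions, and an auxiliary cost penalizes pre-activations that drift into the flat region, where the gradient is zero. The function names, the 0.9 limit, and the exact form of the penalty are assumptions made for illustration.

```python
def hard_tanh(x):
    """Piecewise-linear stand-in for tanh: clip x to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def hard_sigmoid(x):
    """Piecewise-linear stand-in for the sigmoid: clip (x + 1) / 2 to [0, 1]."""
    return max(0.0, min(1.0, (x + 1.0) / 2.0))


def saturation_cost(pre_activations, limit=0.9):
    """Mean penalty on pre-activations whose magnitude exceeds `limit`.

    Assumed form: mean(max(|x| - limit, 0)). Both the limit and the
    formula are illustrative choices, not values taken from the paper.
    """
    excess = [max(abs(x) - limit, 0.0) for x in pre_activations]
    return sum(excess) / len(excess)


# Hypothetical pre-activation values for a single layer.
pre = [-2.0, -0.5, 0.0, 0.95, 3.0]
activations = [hard_tanh(x) for x in pre]
penalty = saturation_cost(pre)
```

In a training loop, a term like `penalty` would typically be scaled by a small weight and added to the task loss, so that units are nudged back toward the linear region where gradients still flow.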

Code Implementations: 2 repos