LGAIFeb 11, 2022

End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking

arXiv:2202.05826v328 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of logical reasoning in machine learning for AI systems, though it is incremental as it builds on existing recurrent approaches.

The paper tackled the problem of algorithmic extrapolation in recurrent networks, where models fail to scale to complex tasks due to 'overthinking' behavior, and proposed a recall architecture and progressive training that enabled solving extremely hard extrapolation tasks.

Machine learning systems perform well on pattern matching tasks, but their ability to perform algorithmic or logical reasoning is not well understood. One important reasoning capability is algorithmic extrapolation, in which models trained only on small/simple reasoning problems can synthesize complex strategies for large/complex problems at test time. Algorithmic extrapolation can be achieved through recurrent systems, which can be iterated many times to solve difficult reasoning problems. We observe that this approach fails to scale to highly complex problems because behavior degenerates when many iterations are applied -- an issue we refer to as "overthinking." We propose a recall architecture that keeps an explicit copy of the problem instance in memory so that it cannot be forgotten. We also employ a progressive training routine that prevents the model from learning behaviors that are specific to iteration number and instead pushes it to learn behaviors that can be repeated indefinitely. These innovations prevent the overthinking problem, and enable recurrent systems to solve extremely hard extrapolation tasks.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes