CLLGApr 6, 2022

Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks

arXiv:2204.02892v452 citationsh-index: 62
Originality Incremental advance
AI Analysis

This provides a theoretical foundation for empirical methods in NLP, addressing a bottleneck in learning complex, multi-step problems.

The paper tackles the problem of unlearnable composite tasks in sequence-to-sequence models by proving that adding intermediate supervision to the input makes them learnable, specifically for tasks decomposable into a polynomial number of simple sub-tasks.

The field of Natural Language Processing has experienced a dramatic leap in capabilities with the recent introduction of huge Language Models. Despite this success, natural language problems that involve several compounded steps are still practically unlearnable, even by the largest LMs. This complies with experimental failures for end-to-end learning of composite problems that were demonstrated in a variety of domains. An effective mitigation is to introduce intermediate supervision for solving sub-tasks of the compounded problem. Recently, several works have demonstrated high gains by taking a straightforward approach for incorporating intermediate supervision in compounded natural language problems: the sequence-to-sequence LM is fed with an augmented input, in which the decomposed tasks' labels are simply concatenated to the original input. In this paper, we prove a positive learning result that motivates these recent efforts. We show that when concatenating intermediate supervision to the input and training a sequence-to-sequence model on this modified input, unlearnable composite problems can become learnable. We show that this is true for any family of tasks which on the one hand, are unlearnable, and on the other hand, can be decomposed into a polynomial number of simple sub-tasks, each of which depends only on O(1) previous sub-task results. Beyond motivating contemporary empirical efforts for incorporating intermediate supervision in sequence-to-sequence language models, our positive theoretical result is the first of its kind in the landscape of results on the benefits of intermediate supervision for neural-network learning: Until now, all theoretical results on the subject are negative, i.e., show cases where learning is impossible without intermediate supervision, while our result is positive, showing that learning is facilitated in the presence of intermediate supervision.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes