Self-Regulated Interactive Sequence-to-Sequence Learning
This paper addresses the challenge of efficiently using supervision signals that differ in cost and effect during interactive learning, for example in neural machine translation. The contribution appears incremental, since it builds on existing learning-to-learn and feedback methods.
The paper tackles the problem of optimizing the cost-quality trade-off in interactive sequence-to-sequence learning by developing a self-regulation strategy that decides when to ask for which type of feedback. The results show that the self-regulator discovers an ε-greedy strategy for the optimal cost-quality trade-off, remains robust under domain shift, and is a promising alternative to active learning.
Not all types of supervision signals are created equal: Different types of feedback have different costs and effects on learning. We show how self-regulation strategies that decide when to ask for which kind of feedback from a teacher (or from oneself) can be cast as a learning-to-learn problem leading to improved cost-aware sequence-to-sequence learning. In experiments on interactive neural machine translation, we find that the self-regulator discovers an $\epsilon$-greedy strategy for the optimal cost-quality trade-off by mixing different feedback types including corrections, error markups, and self-supervision. Furthermore, we demonstrate its robustness under domain shift and identify it as a promising alternative to active learning.
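To make the idea of an ε-greedy choice between feedback types concrete, here is a minimal sketch in Python. It is not the paper's learned regulator: it is a plain hand-written ε-greedy bandit, and the class name `EpsilonGreedyRegulator`, the reward defined as quality gain divided by cost, and all numeric values are assumptions chosen purely for illustration.

```python
import random

# Feedback types named in the abstract.
FEEDBACK_TYPES = ["correction", "error_markup", "self_supervision"]


class EpsilonGreedyRegulator:
    """Hypothetical epsilon-greedy chooser over feedback types (illustrative only)."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.actions}
        self.values = {a: 0.0 for a in self.actions}  # running mean reward

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[a])

    def update(self, action, quality_gain, cost):
        # Assumed reward: quality improvement per unit of feedback cost.
        if cost <= 0:
            raise ValueError("cost must be positive")
        reward = quality_gain / cost
        self.counts[action] += 1
        # Incremental update of the running mean.
        self.values[action] += (reward - self.values[action]) / self.counts[action]


if __name__ == "__main__":
    # Made-up (quality_gain, cost) pairs, not measurements from the paper.
    simulated = {
        "correction": (1.0, 5.0),
        "error_markup": (0.6, 2.0),
        "self_supervision": (0.1, 1.0),
    }
    regulator = EpsilonGreedyRegulator(FEEDBACK_TYPES, epsilon=0.1)
    for _ in range(100):
        feedback = regulator.choose()
        gain, cost = simulated[feedback]
        regulator.update(feedback, gain, cost)
    print(regulator.counts)
```

In this sketch the exploration rate is fixed by hand, whereas the abstract reports that an ε-greedy-like behaviour emerges from the learned self-regulator; the code only illustrates what such a mixing strategy looks like once it exists.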