Improve Long-term Memory Learning Through Rescaling the Error Temporally
The work addresses a fundamental issue in sequence modeling for tasks that require long-term dependencies, though it appears incremental because it modifies existing error metrics rather than introducing a new paradigm.
The paper tackles the bias towards short-term memory in sequence modeling by analyzing common error metrics such as the mean absolute error and the mean squared error. It proposes a temporally rescaled error to reduce this bias and to alleviate vanishing gradients, and it reports numerical experiments that support improved long-term memory learning.
This paper studies the choice of error metric for long-term memory learning in sequence modelling. We examine the bias towards short-term memory in commonly used errors, including the mean absolute error and the mean squared error. Our findings show that all temporally positive-weighted errors are biased towards short-term memory when learning linear functionals. To reduce this bias and improve long-term memory learning, we propose the use of a temporally rescaled error. In addition to reducing the bias towards short-term memory, this approach can also alleviate the vanishing gradient issue. We conduct numerical experiments on different long-memory tasks and sequence models to validate our claims. The numerical results confirm the importance of an appropriate temporally rescaled error for effective long-term memory learning. To the best of our knowledge, this is the first work that quantitatively analyzes the bias of different errors towards short-term memory in sequence modelling.
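As a rough illustration of the family of errors analyzed above, the following sketch computes a temporally positive-weighted error for a single scalar sequence. The function name `temporally_weighted_error`, its parameters, and the example weights are assumptions made for illustration. This is not the temporally rescaled error proposed in the paper, whose exact form is not given in this abstract.

```python
import numpy as np


def temporally_weighted_error(y_true, y_pred, weights=None, power=2):
    """Return sum_t w_t * |y_true[t] - y_pred[t]|**power for 1-D sequences.

    Hypothetical helper for illustration only. With uniform weights 1/T,
    power=2 gives the mean squared error and power=1 the mean absolute error.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.ndim != 1 or y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must be 1-D arrays of equal length")

    T = y_true.shape[0]
    if weights is None:
        # Uniform weighting over time recovers the usual mean error.
        weights = np.full(T, 1.0 / T)
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (T,):
        raise ValueError("weights must have one entry per time step")
    if np.any(weights <= 0):
        raise ValueError("weights must be strictly positive")

    # Per-time-step error, then a weighted sum over the time axis.
    per_step = np.abs(y_true - y_pred) ** power
    return float(np.dot(weights, per_step))


# Example: the prediction is worst at the final time step.
y_true = np.zeros(4)
y_pred = np.array([1.0, 1.0, 1.0, 2.0])

mse = temporally_weighted_error(y_true, y_pred)            # uniform weights, squared error
mae = temporally_weighted_error(y_true, y_pred, power=1)   # uniform weights, absolute error

# Assumed example weights that grow with time (they sum to 1).
late_weights = np.arange(1, 5) / 10.0
late_weighted = temporally_weighted_error(y_true, y_pred, weights=late_weights)
```

In this toy example, weights that increase with time make the error at the last step count more than under uniform weighting. The abstract states that every error of this positive-weighted form remains biased towards short-term memory when learning linear functionals, so the sketch shows only the baseline family that the paper analyzes.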