LG ML · Jan 13, 2021

MC-LSTM: Mass-Conserving LSTM

Pieter-Jan Hoedt, Frederik Kratzert, Daniel Klotz, Christina Halmich, Markus Holzleitner, Grey Nearing, Sepp Hochreiter, Günter Klambauer

arXiv:2101.05186v3 · 11.376 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of incorporating conservation laws into neural networks for domains like physics and hydrology, offering an incremental improvement by extending LSTM's inductive bias.

The authors tackle the problem of modeling systems governed by conservation laws, such as physical and economic systems, by developing a Mass-Conserving LSTM (MC-LSTM) that extends the inductive bias of LSTM so that it obeys these laws. The results include a new state of the art for neural arithmetic units on addition tasks and for peak-flow prediction in hydrology, together with interpretable states that correlate with real-world processes.

The success of Convolutional Neural Networks (CNNs) in computer vision is mainly driven by their inductive bias, which is strong enough to let CNNs solve vision-related tasks with random weights, that is, without learning. Similarly, Long Short-Term Memory (LSTM) has a strong inductive bias towards storing information over time. However, many real-world systems are governed by conservation laws, which lead to the redistribution of particular quantities (e.g., in physical and economic systems). Our novel Mass-Conserving LSTM (MC-LSTM) adheres to these conservation laws by extending the inductive bias of LSTM to model the redistribution of those stored quantities. MC-LSTMs set a new state of the art for neural arithmetic units at learning arithmetic operations such as addition, which obeys a strong conservation law because the sum is constant over time. MC-LSTM is further applied to traffic forecasting, modelling a pendulum, and a large benchmark dataset in hydrology, where it sets a new state of the art for predicting peak flows. In the hydrology example, we show that MC-LSTM states correlate with real-world processes and are therefore interpretable.
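The redistribution idea above can be illustrated with a small, framework-free sketch. It assumes a simplified form of the MC-LSTM cell update: a column-normalised redistribution matrix R, a normalised input gate i and a sigmoid output gate o, roughly m = R·c_prev + i·x, h = o ⊙ m, c = (1 − o) ⊙ m. The abstract does not spell out these equations, and in the actual model the gate logits come from learned layers rather than being passed in directly, so the function and argument names here are hypothetical. The point of the sketch is the invariant: whatever the gate values, the mass stored plus the mass released equals the mass received.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax; the result is non-negative and sums to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def mc_lstm_step(c_prev, x, r_logits, i_logits, o_logits):
    """One mass-conserving step (hypothetical, simplified sketch).

    Assumption: gate logits are supplied by the caller instead of being
    computed by learned layers, as they would be in a trained MC-LSTM.

    c_prev   : list of n stored masses (cell state)
    x        : scalar mass entering at this step
    r_logits : n x n logits; column j is softmax-normalised, so all mass in
               cell j is redistributed across the n cells
    i_logits : n logits; softmax decides how x is split across cells
    o_logits : n logits; sigmoid decides what fraction of each cell leaves
    Returns (c_new, h): retained mass per cell and outflow per cell.
    """
    n = len(c_prev)
    # columns[j][k] = share of cell j's mass that moves to cell k.
    columns = [softmax([r_logits[k][j] for k in range(n)]) for j in range(n)]
    i_gate = softmax(i_logits)
    o_gate = [sigmoid(v) for v in o_logits]
    # Mass per cell after redistribution plus the newly added input.
    m_tot = [
        sum(columns[j][k] * c_prev[j] for j in range(n)) + i_gate[k] * x
        for k in range(n)
    ]
    h = [o * m for o, m in zip(o_gate, m_tot)]              # mass leaving
    c_new = [(1.0 - o) * m for o, m in zip(o_gate, m_tot)]  # mass retained
    return c_new, h


if __name__ == "__main__":
    random.seed(0)
    n = 3
    inputs = [0.5, 1.25, 2.0, 0.75]
    c = [0.0] * n
    outflow = 0.0
    for x in inputs:
        # Random logits stand in for learned gates (assumption).
        r = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
        i = [random.gauss(0, 1) for _ in range(n)]
        o = [random.gauss(0, 1) for _ in range(n)]
        c, h = mc_lstm_step(c, x, r, i, o)
        outflow += sum(h)
    # Mass in = mass stored + mass out, up to floating-point rounding.
    print(abs(sum(c) + outflow - sum(inputs)) < 1e-9)  # True
```

Driving the output gate towards zero (large negative logits) turns the cells into an accumulator whose total equals the running sum of the inputs, which is one way to see why a conservation-based bias suits the addition tasks mentioned in the abstract.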

