Information theory for data-driven model reduction in physics and biology
This work addresses the model reduction problem faced by researchers in physics and biology by providing a general method for identifying predictive variables, although it builds on the existing information bottleneck framework.
The authors tackled the problem of identifying relevant variables for model reduction in many-body systems with an approach based on the information bottleneck. The approach relates these variables analytically to the eigenfunctions of the transfer operator and provides a foundation for interpreting deep learning tools. They demonstrated its effectiveness on chaotic and quasiperiodic systems, atmospheric flows, and cyanobacteria colonies, discovering an emergent synchronization order parameter in the last of these.
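For orientation, the two objects named above can be written in one common convention. The symbols X_t, Z, beta, tau, T_tau, phi_k and lambda_k are our notation and need not match the paper's. The information bottleneck chooses a stochastic encoder p(z | x_t) that compresses the present state X_t into a reduced variable Z while keeping Z predictive of the state a lag tau later; small beta favors compression. The operator is written here in its action on observables, assuming Markovian, time-homogeneous dynamics; the transfer operator proper acts on probability densities and is the adjoint of this operator. The last line shows why eigenfunctions with |lambda_k| close to 1 are the slowest-decaying ones, while phi_0 is the constant function and carries no information.

```latex
\min_{p(z \mid x_t)} \; \mathcal{L}_{\mathrm{IB}} = I(X_t; Z) - \beta \, I(Z; X_{t+\tau}), \qquad \beta > 0

(\mathcal{T}_\tau f)(x) = \mathbb{E}\!\left[ f(X_{t+\tau}) \mid X_t = x \right],
\qquad \mathcal{T}_\tau \phi_k = \lambda_k \phi_k,
\qquad 1 = \lambda_0 \ge |\lambda_1| \ge |\lambda_2| \ge \cdots

\mathbb{E}\!\left[ \phi_k(X_{t+n\tau}) \mid X_t = x \right] = \lambda_k^{\,n} \, \phi_k(x)
```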
Model reduction is the construction of simple yet predictive descriptions of the dynamics of many-body systems in terms of a few relevant variables. A prerequisite to model reduction is the identification of these variables, a task for which no general method exists. Here, we develop an approach to identify relevant variables, defined as those most predictive of the future, using the so-called information bottleneck. We elucidate analytically the relation between these relevant variables and the eigenfunctions of the transfer operator describing the dynamics. In the limit of high compression, the relevant variables are directly determined by the slowest-decaying eigenfunctions. Our results provide a firm foundation for interpreting deep learning tools that automatically identify reduced variables. Combined with equation learning methods, this procedure yields the hidden dynamical rules governing the system's evolution in a data-driven manner. We illustrate how these tools work in diverse settings, including model chaotic and quasiperiodic systems in which we also learn the underlying dynamical equations, uncurated satellite recordings of atmospheric fluid flows, and experimental videos of cyanobacteria colonies in which we discover an emergent synchronization order parameter.
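The statement that, at high compression, the relevant variables are set by the slowest-decaying eigenfunctions can be illustrated on a toy example. The sketch below assumes a hypothetical six-state Markov chain with two metastable wells; the function names `two_well_chain` and `slowest_eigenfunction` and all parameter values are ours. It does not run the information bottleneck optimization and is not the authors' method. It only computes the eigenfunction of the transition matrix with the largest non-trivial eigenvalue and thresholds it to obtain a one-bit reduced variable (which well the system is in).

```python
import numpy as np


def two_well_chain(a: float = 0.3, eps: float = 0.01) -> np.ndarray:
    """Hypothetical 6-state chain with two metastable wells (states 0-2 and 3-5).

    P[i, j] = Prob(X_{t+1} = j | X_t = i). A hop to another state in the same
    well has probability `a`; a hop to a state in the other well has `eps`.
    P is symmetric, so the chain is reversible with a uniform stationary law.
    """
    n, half = 6, 3
    P = np.full((n, n), eps)
    for block in (range(0, half), range(half, n)):
        for i in block:
            for j in block:
                P[i, j] = a
    # Put the remaining probability on the diagonal so each row sums to one.
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P


def slowest_eigenfunction(P: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the two largest eigenvalues and the slowest non-trivial eigenfunction.

    Assumes P is symmetric, so np.linalg.eigh applies and eigenvalues are real.
    """
    eigvals, eigvecs = np.linalg.eigh(P)
    order = np.argsort(eigvals)[::-1]  # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Index 0: constant eigenfunction with eigenvalue 1 (carries no information).
    # Index 1: the slowest-decaying non-trivial eigenfunction.
    return eigvals[:2], eigvecs[:, 1]


P = two_well_chain()
lams, phi1 = slowest_eigenfunction(P)
z = (phi1 > 0).astype(int)  # one-bit reduced variable: which well the state is in
print(lams)  # approximately [1.0, 0.94]
print(z)     # [1 1 1 0 0 0] or [0 0 0 1 1 1]; the eigenvector's sign is arbitrary

# phi1 decays geometrically under the dynamics: E[phi1(X_n) | X_0] = lams[1]**n * phi1.
n = 5
print(np.allclose(np.linalg.matrix_power(P, n) @ phi1, lams[1] ** n * phi1))
```

Because this transition matrix is symmetric, it equals its own transpose, so the operator on observables and the transfer operator on densities share the same eigenvectors here. For a non-reversible chain one would instead use `np.linalg.eig` and keep track of left versus right eigenvectors.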