LG · DATA-AN · Oct 23, 2024

Dreaming Learning

Alessandro Londei, Matteo Benati, Denise Lanzieri, Vittorio Loreto

arXiv:2410.18156v24.61 citations · h-index: 51

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of handling non-stationary data in deep learning. The advance is incremental because it builds on existing methods for novelty integration.

The paper tackles the problem of incorporating novelties into deep learning systems, where new information can interfere with previously stored data. It proposes Dreaming Learning, a training algorithm inspired by the Adjacent Possible that explores new data spaces during training to ease the integration of non-stationary data. The reported gains are a ~29% improvement in the auto-correlation of generated textual sequences and ~100% faster loss convergence after a paradigm shift in Markov chains.

Incorporating novelties into deep learning systems remains a challenging problem. Introducing new information to a machine learning system can interfere with previously stored data and potentially alter the global model paradigm, especially when dealing with non-stationary sources. In such cases, traditional approaches based on validation error minimization offer limited advantages. To address this, we propose a training algorithm inspired by Stuart Kauffman's notion of the Adjacent Possible. This novel training methodology explores new data spaces during the learning phase. It predisposes the neural network to smoothly accept and integrate data sequences with different statistical characteristics than expected. The maximum distance compatible with such inclusion depends on a specific parameter: the sampling temperature used in the explorative phase of the present method. This algorithm, called Dreaming Learning, anticipates potential regime shifts over time, enhancing the neural network's responsiveness to non-stationary events that alter statistical properties. To assess the advantages of this approach, we apply this methodology to unexpected statistical changes in Markov chains and non-stationary dynamics in textual sequences. We demonstrated its ability to improve the auto-correlation of generated textual sequences by $\sim 29\%$ and enhance the velocity of loss convergence by $\sim 100\%$ in the case of a paradigm shift in Markov chains.
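The abstract names the sampling temperature as the parameter that controls how far the explorative phase strays from the learned statistics, but it does not spell out the algorithm. The sketch below is therefore not the authors' implementation; it only illustrates generic temperature-scaled sampling over a toy next-state model. The 3-state logit table and all function names are assumptions made for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(next_logits, start, length, temperature, rng):
    """Generate a sequence by repeatedly sampling the next state."""
    seq = [start]
    for _ in range(length - 1):
        probs = softmax_with_temperature(next_logits[seq[-1]], temperature)
        seq.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return seq


def entropy(probs):
    """Shannon entropy in nats, used here to measure how spread out probs are."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical 3-state model: row i holds the logits for the state after i.
TOY_LOGITS = [
    [2.0, 0.5, -1.0],
    [-1.0, 2.0, 0.5],
    [0.5, -1.0, 2.0],
]

if __name__ == "__main__":
    rng = random.Random(0)
    for temperature in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(TOY_LOGITS[0], temperature)
        dreamed = sample_sequence(TOY_LOGITS, 0, 20, temperature, rng)
        print(temperature, round(entropy(probs), 3), dreamed)
```

At low temperature the sampled sequences stay close to the most likely transitions; at high temperature they visit unlikely transitions more often, which is the sense in which temperature bounds the "distance" of the explored data.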
