AI DS CD HIST-PH, Apr 16, 2020

Random thoughts about Complexity, Data and Models

arXiv:2005.04729v1, 1 citation
Originality: Synthesis-oriented
AI Analysis

This is an incremental philosophical discussion for researchers in data science and machine learning, focusing on theoretical foundations rather than practical applications.

The paper argues that forecasting cannot be reduced to brute-force data analytics and emphasizes the importance of selecting relevant variables. It clarifies the relationship between algorithmic complexity, compressibility, determinism, and predictability, using chaotic systems as an example where knowing the rules does not guarantee prediction.

Data science and machine learning have grown strongly over the past decade. We argue that to make the most of this exciting field we should resist the temptation of assuming that forecasting can be reduced to brute-force data analytics. This is because modelling, as we illustrate below, requires mastering the art of selecting relevant variables. More specifically, we investigate the subtle relation between "data and models" by focussing on the role played by algorithmic complexity, which helped make mathematically rigorous the long-standing idea that to understand empirical phenomena is to describe the rules which generate the data in terms which are "simpler" than the data itself. A key issue in appraising the relation between algorithmic complexity and algorithmic learning is a much-needed clarification of the related but distinct concepts of compressibility, determinism and predictability. To this end we will illustrate that the evolution law of a chaotic system is compressible, but a generic initial condition for it is not, making the time series generated by chaotic systems incompressible in general. Hence knowledge of the rules which govern an empirical phenomenon is not sufficient for predicting its outcomes. In turn this implies that there is more to understanding phenomena than learning -- even from data alone -- such rules. This can be achieved only in those cases when we are capable of "good modelling". Clearly, the very idea of algorithmic complexity rests on Turing's seminal analysis of computation. This motivates our remarks on this extremely telling example of analogy-based abstract modelling which is nonetheless heavily informed by empirical facts.
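The claim that a chaotic system can have a compressible evolution law but an incompressible time series can be illustrated with a small sketch. The choice of system is an assumption made here for illustration, not something taken from the abstract: the doubling map x -> 2x mod 1, iterated exactly on initial conditions of the form k / 2^n. Recording whether x falls in the upper half of the unit interval at each step reads off the binary digits of the initial condition, so the output is as compressible as the initial condition and no more. Two further assumptions: `zlib` is used as a crude, computable stand-in for algorithmic complexity (which is uncomputable), and pseudo-random bits stand in for a "generic" initial condition. All function names are hypothetical.

```python
import random
import zlib


def doubling_map_symbols(numerator, n_bits):
    """Iterate x -> 2x mod 1 exactly, starting from x0 = numerator / 2**n_bits.

    At each step, record 1 if x >= 1/2 and 0 otherwise. Integer arithmetic
    keeps the iteration exact, avoiding floating-point round-off.
    """
    modulus = 1 << n_bits
    half = modulus >> 1
    x = numerator % modulus
    symbols = []
    for _ in range(n_bits):
        symbols.append(1 if x >= half else 0)
        x = (2 * x) % modulus  # the entire "evolution law" is this one line
    return symbols


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, most significant bit first."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def compression_ratio(bits):
    """Compressed size divided by raw size; a rough proxy for compressibility."""
    raw = pack_bits(bits)
    return len(zlib.compress(raw, 9)) / len(raw)


N_BITS = 8192

# A "simple" initial condition: binary expansion 0.010101..., close to 1/3.
simple_x0 = int("01" * (N_BITS // 2), 2)

# A stand-in for a "generic" initial condition: pseudo-random binary digits.
generic_x0 = random.Random(0).getrandbits(N_BITS)

ratio_simple = compression_ratio(doubling_map_symbols(simple_x0, N_BITS))
ratio_generic = compression_ratio(doubling_map_symbols(generic_x0, N_BITS))

print("simple initial condition, compression ratio:", round(ratio_simple, 3))
print("generic initial condition, compression ratio:", round(ratio_generic, 3))
```

The same one-line rule generates both series; only the initial condition differs. The series from the simple initial condition compresses to a small fraction of its size, while the one from the pseudo-random initial condition does not compress under `zlib`. Strictly speaking, a seeded pseudo-random number is itself algorithmically simple, so the second case only mimics a truly generic initial condition.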
