James P. Crutchfield

STAT-MECH
h-index: 5
Papers: 28
Citations: 510
Novelty: 47%
AI Score: 41

28 Papers

STAT-MECH | Nov 23, 2023
On Principles of Emergent Organization

Adam T. Rupe, James P. Crutchfield

After more than a century of concerted effort, physics still lacks basic principles of spontaneous self-organization. To appreciate why, we first state the problem, outline historical approaches, and survey the present state of the physics of self-organization. This frames the particular challenges arising from mathematical intractability and the resulting need for computational approaches, as well as those arising from a chronic failure to define structure. Then, an overview of two modern mathematical formulations of organization -- intrinsic computation and evolution operators -- lays out a way to overcome these challenges. Together, the vantage point they afford shows how to account for the emergence of structured states via a statistical mechanics of systems arbitrarily far from equilibrium. The result is a constructive path forward to principles of organization that builds on mathematical identification of structure.

LG | Mar 25, 2023
Complexity-calibrated Benchmarks for Machine Learning Reveal When Next-Generation Reservoir Computer Predictions Succeed and Mislead

Sarah E. Marzen, Paul M. Riechers, James P. Crutchfield

Recurrent neural networks are used to forecast time series in finance, climate, language, and many other domains. Reservoir computers are a particularly easily trainable form of recurrent neural network. Recently, a "next-generation" reservoir computer was introduced in which the memory trace involves only a finite number of previous symbols. We explore the inherent limitations of finite-past memory traces in this intriguing proposal. A lower bound from Fano's inequality shows that, on highly non-Markovian processes generated by large probabilistic state machines, next-generation reservoir computers with reasonably long memory traces have an error probability that is at least ~60% higher than the minimal attainable error probability in predicting the next observation. More generally, it appears that popular recurrent neural networks fall far short of optimally predicting such complex processes. These results highlight the need for a new generation of optimized recurrent neural network architectures. Alongside this finding, we present concentration-of-measure results for randomly-generated but complex processes. One conclusion is that large probabilistic state machines -- specifically, large $ε$-machines -- are key to generating challenging and structurally-unbiased stimuli for ground-truthing recurrent neural network architectures.
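The Fano bound mentioned above can be inverted numerically: given the conditional entropy H(X|Y) of the next symbol given whatever the predictor sees, it yields the smallest error probability any predictor could achieve. The sketch below is a minimal illustration, assuming an alphabet of at least two symbols; the function names are hypothetical and this is not the paper's code. Feeding it a finite-memory conditional entropy versus the process's entropy rate gives the kind of comparison the abstract describes.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h_b(p) in bits, with h_b(0) = h_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def fano_lower_bound(cond_entropy: float, alphabet_size: int, tol: float = 1e-12) -> float:
    """Smallest error probability P_e consistent with Fano's inequality
    H(X|Y) <= h_b(P_e) + P_e * log2(|X| - 1), found by bisection.
    Assumes alphabet_size >= 2."""
    k = alphabet_size

    def fano_rhs(p: float) -> float:
        return binary_entropy(p) + p * math.log2(k - 1)

    # fano_rhs increases on [0, (k-1)/k], where it reaches its maximum log2(k).
    lo, hi = 0.0, (k - 1) / k
    if cond_entropy <= 0.0:
        return 0.0
    if cond_entropy >= math.log2(k):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fano_rhs(mid) < cond_entropy:
            lo = mid
        else:
            hi = mid
    return hi
```

For a binary alphabet the bound is simply the inverse of the binary entropy function, so `fano_lower_bound(binary_entropy(0.2), 2)` returns approximately 0.2.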

STAT-MECH | Jun 9, 2022
Exploring Predictive States via Cantor Embeddings and Wasserstein Distance

Samuel P. Loomis, James P. Crutchfield

Predictive states for stochastic processes are a nonparametric and interpretable construct with relevance across a multitude of modeling paradigms. Recent progress on the self-supervised reconstruction of predictive states from time-series data focused on the use of reproducing kernel Hilbert spaces. Here, we examine how Wasserstein distances may be used to detect predictive equivalences in symbolic data. We compute Wasserstein distances between distributions over sequences ("predictions"), using a finite-dimensional embedding of sequences based on the Cantor set for the underlying geometry. We show that exploratory data analysis using the resulting geometry via hierarchical clustering and dimension reduction provides insight into the temporal structure of processes ranging from the relatively simple (e.g., finite-state hidden Markov models) to the very complex (e.g., infinite-state indexed grammars).
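To make the geometry concrete, here is a minimal sketch, assuming binary sequences, a middle-thirds Cantor map x = Σ_k 2·s_k·3^(-k) as the embedding (the paper's exact embedding may differ), and the 1-Wasserstein distance computed as the integral of the absolute difference of the two CDFs on the real line. The names `cantor_embed`, `wasserstein_1d`, and `prediction_to_measure` are hypothetical. Because words sharing a prefix land close together, two predictions that agree on the first few symbols end up closer than ones that disagree early.

```python
from typing import Dict, Sequence, Tuple


def cantor_embed(word: Sequence[int]) -> float:
    """Map a binary word s_1 s_2 ... s_L to sum_k 2 * s_k * 3**(-k),
    a point of the middle-thirds Cantor set in [0, 1]."""
    return sum(2 * s * 3.0 ** -(k + 1) for k, s in enumerate(word))


def prediction_to_measure(pred: Dict[Tuple[int, ...], float]) -> Dict[float, float]:
    """Push a distribution over future words forward to a discrete measure on [0, 1]."""
    measure: Dict[float, float] = {}
    for word, prob in pred.items():
        x = cantor_embed(word)
        measure[x] = measure.get(x, 0.0) + prob
    return measure


def wasserstein_1d(p: Dict[float, float], q: Dict[float, float]) -> float:
    """W1 between two discrete measures on the line: integral of |F_p - F_q| dx."""
    points = sorted(set(p) | set(q))
    total, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for left, right in zip(points, points[1:]):
        # Both CDFs are constant on [left, right).
        cdf_p += p.get(left, 0.0)
        cdf_q += q.get(left, 0.0)
        total += abs(cdf_p - cdf_q) * (right - left)
    return total


# Two toy "predictions" over length-2 futures that disagree on the first symbol.
pred_a = {(0, 0): 0.5, (0, 1): 0.5}
pred_b = {(1, 0): 0.5, (1, 1): 0.5}
dist_ab = wasserstein_1d(prediction_to_measure(pred_a), prediction_to_measure(pred_b))
```

Here every unit of mass moves by 2/3, so `dist_ab` is 2/3 up to rounding; a matrix of such distances is what hierarchical clustering or dimension reduction would then consume.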

COMP-PH | Apr 25, 2023
Unsupervised Discovery of Extreme Weather Events Using Universal Representations of Emergent Organization

Adam Rupe, Karthik Kashinath, Nalini Kumar et al.

Spontaneous self-organization is ubiquitous in systems far from thermodynamic equilibrium. While organized structures that emerge dominate transport properties, universal representations that identify and describe these key objects remain elusive. Here, we introduce a theoretically-grounded framework for describing emergent organization that, via data-driven algorithms, is constructive in practice. Its building blocks are spacetime lightcones that embody how information propagates across a system through local interactions. We show that predictive equivalence classes of lightcones -- local causal states -- capture organized behaviors and coherent structures in complex spatiotemporal systems. Employing an unsupervised physics-informed machine learning algorithm and a high-performance computing implementation, we demonstrate automatically discovering coherent structures in two real world domain science problems. We show that local causal states identify vortices and track their power-law decay behavior in two-dimensional fluid turbulence. We then show how to detect and track familiar extreme weather events -- hurricanes and atmospheric rivers -- and discover other novel coherent structures associated with precipitation extremes in high-resolution climate data at the grid-cell level.

COMP-PH | Sep 25, 2019 | Code
DisCo: Physics-Based Unsupervised Discovery of Coherent Structures in Spatiotemporal Systems

Adam Rupe, Nalini Kumar, Vladislav Epifanov et al.

Extracting actionable insight from complex unlabeled scientific data is an open challenge and key to unlocking data-driven discovery in science. Complementary and alternative to supervised machine learning approaches, unsupervised physics-based methods based on behavior-driven theories hold great promise. Due to computational limitations, practical application on real-world domain science problems has lagged far behind theoretical development. We present our first step towards bridging this divide: DisCo, a high-performance distributed workflow for the behavior-driven local causal state theory. DisCo provides a scalable unsupervised physics-based representation learning method that decomposes spatiotemporal systems into their structurally relevant components, which are captured by the latent local causal state variables. Complex spatiotemporal systems are generally highly structured and organize around a lower-dimensional skeleton of coherent structures, and in several firsts we demonstrate the efficacy of DisCo in capturing such structures from observational and simulated scientific data. To the best of our knowledge, DisCo is also the first application software developed entirely in Python to scale to over 1000 machine nodes, providing good performance along with ensuring domain scientists' productivity. We developed scalable, performant methods optimized for Intel many-core processors that will be upstreamed to open-source Python library packages. Our capstone experiment, using the newly developed DisCo workflow and libraries, performs unsupervised spacetime segmentation analysis of CAM5.1 climate simulation data, processing an unprecedented 89.5 TB in 6.6 minutes end-to-end using 1024 Intel Haswell nodes on the Cori supercomputer and obtaining 91% weak-scaling and 64% strong-scaling efficiency.

STAT-MECH | Apr 26, 2025
Learning Stochastic Thermodynamics Directly from Correlation and Trajectory-Fluctuation Currents

Jinghao Lyu, Kyle J. Ray, James P. Crutchfield

Markedly increased computational power and data acquisition have led to growing interest in data-driven inverse dynamics problems. These seek to answer a fundamental question: What can we learn from time series measurements of a complex dynamical system? For small systems interacting with external environments, the effective dynamics are inherently stochastic, making it crucial to properly manage noise in data. Here, we explore this for systems obeying Langevin dynamics and, using currents, we construct a learning framework for stochastic modeling. Currents have recently gained increased attention for their role in bounding entropy production (EP) from thermodynamic uncertainty relations (TURs). We introduce a fundamental relationship between the cumulant currents there and standard machine-learning loss functions. Using this, we derive loss functions for several key thermodynamic functions directly from the system dynamics without the (common) intermediate step of deriving a TUR. These loss functions reproduce results derived both from TURs and other methods. More significantly, they open a path to discover new loss functions for previously inaccessible quantities. Notably, this includes access to per-trajectory entropy production, even if the observed system is driven far from its steady-state. We also consider higher order estimation. Our method is straightforward and unifies dynamic inference with recent approaches to entropy production estimation. Taken altogether, this reveals a deep connection between diffusion models in machine learning and entropy production estimation in stochastic thermodynamics.

STAT-MECH | Oct 4, 2025
Optimal Computation from Fluctuation Responses

Jinghao Lyu, Kyle J. Ray, James P. Crutchfield

The energy cost of computation has emerged as a central challenge at the intersection of physics and computer science. Recent advances in statistical physics -- particularly in stochastic thermodynamics -- enable precise characterizations of work, heat, and entropy production in information-processing systems driven far from equilibrium by time-dependent control protocols. A key open question is then how to design protocols that minimize thermodynamic cost while ensuring correct outcomes. To this end, we develop a unified framework to identify optimal protocols using fluctuation response relations (FRR) and machine learning. Unlike previous approaches that optimize either distributions or protocols separately, our method unifies both using FRR-derived gradients. Moreover, our method is based primarily on iteratively learning from sampled noisy trajectories, which is generally much easier than solving for the optimal protocol directly from a set of governing equations. We apply the framework to canonical examples -- bit erasure in a double-well potential and translating harmonic traps -- demonstrating how to construct loss functions that trade off energy cost against task error. The framework extends trivially to underdamped systems, and we show this by optimizing a bit-flip in an underdamped system. In all computations we test, the framework achieves the theoretically optimal protocol or achieves work costs comparable to relevant finite time bounds. In short, the results provide principled strategies for designing thermodynamically efficient protocols in physical information-processing systems. Applications range from quantum gates robust under noise to energy-efficient control of chemical and synthetic biological networks.

STAT-MECH | Jul 10, 2025
Way More Than the Sum of Their Parts: From Statistical to Structural Mixtures

James P. Crutchfield

We show that mixtures composed of multicomponent systems typically are much more structurally complex than the sum of their parts; sometimes, infinitely more complex. We contrast this with the more familiar notion of statistical mixtures, demonstrating how statistical mixtures miss key aspects of emergent hierarchical organization. This leads us to identify a new kind of structural complexity inherent in multicomponent systems and to draw out broad consequences for system ergodicity.

STAT-MECH | Sep 19, 2021
Topology, Convergence, and Reconstruction of Predictive States

Samuel P. Loomis, James P. Crutchfield

Predictive equivalence in discrete stochastic processes has been applied with great success to identify randomness and structure in statistical physics and chaotic dynamical systems and to infer hidden Markov models. We examine the conditions under which predictive states can be reliably reconstructed from time-series data, showing that convergence of predictive states can be achieved from empirical samples in the weak topology of measures. Moreover, predictive states may be represented in Hilbert spaces that replicate the weak topology. We mathematically explain how these representations are particularly beneficial when reconstructing high-memory processes and connect them to reproducing kernel Hilbert spaces.

LG | Nov 23, 2020
Discovering Causal Structure with Reproducing-Kernel Hilbert Space $ε$-Machines

Nicolas Brodu, James P. Crutchfield

We merge computational mechanics' definition of causal states (predictively-equivalent histories) with reproducing-kernel Hilbert space (RKHS) representation inference. The result is a widely-applicable method that infers causal structure directly from observations of a system's behaviors whether they are over discrete or continuous events or time. A structural representation -- a finite- or infinite-state kernel $ε$-machine -- is extracted by a reduced-dimension transform that gives an efficient representation of causal states and their topology. In this way, the system dynamics are represented by a stochastic (ordinary or partial) differential equation that acts on causal states. We introduce an algorithm to estimate the associated evolution operator. Paralleling the Fokker-Planck equation, it efficiently evolves causal-state distributions and makes predictions in the original data space via an RKHS functional mapping. We demonstrate these techniques, together with their predictive abilities, on discrete-time, discrete-value infinite Markov-order processes generated by finite-state hidden Markov models with (i) finite or (ii) uncountably-infinite causal states and (iii) continuous-time, continuous-value processes generated by thermally-driven chaotic flows. The method robustly estimates causal structure in the presence of varying external and measurement noise levels and for very high dimensional data.

LG | Oct 12, 2020
Spacetime Autoencoders Using Local Causal States

Adam Rupe, James P. Crutchfield

Local causal states are latent representations that capture organized pattern and structure in complex spatiotemporal systems. We expand their functionality, framing them as spacetime autoencoders. Previously, they were only considered as maps from observable spacetime fields to latent local causal state fields. Here, we show that there is a stochastic decoding that maps back from the latent fields to observable fields. Furthermore, their Markovian properties define a stochastic dynamic in the latent space. Combined with stochastic decoding, this gives a new method for forecasting spacetime fields.

CD | Aug 29, 2020
Shannon Entropy Rate of Hidden Markov Processes

Alexandra M. Jurgens, James P. Crutchfield

Hidden Markov chains are widely applied statistical models of stochastic processes, from fundamental physics and chemistry to finance, health, and artificial intelligence. The hidden Markov processes they generate are notoriously complicated, however, even if the chain is finite state: no finite expression for their Shannon entropy rate exists, as the set of their predictive features is generically infinite. As such, to date one cannot make general statements about how random they are nor how structured. Here, we address the first part of this challenge by showing how to efficiently and accurately calculate their entropy rates. We also show how this method gives the minimal set of infinite predictive features. A sequel addresses the challenge's second part on structure.
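Although no closed form exists in general, the entropy rate can at least be estimated by simulation: an observer tracks its belief over hidden states (the mixed state) and averages -log2 Pr(next symbol | past). The sketch below is a plain Monte Carlo estimator that relies on the Shannon-McMillan-Breiman theorem for stationary ergodic processes; it is an assumed illustration, not the efficient method the abstract refers to. The function name and the labeled-matrix convention T^(x)[i][j] = Pr(emit x, go to j | in i) are hypothetical choices, and the usage example uses the Golden Mean process, whose entropy rate is 2/3 bit per symbol.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]


def estimate_entropy_rate(labeled: Dict[int, Matrix], pi: List[float],
                          n_steps: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the entropy rate (bits/symbol) of a hidden
    Markov chain with labeled transition matrices T^(x) and stationary pi."""
    rng = random.Random(seed)
    n = len(pi)
    symbols = sorted(labeled)
    state = rng.choices(range(n), weights=pi)[0]  # true (hidden) state
    belief = list(pi)                             # observer's mixed state
    options = [(x, j) for x in symbols for j in range(n)]
    total = 0.0
    for _ in range(n_steps):
        # Generate the next symbol and hidden state from the true state.
        weights = [labeled[x][state][j] for x, j in options]
        x, state = rng.choices(options, weights=weights)[0]
        # Observer update: unnormalized new belief = belief @ T^(x).
        new = [sum(belief[i] * labeled[x][i][j] for i in range(n)) for j in range(n)]
        prob_x = sum(new)  # Pr(x | observed past)
        total -= math.log2(prob_x)
        belief = [v / prob_x for v in new]
    return total / n_steps


# Golden Mean process (no two consecutive 0s); hidden states A=0, B=1.
T0 = [[0.0, 0.5], [0.0, 0.0]]  # A --0--> B with probability 1/2
T1 = [[0.5, 0.0], [1.0, 0.0]]  # A --1--> A with prob. 1/2; B --1--> A with prob. 1
h_est = estimate_entropy_rate({0: T0, 1: T1}, pi=[2 / 3, 1 / 3], n_steps=100_000, seed=1)
```

The same loop runs unchanged on nonunifilar chains, where the belief never collapses to a single state and the set of visited mixed states can be infinite; that is exactly the situation in which the abstract notes no finite expression exists.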

COMP-PH | Sep 16, 2019
Towards Unsupervised Segmentation of Extreme Weather Events

Adam Rupe, Karthik Kashinath, Nalini Kumar et al.

Extreme weather is one of the main mechanisms through which climate change will directly impact human society. Coping with such change as a global community requires markedly improved understanding of how global warming drives extreme weather events. While alternative climate scenarios can be simulated using sophisticated models, identifying extreme weather events in these simulations requires automation due to the vast amounts of complex high-dimensional data produced. Atmospheric dynamics, and hydrodynamic flows more generally, are highly structured and largely organize around a lower dimensional skeleton of coherent structures. Indeed, extreme weather events are a special case of more general hydrodynamic coherent structures. We present a scalable physics-based representation learning method that decomposes spatiotemporal systems into their structurally relevant components, which are captured by latent variables known as local causal states. For complex fluid flows we show our method is capable of capturing known coherent structures, and with promising segmentation results on CAM5.1 water vapor data we outline the path to extreme weather identification from unlabeled climate model simulation data.

IT | Aug 26, 2018
A Perspective on Unique Information: Directionality, Intuitions, and Secret Key Agreement

Ryan G. James, Jeffrey Emenheiser, James P. Crutchfield

Recently, the partial information decomposition emerged as a promising framework for identifying the meaningful components of the information contained in a joint distribution. Its adoption and practical application, however, have been stymied by the lack of a generally-accepted method of quantifying its components. Here, we briefly discuss the bivariate (two-source) partial information decomposition and two implicitly directional interpretations used to intuitively motivate alternative component definitions. Drawing parallels with secret key agreement rates from information-theoretic cryptography, we demonstrate that these intuitions are mutually incompatible and suggest that this underlies the persistence of competing definitions and interpretations. Having highlighted this hitherto unacknowledged issue, we outline several possible solutions.

STAT-MECH | Aug 21, 2018
Modes of Information Flow

Ryan G. James, Blanca Daniella Mansante Ayala, Bahti Zakirov et al.

Information flow between components of a system takes many forms and is key to understanding the organization and functioning of large-scale, complex systems. We demonstrate three modalities of information flow from time series X to time series Y. Intrinsic information flow exists when the past of X is individually predictive of the present of Y, independent of Y's past; this is what is most commonly considered information flow. Shared information flow exists when X's past is predictive of Y's present in the same manner as Y's past; this occurs due to synchronization or common driving, for example. Finally, synergistic information flow occurs when neither X's nor Y's pasts are predictive of Y's present on their own, but taken together they are. The two most broadly-employed information-theoretic methods of quantifying information flow---time-delayed mutual information and transfer entropy---are both sensitive to a pair of these modalities: time-delayed mutual information to both intrinsic and shared flow, and transfer entropy to both intrinsic and synergistic flow. To quantify each mode individually we introduce our cryptographic flow ansatz, positing that intrinsic flow is synonymous with secret key agreement between X and Y. Based on this, we employ an easily-computed secret-key-agreement bound, the intrinsic mutual information, to quantify the three flow modalities in a variety of systems including asymmetric flows and financial markets.
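A purely synergistic case makes the sensitivity claim above concrete. In the toy process assumed below (not taken from the paper), X's previous value and Y's previous value are independent fair bits and Y's present is their XOR. Neither past alone predicts Y's present, so the time-delayed mutual information I[X_prev; Y_now] is 0 bits, while the transfer entropy I[X_prev; Y_now | Y_prev], with a history length of one, is 1 bit. Both are computed exactly from the joint distribution; the helper names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, Tuple


def entropy(dist: Dict[Tuple, float]) -> float:
    """Shannon entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint: Dict[Tuple, float], idx: Tuple[int, ...]) -> Dict[Tuple, float]:
    """Marginalize a joint distribution onto the coordinates listed in idx."""
    out: Dict[Tuple, float] = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)


# Outcomes are (x_prev, y_prev, y_now) with y_now = x_prev XOR y_prev.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}
X_PREV, Y_PREV, Y_NOW = 0, 1, 2


def H(*idx: int) -> float:
    return entropy(marginal(joint, idx))


# Time-delayed mutual information I[X_prev; Y_now]
tdmi = H(X_PREV) + H(Y_NOW) - H(X_PREV, Y_NOW)
# Transfer entropy I[X_prev; Y_now | Y_prev]
transfer_entropy = (H(X_PREV, Y_PREV) + H(Y_NOW, Y_PREV)
                    - H(X_PREV, Y_PREV, Y_NOW) - H(Y_PREV))
```

Transfer entropy reports a full bit of "flow" here even though none of it is intrinsic, which illustrates why the abstract argues the two standard measures cannot separate the three modes.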

STAT-MECH | Jan 1, 2018
Local Causal States and Discrete Coherent Structures

Adam Rupe, James P. Crutchfield

Coherent structures form spontaneously in nonlinear spatiotemporal systems and are found at all spatial scales in natural phenomena from laboratory hydrodynamic flows and chemical reactions to ocean, atmosphere, and planetary climate dynamics. Phenomenologically, they appear as key components that organize the macroscopic behaviors in such systems. Despite a century of effort, they have eluded rigorous analysis and empirical prediction, with progress being made only recently. As a step in this, we present a formal theory of coherent structures in fully-discrete dynamical field theories. It builds on the notion of structure introduced by computational mechanics, generalizing it to a local spatiotemporal setting. The analysis' main tool employs the local causal states, which are used to uncover a system's hidden spatiotemporal symmetries and which identify coherent structures as spatially-localized deviations from those symmetries. The approach is behavior-driven in the sense that it does not rely on directly analyzing spatiotemporal equations of motion; rather, it considers only the spatiotemporal fields a system generates. As such, it offers an unsupervised approach to discover and describe coherent structures. We illustrate the approach by analyzing coherent structures generated by elementary cellular automata, comparing the results with an earlier, dynamic-invariant-set approach that decomposes fields into domains, particles, and particle interactions.

STAT-MECH | Oct 18, 2017
The Origins of Computational Mechanics: A Brief Intellectual History and Several Clarifications

James P. Crutchfield

The principal goal of computational mechanics is to define pattern and structure so that the organization of complex systems can be detected and quantified. Computational mechanics developed from efforts in the 1970s and early 1980s to identify strange attractors as the mechanism driving weak fluid turbulence via the method of reconstructing attractor geometry from measurement time series and in the mid-1980s to estimate equations of motion directly from complex time series. In providing a mathematical and operational definition of structure it addressed weaknesses of these early approaches to discovering patterns in natural systems. Since then, computational mechanics has led to a range of results from theoretical physics and nonlinear mathematics to diverse applications---from closed-form analysis of Markov and non-Markov stochastic processes that are ergodic or nonergodic and their measures of information and intrinsic computation to complex materials and deterministic chaos and intelligence in Maxwellian demons to quantum compression of classical processes and the evolution of computation and language. This brief review clarifies several misunderstandings and addresses concerns recently raised regarding early works in the field (1980s). We show that misguided evaluations of the contributions of computational mechanics are groundless and stem from a lack of familiarity with its basic goals and from a failure to consider its historical context. For all practical purposes, its modern methods and results largely supersede the early works. This not only renders recent criticism moot and shows the solid ground on which computational mechanics stands but, most importantly, shows the significant progress achieved over three decades and points to the many intriguing and outstanding challenges in understanding the computational nature of complex dynamic systems.

STAT-MECH | Sep 19, 2017
Unique Information via Dependency Constraints

Ryan G. James, Jeffrey Emenheiser, James P. Crutchfield

The partial information decomposition (PID) is perhaps the leading proposal for resolving information shared between a set of sources and a target into redundant, synergistic, and unique constituents. Unfortunately, the PID framework has been hindered by a lack of a generally agreed-upon, multivariate method of quantifying the constituents. Here, we take a step toward rectifying this by developing a decomposition based on a new method that quantifies unique information. We first develop a broadly applicable method---the dependency decomposition---that delineates how statistical dependencies influence the structure of a joint distribution. The dependency decomposition then allows us to define a measure of the information about a target that can be uniquely attributed to a particular source as the least amount by which the source-target statistical dependency can influence the information shared between the sources and the target. The result is the first measure that satisfies the core axioms of the PID framework while not satisfying the Blackwell relation, which depends on a particular interpretation of how the variables are related. This makes a key step forward to a practical PID.

STAT-MECH | Feb 27, 2017
Nearly Maximally Predictive Features and Their Dimensions

Sarah E. Marzen, James P. Crutchfield

Scientific explanation often requires inferring maximally predictive features from a given data set. Unfortunately, the collection of minimal maximally predictive features for most stochastic processes is uncountably infinite. In such cases, one compromises and instead seeks nearly maximally predictive features. Here, we derive upper bounds on the rates at which the number and the coding cost of nearly maximally predictive features scale with desired predictive power. The rates are determined by the fractal dimensions of a process' mixed-state distribution. These results, in turn, show how widely-used finite-order Markov models can fail as predictors and that mixed-state predictive features offer a substantial improvement.

STAT-MECH | Feb 7, 2017
Trimming the Independent Fat: Sufficient Statistics, Mutual Information, and Predictability from Effective Channel States

Ryan G. James, John R. Mahoney, James P. Crutchfield

One of the most fundamental questions one can ask about a pair of random variables X and Y is the value of their mutual information. Unfortunately, this task is often stymied by the extremely large dimension of the variables. We might hope to replace each variable by a lower-dimensional representation that preserves the relationship with the other variable. The theoretically ideal implementation is the use of minimal sufficient statistics, where it is well-known that either X or Y can be replaced by their minimal sufficient statistic about the other while preserving the mutual information. While intuitively reasonable, it is not obvious or straightforward that both variables can be replaced simultaneously. We demonstrate that this is in fact possible: the information X's minimal sufficient statistic preserves about Y is exactly the information that Y's minimal sufficient statistic preserves about X. As an important corollary, we consider the case where one variable is a stochastic process' past and the other its future and the present is viewed as a memoryful channel. In this case, the mutual information is the channel transmission rate between the channel's effective states. That is, the past-future mutual information (the excess entropy) is the amount of information about the future that can be predicted using the past. Translating our result about minimal sufficient statistics, this is equivalent to the mutual information between the forward- and reverse-time causal states of computational mechanics. We close by discussing multivariate extensions to this use of minimal sufficient statistics.
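For finite alphabets the minimal sufficient statistic of X about Y can be built by merging values of x that share the same conditional distribution Pr(Y | X = x), and symmetrically for Y. The sketch below, with hypothetical helper names and a made-up 3x3 count table, uses exact fractions for the merging step and then checks numerically that coarse-graining one side, or both sides at once, leaves I[X;Y] unchanged, as the abstract states. In the table, rows x0 and x1 are proportional and so are columns y0 and y2, so each side collapses from three values to two.

```python
import math
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Tuple

Joint = Dict[Tuple[int, int], Fraction]


def mutual_information(joint: Joint) -> float:
    """I[X;Y] in bits for a joint distribution {(x, y): prob}."""
    px: Dict[int, Fraction] = defaultdict(Fraction)
    py: Dict[int, Fraction] = defaultdict(Fraction)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(float(p) * math.log2(float(p / (px[x] * py[y])))
               for (x, y), p in joint.items() if p > 0)


def minimal_sufficient_statistic(joint: Joint, axis: int) -> Dict[int, int]:
    """Label each value on `axis` by its conditional distribution over the
    other variable; values with identical conditionals share a label."""
    other = 1 - axis
    others = sorted({key[other] for key in joint})
    rows: Dict[int, Dict[int, Fraction]] = defaultdict(dict)
    for key, p in joint.items():
        rows[key[axis]][key[other]] = p
    labels: Dict[Tuple[Fraction, ...], int] = {}
    stat: Dict[int, int] = {}
    for v in sorted(rows):
        total = sum(rows[v].values())
        cond = tuple(rows[v].get(o, Fraction(0)) / total for o in others)
        stat[v] = labels.setdefault(cond, len(labels))
    return stat


def coarse_grain(joint: Joint, fx: Dict[int, int], fy: Dict[int, int]) -> Joint:
    out: Dict[Tuple[int, int], Fraction] = defaultdict(Fraction)
    for (x, y), p in joint.items():
        out[(fx[x], fy[y])] += p
    return dict(out)


# Hypothetical count table: rows are x values, columns are y values.
counts = [[4, 2, 2], [2, 1, 1], [2, 6, 1]]
total = sum(map(sum, counts))
joint = {(x, y): Fraction(c, total) for x, row in enumerate(counts) for y, c in enumerate(row)}
f_x = minimal_sufficient_statistic(joint, axis=0)
f_y = minimal_sufficient_statistic(joint, axis=1)
identity_y = {y: y for y in range(3)}
```

Exact `Fraction` arithmetic is a deliberate choice: grouping by floating-point conditionals could split values that are mathematically identical.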

STAT-MECH | Sep 17, 2016
Leveraging Environmental Correlations: The Thermodynamics of Requisite Variety

Alexander B. Boyd, Dibyendu Mandal, James P. Crutchfield

Key to biological success, the requisite variety that confronts an adaptive organism is the set of detectable, accessible, and controllable states in its environment. We analyze its role in the thermodynamic functioning of information ratchets---a form of autonomous Maxwellian Demon capable of exploiting fluctuations in an external information reservoir to harvest useful work from a thermal bath. This establishes a quantitative paradigm for understanding how adaptive agents leverage structured thermal environments for their own thermodynamic benefit. General ratchets behave as memoryful communication channels, interacting with their environment sequentially and storing results to an output. The bulk of thermal ratchets analyzed to date, however, assume memoryless environments that generate input signals without temporal correlations. Employing computational mechanics and a new information-processing Second Law of Thermodynamics (IPSL) we remove these restrictions, analyzing general finite-state ratchets interacting with structured environments that generate correlated input signals. On the one hand, we demonstrate that a ratchet need not have memory to exploit an uncorrelated environment. On the other, and more appropriate to biological adaptation, we show that a ratchet must have memory to most effectively leverage structure and correlation in its environment. The lesson is that to optimally harvest work a ratchet's memory must reflect the input generator's memory. Finally, we investigate achieving the IPSL bounds on the amount of work a ratchet can extract from its environment, discovering that finite-state, optimal ratchets are unable to reach these bounds. In contrast, we show that infinite-state ratchets can go well beyond these bounds by utilizing their own infinite "negentropy". We conclude with an outline of the collective thermodynamics of information-ratchet swarms.

IT | Sep 5, 2016
Multivariate Dependence Beyond Shannon Information

Ryan G. James, James P. Crutchfield

Accurately determining dependency structure is critical to discovering a system's causal organization. We recently showed that the transfer entropy fails in a key aspect of this---measuring information flow---due to its conflation of dyadic and polyadic relationships. We extend this observation to demonstrate that this is true of all such Shannon information measures when used to analyze multivariate dependencies. This has broad implications, particularly when employing information to express the organization and mechanisms embedded in complex systems, including the burgeoning efforts to combine complex network theory with information theory. Here, we do not suggest that any aspect of information theory is wrong. Rather, the vast majority of its informational measures are simply inadequate for determining the meaningful dependency structure within joint probability distributions. Therefore, such information measures are inadequate for discovering intrinsic causal relations. We close by demonstrating that such distributions exist across an arbitrary set of variables.

STAT-MECH | Jul 2, 2015
The Elusive Present: Hidden Past and Future Dependency and Why We Build Models

Pooneh M. Ara, Ryan G. James, James P. Crutchfield

Modeling a temporal process as if it is Markovian assumes the present encodes all of the process's history. When this occurs, the present captures all of the dependency between past and future. We recently showed that if one randomly samples in the space of structured processes, this is almost never the case. So, how does the Markov failure come about? That is, how do individual measurements fail to encode the past? And, how many are needed to capture dependencies between the past and future? Here, we investigate how much information can be shared between the past and future, but not be reflected in the present. We quantify this elusive information, give explicit calculational methods, and draw out the consequences. The most important is that when the present hides past-future dependency we must move beyond sequence-based statistics and build state-based models.
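The elusive information is the past-future mutual information conditioned on the present, I[Past; Future | Present]. A one-symbol version, I[X_{-1}; X_1 | X_0], is easy to compute exactly from a hidden Markov model and, by the chain rule, lower-bounds the full quantity. The sketch below assumes a stationary hidden Markov model given by labeled matrices T^(x)[i][j] = Pr(emit x, go to j | in i); the helper names are hypothetical. For a first-order Markov chain over the observed symbols the result is zero, since the present screens off past from future. For the Even process, where 1s come in even-length blocks, it is positive (about 0.082 bits), because the present symbol alone does not reveal the parity of the current block.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def word_probability(word: Tuple[int, ...], labeled: Dict[int, Matrix], pi: List[float]) -> float:
    """Pr(word) = pi . T^(w1) ... T^(wL) . 1 for a stationary HMM."""
    row = list(pi)
    for x in word:
        T = labeled[x]
        row = [sum(row[i] * T[i][j] for i in range(len(row))) for j in range(len(T[0]))]
    return sum(row)


def entropy(dist: Dict[Tuple, float]) -> float:
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(dist: Dict[Tuple, float], idx: Tuple[int, ...]) -> Dict[Tuple, float]:
    out: Dict[Tuple, float] = defaultdict(float)
    for word, p in dist.items():
        out[tuple(word[i] for i in idx)] += p
    return dict(out)


def one_step_elusive_information(labeled: Dict[int, Matrix], pi: List[float]) -> float:
    """I[X_{-1}; X_1 | X_0] in bits, from the exact 3-block distribution."""
    blocks = {w: word_probability(w, labeled, pi)
              for w in product(sorted(labeled), repeat=3)}

    def H(*idx: int) -> float:
        return entropy(marginal(blocks, idx))

    return H(0, 1) + H(1, 2) - H(0, 1, 2) - H(1)


# Even process: states A=0, B=1; A --0--> A, A --1--> B (prob. 1/2 each); B --1--> A.
even = {0: [[0.5, 0.0], [0.0, 0.0]], 1: [[0.0, 0.5], [1.0, 0.0]]}
sigma_even = one_step_elusive_information(even, [2 / 3, 1 / 3])

# First-order Markov chain presented with states = last symbol.
markov = {0: [[0.9, 0.0], [0.4, 0.0]], 1: [[0.0, 0.1], [0.0, 0.6]]}
sigma_markov = one_step_elusive_information(markov, [0.8, 0.2])
```

Longer past and future blocks can only increase this conditional mutual information, so the one-symbol value is a conservative witness that a sequence-based (Markov) model is insufficient.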

NC | Apr 18, 2015
Time Resolution Dependence of Information Measures for Spiking Neurons: Atoms, Scaling, and Universality

Sarah E. Marzen, Michael R. DeWeese, James P. Crutchfield

The mutual information between stimulus and spike-train response is commonly used to monitor neural coding efficiency, but neuronal computation broadly conceived requires more refined and targeted information measures of input-output joint processes. A first step towards that larger goal is to develop information measures for individual output processes, including information generation (entropy rate), stored information (statistical complexity), predictable information (excess entropy), and active information accumulation (bound information rate). We calculate these for spike trains generated by a variety of noise-driven integrate-and-fire neurons as a function of time resolution and for alternating renewal processes. We show that their time-resolution dependence reveals coarse-grained structural properties of interspike interval statistics; e.g., $τ$-entropy rates that diverge less quickly than the firing rate indicate interspike interval correlations. We also find evidence that the excess entropy and regularized statistical complexity of different types of integrate-and-fire neurons are universal in the continuous-time limit in the sense that they do not depend on mechanism details. This suggests a surprising simplicity in the spike trains generated by these model neurons. Interestingly, neurons with gamma-distributed ISIs and neurons whose spike trains are alternating renewal processes do not fall into the same universality class. These results lead to two conclusions. First, the dependence of information measures on time resolution reveals mechanistic details about spike train generation. Second, information measures can be used as model selection tools for analyzing spike train processes.
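When a process is generated by a finite unifilar hidden Markov model that is its ε-machine, two of the measures listed above have simple closed forms. The entropy rate is h_μ = Σ_s π(s) H[X | S = s], and the statistical complexity is C_μ = H[π], where π is the stationary distribution over causal states. The sketch below assumes exactly that setting (unifilar, minimal, with an aperiodic state chain so power iteration converges); the function names are hypothetical. It uses the discrete-time Even process rather than a spike-train model.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def entropy(probs: List[float]) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)


def stationary_distribution(T: Matrix, iters: int = 1000) -> List[float]:
    """Power iteration pi <- pi T; assumes an irreducible, aperiodic chain."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi


def unifilar_measures(labeled: Dict[int, Matrix]) -> Tuple[float, float]:
    """Return (h_mu, C_mu) in bits for an epsilon-machine given as labeled
    matrices T^(x)[i][j] = Pr(emit x, go to j | in state i)."""
    n = len(next(iter(labeled.values())))
    T = [[sum(labeled[x][i][j] for x in labeled) for j in range(n)] for i in range(n)]
    pi = stationary_distribution(T)
    # Pr(x | state i) = sum_j T^(x)[i][j]; unifilarity makes this the full next-step uncertainty.
    h_mu = sum(pi[i] * entropy([sum(labeled[x][i]) for x in labeled]) for i in range(n))
    c_mu = entropy(pi)
    return h_mu, c_mu


# Even process epsilon-machine: states A=0, B=1.
even = {0: [[0.5, 0.0], [0.0, 0.0]], 1: [[0.0, 0.5], [1.0, 0.0]]}
h_mu, c_mu = unifilar_measures(even)
```

For the Even process this gives h_μ = 2/3 bit per symbol and C_μ = H(2/3, 1/3) ≈ 0.918 bits. For a nonminimal or nonunifilar presentation, H[π] would not equal C_μ, which is why the assumption matters.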

STAT-MECHApr 1, 2015
Signatures of Infinity: Nonergodicity and Resource Scaling in Prediction, Complexity, and Learning

James P. Crutchfield, Sarah Marzen

We introduce a simple analysis of the structural complexity of infinite-memory processes built from random samples of stationary, ergodic finite-memory component processes. Such processes are familiar from the well-known multi-armed bandit problem. We contrast our analysis with computation-theoretic and statistical inference approaches to understanding their complexity. The result is an alternative view of the relationship between predictability, complexity, and learning that highlights the distinct ways in which informational and correlational divergences arise in complex ergodic and nonergodic processes. We draw out consequences for the resource divergences that delineate the structural hierarchy of ergodic processes and for processes that are themselves hierarchical.
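A minimal illustration of an informational divergence in a nonergodic process is sketched below. It assumes the simplest construction of this kind, which is our choice: each realization draws a coin bias p uniformly from [0, 1] once, then flips that coin forever (a continuum of zero-memory "arms"). The block entropy H(L) has a closed form because every word with k ones has probability 1/((L+1)·C(L, k)). The entropy rate is the average component entropy rate, 1/(2 ln 2) bits per symbol. The finite-L excess entropy E(L) = H(L) - L·h then grows roughly like (1/2) log2 L without bound.

```python
import math

LN2 = math.log(2.0)


def block_entropy_uniform_coin_mixture(L):
    """H(L) in bits for a process that draws p ~ Uniform[0, 1] once, then flips
    a p-coin forever (hypothetical example process).

    A word with k ones has probability integral p^k (1-p)^(L-k) dp
    = 1 / ((L + 1) * C(L, k)); the C(L, k) such words share 1/(L + 1) of the mass.
    """
    total = 0.0
    for k in range(L + 1):
        log_binom = math.lgamma(L + 1) - math.lgamma(k + 1) - math.lgamma(L - k + 1)
        # -log2 P(word) = log2(L + 1) + log2 C(L, k)
        total += (math.log2(L + 1) + log_binom / LN2) / (L + 1)
    return total


# Entropy rate = average of component entropy rates:
# integral over [0, 1] of H_b(p) dp = 1 / (2 ln 2) bits per symbol.
ENTROPY_RATE = 1.0 / (2.0 * LN2)


def excess_entropy_estimate(L):
    """E(L) = H(L) - L * h_mu, which keeps growing for this nonergodic process."""
    return block_entropy_uniform_coin_mixture(L) - L * ENTROPY_RATE


for L in (10, 100, 1000):
    print(L, excess_entropy_estimate(L))
```

Each tenfold increase in L adds roughly (1/2) log2 10, about 1.66 bits, to E(L). The observer keeps learning the hidden bias, so no finite window captures all of the stored information.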

STAT-MECHDec 30, 2014
Understanding and Designing Complex Systems: Response to "A framework for optimal high-level descriptions in science and engineering---preliminary report"

James P. Crutchfield, Ryan G. James, Sarah Marzen et al.

We recount recent history behind building compact models of nonlinear, complex processes and identifying their relevant macroscopic patterns or "macrostates". We give a synopsis of computational mechanics, predictive rate-distortion theory, and the role of information measures in monitoring model complexity and predictive performance. Computational mechanics provides a method to extract the optimal minimal predictive model for a given process. Rate-distortion theory provides methods for systematically approximating such models. We end by commenting on future prospects for developing a general framework that automatically discovers optimal compact models. As a response to the manuscript cited in the title above, this brief commentary corrects potentially misleading claims about its state space compression method and places it in a broader historical setting.
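In computational mechanics, the causal states group together pasts that have the same conditional distribution over futures. The sketch below is a truncated approximation of that idea, not the reconstruction algorithm the commentary refers to. It assumes exact word probabilities from a labeled-matrix hidden Markov model and groups length-k histories whose conditional distributions over the next m symbols agree after rounding. The parameters k, m, and `decimals` are hypothetical choices.

```python
from collections import defaultdict
from itertools import product

import numpy as np


def stationary(T_total):
    """Stationary state distribution: left eigenvector of T_total for eigenvalue 1."""
    vals, vecs = np.linalg.eig(T_total.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


def word_prob(pi, T, word):
    """P(word) = pi T^(x1) ... T^(xL) 1 for labeled transition matrices T[x]."""
    v = pi
    for x in word:
        v = v @ T[x]
    return float(v.sum())


def approximate_causal_states(T, k, m, decimals=9):
    """Group allowed length-k histories by their conditional distribution over
    length-m futures (a finite-window approximation to causal states)."""
    pi = stationary(sum(T.values()))
    alphabet = sorted(T)
    futures = list(product(alphabet, repeat=m))
    groups = defaultdict(list)
    for past in product(alphabet, repeat=k):
        p_past = word_prob(pi, T, past)
        if p_past == 0.0:
            continue  # forbidden history
        signature = tuple(
            round(word_prob(pi, T, past + fut) / p_past, decimals) for fut in futures
        )
        groups[signature].append(past)
    return list(groups.values())


# Example encodings (ours): T[x][i, j] = P(emit x and move to state j | state i).
GOLDEN_MEAN = {0: np.array([[0.5, 0.0], [1.0, 0.0]]),
               1: np.array([[0.0, 0.5], [0.0, 0.0]])}
EVEN = {0: np.array([[0.5, 0.0], [0.0, 0.0]]),
        1: np.array([[0.0, 0.5], [1.0, 0.0]])}

print(approximate_causal_states(GOLDEN_MEAN, k=2, m=2))
```

On the Golden Mean Process with k = m = 2, the sketch recovers two groups: histories ending in 0, and the history 01. On the Even Process with k = m = 3, the all-ones history 111 falls into a third group, because a finite window cannot resolve which state it ends in.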

STAT-MECHDec 9, 2014
Circumventing the Curse of Dimensionality in Prediction: Causal Rate-Distortion for Infinite-Order Markov Processes

Sarah Marzen, James P. Crutchfield

Predictive rate-distortion analysis suffers from the curse of dimensionality: clustering arbitrarily long pasts to retain information about arbitrarily long futures requires resources that typically grow exponentially with length. The challenge is compounded for infinite-order Markov processes, since conditioning on finite sequences cannot capture all of their past dependencies. Spectral arguments show that algorithms which cluster finite-length sequences fail dramatically when the underlying process has long-range temporal correlations and can fail even for processes generated by finite-memory hidden Markov models. We circumvent the curse of dimensionality in rate-distortion analysis of infinite-order processes by casting predictive rate-distortion objective functions in terms of the forward- and reverse-time causal states of computational mechanics. Examples demonstrate that the resulting causal rate-distortion theory substantially improves current predictive rate-distortion analyses.
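Once the objective is cast in causal states, predictive rate-distortion becomes an information-bottleneck problem. The forward causal state S+ is compressed into a representation R that should retain information about the reverse causal state S-. The sketch below runs the standard self-consistent information-bottleneck iteration on a small joint table P(S+, S-). It assumes the table is already known and strictly positive. The numbers in the example are hypothetical, and the code is ours, not the paper's.

```python
import numpy as np


def mutual_info_bits(p_joint):
    """I[A;B] in bits from a joint probability table p_joint[a, b]."""
    pa = p_joint.sum(axis=1, keepdims=True)
    pb = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log2(p_joint[mask] / (pa @ pb)[mask])))


def causal_bottleneck(p_fwd_rev, n_clusters, beta, iters=500, seed=0):
    """Information-bottleneck iteration: compress S+ into R, keeping information
    about S-. p_fwd_rev[s, t] = P(S+ = s, S- = t), assumed strictly positive.
    Returns (I[R; S+], I[R; S-]) in bits."""
    rng = np.random.default_rng(seed)
    p_s = p_fwd_rev.sum(axis=1)                       # P(S+)
    p_t_given_s = p_fwd_rev / p_s[:, None]            # P(S- | S+)
    q_r_given_s = rng.random((len(p_s), n_clusters))  # random soft assignment
    q_r_given_s /= q_r_given_s.sum(axis=1, keepdims=True)
    for _ in range(iters):
        q_r = p_s @ q_r_given_s                       # q(r)
        q_sr = q_r_given_s * p_s[:, None]             # q(s, r)
        q_t_given_r = (q_sr.T @ p_t_given_s) / q_r[:, None]  # q(t | r)
        # KL[P(S- | s) || q(S- | r)] in nats for every (s, r) pair
        kl = np.sum(
            p_t_given_s[:, None, :]
            * (np.log(p_t_given_s[:, None, :]) - np.log(q_t_given_r[None, :, :])),
            axis=2,
        )
        logits = np.log(q_r)[None, :] - beta * kl
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        q_r_given_s = np.exp(logits)
        q_r_given_s /= q_r_given_s.sum(axis=1, keepdims=True)
    q_sr = q_r_given_s * p_s[:, None]
    q_rt = q_sr.T @ p_t_given_s                       # q(r, t)
    return mutual_info_bits(q_sr), mutual_info_bits(q_rt)


# Hypothetical table: four forward causal states in two predictive groups.
P = 0.25 * np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
print(causal_bottleneck(P, n_clusters=2, beta=50.0))
print(causal_bottleneck(P, n_clusters=2, beta=0.1))
```

At large β, two clusters merge the redundant forward states and keep all of I[S+; S-] (about 0.531 bits) at a cost of 1 bit. At small β, the representation collapses to carry essentially nothing. Sweeping β traces out the causal rate-distortion curve.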

MLSep 5, 2013
Bayesian Structural Inference for Hidden Processes

Christopher C. Strelioff, James P. Crutchfield

We introduce a Bayesian approach to discovering patterns in structurally complex processes. The proposed method of Bayesian Structural Inference (BSI) relies on a set of candidate unifilar HMM (uHMM) topologies for inference of process structure from a data series. We employ a recently developed exact enumeration of topological epsilon-machines. (A sequel then removes the topological restriction.) This subset of the uHMM topologies has the added benefit that inferred models are guaranteed to be epsilon-machines, irrespective of estimated transition probabilities. Properties of epsilon-machines and uHMMs allow for the derivation of analytic expressions for estimating transition probabilities, inferring start states, and comparing the posterior probability of candidate model topologies, despite the process's internal structure being only indirectly present in the data. We demonstrate BSI's effectiveness in estimating a process's randomness, as reflected by the Shannon entropy rate, and its structure, as quantified by the statistical complexity. We also compare using the posterior distribution over candidate models with using the single maximum a posteriori model for point estimation, and show that the former more accurately reflects uncertainty in estimated values. We apply BSI to in-class examples of finite- and infinite-order Markov processes, as well as to an out-of-class, infinite-state hidden process.
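Unifilarity is what makes the analytic expressions possible. Given a topology and a start state, the data determine a unique state path, so edge counts are sufficient statistics. The minimal sketch below handles one fixed candidate topology and assumes a symmetric Dirichlet(α = 1) prior on each state's outgoing edges and a uniform prior over start states. These are our choices; the paper's exact priors may differ. The Golden Mean topology and all function names are hypothetical illustrations. Comparing topologies would additionally weight each candidate's evidence, summed over start states, by a prior over the candidate set.

```python
import math

# Hypothetical topology encoding: TOPOLOGY[state][symbol] = next state.
# A symbol missing from a state's dict is forbidden from that state.
GOLDEN_MEAN_TOPOLOGY = {"A": {"0": "A", "1": "B"}, "B": {"0": "A"}}


def edge_counts(topology, start, data):
    """Follow the unique (unifilar) state path from `start` and count edges;
    return None if the data use an edge the topology forbids."""
    counts = {s: {x: 0 for x in edges} for s, edges in topology.items()}
    state = start
    for x in data:
        if x not in topology[state]:
            return None
        counts[state][x] += 1
        state = topology[state][x]
    return counts


def log_evidence(counts, alpha=1.0):
    """log P(data | topology, start): Dirichlet-multinomial marginal likelihood,
    one symmetric Dirichlet(alpha) prior per state over its outgoing edges."""
    total = 0.0
    for edges in counts.values():
        k, n = len(edges), sum(edges.values())
        total += math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
        total += sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in edges.values())
    return total


def posterior_mean_transitions(counts, alpha=1.0):
    """Posterior mean of each state's edge probabilities under the same prior."""
    return {
        s: {x: (alpha + c) / (len(edges) * alpha + sum(edges.values()))
            for x, c in edges.items()}
        for s, edges in counts.items()
    }


def start_state_posterior(topology, data, alpha=1.0):
    """P(start state | data, topology) under a uniform prior over start states."""
    logs = {}
    for s in topology:
        counts = edge_counts(topology, s, data)
        if counts is not None:
            logs[s] = log_evidence(counts, alpha)
    if not logs:
        raise ValueError("data are impossible under this topology")
    m = max(logs.values())
    z = sum(math.exp(v - m) for v in logs.values())
    return {s: math.exp(v - m) / z for s, v in logs.items()}


print(start_state_posterior(GOLDEN_MEAN_TOPOLOGY, "01001"))
```

For the data 01001, both start states are consistent with the topology. Their evidences are 1/30 (start A) and 1/12 (start B), giving posterior probabilities 2/7 and 5/7.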