Artemy Kolchinsky

IT
h-index20
13papers
692citations
Novelty44%
AI Score42

13 Papers

AOJan 16, 2015
Prediction and Modularity in Dynamical Systems

Artemy Kolchinsky, Luis M. Rocha

Identifying and understanding modular organizations is centrally important in the study of complex systems. Several approaches to this problem have been advanced, many framed in information-theoretic terms. Our treatment starts from the complementary point of view of statistical modeling and prediction of dynamical systems. It is known that for finite amounts of training data, simpler models can have greater predictive power than more complex ones. We use the trade-off between model simplicity and predictive accuracy to generate optimal multiscale decompositions of dynamical networks into weakly-coupled, simple modules. State-dependent and causal versions of our method are also proposed.

STAT-MECHDec 20, 2025
Generalized free energy and excess/housekeeping decomposition in nonequilibrium systems: from large deviations to thermodynamic speed limits

Artemy Kolchinsky, Andreas Dechant, Kohei Yoshimura et al.

In genuine nonequilibrium systems that undergo continuous driving, the thermodynamic forces are nonconservative, meaning they cannot be described by any free energy potential. Nonetheless, we show that the dynamics of such systems are governed by a "generalized free energy" that is derived from a large-deviations variational principle. This variational principle also yields a decomposition of fluxes, forces, and dissipation (entropy production) into a conservative "excess" part and a nonconservative "housekeeping" part. Our decomposition is universally applicable to stochastic master equations, deterministic chemical reaction networks, and open systems. We also show that the excess entropy production obeys a thermodynamic speed limit (TSL), a fundamental thermodynamic constraint on the rate of state evolution and/or external fluxes. We demonstrate our approach on several examples, including real-world metabolic networks, where we derive fundamental dissipation bounds and uncover "futile" metabolic cycles. Our generalized free energy and decomposition are empirically accessible to thermodynamic inference in both stochastic and deterministic systems. We discuss important connections to several theoretical frameworks, including information geometry and Onsager theory, as well as previous excess/housekeeping decompositions.

HEP-THFeb 5
Towards Worst-Case Guarantees with Scale-Aware Interpretability

Lauren Greenspan, David Berman, Aryeh Brill et al.

Neural networks organize information according to the hierarchical, multi-scale structure of natural data. Methods to interpret model internals should be similarly scale-aware, explicitly tracking how features compose across resolutions and guaranteeing bounds on the influence of fine-grained structure that is discarded as irrelevant noise. We posit that the renormalisation framework from physics can meet this need by offering technical tools that can overcome limitations of current methods. Moreover, relevant work from adjacent fields has now matured to a point where scattered research threads can be synthesized into practical, theory-informed tools. To combine these threads in an AI safety context, we propose a unifying research agenda -- \emph{scale-aware interpretability} -- to develop formal machinery and interpretability tools that have robustness and faithfulness properties supported by statistical physics.

STAT-MECHMay 15, 2025
Inferring entropy production in many-body systems using nonequilibrium MaxEnt

Miguel Aguilera, Sosuke Ito, Artemy Kolchinsky

We propose a method for inferring entropy production (EP) in high-dimensional stochastic systems, including many-body systems and non-Markovian systems with long memory. Standard techniques for estimating EP become intractable in such systems due to computational and statistical limitations. We infer trajectory-level EP and lower bounds on average EP by exploiting a nonequilibrium analogue of the Maximum Entropy principle, along with convex duality. Our approach uses only samples of trajectory observables, such as spatiotemporal correlations. It does not require reconstruction of high-dimensional probability distributions or rate matrices, nor impose any special assumptions such as discrete states or multipartite dynamics. In addition, it may be used to compute a hierarchical decomposition of EP, reflecting contributions from different interaction orders, and it has an intuitive physical interpretation as a "thermodynamic uncertainty relation." We demonstrate its numerical performance on a disordered nonequilibrium spin model with 1000 spins and a large neural spike-train dataset.

NEAug 31, 2020
The Computational Capacity of LRC, Memristive and Hybrid Reservoirs

Forrest C. Sheldon, Artemy Kolchinsky, Francesco Caravelli

Reservoir computing is a machine learning paradigm that uses a high-dimensional dynamical system, or \emph{reservoir}, to approximate and predict time series data. The scale, speed and power usage of reservoir computers could be enhanced by constructing reservoirs out of electronic circuits, and several experimental studies have demonstrated promise in this direction. However, designing quality reservoirs requires a precise understanding of how such circuits process and store information. We analyze the feasibility and optimal design of electronic reservoirs that include both linear elements (resistors, inductors, and capacitors) and nonlinear memory elements called memristors. We provide analytic results regarding the feasibility of these reservoirs, and give a systematic characterization of their computational properties by examining the types of input-output relationships that they can approximate. This allows us to design reservoirs with optimal properties. By introducing measures of the total linear and nonlinear computational capacities of the reservoir, we are able to design electronic circuits whose total computational capacity scales extensively with the system size. Our electronic reservoirs can match or exceed the performance of conventional "echo state network" reservoirs in a form that may be directly implemented in hardware.

ITAug 23, 2019
A Novel Approach to the Partial Information Decomposition

Artemy Kolchinsky

We consider the "partial information decomposition" (PID) problem, which aims to decompose the information that a set of source random variables provide about a target random variable into separate redundant, synergistic, union, and unique components. In the first part of this paper, we propose a general framework for constructing a multivariate PID. Our framework is defined in terms of a formal analogy with intersection and union from set theory, along with an ordering relation which specifies when one information source is more informative than another. Our definitions are algebraically and axiomatically motivated, and can be generalized to domains beyond Shannon information theory (such as algorithmic information theory and quantum information theory). In the second part of this paper, we use our general framework to define a PID in terms of the well-known Blackwell order, which has a fundamental operational interpretation. We demonstrate our approach on numerous examples and show that it overcomes many drawbacks associated with previous proposals.

ITMar 21, 2019
Decomposing information into copying versus transformation

Artemy Kolchinsky, Bernat Corominas-Murtra

In many real-world systems, information can be transmitted in two qualitatively different ways: by copying or by transformation. Copying occurs when messages are transmitted without modification, e.g., when an offspring receives an unaltered copy of a gene from its parent. Transformation occurs when messages are modified systematically during transmission, e.g., when mutational biases occur during genetic replication. Standard information-theoretic measures do not distinguish these two modes of information transfer, although they may reflect different mechanisms and have different functional consequences. Starting from a few simple axioms, we derive a decomposition of mutual information into the information transmitted by copying versus the information transmitted by transformation. We begin with a decomposition that applies when the source and destination of the channel have the same set of messages and a notion of message identity exists. We then generalize our decomposition to other kinds of channels, which can involve different source and destination sets and broader notions of similarity. In addition, we show that copy information can be interpreted as the minimal work needed by a physical copying process, which is relevant for understanding the physics of replication. We use the proposed decomposition to explore a model of amino acid substitution rates. Our results apply to any system in which the fidelity of copying, rather than simple predictability, is of critical relevance.

MLAug 23, 2018
Caveats for information bottleneck in deterministic scenarios

Artemy Kolchinsky, Brendan D. Tracey, Steven Van Kuyk

Information bottleneck (IB) is a method for extracting information from one random variable $X$ that is relevant for predicting another random variable $Y$. To do so, IB identifies an intermediate "bottleneck" variable $T$ that has low mutual information $I(X;T)$ and high mutual information $I(Y;T)$. The "IB curve" characterizes the set of bottleneck variables that achieve maximal $I(Y;T)$ for a given $I(X;T)$, and is typically explored by maximizing the "IB Lagrangian", $I(Y;T) - βI(X;T)$. In some cases, $Y$ is a deterministic function of $X$, including many classification problems in supervised learning where the output class $Y$ is a deterministic function of the input $X$. We demonstrate three caveats when using IB in any situation where $Y$ is a deterministic function of $X$: (1) the IB curve cannot be recovered by maximizing the IB Lagrangian for different values of $β$; (2) there are "uninteresting" trivial solutions at all points of the IB curve; and (3) for multi-layer classifiers that achieve low prediction error, different layers cannot exhibit a strict trade-off between compression and prediction, contrary to a recent proposal. We also show that when $Y$ is a small perturbation away from being a deterministic function of $X$, these three caveats arise in an approximate way. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We demonstrate the three caveats on the MNIST dataset.

CLJun 26, 2017
The Minor Fall, the Major Lift: Inferring Emotional Valence of Musical Chords through Lyrics

Artemy Kolchinsky, Nakul Dhande, Kengjeun Park et al.

We investigate the association between musical chords and lyrics by analyzing a large dataset of user-contributed guitar tablatures. Motivated by the idea that the emotional content of chords is reflected in the words used in corresponding lyrics, we analyze associations between lyrics and chord categories. We also examine the usage patterns of chords and lyrics in different musical genres, historical eras, and geographical regions. Our overall results confirms a previously known association between Major chords and positive valence. We also report a wide variation in this association across regions, genres, and eras. Our results suggest possible existence of different emotional associations for other types of chords.

ITJun 8, 2017
Estimating Mixture Entropy with Pairwise Distances

Artemy Kolchinsky, Brendan D. Tracey

Mixture distributions arise in many parametric and non-parametric settings -- for example, in Gaussian mixture models and in non-parametric estimation. It is often necessary to compute the entropy of a mixture, but, in most cases, this quantity has no closed-form expression, making some form of approximation necessary. We propose a family of estimators based on a pairwise distance function between mixture components, and show that this estimator class has many attractive properties. For many distributions of interest, the proposed estimators are efficient to compute, differentiable in the mixture parameters, and become exact when the mixture components are clustered. We prove this family includes lower and upper bounds on the mixture entropy. The Chernoff $α$-divergence gives a lower bound when chosen as the distance function, with the Bhattacharyya distance providing the tightest lower bound for components that are symmetric and members of a location family. The Kullback-Leibler divergence gives an upper bound when used as the distance function. We provide closed-form expressions of these bounds for mixtures of Gaussians, and discuss their applications to the estimation of mutual information. We then demonstrate that our bounds are significantly tighter than well-known existing bounds using numeric simulations. This estimator class is very useful in optimization problems involving maximization/minimization of entropy and mutual information, such as MaxEnt and rate distortion problems.

ITMay 6, 2017
Nonlinear Information Bottleneck

Artemy Kolchinsky, Brendan D. Tracey, David H. Wolpert

Information bottleneck (IB) is a technique for extracting information in one random variable $X$ that is relevant for predicting another random variable $Y$. IB works by encoding $X$ in a compressed "bottleneck" random variable $M$ from which $Y$ can be accurately decoded. However, finding the optimal bottleneck variable involves a difficult optimization problem, which until recently has been considered for only two limited cases: discrete $X$ and $Y$ with small state spaces, and continuous $X$ and $Y$ with a Gaussian joint distribution (in which case optimal encoding and decoding maps are linear). We propose a method for performing IB on arbitrarily-distributed discrete and/or continuous $X$ and $Y$, while allowing for nonlinear encoding and decoding maps. Our approach relies on a novel non-parametric upper bound for mutual information. We describe how to implement our method using neural networks. We then show that it achieves better performance than the recently-proposed "variational IB" method on several real-world datasets.

MLDec 2, 2014
Extraction of Pharmacokinetic Evidence of Drug-drug Interactions from the Literature

Artemy Kolchinsky, Anália Lourenço, Heng-Yi Wu et al.

Drug-drug interaction (DDI) is a major cause of morbidity and mortality and a subject of intense scientific interest. Biomedical literature mining can aid DDI research by extracting evidence for large numbers of potential interactions from published literature and clinical databases. Though DDI is investigated in domains ranging in scale from intracellular biochemistry to human populations, literature mining has not been used to extract specific types of experimental evidence, which are reported differently for distinct experimental goals. We focus on pharmacokinetic evidence for DDI, essential for identifying causal mechanisms of putative interactions and as input for further pharmacological and pharmaco-epidemiology investigations. We used manually curated corpora of PubMed abstracts and annotated sentences to evaluate the efficacy of literature mining on two tasks: first, identifying PubMed abstracts containing pharmacokinetic evidence of DDIs; second, extracting sentences containing such evidence from abstracts. We implemented a text mining pipeline and evaluated it using several linear classifiers and a variety of feature transforms. The most important textual features in the abstract and sentence classification tasks were analyzed. We also investigated the performance benefits of using features derived from PubMed metadata fields, various publicly available named entity recognizers, and pharmacokinetic dictionaries. Several classifiers performed very well in distinguishing relevant and irrelevant abstracts (reaching F1~=0.93, MCC~=0.74, iAUC~=0.99) and sentences (F1~=0.76, MCC~=0.65, iAUC~=0.83). We found that word bigram features were important for achieving optimal classifier performance and that features derived from Medical Subject Headings (MeSH) terms significantly improved abstract classification. ...

MLOct 2, 2012
Evaluation of linear classifiers on articles containing pharmacokinetic evidence of drug-drug interactions

Artemy Kolchinsky, Anália Lourenço, Lang Li et al.

Background. Drug-drug interaction (DDI) is a major cause of morbidity and mortality. [...] Biomedical literature mining can aid DDI research by extracting relevant DDI signals from either the published literature or large clinical databases. However, though drug interaction is an ideal area for translational research, the inclusion of literature mining methodologies in DDI workflows is still very preliminary. One area that can benefit from literature mining is the automatic identification of a large number of potential DDIs, whose pharmacological mechanisms and clinical significance can then be studied via in vitro pharmacology and in populo pharmaco-epidemiology. Experiments. We implemented a set of classifiers for identifying published articles relevant to experimental pharmacokinetic DDI evidence. These documents are important for identifying causal mechanisms behind putative drug-drug interactions, an important step in the extraction of large numbers of potential DDIs. We evaluate performance of several linear classifiers on PubMed abstracts, under different feature transformation and dimensionality reduction methods. In addition, we investigate the performance benefits of including various publicly-available named entity recognition features, as well as a set of internally-developed pharmacokinetic dictionaries. Results. We found that several classifiers performed well in distinguishing relevant and irrelevant abstracts. We found that the combination of unigram and bigram textual features gave better performance than unigram features alone, and also that normalization transforms that adjusted for feature frequency and document length improved classification. For some classifiers, such as linear discriminant analysis (LDA), proper dimensionality reduction had a large impact on performance. Finally, the inclusion of NER features and dictionaries was found not to help classification.