Alexander J. Gates

ML
5 papers
276 citations

IT · May 31
Finite-Resolution Information from Collision Statistics

Alexander J. Gates

Collision statistics provide a finite-resolution view of information by measuring how often a fixed number of independent samples fall on the same state. These directly countable quantities form the basis of integer-order Rényi entropies. Here, we use low-order Rényi entropies to approximate Shannon entropy and mutual information, while characterizing what is necessarily lost when only finitely many collision moments are used. We derive interpolation-error bounds showing that approximation error is controlled by the shape of the Rényi entropy path near the Shannon point. We also separate this deterministic error from finite-sample estimation error: for fixed collision order, increasing sample size improves estimation of the finite-resolution target but does not eliminate its deterministic difference from Shannon entropy or mutual information. Finally, we show that finite collision moments do not generally identify Shannon entropy, and that increasing collision order shifts sensitivity toward high-probability events. Numerical experiments illustrate the approximation–estimation tradeoff and compare collision-based approximations with plug-in and Miller–Madow estimators. The framework links collision counts, Rényi entropy, Shannon limits, and mutual information through a finite-resolution view of information, clarifying when low-order coincidence structure is informative and when irreducible information is lost.
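
The collision counts behind integer-order Rényi entropies are easy to compute directly. The following Python sketch is a minimal illustration under our own assumptions: i.i.d. samples from a discrete distribution, natural logarithms, and hypothetical function names. It is not the paper's estimator. It estimates the order-q collision moment P_q = sum_x p(x)^q as the fraction of q-subsets of samples that all land on the same state. It then converts that moment to H_q = log(P_q)/(1 - q) and sets it beside the plug-in and Miller–Madow Shannon estimates that the abstract mentions.

```python
from collections import Counter
from math import comb, log


def collision_moment(samples, q):
    """Unbiased estimate of P_q = sum_x p(x)**q: the fraction of q-subsets of
    samples whose members all fall on the same state."""
    n = len(samples)
    if n < q:
        raise ValueError("need at least q samples")
    # A state seen n_x times contributes C(n_x, q) colliding q-subsets.
    hits = sum(comb(c, q) for c in Counter(samples).values())
    return hits / comb(n, q)


def renyi_entropy(samples, q):
    """Integer-order Renyi entropy H_q = log(P_q) / (1 - q) in nats, for q >= 2."""
    p_q = collision_moment(samples, q)
    if p_q == 0:
        raise ValueError("no q-way collisions observed; H_q estimate undefined")
    return log(p_q) / (1 - q)


def plugin_entropy(samples):
    """Plug-in (empirical-frequency) Shannon entropy in nats."""
    n = len(samples)
    return -sum((c / n) * log(c / n) for c in Counter(samples).values())


def miller_madow_entropy(samples):
    """Plug-in entropy plus the Miller-Madow correction (m - 1) / (2n),
    where m is the number of distinct observed states."""
    m = len(set(samples))
    return plugin_entropy(samples) + (m - 1) / (2 * len(samples))


samples = ["a", "a", "a", "a", "b", "b", "c", "d"]
for q in (2, 3):
    print(q, renyi_entropy(samples, q))
print("plug-in", plugin_entropy(samples))
print("Miller-Madow", miller_madow_entropy(samples))
```

For example, `renyi_entropy(["a", "a", "b", "c"], 2)` finds one colliding pair out of six, so it returns log 6. Raising q weights the moment more heavily toward the most frequent states. This is consistent with the abstract's remark that higher collision orders shift sensitivity toward high-probability events.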

MN · Apr 18, 2016
Control of complex networks requires both structure and dynamics

Alexander J. Gates, Luis M. Rocha

The study of network structure has uncovered signatures of the organization of complex systems. However, there is also a need to understand how to control them; for example, identifying strategies to revert a diseased cell to a healthy state, or a mature cell to a pluripotent state. Two recent methodologies suggest that the controllability of complex systems can be predicted solely from the graph of interactions between variables, without considering their dynamics: structural controllability and minimum dominating sets. We demonstrate that such structure-only methods fail to characterize controllability when dynamics are introduced. We study Boolean network ensembles of network motifs as well as three models of biochemical regulation: the segment polarity network in Drosophila melanogaster, the cell cycle of budding yeast Saccharomyces cerevisiae, and the floral organ arrangement in Arabidopsis thaliana. We demonstrate that structure-only methods both undershoot and overshoot the number of critical variables, and misidentify which sets of critical variables best control the dynamics of these models, highlighting the importance of the actual system dynamics in determining control. Our analysis further shows that the logic of automata transition functions, namely how canalizing they are, plays an important role in the extent to which structure predicts dynamics.
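
The following Python sketch brute-forces control on a tiny Boolean network to make "control from dynamics" concrete. It searches for the smallest sets of variables that, when pinned to fixed values, drive every initial state to a chosen attractor under synchronous updating. The three-node network, the target attractors, and all function names are hypothetical illustrations. Exhaustive search is only feasible for very small networks, and the control criteria used in the paper may differ.

```python
from itertools import combinations, product

# Hypothetical 3-node network (not from the paper); each rule maps the
# current state tuple (x0, x1, x2) to the next value of one variable.
RULES = [
    lambda s: s[2],             # x0 copies x2
    lambda s: s[0] and s[2],    # x1 = x0 AND x2
    lambda s: s[0] or s[1],     # x2 = x0 OR x1
]
N = len(RULES)


def step(state, pinned):
    """Synchronous update; pinned variables are held at their forced values."""
    return tuple(pinned.get(i, int(rule(state))) for i, rule in enumerate(RULES))


def attractor_from(state, pinned):
    """Iterate until a state repeats, then return the states on the cycle."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state, pinned)
        t += 1
    start = seen[state]
    return frozenset(s for s, k in seen.items() if k >= start)


def controls(pinned, target):
    """True if every initial state settles on `target` under the pinning."""
    return all(attractor_from(s, pinned) == target
               for s in product((0, 1), repeat=N))


def minimal_control_sets(target):
    """Smallest pinnings (variable -> value) that drive all states to target."""
    for size in range(N + 1):
        found = []
        for subset in combinations(range(N), size):
            for values in product((0, 1), repeat=size):
                pinned = dict(zip(subset, values))
                if controls(pinned, target):
                    found.append(pinned)
        if found:
            return found
    return []


print(minimal_control_sets(frozenset({(1, 1, 1)})))
print(minimal_control_sets(frozenset({(0, 0, 0)})))
```

In this toy network, pinning any single variable to 1 forces the all-ones fixed point. Only x0 and x2 can force the all-zeros fixed point, however. Pinning x1 = 0 can leave x0 and x2 swapping values indefinitely. Which variables control the system therefore depends on the update logic and the target attractor, not only on the wiring.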

OH · May 9, 2018
CANA: A python package for quantifying control and canalization in Boolean Networks

Rion Brattig Correia, Alexander J. Gates, Xuan Wang et al.

Logical models offer a simple but powerful means to understand the complex dynamics of biochemical regulation, without the need to estimate kinetic parameters. However, even simple automata components can lead to collective dynamics that are computationally intractable when aggregated into networks. In previous work we demonstrated that automata network models of biochemical regulation are highly canalizing, whereby many variable states and their groupings are redundant (Marques-Pita and Rocha, 2013). The precise charting and measurement of such canalization simplifies these models, making even very large networks amenable to analysis. Moreover, canalization plays an important role in the control, robustness, modularity and criticality of Boolean network dynamics, especially those used to model biochemical regulation (Gates and Rocha, 2016; Gates et al., 2016; Manicka, 2017). Here we describe a new publicly available Python package that provides the necessary tools to extract, measure, and visualize canalizing redundancy present in Boolean network models. It extracts the pathways most effective in controlling dynamics in these models, including their effective graph and dynamics canalizing map, as well as other tools to uncover minimum sets of control variables.
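
The basic notion of canalization can be checked directly from a truth table. The following Python sketch uses the classic definition: input i canalizes f if fixing x_i to some value a forces the output regardless of the other inputs. It is a hypothetical helper, not part of CANA's API. The package's redundancy measures, which also cover groupings of variable states, go beyond this single-input test.

```python
from itertools import product


def canalizing_inputs(f, k):
    """Return (input index, canalizing value, forced output) triples for a
    k-input Boolean function f that takes a tuple of 0/1 values."""
    triples = []
    for i in range(k):
        for a in (0, 1):
            # Collect the outputs of f over all rows where x_i is fixed to a.
            outputs = {f(x) for x in product((0, 1), repeat=k) if x[i] == a}
            if len(outputs) == 1:
                triples.append((i, a, outputs.pop()))
    return triples


def is_canalizing(f, k):
    """True if at least one input value alone determines the output."""
    return bool(canalizing_inputs(f, k))


def and_gate(x):
    return int(x[0] and x[1])


def xor_gate(x):
    return x[0] ^ x[1]


print(canalizing_inputs(and_gate, 2))
print(is_canalizing(xor_gate, 2))
```

For two-input AND, setting either input to 0 forces the output to 0, so the AND call returns `[(0, 0, 0), (1, 0, 0)]`. XOR is not canalizing, because no single input value ever fixes its output.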

ML · Nov 4, 2025
Unifying Information-Theoretic and Pair-Counting Clustering Similarity

Alexander J. Gates

Comparing clusterings is central to evaluating unsupervised models, yet the many existing similarity measures can produce widely divergent, sometimes contradictory, evaluations. Clustering similarity measures are typically organized into two principal families, pair-counting and information-theoretic, reflecting whether they quantify agreement through element pairs or aggregate information across full cluster contingency tables. Prior work has uncovered parallels between these families and applied empirical normalization or chance-correction schemes, but their deeper analytical connection remains only partially understood. Here, we develop an analytical framework that unifies these families through two complementary perspectives. First, both families are expressed as weighted expansions of observed versus expected co-occurrences, with pair-counting arising as a quadratic, low-order approximation and information-theoretic measures as higher-order, frequency-weighted extensions. Second, we generalize pair-counting to k-tuple agreement and show that information-theoretic measures can be viewed as systematically accumulating higher-order co-assignment structure beyond the pairwise level. We illustrate the approaches analytically for the Rand index and mutual information, and show how other indices in each family emerge as natural extensions. Together, these views clarify when and why the two regimes diverge, relating their sensitivities directly to weighting and approximation order, and provide a principled basis for selecting, interpreting, and extending clustering similarity measures across applications.
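
Both families can be computed from the same contingency table of co-assignment counts n_ij. The following Python sketch computes the Rand index and plug-in mutual information (in nats). It also counts the k-tuples co-assigned in both clusterings as sum_ij C(n_ij, k), which reduces to the pair counts behind the Rand index at k = 2. That k-tuple count is our own illustrative reading of "k-tuple agreement" and not necessarily the paper's definition. The function names are hypothetical.

```python
from collections import Counter
from math import comb, log


def contingency(labels_a, labels_b):
    """Counts n_ij of elements in cluster i of A and cluster j of B."""
    return Counter(zip(labels_a, labels_b))


def tuple_agreements(labels_a, labels_b, k=2):
    """k-tuples co-assigned in A, in B, and in both (k = 2 gives pair counts)."""
    in_a = sum(comb(c, k) for c in Counter(labels_a).values())
    in_b = sum(comb(c, k) for c in Counter(labels_b).values())
    both = sum(comb(c, k) for c in contingency(labels_a, labels_b).values())
    return in_a, in_b, both


def rand_index(labels_a, labels_b):
    """Fraction of element pairs on which the two clusterings agree."""
    total = comb(len(labels_a), 2)
    in_a, in_b, both = tuple_agreements(labels_a, labels_b, k=2)
    # Inclusion-exclusion: pairs split in both = total - in_a - in_b + both.
    apart_in_both = total - in_a - in_b + both
    return (both + apart_in_both) / total


def mutual_information(labels_a, labels_b):
    """Plug-in mutual information in nats: sum p_ij log(p_ij / (p_i p_j))."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    return sum((c / n) * log(c * n / (pa[i] * pb[j]))
               for (i, j), c in contingency(labels_a, labels_b).items())


a, b = [0, 0, 1, 1], [0, 1, 0, 1]
print(rand_index(a, b), mutual_information(a, b))
print(tuple_agreements([0, 0, 0, 1], [0, 0, 0, 0], k=3))
```

For the labelings `[0, 0, 1, 1]` and `[0, 1, 0, 1]`, the Rand index is 1/3. The two pairs split in both clusterings still count as agreements. The mutual information, by contrast, is exactly 0, which illustrates how the two families can diverge.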

ML · Jun 19, 2017
Element-centric clustering comparison unifies overlaps and hierarchy

Alexander J. Gates, Ian B. Wood, William P. Hetrick et al.

Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evaluation of clustering methods requires comparing the uncovered clusterings to planted clusterings or known metadata. Yet, as we demonstrate, existing clustering comparison measures have critical biases which undermine their usefulness, and no measure accommodates both overlapping and hierarchical clusterings. Here we unify the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy. We demonstrate that, in contrast to standard clustering similarity measures, our framework does not suffer from critical biases and naturally provides unique insights into how the clusterings differ. We illustrate the strengths of our framework by revealing new insights into the organization of clusters in two applications: the improved classification of schizophrenia based on the overlapping and hierarchical community structure of fMRI brain networks, and the disentanglement of various social homophily factors in Facebook social networks. The universality of clustering suggests far-reaching impact of our framework throughout all areas of science.
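
For the simplest case of two disjoint partitions, the element-centric idea can be sketched in Python in three steps. Each element gets an affinity row over all elements, the two clusterings' rows are compared element by element, and the per-element scores are averaged. The uniform within-cluster affinity and the 1 - 0.5 * L1 per-element score are simplifying assumptions on our part, and the function names are hypothetical. The paper derives affinities from the relationships induced by the cluster structure, which is what lets it also handle overlapping and hierarchical clusterings.

```python
from collections import defaultdict


def affinity(labels):
    """Assumed uniform affinity: p_ij = 1/|C| if i and j share cluster C, else 0."""
    members = defaultdict(list)
    for idx, c in enumerate(labels):
        members[c].append(idx)
    n = len(labels)
    rows = []
    for idx in range(n):
        cluster = members[labels[idx]]
        row = [0.0] * n
        for j in cluster:
            row[j] = 1.0 / len(cluster)
        rows.append(row)
    return rows


def element_scores(labels_a, labels_b):
    """Per-element similarity: 1 - 0.5 * L1 distance between affinity rows."""
    pa, pb = affinity(labels_a), affinity(labels_b)
    return [1 - 0.5 * sum(abs(x - y) for x, y in zip(ra, rb))
            for ra, rb in zip(pa, pb)]


def element_centric_similarity(labels_a, labels_b):
    """Average of the per-element scores."""
    scores = element_scores(labels_a, labels_b)
    return sum(scores) / len(scores)


print(element_scores([0, 0, 1, 1], [0, 1, 0, 1]))
print(element_centric_similarity([0, 0, 0], [0, 1, 2]))
```

The per-element scores show where two clusterings disagree. For `[0, 0, 1, 1]` versus `[0, 1, 0, 1]`, every element scores 0.5, so the overall similarity is also 0.5.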