31.6SIMay 30
Hypergraph backboningAlec Kirkley, Helcio Felippe, Federico Malizia et al.
Hypergraphs provide a natural framework for describing complex networked systems with higher-order, non-dyadic interactions. Due to their high dimensionality and often redundant structure, a key challenge is to develop methods that simplify hypergraph representations while preserving the essential structure of interactions. Here we present a principled, efficient, and non-parametric information-theoretic method for pruning nested and/or redundant structures in hypergraphs, enabling a minimal representation of higher-order interactions in the presence of local heterogeneity. Our approach naturally extends to weighted hypergraphs, where higher-order topology and hyperedge weights combine to identify the system's structural backbone. We validate the method on controlled synthetic hypergraphs and apply it to empirical datasets from diverse domains, demonstrating substantial sparsification without loss of core structural information.
SIOct 17, 2022
Implicit models, latent compression, intrinsic biases, and cheap lunches in community detectionTiago P. Peixoto, Alec Kirkley
The task of community detection, which aims to partition a network into clusters of nodes to summarize its large-scale structure, has spawned the development of many competing algorithms with varying objectives. Some community detection methods are inferential, explicitly deriving the clustering objective through a probabilistic generative model, while other methods are descriptive, dividing a network according to an objective motivated by a particular application, making it challenging to compare these methods on the same scale. Here we present a solution to this problem that associates any community detection objective, inferential or descriptive, with its corresponding implicit network generative model. This allows us to compute the description length of a network and its partition under arbitrary objectives, providing a principled measure to compare the performance of different algorithms without the need for "ground truth" labels. Our approach also gives access to instances of the community detection problem that are optimal to any given algorithm, and in this way reveals intrinsic biases in popular descriptive methods, explaining their tendency to overfit. Using our framework, we compare a number of community detection methods on artificial networks, and on a corpus of over 500 structurally diverse empirical networks. We find that more expressive community detection methods exhibit consistently superior compression performance on structured data instances, without having degraded performance on a minority of situations where more specialized algorithms perform optimally. Our results undermine the implications of the "no free lunch" theorem for community detection, both conceptually and in practice, since it is confined to unstructured data instances, unlike relevant community detection problems which are structured by requirement.
33.3SIApr 22
Heterogeneous Interaction Network Analysis (HINA): A New Learning Analytics Approach for Modelling, Analyzing, and Visualizing Complex Interactions in Learning ProcessesShihui Feng, Baiyue He, Dragan Gasevic et al.
Existing learning analytics approaches, which often model learning processes as sequences of learner actions or homogeneous relationships, are limited in capturing the distributed, multi-faceted nature of interactions in contemporary learning environments. To address this, we propose Heterogeneous Interaction Network Analysis (HINA), a novel multi-level learning analytics framework for modeling complex learning processes across diverse entities (e.g., learners, behaviours, AI agents, and task designs). HINA integrates a set of original methods, including summative measures and a new non-parametric clustering technique, with established practices for statistical testing and interactive visualization to provide a flexible and powerful analytical toolkit. In this paper, we first detail the theoretical and mathematical foundations of HINA for individual, dyadic, and meso-level analysis. We then demonstrate HINA's utility through a case study on AI-mediated small-group collaborative learning, revealing students' interaction profiles with peers versus AI; distinct engagement patterns that emerge from these interactions; and specific types of learning behaviors (e.g., asking questions, planning) directed to AI versus peers. By transforming process data into Heterogeneous Interaction Networks (HINs), HINA introduces a new paradigm for modeling learning processes and provides the dedicated, multi-level analytical methods required to extract meaning from them. It thereby moves beyond a single process data type to quantify and visualize how different elements in a learning environment interact and co-influence each other, opening new avenues for understanding complex educational dynamics.
28.7SOC-PHMay 9
Networks of amenities reveal universal homophily and heterophily across global citiesJianrui Wu, Baiyue He, Alec Kirkley
Agglomeration economies drive urban growth at different spatial scales by enabling productivity gains, knowledge spillovers, and shared inputs among proximate firms and amenities. To develop a unified science of cities it is thus important to understand how and to what extent different amenities cluster or mix across scales and regional contexts. By utilizing a novel Bayesian framework for nonparametrically quantifying the spectrum of possible mixing patterns of amenities in a city, we identify universal spatial scales of homophily (agglomeration) and heterophily (co-agglomeration) among different amenity types across roughly 800 cities worldwide. Through a detailed longitudinal case study, we also find that the changes in heterophilic mixing derived from our methodology more effectively predict changes in neighborhood rental values than the diversity of amenities present. These findings suggest that agglomeration economies exhibit universal spatial regularities that depend largely on the types of firms or amenities being considered, rather than their specifics or regional context, and highlight the benefit of heterophilic amenity mixing at walkable spatial scales.
5.0MLMay 6
Scalable inference of spatial regions and temporal signatures from time seriesJiayu Weng, Alec Kirkley
Regionalization aims to partition a spatial domain into contiguous regions that share similar characteristics, enabling more effective spatial analysis, policy making, and resource management. Existing approaches for spatial regionalization typically rely on static spatial snapshots rather than evolving time series. Meanwhile, most time series clustering methods ignore spatial structure or enforce spatial continuity through ad hoc regularization, constraining the number of inferred regions a priori either explicitly or implicitly. Utilizing the minimum description length principle from information theory, here we propose an efficient and fully nonparametric framework for the regionalization of spatial time series. Our method jointly infers a spatial partition along with a set of representative time series archetypes ("drivers") that best compress a spatiotemporal dataset, with a runtime log-linear in the number of time series. We demonstrate that this method can accurately recover planted regional structure and drivers in synthetic time series, and can extract meaningful structural regularities in large-scale empirical air quality and vegetation index records. Our method provides a principled and scalable framework for spatially contiguous partitioning, allowing interpretable temporal patterns and homogeneous regions to emerge directly from the data itself.
STAT-MECHSep 23, 2020
Belief propagation for networks with loopsAlec Kirkley, George T. Cantwell, M. E. J. Newman
Belief propagation is a widely used message passing method for the solution of probabilistic models on networks such as epidemic models, spin models, and Bayesian graphical models, but it suffers from the serious shortcoming that it works poorly in the common case of networks that contain short loops. Here we provide a solution to this long-standing problem, deriving a belief propagation method that allows for fast calculation of probability distributions in systems with short loops, potentially with high density, as well as giving expressions for the entropy and partition function, which are notoriously difficult quantities to compute. Using the Ising model as an example, we show that our approach gives excellent results on both real and synthetic networks, improving significantly on standard message passing methods. We also discuss potential applications of our method to a variety of other problems.