SIJun 1, 2022
Core-periphery Models for HypergraphsMarios Papachristou, Jon Kleinberg
We introduce a random hypergraph model for core-periphery structure. By leveraging our model's sufficient statistics, we develop a novel statistical inference algorithm that is able to scale to large hypergraphs with runtime that is practically linear wrt. the number of nodes in the graph after a preprocessing step that is almost linear in the number of hyperedges, as well as a scalable sampling algorithm. Our inference algorithm is capable of learning embeddings that correspond to the reputation (rank) of a node within the hypergraph. We also give theoretical bounds on the size of the core of hypergraphs generated by our model. We experiment with hypergraph data that range to $\sim 10^5$ hyperedges mined from the Microsoft Academic Graph, Stack Exchange, and GitHub and show that our model outperforms baselines wrt. producing good fits.
28.3SIApr 17
Structural Measures of Resilience for Supply ChainsMarios Papachristou, M. Amin Rahimian, Arash Azadegan
Modern production systems are increasingly defined by dense networks of multi-tier sourcing dependencies, where localized upstream disruptions can cascade into system-wide collapses. While supply chain resilience has garnered significant managerial attention, we still lack theoretically-grounded, reliable, analytical metrics that can distinguish inherently resilient architectures from fragile ones. This paper addresses this gap by developing a structural resilience framework and a novel metric, defined as the maximum supplier failure rate that a network can sustain while maintaining an aggregate production level. Using node percolation theory and branching processes, we identify four critical structural determinants of resilience: the number of raw materials, the number of finished goods, sourcing requirements, and sourcing influence. Our analysis reveals two distinct regimes: "top hat" architectures, which are characterized by excessive raw materials and high centralization, making them inherently fragile; and "rolling pin" structures, which maintain controlled input/output widths and sparsity, allowing them to absorb non-trivial shocks. To operationalize these insights, we formulate resilience computation as a scalable linear program that approximates cascading failure sizes in large-scale networks with cycles, heterogeneous suppliers, and structural decoupling. Furthermore, we extend our framework to account for exogenous failure correlations, such as those arising from geographic or geopolitical factors that can undermine traditional supplier and input diversification strategies. We validate our theoretical results using multi-echelon supply chain data. These tools can inform network design, supplier diversification, and inventory planning to proactively reduce systemic risk.
LGJun 28, 2023
Differentially Private Distributed Estimation and LearningMarios Papachristou, M. Amin Rahimian
We study distributed estimation and learning problems in a networked environment where agents exchange information to estimate unknown statistical properties of random variables from their privately observed samples. The agents can collectively estimate the unknown quantities by exchanging information about their private observations, but they also face privacy risks. Our novel algorithms extend the existing distributed estimation literature and enable the participating agents to estimate a complete sufficient statistic from private signals acquired offline or online over time and to preserve the privacy of their signals and network neighborhoods. This is achieved through linear aggregation schemes with adjusted randomization schemes that add noise to the exchanged estimates subject to differential privacy (DP) constraints, both in an offline and online manner. We provide convergence rate analysis and tight finite-time convergence bounds. We show that the noise that minimizes the convergence time to the best estimates is the Laplace noise, with parameters corresponding to each agent's sensitivity to their signal and network characteristics. Our algorithms are amenable to dynamic topologies and balancing privacy and accuracy trade-offs. Finally, to supplement and validate our theoretical results, we run experiments on real-world data from the US Power Grid Network and electric consumption data from German Households to estimate the average power consumption of power stations and households under all privacy regimes and show that our method outperforms existing first-order, privacy-aware, distributed optimization methods.
LGNov 1, 2022
GLINKX: A Scalable Unified Framework For Homophilous and Heterophilous GraphsMarios Papachristou, Rishab Goel, Frank Portman et al.
In graph learning, there have been two predominant inductive biases regarding graph-inspired architectures: On the one hand, higher-order interactions and message passing work well on homophilous graphs and are leveraged by GCNs and GATs. Such architectures, however, cannot easily scale to large real-world graphs. On the other hand, shallow (or node-level) models using ego features and adjacency embeddings work well in heterophilous graphs. In this work, we propose a novel scalable shallow method -- GLINKX -- that can work both on homophilous and heterophilous graphs. GLINKX leverages (i) novel monophilous label propagations, (ii) ego/node features, (iii) knowledge graph embeddings as positional embeddings, (iv) node-level training, and (v) low-dimensional message passing. Formally, we prove novel error bounds and justify the components of GLINKX. Experimentally, we show its effectiveness on several homophilous and heterophilous datasets.
CLNov 3, 2023
Leveraging Large Language Models for Collective Decision-MakingMarios Papachristou, Longqi Yang, Chin-Chia Hsu
In various work contexts, such as meeting scheduling, collaborating, and project planning, collective decision-making is essential but often challenging due to diverse individual preferences, varying work focuses, and power dynamics among members. To address this, we propose a system leveraging Large Language Models (LLMs) to facilitate group decision-making by managing conversations and balancing preferences among individuals. Our system aims to extract individual preferences from each member's conversation with the system and suggest options that satisfy the preferences of the members. We specifically apply this system to corporate meeting scheduling. We create synthetic employee profiles and simulate conversations at scale, leveraging LLMs to evaluate the system performance as a novel approach to conducting a user study. Our results indicate efficient coordination with reduced interactions between the members and the LLM-based system. The system refines and improves its proposed options over time, ensuring that many of the members' individual preferences are satisfied in an equitable way. Finally, we conduct a survey study involving human participants to assess our system's ability to aggregate preferences and reasoning about them. Our findings show that the system exhibits strong performance in both dimensions.
LGFeb 25, 2021Code
Truncated Log-concave Sampling with Reflective Hamiltonian Monte CarloApostolos Chalkis, Vissarion Fisikopoulos, Marios Papachristou et al.
We introduce Reflective Hamiltonian Monte Carlo (ReHMC), an HMC-based algorithm, to sample from a log-concave distribution restricted to a convex body. We prove that, starting from a warm start, the walk mixes to a log-concave target distribution $π(x) \propto e^{-f(x)}$, where $f$ is $L$-smooth and $m$-strongly-convex, within accuracy $\varepsilon$ after $\widetilde O(κd^2 \ell^2 \log (1 / \varepsilon))$ steps for a well-rounded convex body where $κ= L / m$ is the condition number of the negative log-density, $d$ is the dimension, $\ell$ is an upper bound on the number of reflections, and $\varepsilon$ is the accuracy parameter. We also developed an efficient open source implementation of ReHMC and we performed an experimental study on various high-dimensional data-sets. The experiments suggest that ReHMC outperfroms Hit-and-Run and Coordinate-Hit-and-Run regarding the time it needs to produce an independent sample and introduces practical truncated sampling in thousands of dimensions.
SIFeb 16, 2024
Network Formation and Dynamics Among Multi-LLMsMarios Papachristou, Yuan Yuan
Social networks profoundly influence how humans form opinions, exchange information, and organize collectively. As large language models (LLMs) are increasingly embedded into social and professional environments, it is critical to understand whether their interactions approximate human-like network dynamics. We develop a framework to study the network formation behaviors of multiple LLM agents and benchmark them against human decisions. Across synthetic and real-world settings, including friendship, telecommunication, and employment networks, we find that LLMs consistently reproduce fundamental micro-level principles such as preferential attachment, triadic closure, and homophily, as well as macro-level properties including community structure and small-world effects. Importantly, the relative emphasis of these principles adapts to context: for example, LLMs favor homophily in friendship networks but heterophily in organizational settings, mirroring patterns of social mobility. A controlled human-subject survey confirms strong alignment between LLMs and human participants in link-formation decisions. These results establish that LLMs can serve as powerful tools for social simulation and synthetic data generation, while also raising critical questions about bias, fairness, and the design of AI systems that participate in human networks.
LGFeb 13, 2024
Differentially Private Distributed InferenceMarios Papachristou, M. Amin Rahimian
How can agents exchange information to learn while protecting privacy? Healthcare centers collaborating on clinical trials must balance knowledge sharing with safeguarding sensitive patient data. We address this challenge by using differential privacy (DP) to control information leakage. Agents update belief statistics via log-linear rules, and DP noise provides plausible deniability and rigorous performance guarantees. We study two settings: distributed maximum likelihood estimation (MLE) with a finite set of private signals and online learning from an intermittent signal stream. Noisy aggregation introduces trade-offs between rejecting low-quality states and accepting high-quality ones. The MLE setting naturally applies to hypothesis testing with formal statistical guarantees. Through simulations, we demonstrate differentially private, distributed survival analysis on real-world clinical trial data, evaluating treatment efficacy and the impact of biomedical indices on patient survival. Our methods enable privacy-preserving inference with greater efficiency and lower error rates than homomorphic encryption and first-order DP optimization approaches.