LGJun 30, 2022
On Computing Probabilistic Explanations for Decision TreesMarcelo Arenas, Pablo Barceló, Miguel Romero et al.
Formal XAI (explainable AI) is a growing area that focuses on computing explanations with mathematical guarantees for the decisions made by ML models. Inside formal XAI, one of the most studied cases is that of explaining the choices taken by decision trees, as they are traditionally deemed as one of the most interpretable classes of models. Recent work has focused on studying the computation of "sufficient reasons", a kind of explanation in which given a decision tree $T$ and an instance $x$, one explains the decision $T(x)$ by providing a subset $y$ of the features of $x$ such that for any other instance $z$ compatible with $y$, it holds that $T(z) = T(x)$, intuitively meaning that the features in $y$ are already enough to fully justify the classification of $x$ by $T$. It has been argued, however, that sufficient reasons constitute a restrictive notion of explanation, and thus the community has started to study their probabilistic counterpart, in which one requires that the probability of $T(z) = T(x)$ must be at least some value $δ\in (0, 1]$, where $z$ is a random instance that is compatible with $y$. Our paper settles the computational complexity of $δ$-sufficient-reasons over decision trees, showing that both (1) finding $δ$-sufficient-reasons that are minimal in size, and (2) finding $δ$-sufficient-reasons that are minimal inclusion-wise, do not admit polynomial-time algorithms (unless P=NP). This is in stark contrast with the deterministic case ($δ= 1$) where inclusion-wise minimal sufficient-reasons are easy to compute. By doing this, we answer two open problems originally raised by Izza et al. On the positive side, we identify structural restrictions of decision trees that make the problem tractable, and show how SAT solvers might be able to tackle these problems in practical settings.
LGFeb 4, 2023
A Theory of Link Prediction via Relational Weisfeiler-Leman on Knowledge GraphsXingyue Huang, Miguel Romero Orth, İsmail İlkan Ceylan et al.
Graph neural networks are prominent models for representation learning over graph-structured data. While the capabilities and limitations of these models are well-understood for simple graphs, our understanding remains incomplete in the context of knowledge graphs. Our goal is to provide a systematic understanding of the landscape of graph neural networks for knowledge graphs pertaining to the prominent task of link prediction. Our analysis entails a unifying perspective on seemingly unrelated models and unlocks a series of other models. The expressive power of various models is characterized via a corresponding relational Weisfeiler-Leman algorithm. This analysis is extended to provide a precise logical characterization of the class of functions captured by a class of graph neural networks. The theoretical findings presented in this paper explain the benefits of some widely employed practical design choices, which are validated empirically.
LGMar 22, 2023
Three iterations of $(d-1)$-WL test distinguish non isometric clouds of $d$-dimensional pointsValentino Delle Rose, Alexander Kozachinskiy, Cristóbal Rojas et al.
The Weisfeiler--Lehman (WL) test is a fundamental iterative algorithm for checking isomorphism of graphs. It has also been observed that it underlies the design of several graph neural network architectures, whose capabilities and performance can be understood in terms of the expressive power of this test. Motivated by recent developments in machine learning applications to datasets involving three-dimensional objects, we study when the WL test is {\em complete} for clouds of euclidean points represented by complete distance graphs, i.e., when it can distinguish, up to isometry, any arbitrary such cloud. %arbitrary clouds of euclidean points represented by complete distance graphs. % How many dimensions of the Weisfeiler--Lehman test is enough to distinguish any two non-isometric point clouds in $d$-dimensional Euclidean space, assuming that these point clouds are given as complete graphs labeled by distances between the points? This question is important for understanding, which architectures of graph neural networks are capable of fully exploiting the spacial structure of a point cloud. Our main result states that the $(d-1)$-dimensional WL test is complete for point clouds in $d$-dimensional Euclidean space, for any $d\ge 2$, and that only three iterations of the test suffice. We also observe that the $d$-dimensional WL test only requires one iteration to achieve completeness. Our paper thus provides complete understanding of the 3-dimensional case: it was shown in previous works that 1-WL is not complete in $\mathbb{R}^3$, and we show that 2-WL is complete there. We also strengthen the lower bound for 1-WL by showing that it is unable to recognize planar point clouds in $\mathbb{R}^3$. Finally, we show that 2-WL is not complete in $\mathbb{R}^6$, leaving as an open question, whether it is complete in $\mathbb{R}^{d}$ for $d = 4,5$.
LGDec 22, 2025
A Logical View of GNN-Style Computation and the Role of Activation FunctionsPablo Barceló, Floris Geerts, Matthias Lanzinger et al.
We study the numerical and Boolean expressiveness of MPLang, a declarative language that captures the computation of graph neural networks (GNNs) through linear message passing and activation functions. We begin with A-MPLang, the fragment without activation functions, and give a characterization of its expressive power in terms of walk-summed features. For bounded activation functions, we show that (under mild conditions) all eventually constant activations yield the same expressive power - numerical and Boolean - and that it subsumes previously established logics for GNNs with eventually constant activation functions but without linear layers. Finally, we prove the first expressive separation between unbounded and bounded activations in the presence of linear layers: MPLang with ReLU is strictly more powerful for numerical queries than MPLang with eventually constant activation functions, e.g., truncated ReLU. This hinges on subtle interactions between linear aggregation and eventually constant non-linearities, and it establishes that GNNs using ReLU are more expressive than those restricted to eventually constant activations and linear layers.
LGOct 6, 2023
A Neuro-Symbolic Framework for Answering Graph Pattern Queries in Knowledge GraphsTamara Cucumides, Daniel Daza, Pablo Barceló et al.
The challenge of answering graph queries over incomplete knowledge graphs is gaining significant attention in the machine learning community. Neuro-symbolic models have emerged as a promising approach, combining good performance with high interpretability. These models utilize trained architectures to execute atomic queries and integrate modules that mimic symbolic query operators. However, most neuro-symbolic query processors are constrained to tree-like graph pattern queries. These queries admit a bottom-up execution with constant values or anchors at the leaves and the target variable at the root. While expressive, tree-like queries fail to capture critical properties in knowledge graphs, such as the existence of multiple edges between entities or the presence of triangles. We introduce a framework for answering arbitrary graph pattern queries over incomplete knowledge graphs, encompassing both cyclic queries and tree-like queries with existentially quantified leaves. These classes of queries are vital for practical applications but are beyond the scope of most current neuro-symbolic models. Our approach employs an approximation scheme that facilitates acyclic traversals for cyclic patterns, thereby embedding additional symbolic bias into the query execution process. Our experimental evaluation demonstrates that our framework performs competitively on three datasets, effectively handling cyclic queries through our approximation strategy. Additionally, it maintains the performance of existing neuro-symbolic models on anchored tree-like queries and extends their capabilities to queries with existentially quantified variables.
LOMay 19
Satisfiability for Knowing How over Linear Plans is NP-completeCarlos Areces, Pablo Barceló, Valentin Cassano et al.
We study the satisfiability problem for a modal logic expressing knowing-how assertions, which captures an agent's ability to achieve a given goal under the standard semantics based on linear plans. Our main result shows that satisfiability of knowing-how formulas is NP-complete, improving previously known complexity bounds. The proof proceeds via a translation into modal logic S5, an instrumental tool for addressing a variety of problems in knowledge representation.
LGSep 29, 2023
On the Power of the Weisfeiler-Leman Test for Graph Motif ParametersMatthias Lanzinger, Pablo Barceló
Seminal research in the field of graph neural networks (GNNs) has revealed a direct correspondence between the expressive capabilities of GNNs and the $k$-dimensional Weisfeiler-Leman ($k$WL) test, a widely-recognized method for verifying graph isomorphism. This connection has reignited interest in comprehending the specific graph properties effectively distinguishable by the $k$WL test. A central focus of research in this field revolves around determining the least dimensionality $k$, for which $k$WL can discern graphs with different number of occurrences of a pattern graph $P$. We refer to such a least $k$ as the WL-dimension of this pattern counting problem. This inquiry traditionally delves into two distinct counting problems related to patterns: subgraph counting and induced subgraph counting. Intriguingly, despite their initial appearance as separate challenges with seemingly divergent approaches, both of these problems are interconnected components of a more comprehensive problem: "graph motif parameters". In this paper, we provide a precise characterization of the WL-dimension of labeled graph motif parameters. As specific instances of this result, we obtain characterizations of the WL-dimension of the subgraph counting and induced subgraph counting problem for every labeled pattern $P$. We additionally demonstrate that in cases where the $k$WL test distinguishes between graphs with varying occurrences of a pattern $P$, the exact number of occurrences of $P$ can be computed uniformly using only local information of the last layer of a corresponding GNN. We finally delve into the challenge of recognizing the WL-dimension of various graph parameters. We give a polynomial time algorithm for determining the WL-dimension of the subgraph counting problem for given pattern $P$, answering an open question from previous work.
CLNov 7, 2025
Language Generation: Complexity Barriers and Implications for LearningMarcelo Arenas, Pablo Barceló, Luis Cofré et al.
Kleinberg and Mullainathan showed that, in principle, language generation is always possible: with sufficiently many positive examples, a learner can eventually produce sentences indistinguishable from those of a target language. However, the existence of such a guarantee does not speak to its practical feasibility. In this work, we show that even for simple and well-studied language families -- such as regular and context-free languages -- the number of examples required for successful generation can be extraordinarily large, and in some cases not bounded by any computable function. These results reveal a substantial gap between theoretical possibility and efficient learnability. They suggest that explaining the empirical success of modern language models requires a refined perspective -- one that takes into account structural properties of natural language that make effective generation possible in practice.
AINov 3, 2022
No Agreement Without Loss: Learning and Social Choice in Peer ReviewPablo Barceló, Mauricio Duarte, Cristóbal Rojas et al.
In peer review systems, reviewers are often asked to evaluate various features of submissions, such as technical quality or novelty. A score is given to each of the predefined features and based on these the reviewer has to provide an overall quantitative recommendation. It may be assumed that each reviewer has her own mapping from the set of features to a recommendation, and that different reviewers have different mappings in mind. This introduces an element of arbitrariness known as commensuration bias. In this paper we discuss a framework, introduced by Noothigattu, Shah and Procaccia, and then applied by the organizers of the AAAI 2022 conference. Noothigattu, Shah and Procaccia proposed to aggregate reviewer's mapping by minimizing certain loss functions, and studied axiomatic properties of this approach, in the sense of social choice theory. We challenge several of the results and assumptions used in their work and report a number of negative results. On the one hand, we study a trade-off between some of the axioms proposed and the ability of the method to properly capture agreements of the majority of reviewers. On the other hand, we show that dropping a certain unrealistic assumption has dramatic effects, including causing the method to be discontinuous.
LGFeb 18, 2025
How Expressive are Knowledge Graph Foundation Models?Xingyue Huang, Pablo Barceló, Michael M. Bronstein et al. · deepmind
Knowledge Graph Foundation Models (KGFMs) are at the frontier for deep learning on knowledge graphs (KGs), as they can generalize to completely novel knowledge graphs with different relational vocabularies. Despite their empirical success, our theoretical understanding of KGFMs remains very limited. In this paper, we conduct a rigorous study of the expressive power of KGFMs. Specifically, we show that the expressive power of KGFMs directly depends on the motifs that are used to learn the relation representations. We then observe that the most typical motifs used in the existing literature are binary, as the representations are learned based on how pairs of relations interact, which limits the model's expressiveness. As part of our study, we design more expressive KGFMs using richer motifs, which necessitate learning relation representations based on, e.g., how triples of relations interact with each other. Finally, we empirically validate our theoretical findings, showing that the use of richer motifs results in better performance on a wide range of datasets drawn from different domains.
LGJan 10, 2025
Explaining k-Nearest Neighbors: Abductive and Counterfactual ExplanationsPablo Barceló, Alexander Kozachinskiy, Miguel Romero Orth et al.
Despite the wide use of $k$-Nearest Neighbors as classification models, their explainability properties remain poorly understood from a theoretical perspective. While nearest neighbors classifiers offer interpretability from a "data perspective", in which the classification of an input vector $\bar{x}$ is explained by identifying the vectors $\bar{v}_1, \ldots, \bar{v}_k$ in the training set that determine the classification of $\bar{x}$, we argue that such explanations can be impractical in high-dimensional applications, where each vector has hundreds or thousands of features and it is not clear what their relative importance is. Hence, we focus on understanding nearest neighbor classifications through a "feature perspective", in which the goal is to identify how the values of the features in $\bar{x}$ affect its classification. Concretely, we study abductive explanations such as "minimum sufficient reasons", which correspond to sets of features in $\bar{x}$ that are enough to guarantee its classification, and "counterfactual explanations" based on the minimum distance feature changes one would have to perform in $\bar{x}$ to change its classification. We present a detailed landscape of positive and negative complexity results for counterfactual and abductive explanations, distinguishing between discrete and continuous feature spaces, and considering the impact of the choice of distance function involved. Finally, we show that despite some negative complexity results, Integer Quadratic Programming and SAT solving allow for computing explanations in practice.
LGFeb 6, 2024
Link Prediction with Relational HypergraphsXingyue Huang, Miguel Romero Orth, Pablo Barceló et al.
Link prediction with knowledge graphs has been thoroughly studied in graph machine learning, leading to a rich landscape of graph neural network architectures with successful applications. Nonetheless, it remains challenging to transfer the success of these architectures to relational hypergraphs, where the task of link prediction is over $k$-ary relations, which is substantially harder than link prediction with knowledge graphs. In this paper, we propose a framework for link prediction with relational hypergraphs, unlocking applications of graph neural networks to fully relational structures. Theoretically, we conduct a thorough analysis of the expressive power of the resulting model architectures via corresponding relational Weisfeiler-Leman algorithms and also via logical expressiveness. Empirically, we validate the power of the proposed model architectures on various relational hypergraph benchmarks. The resulting model architectures substantially outperform every baseline for inductive link prediction, and lead to state-of-the-art results for transductive link prediction.
LGJan 22, 2025
Ehrenfeucht-Haussler Rank and Chain of ThoughtPablo Barceló, Alexander Kozachinskiy, Tomasz Steifer
The notion of \emph{rank} of a Boolean function has been a cornerstone in PAC learning theory, enabling quasipolynomial-time learning algorithms for polynomial-size decision trees. We present a novel characterization of rank, grounded in the well-known Transformer architecture. We show that the rank of a function $f$ corresponds to the minimum number of \emph{Chain of Thought} (CoT) steps required by a single-layer Transformer with hard attention to compute $f$. Based on this characterization we establish tight bounds on the number of CoT steps required for specific problems, showing that \(\ell\)-fold function composition necessitates exactly \(\ell\) CoT steps. Furthermore, we analyze the problem of identifying the position of the \(k\)-th occurrence of 1 in a Boolean sequence, proving that it requires \(k\) CoT steps. Finally, we introduce the notion of the multi-head rank that captures multi-head single-layer transformers, and perform the analysis of PAC-learnability of the classes of functions with bounded multi-head rank.
LGOct 15, 2025
Message Passing on the Edge: Towards Scalable and Expressive GNNsPablo Barceló, Fabian Jogl, Alexander Kozachinskiy et al.
We propose EB-1WL, an edge-based color-refinement test, and a corresponding GNN architecture, EB-GNN. Our architecture is inspired by a classic triangle counting algorithm by Chiba and Nishizeki, and explicitly uses triangles during message passing. We achieve the following results: (1)~EB-1WL is significantly more expressive than 1-WL. Further, we provide a complete logical characterization of EB-1WL based on first-order logic, and matching distinguishability results based on homomorphism counting. (2)~In an important distinction from previous proposals for more expressive GNN architectures, EB-1WL and EB-GNN require near-linear time and memory on practical graph learning tasks. (3)~Empirically, we show that EB-GNN is a highly-efficient general-purpose architecture: It substantially outperforms simple MPNNs, and remains competitive with task-specialized GNNs while being significantly more computationally efficient.
AIOct 5, 2021
Foundations of Symbolic Languages for Model InterpretabilityMarcelo Arenas, Daniel Baez, Pablo Barceló et al.
Several queries and scores have recently been proposed to explain individual predictions over ML models. Given the need for flexible, reliable, and easy-to-apply interpretability methods for ML models, we foresee the need for developing declarative languages to naturally specify different explainability queries. We do this in a principled way by rooting such a language in a logic, called FOIL, that allows for expressing many simple but important explainability queries, and might serve as a core for more expressive interpretability languages. We study the computational complexity of FOIL queries over two classes of ML models often deemed to be easily interpretable: decision trees and OBDDs. Since the number of possible inputs for an ML model is exponential in its dimension, the tractability of the FOIL evaluation problem is delicate but can be achieved by either restricting the structure of the models or the fragment of FOIL being evaluated. We also present a prototype implementation of FOIL wrapped in a high-level declarative language and perform experiments showing that such a language can be used in practice.
LGJun 12, 2021
Graph Neural Networks with Local Graph ParametersPablo Barceló, Floris Geerts, Juan Reutter et al.
Various recent proposals increase the distinguishing power of Graph Neural Networks GNNs by propagating features between $k$-tuples of vertices. The distinguishing power of these "higher-order'' GNNs is known to be bounded by the $k$-dimensional Weisfeiler-Leman (WL) test, yet their $\mathcal O(n^k)$ memory requirements limit their applicability. Other proposals infuse GNNs with local higher-order graph structural information from the start, hereby inheriting the desirable $\mathcal O(n)$ memory requirement from GNNs at the cost of a one-time, possibly non-linear, preprocessing step. We propose local graph parameter enabled GNNs as a framework for studying the latter kind of approaches and precisely characterize their distinguishing power, in terms of a variant of the WL test, and in terms of the graph structural properties that they can take into account. Local graph parameters can be added to any GNN architecture, and are cheap to compute. In terms of expressive power, our proposal lies in the middle of GNNs and their higher-order counterparts. Further, we propose several techniques to aide in choosing the right local graph parameters. Our results connect GNNs with deep results in finite model theory and finite variable logics. Our experimental evaluation shows that adding local graph parameters often has a positive effect for a variety of GNNs, datasets and graph learning tasks.
AIApr 16, 2021
On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability ResultsMarcelo Arenas, Pablo Barceló, Leopoldo Bertossi et al.
In Machine Learning, the $\mathsf{SHAP}$-score is a version of the Shapley value that is used to explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is an intractable problem, we prove a strong positive result stating that the $\mathsf{SHAP}$-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits are studied in the field of Knowledge Compilation and generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees and Ordered Binary Decision Diagrams (OBDDs). We also establish the computational limits of the SHAP-score by observing that computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider. It also implies that computing $\mathsf{SHAP}$-scores is intractable as well over the class of propositional formulas in DNF. Based on this negative result, we look for the existence of fully-polynomial randomized approximation schemes (FPRAS) for computing $\mathsf{SHAP}$-scores over such class. In contrast to the model counting problem for DNF formulas, which admits an FPRAS, we prove that no such FPRAS exists for the computation of $\mathsf{SHAP}$-scores. Surprisingly, this negative result holds even for the class of monotone formulas in DNF. These techniques can be further extended to prove another strong negative result: Under widely believed complexity assumptions, there is no polynomial-time algorithm that checks, given a monotone DNF formula $\varphi$ and features $x,y$, whether the $\mathsf{SHAP}$-score of $x$ in $\varphi$ is smaller than the $\mathsf{SHAP}$-score of $y$ in $\varphi$.
AIOct 23, 2020
Model Interpretability through the Lens of Computational ComplexityPablo Barceló, Mikaël Monet, Jorge Pérez et al.
In spite of several claims stating that some models are more interpretable than others -- e.g., "linear models are more interpretable than deep neural networks" -- we still lack a principled notion of interpretability to formally compare among different classes of models. We make a step towards such a notion by studying whether folklore interpretability claims have a correlate in terms of computational complexity theory. We focus on local post-hoc explainability queries that, intuitively, attempt to answer why individual inputs are classified in a certain way by a given model. In a nutshell, we say that a class $\mathcal{C}_1$ of models is more interpretable than another class $\mathcal{C}_2$, if the computational complexity of answering post-hoc queries for models in $\mathcal{C}_2$ is higher than for those in $\mathcal{C}_1$. We prove that this notion provides a good theoretical counterpart to current beliefs on the interpretability of models; in particular, we show that under our definition and assuming standard complexity-theoretical assumptions (such as P$\neq$NP), both linear and tree-based models are strictly more interpretable than neural networks. Our complexity analysis, however, does not provide a clear-cut difference between linear and tree-based models, as we obtain different results depending on the particular post-hoc explanations considered. Finally, by applying a finer complexity analysis based on parameterized complexity, we are able to prove a theoretical result suggesting that shallow neural networks are more interpretable than deeper ones.
LGJan 10, 2019
On the Turing Completeness of Modern Neural Network ArchitecturesJorge Pérez, Javier Marinković, Pablo Barceló
Alternatives to recurrent neural networks, in particular, architectures based on attention or convolutions, have been gaining momentum for processing input sequences. In spite of their relevance, the computational properties of these alternatives have not yet been fully explored. We study the computational power of two of the most paradigmatic architectures exemplifying these mechanisms: the Transformer (Vaswani et al., 2017) and the Neural GPU (Kaiser & Sutskever, 2016). We show both models to be Turing complete exclusively based on their capacity to compute and access internal dense representations of the data. In particular, neither the Transformer nor the Neural GPU requires access to an external memory to become Turing complete. Our study also reveals some minimal sets of elements needed to obtain these completeness results.