Marcelo Arenas

h-index: 43

12 papers

8,420 citations

Novelty: 51%

AI Score: 33

Ranked #116,544 of 194,257 authors (top 60%); #7,133 in AI (top 57%)

12 Papers

22.4 · LG · Jun 30, 2022

On Computing Probabilistic Explanations for Decision Trees

Marcelo Arenas, Pablo Barceló, Miguel Romero et al.

Formal XAI (explainable AI) is a growing area that focuses on computing explanations with mathematical guarantees for the decisions made by ML models. Inside formal XAI, one of the most studied cases is that of explaining the choices taken by decision trees, as they are traditionally deemed as one of the most interpretable classes of models. Recent work has focused on studying the computation of "sufficient reasons", a kind of explanation in which given a decision tree $T$ and an instance $x$, one explains the decision $T(x)$ by providing a subset $y$ of the features of $x$ such that for any other instance $z$ compatible with $y$, it holds that $T(z) = T(x)$, intuitively meaning that the features in $y$ are already enough to fully justify the classification of $x$ by $T$. It has been argued, however, that sufficient reasons constitute a restrictive notion of explanation, and thus the community has started to study their probabilistic counterpart, in which one requires that the probability of $T(z) = T(x)$ must be at least some value $δ\in (0, 1]$, where $z$ is a random instance that is compatible with $y$. Our paper settles the computational complexity of $δ$-sufficient-reasons over decision trees, showing that both (1) finding $δ$-sufficient-reasons that are minimal in size, and (2) finding $δ$-sufficient-reasons that are minimal inclusion-wise, do not admit polynomial-time algorithms (unless P=NP). This is in stark contrast with the deterministic case ($δ= 1$) where inclusion-wise minimal sufficient-reasons are easy to compute. By doing this, we answer two open problems originally raised by Izza et al. On the positive side, we identify structural restrictions of decision trees that make the problem tractable, and show how SAT solvers might be able to tackle these problems in practical settings.
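The two notions in this abstract can be illustrated on a toy tree. The sketch below is not the authors' algorithm: it assumes binary features, a uniform distribution over the features left free, and a tree that tests each feature at most once per root-to-leaf path. The tree encoding and all function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical encoding: a tree is a leaf label (0 or 1) or a tuple
# (feature_index, subtree_if_0, subtree_if_1).

def classify(tree, x):
    """Follow the path selected by instance x down to a leaf label."""
    while not isinstance(tree, int):
        feature, low, high = tree
        tree = high if x[feature] else low
    return tree

def agreement_probability(tree, x, kept, target):
    """Pr[T(z) = target] for z uniform among instances equal to x on `kept`.

    Assumes each feature is tested at most once on any root-to-leaf path.
    """
    if isinstance(tree, int):
        return Fraction(int(tree == target))
    feature, low, high = tree
    if feature in kept:
        # The feature is fixed to its value in x: follow a single branch.
        branch = high if x[feature] else low
        return agreement_probability(branch, x, kept, target)
    # The feature is free: both branches are equally likely.
    return (agreement_probability(low, x, kept, target)
            + agreement_probability(high, x, kept, target)) / 2

def is_delta_sufficient(tree, x, kept, delta):
    """Check whether the feature set `kept` is a delta-sufficient reason for x."""
    return agreement_probability(tree, x, kept, classify(tree, x)) >= delta

def minimal_sufficient_reason(tree, x):
    """Greedy deletion for the deterministic case (delta = 1).

    Supersets of sufficient reasons are sufficient, so one pass yields an
    inclusion-wise minimal set. This monotonicity fails for delta < 1.
    """
    kept = set(range(len(x)))
    for feature in range(len(x)):
        if is_delta_sufficient(tree, x, kept - {feature}, 1):
            kept.discard(feature)
    return kept

# Toy tree: output 1 if x0 = 1, otherwise output x1; feature 2 is never tested.
toy_tree = (0, (1, 0, 1), 1)
toy_instance = (1, 1, 0)
print(minimal_sufficient_reason(toy_tree, toy_instance))
print(agreement_probability(toy_tree, toy_instance, set(), 1))
```

On this toy input, the greedy pass keeps only feature 1, while the empty set already agrees with the original decision with probability 3/4. A probabilistic explanation can therefore be strictly smaller than any deterministic one.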

3.3 · LO · Oct 18, 2023 · Code

A Uniform Language to Explain Decision Trees

Marcelo Arenas, Pablo Barceló, Diego Bustamante et al.

The formal XAI community has studied a plethora of interpretability queries aiming to understand the classifications made by decision trees. However, a more uniform understanding of what questions we can hope to answer about these models, traditionally deemed to be easily interpretable, has remained elusive. In an initial attempt to understand uniform languages for interpretability, Arenas et al. (2021) proposed FOIL, a logic for explaining black-box ML models, and showed that it can express a variety of interpretability queries. However, we show that FOIL is limited in two important senses: (i) it is not expressive enough to capture some crucial queries, and (ii) its model agnostic nature results in a high computational complexity for decision trees. In this paper, we carefully craft two fragments of first-order logic that allow for efficiently interpreting decision trees: Q-DT-FOIL and its optimization variant OPT-DT-FOIL. We show that our proposed logics can not only express a variety of interpretability queries considered by previous literature, but also elegantly allow users to specify different objectives the sought explanations should optimize for. Using finite model-theoretic techniques, we show that the different ingredients of Q-DT-FOIL are necessary for its expressiveness, and yet that queries in Q-DT-FOIL can be evaluated with a polynomial number of queries to a SAT solver, as well as their optimization versions in OPT-DT-FOIL. Besides our theoretical results, we provide a SAT-based implementation of the evaluation for OPT-DT-FOIL that is performant on industry-size decision trees.

6.3 · AI · Jul 7

ExplAIner: A Declarative Query Language for Explaining Classification Models

Marcelo Arenas, Pablo Barceló, Diego Bustamante et al.

The XAI community has studied a wide range of queries and scores for explaining predictions of ML models. From a data management perspective, this proliferation of explanation notions calls for declarative query languages in which such notions can be specified, combined, and analyzed uniformly. In this paper, we develop such a framework for Boolean models. We first revisit FOIL, an interpretability query language for black-box models, and show that it has two fundamental limitations: it cannot express central optimality-based explanation queries, and its evaluation problem over decision trees is hard for every level of the polynomial hierarchy. We then introduce ExplAIner, a query language based on FOIL with an extended vocabulary and a layered structure. We show that ExplAIner can express a broad family of explanation notions, including abductive, contrastive, feature-based, and distance-based queries. We also prove that the evaluation problem for each query in ExplAIner belongs to the Boolean hierarchy over every class of Boolean models for which some basic predicates can be evaluated in polynomial time. In particular, that property holds for deterministic and decomposable Boolean circuits. Finally, we introduce Opt-FOIL, an optimization-oriented fragment of ExplAIner for computing explanations that are minimal with respect to strict partial orders, and prove that its evaluation problem is in $\mathrm{FP}^{\mathrm{NP}}$ under the same tractability assumptions. These complexity results have a direct algorithmic consequence: a fixed ExplAIner query can be evaluated with a fixed number of calls to a SAT solver, while a notion of explanation specified in Opt-FOIL can be computed with a polynomial number of such calls. This is particularly relevant in formal XAI, where SAT solvers have been successfully used to compute explanations for several classes of ML models.

9.2 · CC · Jun 28

On the Complexity of Counting Orderings in Graphs

Marcelo Arenas, María Alejandra Schild, Bernardo Subercaseaux

We study the computational complexity of several counting problems on graphs. Each of these problems consists of counting orderings of the vertices or edges with adjacency constraints. We show $\#P$-completeness for all of them via a common new technique. Given a counting function $C$ of interest, we define a parameterized family of instances $G_q$, where the parameter $q$ controls the amplification of a simple gadget. After multiplying by an explicit factor $f(q)$, we show that the values of $f(q) \cdot C(G_q)$, for positive integers $q$, agree with a rational function in $q$ whose numerator and denominator can be interpolated in polynomial time. We then recover a $\#P$-hard function by evaluating this rational function symbolically at a limiting value $L \in \mathbb{Q} \cup \{\infty, -\infty\}$. With this methodology, we show $\#P$-completeness for the following counting problems: (a) successive vertex orderings of bipartite graphs, (b) st-numberings of graphs, (c) shellings of bipartite graphs, (d) linear extensions of N-free posets of height $3$, and (e) linear extensions of posets of height $2$. Result (d) settles a conjecture of Felsner and Manneville (2015). Although result (e) was first proved by Dittmer and Pak (2018), we include an alternative proof, using our technique, that does not rely on the result of Brightwell and Winkler (1991) about the hardness of counting linear extensions for general posets.
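To make the counted objects concrete, here is a minimal sketch that counts the linear extensions of a small poset by dynamic programming over subsets. It takes exponential time, which is consistent with the hardness results above, and it is unrelated to the interpolation technique of the paper. The function name and the input encoding (elements `0..n-1`, pairs `(a, b)` meaning `a < b`) are assumptions made for illustration.

```python
from functools import lru_cache

def count_linear_extensions(n, relations):
    """Count total orders of elements 0..n-1 that respect every pair a < b."""
    # preds[v] is a bitmask of the elements that must come before v.
    preds = [0] * n
    for a, b in relations:
        preds[b] |= 1 << a
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def count(placed):
        # `placed` is the bitmask of elements already put in the order.
        if placed == full:
            return 1
        total = 0
        for v in range(n):
            not_placed = ((placed >> v) & 1) == 0
            preds_ready = (preds[v] & placed) == preds[v]
            if not_placed and preds_ready:
                total += count(placed | (1 << v))
        return total

    return count(0)

# The "N" poset of height 2: 0 < 2, 1 < 2, 1 < 3.
print(count_linear_extensions(4, [(0, 2), (1, 2), (1, 3)]))
```

The memoised function visits at most 2^n subsets, so this is only practical for small posets.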

11.5 · DB · Jun 22

A Compositional Language for Property Graphs

Marcelo Arenas, Leonid Libkin, Wim Martens

A major shortcoming of the recently standardized graph query languages GQL and SQL/PGQ is their lack of compositionality. Given the importance of these languages in querying knowledge graphs, we address this shortcoming and propose both theoretical solutions and a path to adding them to the new standards. The highlight of the non-compositionality problem is that while both GQL and SQL/PGQ can express graph reachability and all first-order queries, they fall short of the problems in NLOGSPACE. In view of the completeness of reachability for NLOGSPACE under first-order reductions, this is extremely counterintuitive. The issue is well recognized by the standards committee that has been searching for language extensions to fill the gaps at the level of some specific inexpressible queries. We address the issue in a systematic way and propose a language that fills expressivity gaps by allowing full compositionality between graph patterns and relational queries. It does so by using two key components: a cleaned up definition of regular path queries with variables and data value comparisons, and a fully compositional graph-to-graph language #Datalog with complete support for constructing new graph elements from nodes, edges, lists of nodes and edges, and even entire paths. We show that the resulting language addresses the issues facing the standards committee, and propose a concrete addition to GQL and SQL/PGQ that incorporates its main features.
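For readers unfamiliar with regular path queries, the sketch below evaluates a classical one over an edge-labelled graph by searching the product of the graph with a finite automaton. It uses arbitrary-path semantics, has no variables or data value comparisons, and is not the language proposed in the paper. The function name, the edge encoding, and the automaton encoding are all hypothetical.

```python
from collections import deque

def rpq_targets(edges, transitions, start_state, accepting, source):
    """Nodes reachable from `source` along a path whose label word is accepted.

    edges: iterable of (node, label, node) triples.
    transitions: dict mapping (state, label) to a set of successor states.
    """
    adjacency = {}
    for u, label, v in edges:
        adjacency.setdefault(u, []).append((label, v))

    # Breadth-first search over (graph node, automaton state) pairs.
    seen = {(source, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for label, target in adjacency.get(node, []):
            for next_state in transitions.get((state, label), ()):
                pair = (target, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers

# Automaton for the expression knows* worksAt, with states 0 (start) and 1 (accepting).
example_edges = [("ana", "knows", "bob"), ("bob", "knows", "cy"),
                 ("cy", "worksAt", "acme")]
example_nfa = {(0, "knows"): {0}, (0, "worksAt"): {1}}
print(rpq_targets(example_edges, example_nfa, 0, {1}, "ana"))
```

Plain reachability is the special case in which the automaton accepts every word. This is why the abstract uses reachability as the reference point for NLOGSPACE.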

10.7 · AI · Nov 19, 2024

Restructuring Tractable Probabilistic Circuits

Honghua Zhang, Benjie Wang, Marcelo Arenas et al.

Probabilistic circuits (PCs) are a unifying representation for probabilistic models that support tractable inference. Numerous applications of PCs like controllable text generation depend on the ability to efficiently multiply two circuits. Existing multiplication algorithms require that the circuits respect the same structure, i.e. variable scopes decompose according to the same vtree. In this work, we propose and study the task of restructuring structured(-decomposable) PCs, that is, transforming a structured PC such that it conforms to a target vtree. We propose a generic approach for this problem and show that it leads to novel polynomial-time algorithms for multiplying circuits respecting different vtrees, as well as a practical depth-reduction algorithm that preserves structured decomposability. Our work opens up new avenues for tractable PC inference, suggesting the possibility of training with less restrictive PC structures while enabling efficient inference by changing their structures at inference time.

11.1 · AI · Dec 30, 2024

Probabilistic Explanations for Linear Models

Bernardo Subercaseaux, Marcelo Arenas, Kuldeep S Meel

Formal XAI is an emerging field that focuses on providing explanations with mathematical guarantees for the decisions made by machine learning models. A significant amount of work in this area is centered on the computation of "sufficient reasons". Given a model $M$ and an input instance $\vec{x}$, a sufficient reason for the decision $M(\vec{x})$ is a subset $S$ of the features of $\vec{x}$ such that for any instance $\vec{z}$ that has the same values as $\vec{x}$ for every feature in $S$, it holds that $M(\vec{x}) = M(\vec{z})$. Intuitively, this means that the features in $S$ are sufficient to fully justify the classification of $\vec{x}$ by $M$. For sufficient reasons to be useful in practice, they should be as small as possible, and a natural way to reduce the size of sufficient reasons is to consider a probabilistic relaxation; the probability of $M(\vec{x}) = M(\vec{z})$ must be at least some value $δ\in (0,1]$, for a random instance $\vec{z}$ that coincides with $\vec{x}$ on the features in $S$. Computing small $δ$-sufficient reasons ($δ$-SRs) is known to be a theoretically hard problem; even over decision trees--traditionally deemed simple and interpretable models--strong inapproximability results make the efficient computation of small $δ$-SRs unlikely. We propose the notion of $(δ, ε)$-SR, a simple relaxation of $δ$-SRs, and show that this kind of explanation can be computed efficiently over linear models.
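The probability that appears in the definition of a $δ$-SR can be estimated by sampling. The sketch below is a plain Monte Carlo estimate for a linear threshold classifier. It is not the $(δ, ε)$-SR algorithm of the paper, whose definition the abstract does not spell out. Binary features, a uniform distribution over the free features, and all names are assumptions made for illustration.

```python
import random

def linear_classifier(weights, bias):
    """Return a classifier z -> 1 if weights . z + bias >= 0, else 0."""
    def classify(z):
        return int(sum(w * v for w, v in zip(weights, z)) + bias >= 0)
    return classify

def estimate_agreement(classify, x, kept, samples=20000, seed=0):
    """Estimate Pr[M(z) = M(x)] for z uniform among instances equal to x on `kept`."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    target = classify(x)
    hits = 0
    for _ in range(samples):
        # Keep the features in `kept`; resample every other feature uniformly.
        z = tuple(x[i] if i in kept else rng.randint(0, 1)
                  for i in range(len(x)))
        hits += classify(z) == target
    return hits / samples

model = linear_classifier((3, 1, 1), -2)
instance = (1, 1, 1)
print(estimate_agreement(model, instance, {0}))    # feature 0 alone fixes the decision
print(estimate_agreement(model, instance, set()))  # exact value is 5/8
```

A candidate set `S` would then be accepted when the estimate is at least `δ`, up to sampling error.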

18.3 · AI · Oct 5, 2021 · Code

Foundations of Symbolic Languages for Model Interpretability

Marcelo Arenas, Daniel Baez, Pablo Barceló et al.

Several queries and scores have recently been proposed to explain individual predictions over ML models. Given the need for flexible, reliable, and easy-to-apply interpretability methods for ML models, we foresee the need for developing declarative languages to naturally specify different explainability queries. We do this in a principled way by rooting such a language in a logic, called FOIL, that allows for expressing many simple but important explainability queries, and might serve as a core for more expressive interpretability languages. We study the computational complexity of FOIL queries over two classes of ML models often deemed to be easily interpretable: decision trees and OBDDs. Since the number of possible inputs for an ML model is exponential in its dimension, the tractability of the FOIL evaluation problem is delicate but can be achieved by either restricting the structure of the models or the fragment of FOIL being evaluated. We also present a prototype implementation of FOIL wrapped in a high-level declarative language and perform experiments showing that such a language can be used in practice.

2.3 · DB · Jun 22, 2021

Querying in the Age of Graph Databases and Knowledge Graphs

Marcelo Arenas, Claudio Gutierrez, Juan F. Sequeda

Graphs have become the best way we know of representing knowledge. The computing community has investigated and developed the support for managing graphs by means of digital technology. Graph databases and knowledge graphs surface as the most successful solutions to this program. The goal of this document is to provide a conceptual map of the data management tasks underlying these developments, paying particular attention to data models and query languages for graphs.

21.9 · AI · Apr 16, 2021

On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results

Marcelo Arenas, Pablo Barceló, Leopoldo Bertossi et al.

In Machine Learning, the $\mathsf{SHAP}$-score is a version of the Shapley value that is used to explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is an intractable problem, we prove a strong positive result stating that the $\mathsf{SHAP}$-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits are studied in the field of Knowledge Compilation and generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees and Ordered Binary Decision Diagrams (OBDDs). We also establish the computational limits of the SHAP-score by observing that computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider. It also implies that computing $\mathsf{SHAP}$-scores is intractable as well over the class of propositional formulas in DNF. Based on this negative result, we look for the existence of fully-polynomial randomized approximation schemes (FPRAS) for computing $\mathsf{SHAP}$-scores over such class. In contrast to the model counting problem for DNF formulas, which admits an FPRAS, we prove that no such FPRAS exists for the computation of $\mathsf{SHAP}$-scores. Surprisingly, this negative result holds even for the class of monotone formulas in DNF. These techniques can be further extended to prove another strong negative result: Under widely believed complexity assumptions, there is no polynomial-time algorithm that checks, given a monotone DNF formula $\varphi$ and features $x,y$, whether the $\mathsf{SHAP}$-score of $x$ in $\varphi$ is smaller than the $\mathsf{SHAP}$-score of $y$ in $\varphi$.
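As a point of reference for the quantity being discussed, the sketch below computes a $\mathsf{SHAP}$-score by direct enumeration of feature subsets, using the standard Shapley weights $|S|!\,(n-|S|-1)!/n!$. It is exponential in the number of features and is not the polynomial-time circuit algorithm of the paper. The uniform distribution over binary features and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial

def expected_value(model, entity, fixed):
    """E[model(z)] for z uniform among instances equal to `entity` on `fixed`."""
    n = len(entity)
    free = [i for i in range(n) if i not in fixed]
    total = 0
    for values in product((0, 1), repeat=len(free)):
        z = list(entity)
        for i, v in zip(free, values):
            z[i] = v
        total += model(tuple(z))
    return Fraction(total, 2 ** len(free))

def shap_score(model, entity, feature):
    """Shapley value of `feature` in the game S -> expected_value(model, entity, S)."""
    n = len(entity)
    others = [i for i in range(n) if i != feature]
    score = Fraction(0)
    for size in range(n):
        weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
        for subset in combinations(others, size):
            s = set(subset)
            gain = (expected_value(model, entity, s | {feature})
                    - expected_value(model, entity, s))
            score += weight * gain
    return score

def conjunction(z):
    """Toy Boolean model: x0 AND x1."""
    return z[0] & z[1]

print(shap_score(conjunction, (1, 1), 0))
```

For the conjunction on the entity `(1, 1)`, each feature receives score 3/8. The two scores sum to 3/4, the gap between the model's value on the entity (1) and its expected value (1/4).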

10.5 · AI · Jul 28, 2020

The Tractability of SHAP-Score-Based Explanations over Deterministic and Decomposable Boolean Circuits

Marcelo Arenas, Pablo Barceló, Leopoldo Bertossi, Mikaël Monet

Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, it has recently been claimed that the SHAP-score can be computed in polynomial time over the class of decision trees. In this paper, we provide a proof of a stronger result over Boolean models: the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits, also known as tractable Boolean circuits, generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). We also establish the computational limits of the notion of SHAP-score by observing that, under a mild condition, computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the SHAP-score intractable (namely, #P-hard).

3.2 · AI · Apr 21, 2013

Exchanging OWL 2 QL Knowledge Bases

Marcelo Arenas, Elena Botoeva, Diego Calvanese et al.

Knowledge base exchange is an important problem in the area of data exchange and knowledge representation, where one is interested in exchanging information between a source and a target knowledge base connected through a mapping. In this paper, we study this fundamental problem for knowledge bases and mappings expressed in OWL 2 QL, the profile of OWL 2 based on the description logic DL-Lite_R. More specifically, we consider the problem of computing universal solutions, identified as one of the most desirable translations to be materialized, and the problem of computing UCQ-representations, which optimally capture in a target TBox the information that can be extracted from a source TBox and a mapping by means of unions of conjunctive queries. For the former we provide a novel automata-theoretic technique, and complexity results that range from NP to EXPTIME, while for the latter we show NLOGSPACE-completeness.