DSMay 21
Gray Codes With Constant Delay and Constant Auxiliary SpaceAntoine Amarilli, Claire David, Nadime Francis et al.
We give the first two algorithms to enumerate all binary words of $\{0,1\}^\ell$ (like Gray codes) while ensuring that the delay and the auxiliary space is independent from $\ell$, i.e., constant time for each word, and constant memory in addition to the $\ell$ bits storing the current word. Our algorithms are given in two new computational models: tape machines and deque machines. We also study more restricted models, queue machines and stack machines, and show that they cannot enumerate all binary words with constant auxiliary space, even with unrestricted delay. A tape machine is a Turing machine that stores the current binary word on a single working tape of length $\ell$ (which never increases), using no other tape. The machine has a single head and must edit its tape to reach all possible words of $\{0,1\}^\ell$, and output them (in unit time, by entering special output states), with no duplicates. Hence a tape machine uses constant auxiliary space by definition (up to the head position). We construct a tape machine that achieves this task with constant delay between consecutive outputs, so that the machine implements a so-called skew-tolerant quasi-Gray code. We then construct a more involved tape machine that implements a Gray code. A deque machine stores the current binary word on a double-ended queue of length $\ell$, and stores a constant-size internal state. It works as a tape machine, except that it modifies the content of the deque by performing push and pop operations on the endpoints. Hence again a deque machine uses constant auxiliary space by definition. We construct deque machines that enumerate all words of $\{0,1\}^\ell$ with constant-delay. The main technical challenge in this model is to correctly detect when enumeration has finished.
DSMay 5
The S-Hamiltonian Cycle ProblemAntoine Amarilli, Arthur Lombardo, Mikaël Monet
Determining if an input undirected graph is Hamiltonian, i.e., if it has a cycle that visits every vertex exactly once, is one of the most famous NP-complete problems. We consider the following generalization of Hamiltonian cycles: for a fixed set $S$ of natural numbers, we want to visit each vertex of a graph $G$ exactly once and ensure that any two consecutive vertices can be joined in $k$ hops for some choice of $k \in S$. Formally, an $S$-Hamiltonian cycle is a permutation $(v_0,\ldots,v_{n-1})$ of the vertices of $G$ such that, for $0 \leq i \leq n-1$, there exists a walk between $v_i$ and $v_{i+1 \bmod n}$ whose length is in $S$. (We do not impose any constraints on how many times vertices can be visited as intermediate vertices of walks.) Of course Hamiltonian cycles in the standard sense correspond to $S=\{1\}$. We study the $S$-Hamiltonian cycle problem of deciding whether an input graph $G$ has an $S$-Hamiltonian cycle. Our goal is to determine the complexity of this problem depending on the fixed set $S$. It is already known that the problem remains NP-complete for $S=\{1,2\}$, whereas it is trivial for $S=\{1,2,3\}$ because any connected graph contains a $\{1,2,3\}$-Hamiltonian cycle. Our work classifies the complexity of this problem for most kinds of sets $S$, with the key new results being the following: we have NP-completeness for $S = \{2\}$ and for $S = \{2, 4\}$, but tractability for $S = \{1, 2, 4\}$, for $S = \{2, 4, 6\}$, for any superset of these two tractable cases, and for $S$ the infinite set of all odd integers. The remaining open cases are the non-singleton finite sets of odd integers, in particular $S = \{1, 3\}$. Beyond cycles, we also discuss the complexity of finding $S$-Hamiltonian paths, and show that our problems are all tractable on graphs of bounded cliquewidth.
DBJan 12, 2024
Expected Shapley-Like Scores of Boolean Functions: Complexity and Applications to Probabilistic DatabasesPratik Karmakar, Mikaël Monet, Pierre Senellart et al.
Shapley values, originating in game theory and increasingly prominent in explainable AI, have been proposed to assess the contribution of facts in query answering over databases, along with other similar power indices such as Banzhaf values. In this work we adapt these Shapley-like scores to probabilistic settings, the objective being to compute their expected value. We show that the computations of expected Shapley values and of the expected values of Boolean functions are interreducible in polynomial time, thus obtaining the same tractability landscape. We investigate the specific tractable case where Boolean functions are represented as deterministic decomposable circuits, designing a polynomial-time algorithm for this setting. We present applications to probabilistic databases through database provenance, and an effective implementation of this algorithm within the ProvSQL system, which experimentally validates its feasibility over a standard benchmark.
AIApr 16, 2021
On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability ResultsMarcelo Arenas, Pablo Barceló, Leopoldo Bertossi et al.
In Machine Learning, the $\mathsf{SHAP}$-score is a version of the Shapley value that is used to explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is an intractable problem, we prove a strong positive result stating that the $\mathsf{SHAP}$-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits are studied in the field of Knowledge Compilation and generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees and Ordered Binary Decision Diagrams (OBDDs). We also establish the computational limits of the SHAP-score by observing that computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider. It also implies that computing $\mathsf{SHAP}$-scores is intractable as well over the class of propositional formulas in DNF. Based on this negative result, we look for the existence of fully-polynomial randomized approximation schemes (FPRAS) for computing $\mathsf{SHAP}$-scores over such class. In contrast to the model counting problem for DNF formulas, which admits an FPRAS, we prove that no such FPRAS exists for the computation of $\mathsf{SHAP}$-scores. Surprisingly, this negative result holds even for the class of monotone formulas in DNF. These techniques can be further extended to prove another strong negative result: Under widely believed complexity assumptions, there is no polynomial-time algorithm that checks, given a monotone DNF formula $\varphi$ and features $x,y$, whether the $\mathsf{SHAP}$-score of $x$ in $\varphi$ is smaller than the $\mathsf{SHAP}$-score of $y$ in $\varphi$.
AIOct 23, 2020
Model Interpretability through the Lens of Computational ComplexityPablo Barceló, Mikaël Monet, Jorge Pérez et al.
In spite of several claims stating that some models are more interpretable than others -- e.g., "linear models are more interpretable than deep neural networks" -- we still lack a principled notion of interpretability to formally compare among different classes of models. We make a step towards such a notion by studying whether folklore interpretability claims have a correlate in terms of computational complexity theory. We focus on local post-hoc explainability queries that, intuitively, attempt to answer why individual inputs are classified in a certain way by a given model. In a nutshell, we say that a class $\mathcal{C}_1$ of models is more interpretable than another class $\mathcal{C}_2$, if the computational complexity of answering post-hoc queries for models in $\mathcal{C}_2$ is higher than for those in $\mathcal{C}_1$. We prove that this notion provides a good theoretical counterpart to current beliefs on the interpretability of models; in particular, we show that under our definition and assuming standard complexity-theoretical assumptions (such as P$\neq$NP), both linear and tree-based models are strictly more interpretable than neural networks. Our complexity analysis, however, does not provide a clear-cut difference between linear and tree-based models, as we obtain different results depending on the particular post-hoc explanations considered. Finally, by applying a finer complexity analysis based on parameterized complexity, we are able to prove a theoretical result suggesting that shallow neural networks are more interpretable than deeper ones.
AIJul 28, 2020
The Tractability of SHAP-Score-Based Explanations over Deterministic and Decomposable Boolean CircuitsMarcelo Arenas, Pablo Barceló Leopoldo Bertossi, Mikaël Monet
Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, it has recently been claimed that the SHAP-score can be computed in polynomial time over the class of decision trees. In this paper, we provide a proof of a stronger result over Boolean models: the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits, also known as tractable Boolean circuits, generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). We also establish the computational limits of the notion of SHAP-score by observing that, under a mild condition, computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the SHAP-score intractable (namely, #P-hard).