LOMay 22
The complexity of Presburger arithmetic with power or powersMichael Benedikt, Dmitry Chistikov, Alessio Mansutti
We investigate expansions of Presburger arithmetic, i.e., the theory of the integers with addition and order, with additional structure related to exponentiation: either a function that takes a number to the power of $2$, or a predicate for the powers of $2$. The latter theory, denoted $\mathrm{PresPower}$, was introduced by Büchi as a first attempt at characterizing the sets of tuples of numbers that can be expressed using finite automata; Büchi's method does not give an elementary upper bound, and the complexity of this theory has been open. The former theory, denoted as $\mathrm{PresExp}$, was shown decidable by Semenov; while the decision procedure for this theory differs radically from the automata-based method proposed by Büchi, Semenov's method is also non-elementary. And in fact, the theory with the power function has a non-elementary lower bound. In this paper, we show that while Semenov's and Büchi's approaches yield non-elementary blow-ups for $\mathrm{PresPower}$, the theory is in fact decidable in triply exponential time, similarly to the best known quantifier-elimination algorithm for Presburger arithmetic. We also provide a $\mathrm{NExpTime}$ upper bound for the existential fragment of $\mathrm{PresExp}$, a step towards a finer-grained analysis of its complexity. Both these results are established by analyzing a single parameterized satisfiability algorithm for $\mathrm{PresExp}$, which can be specialized to either the setting of $\mathrm{PresPower}$ or the existential theory of $\mathrm{PresExp}$. Besides the new upper bounds for the existential theory of $\mathrm{PresExp}$ and $\mathrm{PresPower}$, we believe our algorithm provides new intuition for the decidability of these theories, and for the features that lead to non-elementary blow-ups.
DBMay 29
Revisiting the Expressiveness Landscape of Data Graph QueriesMichael Benedikt, Anthony Widjaja Lin, Di-De Yen
The study of graph queries in database theory has spanned more than three decades, resulting in a multitude of proposals for graph query languages. We can identify three main families of languages, with the canonical representatives being: (1) regular path queries, (2) walk logic, and (3) first-order logic with transitive closure operators. This paper provides a complete picture of the expressive power of these languages in the context of data graphs. Specifically, we consider a graph data model that supports querying over both data and topology. For example, ``Does there exist a path between two different persons in a social network with the same last name?''. We also show that an extension of (1) with regular path comparisons, augmented with transitive closure operators, can unify the expressivity of (1)--(3).
LOMay 31
How (and when) can you fit examples to logic-based hypothesis classes over infinite structures?Michael Benedikt, Alessio Mansutti
We study fitting problems, sometimes called ``training problems'', where we have a finite sample consisting of inputs and outputs, and we want to know whether there is a function in a certain class that could produce these outputs, exactly or approximately, on the given inputs. We focus on the computational and descriptive complexity of fitting for logically-defined classes in common decidable structures, like the real ordered field and Presburger arithmetic, and also for broader classes defined via combinatorial or model-theoretic properties. We isolate the complexity of these fitting problems, with particular attention to cases where we can use queries in a natural query language over the sample to determine whether a sample is fittable.
LGJul 2, 2023
Towards Unbiased Exploration in Partial Label LearningZsolt Zombori, Agapi Rissaki, Kristóf Szabó et al.
We consider learning a probabilistic classifier from partially-labelled supervision (inputs denoted with multiple possibilities) using standard neural architectures with a softmax as the final layer. We identify a bias phenomenon that can arise from the softmax layer in even simple architectures that prevents proper exploration of alternative options, making the dynamics of gradient descent overly sensitive to initialisation. We introduce a novel loss function that allows for unbiased exploration within the space of alternative outputs. We give a theoretical justification for our loss function, and provide an extensive evaluation of its impact on synthetic data, on standard partially labelled benchmarks and on a contributed novel benchmark related to an existing rule learning challenge.
LOMar 21
Tighter Bounds for Query Answering with Guarded TGDsAntoine Amarilli, Michael Benedikt
We consider the complexity of the open-world query answering problem, where we wish to determine certain answers to conjunctive queries over incomplete datasets specified by an initial set of facts and a set of guarded TGDs. This problem has been well-studied in the literature and is decidable but with a high complexity, namely, it is 2EXPTIME complete. Further, the complexity shrinks by one exponential when the arity is fixed. We show in this paper how we can obtain better complexity bounds when considering separately the arity of the guard atom and that of the additional atoms, called the side signature. Our results make use of the technique of linearizing guarded TGDs, introduced in Gottlob, Manna, and Pieris. Specifically, we present a variant of the linearization process, making use of a restricted version of the chase that we recently introduced. Our results imply that open-world query answering with guarded TGDs can be solved in EXPTIME with arbitrary-arity guard relations if we simply bound the arity of the side signature; and that the complexity drops to NP if we fix the side signature and bound the width of the dependencies.
LOJan 29
How Expressive Are Graph Neural Networks in the Presence of Node Identifiers?Arie Soeteman, Michael Benedikt, Martin Grohe et al.
Graph neural networks (GNNs) are a widely used class of machine learning models for graph-structured data, based on local aggregation over neighbors. GNNs have close connections to logic. In particular, their expressive power is linked to that of modal logics and bounded-variable logics with counting. In many practical scenarios, graphs processed by GNNs have node features that act as unique identifiers. In this work, we study how such identifiers affect the expressive power of GNNs. We initiate a study of the key-invariant expressive power of GNNs, inspired by the notion of order-invariant definability in finite model theory: which node queries that depend only on the underlying graph structure can GNNs express on graphs with unique node identifiers? We provide answers for various classes of GNNs with local max- or sum-aggregation.
LGMar 6, 2024
Almost Surely Asymptotically Constant Graph Neural NetworksSam Adam-Day, Michael Benedikt, İsmail İlkan Ceylan et al.
We present a new angle on the expressive power of graph neural networks (GNNs) by studying how the predictions of real-valued GNN classifiers, such as those classifying graphs probabilistically, evolve as we apply them on larger graphs drawn from some random graph model. We show that the output converges to a constant function, which upper-bounds what these classifiers can uniformly express. This strong convergence phenomenon applies to a very wide class of GNNs, including state of the art models, with aggregates including mean and the attention-based mechanism of graph transformers. Our results apply to a broad class of random graph models, including sparse and dense variants of the Erdős-Rényi model, the stochastic block model, and the Barabási-Albert model. We empirically validate these findings, observing that the convergence phenomenon appears not only on random graphs but also on some real-world graphs.
LOApr 1, 2025
From learnable objects to learnable random objectsAaron Anderson, Michael Benedikt
We consider the relationship between learnability of a "base class" of functions on a set $X$, and learnability of a class of statistical functions derived from the base class. For example, we refine results showing that learnability of a family $h_p: p \in Y$ of functions implies learnability of the family of functions $h_μ=λp: Y. E_μ(h_p)$, where $E_μ$ is the expectation with respect to $μ$, and $μ$ ranges over probability distributions on $X$. We will look at both Probably Approximately Correct (PAC) learning, where example inputs and outputs are chosen at random, and online learning, where the examples are chosen adversarily. For agnostic learning, we establish improved bounds on the sample complexity of learning for statistical classes, stated in terms of combinatorial dimensions of the base class. We connect these problems to techniques introduced in model theory for "randomizing a structure". We also provide counterexamples for realizable learning, in both the PAC and online settings.
LGOct 21, 2025
Robustness Verification of Graph Neural Networks Via Lightweight Satisfiability TestingChia-Hsuan Lu, Tony Tan, Michael Benedikt
Graph neural networks (GNNs) are the predominant architecture for learning over graphs. As with any machine learning model, and important issue is the detection of adversarial attacks, where an adversary can change the output with a small perturbation of the input. Techniques for solving the adversarial robustness problem - determining whether such an attack exists - were originally developed for image classification, but there are variants for many other machine learning architectures. In the case of graph learning, the attack model usually considers changes to the graph structure in addition to or instead of the numerical features of the input, and the state of the art techniques in the area proceed via reduction to constraint solving, working on top of powerful solvers, e.g. for mixed integer programming. We show that it is possible to improve on the state of the art in structural robustness by replacing the use of powerful solvers by calls to efficient partial solvers, which run in polynomial time but may be incomplete. We evaluate our tool RobLight on a diverse set of GNN variants and datasets.
LGJul 20, 2025
Constraint-aware Learning of Probabilistic Sequential Models for Multi-Label ClassificationMykhailo Buleshnyi, Anna Polova, Zsolt Zombori et al.
We investigate multi-label classification involving large sets of labels, where the output labels may be known to satisfy some logical constraints. We look at an architecture in which classifiers for individual labels are fed into an expressive sequential model, which produces a joint distribution. One of the potential advantages for such an expressive model is its ability to modelling correlations, as can arise from constraints. We empirically demonstrate the ability of the architecture both to exploit constraints in training and to enforce constraints at inference time.
LOFeb 17, 2022
Query Answering with Transitive and Linear-Ordered DataAntoine Amarilli, Michael Benedikt, Pierre Bourhis et al.
We consider entailment problems involving powerful constraint languages such as frontier-guarded existential rules in which we impose additional semantic restrictions on a set of distinguished relations. We consider restricting a relation to be transitive, restricting a relation to be the transitive closure of another relation, and restricting a relation to be a linear order. We give some natural variants of guardedness that allow inference to be decidable in each case, and isolate the complexity of the corresponding decision problems. Finally we show that slight changes in these conditions lead to undecidability.
LOJun 3, 2019
Reasoning about disclosure in data integration in the presence of source constraintsMichael Benedikt, Pierre Bourhis, Louis Jachiet et al.
Data integration systems allow users to access data sitting in multiple sources by means of queries over a global schema, related to the sources via mappings. Data sources often contain sensitive information, and thus an analysis is needed to verify that a schema satisfies a privacy policy, given as a set of queries whose answers should not be accessible to users. Such an analysis should take into account not only knowledge that an attacker may have about the mappings, but also what they may know about the semantics of the sources. In this paper, we show that source constraints can have a dramatic impact on disclosure analysis. We study the problem of determining whether a given data integration system discloses a source query to an attacker in the presence of constraints, providing both lower and upper bounds on source-aware disclosure analysis.
AINov 14, 2017
Goal-Driven Query Answering for Existential Rules with EqualityMichael Benedikt, Boris Motik, Efthymia Tsamoura
Inspired by the magic sets for Datalog, we present a novel goal-driven approach for answering queries over terminating existential rules with equality (aka TGDs and EGDs). Our technique improves the performance of query answering by pruning the consequences that are not relevant for the query. This is challenging in our setting because equalities can potentially affect all predicates in a dataset. We address this problem by combining the existing singularization technique with two new ingredients: an algorithm for identifying the rules relevant to a query and a new magic sets algorithm. We show empirically that our technique can significantly improve the performance of query answering, and that it can mean the difference between answering a query in a few seconds or not being able to process the query at all.