Balder ten Cate

LO
h-index35
13papers
1,394citations
Novelty56%
AI Score58

13 Papers

38.6LOMay 14
The Size of Interpolants in Modal Logics

Balder ten Cate, Louwe Kuijer, Frank Wolter

We start a systematic investigation of the size of Craig interpolants, uniform interpolants, and strongest implicates for (quasi-)normal modal logics. Our main upper bound states that for tabular modal logics, the computation of strongest implicates can be reduced in polynomial time to uniform interpolant computation in classical propositional logic. Hence they are of polynomial dag-size iff NP is included in P/poly. The reduction also holds for Craig interpolants and uniform interpolants if the tabular modal logic has the Craig interpolation property. Our main lower bound shows an unconditional exponential lower bound on the size of Craig interpolants and strongest implicates covering almost all non-tabular standard normal modal logics. For normal modal logics contained in or containing S4 or GL we obtain the following dichotomy: tabular logics have ``propositionally sized'' interpolants while for non-tabular logics an unconditional exponential lower bound holds.

DBAug 22, 2022
On the non-efficient PAC learnability of conjunctive queries

Balder ten Cate, Maurice Funk, Jean Christoph Jung et al.

This note serves three purposes: (i) we provide a self-contained exposition of the fact that conjunctive queries are not efficiently learnable in the Probably-Approximately-Correct (PAC) model, paying clear attention to the complicating fact that this concept class lacks the polynomial-size fitting property, a property that is tacitly assumed in much of the computational learning theory literature; (ii) we establish a strong negative PAC learnability result that applies to many restricted classes of conjunctive queries (CQs), including acyclic CQs for a wide range of notions of "acyclicity"; (iii) we show that CQs (and UCQs) are efficiently PAC learnable with membership queries.

44.1LOApr 23
Characterizing LTL Formulas by Examples

Balder ten Cate, Dana Fisman, Roi Ohayon et al.

We investigate the extent to which Linear Temporal Logic (LTL) formulas can be uniquely characterized by a finite set of labeled examples. We consider different types of examples, ranging from finite words to transfinite words, as well as schematic examples. In the finite-word setting, we provide a complete classification of basis-restricted LTL fragments that admit such unique characterizations. Next, we show that allowing transfinite words as examples enables finite unique characterizations for large monotone fragments of LTL. Finally, we introduce schematic examples, i.e., patterns that compactly represent a family of finite words, and we show that these enable unique characterization results in the finite setting that were not possible with ordinary finite examples alone. Overall, the work provides a foundational account of the descriptive power of different example types for example-driven specification, debugging, and learning of temporal properties.

89.7LOMar 12
When do modal definability and preservation theorems transfer to the finite?

Johan van Benthem, Balder ten Cate, Xi Yang

We study which classic modal definability and preservation results survive when attention is restricted to finite structures, where many first-order transfer theorems are known to break down. Several semantic characterizations for modal formula classes survive the passage to the finite, while a number of first-order preservation theorems for basic frame operations fail. Our main positive result is that the Bisimulation Safety Theorem does transfer to finite structures. We also discuss computability aspects, and analogues in the finite for the Goldblatt-Thomason theorem and for modal correspondence theory.

LOJan 29
How Expressive Are Graph Neural Networks in the Presence of Node Identifiers?

Arie Soeteman, Michael Benedikt, Martin Grohe et al.

Graph neural networks (GNNs) are a widely used class of machine learning models for graph-structured data, based on local aggregation over neighbors. GNNs have close connections to logic. In particular, their expressive power is linked to that of modal logics and bounded-variable logics with counting. In many practical scenarios, graphs processed by GNNs have node features that act as unique identifiers. In this work, we study how such identifiers affect the expressive power of GNNs. We initiate a study of the key-invariant expressive power of GNNs, inspired by the notion of order-invariant definability in finite model theory: which node queries that depend only on the underlying graph structure can GNNs express on graphs with unique node identifiers? We provide answers for various classes of GNNs with local max- or sum-aggregation.

78.3DBMay 18
Expressive Power of Deep Homomorphism Networks over Relational Databases

Moritz Schönherr, Balder ten Cate, Maurice Funk et al.

The expressive limitations of message-passing Graph Neural Networks (GNNs) have motivated a wide range of more powerful graph learning architectures. We advocate Deep Homomorphism Networks (DHNs) as a model particularly well-suited for learning over relational databases, due to their close connection to important fragments of SQL such as conjunctive queries. We study the precise expressive power of DHNs by relating them to various natural fragments and extensions of first-order logic (FO). For DHNs with max, sum, and mean aggregations, we establish connections to the unary negation fragment (UNFO) and to the extensions of UNFO with counting quantifiers and with ratio quantifiers. We further relate sum-aggregation DHNs to the unary quantifier alternation fragment of FO and to an extension of FO with expressive counting. Through the classical correspondence between FO and SQL, these results also illuminate the relation between DHNs and SQL. They also enable us to study the decidability of two fundamental static analysis problems for DHNs, the emptiness problem and the subsumption problem. Finally, we confirm through experiments that the established differences in expressive power are reflected in the performance on suitable prediction tasks.

LODec 23, 2024
On the Power and Limitations of Examples for Description Logic Concepts

Balder ten Cate, Raoul Koudijs, Ana Ozaki

Labeled examples (i.e., positive and negative examples) are an attractive medium for communicating complex concepts. They are useful for deriving concept expressions (such as in concept learning, interactive concept specification, and concept refinement) as well as for illustrating concept expressions to a user or domain expert. We investigate the power of labeled examples for describing description-logic concepts. Specifically, we systematically study the existence and efficient computability of finite characterisations, i.e. finite sets of labeled examples that uniquely characterize a single concept, for a wide variety of description logics between EL and ALCQI, both without an ontology and in the presence of a DL-Lite ontology. Finite characterisations are relevant for debugging purposes, and their existence is a necessary condition for exact learnability with membership queries.

LGJun 16, 2025
Logical Expressiveness of Graph Neural Networks with Hierarchical Node Individualization

Arie Soeteman, Balder ten Cate

We propose and study Hierarchical Ego Graph Neural Networks (HEGNNs), an expressive extension of graph neural networks (GNNs) with hierarchical node individualization, inspired by the Individualization-Refinement paradigm for graph isomorphism testing. HEGNNs generalize subgraph-GNNs and form a hierarchy of increasingly expressive models that, in the limit, can distinguish graphs up to isomorphism. We provide a logical characterization of HEGNN node classifiers, with and without subgraph restrictions, using graded hybrid logic. This characterization enables us to relate the separating power of HEGNNs to that of higher-order GNNs, GNNs enriched with local homomorphism count features, and color refinement algorithms based on Individualization-Refinement. Our experimental results confirm the practical feasibility of HEGNNs and show benefits in comparison with traditional GNN architectures, both with and without local homomorphism count features.

AIMay 15, 2023
SAT-Based PAC Learning of Description Logic Concepts

Balder ten Cate, Maurice Funk, Jean Christoph Jung et al.

We propose bounded fitting as a scheme for learning description logic concepts in the presence of ontologies. A main advantage is that the resulting learning algorithms come with theoretical guarantees regarding their generalization to unseen examples in the sense of PAC learning. We prove that, in contrast, several other natural learning algorithms fail to provide such guarantees. As a further contribution, we present the system SPELL which efficiently implements bounded fitting for the description logic $\mathcal{ELH}^r$ based on a SAT solver, and compare its performance to a state-of-the-art learner.

LOAug 16, 2020
Conjunctive Queries: Unique Characterizations and Exact Learnability

Balder ten Cate, Victor Dalmau

We answer the question which conjunctive queries are uniquely characterized by polynomially many positive and negative examples, and how to construct such examples efficiently. As a consequence, we obtain a new efficient exact learning algorithm for a class of conjunctive queries. At the core of our contributions lie two new polynomial-time algorithms for constructing frontiers in the homomorphism lattice of finite structures. We also discuss implications for the unique characterizability and learnability of schema mappings and of description logic concepts.

CLMay 29, 2019
Learning Multilingual Word Embeddings Using Image-Text Data

Karan Singhal, Karthik Raman, Balder ten Cate

There has been significant interest recently in learning multilingual word embeddings -- in which semantically similar words across languages have similar embeddings. State-of-the-art approaches have relied on expensive labeled data, which is unavailable for low-resource languages, or have involved post-hoc unification of monolingual embeddings. In the present paper, we investigate the efficacy of multilingual embeddings learned from weakly-supervised image-text data. In particular, we propose methods for learning multilingual embeddings using image-text data, by enforcing similarity between the representations of the image and that of the text. Our experiments reveal that even without using any expensive labeled data, a bag-of-words-based embedding model trained on image-text data achieves performance comparable to the state-of-the-art on crosslingual semantic similarity tasks.

DBDec 6, 2014
Declarative Statistical Modeling with Datalog

Vince Barany, Balder ten Cate, Benny Kimelfeld et al.

Formalisms for specifying statistical models, such as probabilistic-programming languages, typically consist of two components: a specification of a stochastic process (the prior), and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence. We propose and investigate a declarative framework for specifying statistical models on top of a database, through an appropriate extension of Datalog. By virtue of extending Datalog, our framework offers a natural integration with the database, and has a robust declarative semantics. Our Datalog extension provides convenient mechanisms to include numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a program is a probability distribution over the possible outcomes of the input database with respect to the program; these outcomes are minimal solutions with respect to a related program with existentially quantified variables in conclusions. Observations are naturally incorporated by means of integrity constraints over the extensional and intensional relations. We focus on programs that use discrete numerical distributions, but even then the space of possible outcomes may be uncountable (as a solution can be infinite). We define a probability measure over possible outcomes by applying the known concept of cylinder sets to a probabilistic chase procedure. We show that the resulting semantics is robust under different chases. We also identify conditions guaranteeing that all possible outcomes are finite (and then the probability space is discrete). We argue that the framework we propose retains the purely declarative nature of Datalog, and allows for natural specifications of statistical models.

DBJan 28, 2013
Ontology-based Data Access: A Study through Disjunctive Datalog, CSP, and MMSNP

Meghyn Bienvenu, Balder ten Cate, Carsten Lutz et al.

Ontology-based data access is concerned with querying incomplete data sources in the presence of domain-specific knowledge provided by an ontology. A central notion in this setting is that of an ontology-mediated query, which is a database query coupled with an ontology. In this paper, we study several classes of ontology-mediated queries, where the database queries are given as some form of conjunctive query and the ontologies are formulated in description logics or other relevant fragments of first-order logic, such as the guarded fragment and the unary-negation fragment. The contributions of the paper are three-fold. First, we characterize the expressive power of ontology-mediated queries in terms of fragments of disjunctive datalog. Second, we establish intimate connections between ontology-mediated queries and constraint satisfaction problems (CSPs) and their logical generalization, MMSNP formulas. Third, we exploit these connections to obtain new results regarding (i) first-order rewritability and datalog-rewritability of ontology-mediated queries, (ii) P/NP dichotomies for ontology-mediated queries, and (iii) the query containment problem for ontology-mediated queries.