Sebastijan Dumancic

AI
h-index9
14papers
162citations
Novelty48%
AI Score39

14 Papers

LGMar 2, 2023
DeepSaDe: Learning Neural Networks that Guarantee Domain Constraint Satisfaction

Kshitij Goyal, Sebastijan Dumancic, Hendrik Blockeel

As machine learning models, specifically neural networks, are becoming increasingly popular, there are concerns regarding their trustworthiness, specially in safety-critical applications, e.g. actions of an autonomous vehicle must be safe. There are approaches that can train neural networks where such domain requirements are enforced as constraints, but they either cannot guarantee that the constraint will be satisfied by all possible predictions (even on unseen data) or they are limited in the type of constraints that can be enforced. In this paper, we present an approach to train neural networks which can enforce a wide variety of constraints and guarantee that the constraint is satisfied by all possible predictions. The approach builds on earlier work where learning linear models is formulated as a constraint satisfaction problem (CSP). To make this idea applicable to neural networks, two crucial new elements are added: constraint propagation over the network layers, and weight updates based on a mix of gradient descent and CSP solving. Evaluation on various machine learning tasks demonstrates that our approach is flexible enough to enforce a wide variety of domain constraints and is able to guarantee them in neural networks.

PLJul 10, 2025
Modelling Program Spaces in Program Synthesis with Constraints

Tilman Hinnerichs, Bart Swinkels, Jaap de Jong et al.

A core challenge in program synthesis is taming the large space of possible programs. Since program synthesis is essentially a combinatorial search, the community has sought to leverage powerful combinatorial constraint solvers. Here, constraints are used to express the program semantics, but not as a potentially potent tool to remove unwanted programs. Recent inductive logic programming approaches introduce constraints on the program's syntax to be synthesized. These syntactic constraints allow for checking and propagating a constraint without executing the program, and thus for arbitrary operators. In this work, we leverage syntactic constraints to model program spaces, defining not just solutions that are feasible, but also ones that are likely useful. To demonstrate this idea, we introduce BART, a solver that efficiently propagates and solves these constraints. We evaluate BART on program space enumeration tasks, finding that the constraints eliminate up to 99 percent of the program space, and that modeling program spaces significantly reduces enumeration time.

AIMay 15, 2024
Declarative Design of Neural Predicates in Neuro-Symbolic Systems

Tilman Hinnerichs, Robin Manhaeve, Giuseppe Marra et al.

Neuro-symbolic systems (NeSy), which claim to combine the best of both learning and reasoning capabilities of artificial intelligence, are missing a core property of reasoning systems: Declarativeness. The lack of declarativeness is caused by the functional nature of neural predicates inherited from neural networks. We propose and implement a general framework for fully declarative neural predicates, which hence extends to fully declarative NeSy frameworks. We first show that the declarative extension preserves the learning and reasoning capabilities while being able to answer arbitrary queries while only being trained on a single query type.

PLOct 10, 2025
Herb.jl: A Unifying Program Synthesis Library

Tilman Hinnerichs, Reuben Gardos Reid, Jaap de Jong et al.

Program synthesis -- the automatic generation of code given a specification -- is one of the most fundamental tasks in artificial intelligence (AI) and many programmers' dream. Numerous synthesizers have been developed to tackle program synthesis, manifesting different ideas to approach the exponentially growing program space. While numerous smart program synthesis tools exist, reusing and remixing previously developed methods is tedious and time-consuming. We propose Herb.jl, a unifying program synthesis library written in the Julia programming language, to address these issues. Since current methods rely on similar building blocks, we aim to modularize the underlying synthesis algorithm into communicating and fully extendable sub-compartments, allowing for straightforward reapplication of these modules. To demonstrate the benefits of using Herb.jl, we show three common use cases: 1. how to implement a simple problem and grammar, and how to solve it, 2. how to implement a previously developed synthesizer with just a few lines of code, and 3. how to run a synthesizer against a benchmark.

LGDec 1, 2021
SaDe: Learning Models that Provably Satisfy Domain Constraints

Kshitij Goyal, Sebastijan Dumancic, Hendrik Blockeel

In many real world applications of machine learning, models have to meet certain domain-based requirements that can be expressed as constraints (e.g., safety-critical constraints in autonomous driving systems). Such constraints are often handled by including them in a regularization term, while learning a model. This approach, however, does not guarantee 100% satisfaction of the constraints: it only reduces violations of the constraints on the training set rather than ensuring that the predictions by the model will always adhere to them. In this paper, we present a framework for learning models that provably fulfil the constraints under all circumstances (i.e., also on unseen data). To achieve this, we cast learning as a maximum satisfiability problem, and solve it using a novel SaDe algorithm that combines constraint satisfaction with gradient descent. We compare our method against regularization based baselines on linear models and show that our method is capable of enforcing different types of domain constraints effectively on unseen data, without sacrificing predictive performance.

LGJul 11, 2020
Feature Interactions in XGBoost

Kshitij Goyal, Sebastijan Dumancic, Hendrik Blockeel

In this paper, we investigate how feature interactions can be identified to be used as constraints in the gradient boosting tree models using XGBoost's implementation. Our results show that accurate identification of these constraints can help improve the performance of baseline XGBoost model significantly. Further, the improvement in the model structure can also lead to better interpretability.

AIApr 21, 2020
Knowledge Refactoring for Inductive Program Synthesis

Sebastijan Dumancic, Tias Guns, Andrew Cropper

Humans constantly restructure knowledge to use it more efficiently. Our goal is to give a machine learning system similar abilities so that it can learn more efficiently. We introduce the \textit{knowledge refactoring} problem, where the goal is to restructure a learner's knowledge base to reduce its size and to minimise redundancy in it. We focus on inductive logic programming, where the knowledge base is a logic program. We introduce Knorf, a system which solves the refactoring problem using constraint optimisation. We evaluate our approach on two program induction domains: real-world string transformations and building Lego structures. Our experiments show that learning from refactored knowledge can improve predictive accuracies fourfold and reduce learning times by half.

LGMar 29, 2019
Learning Relational Representations with Auto-encoding Logic Programs

Sebastijan Dumancic, Tias Guns, Wannes Meert et al.

Deep learning methods capable of handling relational data have proliferated over the last years. In contrast to traditional relational learning methods that leverage first-order logic for representing such data, these deep learning methods aim at re-representing symbolic relational data in Euclidean spaces. They offer better scalability, but can only numerically approximate relational structures and are less flexible in terms of reasoning tasks supported. This paper introduces a novel framework for relational representation learning that combines the best of both worlds. This framework, inspired by the auto-encoding principle, uses first-order logic as a data representation language, and the mapping between the original and latent representation is done by means of logic programs instead of neural networks. We show how learning can be cast as a constraint optimisation problem for which existing solvers can be used. The use of logic as a representation language makes the proposed framework more accurate (as the representation is exact, rather than approximate), more flexible, and more interpretable than deep learning methods. We experimentally show that these latent representations are indeed beneficial in relational learning tasks.

AIJun 29, 2018
A Comparative Study of Distributional and Symbolic Paradigms for Relational Learning

Sebastijan Dumancic, Alberto Garcia-Duran, Mathias Niepert

Many real-world domains can be expressed as graphs and, more generally, as multi-relational knowledge graphs. Though reasoning and learning with knowledge graphs has traditionally been addressed by symbolic approaches, recent methods in (deep) representation learning has shown promising results for specialized tasks such as knowledge base completion. These approaches abandon the traditional symbolic paradigm by replacing symbols with vectors in Euclidean space. With few exceptions, symbolic and distributional approaches are explored in different communities and little is known about their respective strengths and weaknesses. In this work, we compare representation learning and relational learning on various relational classification and clustering tasks and analyse the complexity of the rules used implicitly by these approaches. Preliminary results reveal possible indicators that could help in choosing one approach over the other for particular knowledge graphs.

MLMay 2, 2018
COBRAS-TS: A new approach to Semi-Supervised Clustering of Time Series

Toon Van Craenendonck, Wannes Meert, Sebastijan Dumancic et al.

Clustering is ubiquitous in data analysis, including analysis of time series. It is inherently subjective: different users may prefer different clusterings for a particular dataset. Semi-supervised clustering addresses this by allowing the user to provide examples of instances that should (not) be in the same cluster. This paper studies semi-supervised clustering in the context of time series. We show that COBRAS, a state-of-the-art semi-supervised clustering method, can be adapted to this setting. We refer to this approach as COBRAS-TS. An extensive experimental evaluation supports the following claims: (1) COBRAS-TS far outperforms the current state of the art in semi-supervised clustering for time series, and thus presents a new baseline for the field; (2) COBRAS-TS can identify clusters with separated components; (3) COBRAS-TS can identify clusters that are characterized by small local patterns; (4) a small amount of semi-supervision can greatly improve clustering quality for time series; (5) the choice of the clustering algorithm matters (contrary to earlier claims in the literature).

AIJan 30, 2018
COBRA: A Fast and Simple Method for Active Clustering with Pairwise Constraints

Toon Van Craenendonck, Sebastijan Dumancic, Hendrik Blockeel

Clustering is inherently ill-posed: there often exist multiple valid clusterings of a single dataset, and without any additional information a clustering system has no way of knowing which clustering it should produce. This motivates the use of constraints in clustering, as they allow users to communicate their interests to the clustering system. Active constraint-based clustering algorithms select the most useful constraints to query, aiming to produce a good clustering using as few constraints as possible. We propose COBRA, an active method that first over-clusters the data by running K-means with a $K$ that is intended to be too large, and subsequently merges the resulting small clusters into larger ones based on pairwise constraints. In its merging step, COBRA is able to keep the number of pairwise queries low by maximally exploiting constraint transitivity and entailment. We experimentally show that COBRA outperforms the state of the art in terms of clustering quality and runtime, without requiring the number of clusters in advance.

MLJun 28, 2016
Theory reconstruction: a representation learning view on predicate invention

Sebastijan Dumancic, Wannes Meert, Hendrik Blockeel

With this positional paper we present a representation learning view on predicate invention. The intention of this proposal is to bridge the relational and deep learning communities on the problem of predicate invention. We propose a theory reconstruction approach, a formalism that extends autoencoder approach to representation learning to the relational settings. Our intention is to start a discussion to define a unifying framework for predicate invention and theory revision.

MLJun 28, 2016
Clustering-Based Relational Unsupervised Representation Learning with an Explicit Distributed Representation

Sebastijan Dumancic, Hendrik Blockeel

The goal of unsupervised representation learning is to extract a new representation of data, such that solving many different tasks becomes easier. Existing methods typically focus on vectorized data and offer little support for relational data, which additionally describe relationships among instances. In this work we introduce an approach for relational unsupervised representation learning. Viewing a relational dataset as a hypergraph, new features are obtained by clustering vertices and hyperedges. To find a representation suited for many relational learning tasks, a wide range of similarities between relational objects is considered, e.g. feature and structural similarities. We experimentally evaluate the proposed approach and show that models learned on such latent representations perform better, have lower complexity, and outperform the existing approaches on classification tasks.

MLApr 29, 2016
An expressive dissimilarity measure for relational clustering using neighbourhood trees

Sebastijan Dumancic, Hendrik Blockeel

Clustering is an underspecified task: there are no universal criteria for what makes a good clustering. This is especially true for relational data, where similarity can be based on the features of individuals, the relationships between them, or a mix of both. Existing methods for relational clustering have strong and often implicit biases in this respect. In this paper, we introduce a novel similarity measure for relational data. It is the first measure to incorporate a wide variety of types of similarity, including similarity of attributes, similarity of relational context, and proximity in a hypergraph. We experimentally evaluate how using this similarity affects the quality of clustering on very different types of datasets. The experiments demonstrate that (a) using this similarity in standard clustering methods consistently gives good results, whereas other measures work well only on datasets that match their bias; and (b) on most datasets, the novel similarity outperforms even the best among the existing ones.