LG · Mar 25, 2023
Verifying Properties of Tsetlin Machines
Emilia Przybysz, Bimal Bhattarai, Cosimo Persia et al.
Tsetlin Machines (TsMs) are a promising and interpretable machine learning method that can be applied to various classification tasks. We present an exact encoding of TsMs into propositional logic and formally verify properties of TsMs using a SAT solver. In particular, we introduce a notion of similarity of machine learning models and apply it to check for similarity of TsMs. We also consider notions of robustness and equivalence from the literature and adapt them for TsMs. Then, we show the correctness of our encoding and provide results for the properties: adversarial robustness, equivalence, and similarity of TsMs. In our experiments, we employ the MNIST and IMDB datasets for (respectively) image and sentiment classification. We compare the results for verifying robustness obtained with TsMs with those in the literature obtained with Binarized Neural Networks on MNIST.
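To make the adversarial robustness property concrete, here is a minimal sketch in Python. It is not the SAT encoding described in the abstract: it swaps in a brute-force search over bit flips, which is only feasible for tiny inputs. The clause representation, the voting threshold (class 1 when the vote sum is non-negative), and all function names are assumptions made for illustration.

```python
from itertools import combinations

# Assumed representation: a clause is (polarity, positive_indices, negated_indices).
# It fires when every positive literal is 1 and every negated literal is 0.
def clause_fires(clause, x):
    _, positive, negated = clause
    return all(x[i] == 1 for i in positive) and all(x[i] == 0 for i in negated)

def classify(clauses, x):
    # Sum the polarities (+1 / -1) of all firing clauses.
    votes = sum(clause[0] for clause in clauses if clause_fires(clause, x))
    # Assumed convention: class 1 if the vote sum is non-negative, else class 0.
    return 1 if votes >= 0 else 0

def is_robust(clauses, x, k):
    """Brute-force check: no input within Hamming distance k of x changes the class."""
    base = classify(clauses, x)
    for r in range(1, k + 1):
        for indices in combinations(range(len(x)), r):
            y = list(x)
            for i in indices:
                y[i] = 1 - y[i]  # flip one bit
            if classify(clauses, y) != base:
                return False
    return True
```

A SAT-based check would instead assert the existence of such a perturbed input as a formula and ask a solver for a satisfying assignment; the brute-force loop above only illustrates what is being asked.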
LG · Apr 1, 2022
Extracting Rules from Neural Networks with Partial Interpretations
Cosimo Persia, Ana Ozaki
We investigate the problem of extracting rules, expressed in Horn logic, from neural network models. Our work is based on the exact learning model, in which a learner interacts with a teacher (the neural network model) via queries in order to learn an abstract target concept, which in our case is a set of Horn rules. We consider partial interpretations to formulate the queries. These can be understood as a representation of the world where part of the knowledge regarding the truth value of propositions is unknown. We employ Angluin's algorithm for learning Horn rules via queries and evaluate our strategy empirically.
AI · Sep 14, 2022
Finding Common Ground for Incoherent Horn Expressions
Ana Ozaki, Anum Rehman, Philip Turk et al.
Autonomous systems that operate in a shared environment with people need to be able to follow the rules of the society they occupy. While laws are unique for one society, different people and institutions may use different rules to guide their conduct. We study the problem of reaching a common ground among possibly incoherent rules of conduct. We formally define a notion of common ground and discuss the main properties of this notion. Then, we identify three sufficient conditions on the class of Horn expressions for which common grounds are guaranteed to exist. We provide a polynomial time algorithm that computes common grounds, under these conditions. We also show that if any of the three conditions is removed then common grounds for the resulting (larger) class may not exist.
LG · Jun 2, 2022
On the Effectiveness of Knowledge Graph Embeddings: a Rule Mining Approach
Johanna Jøsang, Ricardo Guimarães, Ana Ozaki
We study the effectiveness of Knowledge Graph Embeddings (KGE) for knowledge graph (KG) completion with rule mining. More specifically, we mine rules from KGs before and after they have been completed by a KGE to compare possible differences in the rules extracted. We apply this method to classical KGE approaches, in particular, TransE, DistMult and ComplEx. Our experiments indicate that there can be huge differences between the extracted rules, depending on the KGE approach for KG completion. In particular, after the TransE completion, several spurious rules were extracted.
AI · Aug 9, 2024
Knowledge Base Embeddings: Semantics and Theoretical Properties
Camille Bourgaux, Ricardo Guimarães, Raoul Koudijs et al.
Research on knowledge graph embeddings has recently evolved into knowledge base embeddings, where the goal is not only to map facts into vector spaces but also constrain the models so that they take into account the relevant conceptual knowledge available. This paper examines recent methods that have been proposed to embed knowledge bases in description logic into vector spaces through the lens of their geometric-based semantics. We identify several relevant theoretical properties, which we draw from the literature and sometimes generalize or unify. We then investigate how concrete embedding methods fit in this theoretical framework.
CL · Nov 5, 2023
Rule Learning as Machine Translation using the Atomic Knowledge Bank
Kristoffer Æsøy, Ana Ozaki
Machine learning models, and in particular language models, are being applied to various tasks that require reasoning. While such models are good at capturing patterns, their ability to reason in a trustworthy and controlled manner is frequently questioned. On the other hand, logic-based rule systems allow for controlled inspection and already established verification methods. However, it is well known that creating such systems manually is time-consuming and prone to errors. We explore the capability of transformers to translate sentences expressing rules in natural language into logical rules. We see reasoners as the most reliable tools for performing logical reasoning and focus on translating language into the format expected by such tools. We perform experiments using the DKET dataset from the literature and create a dataset for language to logic translation based on the Atomic knowledge bank.
LO · Oct 25, 2023
Semiring Provenance for Lightweight Description Logics
Camille Bourgaux, Ana Ozaki, Rafael Peñaloza
We investigate semiring provenance--a successful framework originally defined in the relational database setting--for description logics. In this context, the ontology axioms are annotated with elements of a commutative semiring and these annotations are propagated to the ontology consequences in a way that reflects how they are derived. We define a provenance semantics for a language that encompasses several lightweight description logics and show its relationships with semantics that have been defined for ontologies annotated with a specific kind of annotation (such as fuzzy degrees). We show that under some restrictions on the semiring, the semantics satisfies desirable properties (such as extending the semiring provenance defined for databases). We then focus on the well-known why-provenance, for which we study the complexity of problems related to the provenance of an assertion or a conjunctive query answer. Finally, we consider two more restricted cases which correspond to the so-called positive Boolean provenance and lineage in the database setting. For these cases, we exhibit relationships with well-known notions related to explanations in description logics and complete our complexity analysis. As a side contribution, we provide conditions on an $\mathcal{ELHI}_\bot$ ontology that guarantee tractable reasoning.
AI · Apr 27
BoxLitE: A Faithful Knowledge Base Embedding Based on Convex Optimization
Bruno F. Lourenço, Hesham Morgan, Ana Ozaki et al.
Knowledge base (KB) embeddings aim at combining the capability of classical knowledge graph embeddings to generalize the information present in facts, the ABox, with conceptual knowledge represented in an ontology language, the TBox. Several authors have recently explored the idea of mapping concepts to convex regions in a vector space. This is useful to represent hierarchies, typically present in TBoxes, since more general concepts can be mapped to larger regions, containing those regions associated with more specific concepts. However, the power of convexity is rarely leveraged during the actual learning tasks. Here, we introduce BoxLitE, a KB embedding model for DL-Lite$^{\mathcal{H}}$ that allows for convex optimization. We show that for any satisfiable DL-Lite$^{\mathcal{H}}$ KB, there is a BoxLitE embedding that is a weakly faithful model. As a proof of concept, we show how to formulate the KB embedding task as a convex optimization problem and how to obtain embeddings with such desirable faithfulness properties.
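The idea that more general concepts map to larger regions containing the regions of more specific concepts can be illustrated with a small Python sketch. It assumes axis-aligned boxes as the convex regions, which is an illustrative choice and not necessarily the construction BoxLitE uses; the `Box` class and the concept names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Box:
    """An axis-aligned box, used here as an assumed convex region for a concept."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]

    def contains_point(self, point: Tuple[float, ...]) -> bool:
        # An individual belongs to the concept if its point lies inside the box.
        return all(lo <= p <= up for lo, p, up in zip(self.lower, point, self.upper))

    def contains_box(self, other: "Box") -> bool:
        # A concept inclusion "other is subsumed by self" holds geometrically
        # when the other box lies entirely inside this one.
        return all(a <= b for a, b in zip(self.lower, other.lower)) and all(
            b <= a for a, b in zip(self.upper, other.upper)
        )

# Hypothetical concepts: Dog is more specific than Animal.
animal = Box(lower=(0.0, 0.0), upper=(10.0, 10.0))
dog = Box(lower=(2.0, 2.0), upper=(4.0, 4.0))
```

With these regions, `animal.contains_box(dog)` holds while `dog.contains_box(animal)` does not, which mirrors the hierarchy Dog ⊑ Animal. Containment constraints of this form are linear inequalities over the box bounds, which is one reason convex regions pair naturally with convex optimization.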
LO · Dec 23, 2024
On the Power and Limitations of Examples for Description Logic Concepts
Balder ten Cate, Raoul Koudijs, Ana Ozaki
Labeled examples (i.e., positive and negative examples) are an attractive medium for communicating complex concepts. They are useful for deriving concept expressions (such as in concept learning, interactive concept specification, and concept refinement) as well as for illustrating concept expressions to a user or domain expert. We investigate the power of labeled examples for describing description-logic concepts. Specifically, we systematically study the existence and efficient computability of finite characterisations, i.e. finite sets of labeled examples that uniquely characterize a single concept, for a wide variety of description logics between EL and ALCQI, both without an ontology and in the presence of a DL-Lite ontology. Finite characterisations are relevant for debugging purposes, and their existence is a necessary condition for exact learnability with membership queries.
AI · Dec 13, 2024
Extracting PAC Decision Trees from Black Box Binary Classifiers: The Gender Bias Case Study on BERT-based Language Models
Ana Ozaki, Roberto Confalonieri, Ricardo Guimarães et al.
Decision trees are a popular machine learning method, known for their inherent explainability. In Explainable AI, decision trees can be used as surrogate models for complex black box AI models or as approximations of parts of such models. A key challenge of this approach is determining how accurately the extracted decision tree represents the original model and to what extent it can be trusted as an approximation of its behavior. In this work, we investigate the use of the Probably Approximately Correct (PAC) framework to provide a theoretical guarantee of fidelity for decision trees extracted from AI models. Based on theoretical results from the PAC framework, we adapt a decision tree algorithm to ensure a PAC guarantee under certain conditions. We focus on binary classification and conduct experiments where we extract decision trees from BERT-based language models with PAC guarantees. Our results indicate occupational gender bias in these models.
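As a rough illustration of what a PAC guarantee involves, the following Python sketch computes one standard sample-size bound from PAC theory: for a finite hypothesis class H and a learner that returns a hypothesis consistent with the sample, m ≥ (ln|H| + ln(1/δ)) / ε examples suffice for error at most ε with probability at least 1 − δ. Whether the paper relies on this exact bound is an assumption; the function name and parameters are hypothetical.

```python
import math

def pac_sample_size(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Sample size sufficient for a consistent learner over a finite hypothesis class.

    hypothesis_count: |H|, the number of candidate hypotheses (e.g. bounded-depth trees)
    epsilon: maximum tolerated error of the returned hypothesis
    delta: maximum tolerated probability that the error exceeds epsilon
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    bound = (math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon
    return math.ceil(bound)
```

For example, `pac_sample_size(2**10, 0.1, 0.05)` evaluates to 100. The bound grows only logarithmically in the size of the hypothesis class but linearly in 1/ε, so tightening the error tolerance is the costly direction.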
LG · May 20, 2023
Learning Horn Envelopes via Queries from Large Language Models
Sophie Blum, Raoul Koudijs, Ana Ozaki et al.
We investigate an approach for extracting knowledge from trained neural networks based on Angluin's exact learning model with membership and equivalence queries to an oracle. In this approach, the oracle is a trained neural network. We consider Angluin's classical algorithm for learning Horn theories and study the necessary changes to make it applicable to learn from neural networks. In particular, we have to consider that trained neural networks may not behave as Horn oracles, meaning that their underlying target theory may not be Horn. We propose a new algorithm that aims at extracting the "tightest Horn approximation" of the target theory and that is guaranteed to terminate in exponential time (in the worst case) and in polynomial time if the target has polynomially many non-Horn examples. To showcase the applicability of the approach, we perform experiments on pre-trained language models and extract rules that expose occupation-based gender biases.
LO · Aug 27, 2021
Geometric Models for (Temporally) Attributed Description Logics
Camille Bourgaux, Ana Ozaki, Jeff Z. Pan
In the search for knowledge graph embeddings that could capture ontological knowledge, geometric models of existential rules have been recently introduced. It has been shown that convex geometric regions capture the so-called quasi-chained rules. Attributed description logics (DL) have been defined to bridge the gap between DL languages and knowledge graphs, whose facts often come with various kinds of annotations that may need to be taken into account for reasoning. In particular, temporally attributed DLs are enriched by specific attributes whose semantics allows for some temporal reasoning. Considering that geometric models and (temporally) attributed DLs are promising tools designed for knowledge graphs, this paper investigates their compatibility, focusing on the attributed version of a Horn dialect of the DL-Lite family. We first adapt the definition of geometric models to attributed DLs and show that every satisfiable ontology has a convex geometric model. Our second contribution is a study of the impact of temporal attributes. We show that a temporally attributed DL may not have a convex geometric model in general but we can recover geometric satisfiability by imposing some restrictions on the use of the temporal attributes.
AI · Apr 2, 2021
Learning Description Logic Ontologies. Five Approaches. Where Do They Stand?
Ana Ozaki
The quest for acquiring a formal representation of the knowledge of a domain of interest has attracted researchers with various backgrounds into a diverse field called ontology learning. We highlight classical machine learning and data mining approaches that have been proposed for (semi-)automating the creation of description logic (DL) ontologies. These are based on association rule mining, formal concept analysis, inductive logic programming, computational learning theory, and neural networks. We provide an overview of each approach and how it has been adapted for dealing with DL ontologies. Finally, we discuss the benefits and limitations of each of them for learning DL ontologies.
AI · Mar 25, 2021
On the Complexity of Learning Description Logic Ontologies
Ana Ozaki
Ontologies are a popular way of representing domain knowledge, in particular, knowledge in domains related to life sciences. (Semi-)automating the process of building an ontology has attracted researchers from different communities into a field called "Ontology Learning". We provide a formal specification of the exact and the probably approximately correct learning models from computational learning theory. Then, we recall from the literature complexity results for learning lightweight description logic (DL) ontologies in these models. Finally, we highlight other approaches proposed in the literature for learning DL ontologies.
AI · Aug 17, 2020
Automated Reasoning in Temporal DL-Lite
Sabiha Tahrat, German Braun, Alessandro Artale et al.
This paper investigates the feasibility of automated reasoning over temporal DL-Lite (TDL-Lite) knowledge bases (KBs). We test the usage of off-the-shelf LTL reasoners to check satisfiability of TDL-Lite KBs. In particular, we test the robustness and the scalability of reasoners when dealing with TDL-Lite TBoxes paired with a temporal ABox. We conduct various experiments to analyse the performance of different reasoners by randomly generating TDL-Lite KBs and then measuring the running time and the size of the translations. Furthermore, in an effort to make the usage of TDL-Lite KBs a reality, we present a fully fledged tool with a graphical interface to design them. Our interface is based on conceptual modelling principles and it is integrated with our translation tool and a temporal reasoner.
LO · May 6, 2020
On the Learnability of Possibilistic Theories
Cosimo Persia, Ana Ozaki
We investigate learnability of possibilistic theories from entailments in light of Angluin's exact learning model. We consider cases in which only membership, only equivalence, and both kinds of queries can be posed by the learner. We then show that, for a large class of problems, polynomial time learnability results for classical logic can be transferred to the respective possibilistic extension. In particular, it follows from our results that the possibilistic extension of propositional Horn theories is exactly learnable in polynomial time. As polynomial time learnability in the exact model is transferable to the classical probably approximately correct model extended with membership queries, our work also establishes such results in this model.
LO · Jan 21, 2020
Provenance for the Description Logic ELHr
Camille Bourgaux, Ana Ozaki, Rafael Peñaloza et al.
We address the problem of handling provenance information in ELHr ontologies. We consider a setting recently introduced for ontology-based data access, based on semirings and extending classical data provenance, in which ontology axioms are annotated with provenance tokens. A consequence inherits the provenance of the axioms involved in deriving it, yielding a provenance polynomial as an annotation. We analyse the semantics for the ELHr case and show that the presence of conjunctions poses various difficulties for handling provenance, some of which are mitigated by assuming multiplicative idempotency of the semiring. Under this assumption, we study three problems: ontology completion with provenance, computing the set of relevant axioms for a consequence, and query answering.
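The way a consequence inherits the provenance of the axioms used to derive it can be sketched in Python. This is a toy setting, not the ELHr semantics of the paper: it assumes an ontology of atomic concept inclusions only (no conjunctions or roles), with hypothetical tokens `x1`, `x2`, `x3`. Each monomial is stored as a frozenset of tokens, which builds in multiplicative idempotency (a token used twice counts once).

```python
# Hypothetical toy ontology: each axiom "lhs is subsumed by rhs" carries one token.
AXIOMS = {
    ("A", "B"): "x1",
    ("B", "C"): "x2",
    ("A", "C"): "x3",
}

def provenance(source, target, axioms=AXIOMS):
    """Provenance of 'source is subsumed by target' as a set of monomials.

    Each monomial (a frozenset of tokens) is the product of the axioms used in
    one derivation; the outer set is the sum over alternative derivations.
    """
    monomials = set()

    def walk(node, used, visited):
        if node == target:
            monomials.add(used)
            return
        for (lhs, rhs), token in axioms.items():
            if lhs == node and rhs not in visited:
                walk(rhs, used | {token}, visited | {rhs})

    walk(source, frozenset(), {source})
    return monomials

def as_polynomial(monomials):
    """Render the set of monomials as a readable provenance polynomial."""
    terms = sorted("*".join(sorted(m)) for m in monomials)
    return " + ".join(terms)
```

Here `as_polynomial(provenance("A", "C"))` gives `x1*x2 + x3`: the inclusion of A in C follows either by chaining the first two axioms or directly from the third.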
AI · Nov 17, 2019
Learning Query Inseparable ELH Ontologies
Ana Ozaki, Cosimo Persia, Andrea Mazzullo
We investigate the complexity of learning query inseparable ELH ontologies in a variant of Angluin's exact learning model. Given a fixed data instance A* and a query language Q, we are interested in computing an ontology H that entails the same queries as a target ontology T on A*, that is, H and T are inseparable w.r.t. A* and Q. The learner is allowed to pose two kinds of questions. The first is `Does $(T,A)\models q$?', with A an arbitrary data instance and q a query in Q. An oracle answers this question with `yes' or `no'. In the second, the learner asks `Are H and T inseparable w.r.t. A* and Q?'. If so, the learning process finishes; otherwise, the learner receives (A*,q) with q in Q, $(T,A*)\models q$ and $(H,A*)\not\models q$ (or vice versa). Then, we analyse conditions in which query inseparability is preserved if A* changes. Finally, we consider the PAC learning model and a setting where the algorithms learn from a batch of classified data, limiting interactions with the oracles.
DB · Jun 1, 2019
Enriching Ontology-based Data Access with Provenance (Extended Version)
Diego Calvanese, Davide Lanti, Ana Ozaki et al.
Ontology-based data access (OBDA) is a popular paradigm for querying heterogeneous data sources by connecting them through mappings to an ontology. In OBDA, it is often difficult to reconstruct why a tuple occurs in the answer of a query. We address this challenge by enriching OBDA with provenance semirings, taking inspiration from database theory. In particular, we investigate the problems of (i) deciding whether a provenance annotated OBDA instance entails a provenance annotated conjunctive query, and (ii) computing a polynomial representing the provenance of a query entailed by a provenance annotated OBDA instance. Unlike in pure databases, these polynomials may be infinite in our case. To regain finiteness, we consider idempotent semirings, and study the complexity in the case of DL-Lite ontologies. We implement Task (ii) in a state-of-the-art OBDA system and show the practical feasibility of the approach through an extensive evaluation against two popular benchmarks.
AI · Feb 8, 2019
Learning Ontologies with Epistemic Reasoning: The EL Case
Ana Ozaki, Nicolas Troquard
We investigate the problem of learning description logic ontologies from entailments via queries, using epistemic reasoning. We introduce a new learning model consisting of epistemic membership and example queries and show that polynomial learnability in this model coincides with polynomial learnability in Angluin's exact learning model with membership and equivalence queries. We then instantiate our learning framework to EL and show some complexity results for an epistemic extension of EL where epistemic operators can be applied over the axioms. Finally, we transfer known results for EL ontologies and its fragments to our learning model based on epistemic reasoning.
LG · Sep 20, 2017
Exact Learning of Lightweight Description Logic Ontologies
Boris Konev, Carsten Lutz, Ana Ozaki et al.
We study the problem of learning description logic (DL) ontologies in Angluin et al.'s framework of exact learning via queries. We admit membership queries ("is a given subsumption entailed by the target ontology?") and equivalence queries ("is a given ontology equivalent to the target ontology?"). We present three main results: (1) ontologies formulated in (two relevant versions of) the description logic DL-Lite can be learned with polynomially many queries of polynomial size; (2) this is not the case for ontologies formulated in the description logic EL, even when only acyclic ontologies are admitted; and (3) ontologies formulated in a fragment of EL related to the web ontology language OWL 2 RL can be learned in polynomial time. We also show that neither membership nor equivalence queries alone are sufficient in cases (1) and (3).
LG · Sep 10, 2016
New Steps on the Exact Learning of CNF
Montserrat Hermo, Ana Ozaki
A major problem in computational learning theory is whether the class of formulas in conjunctive normal form (CNF) is efficiently learnable. Although it is known that this class cannot be polynomially learned using either membership or equivalence queries alone, it is open whether CNF can be polynomially learned using both types of queries. One of the most important results concerning a restriction of the class CNF is that propositional Horn formulas are polynomial time learnable in Angluin's exact learning model with membership and equivalence queries. In this work we push this boundary and show that the class of multivalued dependency formulas (MVDF) is polynomially learnable from interpretations. We then provide a notion of reduction between learning problems in Angluin's model, showing that a transformation of the algorithm suffices to efficiently learn multivalued database dependencies from data relations. We also show via reductions that our main result extends well known previous results and allows us to find alternative solutions for them.