DBMar 13, 2023
A Framework for Combining Entity Resolution and Query Answering in Knowledge BasesRonald Fagin, Phokion G. Kolaitis, Domenico Lembo et al.
We propose a new framework for combining entity resolution and query answering in knowledge bases (KBs) with tuple-generating dependencies (tgds) and equality-generating dependencies (egds) as rules. We define the semantics of the KB in terms of special instances that involve equivalence classes of entities and sets of values. Intuitively, the former collect all entities denoting the same real-world object, while the latter collect all alternative values for an attribute. This approach allows us to both resolve entities and bypass possible inconsistencies in the data. We then design a chase procedure that is tailored to this new framework and has the feature that it never fails; moreover, when the chase procedure terminates, it produces a universal solution, which in turn can be used to obtain the certain answers to conjunctive queries. We finally discuss challenges arising when the chase does not terminate.
87.3LOMay 29
Aspects of Coherence in Dependence LogicTimon Barlag, Nicolas Fröhlich, Miika Hannula et al.
Dependence logic extends first-order logic with dependence atoms asserting that the value of a variable is determined by the values of certain other variables. The semantics of dependence logic has a second-order character and involves sets of assignments, called teams, instead of individual assignments as in the classical Tarski semantics. Since the model-checking problem is known to be NP-complete even for quantifier-free dependence logic (DQF) formulas, researchers have pursued conditions on formulas that make this problem tractable. In 2010, Jarmo Kontinen introduced the notion of k-coherence for dependence logic formulas, where k is a positive integer. This notion asserts that if the formula is satisfied in a structure by all k-element subteams of a given team, then the given team itself satisfies the formula. It has been proved that k-coherent DQF-formulas have a tame model-checking problem, because such formulas admit a first-order rewriting. In this paper, we investigate the structural and algorithmic aspects of coherence. We show that if a DQF-formula is first-order ewritable, then it is k-coherent for some positive integer k. Thus, for DQF-formulas, coherence is equivalent to first-order rewritability. Furthermore, we show that an analogous result holds for universally quantified dependence logic formulas under a stronger notion of coherence. After this, we focus on the complexity of deciding if a given dependence logic formula is k-coherent. We establish that this decision problem is highly undecidable for arbitrary dependence logic formulas, while for DQF-formulas this problem is co-recursively enumerable. Furthermore, we pinpoint the computational complexity of the coherence problem for propositional dependence logic formulas by showing that this problem is complete for the second level of the exponential hierarchy.
AIJan 29, 2019
Knowledge Refinement via Rule SelectionPhokion G. Kolaitis, Lucian Popa, Kun Qian
In several different applications, including data transformation and entity resolution, rules are used to capture aspects of knowledge about the application at hand. Often, a large set of such rules is generated automatically or semi-automatically, and the challenge is to refine the encapsulated knowledge by selecting a subset of rules based on the expected operational behavior of the rules on available data. In this paper, we carry out a systematic complexity-theoretic investigation of the following rule selection problem: given a set of rules specified by Horn formulas, and a pair of an input database and an output database, find a subset of the rules that minimizes the total error, that is, the number of false positive and false negative errors arising from the selected rules. We first establish computational hardness results for the decision problems underlying this minimization problem, as well as upper and lower bounds for its approximability. We then investigate a bi-objective optimization version of the rule selection problem in which both the total error and the size of the selected rules are taken into account. We show that testing for membership in the Pareto front of this bi-objective optimization problem is DP-complete. Finally, we show that a similar DP-completeness result holds for a bi-level optimization version of the rule selection problem, where one minimizes first the total error and then the size.
DBMay 10, 2018
Computational Social Choice Meets DatabasesBenny Kimelfeld, Phokion G. Kolaitis, Julia Stoyanovich
We develop a novel framework that aims to create bridges between the computational social choice and the database management communities. This framework enriches the tasks currently supported in computational social choice with relational database context, thus making it possible to formulate sophisticated queries about voting rules, candidates, voters, issues, and positions. At the conceptual level, we give rigorous semantics to queries in this framework by introducing the notions of necessary answers and possible answers to queries. At the technical level, we embark on an investigation of the computational complexity of the necessary answers. We establish a number of results about the complexity of the necessary answers of conjunctive queries involving positional scoring rules that contrast sharply with earlier results about the complexity of the necessary winners.