AIJul 30, 2016

A Linear Algebraic Approach to Datalog Evaluation

arXiv:1608.00139v23 citations
Originality Highly original
AI Analysis

This work addresses performance bottlenecks in Datalog evaluation for database and logic programming communities, offering a significant speedup over existing methods.

The paper tackles the problem of evaluating Datalog programs by proposing a linear algebraic approach that translates programs into matrix equations, achieving O(N^3) time complexity for certain program classes. In experiments, this method runs 10^1 to 10^4 times faster than state-of-the-art symbolic systems on non-sparse data.

In this paper, we propose a fundamentally new approach to Datalog evaluation. Given a linear Datalog program DB written using N constants and binary predicates, we first translate if-and-only-if completions of clauses in DB into a set Eq(DB) of matrix equations with a non-linear operation where relations in M_DB, the least Herbrand model of DB, are encoded as adjacency matrices. We then translate Eq(DB) into another, but purely linear matrix equations tilde_Eq(DB). It is proved that the least solution of tilde_Eq(DB) in the sense of matrix ordering is converted to the least solution of Eq(DB) and the latter gives M_DB as a set of adjacency matrices. Hence computing the least solution of tilde_Eq(DB) is equivalent to computing M_DB specified by DB. For a class of tail recursive programs and for some other types of programs, our approach achieves O(N^3) time complexity irrespective of the number of variables in a clause since only matrix operations costing O(N^3) or less are used. We conducted two experiments that compute the least Herbrand models of linear Datalog programs. The first experiment computes transitive closure of artificial data and real network data taken from the Koblenz Network Collection. The second one compared the proposed approach with the state-of-the-art symbolic systems including two Prolog systems and two ASP systems, in terms of computation time for a transitive closure program and the same generation program. In the experiment, it is observed that our linear algebraic approach runs 10^1 ~ 10^4 times faster than the symbolic systems when data is not sparse. To appear in Theory and Practice of Logic Programming (TPLP).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes