Synthesizing Datalog Programs Using Numerical Relaxation
This paper addresses the challenge of learning logical rules from examples, a problem that arises in program synthesis, logic programming, and machine learning, and offers a novel method to overcome the computational bottlenecks of existing approaches.
The paper introduces Difflog, a technique that extends Datalog to a continuous setting by attaching real-valued weights to rules. This numerical relaxation lets the rule weights be optimized first, after which a discrete program is recovered from the continuous optimum. On a suite of 34 benchmark problems from knowledge discovery, formal verification, and database query-by-example, the paper reports significant improvements in learning complex programs with recursive rules, invented predicates, and relations of arbitrary arity.
The problem of learning logical rules from examples arises in diverse fields, including program synthesis, logic programming, and machine learning. Existing approaches either involve solving computationally difficult combinatorial problems, or performing parameter estimation in complex statistical models. In this paper, we present Difflog, a technique to extend the logic programming language Datalog to the continuous setting. By attaching real-valued weights to individual rules of a Datalog program, we naturally associate numerical values with individual conclusions of the program. Analogous to the strategy of numerical relaxation in optimization problems, we can now first determine the rule weights which cause the best agreement between the training labels and the induced values of output tuples, and subsequently recover the classical discrete-valued target program from the continuous optimum. We evaluate Difflog on a suite of 34 benchmark problems from recent literature in knowledge discovery, formal verification, and database query-by-example, and demonstrate significant improvements in learning complex programs with recursive rules, invented predicates, and relations of arbitrary arity.
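The relax-optimize-recover strategy described above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on several assumptions that the abstract does not spell out: a tuple's value is taken to be the maximum, over its derivations, of the product of the weights of the rules used (a max-product semantics); the loss is a squared error against the training labels; the optimizer is a coarse grid search standing in for whatever numerical method is actually used; and the discrete program is recovered by thresholding weights at 0.5. The relation names, the three candidate rules, and all function names are hypothetical.

```python
from itertools import product

# Hypothetical input relation: a directed chain a -> b -> c.
EDGES = {("a", "b"), ("b", "c")}

# Candidate rules (names are illustrative, not from the paper):
#   base: path(x, y) :- edge(x, y).
#   rec:  path(x, z) :- path(x, y), edge(y, z).
#   rev:  path(y, x) :- edge(x, y).    (a spurious candidate)

# Training labels: 1.0 for expected output tuples, 0.0 for unwanted ones.
LABELS = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
    ("b", "a"): 0.0, ("c", "b"): 0.0,
}


def evaluate(weights, max_iters=50):
    """Return {tuple: value} under the assumed max-product semantics."""
    path = {}  # tuples absent from the dict have value 0.0
    for _ in range(max_iters):
        new = dict(path)
        candidates = []
        for (x, y) in EDGES:
            candidates.append(((x, y), weights["base"]))
            candidates.append(((y, x), weights["rev"]))
        for (x, y), value in path.items():
            for (src, z) in EDGES:
                if src == y:
                    candidates.append(((x, z), weights["rec"] * value))
        # Keep the best value seen so far for each derived tuple.
        for tup, value in candidates:
            if value > new.get(tup, 0.0):
                new[tup] = value
        if new == path:  # fixpoint reached
            break
        path = new
    return path


def loss(weights, labels):
    """Squared error between training labels and induced tuple values."""
    values = evaluate(weights)
    return sum((lab - values.get(t, 0.0)) ** 2 for t, lab in labels.items())


def fit(labels, grid=(0.0, 0.5, 1.0)):
    """Grid search over rule weights; a stand-in for a real optimizer."""
    names = ("base", "rec", "rev")
    settings = (dict(zip(names, ws)) for ws in product(grid, repeat=len(names)))
    return min(settings, key=lambda w: loss(w, labels))


def discretize(weights, threshold=0.5):
    """Recover a discrete program: keep rules whose weight exceeds the threshold."""
    return sorted(name for name, w in weights.items() if w > threshold)


best = fit(LABELS)
print(best, loss(best, LABELS), discretize(best))
```

In this toy setting the search keeps the base and recursive rules and drops the reversed-edge rule, so the recovered program is the usual transitive-closure definition of path. A realistic learner would replace the grid search with a continuous optimization method and would handle a much larger space of candidate rules.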