Principled and Efficient Motif Finding for Structure Learning of Lifted Graphical Models
This work addresses a core problem in neuro-symbolic AI and statistical relational learning by providing a more efficient and accurate method for structure learning, though the contribution appears incremental because it builds on existing motif-finding concepts.
The paper tackles the problem of structure learning in lifted graphical models by introducing a principled approach for mining structural motifs, which reduces the search space and guides formula learning. The results show improvements of up to 6% in accuracy and up to 80% in runtime compared to state-of-the-art methods.
Structure learning is a core problem in AI, central to the fields of neuro-symbolic AI and statistical relational learning. It consists of automatically learning a logical theory from data. The basis for structure learning is mining repeating patterns in the data, known as structural motifs. Finding these patterns reduces the exponential search space and therefore guides the learning of formulas. Despite the importance of motif learning, it is still not well understood. We present the first principled approach for mining structural motifs in lifted graphical models, which are languages that blend first-order logic with probabilistic models. The approach uses a stochastic process to measure the similarity of entities in the data. Our first contribution is an algorithm that depends on two intuitive hyperparameters: one controlling the uncertainty in the entity similarity measure, and one controlling the softness of the resulting rules. Our second contribution is a preprocessing step in which we perform hierarchical clustering on the data to restrict the search to the most relevant data. Our third contribution is an O(n ln n) algorithm (where n is the number of entities in the data) for clustering structurally related data. We evaluate our approach on standard benchmarks and show that we outperform state-of-the-art structure learning approaches by up to 6% in terms of accuracy and up to 80% in terms of runtime.
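To make the idea of measuring entity similarity with a stochastic process more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the relational data can be viewed as a plain undirected graph, uses a short uniform random walk as the stochastic process, and groups entities whose average visit probabilities differ by at most a threshold. The names `walk_distribution`, `group_similar`, `steps`, and `eps`, as well as the toy graph, are all hypothetical. Here `eps` loosely plays the role of a hyperparameter controlling uncertainty in the similarity measure, and the sort-then-sweep grouping illustrates one way a clustering step can run in O(n log n) time.

```python
from collections import defaultdict


def walk_distribution(adj, source, steps):
    """Average probability of being at each node over `steps` steps of a
    uniform random walk started at `source` (computed exactly, no sampling)."""
    dist = {source: 1.0}
    total = defaultdict(float)
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            neighbours = adj[node]
            for nb in neighbours:
                nxt[nb] += p / len(neighbours)
        dist = nxt
        for node, p in dist.items():
            total[node] += p / steps
    return dict(total)


def group_similar(adj, source, steps=4, eps=0.01):
    """Group nodes whose visit probabilities are within `eps` of their
    neighbour in sorted order. Sorting dominates the cost: O(n log n)."""
    probs = walk_distribution(adj, source, steps)
    ranked = sorted((p, n) for n, p in probs.items() if n != source)
    groups, current = [], []
    for p, n in ranked:
        # Start a new group when the gap to the previous node exceeds eps.
        if current and p - current[-1][0] > eps:
            groups.append([m for _, m in current])
            current = []
        current.append((p, n))
    if current:
        groups.append([m for _, m in current])
    return groups


# Hypothetical toy data: a professor linked to two students and a course,
# each student linked to one paper.
adj = {
    "prof": ["s1", "s2", "c1"],
    "s1": ["prof", "p1"],
    "s2": ["prof", "p2"],
    "c1": ["prof"],
    "p1": ["s1"],
    "p2": ["s2"],
}

groups = group_similar(adj, "prof")
print(groups)
```

In this toy graph the two students play the same structural role relative to the professor, as do the two papers, so each pair falls into one group while the course stands alone. Groups of this kind are the sort of repeating pattern from which a lifted rule could be generalized. A larger `eps` merges more entities into the same group, which in this sketch corresponds to tolerating more uncertainty in the similarity measure.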