AI LGJul 26, 2021

An A*-algorithm for the Unordered Tree Edit Distance with Custom Costs

arXiv:2108.00953v14.54 citations

Originality Incremental advance

AI Analysis

This work addresses a bottleneck in applying unordered tree edit distances to domains like chemical informatics where custom costs can encode domain knowledge, though it appears incremental as it builds on existing A* algorithms.

The paper tackles the problem of computing unordered tree edit distances with custom cost functions, which was previously limited to unit costs. They present three novel A* heuristics that work with custom costs, showing in experiments on chemical datasets that these custom costs make computation faster and improve the error of a 5-nearest neighbor regressor for predicting chemical properties.

The unordered tree edit distance is a natural metric to compute distances between trees without intrinsic child order, such as representations of chemical molecules. While the unordered tree edit distance is MAX SNP-hard in principle, it is feasible for small cases, e.g. via an A* algorithm. Unfortunately, current heuristics for the A* algorithm assume unit costs for deletions, insertions, and replacements, which limits our ability to inject domain knowledge. In this paper, we present three novel heuristics for the A* algorithm that work with custom cost functions. In experiments on two chemical data sets, we show that custom costs make the A* computation faster and improve the error of a 5-nearest neighbor regressor, predicting chemical properties. We also show that, on these data, polynomial edit distances can achieve similar results as the unordered tree edit distance.

View on arXiv PDF

Similar