DSMay 20

The Complexity of Maximal/Closed Frequent Tree Mining for Bounded Height Trees

Kenta Komoto, Kazuhiro Kurita, Hirotaka Ono

arXiv:2602.0343649.0h-index: 5

AI Analysis

For researchers in bioinformatics and data mining working with tree-structured data, this paper provides a precise complexity classification for frequent tree mining under height constraints, revealing that even small height bounds do not generally make the problem easy.

The paper studies the complexity of enumerating closed and maximal frequent trees in databases of bounded-height rooted trees. It shows that for unordered trees of height ≤2, closed frequent trees can be enumerated with polynomial delay, while for ordered trees of height ≤2, such enumeration would imply an output-polynomial algorithm for Dualization. For maximal frequent trees, no output-polynomial algorithm exists unless P=NP for ordered trees of height ≤2 and unordered trees of height ≤3.

Frequent tree mining asks us to enumerate tree patterns that occur frequently in a database of rooted trees. This problem is motivated by tree-structured data in bioinformatics, such as glycans and pseudoknot-free RNA secondary structures. A direct enumeration of all frequent trees is often highly redundant, because every subtree of a frequent tree is again frequent. Closed and maximal frequent trees are standard ways to reduce this redundancy, but their enumeration can still be computationally hard. In this paper, we study the effect of bounding the height of the input trees. This is a natural restriction for rooted trees, since the height is the depth of the hierarchy. We ask whether closed/maximal frequent tree mining remains hard when every input tree has a small height. Our results show that the answer depends sharply on the model. For rooted unordered trees of height at most 2, we give a polynomial-delay algorithm for enumerating closed frequent trees. On the other hand, for rooted ordered trees of height at most 2, we show that an output-polynomial time algorithm for enumerating closed frequent trees would imply an output-polynomial time algorithm for Dualization. For maximal frequent tree enumeration, we prove that no output-polynomial time algorithm exists unless P = NP already for rooted ordered trees of height at most 2 and for rooted unordered trees of height at most 3. Thus, even very small height bounds do not make the enumeration problems easy in general. At the same time, the unordered closed case of height at most 2 admits polynomial-delay enumeration. These results give a height-based classification of the complexity of closed and maximal frequent tree mining on shallow rooted trees.

View on arXiv PDF

Similar