ML · LG · Apr 10, 2019

The Weight Function in the Subtree Kernel is Decisive

arXiv:1904.05421v4 · 12 citations

Originality: Incremental advance

AI Analysis

This work addresses the analysis of non-Euclidean tree data, such as plant architectures or RNA structures, by improving kernel methods, though it is incremental as it builds on an existing kernel.

The authors investigated the influence of the weight function in the subtree kernel for tree data, showing that performance improves when leaf weights vanish, and introduced a data-learned weight function that demonstrated high efficiency in eight real classification problems, particularly for small datasets.

Tree data are ubiquitous because they model a large variety of situations, e.g., the architecture of plants, the secondary structure of RNA, or the hierarchy of XML files. Nevertheless, the analysis of these non-Euclidean data is inherently difficult. In this paper, we focus on the subtree kernel, a convolution kernel for tree data introduced by Vishwanathan and Smola in the early 2000s. More precisely, we investigate the influence of the weight function from a theoretical perspective and in real data applications. We establish on a two-class stochastic model that the performance of the subtree kernel improves when the weight of leaves vanishes, which motivates the definition of a new weight function, learned from the data rather than fixed by the user as is usually done. To this end, we define a unified framework for computing the subtree kernel from ordered or unordered trees that is particularly suitable for tuning parameters. We show through eight real data classification problems the efficiency of our approach, in particular for small datasets, which also underlines the importance of the weight function. Finally, a tool for visualizing the significant features is derived.
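To make the role of the weight function concrete, here is a minimal sketch of a subtree kernel on unordered trees. It is not the authors' implementation. It assumes the common form K(T1, T2) = sum over shared subtrees of w(subtree) * N(T1) * N(T2), where N counts occurrences of a complete subtree (a node together with all its descendants), and it assumes the weight depends only on subtree height. The nested-tuple tree encoding and the names `subtree_counts`, `subtree_kernel`, `constant_weight`, and `leafless_weight` are hypothetical, chosen for illustration. The data-learned weight function mentioned in the abstract is not reproduced here.

```python
from collections import Counter

# Assumption: a tree is a nested tuple of its children; a leaf is the empty tuple ().

def subtree_counts(tree):
    """Count complete subtrees by canonical form and record each form's height."""
    counts = Counter()
    heights = {}

    def visit(node):
        results = [visit(child) for child in node]
        # Sorting the children's encodings makes the key independent of child order,
        # so isomorphic unordered subtrees share the same key.
        key = "(" + "".join(sorted(k for k, _ in results)) + ")"
        height = 1 + max((h for _, h in results), default=-1)  # leaves have height 0
        counts[key] += 1
        heights[key] = height
        return key, height

    visit(tree)
    return counts, heights

def subtree_kernel(t1, t2, weight):
    """Weighted sum, over subtrees shared by t1 and t2, of the product of their counts."""
    c1, h1 = subtree_counts(t1)
    c2, _ = subtree_counts(t2)
    return sum(weight(h1[k]) * c1[k] * c2[k] for k in c1.keys() & c2.keys())

def constant_weight(height):
    """Every subtree counts equally, leaves included."""
    return 1.0

def leafless_weight(height):
    """Hypothetical weight that vanishes on leaves (height 0)."""
    return 0.0 if height == 0 else 1.0

t1 = ((), ((), ()))  # root with one leaf child and one child holding two leaves
t2 = ((), ())        # root with two leaf children

k_constant = subtree_kernel(t1, t2, constant_weight)
k_leafless = subtree_kernel(t1, t2, leafless_weight)
print(k_constant, k_leafless)
```

In this toy pair, the three leaves of `t1` and the two leaves of `t2` contribute 6 of the 7 units of similarity under the constant weight. Setting the leaf weight to zero leaves only the one shared non-trivial subtree, which illustrates why the choice of weight can dominate the kernel value.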
