Combining Heterogeneous Classifiers for Relational Databases
This paper addresses the challenge of efficiently classifying data in enterprise relational databases, though the contribution appears incremental, since it builds on existing meta-classification and relational learning techniques.
The paper tackled the problem of applying machine learning to data distributed across multiple relational databases without losing semantic information or incurring the computational penalty of flattening. It introduced a two-phase hierarchical meta-classification algorithm that considerably reduced classification time while maintaining prediction accuracy on three benchmark datasets (TPCH, PKDD, and UCI).
Most enterprise data is distributed across multiple relational databases with expert-designed schemas. Applying traditional single-table machine learning techniques to such data not only incurs a computational penalty for converting it to a 'flat' form (mega-join), but also loses the human-specified semantic information present in the relations. In this paper, we present a practical, two-phase hierarchical meta-classification algorithm for relational databases that takes a semantic divide-and-conquer approach. We propose a recursive prediction aggregation technique over heterogeneous classifiers applied to individual database tables. The proposed algorithm was evaluated on three diverse datasets, namely the TPCH, PKDD, and UCI benchmarks, and showed a considerable reduction in classification time without any loss of prediction accuracy.
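The general idea of two-phase, recursive prediction aggregation over per-table classifiers can be illustrated with a minimal sketch. This is not the paper's algorithm: the class names (`MajorityByValue`, `NearestMean`, `TableNode`), the toy tables, and the choice of base classifiers are all hypothetical, and the sketch assumes every table holds exactly one feature per entity under a shared key, which real schemas with one-to-many relations would not satisfy. For brevity the meta-classifier is fit on the same rows as the base classifiers; a careful implementation would likely use held-out predictions.

```python
from collections import Counter, defaultdict

class MajorityByValue:
    """Categorical classifier: most common label seen for each feature value."""
    def fit(self, xs, ys):
        counts = defaultdict(Counter)
        for x, y in zip(xs, ys):
            counts[x][y] += 1
        self.table = {x: c.most_common(1)[0][0] for x, c in counts.items()}
        self.default = Counter(ys).most_common(1)[0][0]  # fallback for unseen values
        return self

    def predict(self, x):
        return self.table.get(x, self.default)

class NearestMean:
    """Numeric classifier: label whose mean feature value is closest."""
    def fit(self, xs, ys):
        sums = defaultdict(lambda: [0.0, 0])
        for x, y in zip(xs, ys):
            sums[y][0] += x
            sums[y][1] += 1
        self.means = {y: s / n for y, (s, n) in sums.items()}
        return self

    def predict(self, x):
        return min(self.means, key=lambda y: abs(self.means[y] - x))

class TableNode:
    """One table in the (hypothetical) schema tree, with its own base classifier."""
    def __init__(self, name, rows, base, children=()):
        self.name, self.rows, self.base = name, rows, base
        self.children = list(children)
        self.meta = MajorityByValue()  # aggregates local and child predictions

    def _votes(self, key):
        local = self.base.predict(self.rows[key])
        return (local,) + tuple(c.predict(key) for c in self.children)

    def fit(self, labels):
        ids = sorted(labels)
        ys = [labels[i] for i in ids]
        # Phase 1: fit this table's classifier on its own column, with no join.
        self.base.fit([self.rows[i] for i in ids], ys)
        for child in self.children:
            child.fit(labels)
        # Phase 2: fit the meta-classifier on the tuple of predictions.
        self.meta.fit([self._votes(i) for i in ids], ys)
        return self

    def predict(self, key):
        return self.meta.predict(self._votes(key))

# Toy data (invented for illustration): two tables keyed by customer id.
region = {1: "north", 2: "north", 3: "south", 4: "south", 5: "north", 6: "south"}
order_total = {1: 900.0, 2: 850.0, 3: 120.0, 4: 200.0, 5: 780.0, 6: 90.0}
labels = {1: "yes", 2: "yes", 3: "no", 4: "no"}  # ids 5 and 6 are unlabeled

orders = TableNode("orders", order_total, NearestMean())
root = TableNode("customer", region, MajorityByValue(), children=[orders])
root.fit(labels)
predictions = {key: root.predict(key) for key in (5, 6)}
```

Each table is learned in isolation and only the predictions travel up the tree, so no flattened join is ever materialized; how the schema's semantics should decide the shape of that tree is left open in this sketch.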