LG DB | Jan 13, 2012

Combining Heterogeneous Classifiers for Relational Databases

Geetha Manjunatha, M Narasimha Murty, Dinkar Sitaram

arXiv:1201.2925v2 | 14 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of efficiently classifying data in enterprise relational databases. The contribution appears incremental, since it builds on existing meta-classification and relational learning techniques.

The paper tackled the problem of applying machine learning to data distributed across multiple relational databases without losing semantic information or incurring the computational penalty of flattening the tables into a single mega-join. It introduced a two-phase hierarchical meta-classification algorithm that considerably reduced classification time while maintaining prediction accuracy on three benchmark datasets (TPCH, PKDD and UCI).

Most enterprise data is distributed in multiple relational databases with expert-designed schema. Using traditional single-table machine learning techniques over such data not only incurs a computational penalty for converting to a 'flat' form (mega-join), but also loses the human-specified semantic information present in the relations. In this paper, we present a practical, two-phase hierarchical meta-classification algorithm for relational databases with a semantic divide-and-conquer approach. We propose a recursive prediction aggregation technique over heterogeneous classifiers applied on individual database tables. The proposed algorithm was evaluated on three diverse datasets, namely the TPCH, PKDD and UCI benchmarks, and showed a considerable reduction in classification time without any loss of prediction accuracy.
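The two-phase idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes one feature column per table, rows aligned one-to-one by a shared key, and a single aggregation level instead of the recursive scheme the paper describes. The classifier classes, table names and data are all hypothetical, and the meta-classifier is fitted on the base classifiers' own training predictions for brevity.

```python
from collections import Counter, defaultdict


class MajorityByValue:
    """Categorical classifier: most common label seen for each feature value."""

    def fit(self, X, y):
        self.default = Counter(y).most_common(1)[0][0]  # fallback for unseen values
        counts = defaultdict(Counter)
        for x, label in zip(X, y):
            counts[x][label] += 1
        self.table = {x: c.most_common(1)[0][0] for x, c in counts.items()}
        return self

    def predict(self, X):
        return [self.table.get(x, self.default) for x in X]


class NearestCentroid1D:
    """Numeric classifier: label whose mean feature value is closest."""

    def fit(self, X, y):
        sums, n = defaultdict(float), Counter()
        for x, label in zip(X, y):
            sums[label] += x
            n[label] += 1
        self.centroids = {label: sums[label] / n[label] for label in n}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda l: abs(self.centroids[l] - x))
                for x in X]


def train_hierarchical(tables, labels):
    """tables maps a table name to (unfitted classifier, feature column)."""
    # Phase 1: fit a (possibly different) classifier on each table separately,
    # so no mega-join of the tables is ever built.
    base = {name: clf.fit(col, labels) for name, (clf, col) in tables.items()}
    # Phase 2: aggregate per-table predictions with a meta-classifier.
    meta_X = list(zip(*(base[name].predict(tables[name][1]) for name in base)))
    meta = MajorityByValue().fit(meta_X, labels)
    return base, meta


def predict_hierarchical(base, meta, rows):
    """rows maps each table name to the feature column of the new records."""
    meta_X = list(zip(*(base[name].predict(rows[name]) for name in base)))
    return meta.predict(meta_X)


# Hypothetical two-table database describing four customers.
tables = {
    "account": (NearestCentroid1D(), [100.0, 120.0, 900.0, 950.0]),
    "profile": (MajorityByValue(), ["student", "student", "manager", "manager"]),
}
labels = ["low", "low", "high", "high"]

base, meta = train_hierarchical(tables, labels)
preds = predict_hierarchical(
    base, meta, {"account": [130.0, 880.0], "profile": ["student", "manager"]}
)
print(preds)  # ['low', 'high']
```

Each base classifier only ever sees its own table, which is where the time saving over a flattened join would come from in this simplified setting. A more careful version would fit the meta-classifier on held-out predictions to avoid overfitting.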
