LG | DB | Jan 13, 2012

Combining Heterogeneous Classifiers for Relational Databases

arXiv:1201.2925v2 | 14 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of efficiently classifying data in enterprise relational databases. The contribution appears incremental, as it builds on existing meta-classification and relational learning techniques.

The paper tackled the problem of applying machine learning to data distributed across multiple relational databases without losing semantic information or paying the computational cost of flattening the tables into one. It introduced a two-phase hierarchical meta-classification algorithm that, as reported by the authors, considerably reduced classification time while maintaining prediction accuracy on three benchmark datasets.

Most enterprise data is distributed across multiple relational databases with expert-designed schemas. Using traditional single-table machine learning techniques over such data not only incurs a computational penalty for converting to a 'flat' form (mega-join), but also loses the human-specified semantic information present in the relations. In this paper, we present a practical, two-phase hierarchical meta-classification algorithm for relational databases with a semantic divide-and-conquer approach. We propose a recursive prediction aggregation technique over heterogeneous classifiers applied to individual database tables. The proposed algorithm was evaluated on three diverse datasets, namely the TPCH, PKDD and UCI benchmarks, and showed a considerable reduction in classification time without any loss of prediction accuracy.
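To make the idea of recursive prediction aggregation over per-table classifiers concrete, here is a minimal Python sketch. It is not the authors' algorithm: the `Table` class, the two rule-based classifiers, the toy `customers`/`orders` schema, and the plain-mean aggregation rule are all hypothetical stand-ins chosen for illustration. In practice the base classifiers and the combining step would be learned from data.

```python
from collections import defaultdict
from statistics import mean

# Illustrative assumption: a schema is a tree of tables linked by foreign keys.
class Table:
    def __init__(self, name, rows, key, parent_key=None, children=()):
        self.name = name                # table name
        self.rows = rows                # list of dicts, one per row
        self.key = key                  # primary-key column
        self.parent_key = parent_key    # foreign-key column pointing at the parent
        self.children = list(children)  # child tables that reference this one


def threshold_classifier(column, threshold):
    """Base classifier: score 1.0 if row[column] > threshold, else 0.0."""
    return lambda row: 1.0 if row[column] > threshold else 0.0


def category_classifier(column, positive_values):
    """A different kind of base classifier: score by category membership."""
    return lambda row: 1.0 if row[column] in positive_values else 0.0


def predict(table, classifiers):
    """Return {primary key: score in [0, 1]} for every row of `table`.

    Step 1: score each row with the table's own classifier.
    Step 2: recursively score each child table, average the child-row
    scores per parent key, and combine them with the local score by a
    plain mean (a stand-in for a learned meta-classifier).
    """
    scores = defaultdict(list)
    for row in table.rows:
        scores[row[table.key]].append(classifiers[table.name](row))
    for child in table.children:
        child_scores = predict(child, classifiers)  # recurse down the schema
        grouped = defaultdict(list)
        for row in child.rows:
            grouped[row[child.parent_key]].append(child_scores[row[child.key]])
        for pk, values in grouped.items():
            if pk in scores:  # ignore child rows with no matching parent
                scores[pk].append(mean(values))
    return {pk: mean(values) for pk, values in scores.items()}


# Toy data: no join is materialised; each table is scored on its own.
orders = Table(
    "orders",
    [
        {"order_id": 10, "customer_id": 1, "amount": 500},
        {"order_id": 11, "customer_id": 1, "amount": 20},
        {"order_id": 12, "customer_id": 2, "amount": 15},
    ],
    key="order_id",
    parent_key="customer_id",
)
customers = Table(
    "customers",
    [
        {"customer_id": 1, "segment": "corporate"},
        {"customer_id": 2, "segment": "consumer"},
    ],
    key="customer_id",
    children=[orders],
)
classifiers = {
    "customers": category_classifier("segment", {"corporate"}),
    "orders": threshold_classifier("amount", 100),
}
scores = predict(customers, classifiers)
print(scores)
```

The point of the sketch is structural: each table keeps its own classifier, and predictions flow up the foreign-key links instead of features being flattened into one wide table first.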

