LGIT Jan 11, 2024

Semantic-Preserving Feature Partitioning for Multi-View Ensemble Learning

arXiv:2401.06251v1, 3 citations, h-index: 41, Inf Fusion
Originality: Incremental advance
AI Analysis

This paper addresses the curse of dimensionality for machine learning practitioners by improving multi-view ensemble learning, though it is an incremental advance over existing feature partitioning methods.

The paper tackles high-dimensional sparse data in machine learning by introducing the Semantic-Preserving Feature Partitioning (SPFP) algorithm, which partitions datasets into semantically consistent views for multi-view ensemble learning. Across eight real-world datasets, it either maintains accuracy while improving uncertainty measures or retains uncertainty metrics while improving accuracy, with large effect sizes in most experiments.

In machine learning, the exponential growth of data and the associated "curse of dimensionality" pose significant challenges, particularly with expansive yet sparse datasets. Addressing these challenges, multi-view ensemble learning (MEL) has emerged as a transformative approach, with feature partitioning (FP) playing a pivotal role in constructing artificial views for MEL. Our study introduces the Semantic-Preserving Feature Partitioning (SPFP) algorithm, a novel method grounded in information theory. The SPFP algorithm effectively partitions datasets into multiple semantically consistent views, enhancing the MEL process. Through extensive experiments on eight real-world datasets, ranging from high-dimensional with limited instances to low-dimensional with many instances, our method demonstrates notable efficacy. It maintains model accuracy while significantly improving uncertainty measures in scenarios where high generalization performance is achievable. Conversely, it retains uncertainty metrics while enhancing accuracy where high generalization accuracy is less attainable. An effect size analysis further reveals that the SPFP algorithm outperforms benchmark models by a large effect size and reduces computational demands through effective dimensionality reduction. The substantial effect sizes observed in most experiments underscore the algorithm's significant improvements in model performance.
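
The abstract does not spell out the steps of SPFP, so the following is not the authors' algorithm. It is a minimal sketch of the general idea of information-theoretic feature partitioning, assuming discrete features: each feature is scored by its empirical mutual information with the label, and the ranked features are dealt round-robin into views so that every view receives a share of the informative ones. The names `mutual_information` and `partition_features`, and the toy data, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed (x, y) pairs of p(x, y) * log2(p(x, y) / (p(x) * p(y))).
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def partition_features(columns, labels, n_views):
    """Rank features by I(feature; label), then deal them round-robin into views.

    Illustrative stand-in only, not SPFP itself.
    `columns` is a list of feature columns (each a list of discrete values).
    Returns a list of views, each a list of feature indices.
    """
    scores = [mutual_information(col, labels) for col in columns]
    # Stable sort: features with equal scores keep their original order.
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    views = [[] for _ in range(n_views)]
    for rank, j in enumerate(ranked):
        views[rank % n_views].append(j)
    return views


# Toy data (assumed): 4 binary features, 8 instances.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the label: 1 bit
    [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label: 0 bits
    [1, 1, 1, 1, 0, 0, 0, 0],  # inverted label: 1 bit
    [0, 0, 1, 1, 0, 0, 1, 1],  # independent of the label: 0 bits
]
views = partition_features(columns, labels, n_views=2)
```

In an MEL setting, one base learner would then be trained per view and their predictions combined; a real partitioner would also need to handle continuous features and redundancy between features, which this sketch ignores.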

