LG IT | Jan 11, 2024

Semantic-Preserving Feature Partitioning for Multi-View Ensemble Learning

Mohammad Sadegh Khorshidi, Navid Yazdanjue, Hassan Gharoun, Danial Yazdani, Mohammad Reza Nikoo, Fang Chen, Amir H. Gandomi

arXiv:2401.06251v1 | 6.43 citations | h-index: 41 | Inf Fusion

Originality: Incremental advance

AI Analysis

This paper addresses the curse of dimensionality for machine learning practitioners by improving multi-view ensemble learning, though it is an incremental advance over existing feature partitioning methods.

The paper tackles high-dimensional sparse data in machine learning by introducing the Semantic-Preserving Feature Partitioning (SPFP) algorithm, which partitions datasets into semantically consistent views for multi-view ensemble learning. Across eight real-world datasets, it maintains accuracy while improving uncertainty measures where high generalization performance is achievable, and retains uncertainty metrics while improving accuracy where it is not, with large effect sizes in most experiments.

In machine learning, the exponential growth of data and the associated "curse of dimensionality" pose significant challenges, particularly with expansive yet sparse datasets. Addressing these challenges, multi-view ensemble learning (MEL) has emerged as a transformative approach, with feature partitioning (FP) playing a pivotal role in constructing artificial views for MEL. Our study introduces the Semantic-Preserving Feature Partitioning (SPFP) algorithm, a novel method grounded in information theory. The SPFP algorithm effectively partitions datasets into multiple semantically consistent views, enhancing the MEL process. Through extensive experiments on eight real-world datasets, ranging from high-dimensional with limited instances to low-dimensional with many instances, our method demonstrates notable efficacy. It maintains model accuracy while significantly improving uncertainty measures in scenarios where high generalization performance is achievable. Conversely, it retains uncertainty metrics while enhancing accuracy where high generalization accuracy is less attainable. An effect size analysis further reveals that the SPFP algorithm outperforms benchmark models by a large effect size and reduces computational demands through effective dimensionality reduction. The substantial effect sizes observed in most experiments underscore the algorithm's significant improvements in model performance.
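To make the general idea of feature partitioning for multi-view ensemble learning concrete, here is a minimal sketch in plain Python. It is not the SPFP algorithm, whose details the abstract does not give. As a stand-in, it ranks discrete features by their mutual information with the label and deals them round-robin into views, then trains a simple lookup-table classifier per view and combines the views by majority vote. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def partition_features(X, y, n_views):
    """Assumed stand-in for a partitioner: rank features by MI with the
    label, then deal them round-robin so each view gets a mix of ranks."""
    n_features = len(X[0])
    scores = [mutual_information([row[j] for row in X], y)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: -scores[j])
    return [ranked[v::n_views] for v in range(n_views)]

def fit_view(X, y, features):
    """Per-view model: majority label for each observed value pattern."""
    counts = {}
    for row, label in zip(X, y):
        key = tuple(row[j] for j in features)
        counts.setdefault(key, Counter())[label] += 1
    table = {k: c.most_common(1)[0][0] for k, c in counts.items()}
    default = Counter(y).most_common(1)[0][0]  # fallback for unseen patterns
    return features, table, default

def predict(models, row):
    """Majority vote over the per-view predictions."""
    votes = Counter()
    for features, table, default in models:
        key = tuple(row[j] for j in features)
        votes[table.get(key, default)] += 1
    return votes.most_common(1)[0][0]

# Toy data: feature 0 equals the label, feature 1 is a noisy copy,
# features 2 and 3 are independent of the label.
X = [
    [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 1],
    [1, 1, 0, 1], [1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 1, 1],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

views = partition_features(X, y, n_views=3)
models = [fit_view(X, y, features) for features in views]
print(views)
print(predict(models, [1, 1, 0, 1]))
```

Each view sees only its own subset of features, so every per-view model works in a lower-dimensional space than a single model trained on all features would. A real partitioner would replace `partition_features`; the ensemble scaffolding around it stays the same.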
