NE AI LG | Oct 13, 2024

Large-scale Multi-objective Feature Selection: A Multi-phase Search Space Shrinking Approach

Azam Asilian Bidgoli, Shahryar Rahnamayan

arXiv:2410.21293v13.31 citations | h-index: 13

Originality: Highly original

AI Analysis

This work addresses computational inefficiency and degraded model performance in machine learning on large-scale data, a problem for both researchers and practitioners. It represents a strong, specific gain rather than a foundational advance.

The paper tackles feature selection for high-dimensional datasets by proposing a multi-objective evolutionary algorithm with search space shrinking, and reports more accurate feature subsets than state-of-the-art methods on 15 large-scale datasets.

Feature selection is a crucial step in machine learning, especially for high-dimensional datasets, where irrelevant and redundant features can degrade model performance and increase computational costs. This paper proposes a novel large-scale multi-objective evolutionary algorithm based on search space shrinking, termed LMSSS, to tackle the challenges of feature selection, particularly as a sparse optimization problem. The method includes a shrinking scheme that reduces the dimensionality of the search space by eliminating irrelevant features before the main evolutionary process. This is achieved through a ranking-based filtering method that evaluates features based on their correlation with class labels and their frequency in an initial, cost-effective evolutionary process. Additionally, a smart crossover scheme based on voting between parent solutions is introduced, giving higher weight to the parent with better classification accuracy. An intelligent mutation process is also designed to target features prematurely excluded from the population, ensuring they are evaluated in combination with other features. These integrated techniques allow the evolutionary process to explore the search space more efficiently and effectively, addressing the sparse and high-dimensional nature of large-scale feature selection problems. The effectiveness of the proposed algorithm is demonstrated through comprehensive experiments on 15 large-scale datasets, showcasing its potential to identify more accurate feature subsets compared to state-of-the-art large-scale feature selection algorithms. These results highlight LMSSS's capability to improve model performance and computational efficiency, setting a new benchmark in the field.
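
The three components named in the abstract (ranking-based shrinking, accuracy-weighted voting crossover, and mutation aimed at excluded features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the `keep` and `rate` parameters, the rule that sums the two ranks, the restriction to two parents, and the reading of "prematurely excluded" as "selected by no individual in the current population" are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def shrink_search_space(X, y, frequency, keep):
    """Return sorted indices of the `keep` best-ranked features.

    Assumption: a feature's score is the sum of its rank by absolute Pearson
    correlation with the label and its rank by selection frequency in a
    preliminary run. The paper may combine the two criteria differently.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features get correlation 0 instead of a division by zero.
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    # argsort of argsort turns scores into rank positions (0 = worst).
    corr_rank = np.argsort(np.argsort(corr, kind="stable"), kind="stable")
    freq_rank = np.argsort(np.argsort(frequency, kind="stable"), kind="stable")
    score = corr_rank + freq_rank
    top = np.argsort(score, kind="stable")[::-1][:keep]
    return np.sort(top)


def voting_crossover(p1, p2, acc1, acc2, rng):
    """Build a child from two binary masks by accuracy-weighted voting.

    Assumption: bits the parents agree on are inherited; each disagreement
    is resolved in favour of parent 1 with probability acc1 / (acc1 + acc2).
    """
    total = acc1 + acc2
    w1 = 0.5 if total == 0 else acc1 / total
    take_p1 = rng.random(p1.size) < w1
    return np.where(p1 == p2, p1, np.where(take_p1, p1, p2))


def revive_mutation(child, population, rng, rate=0.1):
    """Switch on features that no individual in the population selects.

    Assumption: each such feature is turned on independently with
    probability `rate`, so it gets evaluated together with other features.
    """
    lost = np.flatnonzero(population.sum(axis=0) == 0)
    mutated = child.copy()
    mutated[lost[rng.random(lost.size) < rate]] = 1
    return mutated
```

In a full loop, `shrink_search_space` would run once on the output of the cheap preliminary search, and the two operators would then act only on masks over the retained features. Fitness evaluation and multi-objective selection are left out of the sketch.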
