LG · Jul 25, 2024

Enhancing Diversity in Multi-objective Feature Selection

arXiv:2407.17795v2 · 3 citations · h-index: 13
AI Analysis

An incremental improvement aimed at machine learning researchers and practitioners: it addresses limited population diversity in multi-objective feature selection, with the goal of improving model performance and efficiency.

The paper tackles limited population diversity in NSGA-II-based multi-objective feature selection by introducing a re-initialization step that, in each generation, replaces the worst individuals (the last front of the population) with new randomly generated ones. The authors report improved population quality and algorithm performance on 12 real-world classification problems with 2,400 to nearly 50,000 features.

Feature selection plays a pivotal role in the data preprocessing and model-building pipeline, significantly enhancing model performance, interpretability, and resource efficiency across diverse domains. In population-based optimization methods, the generation of diverse individuals holds utmost importance for adequately exploring the problem landscape, particularly in highly multi-modal multi-objective optimization problems. Our study reveals that, in line with findings from several prior research papers, commonly employed crossover and mutation operations lack the capability to generate high-quality diverse individuals and tend to become confined to limited areas around various local optima. This paper introduces an augmentation to the diversity of the population in the well-established multi-objective scheme of the genetic algorithm, NSGA-II. This enhancement is achieved through two key components: the genuine initialization method and the substitution of the worst individuals with new randomly generated individuals as a re-initialization approach in each generation. The proposed multi-objective feature selection method undergoes testing on twelve real-world classification problems, with the number of features ranging from 2,400 to nearly 50,000. The results demonstrate that replacing the last front of the population with an equivalent number of new random individuals generated using the genuine initialization method and featuring a limited number of features substantially improves the population's quality and, consequently, enhances the performance of the multi-objective algorithm.
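
The re-initialization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two minimized objectives (classification error, number of selected features), the function names, the `max_selected` cap, the uniform random sampling that stands in for the paper's initialization method, and the choice to skip replacement when the whole population forms a single front are all assumptions made for illustration.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(objs):
    """Split individual indices into Pareto fronts; fronts[0] is best, fronts[-1] worst."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def random_sparse_individual(n_features, max_selected, rng):
    """Binary feature mask with between 1 and max_selected features switched on.

    Assumption: a plain uniform sampler, used here only as a stand-in for the
    paper's own initialization method.
    """
    k = rng.randint(1, min(max_selected, n_features))
    mask = [0] * n_features
    for idx in rng.sample(range(n_features), k):
        mask[idx] = 1
    return mask


def reinitialize_last_front(population, objs, max_selected, rng):
    """Replace every individual in the last Pareto front with a new sparse random one.

    Returns the new population and the indices that were replaced.
    Assumption: if the whole population is one front, nothing is replaced.
    """
    fronts = non_dominated_fronts(objs)
    new_pop = [ind[:] for ind in population]
    if len(fronts) < 2:
        return new_pop, []
    worst = fronts[-1]
    for i in worst:
        new_pop[i] = random_sparse_individual(len(population[i]), max_selected, rng)
    return new_pop, worst


# Toy usage: 4 individuals over 10 features; objectives are (error, n_selected).
rng = random.Random(0)
population = [random_sparse_individual(10, 5, rng) for _ in range(4)]
objs = [(0.1, 2), (0.2, 1), (0.3, 3), (0.4, 4)]  # hypothetical objective values
new_population, replaced = reinitialize_last_front(population, objs, 3, rng)
# replaced == [3]: individual 3 is dominated by all others, so it alone forms the last front
```

In a full NSGA-II loop this step would run once per generation, after non-dominated sorting and before the replaced individuals are re-evaluated; the sketch omits crossover, mutation, crowding distance, and fitness evaluation.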

