LG · AI · Jul 22, 2024

Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems

Yuepeng Chen, Weiping Ding, Hengrong Ju, Jiashuang Huang, Tao Yin

arXiv:2407.15893v1 · 2.611 citations · h-index: 26

Originality: Incremental advance

AI Analysis

This paper addresses feature selection for fuzzy decision systems; it is an incremental improvement in a domain-specific area.

The paper tackles feature selection challenges in high-dimensional fuzzy decision systems by proposing a cascaded two-stage algorithm that clusters features and selects them using a novel metric based on global separability and local consistency. Experimental results on 18 public datasets and a schizophrenia dataset show the algorithm outperforms benchmarks in classification accuracy and reduces the number of selected features.

Feature selection is a vital technique in machine learning, as it can reduce computational complexity, improve model performance, and mitigate the risk of overfitting. However, the increasing complexity and dimensionality of datasets pose significant challenges for feature selection. Focusing on these challenges, this paper proposes a cascaded two-stage feature clustering and selection algorithm for fuzzy decision systems. In the first stage, we reduce the search space by clustering relevant features and addressing inter-feature redundancy. In the second stage, a clustering-based sequential forward selection method that explores the global and local structure of the data is presented. We propose a novel metric for assessing the significance of features, which considers both global separability and local consistency. Global separability measures the degree of intra-class cohesion and inter-class separation based on fuzzy membership, providing a comprehensive understanding of data separability. Meanwhile, local consistency leverages the fuzzy neighborhood rough set model to capture uncertainty and fuzziness in the data. The effectiveness of the proposed algorithm is evaluated through experiments conducted on 18 public datasets and a real-world schizophrenia dataset. The experimental results demonstrate the algorithm's superiority over benchmark algorithms in both classification accuracy and the number of selected features.
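The two-stage idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the clustering rule or the formulas for the metric, so everything below is an assumption made for illustration. Stage 1 is approximated by greedy grouping on absolute Pearson correlation, global separability by a crisp between-class to total scatter ratio (no fuzzy membership), and local consistency by nearest-neighbor label agreement (not the fuzzy neighborhood rough set model). All function names, the `threshold` and `min_gain` parameters, and the equal weighting of the two scores are hypothetical.

```python
import numpy as np

def cluster_features(X, threshold=0.8):
    """Stage 1 (assumed rule): greedily group features whose absolute
    Pearson correlation with a cluster's first member is >= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    clusters = []
    for j in range(X.shape[1]):
        for c in clusters:
            if corr[j, c[0]] >= threshold:
                c.append(j)
                break
        else:
            clusters.append([j])
    return clusters

def global_separability(Z, y):
    """Crisp stand-in: between-class scatter over total scatter, in [0, 1]."""
    mu = Z.mean(axis=0)
    between = within = 0.0
    for c in np.unique(y):
        Zc = Z[y == c]
        between += len(Zc) * np.sum((Zc.mean(axis=0) - mu) ** 2)
        within += np.sum((Zc - Zc.mean(axis=0)) ** 2)
    return between / (between + within + 1e-12)

def local_consistency(Z, y):
    """Crisp stand-in: fraction of samples whose nearest neighbor
    (excluding the sample itself) carries the same label."""
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return float(np.mean(y[d.argmin(axis=1)] == y))

def significance(X, y, feats):
    """Hypothetical combined score: equal-weight mean of the two stand-ins."""
    Z = X[:, feats]
    return 0.5 * (global_separability(Z, y) + local_consistency(Z, y))

def forward_select(X, y, clusters, min_gain=1e-3):
    """Stage 2 (assumed rule): sequential forward selection that takes at
    most one feature per cluster and stops when the gain is <= min_gain."""
    selected, best = [], 0.0
    remaining = [list(c) for c in clusters]
    while remaining:
        scored = [(significance(X, y, selected + [f]), f, k)
                  for k, c in enumerate(remaining) for f in c]
        score, f, k = max(scored)
        if score - best <= min_gain:
            break
        selected.append(f)
        best = score
        remaining.pop(k)  # cluster is now represented; skip its redundant members
    return selected
```

Calling `forward_select(X, y, cluster_features(X))` on a NumPy array `X` of shape `(n_samples, n_features)` with a label array `y` returns a list of selected column indices. Taking one representative per cluster is a simplification meant to mirror the abstract's point that clustering first shrinks the search space and removes redundant features before selection.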
