LG, May 13, 2017

Automatically Redundant Features Removal for Unsupervised Feature Selection via Sparse Feature Graph

arXiv:1705.04804v2 0.7

Originality: Incremental advance

AI Analysis

The paper addresses a domain-specific problem for machine learning researchers and practitioners who work with high-dimensional data. Its contribution is incremental, since it builds on existing sparse learning frameworks.

Redundant features in high-dimensional datasets degrade the performance of learning algorithms. The paper proposes a graph-based approach that detects and removes them automatically, and reports consistent performance improvements for unsupervised feature selection algorithms on benchmark datasets.

Redundant features in high-dimensional datasets affect the performance of learning and mining algorithms, and detecting and removing them is an important research topic in machine learning and data mining. In this paper, we propose a graph-based approach that finds and removes redundant features automatically for high-dimensional data. Building on the sparse-learning-based unsupervised feature selection framework, we introduce the Sparse Feature Graph (SFG), which models not only the redundancy between two features but also the group redundancy between two groups of features. With SFG, we can divide the full feature set into groups and improve the intrinsic structure of the data by removing the detected redundant features. With a more accurate data structure, high-quality indicator vectors can be obtained, which improves the learning performance of existing unsupervised feature selection algorithms such as multi-cluster feature selection (MCFS). Our experimental results on benchmark datasets show that the proposed SFG and feature redundancy removal algorithm consistently improve the performance of unsupervised feature selection algorithms.
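The abstract does not spell out how SFG is constructed, so the following is only a rough illustration of the general idea of a feature graph used for redundancy removal. It is not the paper's algorithm: as a stand-in for the sparse representation, it connects two features when their absolute Pearson correlation reaches a threshold, treats each connected component as a redundancy group, and keeps one feature per group. The function names, the threshold value, and the toy data are all assumptions made for this sketch.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def feature_graph(columns: List[List[float]], threshold: float) -> Dict[int, List[int]]:
    """Undirected graph over feature indices; an edge marks a strongly correlated pair.

    Assumption: correlation thresholding replaces the sparse coding step of SFG.
    """
    d = len(columns)
    adj: Dict[int, List[int]] = {i: [] for i in range(d)}
    for i in range(d):
        for j in range(i + 1, d):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def redundancy_groups(adj: Dict[int, List[int]]) -> List[List[int]]:
    """Connected components of the graph, each returned as a sorted index list."""
    seen = set()
    groups = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(group))
    return groups


def keep_representatives(groups: List[List[int]]) -> List[int]:
    """Keep the lowest-index feature of each group and drop the rest as redundant."""
    return sorted(g[0] for g in groups)


# Toy data (hypothetical): each inner list is one feature measured on five samples.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],       # f0
    [2.0, 4.0, 6.0, 8.0, 10.0],      # f1 = 2 * f0, redundant with f0
    [5.0, 1.0, 4.0, 2.0, 3.0],       # f2, weakly correlated with the others
    [-1.0, -2.0, -3.0, -4.0, -5.0],  # f3 = -f0, redundant with f0
]
groups = redundancy_groups(feature_graph(columns, threshold=0.9))
kept = keep_representatives(groups)
print(groups, kept)
```

On this toy input, features 0, 1, and 3 fall into one group and feature 2 stands alone, so features 0 and 2 are kept. A pairwise correlation graph only captures redundancy between two features at a time; the group redundancy described in the abstract would need a representation in which one feature is expressed through several others.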
