LG · Jul 8, 2022

UDRN: Unified Dimensional Reduction Neural Network for Feature Selection and Feature Projection

arXiv:2207.03809v2 · 16 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This addresses a methodological gap in machine learning for researchers and practitioners needing interpretable and structure-preserving dimensional reduction, though it appears incremental as it combines existing concepts into a new framework.

The paper tackles the incompatibility between feature selection (FS) and feature projection (FP) in dimensional reduction by proposing UDRN, a unified neural network framework that integrates both tasks, showing advantages in classification and visualization on image and biological datasets.

Dimensional reduction (DR) maps high-dimensional data into a lower-dimensional latent space by minimizing defined optimization objectives. DR methods usually fall into two categories: feature selection (FS) and feature projection (FP). FS focuses on selecting a critical subset of dimensions but risks destroying the data distribution (structure). FP, on the other hand, combines all the input features into a lower-dimensional space, aiming to maintain the data structure, but lacks interpretability and sparsity. FS and FP are traditionally incompatible categories and have therefore not been unified into a single coherent framework. We propose that the ideal DR approach combines both FS and FP into a unified end-to-end manifold learning framework, simultaneously performing fundamental feature discovery while maintaining the intrinsic relationships between data samples in the latent space. In this work, we develop a unified framework, the Unified Dimensional Reduction Neural-network (UDRN), that integrates FS and FP in a compatible, end-to-end way. We improve the neural network structure by implementing the FS and FP tasks separately using two stacked sub-networks. In addition, we design a data augmentation scheme for the DR process to improve the generalization ability of the method when dealing with datasets that have a large number of features, and we design loss functions that cooperate with the data augmentation. Extensive experimental results on four image and four biological datasets, including very high-dimensional data, demonstrate the advantages of UDRN over existing methods (FS, FP, and FS&FP pipelines), especially in downstream tasks such as classification and visualization.
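The idea of two stacked sub-networks can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract does not specify layer types, gating, augmentation, or losses, so the per-feature sigmoid gate, the hard threshold, the two-layer ReLU projection, and every name below (`fs_gate`, `fp_project`, `linear`, the sizes) are assumptions chosen only to show how a selection stage can feed a projection stage.

```python
import math
import random

random.seed(0)

def fs_gate(x, gate_logits, threshold=0.5):
    """Hypothetical FS sub-network: one gate per input feature (assumption)."""
    gates = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]  # sigmoid in (0, 1)
    mask = [1 if g > threshold else 0 for g in gates]          # hard feature selection
    selected = [[v * g * m for v, g, m in zip(row, gates, mask)] for row in x]
    return selected, mask

def linear(x, w, b):
    """Dense layer: x is n x d, w is d x k, b has length k."""
    return [[sum(v * w[i][j] for i, v in enumerate(row)) + b[j]
             for j in range(len(b))] for row in x]

def fp_project(x, w1, b1, w2, b2):
    """Hypothetical FP sub-network: a two-layer MLP with ReLU (assumption)."""
    hidden = [[max(v, 0.0) for v in row] for row in linear(x, w1, b1)]
    return linear(hidden, w2, b2)

# Illustrative sizes only; the weights are random, not trained.
n_samples, n_features, n_hidden, n_latent = 8, 20, 16, 2
x = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
gate_logits = [random.gauss(0, 1) for _ in range(n_features)]
w1 = [[random.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_features)]
b1 = [0.0] * n_hidden
w2 = [[random.gauss(0, 0.1) for _ in range(n_latent)] for _ in range(n_hidden)]
b2 = [0.0] * n_latent

selected, mask = fs_gate(x, gate_logits)   # stage 1: keep a sparse subset of features
z = fp_project(selected, w1, b1, w2, b2)   # stage 2: embed the kept features in 2-D
print(len(z), len(z[0]), sum(mask))
```

In this sketch, `mask` is the interpretable output (which input features survive), while `z` is the low-dimensional embedding that would be used for visualization or classification. In a trained model, both stages would be optimized jointly rather than initialized at random.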

