DC · LG · ML · Sep 7, 2017

Feature selection in high-dimensional dataset using MapReduce

arXiv:1709.02327v1 · 20 citations · Has code
Originality: Synthesis-oriented
AI Analysis

The work provides a scalable solution for feature selection in bioinformatics and network inference, but it is incremental: it adapts an existing method to a distributed framework.

The paper tackles feature selection in high-dimensional datasets by implementing a distributed MapReduce version of the minimum Redundancy Maximum Relevance (mRMR) algorithm, and demonstrates scalability on datasets with millions of observations or features.
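For readers unfamiliar with mRMR, the following is a minimal single-machine sketch of the greedy selection loop, assuming discrete-valued features and the common "difference" form of the criterion (relevance $I(X_j;Y)$ minus the mean mutual information with already selected features). The helper names `mutual_information` and `mrmr` are hypothetical; this is not the authors' distributed implementation, and the paper may use a different mRMR variant.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) p(b))), written with counts
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features, target, k):
    """Greedy mRMR (difference form, assumed): return up to k feature indices."""
    relevance = [mutual_information(f, target) for f in features]
    selected = []
    candidates = set(range(len(features)))
    while candidates and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]  # first pick: pure relevance
            redundancy = sum(
                mutual_information(features[j], features[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy  # relevance minus mean redundancy
        # sorted() makes ties break toward the lowest index
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two independent bits `a`, `b`, a target `y = 2*a + b`, and a duplicate of `a`, the call `mrmr([a, list(a), b], y, 2)` returns `[0, 2]`: the duplicate is as relevant as `a` but fully redundant with it, so `b` is chosen second. Recomputing redundancy each round keeps the sketch short; a practical version would cache the pairwise terms.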

This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features.
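The expensive part of mRMR is estimating mutual information, which only needs joint counts. For a tall/narrow dataset, one plausible decomposition (assumed here, not taken from the paper) splits the rows into blocks: each map task emits partial contingency counts keyed by `(feature_index, feature_value, target_value)`, and the reduce step sums them. The sketch below simulates the two phases in plain Python; `map_partition`, `reduce_counts`, and `distributed_relevance` are hypothetical names, not Hadoop or Spark APIs.

```python
from collections import Counter
from functools import reduce
from math import log


def map_partition(rows):
    """Map phase: joint counts for one horizontal block of (features, target) rows."""
    counts = Counter()
    for features, target in rows:
        for j, value in enumerate(features):
            counts[(j, value, target)] += 1
    return counts


def reduce_counts(left, right):
    """Reduce phase: merge two partial count tables by summation."""
    return left + right


def relevance_from_counts(joint, n_features):
    """Turn aggregated joint counts into I(X_j; Y) (in nats) for every feature j."""
    result = []
    for j in range(n_features):
        pxy = {(x, y): c for (k, x, y), c in joint.items() if k == j}
        n = sum(pxy.values())
        px, py = Counter(), Counter()
        for (x, y), c in pxy.items():
            px[x] += c
            py[y] += c
        result.append(sum((c / n) * log(c * n / (px[x] * py[y]))
                          for (x, y), c in pxy.items()))
    return result


def distributed_relevance(partitions, n_features):
    """Map every row block, reduce the partial counts, then compute relevance."""
    partials = map(map_partition, partitions)
    joint = reduce(reduce_counts, partials, Counter())
    return relevance_from_counts(joint, n_features)
```

Because count tables merge by addition, the result does not depend on how rows are split across blocks. Feature-feature redundancy terms would use keys over pairs of features in the same way. For a wide/short dataset, a natural alternative is to partition by feature columns instead, so that each task holds every observation for a subset of features.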

Code implementations: 1 repo