Feature reduction for machine learning on molecular features: The GeneScore
This work addresses feature reduction for biomedical data analysis. The contribution appears incremental, since it builds on existing expert knowledge to improve classification in one specific domain, the classification of cancer entities.
The authors address feature reduction for machine learning on biomedical data by introducing the GeneScore, which uses expert knowledge to integrate multiple molecular data types into a single score. They show that it outperforms a binary matrix in classifying cancer entities from SNV, indel, CNV, gene fusion and gene expression data.
We present the GeneScore, a concept of feature reduction for machine learning analysis of biomedical data. Using expert knowledge, the GeneScore integrates different molecular data types into a single score. We show that the GeneScore is superior to a binary matrix in the classification of cancer entities from SNV, indel, CNV, gene fusion and gene expression data. The GeneScore is a straightforward way to facilitate state-of-the-art analysis while making use of the available scientific knowledge about the molecular data features used.
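To make the contrast between a binary matrix and a single integrated per-gene score concrete, the following is a minimal sketch, not the authors' method. The abstract does not state how the GeneScore is computed, so the weights in `ILLUSTRATIVE_WEIGHTS`, the additive combination rule, the gene names, the sample data and both function names are hypothetical placeholders chosen only for illustration. Gene expression is left out because no encoding for it is given here.

```python
from typing import Dict, List

# Hypothetical weights per alteration type. The abstract says the score
# relies on expert knowledge but gives no values, so these are placeholders.
ILLUSTRATIVE_WEIGHTS: Dict[str, float] = {
    "snv": 1.0,
    "indel": 1.0,
    "cnv_gain": 0.5,
    "cnv_loss": -0.5,
    "fusion": 2.0,
}


def binary_features(alterations: Dict[str, List[str]], genes: List[str]) -> List[int]:
    """Binary encoding: 1 if the gene carries any alteration, otherwise 0."""
    return [1 if alterations.get(gene) else 0 for gene in genes]


def score_features(
    alterations: Dict[str, List[str]],
    genes: List[str],
    weights: Dict[str, float] = ILLUSTRATIVE_WEIGHTS,
) -> List[float]:
    """One number per gene: the sum of the weights of its alterations.

    Summation is an assumption made for this sketch; it is not taken
    from the source.
    """
    return [sum(weights[kind] for kind in alterations.get(gene, [])) for gene in genes]


# Made-up sample: alteration types observed per gene.
genes = ["GENE_A", "GENE_B", "GENE_C"]
sample = {"GENE_A": ["snv", "cnv_loss"], "GENE_C": ["fusion"]}

binary_row = binary_features(sample, genes)
score_row = score_features(sample, genes)
print(binary_row)
print(score_row)
```

Both encodings yield one column per gene, so the feature count is the same. The difference is that the binary row only records whether a gene is altered, while the scored row also reflects which kinds of alteration were seen, weighted by whatever prior knowledge the weights encode.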