CELG · Jul 12, 2013

Unsupervised Gene Expression Data using Enhanced Clustering Method

arXiv:1307.3337v1 · 8 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of identifying co-expressed genes in bioinformatics, but it is incremental: it builds on existing clustering methods and adds enhancements for centroid initialization and feature selection.

The authors tackled the problem of clustering gene expression data by combining an unsupervised gene selection method with an Enhanced Center Initialization Algorithm (ECIA) and K-Means. The combination produced more compact clusters and better performance as measured by the Silhouette Coefficient.
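For reference, the standard definition of the Silhouette Coefficient is given below. The summary does not say which variant or distance the authors used, so treating d(i, j) as Euclidean distance between expression profiles is an assumption. Here C_i is the cluster containing point i, n is the number of points, and s(i) is conventionally set to 0 when C_i is a singleton.

```latex
a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\, j \neq i} d(i, j), \qquad
b(i) = \min_{C \neq C_i} \frac{1}{|C|} \sum_{j \in C} d(i, j)

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
S = \frac{1}{n} \sum_{i=1}^{n} s(i), \qquad -1 \le S \le 1
```

Values of S close to 1 indicate compact, well-separated clusters, which is the sense in which the summary reports "compact clusters".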

Microarrays have made it possible to monitor the expression profiles of thousands of genes simultaneously under various experimental conditions. Identifying co-expressed genes and coherent patterns is the central goal of microarray (gene expression) data analysis and an important task in bioinformatics research. Feature selection, the process of selecting the most informative features, is an important step in knowledge discovery, because not all features are important: some may be redundant, and others may be irrelevant or noisy. In this work, an unsupervised gene selection method and the Enhanced Center Initialization Algorithm (ECIA) with K-Means are applied to cluster gene expression data. The proposed clustering algorithm overcomes two drawbacks of K-Means: having to specify the optimal number of clusters and having to initialize good cluster centroids. Experiments on gene expression data show that the method identifies compact clusters and performs well in terms of the Silhouette Coefficient cluster measure.
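The pipeline described above (select informative genes, seed K-Means deterministically, then judge clusterings by the Silhouette Coefficient) can be sketched as follows. This is a minimal, dependency-free sketch under loud assumptions: the variance filter `select_genes` and the farthest-point seeding `farthest_point_init` are hypothetical stand-ins, not the paper's gene selection method or ECIA, whose details are not given here. Choosing k by the highest mean silhouette is likewise an assumed way to avoid fixing the number of clusters in advance.

```python
import math


def variance(values):
    """Population variance of one gene's expression profile."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def select_genes(data, n_keep):
    """Hypothetical unsupervised filter: keep the n_keep most variable genes (rows).
    A stand-in for the paper's gene selection step, not a reproduction of it."""
    ranked = sorted(range(len(data)), key=lambda i: variance(data[i]), reverse=True)
    return [data[i] for i in sorted(ranked[:n_keep])]


def dist(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_init(points, k):
    """Deterministic seeding (assumed stand-in, NOT the paper's ECIA):
    start at the point nearest the global mean, then repeatedly add the
    point farthest from its nearest already-chosen center."""
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    centers = [min(points, key=lambda p: dist(p, mean))]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return [list(c) for c in centers]


def kmeans(points, k, max_iter=100):
    """Lloyd-style K-Means seeded by farthest_point_init; returns cluster labels."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b); singletons contribute 0."""
    if len(set(labels)) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by cluster label.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        same = by_cluster.get(labels[i])
        if not same:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(same) / len(same)
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def choose_k(points, k_values):
    """Pick the k whose K-Means clustering has the highest mean silhouette."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy matrix (hypothetical data): rows are genes, columns are conditions.
data = [
    [1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [0.9, 1.9, 3.1, 3.9],  # rising
    [4.0, 3.0, 2.0, 1.0], [4.2, 2.9, 2.1, 0.8], [3.9, 3.1, 1.9, 1.1],  # falling
    [2.0, 2.0, 2.0, 2.0], [2.5, 2.5, 2.5, 2.5],                        # flat
]
genes = select_genes(data, n_keep=6)  # the two flat profiles have zero variance
k, scores = choose_k(genes, range(2, 5))
print(k, {kk: round(s, 3) for kk, s in scores.items()})
```

On this toy matrix the rising and falling genes form two co-expressed groups, so the silhouette criterion favors k = 2. The deterministic seeding is a design choice that makes results reproducible across runs, which is one motivation for replacing random K-Means initialization.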
