Performance Analysis of Clustering Algorithms for Gene Expression Data
This is an incremental improvement for bioinformatics researchers analyzing gene expression data, addressing the challenge of unknown cluster counts in real-world applications.
The paper tackled the problem of clustering gene expression data without requiring a predefined number of clusters by analyzing the K-Means with Automatic Generations of Merge Factor for ISODATA (AGMFI) algorithm, which improved clustering quality by generating its initial parameters automatically instead of relying on manual selection.
Microarray technology allows thousands of genes to be monitored simultaneously under various experimental conditions. It is used to identify co-expressed genes in specific cells or tissues, that is, genes that are actively being used to make proteins. This method is used to analyze gene expression, an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes and biologically relevant groupings of genes and samples. In this paper we analysed K-Means with Automatic Generations of Merge Factor for ISODATA (AGMFI) to group microarray data sets on the basis of ISODATA. AGMFI generates initial values for the merge and split factors and for the maximum number of merges, instead of requiring suitable values to be chosen by hand as in ISODATA. The initial seeds for each cluster are normally chosen either sequentially or randomly, and the quality of the final clusters has been found to depend on these initial seeds. For real-life problems, the suitable number of clusters cannot be predicted. To overcome this drawback, the current research focused on developing clustering algorithms that do not require the initial number of clusters to be given.
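The split-and-merge idea that ISODATA (and therefore AGMFI) relies on can be illustrated with a short sketch. This is not the authors' AGMFI implementation: here the thresholds `split_std`, `merge_dist` and `max_merges` are fixed by hand, whereas AGMFI is described as generating such values automatically. The function names (`isodata_sketch`, `assign`, `mean`) and the offset parameter `split_frac` are hypothetical. The sketch also assumes the process starts from a single cluster at the global mean, which sidesteps the sequential or random seed choice rather than solving it the way AGMFI does.

```python
import math
import statistics

Point = tuple[float, ...]


def mean(points: list[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def assign(points: list[Point], centers: list[Point]) -> list[list[Point]]:
    """Group points by nearest center (Euclidean), dropping empty clusters."""
    groups: list[list[Point]] = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        groups[nearest].append(p)
    return [g for g in groups if g]


def isodata_sketch(points, split_std, merge_dist, max_merges=1, iters=10, split_frac=0.5):
    """Simplified ISODATA-style split/merge loop with no preset cluster count.

    Hypothetical helper: thresholds are fixed inputs here, not generated as in AGMFI.
    """
    centers = [mean(points)]  # assumption: start from one cluster, so no k and no random seeds
    for _ in range(iters):
        clusters = assign(points, centers)
        # Split: a cluster whose largest per-dimension std exceeds split_std is
        # replaced by two centers shifted by +/- split_frac * std along that dimension.
        split_centers = []
        for c in clusters:
            m = mean(c)
            stds = [statistics.pstdev(col) for col in zip(*c)]
            d = max(range(len(stds)), key=lambda k: stds[k])
            if len(c) >= 2 and stds[d] > split_std:
                delta = split_frac * stds[d]
                split_centers.append(m[:d] + (m[d] - delta,) + m[d + 1:])
                split_centers.append(m[:d] + (m[d] + delta,) + m[d + 1:])
            else:
                split_centers.append(m)
        clusters = assign(points, split_centers)
        # Merge: fuse the closest pair of clusters while their centers are closer
        # than merge_dist, performing at most max_merges merges per iteration.
        for _ in range(max_merges):
            if len(clusters) < 2:
                break
            cents = [mean(c) for c in clusters]
            gap, i, j = min(
                (math.dist(cents[a], cents[b]), a, b)
                for a in range(len(cents))
                for b in range(a + 1, len(cents))
            )
            if gap >= merge_dist:
                break
            rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
            clusters = rest + [clusters[i] + clusters[j]]
        centers = [mean(c) for c in clusters]
    clusters = assign(points, centers)
    return [mean(c) for c in clusters], clusters


# Toy one-dimensional "expression levels" forming three well-separated groups.
data = [(0.0,), (1.0,), (10.0,), (11.0,), (30.0,), (31.0,)]
centers, clusters = isodata_sketch(data, split_std=2.0, merge_dist=3.0)
print(centers)  # [(0.5,), (10.5,), (30.5,)]
```

Starting from one cluster, the loop splits until every cluster is tight and never merges, because the resulting centers lie farther apart than `merge_dist`, so it arrives at three clusters without being told that number. Standard ISODATA adds further conditions (for example a minimum cluster size and a target cluster count) that are omitted here for brevity.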