QM LG ML | Apr 19, 2019

Random Fragments Classification of Microbial Marker Clades with Multi-class SVM and N-Best Algorithm

arXiv:1904.09061v1 | 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of microbial species classification for biologists, but it is incremental as it builds on existing SVM and N-best methods with specific feature extraction and data splitting strategies.

The paper tackles microbial clade classification with a multi-class SVM and an N-best algorithm that classify random fragments of marker genome sequences. It reports overall recognition accuracy above 28% for the top-1 candidate and above 91% for the top-10 candidates on both the training and testing sets.
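The top-1 and top-10 figures are N-best accuracies: a fragment counts as correctly recognised if its true clade is among the N highest-scoring candidates. Below is a minimal sketch, assuming each fragment already has one score per class (for example, one-vs-rest SVM decision values). The helper names `n_best` and `top_n_accuracy` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def n_best(scores: Sequence[float], n: int) -> List[int]:
    """Indices of the n highest-scoring classes, best first (ties keep lower index first)."""
    # sorted() is stable, so classes with equal scores stay in index order
    return sorted(range(len(scores)), key=lambda c: -scores[c])[:n]


def top_n_accuracy(score_rows: Sequence[Sequence[float]], labels: Sequence[int], n: int) -> float:
    """Fraction of samples whose true label is among their n best candidates."""
    hits = sum(1 for scores, label in zip(score_rows, labels) if label in n_best(scores, n))
    return hits / len(labels)
```

Top-N accuracy can only rise as N grows, because each larger candidate list contains the smaller one. That is consistent with the gap between the top-1 and top-10 figures.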

Microbial clade modeling is a challenging problem in biology based on microarray genome sequences, especially for discovering and categorizing new species from gene isolates. Marker family genome sequences play an important role in describing specific microbial clades within species. A framework for support vector machine (SVM) based microbial species classification with an N-best algorithm is constructed to classify centroid marker genome fragments randomly generated from marker genome sequences on MetaRef. A time series feature extraction method is proposed that segments the centroid gene sequences and maps the segments into spaces of different dimensions. Two ways of data splitting are investigated: randomly splitting fragments along the genome sequence (DI), or separating the genome sequences into two parts (DII). Two strategies for fragment recognition, dimension-by-dimension and sequence-by-sequence, are investigated. The choice of k-mer size, the overlap of segmentation, and the effect of the random split percentage are also discussed. Experiments on 12,390 marker genome sequences belonging to the marker families of 17 species from MetaRef show that, for both DI and DII and for both dimension-by-dimension and sequence-by-sequence recognition, the overall recognition accuracy exceeds 28% for the top-1 candidate and 91% for the top-10 candidates on both the training and testing sets.
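The fragment generation, feature extraction, and splitting steps can be sketched roughly as follows. This is a swapped-in illustration, not the paper's method. It assumes a normalised k-mer frequency vector over ACGT in place of the paper's time series mapping, with the window `step` standing in for the overlap of segmentation. It also assumes DI means shuffling randomly generated fragments into training and testing sets, and DII means cutting a sequence into a training prefix and a testing suffix. All function names and parameters are hypothetical.

```python
import itertools
import random
from typing import Dict, List, Tuple

ALPHABET = "ACGT"


def kmer_index(k: int) -> Dict[str, int]:
    """Assign every k-mer over ACGT a fixed position (4**k features)."""
    return {"".join(p): i for i, p in enumerate(itertools.product(ALPHABET, repeat=k))}


def kmer_vector(seq: str, k: int, step: int = 1) -> List[float]:
    """Normalised k-mer counts over windows of width k; step < k makes windows overlap."""
    index = kmer_index(k)
    counts = [0.0] * len(index)
    windows = 0
    for start in range(0, len(seq) - k + 1, step):
        kmer = seq[start:start + k]
        if kmer in index:  # skip windows containing N or other ambiguity codes
            counts[index[kmer]] += 1
            windows += 1
    return [c / windows for c in counts] if windows else counts


def random_fragments(seq: str, length: int, count: int, rng: random.Random) -> List[str]:
    """Cut `count` fragments of `length` bases at uniformly random offsets.

    Raises ValueError (from randrange) if seq is shorter than length.
    """
    return [
        seq[start:start + length]
        for start in (rng.randrange(len(seq) - length + 1) for _ in range(count))
    ]


def split_di(fragments: List[str], train_frac: float, rng: random.Random) -> Tuple[List[str], List[str]]:
    """DI-style split (assumed): shuffle the fragments, then cut into train/test."""
    shuffled = list(fragments)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def split_dii(seq: str, train_frac: float) -> Tuple[str, str]:
    """DII-style split (assumed): the prefix is for training, the rest for testing."""
    cut = int(len(seq) * train_frac)
    return seq[:cut], seq[cut:]


if __name__ == "__main__":
    rng = random.Random(0)
    marker = "ACGTTGCAAGCTTACG" * 20  # toy sequence, not MetaRef data
    frags = random_fragments(marker, length=50, count=20, rng=rng)
    train, test = split_di(frags, 0.8, rng)
    features = [kmer_vector(f, k=3, step=1) for f in train]
    print(len(train), len(test), len(features[0]))
```

The feature dimension grows as 4^k, so larger k-mer sizes quickly yield very sparse vectors for short fragments. That trade-off is one reason k-mer size selection matters.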
