NELGNCOct 12, 2020

A Neurochaos Learning Architecture for Genome Classification

arXiv:2010.10995v13 citations
Originality Highly original
AI Analysis

This addresses genome sequence classification, particularly for viruses like SARS-CoV-2, with a novel hybrid method that shows high performance in low-data regimes.

The paper tackles genome classification by proposing a Neurochaos Learning architecture that uses chaotic neurons for feature extraction, achieving an average macro F1-score > 0.99 with just one training sample per class for SARS-CoV-2 vs. SARS-CoV-1 classification.

There has been empirical evidence of presence of non-linearity and chaos at the level of single neurons in biological neural networks. The properties of chaotic neurons inspires us to employ them in artificial learning systems. Here, we propose a Neurochaos Learning (NL) architecture, where the neurons used to extract features from data are 1D chaotic maps. ChaosFEX+SVM, an instance of this NL architecture, is proposed as a hybrid combination of chaos and classical machine learning algorithm. We formally prove that a single layer of NL with a finite number of 1D chaotic neurons satisfies the Universal Approximation Theorem with an exact value for the number of chaotic neurons needed to approximate a discrete real valued function with finite support. This is made possible due to the topological transitivity property of chaos and the existence of uncountably infinite number of dense orbits for the chosen 1D chaotic map. The chaotic neurons in NL get activated under the presence of an input stimulus (data) and output a chaotic firing trajectory. From such chaotic firing trajectories of individual neurons of NL, we extract Firing Time, Firing Rate, Energy and Entropy that constitute ChaosFEX features. These ChaosFEX features are then fed to a Support Vector Machine with linear kernel for classification. The effectiveness of chaotic feature engineering performed by NL (ChaosFEX+SVM) is demonstrated for synthetic and real world datasets in the low and high training sample regimes. Specifically, we consider the problem of classification of genome sequences of SARS-CoV-2 from other coronaviruses (SARS-CoV-1, MERS-CoV and others). With just one training sample per class for 1000 random trials of training, we report an average macro F1-score > 0.99 for the classification of SARS-CoV-2 from SARS-CoV-1 genome sequences. Robustness of ChaosFEX features to additive noise is also demonstrated.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes