LG · Feb 17, 2023

Efficient Classification of SARS-CoV-2 Spike Sequences Using Federated Learning

Prakash Chourasia, Taslim Murad, Zahra Tayebi, Sarwan Ali, Imdad Ullah Khan, Murray Patterson

arXiv:2302.08688v2 · 6.67 citations · h-index: 20

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for privacy-preserving classification of coronavirus variants. The contribution is incremental: it applies an existing federated learning method to a new domain.

The paper tackles the problem of classifying SARS-CoV-2 spike sequences by using a federated learning approach to train an AI model without sharing data, achieving an overall accuracy of 93% for variant identification.

This paper presents a federated learning (FL) approach to train an AI model for SARS-CoV-2 variant classification. We analyze the SARS-CoV-2 spike sequences in a distributed way, without data sharing, to detect different variants of this rapidly mutating coronavirus. Our method maintains the confidentiality of local data (which could be stored in different locations) yet allows us to reliably detect and identify different known and unknown variants of the novel coronavirus SARS-CoV-2. Using the proposed approach, we achieve an overall accuracy of 93% on the coronavirus variant identification task. We also provide details regarding how the proposed model follows the main laws of federated learning, such as the laws of data ownership, data privacy, model aggregation, and model heterogeneity. Since the proposed model is distributed, it could scale to "Big Data" easily. We plan to use this proof-of-concept to implement a privacy-preserving pandemic response strategy.
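
To make the idea of training without data sharing and then aggregating models concrete, here is a minimal sketch of generic federated averaging. It is not the paper's architecture: the linear model, the synthetic two-feature data, and every function name (`local_update`, `federated_average`, `run_federated`) are assumptions chosen for illustration. Each client trains on its own data, and only model weights travel to the server.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """One client's training pass: plain SGD on a linear model, local data only."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def federated_average(client_weights, client_sizes):
    """Server step: average client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def make_client_data(n, rng):
    """Hypothetical toy data: targets follow y = 2*x0 - 1*x1 with no noise."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, 2.0 * x[0] - 1.0 * x[1]))
    return data


def run_federated(rounds=20, seed=0):
    """Simulate three clients; raw samples never leave `clients`, only weights do."""
    rng = random.Random(seed)
    clients = [make_client_data(n, rng) for n in (30, 50, 20)]
    global_w = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_w, d) for d in clients]
        global_w = federated_average(updates, [len(d) for d in clients])
    return global_w


if __name__ == "__main__":
    print(run_federated())
```

In this sketch the server only ever sees the weight vectors returned by `local_update`, which is the property the abstract refers to as data ownership and privacy. A real variant classifier would replace the linear model with a network trained on encoded spike sequences.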
