DC AI DS
Jul 20, 2020

A Big Data Approach for Sequences Indexing on the Cloud via Burrows Wheeler Transform

arXiv:2007.10095v1
1 citation

Originality: Incremental advance

AI Analysis

The paper addresses the need for efficient big data processing in precision medicine, where omics data are analyzed to categorize patients and select therapies. The advance is incremental because it builds on existing technologies.

The paper tackles the problem of indexing large-scale sequence data for precision medicine by proposing a distributed algorithm that computes the Burrows Wheeler transform using Apache Spark and Hadoop, so that the index computation itself can make full use of cloud resources.

Indexing sequence data is important in the context of Precision Medicine, where large amounts of "omics" data must be collected and analyzed daily in order to categorize patients and identify the most effective therapies. Here we propose an algorithm for computing the Burrows Wheeler transform that relies on Big Data technologies, i.e., Apache Spark and Hadoop. Our approach is the first that distributes the index computation and not only the input dataset, making it possible to benefit fully from the available cloud resources.
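
For readers unfamiliar with the transform itself, the following is a minimal single-machine sketch of the Burrows Wheeler transform computed by sorting suffixes. It is not the distributed Spark/Hadoop algorithm proposed in the paper, which the abstract does not detail. The function name `bwt` and the `$` sentinel are illustrative assumptions, and the naive suffix sort is chosen for clarity rather than efficiency.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows Wheeler transform via explicit suffix sorting.

    Illustrative sketch only: the sentinel is assumed to be absent from
    the input and to sort before every other character in it.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Suffix array: start positions of all suffixes in lexicographic order.
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    # For each sorted suffix, emit the character that precedes it;
    # index -1 wraps around to the sentinel for the suffix starting at 0.
    return "".join(s[i - 1] for i in suffix_array)


if __name__ == "__main__":
    print(bwt("banana"))  # annb$aa
```

Sorting full suffixes as Python strings costs far more time and memory than is practical for genome-scale inputs, which is the kind of bottleneck that motivates computing the index with distributed resources.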
