DC LG ML

Sep 29, 2017

Toward Scalable Machine Learning and Data Mining: the Bioinformatics Case

Faraz Faghri, Sayed Hadi Hashemi, Mohammad Babaeizadeh, Mike A. Nalls, Saurabh Sinha, Roy H. Campbell

arXiv:1710.00112v1

1.24 citations

Originality: Synthesis-oriented

AI Analysis

This work targets bioinformatics researchers and scalable computing experts, giving them a focused list of algorithms worth optimizing. It is incremental: it reviews and prioritizes existing methods without introducing new techniques.

The paper identifies widely used machine learning and data mining algorithms in bioinformatics to guide scalable computing experts in optimizing these algorithms for big data challenges, aiming to address the data deluge in computational biology.

In an effort to overcome the data deluge in computational biology and bioinformatics and to facilitate bioinformatics research in the era of big data, we identify some of the most influential algorithms that have been widely used in the bioinformatics community. These top data mining and machine learning algorithms cover classification, clustering, regression, graphical model-based learning, and dimensionality reduction. The goal of this study is to guide the focus of scalable computing experts in the endeavor of applying new storage and scalable computation designs to bioinformatics algorithms that merit their attention most, following the engineering maxim of "optimize the common case".
