GN CE LG | Apr 5, 2015

Ultra-large alignments using Phylogeny-aware Profiles

Nam-phuong Nguyen, Siavash Mirarab, Keerthana Kumar, Tandy Warnow

arXiv:1504.01142v1 | 130 citations | Has Code

AI Analysis

This work addresses a critical problem in biology: researchers in evolutionary studies and homology detection need reliable alignments. It offers a novel method for handling fragmentary data.

The paper tackles accurate multiple sequence alignment of large datasets that contain fragmentary sequences. It introduces UPP, which uses an Ensemble of Hidden Markov Models to produce highly accurate alignments for ultra-large nucleotide and amino acid datasets.

Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments (MSAs) and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, an MSA method that uses a new machine learning technique - the Ensemble of Hidden Markov Models - that we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp.
