QM ML · May 19, 2017

Beyond similarity assessment: Selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm

arXiv:1705.06911v21.2

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in bioinformatics, offering an incremental improvement in model selection for sequence alignment.

The authors tackled the problem of selecting the optimal number of hidden states in Pair Hidden Markov Models for sequence alignment. They developed a method based on Factorized Information Criteria that slightly improved alignment accuracy in simulations and, on DNA datasets from 5 and 28 species, selected more complex models than those used in previous studies.

Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy. We developed a novel method to select superior models (including the number of hidden states) for PHMMs. Our method selects models with the highest posterior probability using Factorized Information Criteria (FIC), which are widely utilised in model selection for probabilistic models with hidden variables. Our simulations indicated that this method has excellent model selection capabilities with slightly improved alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies.
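For readers unfamiliar with the three PHMM state types mentioned above, the following is a minimal sketch of the forward algorithm for the textbook PHMM with exactly one match, one insertion and one deletion state. It is not the authors' method: it does not implement FIC or multiple hidden states per type. The function name, the parameters `delta`, `epsilon`, `tau` and `match_prob`, and their default values are all illustrative assumptions.

```python
def pair_hmm_forward(x, y, delta=0.1, epsilon=0.3, tau=0.05, match_prob=0.8):
    """Probability of sequences x and y under a simple three-state pair HMM.

    Assumed (illustrative) parameters:
      delta      gap-open probability (match -> insertion or deletion)
      epsilon    gap-extension probability (stay in insertion or deletion)
      tau        probability of moving to the end state
      match_prob total emission mass placed on identical nucleotide pairs
    """
    n, m = len(x), len(y)

    def pair_emission(a, b):
        # Match state emits an aligned pair; the 16 pair probabilities sum to 1.
        return match_prob / 4 if a == b else (1 - match_prob) / 12

    gap_emission = 0.25  # insertion/deletion states emit one nucleotide uniformly

    # f_match[i][j]: probability of emitting x[:i], y[:j] and being in the match state.
    f_match = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_ins = [[0.0] * (m + 1) for _ in range(n + 1)]  # consumes a symbol of x only
    f_del = [[0.0] * (m + 1) for _ in range(n + 1)]  # consumes a symbol of y only
    f_match[0][0] = 1.0  # the begin state behaves like the match state

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                f_match[i][j] = pair_emission(x[i - 1], y[j - 1]) * (
                    (1 - 2 * delta - tau) * f_match[i - 1][j - 1]
                    + (1 - epsilon - tau) * (f_ins[i - 1][j - 1] + f_del[i - 1][j - 1])
                )
            if i > 0:
                f_ins[i][j] = gap_emission * (
                    delta * f_match[i - 1][j] + epsilon * f_ins[i - 1][j]
                )
            if j > 0:
                f_del[i][j] = gap_emission * (
                    delta * f_match[i][j - 1] + epsilon * f_del[i][j - 1]
                )

    # Terminate from any state into the end state.
    return tau * (f_match[n][m] + f_ins[n][m] + f_del[n][m])


if __name__ == "__main__":
    print(pair_hmm_forward("ACGT", "ACT"))
```

Choosing the number of hidden states, as the paper does, would amount to comparing models that have several such states per type; the sketch fixes that number at one each.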
