QM LG May 19, 2013

Generalized Centroid Estimators in Bioinformatics

Michiaki Hamada, Hisanori Kiryu, Wataru Iwasaki, Kiyoshi Asai

arXiv:1305.4339v1, 15 citations

Originality: Synthesis-oriented

AI Analysis

The paper provides a unified framework for designing estimators in bioinformatics. It addresses a domain-specific problem for researchers in that field, though it appears incremental because it builds on existing principles.

The paper tackles the discrepancy between estimators and accuracy measures in bioinformatics by introducing a general class of estimators for high-dimensional binary spaces. It shows that these estimators fit common accuracy measures such as sensitivity and F-score and can be computed efficiently in many cases.

In many estimation problems in bioinformatics, an accuracy measure for the target problem is given, and it is important to design estimators suited to that measure. However, there is often a discrepancy between the estimator employed and the accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit commonly used accuracy measures (e.g. sensitivity, PPV, MCC and F-score), can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. The concept presented in this paper not only gives a useful framework for designing MEA-based estimators but is also highly extendable and sheds new light on many problems in bioinformatics.
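To make the MEA idea in the abstract concrete, here is a minimal sketch of one simple estimator of this kind on a binary space. It rests on assumptions that are not stated in the abstract: the gain function is taken to be gamma * TP + TN, the prediction space is unconstrained, and the marginal probabilities p_i are already available. The function names and the example numbers are hypothetical. Under these assumptions the expected gain decomposes per position, so the maximizer is a threshold on the marginals at 1 / (gamma + 1); the brute-force search is included only to check that claim on a tiny example.

```python
from itertools import product


def expected_gain(pred, marginals, gamma):
    """Expected value of gamma * TP + TN for the binary prediction `pred`.

    By linearity of expectation, only the marginals p_i = P(theta_i = 1)
    are needed (assumed gain function, chosen for illustration).
    """
    total = 0.0
    for y, p in zip(pred, marginals):
        # Predicting 1 earns gamma when the truth is 1 (probability p);
        # predicting 0 earns 1 when the truth is 0 (probability 1 - p).
        total += gamma * p if y == 1 else (1.0 - p)
    return total


def threshold_estimator(marginals, gamma):
    """Predict 1 where gamma * p_i > 1 - p_i, i.e. p_i > 1 / (gamma + 1)."""
    cutoff = 1.0 / (gamma + 1.0)
    return [1 if p > cutoff else 0 for p in marginals]


def brute_force_estimator(marginals, gamma):
    """Exhaustive search over all binary vectors (feasible only for tiny n)."""
    return max(
        product([0, 1], repeat=len(marginals)),
        key=lambda y: expected_gain(y, marginals, gamma),
    )


if __name__ == "__main__":
    marginals = [0.9, 0.4, 0.1, 0.6]  # made-up marginal probabilities
    for gamma in (1.0, 4.0):
        fast = threshold_estimator(marginals, gamma)
        slow = list(brute_force_estimator(marginals, gamma))
        print(gamma, fast, slow)
```

A larger gamma lowers the cutoff, so more positions are predicted as 1, which trades PPV for sensitivity. Structured problems add constraints on the prediction space, in which case a plain threshold is generally not sufficient and some form of optimization over the valid predictions is needed.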
