SEQM Apr 16, 2019

Metamorphic Testing for Quality Assurance of Protein Function Prediction Tools

arXiv:1904.08007v1, 16 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses quality assurance for protein function prediction tools, which are critical for applications like drug design. The contribution is incremental: it applies an existing testing method to a new domain.

Protein function prediction tools lack a reliable test oracle because the gold-standard data are incomplete. The authors address this by applying metamorphic testing to nine state-of-the-art tools, and they find that several tools fail all test cases, which raises concerns about prediction quality.

Proteins are the workhorses of life, and gaining insight into their functions is of paramount importance for applications such as drug design. However, experimental validation of protein functions is highly resource-consuming. Therefore, automated protein function prediction (AFP) using machine learning has recently gained significant interest. Many of these AFP tools are based on supervised learning models trained on existing gold-standard functional annotations, which are known to be incomplete. The main challenge in systematically testing AFP software is the lack of a test oracle, which determines whether a test case passes or fails; because the gold-standard data are incomplete, the exact expected outcomes are not well defined for the AFP task. Thus, AFP tools face the oracle problem. In this work, we use metamorphic testing (MT) to test nine state-of-the-art AFP tools by defining a set of metamorphic relations (MRs) that apply input transformations to protein sequences. Our results show that several AFP tools fail all the test cases, raising concerns over the quality of their predictions.
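The idea behind a metamorphic relation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy predictor, the motif table, the label names, and the relation itself are hypothetical and are not taken from the paper or from any real AFP tool. The sketch shows only the general pattern: run the tool on a source input, run it again on a transformed follow-up input, and check a relation between the two outputs instead of comparing against a known correct answer.

```python
from typing import Callable, Dict, Set

Batch = Dict[str, str]             # protein id -> amino-acid sequence
Predictions = Dict[str, Set[str]]  # protein id -> predicted function labels
Predictor = Callable[[Batch], Predictions]

# Made-up motif-to-label table for the toy predictor (not real annotations).
TOY_MOTIFS = {"GKT": "label_A", "HEL": "label_B"}


def toy_predictor(batch: Batch) -> Predictions:
    """Toy stand-in for an AFP tool: one label per motif found in the sequence."""
    return {
        pid: {label for motif, label in TOY_MOTIFS.items() if motif in seq}
        for pid, seq in batch.items()
    }


def order_sensitive_predictor(batch: Batch) -> Predictions:
    """Deliberately flawed variant: the first protein in the batch gets an extra label."""
    preds = toy_predictor(batch)
    first_id = next(iter(batch))
    preds[first_id] = preds[first_id] | {"label_extra"}
    return preds


def mr_duplicate_sequence(predict: Predictor, pid: str, seq: str) -> bool:
    """Hypothetical MR: a copy of a sequence submitted under a new id must
    receive the same predictions as the original, in both runs."""
    source = predict({pid: seq})                         # source test case
    follow_up = predict({pid: seq, pid + "_copy": seq})  # transformed input
    return source[pid] == follow_up[pid] == follow_up[pid + "_copy"]


if __name__ == "__main__":
    seq = "MAGKTLLHELV"
    print(mr_duplicate_sequence(toy_predictor, "P1", seq))
    print(mr_duplicate_sequence(order_sensitive_predictor, "P1", seq))
```

The check never needs the true function of the protein, which is why this style of testing sidesteps the oracle problem: a violated relation signals a defect even when the correct labels are unknown.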

