CL · Apr 7, 2022

tmVar 3.0: an improved variant concept recognition and normalization tool

arXiv:2204.03637v1 · 31 citations · h-index: 77
Originality: Incremental advance
AI Analysis

This tool improves variant information extraction from scientific literature for researchers and clinicians, but it is incremental as it builds on previous versions.

The authors address the limited recognition scope and precision of automated variant recognition and normalization tools by developing tmVar 3.0, which achieves an F-measure above 90% on three independent benchmarking datasets.

Previous studies have shown that automated text-mining tools are becoming increasingly important for successfully unlocking variant information in scientific literature at large scale. Despite multiple attempts in the past, existing tools are still of limited recognition scope and precision. We propose tmVar 3.0: an improved variant recognition and normalization tool. Compared to its predecessors, tmVar 3.0 is able to recognize a wide spectrum of variant related entities (e.g., allele and copy number variants), and to group different variant mentions belonging to the same concept in an article for improved accuracy. Moreover, tmVar3 provides additional variant normalization options such as allele-specific identifiers from the ClinGen Allele Registry. tmVar3 exhibits a state-of-the-art performance with over 90% accuracy in F-measure in variant recognition and normalization, when evaluated on three independent benchmarking datasets. tmVar3 is freely available for download. We have also processed the entire PubMed and PMC with tmVar3 and released its annotations on our FTP. Availability: ftp://ftp.ncbi.nlm.nih.gov/pub/lu/tmVar3
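For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F-measure is typically computed for a recognition task. It assumes exact matching of annotation tuples; the function name `f_measure`, the tuple layout, and the sample annotations are hypothetical illustrations, not the evaluation script or data used for tmVar 3.0.

```python
def f_measure(gold, predicted):
    """Return (precision, recall, F1) for two collections of annotations.

    An annotation counts as correct only if it matches a gold annotation
    exactly (an assumption; other matching criteria are possible).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: (document id, start offset, end offset, mention text)
gold = {
    ("doc1", 10, 17, "p.V600E"),
    ("doc1", 42, 51, "c.1799T>A"),
    ("doc2", 5, 16, "rs113488022"),
}
predicted = {
    ("doc1", 10, 17, "p.V600E"),
    ("doc2", 5, 16, "rs113488022"),
    ("doc2", 30, 36, "p.G12D"),
}

precision, recall, f1 = f_measure(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this toy case two of the three predicted mentions match the gold set, so precision, recall, and F1 all equal 2/3. A reported F-measure above 90% means the harmonic mean of precision and recall exceeds 0.9 on the benchmark in question.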
