CL | Dec 1, 2022

Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework

Harvard
arXiv:2212.00223v1 | 285 citations | h-index: 45 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the specific needs of pharmaceutical companies for efficient and extensible biomedical NER in drug discovery. It is incremental, however, because it builds on existing technologies such as BERN2.

The authors tackled the lack of a comprehensive open-source biomedical NER system for pharmaceutical companies by developing Kazu, a scalable framework that integrates a computationally efficient BERN2 model (TinyBERN2) and other BioNLP technologies, resulting in an open-source tool available for enterprise use.

To assist the drug discovery and development process, pharmaceutical companies often apply biomedical NER and linking techniques over internal and public corpora. Decades of study in the field of BioNLP have produced a plethora of algorithms, systems and datasets. However, our experience has been that no single open source system meets all the requirements of a modern pharmaceutical company. In this work, we describe these requirements according to our experience of the industry, and present Kazu, a highly extensible, scalable open source framework designed to support BioNLP for the pharmaceutical sector. Kazu is built around a computationally efficient version of the BERN2 NER model (TinyBERN2), and wraps several other BioNLP technologies into one coherent system. The Kazu framework is open-sourced: https://github.com/AstraZeneca/KAZU

Code Implementations: 1 repo
