LASIGE and UNICAGE solution to the NASA LitCoin NLP Competition
This work addresses the efficiency and flexibility issues that researchers face in biomedical NLP. Its contribution is incremental, since it combines existing methods rather than introducing new ones.
The authors tackled the challenge of processing heterogeneous biomedical text by integrating industry data engineering tools with academic Named Entity Recognition and Relation Extraction systems. Their team was awarded the 7th Prize among approximately 200 participating teams in the 2022 LitCoin NLP Challenge.
Biomedical Natural Language Processing (NLP) often becomes cumbersome for researchers, frequently because of the volume and heterogeneity of the text to be processed. To address this challenge, industry is continuously developing highly efficient tools and more flexible engineering solutions. This work presents the integration of industry data engineering solutions for efficient data processing with academic systems developed for Named Entity Recognition (LasigeUnicage\_NER) and Relation Extraction (BiOnt). Our design combines those components with external knowledge in the form of additional training data from other datasets and biomedical ontologies. We used this pipeline in the 2022 LitCoin NLP Challenge, where our team, LasigeUnicage, was awarded the 7th Prize out of approximately 200 participating teams, reflecting a successful collaboration between academia (LASIGE) and industry (Unicage). The software supporting this work is available at \url{https://github.com/lasigeBioTM/Litcoin-Lasige_Unicage}.
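
As a rough illustration of the two-stage flow described above, where entity recognition feeds relation extraction, the following minimal Python sketch may help. It is not the LasigeUnicage\_NER or BiOnt implementation: the `Entity` class, the toy `LEXICON`, and the functions `recognize_entities` and `candidate_pairs` are hypothetical names introduced only for this example, and a dictionary lookup stands in for the trained NER model.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Entity:
    text: str    # surface form as it appears in the input
    label: str   # entity type, e.g. "Chemical"
    start: int   # character offset of the first character
    end: int     # character offset one past the last character


# Hypothetical toy lexicon; a real system would use a trained model
# and terms drawn from biomedical ontologies.
LEXICON = {"aspirin": "Chemical", "headache": "Disease"}


def recognize_entities(text, lexicon=LEXICON):
    """Stage 1 (NER stand-in): case-insensitive dictionary lookup."""
    lowered = text.lower()  # assumes lowercasing preserves offsets (true for ASCII)
    entities = []
    for term, label in lexicon.items():
        start = lowered.find(term)
        while start != -1:
            end = start + len(term)
            entities.append(Entity(text[start:end], label, start, end))
            start = lowered.find(term, end)
    return sorted(entities, key=lambda e: e.start)


def candidate_pairs(entities):
    """Stage 2 input (RE stand-in): every unordered pair of mentions.

    A relation extraction model would then classify each pair.
    """
    return list(combinations(entities, 2))


if __name__ == "__main__":
    mentions = recognize_entities("Aspirin relieves headache.")
    for left, right in candidate_pairs(mentions):
        print(left.text, left.label, "<->", right.text, right.label)
```

The sketch keeps the two stages as separate functions so that either one could be swapped for a stronger component without touching the other, which is the kind of modularity the pipeline description suggests.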