IR · LG · Dec 28, 2021

Automatic Pharma News Categorization

arXiv:2201.00688v1
Originality: Synthesis-oriented
AI Analysis

This work addresses categorization challenges for pharmaceutical information systems, but is incremental as it primarily compares existing methods.

The researchers tackled the problem of automatically categorizing pharmaceutical news by comparing fine-tuning performance of multiple transformer models on a 23-category dataset, and found that an ensemble of top-performing models provided a modest improvement in F1 score.

We use a well-balanced text dataset of 23 news categories relevant to pharma information science to compare the fine-tuning performance of multiple autoregressive and autoencoding transformer models on a classification task. To validate the winning approach, we perform diagnostics of model behavior on mispredicted instances, including inspection of category-wise metrics, evaluation of prediction certainty, and assessment of latent space representations. Lastly, we propose an ensemble model consisting of the top-performing individual predictors and demonstrate that this approach offers a modest improvement in the F1 metric.
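As a rough illustration of the ensemble and prediction-certainty ideas in the abstract, the sketch below combines several classifiers by soft voting and scores the result with macro-averaged F1. The abstract does not say how the ensemble combines its predictors or which F1 averaging is reported, so soft voting and macro averaging are assumptions here, and the function names (`softmax`, `soft_vote`, `macro_f1`) and toy logits are hypothetical rather than taken from the paper.

```python
import math


def softmax(logits):
    """Convert one example's logits into class probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(logits_per_model):
    """Average class probabilities across models (assumed ensemble rule).

    logits_per_model: one entry per model, each a list of per-example
    logit lists. Returns (predicted labels, certainty of each prediction),
    where certainty is the averaged probability of the winning class.
    """
    n_models = len(logits_per_model)
    labels, certainties = [], []
    for example_logits in zip(*logits_per_model):
        probs = [softmax(logits) for logits in example_logits]
        avg = [sum(column) / n_models for column in zip(*probs)]
        best = max(range(len(avg)), key=avg.__getitem__)
        labels.append(best)
        certainties.append(avg[best])
    return labels, certainties


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in either list."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Toy logits for two hypothetical models, two examples, three classes.
model_a = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
model_b = [[1.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
labels, certainties = soft_vote([model_a, model_b])
```

On the second toy example the two models disagree, and the more confident model decides the averaged prediction. The lower certainty that results is the kind of signal one might inspect when diagnosing mispredicted instances.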

