CLAI
Nov 3, 2022

Using Large Pre-Trained Language Model to Assist FDA in Premarket Medical Device

arXiv:2212.01217v1
h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work addresses the FDA's need for more efficient premarket device review, though it is incremental as it applies existing NLP methods to a new regulatory dataset.

The paper tackles the automation of FDA medical device classification by using pre-trained language models to match device descriptions with regulatory categories. The best models place the correct label among the top 15 candidates out of 2585 device types and reliably flag completely incorrect labels, but they fail to detect misclassifications that are closely related to the true label.

This paper proposes a method based on natural language processing that might assist the FDA medical device marketing process. Actual device descriptions are matched against the device descriptions in FDA Title 21 of the CFR to determine their corresponding device type. Both pre-trained word embeddings such as FastText and large pre-trained sentence embedding models such as sentence transformers are evaluated on how accurately they characterize a device description. An experiment also tests whether these models can identify devices that are wrongly classified in the FDA database. The results show that sentence transformers with T5 and MPNet, as well as GPT-3 semantic search embeddings, identify the correct classification with high accuracy: the correct label falls within the 15 most likely results, compared with the 2585 device types that would otherwise have to be searched manually. All methods also show high accuracy in identifying completely incorrectly labeled devices, but all fail to identify classifications that are wrong yet closely related to the true label.
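
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `embed` function here is a toy bag-of-words stand-in for the FastText, sentence-transformer, or GPT-3 embeddings the paper evaluates, and the `REGULATIONS` table, the function names, and the example texts are all hypothetical. The sketch ranks regulation descriptions by cosine similarity to a device description, keeps the top `k` (the paper reports results for 15 candidates), and flags a recorded label as suspicious when it does not appear among those candidates.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector.

    Assumption: a real system would call a pre-trained embedding model here.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_device_types(description, regulations, k=15):
    """Return the k device-type labels whose regulation text is most similar."""
    query = embed(description)
    scored = [(cosine(query, embed(text)), label)
              for label, text in regulations.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored[:k]]


def looks_mislabeled(description, recorded_label, regulations, k=15):
    """Flag a record whose recorded label is absent from the top-k candidates."""
    return recorded_label not in rank_device_types(description, regulations, k)


# Hypothetical miniature regulation table (the paper works with 2585 types).
REGULATIONS = {
    "clinical thermometer":
        "a device used to measure body temperature of a patient",
    "powered wheelchair":
        "a battery operated device with wheels to provide mobility "
        "to persons restricted to a sitting position",
    "surgical scalpel":
        "a handheld blade used to cut tissue during surgery",
}

DESCRIPTION = "electronic device that measures the oral temperature of a patient"

if __name__ == "__main__":
    print(rank_device_types(DESCRIPTION, REGULATIONS, k=2))
    print(looks_mislabeled(DESCRIPTION, "surgical scalpel", REGULATIONS, k=1))
```

A check of this kind only catches labels that are far from the description in embedding space. A wrong label that is a close neighbor of the true one will usually still land inside the top `k`, which is consistent with the failure mode the abstract reports.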

