CL | Feb 12, 2025

Data Augmentation to Improve Large Language Models in Food Hazard and Product Detection

arXiv:2502.08687v1 | 1 citation | h-index: 6 | Has Code
Originality: Incremental advance
AI Analysis

This research makes an incremental advance in the accuracy of food hazard and product detection for individuals and organizations that rely on large language models.

The study fine-tunes large language models for food hazard and product detection and reports gains in recall, F1 score, precision, and accuracy. Training on augmented data generated by ChatGPT-4o-mini gave better model performance than training on the provided dataset alone.

The primary objective of this study is to demonstrate the impact of data augmentation using ChatGPT-4o-mini on food hazard and product analysis. The augmented data is generated using ChatGPT-4o-mini and subsequently used to train two large language models: RoBERTa-base and Flan-T5-base. The models are evaluated on test sets. The results indicate that using augmented data helped improve model performance across key metrics, including recall, F1 score, precision, and accuracy, compared to using only the provided dataset. The full code, including model training and the augmented dataset, can be found in this repository: https://github.com/AREEG94FAHAD/food-hazard-prdouct-cls
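The pipeline described above (generate extra training examples, train, then score on a test set) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `augment` helper, its `generate` callback (a stand-in for a call to ChatGPT-4o-mini), the per-class `target` count, and the choice of macro averaging are all assumptions made for illustration. The summary does not say how many examples were generated per class or how the metrics were averaged.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (text, label)


def augment(
    train: List[Example],
    generate: Callable[[str, str], str],
    target: int,
) -> List[Example]:
    """Top up every class to at least `target` examples.

    `generate(seed_text, label)` is a hypothetical callback that returns one
    synthetic text for the given label, e.g. a wrapper around an LLM call.
    Balancing classes this way is an assumption, not the paper's stated recipe.
    """
    counts = Counter(label for _, label in train)
    texts_by_label: Dict[str, List[str]] = {}
    for text, label in train:
        texts_by_label.setdefault(label, []).append(text)

    augmented = list(train)
    for label, texts in texts_by_label.items():
        missing = max(0, target - counts[label])
        for i in range(missing):
            seed = texts[i % len(texts)]  # cycle through the real examples
            augmented.append((generate(seed, label), label))
    return augmented


def macro_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

To compare the two settings reported in the study, one would train the same model once on `train` and once on `augment(train, generate, target)`, then call `macro_scores` on each model's test-set predictions.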

Code Implementations: 1 repo
