BMLGDec 23, 2024

Machine learning and natural language processing models to predict the extent of food processing

arXiv:2412.17217v15 citationsh-index: 2J Food Compos Anal
Originality Synthesis-oriented
AI Analysis

This addresses public health concerns by helping identify ultra-processed foods, but it is incremental as it applies existing methods to a new dataset.

The study tackled predicting food processing levels using machine learning and NLP models on nutrient data, achieving F1-scores up to 0.9411 and MCC up to 0.8691, and developed a web server for practical use.

The dramatic increase in consumption of ultra-processed food has been associated with numerous adverse health effects. Given the public health consequences linked to ultra-processed food consumption, it is highly relevant to build computational models to predict the processing of food products. We created a range of machine learning, deep learning, and NLP models to predict the extent of food processing by integrating the FNDDS dataset of food products and their nutrient profiles with their reported NOVA processing level. Starting with the full nutritional panel of 102 features, we further implemented coarse-graining of features to 65 and 13 nutrients by dropping flavonoids and then by considering the 13-nutrient panel of FDA, respectively. LGBM Classifier and Random Forest emerged as the best model for 102 and 65 nutrients, respectively, with an F1-score of 0.9411 and 0.9345 and MCC of 0.8691 and 0.8543. For the 13-nutrient panel, Gradient Boost achieved the best F1-score of 0.9284 and MCC of 0.8425. We also implemented NLP based models, which exhibited state-of-the-art performance. Besides distilling nutrients critical for model performance, we present a user-friendly web server for predicting processing level based on the nutrient panel of a food product: https://cosylab.iiitd.edu.in/food-processing/.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes