BMDec 19, 2025
Application of machine learning to predict food processing level using Open Food FactsNalin Arora, Aviral Chauhan, Siddhant Rana et al.
Ultra-processed foods are increasingly linked to health issues like obesity, cardiovascular disease, type 2 diabetes, and mental health disorders due to poor nutritional quality. This first-of-its-kind study at such a scale uses machine learning to classify food processing levels (NOVA) based on the Open Food Facts dataset of over 900,000 products. Models including LightGBM, Random Forest, and CatBoost were trained on nutrient concentration data. LightGBM performed best, achieving 80-85% accuracy across different nutrient panels and effectively distinguishing minimally from ultra-processed foods. Exploratory analysis revealed strong associations between higher NOVA classes and lower Nutri-Scores, indicating poorer nutritional quality. Products in NOVA 3 and 4 also had higher carbon footprints and lower Eco-Scores, suggesting greater environmental impact. Allergen analysis identified gluten and milk as common in ultra-processed items, posing risks to sensitive individuals. Categories like Cakes and Snacks were dominant in higher NOVA classes, which also had more additives, highlighting the role of ingredient modification. This study, leveraging the largest dataset of NOVA-labeled products, emphasizes the health, environmental, and allergenic implications of food processing and showcases machine learning's value in scalable classification. A user-friendly web tool is available for NOVA prediction using nutrient data: https://cosylab.iiitd.edu.in/foodlabel/.
BMDec 23, 2024
Machine learning and natural language processing models to predict the extent of food processingNalin Arora, Sumit Bhagat, Riya Dhama et al.
The dramatic increase in consumption of ultra-processed food has been associated with numerous adverse health effects. Given the public health consequences linked to ultra-processed food consumption, it is highly relevant to build computational models to predict the processing of food products. We created a range of machine learning, deep learning, and NLP models to predict the extent of food processing by integrating the FNDDS dataset of food products and their nutrient profiles with their reported NOVA processing level. Starting with the full nutritional panel of 102 features, we further implemented coarse-graining of features to 65 and 13 nutrients by dropping flavonoids and then by considering the 13-nutrient panel of FDA, respectively. LGBM Classifier and Random Forest emerged as the best model for 102 and 65 nutrients, respectively, with an F1-score of 0.9411 and 0.9345 and MCC of 0.8691 and 0.8543. For the 13-nutrient panel, Gradient Boost achieved the best F1-score of 0.9284 and MCC of 0.8425. We also implemented NLP based models, which exhibited state-of-the-art performance. Besides distilling nutrients critical for model performance, we present a user-friendly web server for predicting processing level based on the nutrient panel of a food product: https://cosylab.iiitd.edu.in/food-processing/.