Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
This work addresses the challenge of estimating nutritional content from visual data for public health applications, though the contribution appears incremental because it builds on existing dataset-creation approaches.
The authors tackled the problem of automatically estimating nutritional content from food images by introducing Nutrition5k, a dataset of 5,000 diverse real-world dishes with detailed annotations. They demonstrated that a computer vision algorithm trained on this dataset can predict caloric and macronutrient values more accurately than professional nutritionists.
Understanding the nutritional content of food from visual data is a challenging computer vision problem, with the potential to have a positive and widespread impact on public health. Studies in this area are limited by existing datasets that lack the diversity or the labels required to train models capable of nutritional understanding. We introduce Nutrition5k, a novel dataset of 5k diverse, real-world food dishes with corresponding video streams, depth images, component weights, and high-accuracy nutritional content annotations. We demonstrate the potential of this dataset by training a computer vision algorithm capable of predicting the caloric and macronutrient values of a complex, real-world dish at an accuracy that outperforms professional nutritionists. Further, we present a baseline for incorporating depth sensor data to improve nutrition predictions. We will publicly release Nutrition5k in the hope that it will accelerate innovation in the space of nutritional understanding.
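
A comparison between a model and human estimators, such as the one described above, could be scored with a per-nutrient mean absolute error. The following is a minimal sketch under that assumption: the function names are hypothetical, and every number is made up for illustration rather than taken from Nutrition5k or from the reported results.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predictions and ground truth."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mae_percent_of_mean(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """MAE expressed as a percentage of the mean ground-truth value."""
    mean_actual = sum(actual) / len(actual)
    return 100.0 * mean_absolute_error(predicted, actual) / mean_actual


# Illustrative calorie values (kcal) for three hypothetical dishes.
measured_kcal = [200.0, 400.0, 600.0]
model_kcal = [220.0, 380.0, 630.0]          # hypothetical model output
nutritionist_kcal = [150.0, 500.0, 500.0]   # hypothetical human estimates

model_mae = mean_absolute_error(model_kcal, measured_kcal)
human_mae = mean_absolute_error(nutritionist_kcal, measured_kcal)
print(f"model MAE: {model_mae:.1f} kcal, nutritionist MAE: {human_mae:.1f} kcal")
```

Normalizing the error by the mean ground-truth value makes calorie errors (in kcal) and macronutrient errors (in grams) comparable on one scale, which is why the sketch includes both functions.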