CV · Apr 14

OmniFood8K: Single-Image Nutrition Estimation via Hierarchical Frequency-Aligned Fusion

arXiv:2604.12356 · 57.8 · h-index: 28
Predicted impact top 60% in CV · last 90 days · Originality: Synthesis-oriented
AI Analysis

For researchers in food computing and nutrition estimation, this work provides a new dataset for Chinese cuisine and a method that eliminates the need for depth sensors, but the novelty is incremental.

The paper introduces OmniFood8K, a multimodal dataset of 8,036 Chinese food samples with nutritional annotations, and proposes a single-image nutrition estimation method using depth prediction and frequency-aligned fusion. The method outperforms existing approaches on multiple datasets.

Accurate estimation of food nutrition plays a vital role in promoting healthy dietary habits and personalized diet management. Most existing food datasets primarily focus on Western cuisines and lack sufficient coverage of Chinese dishes, which restricts accurate nutritional estimation for Chinese meals. Moreover, many state-of-the-art nutrition prediction methods rely on depth sensors, restricting their applicability in daily scenarios. To address these limitations, we introduce OmniFood8K, a comprehensive multimodal dataset comprising 8,036 food samples, each with detailed nutritional annotations and multi-view images. In addition, to enhance models' capability in nutritional prediction, we construct NutritionSynth-115K, a large-scale synthetic dataset that introduces compositional variations while preserving precise nutritional labels. Moreover, we propose an end-to-end framework for nutritional prediction from a single RGB image. First, we predict a depth map from a single RGB image and design the Scale-Shift Residual Adapter (SSRA) to refine it for global scale consistency and local structural preservation. Second, we propose the Frequency-Aligned Fusion Module (FAFM) to hierarchically align and fuse RGB and depth features in the frequency domain. Finally, we design a Mask-based Prediction Head (MPH) to emphasize key ingredient regions via dynamic channel selection for more accurate prediction. Extensive experiments on multiple datasets demonstrate the superiority of our method over existing approaches. Project homepage: https://yudongjian.github.io/OmniFood8K-food/
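The abstract says that RGB and depth features are aligned and fused in the frequency domain but gives no formula. The block below is a minimal sketch of what frequency-domain fusion of two feature maps can look like; it is not the paper's FAFM. It assumes NumPy, and the function names, the radial low-pass mask, and the `cutoff` parameter are all hypothetical choices made for illustration: low spatial frequencies are taken from the depth features (coarse geometry) and high frequencies from the RGB features (texture and edges).

```python
import numpy as np


def radial_lowpass_mask(height, width, cutoff):
    """Binary mask that is 1 where the normalized spatial frequency is <= cutoff.

    Hypothetical helper for this sketch; frequencies are in cycles per pixel,
    laid out in the same order that np.fft.fft2 produces.
    """
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return (radius <= cutoff).astype(np.float64)


def fuse_in_frequency(rgb_feat, depth_feat, cutoff=0.1):
    """Fuse two (channels, height, width) feature maps in the frequency domain.

    Assumption of this sketch (not taken from the paper): depth supplies the
    low-frequency band and RGB supplies the remaining high-frequency band.
    """
    if rgb_feat.shape != depth_feat.shape:
        raise ValueError("feature maps must have the same shape")
    height, width = rgb_feat.shape[-2:]
    mask = radial_lowpass_mask(height, width, cutoff)

    # 2D FFT over the spatial axes only; channels are processed independently.
    rgb_spec = np.fft.fft2(rgb_feat, axes=(-2, -1))
    depth_spec = np.fft.fft2(depth_feat, axes=(-2, -1))

    # The mask and its complement sum to 1, so identical inputs pass through unchanged.
    fused_spec = mask * depth_spec + (1.0 - mask) * rgb_spec

    # The mask is symmetric in frequency, so the imaginary part is only round-off.
    return np.fft.ifft2(fused_spec, axes=(-2, -1)).real


rng = np.random.default_rng(0)
rgb_features = rng.standard_normal((4, 16, 16))
depth_features = rng.standard_normal((4, 16, 16))
fused = fuse_in_frequency(rgb_features, depth_features, cutoff=0.1)
```

A fixed binary mask keeps the sketch easy to check by hand. A learned module would typically replace it with trainable, per-channel frequency weights, and a hierarchical variant would repeat the fusion at several feature resolutions.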

