CV LG May 13, 2025

Multimodal Fusion of Glucose Monitoring and Food Imagery for Caloric Content Prediction

arXiv:2505.09018v23.6

Originality: Incremental advance

AI Analysis

This work addresses the problem of dietary monitoring for individuals with Type 2 diabetes, representing an incremental improvement through multimodal fusion.

The paper tackles the challenge of accurately estimating caloric intake for Type 2 diabetes management. It develops a multimodal deep learning framework that fuses continuous glucose monitor (CGM) data, demographic and microbiome data, and food images, achieving a Root Mean Squared Relative Error of 0.2544 and outperforming the baseline models by over 50%.

Effective dietary monitoring is critical for managing Type 2 diabetes, yet accurately estimating caloric intake remains a major challenge. While continuous glucose monitors (CGMs) offer valuable physiological data, they often fall short in capturing the full nutritional profile of meals due to inter-individual and meal-specific variability. In this work, we introduce a multimodal deep learning framework that jointly leverages CGM time-series data, demographic and microbiome data, and pre-meal food images to enhance caloric estimation. Our model uses attention-based encoding and convolutional feature extraction for meal imagery, and multi-layer perceptrons for CGM and microbiome data, followed by a late fusion strategy for joint reasoning. We evaluate our approach on a curated dataset of over 40 participants, incorporating synchronized CGM, demographic, and microbiome data and meal photographs with standardized caloric labels. Our model achieves a Root Mean Squared Relative Error (RMSRE) of 0.2544, outperforming the baseline models by over 50%. These findings demonstrate the potential of multimodal sensing to improve automated dietary assessment tools for chronic disease management.
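
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Root Mean Squared Relative Error is commonly computed: each prediction error is divided by the true value before squaring and averaging. The abstract does not give the exact formula, so this definition is an assumption, and the function name `rmsre` and the calorie values in the usage lines are hypothetical illustrations, not data from the paper.

```python
import math


def rmsre(y_true, y_pred):
    """Root Mean Squared Relative Error under the common definition:
    sqrt(mean(((pred - true) / true) ** 2)).

    Assumption: the paper may define the metric slightly differently.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for true_value, predicted in zip(y_true, y_pred):
        if true_value == 0:
            # Relative error is undefined when the true value is zero.
            raise ValueError("true values must be non-zero")
        relative_error = (predicted - true_value) / true_value
        total += relative_error ** 2

    return math.sqrt(total / len(y_true))


# Hypothetical meal calorie labels and predictions (not from the paper).
true_calories = [100.0, 200.0]
predicted_calories = [110.0, 180.0]
print(rmsre(true_calories, predicted_calories))  # approximately 0.1
```

Because each error is scaled by the true value, a 10% miss on a small meal counts the same as a 10% miss on a large one. Under this definition, an RMSRE of 0.2544 corresponds to a typical relative error of roughly 25% of the true caloric value.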
