CV · IV · Mar 24, 2025

Improving Food Image Recognition with Noisy Vision Transformer

arXiv:2503.18997v1 · 10 citations · h-index: 8 · EMBC
Originality: Incremental advance
AI Analysis

This work addresses food classification for dietary assessment and healthcare, showing incremental improvements over existing methods.

The study applies Noisy Vision Transformers (NoisyViT), which inject noise into the learning process to reduce task complexity, to food image recognition. It reports Top-1 accuracies of 95%, 99.5%, and 96.6% on Food2K, Food-101, and CNFOOD-241, respectively.

Food image recognition is a challenging task in computer vision due to the high variability and complexity of food images. In this study, we investigate the potential of Noisy Vision Transformers (NoisyViT) for improving food classification performance. By introducing noise into the learning process, NoisyViT reduces task complexity and adjusts the entropy of the system, leading to enhanced model accuracy. We fine-tune NoisyViT on three benchmark datasets: Food2K (2,000 categories, ~1M images), Food-101 (101 categories, ~100K images), and CNFOOD-241 (241 categories, ~190K images). The performance of NoisyViT is evaluated against state-of-the-art food recognition models. Our results demonstrate that NoisyViT achieves Top-1 accuracies of 95%, 99.5%, and 96.6% on Food2K, Food-101, and CNFOOD-241, respectively, significantly outperforming existing approaches. This study underscores the potential of NoisyViT for dietary assessment, nutritional monitoring, and healthcare applications, paving the way for future advancements in vision-based food computing. Code for reproducing NoisyViT for food recognition is available at NoisyViT_Food.
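The results in the abstract are reported as Top-1 accuracy: the fraction of test images for which the model's single highest-scoring class matches the ground-truth label. The following is a minimal sketch of that metric in plain Python. It is not taken from the authors' NoisyViT_Food code; the function name `top1_accuracy` and the toy scores and labels are hypothetical and used only for illustration.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the fraction of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score in this row (the model's Top-1 prediction).
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Hypothetical class scores for 4 images over 3 classes (illustrative values only).
scores = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.8, 0.1, 0.1],  # predicted class 0
    [0.3, 0.3, 0.4],  # predicted class 2
    [0.2, 0.5, 0.3],  # predicted class 1
]
labels = [1, 0, 2, 2]

accuracy = top1_accuracy(scores, labels)
print(f"Top-1 accuracy: {accuracy:.2%}")
```

In this toy example three of the four predictions match their labels. On a benchmark such as Food-101, the same computation would run over the model's outputs for every test image.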

