CV | Nov 20, 2025

LLMs-based Augmentation for Domain Adaptation in Long-tailed Food Datasets

arXiv:2511.16037v1 | h-index: 4 | MMM
Originality: Incremental advance
AI Analysis

This work addresses domain adaptation and long-tailed classification for food recognition in real-world applications. It is an incremental improvement that combines LLMs with existing techniques.

The paper tackles the challenges of domain shift and long-tailed distribution in food recognition by using large language models to generate text from images and aligning text and image features in a shared embedding space, achieving superior performance over existing methods on two food datasets.

Training a model for food recognition is challenging because the training samples, which are typically crawled from the Internet, are visually different from the pictures captured by users in free-living environments. In addition to this domain-shift problem, real-world food datasets tend to follow a long-tailed distribution, and some dishes from different categories exhibit subtle variations that are difficult to distinguish visually. In this paper, we present a framework empowered with large language models (LLMs) to address these challenges in food recognition. We first leverage LLMs to parse food images and generate food titles and ingredients. Then, we project the generated texts and the food images from different domains into a shared embedding space to maximize the pair similarities. Finally, we take the aligned features of both modalities for recognition. With this simple framework, we show that our proposed approach can outperform existing approaches tailored for long-tailed data distributions, domain adaptation, and fine-grained classification, respectively, on two food datasets.
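The alignment step in the abstract (projecting generated texts and images into a shared embedding space to maximize pair similarities) can be illustrated with a minimal sketch. The abstract does not name the loss or the fusion rule, so everything below is an assumption: a CLIP-style symmetric contrastive loss over cosine similarities stands in for "maximize the pair similarities", and simple concatenation stands in for "take the aligned features of both modalities". The function names, the `temperature` value, and the toy embeddings are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(image_embs, text_embs):
    """Cosine similarity between every image embedding and every text embedding."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i (its own pair)."""
    total = 0.0
    for i, row in enumerate(logits):
        top = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum = top + math.log(sum(math.exp(x - top) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def pair_alignment_loss(image_embs, text_embs, temperature=0.07):
    """Assumed symmetric contrastive loss: low when matched pairs are most similar."""
    sims = similarity_matrix(image_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


def fuse(image_emb, text_emb):
    """Assumed fusion: concatenate the normalized features of both modalities."""
    return l2_normalize(image_emb) + l2_normalize(text_emb)


# Toy embeddings: image i and text i are meant to describe the same dish.
images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = pair_alignment_loss(images, matched_texts)
misaligned_loss = pair_alignment_loss(images, swapped_texts)
fused = fuse([3.0, 4.0], [0.0, 2.0])
```

In this toy setup the loss is near zero when each image is most similar to its own generated text and large when the pairs are swapped, which is the behavior an alignment objective of this kind is meant to reward. A classifier would then be trained on the fused vector, here just the two normalized embeddings placed side by side.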

