CV · AI · Jul 23, 2025

Swin-TUNA: A Novel PEFT Approach for Accurate Food Image Segmentation

arXiv:2507.17347v3 · h-index: 1
Originality: Incremental advance
AI Analysis

Swin-TUNA offers an efficient solution for industrial food image processing, though the advance is incremental because it builds on existing PEFT and Transformer methods.

The paper tackles the high computational demands of food image segmentation by proposing Swin-TUNA, a PEFT method that updates only 4% of the parameters. It achieves mIoU of 50.56% on FoodSeg103 and 74.94% on UECFoodPix Complete while reducing the parameter count by 98.7% compared to FoodSAM.
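The reported reduction can be sanity-checked with simple arithmetic. The sketch below assumes that the 8.13M figure given in the abstract is the parameter count that remains after the 98.7% reduction. The function name is hypothetical, and the resulting baseline is an implied estimate, not a number stated in the source.

```python
# Back-of-the-envelope check of the reported parameter figures.
# Assumption: 8.13M (from the abstract) is what remains after a 98.7% reduction.

def implied_baseline(remaining_millions: float, reduction: float) -> float:
    """Baseline size (in millions) implied by a fractional parameter reduction."""
    return remaining_millions / (1.0 - reduction)

baseline = implied_baseline(8.13, 0.987)
print(f"Implied baseline size: about {baseline:.0f}M parameters")
```

Under that reading, the comparison model would have roughly 625M parameters.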

In food image processing, efficient semantic segmentation is crucial for industrial applications. However, existing large-scale Transformer-based models (such as FoodSAM) struggle to meet practical deployment requirements because of their massive parameter counts and high computational demands. This paper introduces the TUNable Adapter module (Swin-TUNA), a Parameter-Efficient Fine-Tuning (PEFT) method that integrates multiscale trainable adapters into the Swin Transformer architecture and achieves high-performance food image segmentation by updating only 4% of the parameters. The core innovation of Swin-TUNA is its hierarchical feature adaptation mechanism: it uses depthwise separable convolutions and dimensional mappings of varying scales to address the differences between shallow-layer and deep-layer features, combined with a dynamic balancing strategy for task-agnostic and task-specific features. Experiments show that the method achieves mIoU of 50.56% and 74.94% on the FoodSeg103 and UECFoodPix Complete datasets, respectively, surpassing the fully parameterized FoodSAM model while reducing the parameter count by 98.7% (to only 8.13M). Furthermore, Swin-TUNA converges faster and generalizes better in low-data scenarios, providing an efficient solution for lightweight food image segmentation.
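To illustrate why an adapter built from depthwise separable convolutions stays small, the following sketch counts parameters for a generic bottleneck adapter (down-projection, depthwise separable convolution, up-projection). This layout is an assumption made for illustration and is not taken from the paper. The stage widths happen to match Swin-T (96, 192, 384, 768), and the bottleneck widths and kernel sizes are hypothetical values chosen only to show per-stage scaling.

```python
# Illustrative parameter counting only; every width, bottleneck size, and
# kernel size below is a hypothetical choice, not a value from the paper.

def conv2d_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weights plus biases of a k x k 2D convolution with the given grouping."""
    return (c_in // groups) * c_out * k * k + c_out

def separable_conv_params(channels: int, k: int) -> int:
    """Depthwise k x k conv (one filter per channel) plus a 1 x 1 pointwise conv."""
    depthwise = conv2d_params(channels, channels, k, groups=channels)
    pointwise = conv2d_params(channels, channels, 1)
    return depthwise + pointwise

def adapter_params(dim: int, bottleneck: int, k: int) -> int:
    """Down-projection, depthwise separable conv, up-projection (all with biases)."""
    down = dim * bottleneck + bottleneck
    up = bottleneck * dim + dim
    return down + separable_conv_params(bottleneck, k) + up

# Hypothetical per-stage settings: (stage width, bottleneck width, kernel size).
stages = [(96, 24, 3), (192, 48, 3), (384, 96, 5), (768, 192, 5)]

for dim, bottleneck, k in stages:
    dense = conv2d_params(bottleneck, bottleneck, k)
    separable = separable_conv_params(bottleneck, k)
    total = adapter_params(dim, bottleneck, k)
    print(f"dim={dim:4d} bottleneck={bottleneck:4d} k={k}: "
          f"dense conv {dense:7d}, separable conv {separable:6d}, adapter {total:7d}")
```

In every stage the separable convolution needs far fewer parameters than a dense convolution of the same kernel size, which is the general property that makes such adapters attractive for parameter-efficient fine-tuning.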

