CV, AI, CL | Feb 11, 2025

Vision-Language Models for Edge Networks: A Comprehensive Survey

arXiv:2502.07855v2 | 44 citations | h-index: 17 | IEEE Internet of Things Journal
AI Analysis

This work addresses the problem of deploying advanced AI models on edge devices, which matters to stakeholders in healthcare, environmental monitoring, and autonomous systems. It is an incremental step towards making AI more accessible in resource-limited settings.

This survey tackles the challenge of deploying Vision Large Language Models (VLMs) on resource-constrained edge devices by reviewing model compression techniques and efficient training methods. It also highlights the growing impact of lightweight VLMs across applications such as healthcare and autonomous systems.

Vision Large Language Models (VLMs) combine visual understanding with natural language processing, enabling tasks like image captioning, visual question answering, and video analysis. While VLMs show impressive capabilities across domains such as autonomous vehicles, smart surveillance, and healthcare, their deployment on resource-constrained edge devices remains challenging due to processing power, memory, and energy limitations. This survey explores recent advancements in optimizing VLMs for edge environments, focusing on model compression techniques (pruning, quantization, and knowledge distillation) and specialized hardware solutions that enhance efficiency. We provide a detailed discussion of efficient training and fine-tuning methods, edge deployment challenges, and privacy considerations. Additionally, we discuss the diverse applications of lightweight VLMs across healthcare, environmental monitoring, and autonomous systems, illustrating their growing impact. By highlighting key design strategies and current challenges, and by offering recommendations for future directions, this survey aims to inspire further research into the practical deployment of VLMs, ultimately making advanced AI accessible in resource-limited settings.
