CV · CL · Nov 13, 2025

Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals

arXiv:2511.10615v1 · 1 citation · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses accessibility for blind and low-vision users by focusing on practical deployment of lightweight models, though it is incremental as it builds on existing VLMs with new evaluation methods.

The paper addresses the challenge of making vision-language models (VLMs) practical for blind and low-vision users. It evaluates lightweight models (SmolVLM2 with 500M and 2.2B parameters) on the AVCaps and Charades datasets, introduces two evaluation frameworks (Multi-Context BLV and Navigational Assistance) to assess description quality, and tests smartphone deployment at FP32 and INT8 precision.

Large Vision-Language Models (VLMs) excel at understanding and generating video descriptions, but their high memory, computation, and deployment demands hinder practical use, particularly for blind and low-vision (BLV) users who depend on detailed, context-aware descriptions. To study the effect of model size on accessibility-focused description quality, we evaluate SmolVLM2 variants with 500M and 2.2B parameters across two diverse datasets: AVCaps (outdoor) and Charades (indoor). We introduce two novel evaluation frameworks specifically designed for BLV accessibility assessment: the Multi-Context BLV Framework, which evaluates spatial orientation, social interaction, action events, and ambience contexts; and the Navigational Assistance Framework, which focuses on mobility-critical information. Additionally, we conduct a systematic evaluation of four prompt design strategies and deploy both models on a smartphone, evaluating FP32 and INT8 precision variants to assess real-world performance constraints on resource-limited mobile devices.
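
To give a rough sense of why the FP32 versus INT8 comparison matters on a smartphone, the sketch below estimates weight storage alone for the two model sizes named in the abstract. It is a back-of-envelope calculation, not the paper's measurement: it assumes exactly 500M and 2.2B parameters, 4 bytes per FP32 weight and 1 byte per INT8 weight, and it ignores activations, caches, and runtime overhead. The names `weight_footprint_gb`, `MODELS`, and `BYTES_PER_PARAM` are illustrative.

```python
# Back-of-envelope weight-storage estimate (weights only).
# Assumption: every parameter is stored at the stated precision;
# real INT8 deployments often keep some layers in higher precision.
BYTES_PER_PARAM = {"FP32": 4, "INT8": 1}

# Nominal parameter counts taken from the abstract (assumed exact here).
MODELS = {"SmolVLM2-500M": 500e6, "SmolVLM2-2.2B": 2.2e9}


def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Return the approximate weight size in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, num_params in MODELS.items():
        for precision in BYTES_PER_PARAM:
            size = weight_footprint_gb(num_params, precision)
            print(f"{name} {precision}: {size:.1f} GB")
```

Under these assumptions, INT8 cuts weight storage to a quarter of the FP32 figure for either model, which is the kind of saving that can decide whether a model fits in a phone's memory at all. Actual on-device numbers depend on the runtime and quantization scheme.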
