CV CL · Nov 13, 2025

Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals

Shruti Singh Baghel, Yash Pratap Singh Rathore, Sushovan Jena, Anurag Pradhan, Amit Shukla, Arnav Bhavsar, Pawan Goyal

arXiv:2511.10615

Originality: Incremental advance

AI Analysis

This work addresses accessibility for blind and low-vision users by focusing on the practical deployment of lightweight models. It is an incremental advance: it builds on existing VLMs and contributes new evaluation methods.

The paper tackles the challenge of making vision-language models (VLMs) accessible to blind and low-vision users. It evaluates lightweight models (SmolVLM2 with 500M and 2.2B parameters) on the AVCaps and Charades datasets and introduces two novel evaluation frameworks (Multi-Context BLV and Navigational Assistance) to assess description quality. It also reports deployment tests on a smartphone using FP32 and INT8 precision.

Large Vision-Language Models (VLMs) excel at understanding and generating video descriptions, but their high memory, computation, and deployment demands hinder practical use, particularly for blind and low-vision (BLV) users who depend on detailed, context-aware descriptions. To study the effect of model size on accessibility-focused description quality, we evaluate SmolVLM2 variants with 500M and 2.2B parameters across two diverse datasets: AVCaps (outdoor) and Charades (indoor). We introduce two novel evaluation frameworks specifically designed for BLV accessibility assessment: the Multi-Context BLV Framework, which evaluates spatial orientation, social interaction, action events, and ambience contexts; and the Navigational Assistance Framework, which focuses on mobility-critical information. Additionally, we conduct a systematic evaluation of four prompt design strategies and deploy both models on a smartphone, evaluating FP32 and INT8 precision variants to assess real-world performance constraints on resource-limited mobile devices.
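
To make the FP32 versus INT8 trade-off concrete, the sketch below estimates weight storage alone from the nominal parameter counts given in the abstract. It is a back-of-the-envelope illustration, not the paper's measurement: it assumes 4 bytes per FP32 weight and 1 byte per INT8 weight, and it ignores activations, caches, quantization scales, and any layers left unquantized. The model labels and the helper name `weight_memory_gb` are hypothetical.

```python
# Rough weight-storage estimate per precision (assumption: every parameter
# is stored at the stated width; real quantized models keep some extras).
BYTES_PER_PARAM = {"fp32": 4, "int8": 1}

# Nominal parameter counts taken from the abstract; labels are illustrative.
MODELS = {
    "SmolVLM2-500M": 500_000_000,
    "SmolVLM2-2.2B": 2_200_000_000,
}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Return approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def footprint_table() -> dict:
    """Map each model label to its estimated weight size per precision."""
    return {
        name: {p: weight_memory_gb(n, p) for p in BYTES_PER_PARAM}
        for name, n in MODELS.items()
    }


if __name__ == "__main__":
    for name, sizes in footprint_table().items():
        print(f"{name}: FP32 ~{sizes['fp32']:.1f} GB, INT8 ~{sizes['int8']:.2f} GB")
```

Under these assumptions INT8 cuts weight storage to a quarter of FP32, which is why the precision variant matters on a memory-limited phone; actual on-device memory use and latency depend on the runtime and must be measured.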
