A Pilot Study on Curator-Guided Multilingual Art Description for Blind and Low-Vision Audiences with Small Vision-Language Models

arXiv:2605.31080
AI Analysis

This study addresses the shortage of visual art descriptions for blind and low-vision audiences, particularly in multilingual museum settings where privacy constraints apply, by investigating small on-premise vision-language models. It is an incremental step towards improving accessibility.

This pilot study explored curator-guided multilingual art description for blind and low-vision (BLV) audiences using a small vision-language model (Qwen2.5-VL-3B-Instruct) for German, Romanian, and Serbian. It found that language-specific LoRA adapters provided more stable controllability and visually grounded description quality for Romanian and Serbian, while a single multilingual adapter was competitive for German.

Blind and low-vision (BLV) audiences remain underserved by visual art descriptions, particularly across languages and in museum settings where privacy and intellectual-property constraints may favour small on-premise vision-language models (VLMs). This pilot study investigates curator-guided multilingual art description with Qwen2.5-VL-3B-Instruct for German, Romanian, and Serbian. We construct a parallel BLV-oriented caption corpus from artwork images and metadata, and compare language-specific LoRA adapters with a single multilingual adapter under a fixed backbone and training budget. Evaluation combines automatic lexical and embedding-based metrics with an LLM-as-Judge protocol calibrated against a small Romanian BLV pilot study. Under our pilot setup, language-specific adapters show more stable controllability and visually grounded description quality for Romanian and Serbian, while multilingual adaptation remains competitive in German. We frame these findings as deployment-oriented evidence for small on-premise VLMs, and highlight the need for larger BLV user studies and broader language coverage before drawing general conclusions about multilingual accessibility.
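The abstract does not say how the LLM-as-Judge protocol is calibrated against the BLV pilot ratings. One common way to check such agreement is a rank correlation between judge scores and human scores for the same descriptions. The following is a minimal sketch under that assumption, not the authors' procedure; the function names and the example scores are hypothetical placeholders, not data from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(judge_scores, human_scores):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(judge_scores) != len(human_scores) or len(judge_scores) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(judge_scores), average_ranks(human_scores))


if __name__ == "__main__":
    # Hypothetical 1-5 ratings for five descriptions (illustrative only).
    judge = [4, 3, 5, 2, 4]
    human = [5, 3, 4, 2, 4]
    print(f"Spearman rho: {spearman(judge, human):.3f}")
```

A value near 1 would suggest the judge orders descriptions much as the human raters do; with a pilot-sized sample, any such estimate is noisy and should be read with caution.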

