AI, CV, HC | Mar 17, 2025

Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions

Wan Ju Kang, Eunki Kim, Na Min An, Sangryul Kim, Haemin Choi, Ki Hoon Kwak, James Thorne

arXiv:2503.13369v1 | 3 citations | h-index: 4 | ACL

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of creating BLV-aligned diagram descriptions, which matters for visually impaired learners and their educators. The advance is incremental: it builds on existing vision-language models and adds a new assessment approach.

The study tackles diagram description generation for blind and low-vision (BLV) users by having sighted individuals assess descriptions produced by vision-language models rather than write them directly. The result is Sightation, a released collection of datasets spanning 5k diagrams and 137k samples, which the authors report shows fine-tuning potential in downstream tasks.

Often, the needs and visual abilities differ between the annotator group and the end user group. Generating detailed diagram descriptions for blind and low-vision (BLV) users is one such challenging domain. Sighted annotators could describe visuals with ease, but existing studies have shown that direct generations by them are costly, bias-prone, and somewhat lacking by BLV standards. In this study, we ask sighted individuals to assess, rather than produce, diagram descriptions generated by vision-language models (VLMs) that have been guided with latent supervision via multi-pass inference. The sighted assessments prove effective and useful to professional educators who are themselves BLV and teach visually impaired learners. We release Sightation, a collection of diagram description datasets spanning 5k diagrams and 137k samples for completion, preference, retrieval, question answering, and reasoning training purposes, and demonstrate their fine-tuning potential in various downstream tasks.
