CL | Oct 13, 2025

Template-Based Text-to-Image Alignment for Language Accessibility: A Study on Visualizing Text Simplifications

Belkiss Souayed, Sarah Ebling, Yingqiang Gao

arXiv:2510.11314v1 | 1 citation | h-index: 1 | Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)

Originality: Incremental advance

AI Analysis

This work addresses visual accessibility for people with intellectual disabilities, though it is incremental as it applies structured prompting to an existing vision-language model.

This paper tackles the problem of generating accessible images from simplified texts for individuals with intellectual disabilities, finding that a Basic Object Focus prompt template achieved the highest semantic alignment and Retro style was rated most accessible by experts.

Individuals with intellectual disabilities often have difficulties in comprehending complex texts. While many text-to-image models prioritize aesthetics over accessibility, it is not clear how visual illustrations relate to text simplifications (TS) generated from them. This paper presents a structured vision-language model (VLM) prompting framework for generating accessible images from simplified texts. We designed five prompt templates, i.e., Basic Object Focus, Contextual Scene, Educational Layout, Multi-Level Detail, and Grid Layout, each following distinct spatial arrangements while adhering to accessibility constraints such as object count limits, spatial separation, and content restrictions. Using 400 sentence-level simplifications from four established TS datasets (OneStopEnglish, SimPA, Wikipedia, and ASSET), we conducted a two-phase evaluation: Phase 1 assessed prompt template effectiveness with CLIPScores, and Phase 2 involved human annotation of generated images across ten visual styles by four accessibility experts. Results show that the Basic Object Focus prompt template achieved the highest semantic alignment, indicating that visual minimalism enhances language accessibility. Expert evaluation further identified Retro style as the most accessible and Wikipedia as the most effective data source. Inter-annotator agreement varied across dimensions, with Text Simplicity showing strong reliability and Image Quality proving more subjective. Overall, our framework offers practical guidelines for accessible content generation and underscores the importance of structured prompting in AI-generated visual accessibility tools.
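
The Phase 1 comparison of prompt templates by CLIPScore can be illustrated with a minimal sketch. It assumes the standard CLIPScore definition, w * max(cos(image, text), 0) with w = 2.5, and takes image and text embeddings as precomputed vectors; the abstract does not state which CLIP model or scaling the authors used, so the function names, the record layout, and the toy vectors below are all hypothetical.

```python
import numpy as np
from collections import defaultdict


def clip_score(image_emb, text_emb, w=2.5):
    """Standard CLIPScore: w * max(cosine(image, text), 0).

    The embeddings are assumed to come from a CLIP image/text encoder;
    here they are plain vectors so the sketch stays self-contained.
    """
    image_emb = np.asarray(image_emb, dtype=float)
    text_emb = np.asarray(text_emb, dtype=float)
    cos = float(
        image_emb @ text_emb
        / (np.linalg.norm(image_emb) * np.linalg.norm(text_emb))
    )
    return w * max(cos, 0.0)


def mean_score_per_template(records):
    """Average CLIPScore per prompt template.

    records: iterable of (template_name, image_emb, text_emb) tuples
    (a hypothetical layout, not the authors' data format).
    """
    scores = defaultdict(list)
    for template, image_emb, text_emb in records:
        scores[template].append(clip_score(image_emb, text_emb))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}


# Toy vectors only; these are not results from the paper.
toy_records = [
    ("Template A", [1.0, 0.0], [1.0, 0.0]),  # perfectly aligned pair
    ("Template B", [1.0, 0.0], [0.0, 1.0]),  # orthogonal pair
]
means = mean_score_per_template(toy_records)
best = max(means, key=means.get)  # template with the highest mean score
```

With real data, each record would pair one generated image with the simplified sentence it illustrates, and the template with the highest mean score would be the one reported as best aligned.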

