Not All Visitors are Bilingual: A Measurement Study of the Multilingual Web from an Accessibility Perspective
This work addresses accessibility barriers for visually impaired users in multilingual contexts, but its contribution is incremental: it builds on existing accessibility testing, adding a new dataset and tool.
The study tackles multilingual web accessibility for users with visual impairments by analyzing 120,000 websites across 12 languages that primarily use non-Latin scripts. It finds widespread neglect of accessibility hints, which reduces screen reader effectiveness.
English is the predominant language on the web, used by nearly half of the world's top ten million websites. Support for multilingual content is nevertheless growing, with many websites increasingly combining English with regional or native languages in both visible content and hidden metadata. This multilingualism introduces significant barriers for users with visual impairments, as assistive technologies like screen readers frequently lack robust support for non-Latin scripts and misrender or mispronounce non-English text, compounding accessibility challenges across diverse linguistic contexts. Yet large-scale studies of this issue have been limited by the lack of comprehensive datasets on multilingual web content. To address this gap, we introduce LangCrUX, the first large-scale dataset of 120,000 popular websites across 12 languages that primarily use non-Latin scripts. Leveraging this dataset, we conduct a systematic analysis of multilingual web accessibility and uncover widespread neglect of accessibility hints. We find that these hints often fail to reflect the language diversity of visible content, reducing the effectiveness of screen readers and limiting web accessibility. Finally, we propose Kizuki, a language-aware automated accessibility testing extension that accounts for the limited utility of language-inconsistent accessibility hints.
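To make the notion of a language-inconsistent accessibility hint concrete, the following is a minimal sketch of one way such a mismatch could be detected. It is not the method used by Kizuki or in the LangCrUX analysis, which the abstract does not specify. The attribute set `HINT_ATTRS`, the 0.5 threshold, the function names, and the heuristic of classifying letters by their Unicode character name are all assumptions made for illustration. The Hindi sample text is likewise only an illustrative non-Latin-script input.

```python
from html.parser import HTMLParser
import unicodedata

# Assumed set of attributes treated as accessibility hints (illustrative only).
HINT_ATTRS = ("alt", "aria-label", "title")


def non_latin_ratio(text: str) -> float:
    """Return the fraction of letters whose Unicode name does not start with LATIN."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    non_latin = sum(
        1 for ch in letters
        if not unicodedata.name(ch, "").startswith("LATIN")
    )
    return non_latin / len(letters)


class HintCollector(HTMLParser):
    """Collect visible text and accessibility-hint attribute values separately."""

    def __init__(self):
        super().__init__()
        self.visible = []
        self.hints = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        for name, value in attrs:
            if name in HINT_ATTRS and value:
                self.hints.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.visible.append(data.strip())


def language_mismatch(html: str, threshold: float = 0.5) -> bool:
    """Flag pages whose visible text is mostly non-Latin but whose hints are mostly Latin."""
    collector = HintCollector()
    collector.feed(html)
    collector.close()
    visible_ratio = non_latin_ratio(" ".join(collector.visible))
    hint_ratio = non_latin_ratio(" ".join(collector.hints))
    return bool(collector.hints) and visible_ratio >= threshold and hint_ratio < threshold


# Visible text in Hindi, image described in English: flagged as inconsistent.
inconsistent = '<p>ताज़ा समाचार</p><img src="a.png" alt="news photo">'
# Visible text and image description both in Hindi: not flagged.
consistent = '<p>ताज़ा समाचार</p><img src="a.png" alt="समाचार चित्र">'
print(language_mismatch(inconsistent), language_mismatch(consistent))
```

A script-level check like this only separates Latin from non-Latin text. It cannot tell apart two languages that share a script, so a real tool would likely need proper language identification on top of it.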