3 Papers

24.1 · SE · May 26
LLM Based Web Accessibility Repair: An Empirical Study of Detection, Remediation, and Cost

Oluwatoyosi Oyelayo, Ghada Abushaqra, Parham Asadi et al.

Ensuring web accessibility at scale remains challenging because rule-based tools provide limited coverage, while manual remediation is costly and error-prone. This paper evaluates large language model (LLM)-based agents, specifically Kimi K2.5, for automated accessibility detection and repair, and compares them with rule-based approaches. For detection, the LLM performs comparably to rule-based tools (F1 of about 0.65) and shows strong semantic understanding (F1 of 0.83), but it is less reliable for syntactic and layout-related violations. For remediation, LLM-generated fixes are syntactically valid in over 99.7 percent of cases and improve accessibility compliance in 80.2 percent of instances, reducing violations per file from 3.98 to 1.7. However, fewer than 26 percent of cases are fully resolved, and about 30 percent of patches introduce structural changes. We also find that iterative agent-based refinement increases computational cost by 52 percent and API usage by a factor of 1.64 without improving remediation outcomes. These findings indicate that while LLMs are effective for partial accessibility repair, they are insufficient for complete and reliable remediation. Scalable accessibility solutions require hybrid approaches that combine LLM capabilities with rule-based validation and constraint-aware correction mechanisms.
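
The hybrid approach recommended above (LLM proposals gated by rule-based validation) can be illustrated with a minimal sketch. It is not the paper's method: the two regex rules, `check_violations`, `hybrid_repair`, and `fake_llm_fix` are all hypothetical, the regexes are toy stand-ins for a real rule-based checker, and the LLM call is replaced by a deterministic stub. A patch is accepted only if no rule gets worse and the total violation count drops.

```python
import re
from typing import Callable, Dict

# Hypothetical, simplified rules standing in for a real rule-based checker.
# Each regex match counts as one violation; real tools check far more than this.
RULES = {
    "img-alt": re.compile(r"<img(?![^>]*\balt=)[^>]*>", re.IGNORECASE),
    "html-lang": re.compile(r"<html(?![^>]*\blang=)[^>]*>", re.IGNORECASE),
}


def check_violations(html: str) -> Dict[str, int]:
    """Count matches of each hypothetical rule in the HTML string."""
    return {name: len(rule.findall(html)) for name, rule in RULES.items()}


def hybrid_repair(
    html: str,
    propose_fix: Callable[[str, Dict[str, int]], str],
    max_rounds: int = 3,
) -> str:
    """Apply LLM-style patches only when the rule-based check confirms progress."""
    current, counts = html, check_violations(html)
    for _ in range(max_rounds):
        if sum(counts.values()) == 0:
            break  # every rule passes: fully resolved
        candidate = propose_fix(current, counts)
        new_counts = check_violations(candidate)
        # Constraint: reject a patch that worsens any rule or fixes nothing overall.
        regressed = any(new_counts[name] > counts[name] for name in RULES)
        if regressed or sum(new_counts.values()) >= sum(counts.values()):
            break
        current, counts = candidate, new_counts
    return current


def fake_llm_fix(html: str, counts: Dict[str, int]) -> str:
    """Deterministic stand-in for an LLM call: adds alt="" to one <img> tag."""
    return RULES["img-alt"].sub(lambda m: '<img alt=""' + m.group(0)[4:], html, count=1)


page = '<html><body><img src="a.png"><img src="b.png"></body></html>'
fixed = hybrid_repair(page, fake_llm_fix)
print(check_violations(page))   # {'img-alt': 2, 'html-lang': 1}
print(check_violations(fixed))  # {'img-alt': 0, 'html-lang': 1}
```

Because the stub never adds a `lang` attribute, the loop stops with one violation left: a partial repair that the validator accepts, while any patch that introduces a new violation would be rejected outright.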

1.7 · CV · May 25
CNNs, Transformers, Hybrid, and Vision Language Models for Skin Cancer Detection

Durjoy Dey, Yuhong Yan, Hassan Hajjdiab

Skin cancer is a common and fast-rising malignancy worldwide, and early detection is critical for improving outcomes. Deep learning models trained on dermoscopic and clinical images can support fast, automated triage. However, many studies evaluate only a limited set of architectures, and experimental setups vary across studies. In this paper, we present a unified evaluation of twelve deep learning models for binary skin cancer detection on the PAD-UFES-20 dataset. The models span four families: convolutional neural networks (CNN), vision transformers (ViT), hybrid convolution-transformer backbones, and vision-language models (VLM). Performance is assessed using AUC, the maximum F1 score with its corresponding precision and recall, and sensitivity at 80% specificity, reflecting screening-oriented requirements. Our results show that well-tuned CNNs already provide strong baselines, but transformer-based families consistently improve discrimination. Hybrid models (MaxViT Tiny, CoAtNet0) and a SigLIP-based VLM achieve the best overall trade-off between ranking performance and clinically relevant operating points, while the CLIP-based model offers high precision. The full codebase for all experiments is publicly released. Together, these findings offer practical guidance on which model families are most suitable for real-world deployment in skin cancer screening and establish a reproducible reference point for future work on PAD-UFES-20.
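
The abstract names its metrics but not how they are computed. The following is a minimal pure-Python sketch of the standard definitions, not code from the paper's released codebase. It assumes higher scores mean "more likely malignant", predicts positive when `score >= threshold`, sweeps every observed score as a threshold, and reads "sensitivity at 80% specificity" as the best sensitivity among thresholds whose specificity is at least 0.80. The function names and toy data are hypothetical.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_at(labels: Sequence[int], scores: Sequence[float], threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        if s >= threshold:
            tp, fp = (tp + 1, fp) if y == 1 else (tp, fp + 1)
        else:
            fn, tn = (fn + 1, tn) if y == 1 else (fn, tn + 1)
    return tp, fp, fn, tn


def max_f1(labels: Sequence[int], scores: Sequence[float]) -> Tuple[float, float, float]:
    """Sweep observed scores as thresholds; return (f1, precision, recall) at the best F1."""
    best = (0.0, 0.0, 0.0)
    for t in sorted(set(scores)):
        tp, fp, fn, _ = confusion_at(labels, scores, t)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best[0]:
            best = (f1, precision, recall)
    return best


def sensitivity_at_specificity(labels: Sequence[int], scores: Sequence[float], target_spec: float = 0.80) -> float:
    """Best sensitivity among thresholds with specificity >= target_spec (both classes assumed present)."""
    best = 0.0
    # The extra +inf threshold predicts everything negative (specificity 1, sensitivity 0).
    for t in sorted(set(scores)) + [float("inf")]:
        tp, fp, fn, tn = confusion_at(labels, scores, t)
        if tn / (tn + fp) >= target_spec:
            best = max(best, tp / (tp + fn))
    return best


labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.55, 0.3, 0.7, 0.5, 0.4, 0.2, 0.1, 0.05]
print(roc_auc(labels, scores))                     # about 0.833 (20 of 24 pairs ranked correctly)
print(max_f1(labels, scores))                      # (0.75, 0.75, 0.75)
print(sensitivity_at_specificity(labels, scores))  # 0.75
```

The two threshold-based metrics can favor different operating points: max F1 balances precision and recall, whereas the fixed-specificity reading caps false positives first, which is closer to the screening use case the abstract describes.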

3.8 · CV · May 25
Benchmarking Convolutional, Transformer, Hybrid, and Vision Language Models for Multi Disease Retinal Screening

Durjoy Dey, Aymane Ajbar, Yuhong Yan

Modern deep learning offers powerful tools for automated retinal screening, but it remains unclear how different visual model families compare in realistic multi-disease settings and under domain shift. In this work, we use the Retinal Fundus Multi-disease Image Dataset (RFMiD) to benchmark twelve architectures across four model families: convolutional neural networks, vision transformers, hybrid CNN-transformer backbones, and vision-language models. We evaluate two tasks: binary screening for any retinal disease and multi-label classification across 28 disease classes. Using standardized training, calibration, and evaluation protocols, we report AUC, F1, precision, recall, and sensitivity at a clinically relevant operating point with specificity near 80%. On RFMiD, all architectures perform well on binary screening, with AUC above 84%, but attention-based models perform best. SwinTiny and the hybrid CoAtNet0 and MaxViTTiny models achieve the strongest binary screening results and also improve macro and micro F1 in the multi-label setting. Vision-language models, including CLIP ViT-B/16 and SigLIP-Base384, are competitive with CNN baselines but do not surpass the best transformer and hybrid backbones. In external validation on Messidor-2 for referable diabetic retinopathy, AUC ranges from 66.8% to 84.7%, with hybrid and transformer models again performing strongly. These results provide a reproducible reference for model selection in multi-disease retinal screening and can guide the development of automated screening tools for clinical deployment.
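
Macro and micro F1, reported above for the multi-label task, aggregate per-class results differently: macro F1 averages each class's F1 equally, while micro F1 pools true positives, false positives, and false negatives across all classes before computing a single F1. The sketch below uses the standard definitions and is not the paper's evaluation code. It assumes binary indicator rows (one column per disease class; RFMiD would have 28), scores a class with no positives and no predictions as 0, and uses hypothetical helper names and toy data.

```python
from typing import List, Sequence


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # empty class scored as 0 (an assumption)


def per_class_counts(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return [tp, fp, fn] for each label column."""
    n_classes = len(y_true[0])
    counts = [[0, 0, 0] for _ in range(n_classes)]
    for true_row, pred_row in zip(y_true, y_pred):
        for c in range(n_classes):
            if true_row[c] and pred_row[c]:
                counts[c][0] += 1  # true positive
            elif pred_row[c]:
                counts[c][1] += 1  # false positive
            elif true_row[c]:
                counts[c][2] += 1  # false negative
    return counts


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1: rare classes count as much as common ones."""
    counts = per_class_counts(y_true, y_pred)
    return sum(f1_from_counts(*c) for c in counts) / len(counts)


def micro_f1(y_true, y_pred) -> float:
    """Single F1 from counts pooled over all classes: dominated by frequent classes."""
    counts = per_class_counts(y_true, y_pred)
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return f1_from_counts(tp, fp, fn)


# Toy data: 4 images, 3 classes; class 2 is rare and always missed.
y_true = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(macro_f1(y_true, y_pred))  # about 0.556, i.e. (1 + 2/3 + 0) / 3
print(micro_f1(y_true, y_pred))  # about 0.8, i.e. 8 / (8 + 0 + 2)
```

Missing the single rare-class case barely moves micro F1 but pulls macro F1 down sharply, which is why the two are usually reported together for long-tailed multi-disease datasets.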