38.3AIMay 24
Privacy-Preserving Local Language Models for Longitudinal Data Retrieval in Chronic Dermatologic Disease: Implementation in Pemphigus PatientsAbdurrahim Yilmaz, Ayşe Esra Koku Aksu, Duygu Yamen et al.
Chronic dermatologic diseases such as pemphigus require long-term follow-up, generating extensive longitudinal clinical documentation that is difficult to review comprehensively during routine visits and increasing clinician workload as well as the risk of missing critical historical information. We evaluated whether a locally deployed, privacy-preserving small language model (SLM) could retrieve structured clinical features and generate longitudinal summaries from long-term dermatology follow-up records. In this retrospective case series, thirty pemphigus patients contributed 541 visit notes that were aggregated into full longitudinal records (89,336 words); 56 clinically relevant features were annotated by two expert dermatologists. The locally deployed SLM (Qwen3 4B Thinking 2507) was queried with each complete record to retrieve 56 features and generate one final report summaries. Across 1,680 feature retrieval tasks, mean accuracy was 82.25%. Dermatologists' ratings of AI-generated summaries were high for overall quality (8.23-8.47), clinical accuracy (7.93-8.20), and usefulness (8.47-8.50), with no significant inter-evaluator differences and an overall preference for AI summaries in 53.3% of evaluations. These findings suggest that privacy-preserving, locally deployed SLMs can outperform medical experts and reliably generate clinically meaningful longitudinal summaries. SLMs may support clinical decision-making when integrated with appropriate oversight.
CVJan 20
DermaBench: A Clinician-Annotated Benchmark Dataset for Dermatology Visual Question Answering and ReasoningAbdurrahim Yilmaz, Ozan Erdem, Ece Gokyayla et al.
Vision-language models (VLMs) are increasingly important in medical applications; however, their evaluation in dermatology remains limited by datasets that focus primarily on image-level classification tasks such as lesion recognition. While valuable for recognition, such datasets cannot assess the full visual understanding, language grounding, and clinical reasoning capabilities of multimodal models. Visual question answering (VQA) benchmarks are required to evaluate how models interpret dermatological images, reason over fine-grained morphology, and generate clinically meaningful descriptions. We introduce DermaBench, a clinician-annotated dermatology VQA benchmark built on the Diverse Dermatology Images (DDI) dataset. DermaBench comprises 656 clinical images from 570 unique patients spanning Fitzpatrick skin types I-VI. Using a hierarchical annotation schema with 22 main questions (single-choice, multi-choice, and open-ended), expert dermatologists annotated each image for diagnosis, anatomic site, lesion morphology, distribution, surface features, color, and image quality, together with open-ended narrative descriptions and summaries, yielding approximately 14.474 VQA-style annotations. DermaBench is released as a metadata-only dataset to respect upstream licensing and is publicly available at Harvard Dataverse.
CVFeb 22
Artefact-Aware Fungal Detection in Dermatophytosis: A Real-Time Transformer-Based Approach for KOH MicroscopyRana Gursoy, Abdurrahim Yilmaz, Baris Kizilyaprak et al.
Dermatophytosis is commonly assessed using potassium hydroxide (KOH) microscopy, yet accurate recognition of fungal hyphae is hindered by artefacts, heterogeneous keratin clearance, and notable inter-observer variability. This study presents a transformer-based detection framework using the RT-DETR model architecture to achieve precise, query-driven localization of fungal structures in high-resolution KOH images. A dataset of 2,540 routinely acquired microscopy images was manually annotated using a multi-class strategy to explicitly distinguish fungal elements from confounding artefacts. The model was trained with morphology-preserving augmentations to maintain the structural integrity of thin hyphae. Evaluation on an independent test set demonstrated robust object-level performance, with a recall of 0.9737, precision of 0.8043, and an AP@0.50 of 93.56%. When aggregated for image-level diagnosis, the model achieved 100% sensitivity and 98.8% accuracy, correctly identifying all positive cases without missing a single diagnosis. Qualitative outputs confirmed the robust localization of low-contrast hyphae even in artefact-rich fields. These results highlight that an artificial intelligence (AI) system can serve as a highly reliable, automated screening tool, effectively bridging the gap between image-level analysis and clinical decision-making in dermatomycology.
CVJan 31, 2025Code
DermaSynth: Rich Synthetic Image-Text Pairs Using Open Access Dermatology DatasetsAbdurrahim Yilmaz, Furkan Yuceyalcin, Ece Gokyayla et al.
A major barrier to developing vision large language models (LLMs) in dermatology is the lack of large image--text pairs dataset. We introduce DermaSynth, a dataset comprising of 92,020 synthetic image--text pairs curated from 45,205 images (13,568 clinical and 35,561 dermatoscopic) for dermatology-related clinical tasks. Leveraging state-of-the-art LLMs, using Gemini 2.0, we used clinically related prompts and self-instruct method to generate diverse and rich synthetic texts. Metadata of the datasets were incorporated into the input prompts by targeting to reduce potential hallucinations. The resulting dataset builds upon open access dermatological image repositories (DERM12345, BCN20000, PAD-UFES-20, SCIN, and HIBA) that have permissive CC-BY-4.0 licenses. We also fine-tuned a preliminary Llama-3.2-11B-Vision-Instruct model, DermatoLlama 1.0, on 5,000 samples. We anticipate this dataset to support and accelerate AI research in dermatology. Data and code underlying this work are accessible at https://github.com/abdurrahimyilmaz/DermaSynth.
CVApr 7, 2025
An ensemble deep learning approach to detect tumors on Mohs micrographic surgery slidesAbdurrahim Yilmaz, Serra Atilla Aydin, Deniz Temur et al.
Mohs micrographic surgery (MMS) is the gold standard technique for removing high risk nonmelanoma skin cancer however, intraoperative histopathological examination demands significant time, effort, and professionality. The objective of this study is to develop a deep learning model to detect basal cell carcinoma (BCC) and artifacts on Mohs slides. A total of 731 Mohs slides from 51 patients with BCCs were used in this study, with 91 containing tumor and 640 without tumor which was defined as non-tumor. The dataset was employed to train U-Net based models that segment tumor and non-tumor regions on the slides. The segmented patches were classified as tumor, or non-tumor to produce predictions for whole slide images (WSIs). For the segmentation phase, the deep learning model success was measured using a Dice score with 0.70 and 0.67 value, area under the curve (AUC) score with 0.98 and 0.96 for tumor and non-tumor, respectively. For the tumor classification, an AUC of 0.98 for patch-based detection, and AUC of 0.91 for slide-based detection was obtained on the test dataset. We present an AI system that can detect tumors and non-tumors in Mohs slides with high success. Deep learning can aid Mohs surgeons and dermatopathologists in making more accurate decisions.
IVJun 11, 2024
DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 SubclassesAbdurrahim Yilmaz, Sirin Pekcan Yasar, Gulsum Gencoglan et al.
Skin lesion datasets provide essential information for understanding various skin conditions and developing effective diagnostic tools. They aid the artificial intelligence-based early detection of skin cancer, facilitate treatment planning, and contribute to medical education and research. Published large datasets have partially coverage the subclassifications of the skin lesions. This limitation highlights the need for more expansive and varied datasets to reduce false predictions and help improve the failure analysis for skin lesions. This study presents a diverse dataset comprising 12,345 dermatoscopic images with 38 subclasses of skin lesions collected in Turkiye which comprises different skin types in the transition zone between Europe and Asia. Each subgroup contains high-resolution photos and expert annotations, providing a strong and reliable basis for future research. The detailed analysis of each subgroup provided in this study facilitates targeted research endeavors and enhances the depth of understanding regarding the skin lesions. This dataset distinguishes itself through a diverse structure with 5 super classes, 15 main classes, 38 subclasses and its 12,345 high-resolution dermatoscopic images.
CVJun 30, 2021
Deep Convolutional Neural Networks for Onychomycosis DetectionAbdurrahim Yilmaz, Fatih Goktay, Rahmetullah Varol et al.
The diagnosis of superficial fungal infections in dermatology is still mostly based on manual direct microscopic examination with Potassium Hydroxide (KOH) solution. However, this method can be time consuming and its diagnostic accuracy rates vary widely depending on the clinician's experience. With the increase of neural network applications in the field of clinical microscopy, it is now possible to automate such manual processes increasing both efficiency and accuracy. This study presents a deep neural network structure that enables the rapid solutions for these problems and can perform automatic fungi detection in grayscale images without dyes. 160 microscopic field photographs containing the fungal element, obtained from patients with onychomycosis, and 297 microscopic field photographs containing dissolved keratin obtained from normal nails were collected. Smaller patches containing 4234 fungi and 4981 keratin were extracted from these images. In order to detect fungus and keratin, VGG16 and InceptionV3 models were developed. The VGG16 model had 95.98% accuracy, and the area under the curve (AUC) value of 0.9930, while the InceptionV3 model had 95.90% accuracy and the AUC value of 0.9917. However, average accuracy and AUC value of clinicians is 72.8% and 0.87, respectively. This deep learning model allows the development of an automated system that can detect fungi within microscopic images.