AI · May 24
Privacy-Preserving Local Language Models for Longitudinal Data Retrieval in Chronic Dermatologic Disease: Implementation in Pemphigus Patients
Abdurrahim Yilmaz, Ayşe Esra Koku Aksu, Duygu Yamen et al.
Chronic dermatologic diseases such as pemphigus require long-term follow-up, which generates extensive longitudinal clinical documentation that is difficult to review comprehensively during routine visits; this increases clinician workload as well as the risk of missing critical historical information. We evaluated whether a locally deployed, privacy-preserving small language model (SLM) could retrieve structured clinical features and generate longitudinal summaries from long-term dermatology follow-up records. In this retrospective case series, thirty pemphigus patients contributed 541 visit notes that were aggregated into full longitudinal records (89,336 words); 56 clinically relevant features were annotated by two expert dermatologists. The locally deployed SLM (Qwen3 4B Thinking 2507) was queried with each complete record to retrieve the 56 features and to generate one final summary report per patient. Across 1,680 feature retrieval tasks (30 patients × 56 features), mean accuracy was 82.25%. Dermatologists' ratings of the AI-generated summaries were high for overall quality (8.23-8.47), clinical accuracy (7.93-8.20), and usefulness (8.47-8.50), with no significant inter-evaluator differences, and AI summaries were preferred overall in 53.3% of evaluations. These findings suggest that privacy-preserving, locally deployed SLMs can outperform medical experts and reliably generate clinically meaningful longitudinal summaries. SLMs may support clinical decision-making when integrated with appropriate oversight.
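The 1,680 tasks follow from scoring each of the 56 features once for each of the 30 records. A minimal sketch of that bookkeeping is shown below, assuming exact-match scoring after case and whitespace normalization and a dict-of-dicts layout; the helper names and the feature names (`anti_dsg1`, `rituximab_given`) are hypothetical and do not describe the authors' actual scoring protocol.

```python
# Minimal sketch (assumptions: exact-match scoring after case/whitespace
# normalization; dict-of-dicts layout; feature names are hypothetical).

def normalize(value):
    """Lower-case and strip a value so 'Positive ' matches 'positive'."""
    return str(value).strip().lower()


def retrieval_accuracy(predictions, ground_truth):
    """Score every (patient, feature) pair as one retrieval task.

    Both arguments map patient_id -> {feature_name: value}.
    Returns (number_of_tasks, fraction_correct).
    """
    n_tasks = 0
    n_correct = 0
    for patient_id, reference in ground_truth.items():
        predicted = predictions.get(patient_id, {})
        for feature, ref_value in reference.items():
            n_tasks += 1
            pred_value = predicted.get(feature)
            # A missing answer counts as incorrect.
            if pred_value is not None and normalize(pred_value) == normalize(ref_value):
                n_correct += 1
    return n_tasks, (n_correct / n_tasks if n_tasks else 0.0)


# Toy example with two hypothetical patients and two hypothetical features.
ground_truth = {
    "p01": {"anti_dsg1": "positive", "rituximab_given": "yes"},
    "p02": {"anti_dsg1": "negative", "rituximab_given": "no"},
}
predictions = {
    "p01": {"anti_dsg1": "Positive", "rituximab_given": "no"},
    "p02": {"anti_dsg1": "negative", "rituximab_given": "no"},
}
print(retrieval_accuracy(predictions, ground_truth))  # (4, 0.75)

# In the study's setting, 30 records x 56 features gives 1,680 tasks.
full_grid = {f"p{i:02d}": {f"f{j:02d}": "x" for j in range(56)} for i in range(30)}
print(retrieval_accuracy(full_grid, full_grid)[0])  # 1680
```

Because every record is scored on the same 56 features, the mean of per-patient accuracies equals this pooled accuracy, so either reading of "mean accuracy" gives the same number under these assumptions.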
CV · Jan 20
DermaBench: A Clinician-Annotated Benchmark Dataset for Dermatology Visual Question Answering and Reasoning
Abdurrahim Yilmaz, Ozan Erdem, Ece Gokyayla et al.
Vision-language models (VLMs) are increasingly important in medical applications; however, their evaluation in dermatology remains limited by datasets that focus primarily on image-level classification tasks such as lesion recognition. While valuable for recognition, such datasets cannot assess the full visual understanding, language grounding, and clinical reasoning capabilities of multimodal models. Visual question answering (VQA) benchmarks are required to evaluate how models interpret dermatological images, reason over fine-grained morphology, and generate clinically meaningful descriptions. We introduce DermaBench, a clinician-annotated dermatology VQA benchmark built on the Diverse Dermatology Images (DDI) dataset. DermaBench comprises 656 clinical images from 570 unique patients spanning Fitzpatrick skin types I-VI. Using a hierarchical annotation schema with 22 main questions (single-choice, multi-choice, and open-ended), expert dermatologists annotated each image for diagnosis, anatomic site, lesion morphology, distribution, surface features, color, and image quality, together with open-ended narrative descriptions and summaries, yielding approximately 14,474 VQA-style annotations. DermaBench is released as a metadata-only dataset to respect upstream licensing and is publicly available at Harvard Dataverse.
CV · Feb 22
Artefact-Aware Fungal Detection in Dermatophytosis: A Real-Time Transformer-Based Approach for KOH Microscopy
Rana Gursoy, Abdurrahim Yilmaz, Baris Kizilyaprak et al.
Dermatophytosis is commonly assessed using potassium hydroxide (KOH) microscopy, yet accurate recognition of fungal hyphae is hindered by artefacts, heterogeneous keratin clearance, and notable inter-observer variability. This study presents a transformer-based detection framework built on the RT-DETR architecture to achieve precise, query-driven localization of fungal structures in high-resolution KOH images. A dataset of 2,540 routinely acquired microscopy images was manually annotated using a multi-class strategy that explicitly distinguishes fungal elements from confounding artefacts. The model was trained with morphology-preserving augmentations to maintain the structural integrity of thin hyphae. Evaluation on an independent test set demonstrated robust object-level performance, with a recall of 0.9737, a precision of 0.8043, and an AP@0.50 of 93.56%. When detections were aggregated for image-level diagnosis, the model achieved 100% sensitivity and 98.8% accuracy, correctly identifying every positive case. Qualitative outputs confirmed robust localization of low-contrast hyphae even in artefact-rich fields. These results highlight that an artificial intelligence (AI) system can serve as a highly reliable, automated screening tool, bridging the gap between image-level analysis and clinical decision-making in dermatomycology.
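The abstract reports object-level detections being aggregated into an image-level diagnosis but does not state the aggregation rule. As a minimal sketch, assume an image is called positive if any box of a fungal class reaches a confidence threshold, while artefact-class boxes are ignored; the `Detection` type, the class names `"hypha"` and `"artefact"`, and the 0.5 threshold are all hypothetical, and the four toy images are not study data.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # hypothetical class names, e.g. "hypha" or "artefact"
    score: float  # detector confidence in [0, 1]


def image_is_positive(detections, fungal_labels=frozenset({"hypha"}), threshold=0.5):
    """Assumed rule: positive if any fungal-class box reaches the threshold.

    Artefact-class boxes never make an image positive, which is one reason
    to model artefacts as an explicit class rather than as background.
    """
    return any(d.label in fungal_labels and d.score >= threshold for d in detections)


def image_level_metrics(per_image_detections, labels, **rule_kwargs):
    """Compare aggregated image-level calls with ground-truth labels."""
    tp = fp = tn = fn = 0
    for detections, is_positive in zip(per_image_detections, labels):
        predicted = image_is_positive(detections, **rule_kwargs)
        if predicted and is_positive:
            tp += 1
        elif predicted:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Four toy images (not study data).
images = [
    [Detection("hypha", 0.91), Detection("artefact", 0.80)],  # true positive
    [Detection("artefact", 0.95)],                            # artefact only
    [Detection("hypha", 0.30)],                               # weak hypha
    [],                                                       # nothing detected
]
labels = [True, False, True, False]

print(image_level_metrics(images, labels))                  # {'sensitivity': 0.5, 'accuracy': 0.75}
print(image_level_metrics(images, labels, threshold=0.25))  # {'sensitivity': 1.0, 'accuracy': 1.0}
```

The threshold trades sensitivity against false positives: lowering it recovers the weak hypha in the third image. A screening tool that aims for 100% sensitivity would favor a permissive setting, at the cost of more artefact-driven false calls if the artefact class were not modeled separately.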
CV · Jan 31, 2025
DermaSynth: Rich Synthetic Image-Text Pairs Using Open Access Dermatology Datasets
Abdurrahim Yilmaz, Furkan Yuceyalcin, Ece Gokyayla et al.
A major barrier to developing vision large language models (LLMs) in dermatology is the lack of large image-text pair datasets. We introduce DermaSynth, a dataset of 92,020 synthetic image-text pairs curated from 45,205 images (13,568 clinical and 35,561 dermatoscopic) for dermatology-related clinical tasks. Using a state-of-the-art LLM, Gemini 2.0, with clinically relevant prompts and the self-instruct method, we generated diverse and rich synthetic texts. Dataset metadata were incorporated into the input prompts to reduce potential hallucinations. The resulting dataset builds upon open-access dermatological image repositories (DERM12345, BCN20000, PAD-UFES-20, SCIN, and HIBA) that have permissive CC-BY-4.0 licenses. We also fine-tuned a preliminary Llama-3.2-11B-Vision-Instruct model, DermatoLlama 1.0, on 5,000 samples. We anticipate that this dataset will support and accelerate AI research in dermatology. Data and code underlying this work are accessible at https://github.com/abdurrahimyilmaz/DermaSynth.
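Grounding generation in dataset metadata can be pictured as prepending the known labels to each instruction. The sketch below assumes a simple text template: the `build_prompt` helper, its wording, and the field names (`diagnosis`, `image_type`, `anatomic_site`) are hypothetical. It does not call Gemini 2.0 and does not reproduce the self-instruct step; it shows only how metadata might be injected into a prompt.

```python
def build_prompt(task_instruction, metadata):
    """Prepend known image metadata to a generation instruction.

    metadata: dict of hypothetical field names -> values; empty or missing
    values are skipped so the model is not told anything unsupported.
    """
    facts = [
        f"- {key}: {value}"
        for key, value in sorted(metadata.items())
        if value not in (None, "")
    ]
    return (
        "Known metadata for this dermatology image (treat as ground truth):\n"
        + "\n".join(facts)
        + "\nDo not state findings that contradict this metadata.\n"
        + f"Task: {task_instruction}"
    )


prompt = build_prompt(
    "Describe the lesion.",
    {"diagnosis": "melanoma", "image_type": "dermatoscopic", "anatomic_site": None},
)
print(prompt)
```

Skipping empty fields keeps the prompt from asserting facts the source dataset never recorded, which matters when repositories such as those listed above expose different metadata columns.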
IV · Jun 11, 2024
DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 Subclasses
Abdurrahim Yilmaz, Sirin Pekcan Yasar, Gulsum Gencoglan et al.
Skin lesion datasets provide essential information for understanding various skin conditions and developing effective diagnostic tools. They support artificial intelligence-based early detection of skin cancer, facilitate treatment planning, and contribute to medical education and research. Published large datasets cover the subclassifications of skin lesions only partially. This limitation highlights the need for more expansive and varied datasets to reduce false predictions and to improve failure analysis for skin lesions. This study presents a diverse dataset of 12,345 dermatoscopic images spanning 38 subclasses of skin lesions, collected in Turkiye, whose population includes a range of skin types in the transition zone between Europe and Asia. Each subgroup contains high-resolution photographs and expert annotations, providing a strong and reliable basis for future research. The detailed analysis of each subgroup provided in this study facilitates targeted research and deepens understanding of these skin lesions. The dataset is distinguished by its diverse hierarchical structure of 5 superclasses, 15 main classes, and 38 subclasses across 12,345 high-resolution dermatoscopic images.
RO · Aug 13, 2018
Intraoperative robotic-assisted large-area high-speed microscopic imaging and intervention
Petros Giataganas, Michael Hughes, Christopher J. Payne et al.
Objective: Probe-based confocal endomicroscopy is an emerging high-magnification optical imaging technique that provides in vivo and in situ cellular-level imaging for real-time assessment of tissue pathology. Endomicroscopy could potentially be used for intraoperative surgical guidance, but it is challenging to assess a surgical site using individual microscopic images due to the limited field of view and the difficulty of manually manipulating the probe. Methods: In this paper, a novel robotic device for large-area endomicroscopy imaging is proposed, demonstrating a rapid yet highly accurate scanning mechanism with image-based motion control that is able to generate histology-like endomicroscopy mosaics. The device also includes, for the first time in robotic-assisted endomicroscopy, the capability to ablate tissue without the need for an additional tool. Results: The device follows pre-programmed trajectories with a positioning accuracy of better than 30 µm, while the image-based approach was shown to suppress random motion disturbances of up to 1.25 mm/s. Mosaics are presented from a range of ex vivo human and animal tissues, covering areas of more than 3 mm² scanned in approximately 10 seconds. Conclusion: This work demonstrates the potential of the proposed instrument to generate large-area, high-resolution microscopic images for intraoperative tissue identification and margin assessment. Significance: This approach presents an important alternative to current histology techniques, significantly reducing tissue assessment time while also providing the capability to mark and ablate suspicious areas intraoperatively.