Fatih Uysal

IV
h-index: 1
4 papers
141 citations
Novelty: 24%
AI Score: 33

4 Papers

LG · Jun 13, 2025 · Code
SWE-Bench-CL: Continual Learning for Coding Agents

Thomas Joshi, Shayan Chowdhury, Fatih Uysal

Large Language Models (LLMs) have achieved impressive results on static code-generation benchmarks, but real-world software development unfolds as a continuous stream of evolving issues, fixes, and feature requests. We introduce SWE-Bench-CL, a novel continual learning benchmark built on the human-verified SWE-Bench Verified dataset introduced by OpenAI and Princeton-NLP in 2024. By organizing GitHub issues into chronologically ordered sequences that reflect natural repository evolution, SWE-Bench-CL enables direct evaluation of an agent's ability to accumulate experience, transfer knowledge across tasks, and resist catastrophic forgetting. We complement the dataset with (i) a preliminary analysis of inter-task structural similarity and contextual sensitivity, (ii) an interactive LangGraph-based evaluation framework augmented with a FAISS-backed semantic memory module, and (iii) a suite of specialized continual learning metrics -- including average accuracy, forgetting, forward/backward transfer, tool-use efficiency, and a generalized Composite Continual Learning Score and CL-F-beta score -- to capture the stability-plasticity trade-off. We outline a rigorous experimental protocol comparing memory-enabled and memory-disabled agents across diverse Python repositories. All code and data are publicly available at https://github.com/thomasjoshi/agents-never-forget, providing the community with a reproducible platform for developing more adaptive and robust AI agents in software engineering.
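The metrics named in this abstract can be illustrated with a minimal Python sketch. It uses the formulations that are common in the continual learning literature, which may differ from the exact definitions in SWE-Bench-CL. The accuracy matrix `acc` and its values are hypothetical: `acc[i][j]` is assumed to be the success rate on task sequence `j` after the agent has worked through sequence `i`.

```python
from typing import List

Matrix = List[List[float]]


def average_accuracy(acc: Matrix) -> float:
    """Mean accuracy over all tasks, measured after the final task."""
    final_row = acc[-1]
    return sum(final_row) / len(final_row)


def forgetting(acc: Matrix) -> float:
    """Mean drop from each earlier task's best past accuracy to its final accuracy."""
    n = len(acc)
    if n < 2:
        return 0.0
    drops = []
    for j in range(n - 1):
        # Best accuracy on task j at any point before the final evaluation.
        best_past = max(acc[i][j] for i in range(j, n - 1))
        drops.append(best_past - acc[-1][j])
    return sum(drops) / len(drops)


def backward_transfer(acc: Matrix) -> float:
    """Mean change on earlier tasks between first learning them and the end."""
    n = len(acc)
    if n < 2:
        return 0.0
    return sum(acc[-1][j] - acc[j][j] for j in range(n - 1)) / (n - 1)


# Hypothetical results for three task sequences (not taken from the paper).
example = [
    [0.6, 0.0, 0.0],
    [0.5, 0.7, 0.0],
    [0.4, 0.6, 0.8],
]
```

A negative backward transfer together with positive forgetting, as in the hypothetical matrix, indicates that later tasks degraded performance on earlier ones.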

IV · Nov 14, 2021 · Code
Fracture Detection in Wrist X-ray Images Using Deep Learning-Based Object Detection Models

Fırat Hardalaç, Fatih Uysal, Ozan Peker et al.

Hospitals, especially their emergency services, receive a high number of wrist fracture cases. For correct diagnosis and proper treatment, physicians must review images obtained from various medical equipment together with the patient's medical records and physical examination. The aim of this study is to detect fractures in wrist X-ray images using deep learning, to support physicians in diagnosing these fractures, particularly in emergency services. Using the SABL, RegNet, RetinaNet, PAA, Libra R-CNN, FSAF, Faster R-CNN, Dynamic R-CNN and DCN deep learning-based object detection models with various backbones, 20 different fracture detection procedures were performed on Gazi University Hospital's dataset of wrist X-ray images. To further improve these procedures, five different ensemble models were developed and then combined into a single ensemble to produce a unique detection model, wrist fracture detection-combo (WFD-C). Of the 26 different fracture detection models, the highest detection result was an average precision (AP50) of 0.8639, obtained by the WFD-C model. Huawei Turkey R&D Center supports this study within the scope of the ongoing cooperation project coded 071813 between Gazi University, Huawei and Medskor. Code is available at https://github.com/fatihuysal88/wrist-d
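AP50 counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The following Python sketch shows only that matching criterion, not the study's evaluation code. The `(x1, y1, x2, y2)` box format and the function names are assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_50(pred: Box, truth: Box) -> bool:
    """True when a prediction would count as a hit under the AP50 criterion."""
    return iou(pred, truth) >= 0.5
```

For example, two 10 by 10 boxes offset by half their width overlap with an IoU of 1/3, so the prediction would not count as a hit at this threshold.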

CL · Jan 12, 2022
Detection of Increased Time Intervals of Anti-Vaccine Tweets for COVID-19 Vaccine with BERT Model

Ülkü Tuncer Küçüktaş, Fatih Uysal, Fırat Hardalaç et al.

The most effective solution against COVID-19 is the set of vaccines that have been developed. Distrust of vaccines can hinder the rapid and effective use of this remedy. Social media is one of the means by which society expresses its thoughts. Determining the time intervals during which anti-vaccination sentiment increases on social media can help institutions decide on a strategy for combating it. Recording and tracking every tweet by hand would be inefficient, so automated solutions are needed. In this study, the Bidirectional Encoder Representations from Transformers (BERT) model, a deep learning-based natural language processing (NLP) model, was used. On a dataset of 1506 tweets divided into four categories (news, irrelevant, anti-vaccine, and vaccine supporters), the model was trained with a learning rate of 5e-6 for 25 epochs. To determine the intervals in which anti-vaccine tweets are concentrated, the trained model was used to assign categories to 652840 tweets. The change in these categories over time was visualized, and the events that could have caused the change were identified. On the test dataset, the trained model achieved an F-score of 0.81, and the AUC values for the different classes were 0.99, 0.91, 0.92, and 0.92, respectively. Unlike the studies in the literature, which detect and censor such tweets, this work designs an auxiliary system that measures and visualizes the frequency of anti-vaccine tweets over a time interval, providing data that institutions can use when determining their strategy.
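The interval analysis described here can be sketched in Python as a per-month share of tweets that a classifier labeled anti-vaccine. This is an illustrative sketch only: the monthly bucketing, the function name, the label string, and the sample data are assumptions, and the study's actual interval length and pipeline are not stated in the abstract.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def anti_vaccine_share_by_month(
    tweets: Iterable[Tuple[date, str]],
    target: str = "anti-vaccine",
) -> Dict[str, float]:
    """Return the fraction of tweets per month whose predicted label is `target`."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    for day, label in tweets:
        key = f"{day.year:04d}-{day.month:02d}"
        totals[key] += 1
        if label == target:
            hits[key] += 1
    # Months are returned in chronological order.
    return {key: hits[key] / totals[key] for key in sorted(totals)}


# Hypothetical classifier output: (posting date, predicted category).
sample = [
    (date(2021, 1, 5), "news"),
    (date(2021, 1, 9), "anti-vaccine"),
    (date(2021, 2, 1), "anti-vaccine"),
    (date(2021, 2, 3), "anti-vaccine"),
]
shares = anti_vaccine_share_by_month(sample)
```

Using a share instead of a raw count keeps months with very different tweet volumes comparable.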

IV · Jan 31, 2021
Classification of Shoulder X-Ray Images with Deep Learning Ensemble Models

Fatih Uysal, Fırat Hardalaç, Ozan Peker et al.

Fractures occur for various reasons in the shoulder area, which has a wider range of motion than other joints in the body. To diagnose these fractures, data gathered from X-radiation (X-ray), magnetic resonance imaging (MRI), or computed tomography (CT) are used. This study aims to help physicians by using artificial intelligence to classify shoulder images taken with X-ray devices as fracture or non-fracture. For this purpose, the performance of 26 deep learning-based pretrained models in detecting shoulder fractures was evaluated on the musculoskeletal radiographs (MURA) dataset, and two ensemble learning models (EL1 and EL2) were developed. The pretrained models used are ResNet, ResNeXt, DenseNet, VGG, Inception, MobileNet, and their spinal fully connected (Spinal FC) versions. For the EL1 and EL2 models, which were built from the best-performing pretrained models, test accuracy was 0.8455 and 0.8472, Cohen's kappa was 0.6907 and 0.6942, and the area under the receiver operating characteristic (ROC) curve (AUC) for the fracture class was 0.8862 and 0.8695, respectively. Across the 28 classifications in total, the highest test accuracy and Cohen's kappa values were obtained with the EL2 model, and the highest AUC value was obtained with the EL1 model.
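Cohen's kappa, reported above alongside accuracy, corrects the observed agreement for the agreement expected by chance. The Python sketch below computes it from a confusion matrix using the standard formula. The matrix values are hypothetical and are not the study's results.

```python
from typing import List


def cohens_kappa(confusion: List[List[int]]) -> float:
    """Cohen's kappa from a square confusion matrix (rows: true, columns: predicted)."""
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    p_observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of the marginal frequencies, summed over classes.
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical fracture / non-fracture confusion matrix (100 test images).
example_confusion = [
    [40, 10],
    [5, 45],
]
kappa = cohens_kappa(example_confusion)
```

In this hypothetical case the accuracy is 0.85 but the chance agreement is 0.5, which is why kappa comes out lower than accuracy, as it does in the reported EL1 and EL2 results.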