4.1 CV Apr 25
From Pixels to Explanations: Interpretable Diabetic Retinopathy Grading with CNN-Transformer Ensembles, Visual Explainability and Vision-Language Models
Pir Bakhsh Khokhar, Carmine Gravino, Fabio Palomba et al.
The quality of diabetic retinopathy (DR) screening relies on the ability to correctly grade severity; however, many deep-learning (DL) classifiers cannot be easily interpreted in the clinical context. This study presents a methodology that combines strong discriminative models with multimodal explanations, converting retinal pixels into clinically interpretable outputs. Using the APTOS 2019 benchmark, we evaluated six representative CNN- and transformer-based backbones under a controlled protocol with stratified five-fold cross-validation. We then compared ensembling strategies (hard voting, weighted soft voting, stacking) and investigated a hybrid class-level fusion variant to exploit grade-specific advantages. For interpretability, we produced Grad-CAM++ visual attribution maps and short textual rationales using vision-language models (VLMs) conditioned on the fundus image and classifier outputs under conservative prompting constraints. Modern CNN backbones (ResNet-50 and ConvNeXt-Tiny) provided the strongest single-model baselines, with cross-validated QWK up to 0.919 and 0.914, respectively. Ensembling improved ordinal agreement, and weighted soft voting was the most consistent across folds (QWK 0.934 +/- 0.017). Hybrid class-level fusion was competitive but did not yield a statistically reliable improvement over standard fusion in paired fold comparisons (Holm-adjusted p >= 1.000). For explanation quality, Grad-CAM++ offered plausible but coarse localization, and VLM rationales were generally grade-consistent. Quantitatively, VLM variants showed a trade-off between clinical completeness and template-level semantic similarity (coverage 0.700 vs. BERTScore 0.072), while image-text alignment was comparable (CLIPScore approximately 0.34).
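The weighted soft voting and QWK evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-model weights, and the toy probabilities are assumptions made for illustration, and the abstract does not say how the weights were chosen.

```python
from typing import List, Sequence


def weighted_soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]],
                       weights: Sequence[float]) -> List[int]:
    """Fuse class probabilities from several models with a weighted average.

    prob_sets[m][i][c] is the probability model m assigns to class c for sample i.
    weights[m] is a hypothetical per-model weight (e.g. a validation score).
    Returns the arg-max class of the fused distribution for each sample.
    """
    total = sum(weights)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    preds = []
    for i in range(n_samples):
        fused = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total
            for c in range(n_classes)
        ]
        preds.append(max(range(n_classes), key=fused.__getitem__))
    return preds


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int],
                             n_classes: int) -> float:
    """Quadratic weighted kappa for ordinal labels in 0..n_classes-1.

    Undefined (division by zero) if all labels and predictions share one class.
    """
    n = len(y_true)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes))
                 for j in range(n_classes)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            penalty = (i - j) ** 2 / (n_classes - 1) ** 2
            num += penalty * observed[i][j]
            den += penalty * hist_true[i] * hist_pred[j] / n
    return 1.0 - num / den


# Toy example: two models, one sample, three grades (values are made up).
model_a = [[0.6, 0.3, 0.1]]
model_b = [[0.2, 0.7, 0.1]]
fused_prediction = weighted_soft_vote([model_a, model_b], weights=[1.0, 3.0])
```

Because the quadratic penalty grows with the distance between grades, QWK punishes a prediction two grades away more heavily than an adjacent-grade error, which is why it suits ordinal severity grading.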
1.4 CV Sep 27, 2022
When Handcrafted Features and Deep Features Meet Mismatched Training and Test Sets for Deepfake Detection
Ying Xu, Sule Yildirim Yayilgan
The rapid growth of synthetic visual media generation and manipulation has reached the point of raising significant concerns and posing serious threats to society. To contend with this threat, there is a pressing need for automatic detection networks that identify false digital content and prevent the spread of dangerous artificial information. In this paper, we use and compare two kinds of handcrafted features (SIFT and HoG) and two kinds of deep features (Xception and CNN+RNN) for the deepfake detection task. We also examine the performance of these features when the training and test sets are mismatched. Evaluation is performed on the well-known FaceForensics++ dataset, which contains four sub-datasets: Deepfakes, Face2Face, FaceSwap and NeuralTextures. The best results come from Xception, whose accuracy can exceed 99% when the training and test sets are both from the same sub-dataset. In comparison, the results drop dramatically when the training set does not match the test set. This phenomenon reveals the challenge of creating a universal deepfake detection system.
2.9 CR Jan 10, 2022
An Example of Privacy and Data Protection Best Practices for Biometrics Data Processing in Border Control: Lesson Learned from SMILE
Mohamed Abomhara, Sule Yildirim Yayilgan
Biometric recognition is a widely adopted technology that supports many kinds of applications, ranging from security and access control to law enforcement. However, such systems raise serious privacy and data protection concerns. Misuse of data, compromise of individuals' privacy and/or unauthorized processing of data may be irreversible and could have severe consequences for the individual's rights to privacy and data protection. This is partly due to the lack of methods and guidance for integrating data protection and privacy by design into the system development process. In this paper, we present an example of privacy and data protection best practices to give data controllers and developers more guidance on how to comply with their legal obligations for data protection. These best practices and considerations are based on the lessons learned from the SMart mobILity at the European land borders (SMILE) project.
2.9 CR Jan 10, 2022
A comparison of primary stakeholders' views on the deployment of biometric technologies in border management: Case study of SMart mobILity at the European land borders
Mohamed Abomhara, Sule Yildirim Yayilgan, Livinus Obiora Nweke et al.
Advances in technology have a substantial impact on every aspect of our lives, ranging from the way we communicate to the way we travel. The SMart mobILity at the European land borders (SMILE) project is geared towards the deployment of biometric technologies to optimize and monitor the flow of people at land borders. However, despite the anticipated benefits of deploying biometric technologies in border control, the two primary stakeholders, travelers and border authorities, still hold divergent views on the use of such technologies. In this paper, we compare travelers' and border authorities' views on the deployment of biometric technologies in border management. The overall goal of this study is to understand the concerns of travelers and border guards in order to facilitate the acceptance of biometric technologies for a secure and more convenient border crossing. Our method of inquiry consisted of in-person interviews with border guards (SMILE project end users), observation and field visits (to the Hungarian-Romanian and Bulgarian-Romanian borders) and questionnaires for both travelers and border guards. Our investigation revealed two conflicting trends. On the one hand, border guards argued that biometric technologies had the potential to be a very effective tool that would enhance security levels and make traveler identification and authentication procedures easy, fast and convenient. On the other hand, travelers were more concerned that the technologies represent a threat to fundamental rights, personal privacy and data protection.
7.2 CV Nov 22, 2020
PS-DeVCEM: Pathology-sensitive deep learning model for video capsule endoscopy based on weakly labeled data
A. Mohammed, I. Farup, M. Pedersen et al.
We propose a novel pathology-sensitive deep learning model (PS-DeVCEM) for frame-level anomaly detection and multi-label classification of different colon diseases in video capsule endoscopy (VCE) data. Our proposed model copes with the key challenge of the heterogeneous appearance of the colon caused by several types of diseases. The model is driven by attention-based deep multiple instance learning and is trained end-to-end on weakly labeled data, using video labels instead of detailed frame-by-frame annotation. The spatial and temporal features are obtained through ResNet50 and residual long short-term memory (residual LSTM) blocks, respectively. Additionally, the learned temporal attention module provides the importance of each frame to the final label prediction. Moreover, we developed a self-supervision method to maximize the distance between classes of pathologies. We demonstrate through qualitative and quantitative experiments that our proposed weakly supervised learning model gives superior precision and F1-score, reaching 61.6% and 55.1%, respectively, compared with three state-of-the-art video analysis methods. We also show our model's ability to temporally localize frames with pathologies without frame annotation information during training. Furthermore, we collected and annotated the first and largest VCE dataset with only video labels. The dataset contains 455 short video segments with 28,304 frames and 14 classes of colorectal diseases and artifacts. The dataset and code supporting this publication will be made available on our home page.
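To make the idea of attention-based multiple instance learning concrete, here is a minimal sketch of a generic attention pooling step over the frames of one video. It is not the PS-DeVCEM architecture: the scoring form (a tanh projection followed by a softmax), the parameter names `V` and `w`, and the toy features are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(frames: Sequence[Sequence[float]],
                   V: Sequence[Sequence[float]],
                   w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Pool per-frame features into one video-level feature.

    frames: K feature vectors of length D (one per frame).
    V: hypothetical L x D projection matrix; w: hypothetical length-L vector.
    Returns (attention weights over frames, attention-weighted mean feature).
    """
    # Score each frame: w . tanh(V h)
    scores = []
    for h in frames:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wl * z for wl, z in zip(w, hidden)))

    # Numerically stable softmax over frames
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Video-level feature is the weighted sum of frame features
    dim = len(frames[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(dim)]
    return weights, pooled


# Toy example: three frames with 2-D features (values are made up).
toy_frames = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]]
toy_V = [[1.0, 0.0], [0.0, 1.0]]
toy_w = [1.0, 1.0]
frame_weights, video_feature = attention_pool(toy_frames, toy_V, toy_w)
```

In a setup like this, only the video-level label supervises the pooled feature, while the per-frame weights can be read afterwards as a rough indication of which frames drove the prediction.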
1.2 CY Aug 11, 2020
Data Privacy in IoT Equipped Future Smart Homes
Athar Khodabakhsh, Sule Yildirim Yayilgan
Smart devices are becoming inseparable from daily life and are rapidly improving in their ability to provide intelligent services and remote monitoring and control. Providing personalized and customized services requires collecting more personal data. Consequently, intelligent services are becoming intensely personal, and they raise concerns regarding data privacy and security. In this paper, data privacy requirements in a smart home environment equipped with the "Internet of Things" are described, and privacy challenges for data and models are addressed.
3.9 CV Aug 15, 2018
Ensemble of Convolutional Neural Networks for Dermoscopic Images Classification
Tomáš Majtner, Buda Bajić, Sule Yildirim et al.
In this report, we present our automated prediction system for disease classification in dermoscopic images. The proposed solution is based on deep learning, where we applied a transfer learning strategy to the VGG16 and GoogLeNet architectures. The key feature of our solution is preprocessing based primarily on image augmentation and colour normalization. The solution was evaluated on Task 3: Lesion Diagnosis of the ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection challenge.
11.1 CV Jun 5, 2018
Y-Net: A deep Convolutional Neural Network for Polyp Detection
Ahmed Mohammed, Sule Yildirim, Ivar Farup et al.
Colorectal polyps are important precursors to colon cancer, the third most common cause of cancer mortality for both men and women. It is a disease where early detection is of crucial importance. Colonoscopy is commonly used for early detection of cancer and precancerous pathology. It is a demanding procedure that requires a significant amount of time from specialized physicians and nurses, and specialists still miss a significant proportion of polyps. Automated polyp detection in colonoscopy videos has been demonstrated to be a promising way to handle this problem. However, polyp detection is a challenging problem due to the limited amount of available training data and the large appearance variations of polyps. To handle this problem, we propose a novel deep learning method, Y-Net, that consists of two encoder networks with a decoder network. Our proposed Y-Net method relies on efficient use of pre-trained and untrained models with novel sum-skip-concatenation operations. Each encoder is trained with an encoder-specific learning rate along with the decoder. Compared with previous methods employing hand-crafted features or 2-D/3-D convolutional neural networks, our approach outperforms state-of-the-art methods for polyp detection, with improvements of 7.3% in F1-score and 13% in recall.
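The idea of training each encoder with its own learning rate can be sketched as a single gradient step over named parameter groups. This is only an illustration under stated assumptions: the group names, the learning rate values, and the plain SGD update are hypothetical and are not taken from the Y-Net paper.

```python
from typing import Dict, List


def sgd_step(params: Dict[str, List[float]],
             grads: Dict[str, List[float]],
             learning_rates: Dict[str, float]) -> Dict[str, List[float]]:
    """Apply one plain SGD update, using a separate learning rate per group."""
    return {
        group: [p - learning_rates[group] * g
                for p, g in zip(values, grads[group])]
        for group, values in params.items()
    }


# Hypothetical groups: a pre-trained encoder gets a small learning rate so its
# weights move slowly, while the untrained encoder and the decoder move faster.
params = {
    "pretrained_encoder": [1.0, -2.0],
    "untrained_encoder": [1.0, -2.0],
    "decoder": [0.5],
}
grads = {
    "pretrained_encoder": [1.0, 1.0],
    "untrained_encoder": [1.0, 1.0],
    "decoder": [2.0],
}
learning_rates = {
    "pretrained_encoder": 0.001,  # assumed value
    "untrained_encoder": 0.1,     # assumed value
    "decoder": 0.1,               # assumed value
}
updated = sgd_step(params, grads, learning_rates)
```

With identical gradients, the pre-trained group changes far less than the untrained one, which is the usual motivation for group-specific rates when mixing pre-trained and randomly initialized components.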
1.1 CV May 26, 2016
Low-Cost Scene Modeling using a Density Function Improves Segmentation Performance
Vivek Sharma, Sule Yildirim-Yayilgan, Luc Van Gool
We propose a low-cost and effective way to combine free simulation software and free CAD models for modeling human-object interaction in order to improve human and object segmentation. It is intended for research scenarios related to safe human-robot collaboration (SHRC) and interaction (SHRI) in the industrial domain. Human and object modeling has previously been used for detecting activity and for inferring and predicting actions. In contrast to those works, we model humans and objects in order to learn interactions in RGB-D data for improving segmentation. For this purpose, we define a novel density function to model a three-dimensional (3D) scene in a virtual environment (VREP). This density function takes into account various possible configurations of human-object and object-object relationships and interactions governed by their affordances. Using this function, we synthesize a large, realistic and highly varied synthetic RGB-D dataset that we use for training. We train a random forest classifier, and the resulting pixelwise predictions are integrated as a unary term in a pairwise conditional random field (CRF). Our evaluation shows that modeling these interactions improves segmentation performance by ~7% in mean average precision and recall over state-of-the-art methods that ignore these interactions in real-world data. Our approach is computationally efficient and robust, and can run in real time on consumer hardware.