Alptekin Temizel

h-index19

32papers

1,810citations

Novelty39%

AI Score50

Ranked #19,765 of 194,257 authors (top 10%)#7,151 in CV (top 12%)

32 Papers

3.7CVDec 8, 2022Code

Deep Architectures for Content Moderation and Movie Content Rating

Fatih Cagatay Akyon, Alptekin Temizel

Rating a video based on its content is an important step for classifying video age categories. Movie content rating and TV show rating are the two most common rating systems established by professional committees. However, manually reviewing and evaluating scene/film content by a committee is a tedious work and it becomes increasingly difficult with the ever-growing amount of online video content. As such, a desirable solution is to use computer vision based video content analysis techniques to automate the evaluation process. In this paper, related works are summarized for action recognition, multi-modal learning, movie genre classification, and sensitive content detection in the context of content moderation and movie content rating. The project page is available at https://github.com/fcakyon/content-moderation-deep-learning.

3.9CVOct 21, 2023Code

Adversarial Image Generation by Spatial Transformation in Perceptual Colorspaces

Ayberk Aydin, Alptekin Temizel

Deep neural networks are known to be vulnerable to adversarial perturbations. The amount of these perturbations are generally quantified using $L_p$ metrics, such as $L_0$, $L_2$ and $L_\infty$. However, even when the measured perturbations are small, they tend to be noticeable by human observers since $L_p$ distance metrics are not representative of human perception. On the other hand, humans are less sensitive to changes in colorspace. In addition, pixel shifts in a constrained neighborhood are hard to notice. Motivated by these observations, we propose a method that creates adversarial examples by applying spatial transformations, which creates adversarial examples by changing the pixel locations independently to chrominance channels of perceptual colorspaces such as $YC_{b}C_{r}$ and $CIELAB$, instead of making an additive perturbation or manipulating pixel values directly. In a targeted white-box attack setting, the proposed method is able to obtain competitive fooling rates with very high confidence. The experimental evaluations show that the proposed method has favorable results in terms of approximate perceptual distance between benign and adversarially generated images. The source code is publicly available at https://github.com/ayberkydn/stadv-torch

2.6CVJul 21, 2022

Sequence Models for Drone vs Bird Classification

Fatih Cagatay Akyon, Erdem Akagunduz, Sinan Onur Altinuc et al.

Drone detection has become an essential task in object detection as drone costs have decreased and drone technology has improved. It is, however, difficult to detect distant drones when there is weak contrast, long range, and low visibility. In this work, we propose several sequence classification architectures to reduce the detected false-positive ratio of drone tracks. Moreover, we propose a new drone vs. bird sequence classification dataset to train and evaluate the proposed architectures. 3D CNN, LSTM, and Transformer based sequence classification architectures have been trained on the proposed dataset to show the effectiveness of the proposed idea. As experiments show, using sequence information, bird classification and overall F1 scores can be increased by up to 73% and 35%, respectively. Among all sequence classification models, R(2+1)D-based fully convolutional model yields the best transfer learning and fine-tuning results.

11.3CVSep 17, 2024

TopoMaskV2: Enhanced Instance-Mask-Based Formulation for the Road Topology Problem

M. Esat Kalfaoglu, Halil Ibrahim Ozturk, Ozsel Kilinc et al.

Recently, the centerline has become a popular representation of lanes due to its advantages in solving the road topology problem. To enhance centerline prediction, we have developed a new approach called TopoMask. Unlike previous methods that rely on keypoints or parametric methods, TopoMask utilizes an instance-mask-based formulation coupled with a masked-attention-based transformer architecture. We introduce a quad-direction label representation to enrich the mask instances with flow information and design a corresponding post-processing technique for mask-to-centerline conversion. Additionally, we demonstrate that the instance-mask formulation provides complementary information to parametric Bezier regressions, and fusing both outputs leads to improved detection and topology performance. Moreover, we analyze the shortcomings of the pillar assumption in the Lift Splat technique and adapt a multi-height bin configuration. Experimental results show that TopoMask achieves state-of-the-art performance in the OpenLane-V2 dataset, increasing from 44.1 to 49.4 for Subset-A and 44.7 to 51.8 for Subset-B in the V1.1 OLS baseline.

6.8CVJun 8, 2023

TopoMask: Instance-Mask-Based Formulation for the Road Topology Problem via Transformer-Based Architecture

M. Esat Kalfaoglu, Halil Ibrahim Ozturk, Ozsel Kilinc et al.

Driving scene understanding task involves detecting static elements such as lanes, traffic signs, and traffic lights, and their relationships with each other. To facilitate the development of comprehensive scene understanding solutions using multiple camera views, a new dataset called Road Genome (OpenLane-V2) has been released. This dataset allows for the exploration of complex road connections and situations where lane markings may be absent. Instead of using traditional lane markings, the lanes in this dataset are represented by centerlines, which offer a more suitable representation of lanes and their connections. In this study, we have introduced a new approach called TopoMask for predicting centerlines in road topology. Unlike existing approaches in the literature that rely on keypoints or parametric methods, TopoMask utilizes an instance-mask based formulation with a transformer-based architecture and, in order to enrich the mask instances with flow information, a direction label representation is proposed. TopoMask have ranked 4th in the OpenLane-V2 Score (OLS) and ranked 2nd in the F1 score of centerline prediction in OpenLane Topology Challenge 2023. In comparison to the current state-of-the-art method, TopoNet, the proposed method has achieved similar performance in Frechet-based lane detection and outperformed TopoNet in Chamfer-based lane detection without utilizing its scene graph neural network.

8.9IVNov 10, 2023

Ulcerative Colitis Mayo Endoscopic Scoring Classification with Active Learning and Generative Data Augmentation

Ümit Mert Çağlar, Alperen İnci, Oğuz Hanoğlu et al.

Endoscopic imaging is commonly used to diagnose Ulcerative Colitis (UC) and classify its severity. It has been shown that deep learning based methods are effective in automated analysis of these images and can potentially be used to aid medical doctors. Unleashing the full potential of these methods depends on the availability of large amount of labeled images; however, obtaining and labeling these images are quite challenging. In this paper, we propose a active learning based generative augmentation method. The method involves generating a large number of synthetic samples by training using a small dataset consisting of real endoscopic images. The resulting data pool is narrowed down by using active learning methods to select the most informative samples, which are then used to train a classifier. We demonstrate the effectiveness of our method through experiments on a publicly available endoscopic image dataset. The results show that using synthesized samples in conjunction with active learning leads to improved classification performance compared to using only the original labeled examples and the baseline classification performance of 68.1% increases to 74.5% in terms of Quadratic Weighted Kappa (QWK) Score. Another observation is that, attaining equivalent performance using only real data necessitated three times higher number of images.

6.0IVJun 1

LALE: Lightweight-Transformer Architecture for Land-Cover Estimation

Ümit Mert Çağlar, Alptekin Temizel

Semantic segmentation of remote sensing imagery requires models that capture both global context and local detail under tight computational budgets. Prior work typically optimizes for one of these axes: attention for global context, convolution for local detail, or compactness for efficiency. While hybrid approaches aim to capture both, they require architectural changes and encoder backbones with computational overhead, limiting efficiency and performance. We present LALE (Lightweight-transformer Architecture for Land-cover Estimation), an end-to-end remote sensing image segmentation architecture, that bifurcates its encoder by resolution: lightweight ConvMixer stages handle high-resolution local features, while transformer stages handle low-resolution global context, confining the quadratic cost of self-attention to deep, downsampled feature maps. An all-MLP multi-scale decoder, together with RMSNorm and StarReLU throughout, further reduces compute and parameter count. On the large-scale ARAS400k remote-sensing segmentation benchmark, LALE establishes a strong efficiency-performance trade-off against CNN, transformer, and hybrid baselines. Our smallest variant, (just 1.6M parameters), reaches within 2.6 F1 points of the best baseline (UPerNet) while using 4.5x fewer parameters, 7x less storage, 17x fewer GMACs, and delivering 1.8x higher throughput.

1.5CVMar 2

TopoMaskV3: 3D Mask Head with Dense Offset and Height Predictions for Road Topology Understanding

Muhammet Esat Kalfaoglu, Halil Ibrahim Ozturk, Ozsel Kilinc et al.

Mask-based paradigms for road topology understanding, such as TopoMaskV2, offer a complementary alternative to query-based methods by generating centerlines via a dense rasterized intermediate representation. However, prior work was limited to 2D predictions and suffered from severe discretization artifacts, necessitating fusion with parametric heads. We introduce TopoMaskV3, which advances this pipeline into a robust, standalone 3D predictor via two novel dense prediction heads: a dense offset field for sub-grid discretization correction within the existing BEV resolution, and a dense height map for direct 3D estimation. Beyond the architecture, we are the first to address geographic data leakage in road topology evaluation by introducing (1) geographically distinct splits to prevent memorization and ensure fair generalization, and (2) a long-range (+/-100 m) benchmark. TopoMaskV3 achieves state-of-the-art 28.5 OLS on this geographically disjoint benchmark, surpassing all prior methods. Our analysis shows that the mask representation is more robust to geographic overfitting than Bezier, while LiDAR fusion is most beneficial at long range and exhibits larger relative gains on the overlapping original split, suggesting overlap-induced memorization effects.

5.9CVDec 26, 2023Code

State-of-the-Art in Nudity Classification: A Comparative Analysis

Fatih Cagatay Akyon, Alptekin Temizel

This paper presents a comparative analysis of existing nudity classification techniques for classifying images based on the presence of nudity, with a focus on their application in content moderation. The evaluation focuses on CNN-based models, vision transformer, and popular open-source safety checkers from Stable Diffusion and Large-scale Artificial Intelligence Open Network (LAION). The study identifies the limitations of current evaluation datasets and highlights the need for more diverse and challenging datasets. The paper discusses the potential implications of these findings for developing more accurate and effective image classification systems on online platforms. Overall, the study emphasizes the importance of continually improving image classification models to ensure the safety and well-being of platform users. The project page, including the demonstrations and results is publicly available at https://github.com/fcakyon/content-moderation-deep-learning.

23.6CVFeb 14, 2022Code

Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection

Fatih Cagatay Akyon, Sinan Onur Altinuc, Alptekin Temizel

Detection of small objects and objects far away in the scene is a major challenge in surveillance applications. Such objects are represented by small number of pixels in the image and lack sufficient details, making them difficult to detect using conventional detectors. In this work, an open-source framework called Slicing Aided Hyper Inference (SAHI) is proposed that provides a generic slicing aided inference and fine-tuning pipeline for small object detection. The proposed technique is generic in the sense that it can be applied on top of any available object detector without any fine-tuning. Experimental evaluations, using object detection baselines on the Visdrone and xView aerial object detection datasets show that the proposed inference method can increase object detection AP by 6.8%, 5.1% and 5.3% for FCOS, VFNet and TOOD detectors, respectively. Moreover, the detection accuracy can be further increased with a slicing aided fine-tuning, resulting in a cumulative increase of 12.7%, 13.4% and 14.5% AP in the same order. Proposed technique has been integrated with Detectron2, MMDetection and YOLOv5 models and it is publicly available at https://github.com/obss/sahi.git .

1.6LGNov 11, 2021Code

Automated question generation and question answering from Turkish texts

Fatih Cagatay Akyon, Devrim Cavusoglu, Cemil Cengiz et al.

While exam-style questions are a fundamental educational tool serving a variety of purposes, manual construction of questions is a complex process that requires training, experience and resources. Automatic question generation (QG) techniques can be utilized to satisfy the need for a continuous supply of new questions by streamlining their generation. However, compared to automatic question answering (QA), QG is a more challenging task. In this work, we fine-tune a multilingual T5 (mT5) transformer in a multi-task setting for QA, QG and answer extraction tasks using Turkish QA datasets. To the best of our knowledge, this is the first academic work that performs automated text-to-text question generation from Turkish texts. Experimental evaluations show that the proposed multi-task setting achieves state-of-the-art Turkish question answering and question generation performance on TQuADv1, TQuADv2 datasets and XQuAD Turkish split. The source code and the pre-trained models are available at https://github.com/obss/turkish-question-generation.

1.2CVJul 28, 2020Code

Optimization of XNOR Convolution for Binary Convolutional Neural Networks on GPU

Mete Can Kaya, Alperen İnci, Alptekin Temizel

Binary convolutional networks have lower computational load and lower memory foot-print compared to their full-precision counterparts. So, they are a feasible alternative for the deployment of computer vision applications on limited capacity embedded devices. Once trained on less resource-constrained computational environments, they can be deployed for real-time inference on such devices. In this study, we propose an implementation of binary convolutional network inference on GPU by focusing on optimization of XNOR convolution. Experimental results show that using GPU can provide a speed-up of up to $42.61\times$ with a kernel size of $3\times3$. The implementation is publicly available at https://github.com/metcan/Binary-Convolutional-Neural-Network-Inference-on-GPU

2.9CRJul 21, 2020Code

Bit-level Parallelization of 3DES Encryption on GPU

Kaan Furkan Altınok, Afşin Peker, Alptekin Temizel

Triple DES (3DES) is a standard fundamental encryption algorithm, used in several electronic payment applications and web browsers. In this paper, we propose a parallel implementation of 3DES on GPU. Since 3DES encrypts data with 64-bit blocks, our approach considers each 64-bit block a kernel block and assign a separate thread to process each bit. Algorithm's permutation operations, XOR operations, and S-box operations are done in parallel within these kernel blocks. The implementation benefits from the use of constant and shared memory types to optimize memory access. The results show an average 10.70x speed-up against the baseline multi-threaded CPU implementation. The implementation is publicly available at https://github.com/kaanfurkan35/3DES_GPU

1.2CVJul 13, 2020Code

Accelerating Translational Image Registration for HDR Images on GPU

Kadir Cenk Alpay, Kadir Berkay Aydemir, Alptekin Temizel

High Dynamic Range (HDR) images are generated using multiple exposures of a scene. When a hand-held camera is used to capture a static scene, these images need to be aligned by globally shifting each image in both dimensions. For a fast and robust alignment, the shift amount is commonly calculated using Median Threshold Bitmaps (MTB) and creating an image pyramid. In this study, we optimize these computations using a parallel processing approach utilizing GPU. Experimental evaluation shows that the proposed implementation achieves a speed-up of up to 6.24 times over the baseline multi-threaded CPU implementation on the alignment of one image pair. The source code is available at https://github.com/kadircenk/WardMTBCuda

5.2IVJun 23

Benchmarking the Alignment of Data-Quality Metrics, Human Judgment and Land-Cover Segmentation Performance for Earth Observation

Ümit Mert Çağlar, Alptekin Temizel

Volume and quality of datasets are crucial for deep learning model training, yet they are often constrained by availability and data acquisition costs. Synthetic data augmentation can extend existing datasets with realistic images, and the quality of these images is generally assessed through fidelity metrics such as FID, KID, IS, LPIPS and SSIM that measure structural or distributional similarity. However, such metrics, including the widely used FID, focus on visual fidelity without reflecting downstream utility, and can diverge from human perception under perturbations that are imperceptible to human observers. In this work, we systematically evaluate Earth observation datasets alongside synthetic counterparts generated by deep generative models, comparing automatic metrics against human perception and downstream tasks. Our results reveal a stark misalignment: semantics-preserving perturbations such as rotation drastically alter metric scores while leaving human recognition unaffected, and synthetic samples that score poorly on automatic metrics achieve comparable or higher perceived realism, and can improve downstream performance when combined with real data. By benchmarking semantic segmentation models trained on mixed real-synthetic datasets, we demonstrate that quality metrics rooted in ImageNet-pretrained feature spaces are unreliable indicators for geospatial data. Our findings underscore that automatic quality evaluation of synthetic datasets should be grounded in downstream task performance and human evaluation.

3.5CVJun 18

BELDE: Building a Large-scale Earth-observation Land-cover Dataset for Europe

Ümit Mert Çağlar, Alptekin Temizel

Earth observation imagery plays a critical role in environmental monitoring, urban planning, disaster assessment, and climate analysis. While multi-spectral sensors are increasingly available, true-color (RGB) imagery remains widely used due to the power, cost, and deployment constraints of many satellite and aerial platforms. However, existing land-cover segmentation datasets are often limited in geographic coverage, scale, or public accessibility. To bridge this gap, we introduce BELDE (Building a Large-scale Earth-observation Land-cover Dataset for Europe), a publicly available dataset tailored for RGB-based remote sensing semantic segmentation. Constructed from Sentinel-2 true-color images and ESA WorldCover data annotations, BELDE contains 1,088,385 curated image-segmentation map pairs spanning Europe with 7 land-cover classes at 10 m spatial resolution, making it one of the largest publicly available RGB land-cover segmentation datasets for Earth observation. To facilitate cross-region generalization studies, we additionally introduce BELDE-K (16,607 pairs) covering the Republic of Korea and BELDE-CA-NV (88,155 pairs) covering California and Nevada in the United States. We establish baseline results using multiple semantic segmentation architectures and evaluate both in-domain and cross-domain performance. Models trained on BELDE achieve an F1 score of 83.0% on the European test set, while performance decreases to 66.4% on BELDE-CA-NV and 58.3% on BELDE-K, highlighting the challenges posed by out-of-distribution geographic domain shift. By providing a continental-scale RGB segmentation and evaluation benchmark, BELDE supports the development of robust and transferable Earth observation models. The dataset and benchmark resources will be publicly released.

7.6CVDec 2, 2024

Class Distance Weighted Cross Entropy Loss for Classification of Disease Severity

Gorkem Polat, Ümit Mert Çağlar, Alptekin Temizel

Assessing disease severity with ordinal classes, where each class reflects increasing severity levels, benefits from loss functions designed for this ordinal structure. Traditional categorical loss functions, like Cross-Entropy (CE), often perform suboptimally in these scenarios. To address this, we propose a novel loss function, Class Distance Weighted Cross-Entropy (CDW-CE), which penalizes misclassifications more severely when the predicted and actual classes are farther apart. We evaluated CDW-CE using various deep architectures, comparing its performance against several categorical and ordinal loss functions. To assess the quality of latent representations, we used t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) visualizations, quantified the clustering quality using the Silhouette Score, and compared Class Activation Maps (CAM) generated by models trained with CDW-CE and CE loss. Feedback from domain experts was incorporated to evaluate how well model attention aligns with expert opinion. Our results show that CDW-CE consistently improves performance in ordinal image classification tasks. It achieves higher Silhouette Scores, indicating better class discrimination capability, and its CAM visualizations show a stronger focus on clinically significant regions, as validated by domain experts. Receiver operator characteristics (ROC) curves and the area under the curve (AUC) scores highlight that CDW-CE outperforms other loss functions, including prominent ordinal loss functions from the literature.

9.6CVDec 25, 2024

TopoBDA: Towards Bezier Deformable Attention for Road Topology Understanding

Muhammet Esat Kalfaoglu, Halil Ibrahim Ozturk, Ozsel Kilinc et al.

Understanding road topology is crucial for autonomous driving. This paper introduces TopoBDA (Topology with Bezier Deformable Attention), a novel approach that enhances road topology comprehension by leveraging Bezier Deformable Attention (BDA). TopoBDA processes multi-camera 360-degree imagery to generate Bird's Eye View (BEV) features, which are refined through a transformer decoder employing BDA. BDA utilizes Bezier control points to drive the deformable attention mechanism, improving the detection and representation of elongated and thin polyline structures, such as lane centerlines. Additionally, TopoBDA integrates two auxiliary components: an instance mask formulation loss and a one-to-many set prediction loss strategy, to further refine centerline detection and enhance road topology understanding. Experimental evaluations on the OpenLane-V2 dataset demonstrate that TopoBDA outperforms existing methods, achieving state-of-the-art results in centerline detection and topology reasoning. TopoBDA also achieves the best results on the OpenLane-V1 dataset in 3D lane detection. Further experiments on integrating multi-modal data -- such as LiDAR, radar, and SDMap -- show that multimodal inputs can further enhance performance in road topology understanding.

3.7CVOct 17, 2024

Adaptive Augmentation Policy Optimization with LLM Feedback

Ant Duru, Alptekin Temizel

Data augmentation is a critical component of deep learning pipelines, enhancing model generalization by increasing dataset diversity. Traditional augmentation strategies rely on manually designed transformations, stochastic sampling, or automated search-based approaches. Although automated methods improve performance, they often require extensive computational resources and are specifically designed for certain datasets. In this work, we propose a Large Language Model (LLM)-guided augmentation optimization strategy that refines augmentation policies based on model performance feedback. We propose two approaches: (1) LLM-Guided Augmentation Policy Optimization, where augmentation policies selected by LLM are refined iteratively across training cycles, and (2) Adaptive LLM-Guided Augmentation Policy Optimization, which adjusts policies at each iteration based on performance metrics. This in-training approach eliminates the need for full model retraining before getting LLM feedback, reducing computational costs while increasing performance. Our methodology employs an LLM to dynamically select augmentation transformations based on dataset characteristics, model architecture, and prior training performance. Leveraging LLMs' contextual knowledge, especially in domain-specific tasks like medical imaging, our method selects augmentations tailored to dataset characteristics and model performance. Experiments across domain-specific image classification datasets show consistent accuracy improvements over traditional methods.

3.6CVJul 7, 2025

Colorectal Cancer Tumor Grade Segmentation in Digital Histopathology Images: From Giga to Mini Challenge

Alper Bahcekapili, Duygu Arslan, Umut Ozdemir et al.

Colorectal cancer (CRC) is the third most diagnosed cancer and the second leading cause of cancer-related death worldwide. Accurate histopathological grading of CRC is essential for prognosis and treatment planning but remains a subjective process prone to observer variability and limited by global shortages of trained pathologists. To promote automated and standardized solutions, we organized the ICIP Grand Challenge on Colorectal Cancer Tumor Grading and Segmentation using the publicly available METU CCTGS dataset. The dataset comprises 103 whole-slide images with expert pixel-level annotations for five tissue classes. Participants submitted segmentation masks via Codalab, evaluated using metrics such as macro F-score and mIoU. Among 39 participating teams, six outperformed the Swin Transformer baseline (62.92 F-score). This paper presents an overview of the challenge, dataset, and the top-performing methods

1.8LGFeb 16, 2022Code

Evaluation and Analysis of Different Aggregation and Hyperparameter Selection Methods for Federated Brain Tumor Segmentation

Ece Isik-Polat, Gorkem Polat, Altan Kocyigit et al.

Availability of large, diverse, and multi-national datasets is crucial for the development of effective and clinically applicable AI systems in the medical imaging domain. However, forming a global model by bringing these datasets together at a central location, comes along with various data privacy and ownership problems. To alleviate these problems, several recent studies focus on the federated learning paradigm, a distributed learning approach for decentralized data. Federated learning leverages all the available data without any need for sharing collaborators' data with each other or collecting them on a central server. Studies show that federated learning can provide competitive performance with conventional central training, while having a good generalization capability. In this work, we have investigated several federated learning approaches on the brain tumor segmentation problem. We explore different strategies for faster convergence and better performance which can also work on strong Non-IID cases.

17.4IVFeb 9, 2022Code

Class Distance Weighted Cross-Entropy Loss for Ulcerative Colitis Severity Estimation

Gorkem Polat, Ilkay Ergenc, Haluk Tarik Kani et al.

In scoring systems used to measure the endoscopic activity of ulcerative colitis, such as Mayo endoscopic score or Ulcerative Colitis Endoscopic Index Severity, levels increase with severity of the disease activity. Such relative ranking among the scores makes it an ordinal regression problem. On the other hand, most studies use categorical cross-entropy loss function to train deep learning models, which is not optimal for the ordinal regression problem. In this study, we propose a novel loss function, class distance weighted cross-entropy (CDW-CE), that respects the order of the classes and takes the distance of the classes into account in calculation of the cost. Experimental evaluations show that models trained with CDW-CE outperform the models trained with conventional categorical cross-entropy and other commonly used loss functions which are designed for the ordinal regression problems. In addition, the class activation maps of models trained with CDW-CE loss are more class-discriminative and they are found to be more reasonable by the domain experts.

5.6CVAug 5, 2021

Imperceptible Adversarial Examples by Spatial Chroma-Shift

Ayberk Aydin, Deniz Sen, Berat Tuna Karli et al.

Deep Neural Networks have been shown to be vulnerable to various kinds of adversarial perturbations. In addition to widely studied additive noise based perturbations, adversarial examples can also be created by applying a per pixel spatial drift on input images. While spatial transformation based adversarial examples look more natural to human observers due to absence of additive noise, they still possess visible distortions caused by spatial transformations. Since the human vision is more sensitive to the distortions in the luminance compared to those in chrominance channels, which is one of the main ideas behind the lossy visual multimedia compression standards, we propose a spatial transformation based perturbation method to create adversarial examples by only modifying the color components of an input image. While having competitive fooling rates on CIFAR-10 and NIPS2017 Adversarial Learning Challenge datasets, examples created with the proposed method have better scores with regards to various perceptual quality metrics. Human visual perception studies validate that the examples are more natural looking and often indistinguishable from their original counterparts.

1.2CVDec 9, 2020

Generative Data Augmentation for Vehicle Detection in Aerial Images

Hilmi Kumdakcı, Cihan Öngün, Alptekin Temizel

Scarcity of training data is one of the prominent problems for deep networks which require large amounts data. Data augmentation is a widely used method to increase the number of training samples and their variations. In this paper, we focus on improving vehicle detection performance in aerial images and propose a generative augmentation method which does not need any extra supervision than the bounding box annotations of the vehicle objects in the training dataset. The proposed method increases the performance of vehicle detection by allowing detectors to be trained with higher number of instances, especially when there are limited number of training instances. The proposed method is generic in the sense that it can be integrated with different generators. The experiments show that the method increases the Average Precision by up to 25.2% and 25.7% when integrated with Pluralistic and DeepFill respectively.

19.6CVOct 12, 2020

Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy

Sharib Ali, Mariia Dmitrieva, Noha Ghatwary et al.

The Endoscopy Computer Vision Challenge (EndoCV) is a crowd-sourcing initiative to address eminent problems in developing reliable computer aided detection and diagnosis endoscopy systems and suggest a pathway for clinical translation of technologies. Whilst endoscopy is a widely used diagnostic and treatment tool for hollow-organs, there are several core challenges often faced by endoscopists, mainly: 1) presence of multi-class artefacts that hinder their visual interpretation, and 2) difficulty in identifying subtle precancerous precursors and cancer abnormalities. Artefacts often affect the robustness of deep learning methods applied to the gastrointestinal tract organs as they can be confused with tissue of interest. EndoCV2020 challenges are designed to address research questions in these remits. In this paper, we present a summary of methods developed by the top 17 teams and provide an objective comparison of state-of-the-art methods and methods designed by the participants for two sub-challenges: i) artefact detection and segmentation (EAD2020), and ii) disease detection and segmentation (EDD2020). Multi-center, multi-organ, multi-class, and multi-modal clinical endoscopy datasets were compiled for both EAD2020 and EDD2020 sub-challenges. The out-of-sample generalization ability of detection algorithms was also evaluated. Whilst most teams focused on accuracy improvements, only a few methods hold credibility for clinical usability. The best performing teams provided solutions to tackle class imbalance, and variabilities in size, origin, modality and occurrences by exploring data augmentation, data fusion, and optimal class thresholding techniques.

3.3CVAug 8, 2020Code

LPMNet: Latent Part Modification and Generation for 3D Point Clouds

Cihan Öngün, Alptekin Temizel

In this paper, we focus on latent modification and generation of 3D point cloud object models with respect to their semantic parts. Different to the existing methods which use separate networks for part generation and assembly, we propose a single end-to-end Autoencoder model that can handle generation and modification of both semantic parts, and global shapes. The proposed method supports part exchange between 3D point cloud models and composition by different parts to form new models by directly editing latent representations. This holistic approach does not need part-based training to learn part representations and does not introduce any extra loss besides the standard reconstruction loss. The experiments demonstrate the robustness of the proposed method with different object categories and varying number of points. The method can generate new models by integration of generative models such as GANs and VAEs and can work with unannotated point clouds by integration of a segmentation module.

1.9SDMay 25, 2020

Speaker and Posture Classification using Instantaneous Intraspeech Breathing Features

Atıl İlerialkan, Alptekin Temizel, Hüseyin Hacıhabiboğlu

Acoustic features extracted from speech are widely used in problems such as biometric speaker identification and first-person activity detection. However, the use of speech for such purposes raises privacy issues as the content is accessible to the processing party. In this work, we propose a method for speaker and posture classification using intraspeech breathing sounds. Instantaneous magnitude features are extracted using the Hilbert-Huang transform (HHT) and fed into a CNN-GRU network for classification of recordings from the open intraspeech breathing sound dataset, BreathBase, that we collected for this study. Using intraspeech breathing sounds, 87% speaker classification, and 98% posture classification accuracy were obtained.

1.8CVMar 7, 2019

Attack Type Agnostic Perceptual Enhancement of Adversarial Images

Bilgin Aksoy, Alptekin Temizel

Adversarial images are samples that are intentionally modified to deceive machine learning systems. They are widely used in applications such as CAPTHAs to help distinguish legitimate human users from bots. However, the noise introduced during the adversarial image generation process degrades the perceptual quality and introduces artificial colours; making it also difficult for humans to classify images and recognise objects. In this letter, we propose a method to enhance the perceptual quality of these adversarial images. The proposed method is attack type agnostic and could be used in association with the existing attacks in the literature. Our experiments show that the generated adversarial images have lower Euclidean distance values while maintaining the same adversarial attack performance. Distances are reduced by 5.88% to 41.27% with an average reduction of 22% over the different attack and network types.

5.2CVAug 9, 2018

Paired 3D Model Generation with Conditional Generative Adversarial Networks

Cihan Öngün, Alptekin Temizel

Generative Adversarial Networks (GANs) are shown to be successful at generating new and realistic samples including 3D object models. Conditional GAN, a variant of GANs, allows generating samples in given conditions. However, objects generated for each condition are different and it does not allow generation of the same object in different conditions. In this paper, we first adapt conditional GAN, which is originally designed for 2D image generation, to the problem of generating 3D models in different rotations. We then propose a new approach to guide the network to generate the same 3D sample in different and controllable rotation angles (sample pairs). Unlike previous studies, the proposed method does not require modification of the standard conditional GAN architecture and it can be integrated into the training step of any conditional GAN. Experimental results and visual comparison of 3D models show that the proposed method is successful at generating model pairs in different conditions.

4.6CVJul 2, 2018

Multi-modal Egocentric Activity Recognition using Audio-Visual Features

Mehmet Ali Arabacı, Fatih Özkan, Elif Surer et al.

Egocentric activity recognition in first-person videos has an increasing importance with a variety of applications such as lifelogging, summarization, assisted-living and activity tracking. Existing methods for this task are based on interpretation of various sensor information using pre-determined weights for each feature. In this work, we propose a new framework for egocentric activity recognition problem based on combining audio-visual features with multi-kernel learning (MKL) and multi-kernel boosting (MKBoost). For that purpose, firstly grid optical-flow, virtual-inertia feature, log-covariance, cuboid are extracted from the video. The audio signal is characterized using a "supervector", obtained based on Gaussian mixture modelling of frame-level features, followed by a maximum a-posteriori adaptation. Then, the extracted multi-modal features are adaptively fused by MKL classifiers in which both the feature and kernel selection/weighing and recognition tasks are performed together. The proposed framework was evaluated on a number of egocentric datasets. The results showed that using multi-modal features with MKL outperforms the existing methods.

11.4CVMar 28, 2018

The Effects of JPEG and JPEG2000 Compression on Attacks using Adversarial Examples

Ayse Elvan Aydemir, Alptekin Temizel, Tugba Taskaya Temizel

Adversarial examples are known to have a negative effect on the performance of classifiers which have otherwise good performance on undisturbed images. These examples are generated by adding non-random noise to the testing samples in order to make classifier misclassify the given data. Adversarial attacks use these intentionally generated examples and they pose a security risk to the machine learning based systems. To be immune to such attacks, it is desirable to have a pre-processing mechanism which removes these effects causing misclassification while keeping the content of the image. JPEG and JPEG2000 are well-known image compression techniques which suppress the high-frequency content taking the human visual system into account. JPEG has been also shown to be an effective method for reducing adversarial noise. In this paper, we propose applying JPEG2000 compression as an alternative and systematically compare the classification performance of adversarial images compressed using JPEG and JPEG2000 at different target PSNR values and maximum compression levels. Our experiments show that JPEG2000 is more effective in reducing adversarial noise as it allows higher compression rates with less distortion and it does not introduce blocking artifacts.

4.4CVFeb 22, 2017

Boosted Multiple Kernel Learning for First-Person Activity Recognition

Fatih Ozkan, Mehmet Ali Arabaci, Elif Surer et al.

Activity recognition from first-person (ego-centric) videos has recently gained attention due to the increasing ubiquity of the wearable cameras. There has been a surge of efforts adapting existing feature descriptors and designing new descriptors for the first-person videos. An effective activity recognition system requires selection and use of complementary features and appropriate kernels for each feature. In this study, we propose a data-driven framework for first-person activity recognition which effectively selects and combines features and their respective kernels during the training. Our experimental results show that use of Multiple Kernel Learning (MKL) and Boosted MKL in first-person activity recognition problem exhibits improved results in comparison to the state-of-the-art. In addition, these techniques enable the expansion of the framework with new features in an efficient and convenient way.