Ömer Nezih Gerek

CV
h-index25
4papers
33citations
Novelty43%
AI Score28

4 Papers

CVJul 30, 2024
PLANesT-3D: A new annotated dataset for segmentation of 3D plant point clouds

Kerem Mertoğlu, Yusuf Şalk, Server Karahan Sarıkaya et al.

Creation of new annotated public datasets is crucial in helping advances in 3D computer vision and machine learning meet their full potential for automatic interpretation of 3D plant models. Despite the proliferation of deep neural network architectures for segmentation and phenotyping of 3D plant models in the last decade, the amount of data, and diversity in terms of species and data acquisition modalities are far from sufficient for evaluation of such tools for their generalization ability. To contribute to closing this gap, we introduce PLANesT-3D; a new annotated dataset of 3D color point clouds of plants. PLANesT-3D is composed of 34 point cloud models representing 34 real plants from three different plant species: \textit{Capsicum annuum}, \textit{Rosa kordana}, and \textit{Ribes rubrum}. Both semantic labels in terms of "leaf" and "stem", and organ instance labels were manually annotated for the full point clouds. PLANesT-3D introduces diversity to existing datasets by adding point clouds of two new species and providing 3D data acquired with the low-cost SfM/MVS technique as opposed to laser scanning or expensive setups. Point clouds reconstructed with SfM/MVS modality exhibit challenges such as missing data, variable density, and illumination variations. As an additional contribution, SP-LSCnet, a novel semantic segmentation method that is a combination of unsupervised superpoint extraction and a 3D point-based deep learning approach is introduced and evaluated on the new dataset. The advantages of SP-LSCnet over other deep learning methods are its modular structure and increased interpretability. Two existing deep neural network architectures, PointNet++ and RoseSegNet, were also tested on the point clouds of PLANesT-3D for semantic segmentation.

CLMay 11, 2025
TSLFormer: A Lightweight Transformer Model for Turkish Sign Language Recognition Using Skeletal Landmarks

Kutay Ertürk, Furkan Altınışık, İrem Sarıaltın et al.

This study presents TSLFormer, a light and robust word-level Turkish Sign Language (TSL) recognition model that treats sign gestures as ordered, string-like language. Instead of using raw RGB or depth videos, our method only works with 3D joint positions - articulation points - extracted using Google's Mediapipe library, which focuses on the hand and torso skeletal locations. This creates efficient input dimensionality reduction while preserving important semantic gesture information. Our approach revisits sign language recognition as sequence-to-sequence translation, inspired by the linguistic nature of sign languages and the success of transformers in natural language processing. Since TSLFormer uses the self-attention mechanism, it effectively captures temporal co-occurrence within gesture sequences and highlights meaningful motion patterns as words unfold. Evaluated on the AUTSL dataset with over 36,000 samples and 227 different words, TSLFormer achieves competitive performance with minimal computational cost. These results show that joint-based input is sufficient for enabling real-time, mobile, and assistive communication systems for hearing-impaired individuals.

CVMay 14, 2025
Super-Resolution Generative Adversarial Networks based Video Enhancement

Kağan Çetin, Hacer Akça, Ömer Nezih Gerek

This study introduces an enhanced approach to video super-resolution by extending ordinary Single-Image Super-Resolution (SISR) Super-Resolution Generative Adversarial Network (SRGAN) structure to handle spatio-temporal data. While SRGAN has proven effective for single-image enhancement, its design does not account for the temporal continuity required in video processing. To address this, a modified framework that incorporates 3D Non-Local Blocks is proposed, which is enabling the model to capture relationships across both spatial and temporal dimensions. An experimental training pipeline is developed, based on patch-wise learning and advanced data degradation techniques, to simulate real-world video conditions and learn from both local and global structures and details. This helps the model generalize better and maintain stability across varying video content while maintaining the general structure besides the pixel-wise correctness. Two model variants-one larger and one more lightweight-are presented to explore the trade-offs between performance and efficiency. The results demonstrate improved temporal coherence, sharper textures, and fewer visual artifacts compared to traditional single-image methods. This work contributes to the development of practical, learning-based solutions for video enhancement tasks, with potential applications in streaming, gaming, and digital restoration.

CVOct 19, 2018
CVABS: Moving Object Segmentation with Common Vector Approach for Videos

Şahin Işık, Kemal Özkan, Ömer Nezih Gerek

Background modelling is a fundamental step for several real-time computer vision applications that requires security systems and monitoring. An accurate background model helps detecting activity of moving objects in the video. In this work, we have developed a new subspace based background modelling algorithm using the concept of Common Vector Approach with Gram-Schmidt orthogonalization. Once the background model that involves the common characteristic of different views corresponding to the same scene is acquired, a smart foreground detection and background updating procedure is applied based on dynamic control parameters. A variety of experiments is conducted on different problem types related to dynamic backgrounds. Several types of metrics are utilized as objective measures and the obtained visual results are judged subjectively. It was observed that the proposed method stands successfully for all problem types reported on CDNet2014 dataset by updating the background frames with a self-learning feedback mechanism.