Antonios Gasteratos

h-index53

15papers

189citations

Novelty42%

AI Score47

Ranked #55,703 of 201,326 authors (top 28%)#20,553 in CV (top 35%)

15 Papers

DCJan 28Code

Meeting SLOs, Slashing Hours: Automated Enterprise LLM Optimization with OptiKIT

Nicholas Santavas, Kareem Eissa, Patrycja Cieplicka et al.

Enterprise LLM deployment faces a critical scalability challenge: organizations must optimize models systematically to scale AI initiatives within constrained compute budgets, yet the specialized expertise required for manual optimization remains a niche and scarce skillset. This challenge is particularly evident in managing GPU utilization across heterogeneous infrastructure while enabling teams with diverse workloads and limited LLM optimization experience to deploy models efficiently. We present OptiKIT, a distributed LLM optimization framework that democratizes model compression and tuning by automating complex optimization workflows for non-expert teams. OptiKIT provides dynamic resource allocation, staged pipeline execution with automatic cleanup, and seamless enterprise integration. In production, it delivers more than 2x GPU throughput improvement while empowering application teams to achieve consistent performance improvements without deep LLM optimization expertise. We share both the platform design and key engineering insights into resource allocation algorithms, pipeline orchestration, and integration patterns that enable large-scale, production-grade democratization of model optimization. Finally, we open-source the system to enable external contributions and broader reproducibility.

LGMar 21, 2023

Dens-PU: PU Learning with Density-Based Positive Labeled Augmentation

Vasileios Sevetlidis, George Pavlidis, Spyridon Mouroutsos et al.

This study proposes a novel approach for solving the PU learning problem based on an anomaly-detection strategy. Latent encodings extracted from positive-labeled data are linearly combined to acquire new samples. These new samples are used as embeddings to increase the density of positive-labeled data and, thus, define a boundary that approximates the positive class. The further a sample is from the boundary the more it is considered as a negative sample. Once a set of negative samples is obtained, the PU learning problem reduces to binary classification. The approach, named Dens-PU due to its reliance on the density of positive-labeled data, was evaluated using benchmark image datasets, and state-of-the-art results were attained.

LGMar 27, 2023

Defect detection using weakly supervised learning

Vasileios Sevetlidis, George Pavlidis, Vasiliki Balaska et al.

In many real-world scenarios, obtaining large amounts of labeled data can be a daunting task. Weakly supervised learning techniques have gained significant attention in recent years as an alternative to traditional supervised learning, as they enable training models using only a limited amount of labeled data. In this paper, the performance of a weakly supervised classifier to its fully supervised counterpart is compared on the task of defect detection. Experiments are conducted on a dataset of images containing defects, and evaluate the two classifiers based on their accuracy, precision, and recall. Our results show that the weakly supervised classifier achieves comparable performance to the supervised classifier, while requiring significantly less labeled data.

LGDec 7, 2025

Angular Regularization for Positive-Unlabeled Learning on the Hypersphere

Vasileios Sevetlidis, George Pavlidis, Antonios Gasteratos

Positive-Unlabeled (PU) learning addresses classification problems where only a subset of positive examples is labeled and the remaining data is unlabeled, making explicit negative supervision unavailable. Existing PU methods often rely on negative-risk estimation or pseudo-labeling, which either require strong distributional assumptions or can collapse in high-dimensional settings. We propose AngularPU, a novel PU framework that operates on the unit hypersphere using cosine similarity and angular margin. In our formulation, the positive class is represented by a learnable prototype vector, and classification reduces to thresholding the cosine similarity between an embedding and this prototype-eliminating the need for explicit negative modeling. To counteract the tendency of unlabeled embeddings to cluster near the positive prototype, we introduce an angular regularizer that encourages dispersion of the unlabeled set over the hypersphere, improving separation. We provide theoretical guarantees on the Bayes-optimality of the angular decision rule, consistency of the learned prototype, and the effect of the regularizer on the unlabeled distribution. Experiments on benchmark datasets demonstrate that AngularPU achieves competitive or superior performance compared to state-of-the-art PU methods, particularly in settings with scarce positives and high-dimensional embeddings, while offering geometric interpretability and scalability.

CVJul 20, 2025

Visual Place Recognition for Large-Scale UAV Applications

Ioannis Tsampikos Papapetros, Ioannis Kansizoglou, Antonios Gasteratos

Visual Place Recognition (vPR) plays a crucial role in Unmanned Aerial Vehicle (UAV) navigation, enabling robust localization across diverse environments. Despite significant advancements, aerial vPR faces unique challenges due to the limited availability of large-scale, high-altitude datasets, which limits model generalization, along with the inherent rotational ambiguity in UAV imagery. To address these challenges, we introduce LASED, a large-scale aerial dataset with approximately one million images, systematically sampled from 170,000 unique locations throughout Estonia over a decade, offering extensive geographic and temporal diversity. Its structured design ensures clear place separation significantly enhancing model training for aerial scenarios. Furthermore, we propose the integration of steerable Convolutional Neural Networks (CNNs) to explicitly handle rotational variance, leveraging their inherent rotational equivariance to produce robust, orientation-invariant feature representations. Our extensive benchmarking demonstrates that models trained on LASED achieve significantly higher recall compared to those trained on smaller, less diverse datasets, highlighting the benefits of extensive geographic coverage and temporal diversity. Moreover, steerable CNNs effectively address rotational ambiguity inherent in aerial imagery, consistently outperforming conventional convolutional architectures, achieving on average 12\% recall improvement over the best-performing non-steerable network. By combining structured, large-scale datasets with rotation-equivariant neural networks, our approach significantly enhances model robustness and generalization for aerial vPR.

IVJun 16, 2025

UAV Object Detection and Positioning in a Mining Industrial Metaverse with Custom Geo-Referenced Data

Vasiliki Balaska, Ioannis Tsampikos Papapetros, Katerina Maria Oikonomou et al.

The mining sector increasingly adopts digital tools to improve operational efficiency, safety, and data-driven decision-making. One of the key challenges remains the reliable acquisition of high-resolution, geo-referenced spatial information to support core activities such as extraction planning and on-site monitoring. This work presents an integrated system architecture that combines UAV-based sensing, LiDAR terrain modeling, and deep learning-based object detection to generate spatially accurate information for open-pit mining environments. The proposed pipeline includes geo-referencing, 3D reconstruction, and object localization, enabling structured spatial outputs to be integrated into an industrial digital twin platform. Unlike traditional static surveying methods, the system offers higher coverage and automation potential, with modular components suitable for deployment in real-world industrial contexts. While the current implementation operates in post-flight batch mode, it lays the foundation for real-time extensions. The system contributes to the development of AI-enhanced remote sensing in mining by demonstrating a scalable and field-validated geospatial data workflow that supports situational awareness and infrastructure safety.

NEFeb 5, 2025

Spiking Neural Network Feature Discrimination Boosts Modality Fusion

Katerina Maria Oikonomou, Ioannis Kansizoglou, Antonios Gasteratos

Feature discrimination is a crucial aspect of neural network design, as it directly impacts the network's ability to distinguish between classes and generalize across diverse datasets. The accomplishment of achieving high-quality feature representations ensures high intra-class separability and poses one of the most challenging research directions. While conventional deep neural networks (DNNs) rely on complex transformations and very deep networks to come up with meaningful feature representations, they usually require days of training and consume significant energy amounts. To this end, spiking neural networks (SNNs) offer a promising alternative. SNN's ability to capture temporal and spatial dependencies renders them particularly suitable for complex tasks, where multi-modal data are required. In this paper, we propose a feature discrimination approach for multi-modal learning with SNNs, focusing on audio-visual data. We employ deep spiking residual learning for visual modality processing and a simpler yet efficient spiking network for auditory modality processing. Lastly, we deploy a spiking multilayer perceptron for modality fusion. We present our findings and evaluate our approach against similar works in the field of classification challenges. To the best of our knowledge, this is the first work investigating feature discrimination in SNNs.

CVDec 17, 2024

Future Aspects in Human Action Recognition: Exploring Emerging Techniques and Ethical Influences

Antonios Gasteratos, Stavros N. Moutsis, Konstantinos A. Tsintotas et al.

Visual-based human action recognition can be found in various application fields, e.g., surveillance systems, sports analytics, medical assistive technologies, or human-robot interaction frameworks, and it concerns the identification and classification of individuals' activities within a video. Since actions typically occur over a sequence of consecutive images, it is particularly challenging due to the inclusion of temporal analysis, which introduces an extra layer of complexity. However, although multiple approaches try to handle temporal analysis, there are still difficulties because of their computational cost and lack of adaptability. Therefore, different types of vision data, containing transition information between consecutive images, provided by next-generation hardware sensors will guide the robotics community in tackling the problem of human action recognition. On the other hand, while there is a plethora of still-image datasets, that researchers can adopt to train new artificial intelligence models, videos representing human activities are of limited capabilities, e.g., small and unbalanced datasets or selected without control from multiple sources. To this end, generating new and realistic synthetic videos is possible since labeling is performed throughout the data creation process, while reinforcement learning techniques can permit the avoidance of considerable dataset dependence. At the same time, human factors' involvement raises ethical issues for the research community, as doubts and concerns about new technologies already exist.

CVNov 18, 2024

Exploring Emerging Trends and Research Opportunities in Visual Place Recognition

Antonios Gasteratos, Konstantinos A. Tsintotas, Tobias Fischer et al.

Visual-based recognition, e.g., image classification, object detection, etc., is a long-standing challenge in computer vision and robotics communities. Concerning the roboticists, since the knowledge of the environment is a prerequisite for complex navigation tasks, visual place recognition is vital for most localization implementations or re-localization and loop closure detection pipelines within simultaneous localization and mapping (SLAM). More specifically, it corresponds to the system's ability to identify and match a previously visited location using computer vision tools. Towards developing novel techniques with enhanced accuracy and robustness, while motivated by the success presented in natural language processing methods, researchers have recently turned their attention to vision-language models, which integrate visual and textual data.

LGApr 14, 2021

Do Neural Network Weights account for Classes Centers?

Ioannis Kansizoglou, Loukas Bampis, Antonios Gasteratos

The exploitation of Deep Neural Networks (DNNs) as descriptors in feature learning challenges enjoys apparent popularity over the past few years. The above tendency focuses on the development of effective loss functions that ensure both high feature discrimination among different classes, as well as low geodesic distance between the feature vectors of a given class. The vast majority of the contemporary works rely their formulation on an empirical assumption about the feature space of a network's last hidden layer, claiming that the weight vector of a class accounts for its geometrical center in the studied space. The paper at hand follows a theoretical approach and indicates that the aforementioned hypothesis is not exclusively met. This fact raises stability issues regarding the training procedure of a DNN, as shown in our experimental study. Consequently, a specific symmetry is proposed and studied both analytically and empirically that satisfies the above assumption, addressing the established convergence issues.

CVSep 29, 2020

Fast and Incremental Loop Closure Detection with Deep Features and Proximity Graphs

Shan An, Haogang Zhu, Dong Wei et al.

In recent years, the robotics community has extensively examined methods concerning the place recognition task within the scope of simultaneous localization and mapping applications.This article proposes an appearance-based loop closure detection pipeline named ``FILD++" (Fast and Incremental Loop closure Detection).First, the system is fed by consecutive images and, via passing them twice through a single convolutional neural network, global and local deep features are extracted.Subsequently, a hierarchical navigable small-world graph incrementally constructs a visual database representing the robot's traversed path based on the computed global features.Finally, a query image, grabbed each time step, is set to retrieve similar locations on the traversed route.An image-to-image pairing follows, which exploits local features to evaluate the spatial information. Thus, in the proposed article, we propose a single network for global and local feature extraction in contrast to our previous work (FILD), while an exhaustive search for the verification process is adopted over the generated deep local features avoiding the utilization of hash codes. Exhaustive experiments on eleven publicly available datasets exhibit the system's high performance (achieving the highest recall score on eight of them) and low execution times (22.05 ms on average in New College, which is the largest one containing 52480 images) compared to other state-of-the-art approaches.

CVAug 8, 2020

HASeparator: Hyperplane-Assisted Softmax

Ioannis Kansizoglou, Nicholas Santavas, Loukas Bampis et al.

Efficient feature learning with Convolutional Neural Networks (CNNs) constitutes an increasingly imperative property since several challenging tasks of computer vision tend to require cascade schemes and modalities fusion. Feature learning aims at CNN models capable of extracting embeddings, exhibiting high discrimination among the different classes, as well as intra-class compactness. In this paper, a novel approach is introduced that has separator, which focuses on an effective hyperplane-based segregation of the classes instead of the common class centers separation scheme. Accordingly, an innovatory separator, namely the Hyperplane-Assisted Softmax separator (HASeparator), is proposed that demonstrates superior discrimination capabilities, as evaluated on popular image classification benchmarks.

CVJun 30, 2020

Deep Feature Space: A Geometrical Perspective

Ioannis Kansizoglou, Loukas Bampis, Antonios Gasteratos

One of the most prominent attributes of Neural Networks (NNs) constitutes their capability of learning to extract robust and descriptive features from high dimensional data, like images. Hence, such an ability renders their exploitation as feature extractors particularly frequent in an abundant of modern reasoning systems. Their application scope mainly includes complex cascade tasks, like multi-modal recognition and deep Reinforcement Learning (RL). However, NNs induce implicit biases that are difficult to avoid or to deal with and are not met in traditional image descriptors. Moreover, the lack of knowledge for describing the intra-layer properties -- and thus their general behavior -- restricts the further applicability of the extracted features. With the paper at hand, a novel way of visualizing and understanding the vector space before the NNs' output layer is presented, aiming to enlighten the deep feature vectors' properties under classification tasks. Main attention is paid to the nature of overfitting in the feature space and its adverse effect on further exploitation. We present the findings that can be derived from our model's formulation, and we evaluate them on realistic recognition scenarios, proving its prominence by improving the obtained results.

CVJan 22, 2020

Attention! A Lightweight 2D Hand Pose Estimation Approach

Nicholas Santavas, Ioannis Kansizoglou, Loukas Bampis et al.

Vision based human pose estimation is an non-invasive technology for Human-Computer Interaction (HCI). Direct use of the hand as an input device provides an attractive interaction method, with no need for specialized sensing equipment, such as exoskeletons, gloves etc, but a camera. Traditionally, HCI is employed in various applications spreading in areas including manufacturing, surgery, entertainment industry and architecture, to mention a few. Deployment of vision based human pose estimation algorithms can give a breath of innovation to these applications. In this letter, we present a novel Convolutional Neural Network architecture, reinforced with a Self-Attention module that it can be deployed on an embedded system, due to its lightweight nature, with just 1.9 Million parameters. The source code and qualitative results are publicly available.

RODec 10, 2013

3D Maps Registration and Path Planning for Autonomous Robot Navigation

Konstantinos Charalampous, Ioannis Kostavelis, Dimitrios Chrysostomou et al.

Mobile robots dedicated in security tasks should be capable of clearly perceiving their environment to competently navigate within cluttered areas, so as to accomplish their assigned mission. The paper in hand describes such an autonomous agent designed to deploy competently in hazardous environments equipped with a laser scanner sensor. During the robot's motion, consecutive scans are obtained to produce dense 3D maps of the area. A 3D point cloud registration technique is exploited to merge the successively created maps during the robot's motion followed by an ICP refinement step. The reconstructed 3D area is then top-down projected with great resolution, to be fed in a path planning algorithm suitable to trace obstacle-free trajectories in the explored area. The main characteristic of the path planner is that the robot's embodiment is considered for producing detailed and safe trajectories of $1$ $cm$ resolution. The proposed method has been evaluated with our mobile robot in several outdoor scenarios revealing remarkable performance.