Kien Nguyen

CV
h-index65
35papers
710citations
Novelty39%
AI Score55

35 Papers

CVOct 12, 2022Code
Deep Learning for Iris Recognition: A Survey

Kien Nguyen, Hugo Proença, Fernando Alonso-Fernandez

In this survey, we provide a comprehensive review of more than 200 papers, technical reports, and GitHub repositories published over the last 10 years on the recent developments of deep learning techniques for iris recognition, covering broad topics on algorithm designs, open-source tools, open challenges, and emerging research. First, we conduct a comprehensive analysis of deep learning techniques developed for two main sub-tasks in iris biometrics: segmentation and recognition. Second, we focus on deep learning techniques for the robustness of iris recognition systems against presentation attacks and via human-machine pairing. Third, we delve deep into deep learning techniques for forensic application, especially in post-mortem iris recognition. Fourth, we review open-source resources and tools in deep learning techniques for iris recognition. Finally, we highlight the technical challenges, emerging research trends, and outlook for the future of deep learning in iris recognition.

CVMar 15, 2023Code
Aerial-Ground Person Re-ID

Huy Nguyen, Kien Nguyen, Sridha Sridharan et al.

Person re-ID matches persons across multiple non-overlapping cameras. Despite the increasing deployment of airborne platforms in surveillance, current existing person re-ID benchmarks' focus is on ground-ground matching and very limited efforts on aerial-aerial matching. We propose a new benchmark dataset - AG-ReID, which performs person re-ID matching in a new setting: across aerial and ground cameras. Our dataset contains 21,983 images of 388 identities and 15 soft attributes for each identity. The data was collected by a UAV flying at altitudes between 15 to 45 meters and a ground-based CCTV camera on a university campus. Our dataset presents a novel elevated-viewpoint challenge for person re-ID due to the significant difference in person appearance across these cameras. We propose an explainable algorithm to guide the person re-ID model's training with soft attributes to address this challenge. Experiments demonstrate the efficacy of our method on the aerial-ground person re-ID task. The dataset will be published and the baseline codes will be open-sourced at https://github.com/huynguyen792/AG-ReID to facilitate research in this area.

56.8CVMay 6
Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting

Binh Long Nguyen, Kien Nguyen, Sridha Sridharan et al.

We introduce Ilov3Splat, a novel framework for instance-level open-vocabulary 3D scene understanding built on 3D Gaussian Splatting (3D-GS). Most prior work depends on 2D rendering-based matching or point-level semantic association, which undermines cross-view consistency, lacks coherent instance-level reasoning, and limits precision in downstream 3D tasks. To address these limitations, our method jointly optimizes scene geometry and semantic representations by augmenting Gaussian splats with view-consistent feature fields. Specifically, we leverage multi-resolution hash embedding to efficiently encode language-aligned CLIP features, enabling dense and coherent language grounding in 3D space. We further train an instance feature field using contrastive loss over SAM masks, supporting fine-grained object distinction across views. At inference time, CLIP-encoded queries are matched against the learned features, followed by two-stage 3D clustering to retrieve relevant Gaussian groups. This enables our framework to identify arbitrary objects in 3D scenes based on natural language descriptions, without requiring category supervision or manual annotations. Experiments on standard benchmarks demonstrate that Ilov3Splat outperforms prior open-vocabulary 3D-GS methods in both object selection and instance segmentation, offering a flexible and accurate solution for language-driven 3D scene understanding. Project page: https://csiro-robotics.github.io/Ilov3Splat.

LGSep 5, 2023
A Survey on Physics Informed Reinforcement Learning: Review and Open Problems

Chayan Banerjee, Kien Nguyen, Clinton Fookes et al.

The inclusion of physical information in machine learning frameworks has revolutionized many application areas. This involves enhancing the learning process by incorporating physical constraints and adhering to physical laws. In this work we explore their utility for reinforcement learning applications. We present a thorough review of the literature on incorporating physics information, as known as physics priors, in reinforcement learning approaches, commonly referred to as physics-informed reinforcement learning (PIRL). We introduce a novel taxonomy with the reinforcement learning pipeline as the backbone to classify existing works, compare and contrast them, and derive crucial insights. Existing works are analyzed with regard to the representation/ form of the governing physics modeled for integration, their specific contribution to the typical reinforcement learning architecture, and their connection to the underlying reinforcement learning pipeline stages. We also identify core learning architectures and physics incorporation biases (i.e., observational, inductive and learning) of existing PIRL approaches and use them to further categorize the works for better understanding and adaptation. By providing a comprehensive perspective on the implementation of the physics-informed capability, the taxonomy presents a cohesive approach to PIRL. It identifies the areas where this approach has been applied, as well as the gaps and opportunities that exist. Additionally, the taxonomy sheds light on unresolved issues and challenges, which can guide future research. This nascent field holds great potential for enhancing reinforcement learning algorithms by increasing their physical plausibility, precision, data efficiency, and applicability in real-world scenarios.

IVJul 13, 2024Code
Size and Smoothness Aware Adaptive Focal Loss for Small Tumor Segmentation

Md Rakibul Islam, Riad Hassan, Abdullah Nazib et al.

Deep learning has achieved remarkable accuracy in medical image segmentation, particularly for larger structures with well-defined boundaries. However, its effectiveness can be challenged by factors such as irregular object shapes and edges, non-smooth surfaces, small target areas, etc. which complicate the ability of networks to grasp the intricate and diverse nature of anatomical regions. In response to these challenges, we propose an Adaptive Focal Loss (A-FL) that takes both object boundary smoothness and size into account, with the goal to improve segmentation performance in intricate anatomical regions. The proposed A-FL dynamically adjusts itself based on an object's surface smoothness, size, and the class balancing parameter based on the ratio of targeted area and background. We evaluated the performance of the A-FL on the PICAI 2022 and BraTS 2018 datasets. In the PICAI 2022 dataset, the A-FL achieved an Intersection over Union (IoU) score of 0.696 and a Dice Similarity Coefficient (DSC) of 0.769, outperforming the regular Focal Loss (FL) by 5.5% and 5.4% respectively. It also surpassed the best baseline by 2.0% and 1.2%. In the BraTS 2018 dataset, A-FL achieved an IoU score of 0.883 and a DSC score of 0.931. Our ablation experiments also show that the proposed A-FL surpasses conventional losses (this includes Dice Loss, Focal Loss, and their hybrid variants) by large margin in IoU, DSC, and other metrics. The code is available at https://github.com/rakibuliuict/AFL-CIBM.git.

CVJul 7, 2023
General-Purpose Multimodal Transformer meets Remote Sensing Semantic Segmentation

Nhi Kieu, Kien Nguyen, Sridha Sridharan et al.

The advent of high-resolution multispectral/hyperspectral sensors, LiDAR DSM (Digital Surface Model) information and many others has provided us with an unprecedented wealth of data for Earth Observation. Multimodal AI seeks to exploit those complementary data sources, particularly for complex tasks like semantic segmentation. While specialized architectures have been developed, they are highly complicated via significant effort in model design, and require considerable re-engineering whenever a new modality emerges. Recent trends in general-purpose multimodal networks have shown great potential to achieve state-of-the-art performance across multiple multimodal tasks with one unified architecture. In this work, we investigate the performance of PerceiverIO, one in the general-purpose multimodal family, in the remote sensing semantic segmentation domain. Our experiments reveal that this ostensibly universal network struggles with object scale variation in remote sensing images and fails to detect the presence of cars from a top-down view. To address these issues, even with extreme class imbalance issues, we propose a spatial and volumetric learning component. Specifically, we design a UNet-inspired module that employs 3D convolution to encode vital local information and learn cross-modal features simultaneously, while reducing network computational burden via the cross-attention mechanism of PerceiverIO. The effectiveness of the proposed component is validated through extensive experiments comparing it with other methods such as 2D convolution, and dual local module (\ie the combination of Conv2D 1x1 and Conv2D 3x3 inspired by UNetFormer). The proposed method achieves competitive results with specialized architectures like UNetFormer and SwinUNet, showing its potential to minimize network architecture engineering with a minimal compromise on the performance.

CVJan 20
DIS2: Disentanglement Meets Distillation with Classwise Attention for Robust Remote Sensing Segmentation under Missing Modalities

Nhi Kieu, Kien Nguyen, Arnold Wiliem et al.

The efficacy of multimodal learning in remote sensing (RS) is severely undermined by missing modalities. The challenge is exacerbated by the RS highly heterogeneous data and huge scale variation. Consequently, paradigms proven effective in other domains often fail when confronted with these unique data characteristics. Conventional disentanglement learning, which relies on significant feature overlap between modalities (modality-invariant), is insufficient for this heterogeneity. Similarly, knowledge distillation becomes an ill-posed mimicry task where a student fails to focus on the necessary compensatory knowledge, leaving the semantic gap unaddressed. Our work is therefore built upon three pillars uniquely designed for RS: (1) principled missing information compensation, (2) class-specific modality contribution, and (3) multi-resolution feature importance. We propose a novel method DIS2, a new paradigm shifting from modality-shared feature dependence and untargeted imitation to active, guided missing features compensation. Its core novelty lies in a reformulated synergy between disentanglement learning and knowledge distillation, termed DLKD. Compensatory features are explicitly captured which, when fused with the features of the available modality, approximate the ideal fused representation of the full-modality case. To address the class-specific challenge, our Classwise Feature Learning Module (CFLM) adaptively learn discriminative evidence for each target depending on signal availability. Both DLKD and CFLM are supported by a hierarchical hybrid fusion (HF) structure using features across resolutions to strengthen prediction. Extensive experiments validate that our proposed approach significantly outperforms state-of-the-art methods across benchmarks.

IVAug 2, 2024
PINNs for Medical Image Analysis: A Survey

Chayan Banerjee, Kien Nguyen, Olivier Salvado et al.

The incorporation of physical information in machine learning frameworks is transforming medical image analysis (MIA). By integrating fundamental knowledge and governing physical laws, these models achieve enhanced robustness and interpretability. In this work, we explore the utility of physics-informed approaches for MIA (PIMIA) tasks such as registration, generation, classification, and reconstruction. We present a systematic literature review of over 80 papers on physics-informed methods dedicated to MIA. We propose a unified taxonomy to investigate what physics knowledge and processes are modelled, how they are represented, and the strategies to incorporate them into MIA models. We delve deep into a wide range of image analysis tasks, from imaging, generation, prediction, inverse imaging (super-resolution and reconstruction), registration, and image analysis (segmentation and classification). For each task, we thoroughly examine and present in a tabular format the central physics-guided operation, the region of interest (with respect to human anatomy), the corresponding imaging modality, the dataset used for model training, the deep network architecture employed, and the primary physical process, equation, or principle utilized. Additionally, we also introduce a novel metric to compare the performance of PIMIA methods across different tasks and datasets. Based on this review, we summarize and distil our perspectives on the challenges, open research questions, and directions for future research. We highlight key open challenges in PIMIA, including selecting suitable physics priors and establishing a standardized benchmarking platform.

CVJan 5, 2024Code
AG-ReID.v2: Bridging Aerial and Ground Views for Person Re-identification

Huy Nguyen, Kien Nguyen, Sridha Sridharan et al.

Aerial-ground person re-identification (Re-ID) presents unique challenges in computer vision, stemming from the distinct differences in viewpoints, poses, and resolutions between high-altitude aerial and ground-based cameras. Existing research predominantly focuses on ground-to-ground matching, with aerial matching less explored due to a dearth of comprehensive datasets. To address this, we introduce AG-ReID.v2, a dataset specifically designed for person Re-ID in mixed aerial and ground scenarios. This dataset comprises 100,502 images of 1,615 unique individuals, each annotated with matching IDs and 15 soft attribute labels. Data were collected from diverse perspectives using a UAV, stationary CCTV, and smart glasses-integrated camera, providing a rich variety of intra-identity variations. Additionally, we have developed an explainable attention network tailored for this dataset. This network features a three-stream architecture that efficiently processes pairwise image distances, emphasizes key top-down features, and adapts to variations in appearance due to altitude differences. Comparative evaluations demonstrate the superiority of our approach over existing baselines. We plan to release the dataset and algorithm source code publicly, aiming to advance research in this specialized field of computer vision. For access, please visit https://github.com/huynguyen792/AG-ReID.v2.

CVMar 11, 2025Code
AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification

Huy Nguyen, Kien Nguyen, Akila Pemasiri et al.

We introduce AG-VPReID, a new large-scale dataset for aerial-ground video-based person re-identification (ReID) that comprises 6,632 subjects, 32,321 tracklets and over 9.6 million frames captured by drones (altitudes ranging from 15-120m), CCTV, and wearable cameras. This dataset offers a real-world benchmark for evaluating the robustness to significant viewpoint changes, scale variations, and resolution differences in cross-platform aerial-ground settings. In addition, to address these challenges, we propose AG-VPReID-Net, an end-to-end framework composed of three complementary streams: (1) an Adapted Temporal-Spatial Stream addressing motion pattern inconsistencies and facilitating temporal feature learning, (2) a Normalized Appearance Stream leveraging physics-informed techniques to tackle resolution and appearance changes, and (3) a Multi-Scale Attention Stream handling scale variations across drone altitudes. We integrate visual-semantic cues from all streams to form a robust, viewpoint-invariant whole-body representation. Extensive experiments demonstrate that AG-VPReID-Net outperforms state-of-the-art approaches on both our new dataset and existing video-based ReID benchmarks, showcasing its effectiveness and generalizability. Nevertheless, the performance gap observed on AG-VPReID across all methods underscores the dataset's challenging nature. The dataset, code and trained models are available at https://github.com/agvpreid25/AG-VPReID-Net.

CVJul 24, 2025Code
AG-VPReID.VIR: Bridging Aerial and Ground Platforms for Video-based Visible-Infrared Person Re-ID

Huy Nguyen, Kien Nguyen, Akila Pemasiri et al.

Person re-identification (Re-ID) across visible and infrared modalities is crucial for 24-hour surveillance systems, but existing datasets primarily focus on ground-level perspectives. While ground-based IR systems offer nighttime capabilities, they suffer from occlusions, limited coverage, and vulnerability to obstructions--problems that aerial perspectives uniquely solve. To address these limitations, we introduce AG-VPReID.VIR, the first aerial-ground cross-modality video-based person Re-ID dataset. This dataset captures 1,837 identities across 4,861 tracklets (124,855 frames) using both UAV-mounted and fixed CCTV cameras in RGB and infrared modalities. AG-VPReID.VIR presents unique challenges including cross-viewpoint variations, modality discrepancies, and temporal dynamics. Additionally, we propose TCC-VPReID, a novel three-stream architecture designed to address the joint challenges of cross-platform and cross-modality person Re-ID. Our approach bridges the domain gaps between aerial-ground perspectives and RGB-IR modalities, through style-robust feature learning, memory-based cross-view adaptation, and intermediary-guided temporal modeling. Experiments show that AG-VPReID.VIR presents distinctive challenges compared to existing datasets, with our TCC-VPReID framework achieving significant performance gains across multiple evaluation protocols. Dataset and code are available at https://github.com/agvpreid25/AG-VPReID.VIR.

69.5CVMar 24
InverFill: One-Step Inversion for Enhanced Few-Step Diffusion Inpainting

Duc Vu, Kien Nguyen, Trong-Tung Nguyen et al.

Recent diffusion-based models achieve photorealism in image inpainting but require many sampling steps, limiting practical use. Few-step text-to-image models offer faster generation, but naively applying them to inpainting yields poor harmonization and artifacts between the background and inpainted region. We trace this cause to random Gaussian noise initialization, which under low function evaluations causes semantic misalignment and reduced fidelity. To overcome this, we propose InverFill, a one-step inversion method tailored for inpainting that injects semantic information from the input masked image into the initial noise, enabling high-fidelity few-step inpainting. Instead of training inpainting models, InverFill leverages few-step text-to-image models in a blended sampling pipeline with semantically aligned noise as input, significantly improving vanilla blended sampling and even matching specialized inpainting models at low NFEs. Moreover, InverFill does not require real-image supervision and only adds minimal inference overhead. Extensive experiments show that InverFill consistently boosts baseline few-step models, improving image quality and text coherence without costly retraining or heavy iterative optimization.

LGMar 24, 2025Code
Mining--Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling

Chayan Banerjee, Kien Nguyen, Clinton Fookes

Optimizing the mining process -- particularly truck dispatch scheduling -- is a key driver of efficiency in open-pit operations. However, the dynamic and stochastic nature of these environments, with uncertainties such as equipment failures, truck maintenance, and variable haul cycle times, challenges traditional optimization. While Reinforcement Learning (RL) shows strong potential for adaptive decision-making in mining logistics, practical deployment requires evaluation in realistic, customizable simulation environments. The lack of standardized benchmarking hampers fair algorithm comparison, reproducibility, and real-world applicability of RL solutions. To address this, we present Mining-Gym -- a configurable, open-source benchmarking environment for training, testing, and evaluating RL algorithms in mining process optimization. Built on Salabim-based Discrete Event Simulation (DES) and integrated with Gymnasium, Mining-Gym captures mining-specific uncertainties through an event-driven decision-point architecture. It offers a GUI for parameter configuration, data logging, and real-time visualization, supporting reproducible evaluation of RL strategies and heuristic baselines. We validate Mining-Gym by comparing classical heuristics with RL-based scheduling across six scenarios from normal operation to severe equipment failures. Results show it is an effective, reproducible testbed, enabling fair evaluation of adaptive decision-making and demonstrating the strong performance potential of RL-trained schedulers.

CVFeb 9, 2021Code
In Defense of Scene Graphs for Image Captioning

Kien Nguyen, Subarna Tripathi, Bang Du et al.

The mainstream image captioning models rely on Convolutional Neural Network (CNN) image features to generate captions via recurrent models. Recently, image scene graphs have been used to augment captioning models so as to leverage their structural semantics, such as object entities, relationships and attributes. Several studies have noted that the naive use of scene graphs from a black-box scene graph generator harms image captioning performance and that scene graph-based captioning models have to incur the overhead of explicit use of image features to generate decent captions. Addressing these challenges, we propose \textbf{SG2Caps}, a framework that utilizes only the scene graph labels for competitive image captioning performance. The basic idea is to close the semantic gap between the two scene graphs - one derived from the input image and the other from its caption. In order to achieve this, we leverage the spatial location of objects and the Human-Object-Interaction (HOI) labels as an additional HOI graph. SG2Caps outperforms existing scene graph-only captioning models by a large margin, indicating scene graphs as a promising representation for image captioning. Direct utilization of scene graph labels avoids expensive graph convolutions over high-dimensional CNN features resulting in 49% fewer trainable parameters. Our code is available at: https://github.com/Kien085/SG2Caps

CVFeb 22
IDSelect: A RL-Based Cost-Aware Selection Agent for Video-based Multi-Modal Person Recognition

Yuyang Ji, Yixuan Shen, Kien Nguyen et al.

Video-based person recognition achieves robust identification by integrating face, body, and gait. However, current systems waste computational resources by processing all modalities with fixed heavyweight ensembles regardless of input complexity. To address these limitations, we propose IDSelect, a reinforcement learning-based cost-aware selector that chooses one pre-trained model per modality per-sequence to optimize the accuracy-efficiency trade-off. Our key insight is that an input-conditioned selector can discover complementary model choices that surpass fixed ensembles while using substantially fewer resources. IDSelect trains a lightweight agent end-to-end using actor-critic reinforcement learning with budget-aware optimization. The reward balances recognition accuracy with computational cost, while entropy regularization prevents premature convergence. At inference, the policy selects the most probable model per modality and fuses modality-specific similarities for the final score. Extensive experiments on challenging video-based datasets demonstrate IDSelect's superior efficiency: on CCVID, it achieves 95.9% Rank-1 accuracy with 92.4% less computation than strong baselines while improving accuracy by 1.8%; on MEVID, it reduces computation by 41.3% while maintaining competitive performance.

CVNov 21, 2025
Person Recognition in Aerial Surveillance: A Decade Survey

Kien Nguyen, Feng Liu, Clinton Fookes et al.

The rapid emergence of airborne platforms and imaging sensors is enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment, and covert observation capabilities. This paper provides a comprehensive overview of 150+ papers over the last 10 years of human-centric aerial surveillance tasks from a computer vision and machine learning perspective. It aims to provide readers with an in-depth systematic review and technical analysis of the current state of aerial surveillance tasks using drones, UAVs, and other airborne platforms. The object of interest is humans, where human subjects are to be detected, identified, and re-identified. More specifically, for each of these tasks, we first identify unique challenges in performing these tasks in an aerial setting compared to the popular ground-based setting and subsequently compile and analyze aerial datasets publicly available for each task. Most importantly, we delve deep into the approaches in the aerial surveillance literature with a focus on investigating how they presently address aerial challenges and techniques for improvement. We conclude the paper by discussing the gaps and open research questions to inform future research avenues.

LGSep 22, 2025
Physics-Informed Operator Learning for Hemodynamic Modeling

Ryan Chappell, Chayan Banerjee, Kien Nguyen et al.

Accurate modeling of personalized cardiovascular dynamics is crucial for non-invasive monitoring and therapy planning. State-of-the-art physics-informed neural network (PINN) approaches employ deep, multi-branch architectures with adversarial or contrastive objectives to enforce partial differential equation constraints. While effective, these enhancements introduce significant training and implementation complexity, limiting scalability and practical deployment. We investigate physics-informed neural operator learning models as efficient supervisory signals for training simplified architectures through knowledge distillation. Our approach pre-trains a physics-informed DeepONet (PI-DeepONet) on high-fidelity cuffless blood pressure recordings to learn operator mappings from raw wearable waveforms to beat-to-beat pressure signals under embedded physics constraints. This pre-trained operator serves as a frozen supervisor in a lightweight knowledge-distillation pipeline, guiding streamlined base models that eliminate complex adversarial and contrastive learning components while maintaining performance. We characterize the role of physics-informed regularization in operator learning and demonstrate its effectiveness for supervisory guidance. Through extensive experiments, our operator-supervised approach achieves performance parity with complex baselines (correlation: 0.766 vs. 0.770, RMSE: 4.452 vs. 4.501), while dramatically reducing architectural complexity from eight critical hyperparameters to a single regularization coefficient and decreasing training overhead by 4%. Our results demonstrate that operator-based supervision effectively replaces intricate multi-component training strategies, offering a more scalable and interpretable approach to physiological modeling with reduced implementation burden.

CVSep 14, 2025
Filling the Gaps: A Multitask Hybrid Multiscale Generative Framework for Missing Modality in Remote Sensing Semantic Segmentation

Nhi Kieu, Kien Nguyen, Arnold Wiliem et al.

Multimodal learning has shown significant performance boost compared to ordinary unimodal models across various domains. However, in real-world scenarios, multimodal signals are susceptible to missing because of sensor failures and adverse weather conditions, which drastically deteriorates models' operation and performance. Generative models such as AutoEncoder (AE) and Generative Adversarial Network (GAN) are intuitive solutions aiming to reconstruct missing modality from available ones. Yet, their efficacy in remote sensing semantic segmentation remains underexplored. In this paper, we first examine the limitations of existing generative approaches in handling the heterogeneity of multimodal remote sensing data. They inadequately capture semantic context in complex scenes with large intra-class and small inter-class variation. In addition, traditional generative models are susceptible to heavy dependence on the dominant modality, introducing bias that affects model robustness under missing modality conditions. To tackle these limitations, we propose a novel Generative-Enhanced MultiModal learning Network (GEMMNet) with three key components: (1) Hybrid Feature Extractor (HyFEx) to effectively learn modality-specific representations, (2) Hybrid Fusion with Multiscale Awareness (HyFMA) to capture modality-synergistic semantic context across scales and (3) Complementary Loss (CoLoss) scheme to alleviate the inherent bias by encouraging consistency across modalities and tasks. Our method, GEMMNet, outperforms both generative baselines AE, cGAN (conditional GAN), and state-of-the-art non-generative approaches - mmformer and shaspec - on two challenging semantic segmentation remote sensing datasets (Vaihingen and Potsdam). Source code is made available.

CVSep 6, 2025
SuMa: A Subspace Mapping Approach for Robust and Effective Concept Erasure in Text-to-Image Diffusion Models

Kien Nguyen, Anh Tran, Cuong Pham

The rapid growth of text-to-image diffusion models has raised concerns about their potential misuse in generating harmful or unauthorized contents. To address these issues, several Concept Erasure methods have been proposed. However, most of them fail to achieve both robustness, i.e., the ability to robustly remove the target concept., and effectiveness, i.e., maintaining image quality. While few recent techniques successfully achieve these goals for NSFW concepts, none could handle narrow concepts such as copyrighted characters or celebrities. Erasing these narrow concepts is critical in addressing copyright and legal concerns. However, erasing them is challenging due to their close distances to non-target neighboring concepts, requiring finer-grained manipulation. In this paper, we introduce Subspace Mapping (SuMa), a novel method specifically designed to achieve both robustness and effectiveness in easing these narrow concepts. SuMa first derives a target subspace representing the concept to be erased and then neutralizes it by mapping it to a reference subspace that minimizes the distance between the two. This mapping ensures the target concept is robustly erased while preserving image quality. We conduct extensive experiments with SuMa across four tasks: subclass erasure, celebrity erasure, artistic style erasure, and instance erasure and compare the results with current state-of-the-art methods. Our method achieves image quality comparable to approaches focused on effectiveness, while also yielding results that are on par with methods targeting completeness.

CVJun 28, 2025
AG-VPReID 2025: Aerial-Ground Video-based Person Re-identification Challenge Results

Kien Nguyen, Clinton Fookes, Sridha Sridharan et al.

Person re-identification (ReID) across aerial and ground vantage points has become crucial for large-scale surveillance and public safety applications. Although significant progress has been made in ground-only scenarios, bridging the aerial-ground domain gap remains a formidable challenge due to extreme viewpoint differences, scale variations, and occlusions. Building upon the achievements of the AG-ReID 2023 Challenge, this paper introduces the AG-VPReID 2025 Challenge - the first large-scale video-based competition focused on high-altitude (80-120m) aerial-ground ReID. Constructed on the new AG-VPReID dataset with 3,027 identities, over 13,500 tracklets, and approximately 3.7 million frames captured from UAVs, CCTV, and wearable cameras, the challenge featured four international teams. These teams developed solutions ranging from multi-stream architectures to transformer-based temporal reasoning and physics-informed modeling. The leading approach, X-TFCLIP from UAM, attained 72.28% Rank-1 accuracy in the aerial-to-ground ReID setting and 70.77% in the ground-to-aerial ReID setting, surpassing existing baselines while highlighting the dataset's complexity. For additional details, please refer to the official website at https://agvpreid25.github.io.

IVMay 29, 2023
Physics-Informed Computer Vision: A Review and Perspectives

Chayan Banerjee, Kien Nguyen, Clinton Fookes et al.

The incorporation of physical information in machine learning frameworks is opening and transforming many application domains. Here the learning process is augmented through the induction of fundamental knowledge and governing physical laws. In this work, we explore their utility for computer vision tasks in interpreting and understanding visual data. We present a systematic literature review of more than 250 papers on formulation and approaches to computer vision tasks guided by physical laws. We begin by decomposing the popular computer vision pipeline into a taxonomy of stages and investigate approaches to incorporate governing physical equations in each stage. Existing approaches in computer vision tasks are analyzed with regard to what governing physical processes are modeled and formulated, and how they are incorporated, i.e. modification of input data (observation bias), modification of network architectures (inductive bias), and modification of training losses (learning bias). The taxonomy offers a unified view of the application of the physics-informed capability, highlighting where physics-informed learning has been conducted and where the gaps and opportunities are. Finally, we highlight open problems and challenges to inform future research. While still in its early days, the study of physics-informed computer vision has the promise to develop better computer vision models that can improve physical plausibility, accuracy, data efficiency, and generalization in increasingly realistic applications.

CVMay 1, 2023
Physical Adversarial Attacks for Surveillance: A Survey

Kien Nguyen, Tharindu Fernando, Clinton Fookes et al.

Modern automated surveillance techniques are heavily reliant on deep learning methods. Despite the superior performance, these learning systems are inherently vulnerable to adversarial attacks - maliciously crafted inputs that are designed to mislead, or trick, models into making incorrect predictions. An adversary can physically change their appearance by wearing adversarial t-shirts, glasses, or hats or by specific behavior, to potentially avoid various forms of detection, tracking and recognition of surveillance systems; and obtain unauthorized access to secure properties and assets. This poses a severe threat to the security and safety of modern surveillance systems. This paper reviews recent attempts and findings in learning and designing physical adversarial attacks for surveillance applications. In particular, we propose a framework to analyze physical adversarial attacks and provide a comprehensive survey of physical adversarial attacks on four key surveillance tasks: detection, identification, tracking, and action recognition under this framework. Furthermore, we review and analyze strategies to defend against the physical adversarial attacks and the methods for evaluating the strengths of the defense. The insights in this paper present an important step in building resilience within surveillance systems to physical adversarial attacks.

CVJan 9, 2022
The State of Aerial Surveillance: A Survey

Kien Nguyen, Clinton Fookes, Sridha Sridharan et al.

The rapid emergence of airborne platforms and imaging sensors are enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment and covert observation capabilities. This paper provides a comprehensive overview of human-centric aerial surveillance tasks from a computer vision and pattern recognition perspective. It aims to provide readers with an in-depth systematic review and technical analysis of the current state of aerial surveillance tasks using drones, UAVs and other airborne platforms. The main object of interest is humans, where single or multiple subjects are to be detected, identified, tracked, re-identified and have their behavior analyzed. More specifically, for each of these four tasks, we first discuss unique challenges in performing these tasks in an aerial setting compared to a ground-based setting. We then review and analyze the aerial datasets publicly available for each task, and delve deep into the approaches in the aerial literature and investigate how they presently address the aerial challenges. We conclude the paper with discussion on the missing gaps and open research questions to inform future research avenues.

MLOct 7, 2021
Gaussian Process for Trajectories

Kien Nguyen, John Krumm, Cyrus Shahabi

The Gaussian process is a powerful and flexible technique for interpolating spatiotemporal data, especially with its ability to capture complex trends and uncertainty from the input signal. This chapter describes Gaussian processes as an interpolation technique for geospatial trajectories. A Gaussian process models measurements of a trajectory as coming from a multidimensional Gaussian, and it produces for each timestamp a Gaussian distribution as a prediction. We discuss elements that need to be considered when applying Gaussian process to trajectories, common choices for those elements, and provide a concrete example of implementing a Gaussian process.

CVNov 25, 2020
Adversarial Attack on Facial Recognition using Visible Light

Morgan Frearson, Kien Nguyen

The use of deep learning for human identification and object detection is becoming ever more prevalent in the surveillance industry. These systems have been trained to identify human body's or faces with a high degree of accuracy. However, there have been successful attempts to fool these systems with different techniques called adversarial attacks. This paper presents a final report for an adversarial attack using visible light on facial recognition systems. The relevance of this research is to exploit the physical downfalls of deep neural networks. This demonstration of weakness within these systems are in hopes that this research will be used in the future to improve the training models for object recognition. As results were gathered the project objectives were adjusted to fit the outcomes. Because of this the following paper initially explores an adversarial attack using infrared light before readjusting to a visible light attack. A research outline on infrared light and facial recognition are presented within. A detailed analyzation of the current findings and possible future recommendations of the project are presented. The challenges encountered are evaluated and a final solution is delivered. The projects final outcome exhibits the ability to effectively fool recognition systems using light.

CVNov 23, 2020
Complex-valued Iris Recognition Network

Kien Nguyen, Clinton Fookes, Sridha Sridharan et al.

In this work, we design a fully complex-valued neural network for the task of iris recognition. Unlike the problem of general object recognition, where real-valued neural networks can be used to extract pertinent features, iris recognition depends on the extraction of both phase and magnitude information from the input iris texture in order to better represent its biometric content. This necessitates the extraction and processing of phase information that cannot be effectively handled by a real-valued neural network. In this regard, we design a fully complex-valued neural network that can better capture the multi-scale, multi-resolution, and multi-orientation phase and amplitude features of the iris texture. We show a strong correspondence of the proposed complex-valued iris recognition network with Gabor wavelets that are used to generate the classical IrisCode; however, the proposed method enables a new capability of automatic complex-valued feature learning that is tailored for iris recognition. We conduct experiments on three benchmark datasets - ND-CrossSensor-2013, CASIA-Iris-Thousand and UBIRIS.v2 - and show the benefit of the proposed network for the task of iris recognition. We exploit visualization schemes to convey how the complex-valued network, when compared to standard real-valued networks, extracts fundamentally different features from the iris texture.

CRAug 25, 2020
Spatial Privacy Pricing: The Interplay between Privacy, Utility and Price in Geo-Marketplaces

Kien Nguyen, John Krumm, Cyrus Shahabi

A geo-marketplace allows users to be paid for their location data. Users concerned about privacy may want to charge more for data that pinpoints their location accurately, but may charge less for data that is more vague. A buyer would prefer to minimize data costs, but may have to spend more to get the necessary level of accuracy. We call this interplay between privacy, utility, and price \emph{spatial privacy pricing}. We formalize the issues mathematically with an example problem of a buyer deciding whether or not to open a restaurant by purchasing location data to determine if the potential number of customers is sufficient to open. The problem is expressed as a sequential decision making problem, where the buyer first makes a series of decisions about which data to buy and concludes with a decision about opening the restaurant or not. We present two algorithms to solve this problem, including experiments that show they perform better than baselines.

CRApr 20, 2020
A Secure Location-based Alert System with Tunable Privacy-Performance Trade-off

Gabriel Ghinita, Kien Nguyen, Mihai Maruseac et al.

Monitoring location updates from mobile users has important applications in many areas, ranging from public safety and national security to social networks and advertising. However, sensitive information can be derived from movement patterns, thus protecting the privacy of mobile users is a major concern. Users may only be willing to disclose their locations when some condition is met, for instance in proximity of a disaster area or an event of interest. Currently, such functionality can be achieved using searchable encryption. Such cryptographic primitives provide provable guarantees for privacy, and allow decryption only when the location satisfies some predicate. Nevertheless, they rely on expensive pairing-based cryptography (PBC), of which direct application to the domain of location updates leads to impractical solutions. We propose secure and efficient techniques for private processing of location updates that complement the use of PBC and lead to significant gains in performance by reducing the amount of required pairing operations. We implement two optimizations that further improve performance: materialization of results to expensive mathematical operations, and parallelization. We also propose an heuristic that brings down the computational overhead through enlarging an alert zone by a small factor (given as system parameter), therefore trading off a small and controlled amount of privacy for significant performance gains. Extensive experimental results show that the proposed techniques significantly improve performance compared to the baseline, and reduce the searchable encryption overhead to a level that is practical in a computing environment with reasonable resources, such as the cloud.

CRSep 2, 2019
Differentially Private Publication of Location Entropy

Hien To, Kien Nguyen, Cyrus Shahabi

Location entropy (LE) is a popular metric for measuring the popularity of various locations (e.g., points-of-interest). Unlike other metrics computed from only the number of (unique) visits to a location, namely frequency, LE also captures the diversity of the users' visits, and is thus more accurate than other metrics. Current solutions for computing LE require full access to the past visits of users to locations, which poses privacy threats. This paper discusses, for the first time, the problem of perturbing location entropy for a set of locations according to differential privacy. The problem is challenging because removing a single user from the dataset will impact multiple records of the database; i.e., all the visits made by that user to various locations. Towards this end, we first derive non-trivial, tight bounds for both local and global sensitivity of LE, and show that to satisfy $ε$-differential privacy, a large amount of noise must be introduced, rendering the published results useless. Hence, we propose a thresholding technique to limit the number of users' visits, which significantly reduces the perturbation error but introduces an approximation error. To achieve better utility, we extend the technique by adopting two weaker notions of privacy: smooth sensitivity (slightly weaker) and crowd-blending (strictly weaker). Extensive experiments on synthetic and real-world datasets show that our proposed techniques preserve original data distribution without compromising location privacy.

CRSep 1, 2019
A Privacy-Preserving, Accountable and Spam-Resilient Geo-Marketplace

Kien Nguyen, Gabriel Ghinita, Muhammad Naveed et al.

Mobile devices with rich features can record videos, traffic parameters or air quality readings along user trajectories. Although such data may be valuable, users are seldom rewarded for collecting them. Emerging digital marketplaces allow owners to advertise their data to interested buyers. We focus on geo-marketplaces, where buyers search data based on geo-tags. Such marketplaces present significant challenges. First, if owners upload data with revealed geo-tags, they expose themselves to serious privacy risks. Second, owners must be accountable for advertised data, and must not be allowed to subsequently alter geo-tags. Third, such a system may be vulnerable to intensive spam activities, where dishonest owners flood the system with fake advertisements. We propose a geo-marketplace that addresses all these concerns. We employ searchable encryption, digital commitments, and blockchain to protect the location privacy of owners while at the same time incorporating accountability and spam-resilience mechanisms. We implement a prototype with two alternative designs that obtain distinct trade-offs between trust assumptions and performance. Our experiments on real location data show that one can achieve the above design goals with practical performance and reasonable financial overhead.

CVMay 23, 2019
Constrained Design of Deep Iris Networks

Kien Nguyen, Clinton Fookes, Sridha Sridharan

Despite the promise of recent deep neural networks in the iris recognition setting, there are vital properties of the classic IrisCode which are almost unable to be achieved with current deep iris networks: the compactness of model and the small number of computing operations (FLOPs). This paper re-models the iris network design process as a constrained optimization problem which takes model size and computation into account as learning criteria. On one hand, this allows us to fully automate the network design process to search for the best iris network confined to the computation and model compactness constraints. On the other hand, it allows us to investigate the optimality of the classic IrisCode and recent iris networks. It also allows us to learn an optimal iris network and demonstrate state-of-the-art performance with less computation and memory requirements.

CVJun 10, 2018
Semantic Correspondence: A Hierarchical Approach

Akila Pemasiri, Kien Nguyen, Sridha Sridhara et al.

Establishing semantic correspondence across images when the objects in the images have undergone complex deformations remains a challenging task in the field of computer vision. In this paper, we propose a hierarchical method to tackle this problem by first semantically targeting the foreground objects to localize the search space and then looking deeply into multiple levels of the feature representation to search for point-level correspondence. In contrast to existing approaches, which typically penalize large discrepancies, our approach allows for significant displacements, with the aim to accommodate large deformations of the objects in scene. Localizing the search space by semantically matching object-level correspondence, our method robustly handles large deformations of objects. Representing the target region by concatenated hypercolumn features which take into account the hierarchical levels of the surrounding context, helps to clear the ambiguity to further improve the accuracy. By conducting multiple experiments across scenes with non-rigid objects, we validate the proposed approach, and show that it outperforms the state of the art methods for semantic correspondence establishment.

CVJun 9, 2018
Sparse Over-complete Patch Matching

Akila Pemasiri, Kien Nguyen, Sridha Sridharan et al.

Image patch matching, which is the process of identifying corresponding patches across images, has been used as a subroutine for many computer vision and image processing tasks. State -of-the-art patch matching techniques take image patches as input to a convolutional neural network to extract the patch features and evaluate their similarity. Our aim in this paper is to improve on the state of the art patch matching techniques by observing the fact that a sparse-overcomplete representation of an image posses statistical properties of natural visual scenes which can be exploited for patch matching. We propose a new paradigm which encodes image patch details by encoding the patch and subsequently using this sparse representation as input to a neural network to compare the patches. As sparse coding is based on a generative model of natural image patches, it can represent the patch in terms of the fundamental visual components from which it has been composed of, leading to similar sparse codes for patches which are built from similar components. Once the sparse coded features are extracted, we employ a fully-connected neural network, which captures the non-linear relationships between features, for comparison. We have evaluated our approach using the Liberty and Notredame subsets of the popular UBC patch dataset and set a new benchmark outperforming all state-of-the-art patch matching techniques for these datasets.

CVMay 25, 2018
Meta Transfer Learning for Facial Emotion Recognition

Dung Nguyen, Kien Nguyen, Sridha Sridharan et al.

The use of deep learning techniques for automatic facial expression recognition has recently attracted great interest but developed models are still unable to generalize well due to the lack of large emotion datasets for deep learning. To overcome this problem, in this paper, we propose utilizing a novel transfer learning approach relying on PathNet and investigate how knowledge can be accumulated within a given dataset and how the knowledge captured from one emotion dataset can be transferred into another in order to improve the overall performance. To evaluate the robustness of our system, we have conducted various sets of experiments on two emotion datasets: SAVEE and eNTERFACE. The experimental results demonstrate that our proposed system leads to improvement in performance of emotion recognition and performs significantly better than the recent state-of-the-art schemes adopting fine-\ tuning/pre-trained approaches.

LGJan 11, 2018
Deep Classification of Epileptic Signals

David Ahmedt-Aristizabal, Clinton Fookes, Kien Nguyen et al.

Electrophysiological observation plays a major role in epilepsy evaluation. However, human interpretation of brain signals is subjective and prone to misdiagnosis. Automating this process, especially seizure detection relying on scalp-based Electroencephalography (EEG) and intracranial EEG, has been the focus of research over recent decades. Nevertheless, its numerous challenges have inhibited a definitive solution. Inspired by recent advances in deep learning, we propose a new classification approach for EEG time series based on Recurrent Neural Networks (RNNs) via the use of Long-Short Term Memory (LSTM) networks. The proposed deep network effectively learns and models discriminative temporal patterns from EEG sequential data. Especially, the features are automatically discovered from the raw EEG data without any pre-processing step, eliminating humans from laborious feature design task. We also show that, in the epilepsy scenario, simple architectures can achieve competitive performance. Using simple architectures significantly benefits in the practical scenario considering their low computation complexity and reduced requirement for large training datasets. Using a public dataset, a multi-fold cross-validation scheme exhibited an average validation accuracy of 95.54\% and an average AUC of 0.9582 of the ROC curve among all sets defined in the experiment. This work reinforces the benefits of deep learning to be further attended in clinical applications and neuroscientific research.