Huong Ninh

h-index: 3

4 papers

32 citations

Novelty: 50%

AI Score: 34

Ranked #112,129 of 194,257 authors (top 58%); #37,449 in CV (top 63%)

4 Papers

5.9 · CV · Jul 16

SwinAD: Multi-stage feature reconstruction for unsupervised industrial anomaly detection

Huong Ninh, Chien Thai, Mai Xuan Trang et al.

Industrial anomaly detection aims to identify and localize defective regions without relying on exhaustive annotations of all possible defect types. Although recent unsupervised methods have achieved strong performance, most are designed primarily for single-class settings and often struggle in multi-class scenarios, where diverse normal patterns can lead to over-generalization and weaken the discrimination between normal and anomalous regions. In this paper, we propose SwinAD, a reconstruction-based framework for multi-class unsupervised anomaly detection that leverages a frozen pretrained Swin Transformer V2 encoder and a feature diversity-preserving reconstruction decoder. The hierarchical encoder provides semantically rich multi-scale features, while stage-wise bottleneck modules with dropout prevent trivial identity mapping and encourage robust reconstruction of normal patterns. To further improve localization, the feature diversity-preserving reconstruction framework maintains complementary reconstruction hypotheses instead of relying on a single decoding branch. The discrepancies between the encoder features and the two reconstructed features are then aggregated across multiple scales to produce the final anomaly map. Experiments on three industrial anomaly detection benchmarks, MVTec AD, VisA, and Real-IAD, demonstrate that SwinAD achieves competitive image-level performance and strong pixel-level localization accuracy, with particularly notable improvements in pixel-level AP and F1 on MVTec AD. These results indicate that combining hierarchical Swin features with diverse multi-scale reconstruction substantially improves pixel-level localization in the multi-class unsupervised anomaly detection setting.
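The abstract describes scoring anomalies by comparing encoder features with two reconstructions at several scales and aggregating the result. The NumPy sketch below illustrates only that aggregation step under stated assumptions: cosine distance as the per-pixel discrepancy, a simple average of the two reconstruction branches, nearest-neighbour upsampling by integer factors to a common resolution, and a plain mean across scales. None of these choices is confirmed by the abstract, and the function names are hypothetical.

```python
import numpy as np


def cosine_distance_map(enc, rec, eps=1e-8):
    """Per-pixel 1 - cosine similarity between two (C, H, W) feature maps."""
    dot = (enc * rec).sum(axis=0)
    norms = np.linalg.norm(enc, axis=0) * np.linalg.norm(rec, axis=0)
    return 1.0 - dot / (norms + eps)


def upsample_nearest(m, out_hw):
    """Nearest-neighbour resize of a 2-D map (integer scale factors assumed)."""
    fh = out_hw[0] // m.shape[0]
    fw = out_hw[1] // m.shape[1]
    return np.repeat(np.repeat(m, fh, axis=0), fw, axis=1)


def anomaly_map(enc_feats, rec_a, rec_b, out_hw):
    """Hypothetical multi-scale aggregation of two reconstruction branches.

    enc_feats, rec_a, rec_b: lists of (C_s, H_s, W_s) arrays, one per stage.
    Returns an out_hw map; higher values mean larger reconstruction error.
    """
    total = np.zeros(out_hw)
    for e, ra, rb in zip(enc_feats, rec_a, rec_b):
        # Average the discrepancy of both reconstruction hypotheses at this scale.
        d = 0.5 * (cosine_distance_map(e, ra) + cosine_distance_map(e, rb))
        total += upsample_nearest(d, out_hw)
    return total / len(enc_feats)
```

For example, with three stages of shapes (8, 16, 16), (16, 8, 8) and (32, 4, 4), `anomaly_map(enc, rec_a, rec_b, (16, 16))` returns a 16 x 16 map that is near zero wherever both reconstructions match the encoder features. Many reconstruction-based methods also smooth such a map before thresholding; that step is omitted here.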

2.6 · CV · Oct 25, 2022

An Effective Deep Network for Head Pose Estimation without Keypoints

Chien Thai, Viet Tran, Minh Bui et al.

Human head pose estimation has become an essential problem in facial analysis in recent years, with many computer vision applications such as gaze estimation, virtual reality, and driver assistance. Given its importance, it is necessary to design a compact model for this task that reduces computational cost when deployed in facial analysis-based applications, such as large camera surveillance systems and AI cameras, while maintaining accuracy. In this work, we propose a lightweight model that effectively addresses head pose estimation. Our approach has two main steps: 1) we first train multiple teacher models on the synthetic dataset 300W-LPA to obtain head pose pseudo labels; 2) we design an architecture with a ResNet18 backbone and train it on the ensemble of these pseudo labels via knowledge distillation. To evaluate our model, we use two real-world head pose datasets, AFLW-2000 and BIWI. Experimental results show that our model significantly improves accuracy compared with state-of-the-art head pose estimation methods. Furthermore, our model runs in real time at approximately 300 FPS when inferring on a Tesla V100.
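The two-step recipe (teacher models produce pseudo labels, a ResNet18 student is distilled on their ensemble) can be illustrated on the label side alone. The sketch below assumes the ensemble is a plain mean of the teachers' (yaw, pitch, roll) predictions in degrees and that the student is trained with an L1 loss against those labels; the abstract states neither choice, the helper names are hypothetical, and the networks themselves are omitted.

```python
import numpy as np


def ensemble_pseudo_labels(teacher_preds):
    """Average T teachers' predictions into pseudo labels.

    teacher_preds: array-like of shape (T, N, 3) holding (yaw, pitch, roll)
    in degrees for N images. Returns an (N, 3) array (assumed plain mean).
    """
    return np.asarray(teacher_preds, dtype=float).mean(axis=0)


def distillation_l1_loss(student_preds, pseudo_labels):
    """Mean absolute error between student outputs and pseudo labels."""
    diff = np.asarray(student_preds, dtype=float) - pseudo_labels
    return float(np.abs(diff).mean())


# Two teachers, one image: labels become [[12, 1, 4]].
teachers = np.array([[[10.0, 0.0, 5.0]], [[14.0, 2.0, 3.0]]])
labels = ensemble_pseudo_labels(teachers)
loss = distillation_l1_loss([[11.0, 1.0, 6.0]], labels)  # (1 + 0 + 2) / 3 = 1.0
```

A plain mean of Euler angles is only well behaved away from the ±180° wrap-around; yaw ranges in the usual AFLW-2000 and BIWI evaluation protocols stay inside that limit.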

8.4 · CV · Sep 12, 2025

Online 3D Multi-Camera Perception through Robust 2D Tracking and Depth-based Late Aggregation

Vu-Minh Le, Thao-Anh Tran, Duc Huy Do et al.

Multi-Target Multi-Camera Tracking (MTMC) is an essential computer vision task for automating large-scale surveillance. With camera calibration and depth information, targets in the scene can be projected into 3D space, enabling a far more complete automatic perception of the 3D environment. However, tracking in 3D space requires rebuilding all 2D tracking components from the ground up, which may be infeasible for existing MTMC systems. In this paper, we present an approach for extending any online 2D multi-camera tracking system into 3D space by using depth information to reconstruct each target in point-cloud space and recovering its 3D box through clustering and yaw refinement after tracking. We also introduce an enhanced online data association mechanism that leverages the consistency of each target's local IDs to assign global IDs across frames. The proposed framework is evaluated on the 2025 AI City Challenge 3D MTMC dataset, where it achieved 3rd place on the leaderboard.
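Lifting a 2D track into point-cloud space rests on standard pinhole back-projection: a pixel (u, v) with depth z maps to camera coordinates x = (u - c_x) z / f_x and y = (v - c_y) z / f_y. The sketch below back-projects the depth pixels inside a target mask into world coordinates and estimates a yaw angle from the principal axis of the target's ground-plane footprint. The mask input, the z-up world frame, and PCA as the yaw estimator are assumptions for illustration; the paper's clustering and yaw-refinement procedures are not described in the abstract, and the function names are hypothetical.

```python
import numpy as np


def backproject(depth, mask, K, cam_to_world):
    """Lift masked depth pixels into world-frame 3-D points.

    depth: (H, W) metric depth; mask: (H, W) bool target mask (assumed input);
    K: 3x3 intrinsics; cam_to_world: 4x4 extrinsic matrix. Returns (M, 3).
    """
    v, u = np.nonzero(mask)
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)  # homogeneous (M, 4)
    return (pts_cam @ cam_to_world.T)[:, :3]


def footprint_yaw(points_world):
    """Yaw (radians) of the dominant horizontal axis via PCA on x-y.

    Assumes a z-up world frame. The result is only defined modulo pi.
    """
    xy = points_world[:, :2] - points_world[:, :2].mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(xy.T))
    major = eigvecs[:, np.argmax(eigvals)]
    return float(np.arctan2(major[1], major[0]))
```

PCA fixes the heading only up to 180°, so a real system would need another cue (such as motion direction) to disambiguate it, and a clustering step (for instance DBSCAN) would normally run first to discard background and depth-noise points from the cloud.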

3.6 · CV · Oct 18, 2025

Enhancing Rotated Object Detection via Anisotropic Gaussian Bounding Box and Bhattacharyya Distance

Chien Thai, Mai Xuan Trang, Huong Ninh et al.

Detecting rotated objects accurately and efficiently is a significant challenge in computer vision, particularly in applications such as aerial imagery, remote sensing, and autonomous driving. Although traditional object detection frameworks are effective for axis-aligned objects, they often underperform on rotated objects because they cannot adequately capture orientation variations. This paper introduces an improved loss function that enhances detection accuracy and robustness by leveraging the Gaussian bounding box representation and the Bhattacharyya distance. In addition, we advocate an anisotropic Gaussian representation to address the problems caused by isotropic variance in square-like objects. The proposed rotation-invariant loss function effectively captures the geometric properties of rotated objects. We integrate it into state-of-the-art deep learning-based rotated object detectors, and extensive experiments demonstrate significant improvements in mean Average Precision (mAP) over existing methods. The results highlight the potential of our approach to establish a new benchmark in rotated object detection, with implications for a wide range of applications that require precise and reliable object localization irrespective of orientation.
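Gaussian-based rotated-box losses commonly convert a box (c_x, c_y, w, h, θ) into a 2D Gaussian with mean (c_x, c_y) and covariance Σ = R(θ) diag(w²/4, h²/4) R(θ)ᵀ, then compare Gaussians with a statistical distance. The sketch below pairs that common conversion with the closed-form Bhattacharyya distance between two Gaussians, D_B = (1/8) Δμᵀ Σ⁻¹ Δμ + (1/2) ln(det Σ / √(det Σ₁ det Σ₂)) with Σ = (Σ₁ + Σ₂)/2. It is not the paper's loss: the anisotropic modification and the mapping from distance to loss are not given in the abstract, and the function names are hypothetical. It does reproduce the square-box problem the authors mention: when w = h the covariance is isotropic, so rotating the box leaves the Gaussian, and therefore the distance, unchanged.

```python
import numpy as np


def box_to_gaussian(cx, cy, w, h, theta):
    """Convert a rotated box to (mean, covariance) of a 2-D Gaussian.

    Uses the common convention Sigma = R diag(w^2/4, h^2/4) R^T.
    """
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    cov = R @ np.diag([w**2 / 4.0, h**2 / 4.0]) @ R.T
    return np.array([cx, cy], dtype=float), cov


def bhattacharyya(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between two Gaussians."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    term_cov = 0.5 * np.log(
        np.linalg.det(cov) / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))
    )
    return float(term_mean + term_cov)


# Square box: a 45-degree rotation is invisible to the isotropic Gaussian.
sq_a = box_to_gaussian(0, 0, 4, 4, 0.0)
sq_b = box_to_gaussian(0, 0, 4, 4, np.pi / 4)
# Elongated box: the same rotation yields a clearly positive distance.
el_a = box_to_gaussian(0, 0, 8, 2, 0.0)
el_b = box_to_gaussian(0, 0, 8, 2, np.pi / 4)
```

As a hand check, two 2 x 2 boxes whose centers lie 2 apart both map to Σ = I, so the log term vanishes and D_B = (1/8) · 4 = 0.5.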