Jia‐Ching Wang

h-index27

6papers

51citations

Novelty43%

AI Score27

Ranked #155,542 of 194,257 authors (top 80%)#50,522 in CV (top 85%)

6 Papers

1.4CVSep 3, 2022

A Novel Self-Knowledge Distillation Approach with Siamese Representation Learning for Action Recognition

Duc-Quang Vu, Trang Phung, Jia-Ching Wang

Knowledge distillation is an effective transfer of knowledge from a heavy network (teacher) to a small network (student) to boost students' performance. Self-knowledge distillation, the special case of knowledge distillation, has been proposed to remove the large teacher network training process while preserving the student's performance. This paper introduces a novel Self-knowledge distillation approach via Siamese representation learning, which minimizes the difference between two representation vectors of the two different views from a given sample. Our proposed method, SKD-SRL, utilizes both soft label distillation and the similarity of representation vectors. Therefore, SKD-SRL can generate more consistent predictions and representations in various views of the same data point. Our benchmark has been evaluated on various standard datasets. The experimental results have shown that SKD-SRL significantly improves the accuracy compared to existing supervised learning and knowledge distillation methods regardless of the networks.

11.0CVJul 16, 2023

Diffusion to Confusion: Naturalistic Adversarial Patch Generation Based on Diffusion Model for Object Detector

Shuo-Yen Lin, Ernie Chu, Che-Hsien Lin et al.

Many physical adversarial patch generation methods are widely proposed to protect personal privacy from malicious monitoring using object detectors. However, they usually fail to generate satisfactory patch images in terms of both stealthiness and attack performance without making huge efforts on careful hyperparameter tuning. To address this issue, we propose a novel naturalistic adversarial patch generation method based on the diffusion models (DM). Through sampling the optimal image from the DM model pretrained upon natural images, it allows us to stably craft high-quality and naturalistic physical adversarial patches to humans without suffering from serious mode collapse problems as other deep generative models. To the best of our knowledge, we are the first to propose DM-based naturalistic adversarial patch generation for object detectors. With extensive quantitative, qualitative, and subjective experiments, the results demonstrate the effectiveness of the proposed approach to generate better-quality and more naturalistic adversarial patches while achieving acceptable attack performance than other state-of-the-art patch generation methods. We also show various generation trade-offs under different conditions.

2.0CVAug 5, 2024Code

YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition

Duc Manh Nguyen Dang, Viet Hang Duong, Jia Ching Wang et al.

In this paper, we propose a new framework called YOWOv3, which is an improved version of YOWOv2, designed specifically for the task of Human Action Detection and Recognition. This framework is designed to facilitate extensive experimentation with different configurations and supports easy customization of various components within the model, reducing efforts required for understanding and modifying the code. YOWOv3 demonstrates its superior performance compared to YOWOv2 on two widely used datasets for Human Action Detection and Recognition: UCF101-24 and AVAv2.2. Specifically, the predecessor model YOWOv2 achieves an mAP of 85.2% and 20.3% on UCF101-24 and AVAv2.2, respectively, with 109.7M parameters and 53.6 GFLOPs. In contrast, our model - YOWOv3, with only 59.8M parameters and 39.8 GFLOPs, achieves an mAP of 88.33% and 20.31% on UCF101-24 and AVAv2.2, respectively. The results demonstrate that YOWOv3 significantly reduces the number of parameters and GFLOPs while still achieving comparable performance.

4.7CVFeb 20, 2021

Self-Supervised Learning via multi-Transformation Classification for Action Recognition

Duc Quang Vu, Ngan T. H. Le, Jia-Ching Wang

Self-supervised tasks have been utilized to build useful representations that can be used in downstream tasks when the annotation is unavailable. In this paper, we introduce a self-supervised video representation learning method based on the multi-transformation classification to efficiently classify human actions. Self-supervised learning on various transformations not only provides richer contextual information but also enables the visual representation more robust to the transforms. The spatio-temporal representation of the video is learned in a self-supervised manner by classifying seven different transformations i.e. rotation, clip inversion, permutation, split, join transformation, color switch, frame replacement, noise addition. First, seven different video transformations are applied to video clips. Then the 3D convolutional neural networks are utilized to extract features for clips and these features are processed to classify the pseudo-labels. We use the learned models in pretext tasks as the pre-trained models and fine-tune them to recognize human actions in the downstream task. We have conducted the experiments on UCF101 and HMDB51 datasets together with C3D and 3D Resnet-18 as backbone networks. The experimental results have shown that our proposed framework is outperformed other SOTA self-supervised action recognition approaches. The code will be made publicly available.

2.4SDDec 29, 2016

Phase-incorporating Speech Enhancement Based on Complex-valued Gaussian Process Latent Variable Model

Sih-Huei Chen, Yuan-Shan Lee, Jia-Ching Wang

Traditional speech enhancement techniques modify the magnitude of a speech in time-frequency domain, and use the phase of a noisy speech to resynthesize a time domain speech. This work proposes a complex-valued Gaussian process latent variable model (CGPLVM) to enhance directly the complex-valued noisy spectrum, modifying not only the magnitude but also the phase. The main idea that underlies the developed method is the modeling of short-time Fourier transform (STFT) coefficients across the time frames of a speech as a proper complex Gaussian process (GP) with noise added. The proposed method is based on projecting the spectrum into a low-dimensional subspace. The likelihood criterion is used to optimize the hyperparameters of the model. Experiments were carried out on the CHTTL database, which contains the digits zero to nine in Mandarin. Several standard measures are used to demonstrate that the proposed method outperforms baseline methods.

1.1CVDec 8, 2016

Complex Matrix Factorization for Face Recognition

Viet-Hang Duong, Yuan-Shan Lee, Bach-Tung Pham et al.

This work developed novel complex matrix factorization methods for face recognition; the methods were complex matrix factorization (CMF), sparse complex matrix factorization (SpaCMF), and graph complex matrix factorization (GraCMF). After real-valued data are transformed into a complex field, the complex-valued matrix will be decomposed into two matrices of bases and coefficients, which are derived from solutions to an optimization problem in a complex domain. The generated objective function is the real-valued function of the reconstruction error, which produces a parametric description. Factorizing the matrix of complex entries directly transformed the constrained optimization problem into an unconstrained optimization problem. Additionally, a complex vector space with N dimensions can be regarded as a 2N-dimensional real vector space. Accordingly, all real analytic properties can be exploited in the complex field. The ability to exploit these important characteristics motivated the development herein of a simpler framework that can provide better recognition results. The effectiveness of this framework will be clearly elucidated in later sections in this paper.