Nguyen Huu Phong

CV
7papers
80citations
Novelty38%
AI Score26

7 Papers

CVFeb 17, 2023Code
Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer

Nguyen Huu Phong, Bernardete Ribeiro

Recognizing human actions in video sequences, known as Human Action Recognition (HAR), is a challenging task in pattern recognition. While Convolutional Neural Networks (ConvNets) have shown remarkable success in image recognition, they are not always directly applicable to HAR, as temporal features are critical for accurate classification. In this paper, we propose a novel dynamic PSO-ConvNet model for learning actions in videos, building on our recent work in image recognition. Our approach leverages a framework where the weight vector of each neural network represents the position of a particle in phase space, and particles share their current weight vectors and gradient estimates of the Loss function. To extend our approach to video, we integrate ConvNets with state-of-the-art temporal methods such as Transformer and Recurrent Neural Networks. Our experimental results on the UCF-101 dataset demonstrate substantial improvements of up to 9% in accuracy, which confirms the effectiveness of our proposed method. In addition, we conducted experiments on larger and more variety of datasets including Kinetics-400 and HMDB-51 and obtained preference for Collaborative Learning in comparison with Non-Collaborative Learning (Individual Learning). Overall, our dynamic PSO-ConvNet model provides a promising direction for improving HAR by better capturing the spatio-temporal dynamics of human actions in videos. The code is available at https://github.com/leonlha/Video-Action-Recognition-Collaborative-Learning-with-Dynamics-via-PSO-ConvNet-Transformer.

CVMay 20, 2022Code
PSO-Convolutional Neural Networks with Heterogeneous Learning Rate

Nguyen Huu Phong, Augusto Santos, Bernardete Ribeiro

Convolutional Neural Networks (ConvNets or CNNs) have been candidly deployed in the scope of computer vision and related fields. Nevertheless, the dynamics of training of these neural networks lie still elusive: it is hard and computationally expensive to train them. A myriad of architectures and training strategies have been proposed to overcome this challenge and address several problems in image processing such as speech, image and action recognition as well as object detection. In this article, we propose a novel Particle Swarm Optimization (PSO) based training for ConvNets. In such framework, the vector of weights of each ConvNet is typically cast as the position of a particle in phase space whereby PSO collaborative dynamics intertwines with Stochastic Gradient Descent (SGD) in order to boost training performance and generalization. Our approach goes as follows: i) [regular phase] each ConvNet is trained independently via SGD; ii) [collaborative phase] ConvNets share among themselves their current vector of weights (or particle-position) along with their gradient estimates of the Loss function. Distinct step sizes are coined by distinct ConvNets. By properly blending ConvNets with large (possibly random) step-sizes along with more conservative ones, we propose an algorithm with competitive performance with respect to other PSO-based approaches on Cifar-10 and Cifar-100 (accuracy of 98.31% and 87.48%). These accuracy levels are obtained by resorting to only four ConvNets -- such results are expected to scale with the number of collaborative ConvNets accordingly. We make our source codes available for download https://github.com/leonlha/PSO-ConvNet-Dynamics.

CVMay 20, 2022
Action Recognition for American Sign Language

Nguyen Huu Phong, Bernardete Ribeiro

In this research, we present our findings to recognize American Sign Language from series of hand gestures. While most researches in literature focus only on static handshapes, our work target dynamic hand gestures. Since dynamic signs dataset are very few, we collect an initial dataset of 150 videos for 10 signs and an extension of 225 videos for 15 signs. We apply transfer learning models in combination with deep neural networks and background subtraction for videos in different temporal settings. Our primarily results show that we can get an accuracy of $0.86$ and $0.71$ using DenseNet201, LSTM with video sequence of 12 frames accordingly.

CVJul 30, 2020Code
Rethinking Recurrent Neural Networks and Other Improvements for Image Classification

Nguyen Huu Phong, Bernardete Ribeiro

Over the long history of machine learning, which dates back several decades, recurrent neural networks (RNNs) have been used mainly for sequential data and time series and generally with 1D information. Even in some rare studies on 2D images, these networks are used merely to learn and generate data sequentially rather than for image recognition tasks. In this study, we propose integrating an RNN as an additional layer when designing image recognition models. We also develop end-to-end multimodel ensembles that produce expert predictions using several models. In addition, we extend the training strategy so that our model performs comparably to leading models and can even match the state-of-the-art models on several challenging datasets (e.g., SVHN (0.99), Cifar-100 (0.9027) and Cifar-10 (0.9852)). Moreover, our model sets a new record on the Surrey dataset (0.949). The source code of the methods provided in this article is available at https://github.com/leonlha/e2e-3m and http://nguyenhuuphong.me.

CVJul 30, 2020
An Improvement for Capsule Networks using Depthwise Separable Convolution

Nguyen Huu Phong, Bernardete Ribeiro

Capsule Networks face a critical problem in computer vision in the sense that the image background can challenge its performance, although they learn very well on training data. In this work, we propose to improve Capsule Networks' architecture by replacing the Standard Convolution with a Depthwise Separable Convolution. This new design significantly reduces the model's total parameters while increases stability and offers competitive accuracy. In addition, the proposed model on $64\times64$ pixel images outperforms standard models on $32\times32$ and $64\times64$ pixel images. Moreover, we empirically evaluate these models with Deep Learning architectures using state-of-the-art Transfer Learning networks such as Inception V3 and MobileNet V1. The results show that Capsule Networks can perform comparably against Deep Learning models. To the best of our knowledge, we believe that this is the first work on the integration of Depthwise Separable Convolution into Capsule Networks.

LGMar 18, 2019
Advanced Capsule Networks via Context Awareness

Nguyen Huu Phong, Bernardete Ribeiro

Capsule Networks (CN) offer new architectures for Deep Learning (DL) community. Though its effectiveness has been demonstrated in MNIST and smallNORB datasets, the networks still face challenges in other datasets for images with distinct contexts. In this research, we improve the design of CN (Vector version) namely we expand more Pooling layers to filter image backgrounds and increase Reconstruction layers to make better image restoration. Additionally, we perform experiments to compare accuracy and speed of CN versus DL models. In DL models, we utilize Inception V3 and DenseNet V201 for powerful computers besides NASNet, MobileNet V1 and MobileNet V2 for small and embedded devices. We evaluate our models on a fingerspelling alphabet dataset from American Sign Language (ASL). The results show that CNs perform comparably to DL models while dramatically reducing training time. We also make a demonstration and give a link for the purpose of illustration.

LGMar 18, 2019
Offline and Online Deep Learning for Image Recognition

Nguyen Huu Phong, Bernardete Ribeiro

Image recognition using Deep Learning has been evolved for decades though advances in the field through different settings is still a challenge. In this paper, we present our findings in searching for better image classifiers in offline and online environments. We resort to Convolutional Neural Network and its variations of fully connected Multi-layer Perceptron. Though still preliminary, these results are encouraging and may provide a better understanding about the field and directions toward future works.