Zijian Kuang

CV
5 papers
16 citations
Novelty: 41%
AI Score: 26

5 Papers

CV | Nov 22, 2023 | Code
Two-stage Synthetic Supervising and Multi-view Consistency Self-supervising based Animal 3D Reconstruction by Single Image

Zijian Kuang, Lihang Ying, Shi Jin et al.

Although the Pixel-aligned Implicit Function (PIFu) effectively captures subtle variations in body shape within a low-dimensional space through extensive training on human 3D scans, applying it to live animals is difficult because animals rarely cooperate with 3D scanning. To address this challenge, we propose a two-stage training scheme that combines supervised and self-supervised learning. In the first stage, we leverage synthetic animal models for supervised learning, which allows the model to learn from a diverse set of virtual animal instances. In the second stage, we use 2D multi-view consistency as a self-supervised training signal. This further enhances the model's ability to reconstruct accurate and realistic 3D shape and texture from widely available single-view images of real animals. Our results demonstrate that the approach outperforms state-of-the-art methods, both quantitatively and qualitatively, on bird 3D digitization. The source code is available at https://github.com/kuangzijian/drifu-for-animals.

CV | Apr 20, 2021 | Code
Flow-based Video Segmentation for Human Head and Shoulders

Zijian Kuang, Xinran Tie

Video segmentation of the human head and shoulders is essential for creating elegant media in videoconferencing and virtual reality applications. The main challenge is to perform high-quality background subtraction in real time while handling segmentation errors caused by motion blur, e.g., shaking the head or waving hands during a video conference. To overcome the motion blur problem in video segmentation, we propose a novel flow-based encoder-decoder network (FUNet) that combines the traditional Horn-Schunck optical-flow estimation technique with convolutional neural networks to perform robust real-time video segmentation. We also introduce a video and image segmentation dataset: ConferenceVideoSegmentationDataset. Code and pre-trained models are available in our GitHub repository: https://github.com/kuangzijian/Flow-Based-Video-Matting.
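For readers unfamiliar with the Horn-Schunck technique named in this abstract, the following is a minimal NumPy sketch of the classical iterative method only. It is not the FUNet implementation, and it does not show how the paper feeds flow into the network. The function names `horn_schunck` and `_local_average`, the smoothness weight `alpha`, and the iteration count `n_iters` are illustrative assumptions.

```python
import numpy as np


def _local_average(f):
    """Average of the four direct neighbours, replicating values at the border."""
    p = np.pad(f, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0


def horn_schunck(frame1, frame2, alpha=1.0, n_iters=100):
    """Estimate dense optical flow (u, v) between two grayscale frames.

    alpha weights the smoothness term; n_iters is the number of Jacobi-style
    update steps. Both defaults are illustrative, not taken from the paper.
    """
    i1 = np.asarray(frame1, dtype=np.float64)
    i2 = np.asarray(frame2, dtype=np.float64)

    # Spatial derivatives of the averaged frame; temporal derivative as a difference.
    iy, ix = np.gradient((i1 + i2) / 2.0)  # axis 0 is y (rows), axis 1 is x (columns)
    it = i2 - i1

    u = np.zeros_like(i1)  # horizontal flow
    v = np.zeros_like(i1)  # vertical flow
    denom = alpha ** 2 + ix ** 2 + iy ** 2

    for _ in range(n_iters):
        u_bar = _local_average(u)
        v_bar = _local_average(v)
        # Residual of the brightness-constancy constraint at the averaged flow.
        t = (ix * u_bar + iy * v_bar + it) / denom
        u = u_bar - ix * t
        v = v_bar - iy * t
    return u, v


if __name__ == "__main__":
    # Synthetic example: a Gaussian blob that moves one pixel to the right.
    ys, xs = np.mgrid[0:32, 0:32]
    a = np.exp(-((xs - 14) ** 2 + (ys - 16) ** 2) / 32.0)
    b = np.exp(-((xs - 15) ** 2 + (ys - 16) ** 2) / 32.0)
    u, v = horn_schunck(a, b)
    print("mean horizontal flow:", u.mean())
```

In this sketch a larger `alpha` produces a smoother but more damped flow field, which is the usual trade-off when the flow is meant as an auxiliary motion cue rather than a precise displacement estimate.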

MM | Mar 24, 2021
A Survey of Multimedia Technologies and Robust Algorithms

Zijian Kuang, Xinran Tie

Multimedia technologies are now more practical and deployable in real life, and their algorithms are widely used in research areas such as deep learning, signal processing, haptics, computer vision, robotics, and medical multimedia processing. This survey provides an overview of multimedia technologies and robust algorithms in multimedia data processing, medical multimedia processing, human facial expression tracking and pose recognition, and multimedia in education and training. It also analyzes current robust algorithms and multimedia technologies and, based on that overview, proposes a direction for future research. We thank the Multimedia Research Centre (MRC) at the University of Alberta, whose research and previous work are the inspiration and starting point for future research.

CV | Dec 12, 2020
Computer Vision and Normalizing Flow-Based Defect Detection

Zijian Kuang, Xinran Tie, Lihang Ying et al.

Visual defect detection is critical to ensuring the quality of most products. However, the majority of small and medium-sized manufacturing enterprises still rely on tedious and error-prone manual inspection. The main reasons are: 1) existing automated visual defect detection systems require altering production assembly lines, which is time-consuming and expensive; and 2) existing systems require manually collecting and labeling defective samples, either for a comparison-based algorithm or for training a machine learning model. This places a heavy burden on small and medium-sized manufacturing enterprises, because defects occur rarely and are difficult and time-consuming to collect. Furthermore, defect types cannot be exhaustively collected or defined, since any new deviation from an acceptable product is a defect. In this paper, we overcome these challenges and design a three-stage, plug-and-play, fully automated, unsupervised 360-degree defect detection system. In our system, products are freely placed on an unaltered assembly line and receive 360-degree visual inspection by multiple cameras at different angles. As a result, the images collected from real-world assembly lines contain substantial background noise, the products face different directions, and the apparent product size varies with the distance to the cameras. All of these factors make defect detection much more difficult. Our system uses object detection, background subtraction, and unsupervised normalizing flow-based defect detection to tackle these difficulties. Experiments show that our system achieves 0.90 AUROC on a real-world, unaltered drinkware production assembly line.
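As a reminder of what the AUROC figure above measures, here is a minimal pure-Python sketch. It assumes each product image receives a scalar anomaly score where higher means more likely defective; that scoring convention, the function name `auroc`, and the toy data are assumptions for illustration and are not taken from the paper.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: anomaly scores, higher meaning more anomalous (assumed convention).
    labels: 1 for defective samples, 0 for normal samples.
    Returns the probability that a random defective sample scores higher than
    a random normal sample, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one normal sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: two normal and two defective samples.
    print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise form is quadratic in the number of samples, which is fine for a small evaluation set; a rank-based computation would be the usual choice for larger ones.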

CV | Oct 24, 2020
Improved Actor Relation Graph based Group Activity Recognition

Zijian Kuang, Xinran Tie

Video understanding aims to recognize and classify the actions or activities that appear in a video. Much previous work, such as video captioning, has shown promising performance in producing a general understanding of video content. However, it remains challenging to generate fine-grained descriptions of human actions and their interactions using state-of-the-art video captioning techniques. Detailed descriptions of human actions and group activities are essential information for real-time CCTV video surveillance, health care, sports video analysis, and similar applications. This study proposes a video understanding method that focuses on group activity recognition by learning pair-wise actor appearance similarity and actor positions. We propose using normalized cross-correlation (NCC) and the sum of absolute differences (SAD) to calculate pair-wise appearance similarity and build the actor relation graph, which allows a graph convolutional network to learn to classify group activities. We also propose using MobileNet as the backbone to extract features from each video frame. A visualization model is further introduced to display each input video frame with predicted bounding boxes around each person, along with the predicted individual actions and collective activity.
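The two similarity measures named in this abstract can be sketched as follows. This is a generic NumPy illustration that assumes each actor is represented by a flat appearance feature vector and that NCC is computed in its mean-subtracted form. The paper's exact formulation, and how the scores are turned into graph edges, are not given here; the helper names `sad`, `ncc`, and `pairwise_matrix` are hypothetical.

```python
import numpy as np


def sad(a, b):
    """Sum of absolute differences: 0 for identical inputs, larger means less similar."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(np.abs(a - b).sum())


def ncc(a, b, eps=1e-12):
    """Mean-subtracted normalized cross-correlation in [-1, 1]; 1 means identical pattern."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a0 = a - a.mean()
    b0 = b - b.mean()
    denom = np.linalg.norm(a0) * np.linalg.norm(b0)
    if denom < eps:
        return 0.0  # a constant vector has no pattern to correlate
    return float(np.dot(a0, b0) / denom)


def pairwise_matrix(features, measure):
    """Apply `measure` to every ordered pair of actor feature vectors."""
    n = len(features)
    out = np.zeros((n, n), dtype=np.float64)
    for i in range(n):
        for j in range(n):
            out[i, j] = measure(features[i], features[j])
    return out


if __name__ == "__main__":
    # Three toy actor features: the second is a scaled copy of the first.
    actors = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
    print(pairwise_matrix(actors, ncc))
    print(pairwise_matrix(actors, sad))
```

Note that NCC is a similarity (higher is closer) and invariant to scaling and offset, while SAD is a distance (lower is closer) and sensitive to both, so the two would need different handling before being used as edge weights.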