Jinqiang Bai

9papers

631citations

Novelty54%

AI Score27

Ranked #161,710 of 205,806 authors (top 79%)#49,413 in CV (top 84%)

9 Papers

CVAug 21, 2019

A Realistic Face-to-Face Conversation System based on Deep Neural Networks

Zezhou Chen, Zhaoxiang Liu, Huan Hu et al.

To improve the experiences of face-to-face conversation with avatar, this paper presents a novel conversation system. It is composed of two sequence-to-sequence models respectively for listening and speaking and a Generative Adversarial Network (GAN) based realistic avatar synthesizer. The models exploit the facial action and head pose to learn natural human reactions. Based on the models' output, the synthesizer uses the Pixel2Pixel model to generate realistic facial images. To show the improvement of our system, we use a 3D model based avatar driving scheme as a reference. We train and evaluate our neural networks with the data from ESPN shows. Experimental results show that our conversation system can generate natural facial reactions and realistic facial images.

CVAug 19, 2019

Video synthesis of human upper body with realistic face

Zhaoxiang Liu, Huan Hu, Zipeng Wang et al.

This paper presents a generative adversarial learning-based human upper body video synthesis approach to generate an upper body video of target person that is consistent with the body motion, face expression, and pose of the person in source video. We use upper body keypoints, facial action units and poses as intermediate representations between source video and target video. Instead of directly transferring the source video to the target video, we firstly map the source person's facial action units and poses into the target person's facial landmarks, then combine the normalized upper body keypoints and generated facial landmarks with spatio-temporal smoothing to generate the corresponding target video's image. Experimental results demonstrated the effectiveness of our method.

CVMay 6, 2019

Feature Aggregation Network for Video Face Recognition

Zhaoxiang Liu, Huan Hu, Jinqiang Bai et al.

This paper aims to learn a compact representation of a video for video face recognition task. We make the following contributions: first, we propose a meta attention-based aggregation scheme which adaptively and fine-grained weighs the feature along each feature dimension among all frames to form a compact and discriminative representation. It makes the best to exploit the valuable or discriminative part of each frame to promote the performance of face recognition, without discarding or despising low quality frames as usual methods do. Second, we build a feature aggregation network comprised of a feature embedding module and a feature aggregation module. The embedding module is a convolutional neural network used to extract a feature vector from a face image, while the aggregation module consists of cascaded two meta attention blocks which adaptively aggregate the feature vectors into a single fixed-length representation. The network can deal with arbitrary number of frames, and is insensitive to frame order. Third, we validate the performance of proposed aggregation scheme. Experiments on publicly available datasets, such as YouTube face dataset and IJB-A dataset, show the effectiveness of our method, and it achieves competitive performances on both the verification and identification protocols.

CVApr 30, 2019

Facial Pose Estimation by Deep Learning from Label Distributions

Zhaoxiang Liu, Zezhou Chen, Jinqiang Bai et al.

Facial pose estimation has gained a lot of attentions in many practical applications, such as human-robot interaction, gaze estimation and driver monitoring. Meanwhile, end-to-end deep learning-based facial pose estimation is becoming more and more popular. However, facial pose estimation suffers from a key challenge: the lack of sufficient training data for many poses, especially for large poses. Inspired by the observation that the faces under close poses look similar, we reformulate the facial pose estimation as a label distribution learning problem, considering each face image as an example associated with a Gaussian label distribution rather than a single label, and construct a convolutional neural network which is trained with a multi-loss function on AFLW dataset and 300W-LP dataset to predict the facial poses directly from color image. Extensive experiments are conducted on several popular benchmarks, including AFLW2000, BIWI, AFLW and AFW, where our approach shows a significant advantage over other state-of-the-art methods.

CVApr 30, 2019

Wearable Travel Aid for Environment Perception and Navigation of Visually Impaired People

Jinqiang Bai, Zhaoxiang Liu, Yimin Lin et al.

This paper presents a wearable assistive device with the shape of a pair of eyeglasses that allows visually impaired people to navigate safely and quickly in unfamiliar environment, as well as perceive the complicated environment to automatically make decisions on the direction to move. The device uses a consumer Red, Green, Blue and Depth (RGB-D) camera and an Inertial Measurement Unit (IMU) to detect obstacles. As the device leverages the ground height continuity among adjacent image frames, it is able to segment the ground from obstacles accurately and rapidly. Based on the detected ground, the optimal walkable direction is computed and the user is then informed via converted beep sound. Moreover, by utilizing deep learning techniques, the device can semantically categorize the detected obstacles to improve the users' perception of surroundings. It combines a Convolutional Neural Network (CNN) deployed on a smartphone with a depth-image-based object detection to decide what the object type is and where the object is located, and then notifies the user of such information via speech. We evaluated the device's performance with different experiments in which 20 visually impaired people were asked to wear the device and move in an office, and found that they were able to avoid obstacle collisions and find the way in complicated scenarios.

CVApr 30, 2019

Deep Learning Based Robot for Automatically Picking up Garbage on the Grass

Jinqiang Bai, Shiguo Lian, Zhaoxiang Liu et al.

This paper presents a novel garbage pickup robot which operates on the grass. The robot is able to detect the garbage accurately and autonomously by using a deep neural network for garbage recognition. In addition, with the ground segmentation using a deep neural network, a novel navigation strategy is proposed to guide the robot to move around. With the garbage recognition and automatic navigation functions, the robot can clean garbage on the ground in places like parks or schools efficiently and autonomously. Experimental results show that the garbage recognition accuracy can reach as high as 95%, and even without path planning, the navigation strategy can reach almost the same cleaning efficiency with traditional methods. Thus, the proposed robot can serve as a good assistance to relieve dustman's physical labor on garbage cleaning tasks.

CVApr 30, 2019

Virtual-Blind-Road Following Based Wearable Navigation Device for Blind People

Jinqiang Bai, Shiguo Lian, Zhaoxiang Liu et al.

To help the blind people walk to the destination efficiently and safely in indoor environment, a novel wearable navigation device is presented in this paper. The locating, way-finding, route following and obstacle avoiding modules are the essential components in a navigation system, while it remains a challenging task to consider obstacle avoiding during route following, as the indoor environment is complex, changeable and possibly with dynamic objects. To address this issue, we propose a novel scheme which utilizes a dynamic sub-goal selecting strategy to guide the users to the destination and help them bypass obstacles at the same time. This scheme serves as the key component of a complete navigation system deployed on a pair of wearable optical see-through glasses for the ease of use of blind people's daily walks. The proposed navigation device has been tested on a collection of individuals and proved to be effective on indoor navigation tasks. The sensors embedded are of low cost, small volume and easy integration, making it possible for the glasses to be widely used as a wearable consumer device.

CVDec 19, 2018

Deep Global-Relative Networks for End-to-End 6-DoF Visual Localization and Odometry

Yimin Lin, Zhaoxiang Liu, Jianfeng Huang et al.

Although a wide variety of deep neural networks for robust Visual Odometry (VO) can be found in the literature, they are still unable to solve the drift problem in long-term robot navigation. Thus, this paper aims to propose novel deep end-to-end networks for long-term 6-DoF VO task. It mainly fuses relative and global networks based on Recurrent Convolutional Neural Networks (RCNNs) to improve the monocular localization accuracy. Indeed, the relative sub-networks are implemented to smooth the VO trajectory, while global subnetworks are designed to avoid drift problem. All the parameters are jointly optimized using Cross Transformation Constraints (CTC), which represents temporal geometric consistency of the consecutive frames, and Mean Square Error (MSE) between the predicted pose and ground truth. The experimental results on both indoor and outdoor datasets show that our method outperforms other state-of-the-art learning-based VO methods in terms of pose accuracy.

HCSep 27, 2017

Smart Guiding Glasses for Visually Impaired People in Indoor Environment

Jinqiang Bai, Shiguo Lian, Zhaoxiang Liu et al.

To overcome the travelling difficulty for the visually impaired group, this paper presents a novel ETA (Electronic Travel Aids)-smart guiding device in the shape of a pair of eyeglasses for giving these people guidance efficiently and safely. Different from existing works, a novel multi sensor fusion based obstacle avoiding algorithm is proposed, which utilizes both the depth sensor and ultrasonic sensor to solve the problems of detecting small obstacles, and transparent obstacles, e.g. the French door. For totally blind people, three kinds of auditory cues were developed to inform the direction where they can go ahead. Whereas for weak sighted people, visual enhancement which leverages the AR (Augment Reality) technique and integrates the traversable direction is adopted. The prototype consisting of a pair of display glasses and several low cost sensors is developed, and its efficiency and accuracy were tested by a number of users. The experimental results show that the smart guiding glasses can effectively improve the user's travelling experience in complicated indoor environment. Thus it serves as a consumer device for helping the visually impaired people to travel safely.