Shuangjun Liu

12 papers

253 citations

Novelty 51%

AI Score 42

Ranked #84,467 of 201,326 authors (top 42%); #28,665 in CV (top 49%)

12 Papers

CV · Aug 30, 2022

Prior-Aware Synthetic Data to the Rescue: Animal Pose Estimation with Very Limited Real Data

Le Jiang, Shuangjun Liu, Xiangyu Bai et al.

Accurately annotated image datasets are essential for studying animal behaviors from their poses. Compared to the number of species that are known or may exist, the existing labeled pose datasets cover only a small portion of them, while building comprehensive large-scale datasets is prohibitively expensive. Here, we present a highly data-efficient strategy for pose estimation in quadrupeds that requires only a small number of real images of the target animal. It is confirmed that fine-tuning a backbone network with weights pretrained on generic image datasets such as ImageNet can mitigate the high demand for target animal pose data and shorten the training time by learning the prior knowledge of object segmentation and keypoint estimation in advance. However, when faced with serious data scarcity (i.e., $<10^2$ real images), the model performance stays unsatisfactory, particularly for limbs with considerable flexibility and several comparable parts. We therefore introduce a prior-aware synthetic animal data generation pipeline called PASyn to augment the animal pose data essential for robust pose estimation. PASyn generates a probabilistically valid synthetic pose dataset, SynAP, by training a variational generative model on several animated 3D animal models. In addition, a style transfer strategy is used to blend the synthetic animal images into real backgrounds. We evaluate the improvement made by our approach with three popular backbone networks and test their pose estimation accuracy on publicly available animal pose images as well as on images collected from real animals in a zoo.

95.3 · RO · May 4

Latent Bridge: Feature Delta Prediction for Efficient Dual-System Vision-Language-Action Model Inference

Yudong Liu, Yuan Li, Zijia Tang et al.

Dual-system Vision-Language-Action (VLA) models achieve state-of-the-art robotic manipulation but are bottlenecked by the VLM backbone, which must execute at every control step while producing temporally redundant features. We propose Latent Bridge, a lightweight model that predicts VLM output deltas between timesteps, enabling the action head to operate on predicted outputs while the expensive VLM backbone is called only periodically. We instantiate Latent Bridge on two architecturally distinct VLAs: GR00T-N1.6 (feature-space bridge) and π0.5 (KV-cache bridge), demonstrating that the approach generalizes across VLA designs. Our task-agnostic DAgger training pipeline transfers across benchmarks without modification. Across four LIBERO suites, 24 RoboCasa kitchen tasks, and the ALOHA sim transfer-cube task, Latent Bridge achieves 95-100% performance retention while reducing VLM calls by 50-75%, yielding 1.65-1.73x net per-episode speedup.
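
The call pattern described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `run_episode`, `toy_backbone`, `zero_delta`, and the fixed `refresh_every` schedule are hypothetical stand-ins, and the abstract does not say how refresh steps are chosen. The sketch only shows the idea of calling an expensive backbone periodically and adding a predicted delta to the cached output in between.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def run_episode(
    observations: Sequence[Vector],
    vlm_backbone: Callable[[Vector], Vector],
    delta_predictor: Callable[[Vector, Vector], Vector],
    refresh_every: int,
) -> Tuple[List[Vector], int]:
    """Return the per-step features and the number of backbone calls.

    Assumption: the backbone is refreshed on a fixed schedule; the
    real method may use a different policy.
    """
    features: List[Vector] = []
    vlm_calls = 0
    current: Vector = []
    for step, obs in enumerate(observations):
        if step % refresh_every == 0:
            # Expensive path: recompute features with the full backbone.
            current = vlm_backbone(obs)
            vlm_calls += 1
        else:
            # Cheap path: add a predicted delta to the cached features.
            delta = delta_predictor(current, obs)
            current = [f + d for f, d in zip(current, delta)]
        features.append(current)
    return features, vlm_calls


def toy_backbone(obs: Vector) -> Vector:
    """Hypothetical stand-in for the VLM backbone."""
    return [2.0 * x for x in obs]


def zero_delta(features: Vector, obs: Vector) -> Vector:
    """Hypothetical predictor that assumes the features do not change."""
    return [0.0] * len(features)


if __name__ == "__main__":
    obs_seq = [[float(t)] for t in range(1, 9)]  # 8 control steps
    _, calls = run_episode(obs_seq, toy_backbone, zero_delta, refresh_every=4)
    print(f"backbone calls: {calls} of {len(obs_seq)} steps")
```

Under this fixed-schedule assumption, refreshing every k steps removes a fraction 1 - 1/k of backbone calls, so the 50-75% reduction quoted above would correspond to k between 2 and 4.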

CV · Jan 27, 2022

Pressure Eye: In-bed Contact Pressure Estimation via Contact-less Imaging

Shuangjun Liu, Sarah Ostadabbas

Computer vision has achieved great success in interpreting semantic meanings from images, yet estimating underlying (non-visual) physical properties of an object is often limited to their bulk values rather than reconstructing a dense map. In this work, we present our pressure eye (PEye) approach to estimate the contact pressure between a human body and the surface it is lying on, at high resolution, directly from vision signals. The PEye approach could ultimately enable the prediction and early detection of pressure ulcers in bed-bound patients, which currently depends on the use of expensive pressure mats. Our PEye network is configured in a dual-encoding, shared-decoding form to fuse visual cues and some relevant physical parameters in order to reconstruct high-resolution pressure maps (PMs). We also present a pixel-wise resampling approach based on the Naive Bayes assumption to further enhance the PM regression performance. A percentage of correct sensing (PCS) metric tailored for evaluating sensing estimation accuracy is also proposed, which provides another perspective for performance evaluation under varying error tolerances. We tested our approach via a series of extensive experiments using multimodal sensing technologies to collect data from 102 subjects while lying on a bed. An individual's high-resolution contact pressure data could be estimated from their RGB or long wavelength infrared (LWIR) images with 91.8% and 91.2% estimation accuracies, respectively, under the $PCS_{efs0.1}$ criterion, superior to state-of-the-art methods in the related image regression/translation tasks.

CV · May 23, 2021

Heuristic Weakly Supervised 3D Human Pose Estimation

Shuangjun Liu, Michael Wan, Sarah Ostadabbas

Monocular 3D human pose estimation from RGB images has attracted significant attention in recent years. However, recent models depend on supervised training with 3D pose ground truth data or known pose priors for their target domains. 3D pose data is typically collected with motion capture devices, severely limiting the applicability of such models. In this paper, we present a heuristic weakly supervised 3D human pose (HW-HuP) solution to estimate 3D poses when no ground truth 3D pose data is available. HW-HuP learns partial pose priors from 3D human pose datasets and uses easy-to-access observations from the target domain to estimate 3D human pose and shape in an optimization and regression cycle. We employ depth data for weak supervision during training, but not inference. We show that HW-HuP meaningfully improves upon state-of-the-art models in two practical settings where 3D pose data can hardly be obtained: human poses in bed, and infant poses in the wild. Furthermore, we show that HW-HuP retains comparable performance to cutting-edge models on public benchmarks, even when such models train on 3D pose data.

CV · May 23, 2021

Adapted Human Pose: Monocular 3D Human Pose Estimation with Zero Real 3D Pose Data

Shuangjun Liu, Naveen Sehgal, Sarah Ostadabbas

The ultimate goal for an inference model is to be robust and functional in real-life applications. However, domain gaps between training and test data often negatively affect model performance. This issue is especially critical for the monocular 3D human pose estimation problem, in which 3D human data is often collected in a controlled lab setting. In this paper, we focus on alleviating the negative effect of domain shift in both appearance and pose space for 3D human pose estimation by presenting our adapted human pose (AHuP) approach. AHuP is built upon two key components: (1) semantically aware adaptation (SAA) for cross-domain feature space adaptation, and (2) skeletal pose adaptation (SPA) for pose space adaptation, which takes only limited information from the target domain. Using zero real 3D human pose data, one of our adapted synthetic models shows comparable performance with the SOTA pose estimation models trained with large-scale real 3D human datasets. The proposed SPA can also be employed independently as a lightweight head to improve existing SOTA models in a novel context. A new 3D scan-based synthetic human dataset called ScanAva+ will also be publicly released with this work.

CV · Oct 13, 2020

Invariant Representation Learning for Infant Pose Estimation with Small Data

Xiaofei Huang, Nihang Fu, Shuangjun Liu et al.

Infant motion analysis is a topic of critical importance in early childhood development studies. However, while the applications of human pose estimation have become increasingly broad, models trained on large-scale adult pose datasets are barely successful in estimating infant poses due to the significant differences in body ratio and the versatility of infant poses. Moreover, privacy and security considerations hinder the availability of adequate infant pose data required for training a robust model from scratch. To address this problem, this paper presents (1) a publicly released hybrid synthetic and real infant pose (SyRIP) dataset with small yet diverse real infant images as well as generated synthetic infant poses and (2) a multi-stage invariant representation learning strategy that could transfer the knowledge from the adjacent domains of adult poses and synthetic infant images into our fine-tuned domain-adapted infant pose (FiDIP) estimation model. In our ablation study, with identical network structure, models trained on the SyRIP dataset show noticeable improvement over the ones trained on the only other public infant pose datasets. Integrated with pose estimation backbone networks of varying complexity, FiDIP performs consistently better than the fine-tuned versions of those models. One of our best infant pose estimation performers, built on the state-of-the-art DarkPose model, shows a mean average precision (mAP) of 93.6.

CV · Aug 20, 2020

Simultaneously-Collected Multimodal Lying Pose Dataset: Towards In-Bed Human Pose Monitoring under Adverse Vision Conditions

Shuangjun Liu, Xiaofei Huang, Nihang Fu et al.

Computer vision (CV) has achieved great success in interpreting semantic meanings from images, yet CV algorithms can be brittle for tasks with adverse vision conditions and for those suffering from data/label pair limitations. One of these tasks is in-bed human pose estimation, which has significant value in many healthcare applications. In-bed pose monitoring in natural settings could involve complete darkness or full occlusion. Furthermore, the lack of publicly available in-bed pose datasets hinders the use of many successful pose estimation algorithms for this task. In this paper, we introduce our Simultaneously-collected multimodal Lying Pose (SLP) dataset, which includes in-bed pose images from 109 participants captured using multiple imaging modalities including RGB, long wave infrared, depth, and pressure map. We also present a physical hyperparameter tuning strategy for ground truth pose label generation under extreme conditions such as lights off and being fully covered by a sheet/blanket. The SLP design is compatible with the mainstream human pose datasets; therefore, state-of-the-art 2D pose estimation models can be trained effectively with SLP data, with promising performance as high as 95% at PCKh@0.5 on a single modality. The pose estimation performance can be further improved by including additional modalities through collaboration.
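
The PCKh@0.5 figure quoted above refers to a percentage-of-correct-keypoints metric normalized by head size. A minimal sketch of the usual definition follows, not the SLP evaluation code. The function name `pckh`, the toy coordinates, and the single `head_size` reference length are assumptions for illustration. A predicted keypoint counts as correct when it lies within `alpha` times the head size of its ground-truth location.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pckh(
    predicted: Sequence[Point],
    ground_truth: Sequence[Point],
    head_size: float,
    alpha: float = 0.5,
) -> float:
    """Fraction of keypoints within alpha * head_size of the ground truth.

    Assumption: one person, one head_size in pixels; benchmark code
    typically averages over many people and handles missing joints.
    """
    if len(predicted) != len(ground_truth) or not ground_truth:
        raise ValueError("need two non-empty keypoint lists of equal length")
    threshold = alpha * head_size
    correct = sum(
        1 for p, g in zip(predicted, ground_truth) if math.dist(p, g) <= threshold
    )
    return correct / len(ground_truth)


if __name__ == "__main__":
    # Toy coordinates in pixels (hypothetical, not from the SLP dataset).
    pred = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
    truth = [(0.0, 1.0), (10.0, 10.0), (20.0, 26.0), (30.0, 30.0)]
    print(pckh(pred, truth, head_size=10.0))
```

In this toy case the threshold is 5 pixels, so the third keypoint (6 pixels off) is the only miss. The PCK0.1, PCK0.2, and PCK0.5 figures in the other abstracts on this page follow the same pattern, with a different threshold fraction and possibly a different reference length.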

APP-PH · Dec 26, 2019

Development of Use-specific High Performance Cyber-Nanomaterial Optical Detectors by Effective Choice of Machine Learning Algorithms

Davoud Hejazi, Shuangjun Liu, Amirreza Farnoosh et al.

Due to their inherent variabilities, nanomaterial-based sensors are challenging to translate into real-world applications, where reliability/reproducibility is key. Recently we showed Bayesian inference can be employed on engineered variability in layered nanomaterial-based optical transmission filters to determine optical wavelengths with high accuracy/precision. In many practical applications the sensing cost/speed and long-term reliability can be equally or more important considerations. Though various machine learning tools are frequently used on sensor/detector networks to address these, their effectiveness on nanomaterial-based sensors has not been explored. Here we show the best choice of ML algorithm in a cyber-nanomaterial detector is mainly determined by specific use considerations, e.g., accuracy, computational cost, speed, and resilience against drifts/ageing effects. When sufficient data/computing resources are provided, the highest sensing accuracy can be achieved by the kNN and Bayesian inference algorithms, but these can be computationally expensive for real-time applications. In contrast, artificial neural networks are computationally expensive to train, but provide the fastest result under testing conditions and remain reasonably accurate. When data is limited, SVMs perform well even with small training sets, while other algorithms show considerable reduction in accuracy if data is scarce, hence setting a lower limit on the size of required training data. We show that by tracking/modeling the long-term drifts of the detector performance over a long (1 year) period, it is possible to improve the predictive accuracy with no need for recalibration. Our research shows for the first time that if the ML algorithm is chosen specific to the use case, low-cost solution-processed cyber-nanomaterial detectors can be practically implemented under diverse operational requirements, despite their inherent variabilities.
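
To illustrate why kNN can be accurate yet costly at inference time, here is a minimal sketch of a k-nearest-neighbour wavelength classifier. It is not the paper's pipeline: the function `knn_predict`, the two-filter transmittance vectors, and the wavelength labels are all hypothetical toy values. Each query is compared against every stored training sample, which is the source of the per-query cost.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Features = Tuple[float, ...]


def knn_predict(
    train_x: Sequence[Features],
    train_y: Sequence[int],
    query: Features,
    k: int = 3,
) -> int:
    """Majority vote among the k training samples closest to the query.

    Every call scans the whole training set, so the cost per query
    grows linearly with the amount of stored data.
    """
    ranked = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical transmittance readings through two filters, labelled
    # with the source wavelength in nm (toy values, not measured data).
    samples = [
        (0.10, 0.80), (0.12, 0.78),
        (0.50, 0.50), (0.52, 0.47),
        (0.90, 0.15), (0.88, 0.20),
    ]
    wavelengths = [450, 450, 550, 550, 650, 650]
    print(knn_predict(samples, wavelengths, query=(0.51, 0.49), k=3))
```

A trained neural network, by contrast, pays its cost up front and answers each query with a fixed amount of computation, which matches the trade-off the abstract describes.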

CV · Jul 3, 2019

Seeing Under the Cover: A Physics Guided Learning Approach for In-Bed Pose Estimation

Shuangjun Liu, Sarah Ostadabbas

Human in-bed pose estimation has great practical value in medical and healthcare applications, yet it still mainly relies on expensive pressure mapping (PM) solutions. In this paper, we introduce our novel physics-inspired vision-based approach that addresses the challenging issues associated with the in-bed pose estimation problem, including monitoring a fully covered person in complete darkness. We reformulated this problem using our proposed Under the Cover Imaging via Thermal Diffusion (UCITD) method, which uses a long wavelength IR technique to capture high-resolution pose information of the body even when it is fully covered. We proposed a physical hyperparameter concept through which we achieved high-quality ground truth pose labels in different modalities. A fully annotated in-bed pose dataset called Simultaneously-collected multimodal Lying Pose (SLP) is also formed/released, with the same order of magnitude as most existing large-scale human pose datasets, to support the training and evaluation of complex models. A network trained from scratch on it and tested in two diverse settings, one in a living room and the other in a hospital room, showed pose estimation performance of 99.5% and 95.7% in the PCK0.2 standard, respectively. Moreover, in a multi-factor comparison with a state-of-the-art in-bed pose monitoring solution based on PM, our solution showed significant superiority in all practical aspects by being 60 times cheaper and 300 times smaller, while having higher pose recognition granularity and accuracy.

CV · Aug 8, 2018

A Semi-Supervised Data Augmentation Approach using 3D Graphical Engines

Shuangjun Liu, Sarah Ostadabbas

Deep learning approaches have been rapidly adopted across a wide range of fields because of their accuracy and flexibility, but they require large labeled training datasets. This presents a fundamental problem for applications with limited, expensive, or private data (i.e., small data), such as human pose and behavior estimation/tracking, which could be highly personalized. In this paper, we present a semi-supervised data augmentation approach that can synthesize large-scale labeled training datasets using 3D graphical engines based on a physically valid low-dimensional pose descriptor. To evaluate the performance of our synthesized datasets in training deep learning-based models, we generated a large synthetic human pose dataset, called ScanAva, using 3D scans of only 7 individuals based on our proposed augmentation approach. A state-of-the-art human pose estimation deep learning model was then trained from scratch using our ScanAva dataset and achieved a pose estimation accuracy of 91.2% at the PCK0.5 criterion after applying an efficient domain adaptation on the synthetic images. This accuracy was comparable to that of the same model trained on large-scale pose data from real humans, such as the MPII dataset, and much higher than that of the model trained on other synthetic human datasets such as SURREAL.

CV · Aug 6, 2018

Inner Space Preserving Generative Pose Machine

Shuangjun Liu, Sarah Ostadabbas

Image-based generative methods, such as generative adversarial networks (GANs), have already been able to generate realistic images with much context control, especially when they are conditioned. However, most successful frameworks share a common procedure which performs an image-to-image translation with the pose of figures in the image untouched. When the objective is reposing a figure in an image while preserving the rest of the image, the state of the art mainly assumes a single rigid body with a simple background and limited pose shift, which can hardly be extended to images under normal settings. In this paper, we introduce an image "inner space" preserving model that assigns an interpretable low-dimensional pose descriptor (LDPD) to an articulated figure in the image. Figure reposing is then generated by passing the LDPD and the original image through multi-stage augmented hourglass networks in a conditional GAN structure, called inner space preserving generative pose machine (ISP-GPM). We evaluated ISP-GPM on reposing human figures, which are highly articulated with versatile variations. Testing a state-of-the-art pose estimator on our reposed dataset gave an accuracy of over 80% on the PCK0.5 metric. The results also showed that our ISP-GPM is able to preserve the background with high accuracy while reasonably recovering the area blocked by the figure to be reposed.

CV · Nov 3, 2017

In-Bed Pose Estimation: Deep Learning with Shallow Dataset

Shuangjun Liu, Yu Yin, Sarah Ostadabbas

Although human pose estimation for various computer vision (CV) applications has been studied extensively in the last few decades, in-bed pose estimation using camera-based vision methods has been ignored by the CV community because it is assumed to be identical to general-purpose pose estimation. However, in-bed pose estimation has its own specialized aspects and comes with specific challenges, including notable differences in lighting conditions throughout the day and a pose distribution that differs from the common human surveillance viewpoint. In this paper, we demonstrate that these challenges significantly lessen the effectiveness of existing general-purpose pose estimation models. To address the lighting variation challenge, an infrared selective (IRS) image acquisition technique is proposed to provide uniform quality data under various lighting conditions. In addition, to deal with the unconventional pose perspective, a 2-end histogram of oriented gradient (HOG) rectification method is presented. In this work, we explored the idea of employing a pre-trained convolutional neural network (CNN) model trained on large public datasets of general human poses and fine-tuning the model using our own shallow in-bed IRS dataset. We developed an IRS imaging system and collected IRS image data from several realistic life-size mannequins in a simulated hospital room environment. A pre-trained CNN called convolutional pose machine (CPM) was repurposed for in-bed pose estimation by fine-tuning its specific intermediate layers. Using the HOG rectification method, the pose estimation performance of CPM significantly improved by 26.4% in the PCK0.1 criterion compared to the model without such rectification.