Zhijun Liang

3papers

46citations

Novelty47%

AI Score29

Ranked #150,779 of 201,326 authors (top 75%)#46,886 in CV (top 80%)

3 Papers

CVAug 5, 2020Code

Pose-based Modular Network for Human-Object Interaction Detection

Zhijun Liang, Junfa Liu, Yisheng Guan et al.

Human-object interaction(HOI) detection is a critical task in scene understanding. The goal is to infer the triplet <subject, predicate, object> in a scene. In this work, we note that the human pose itself as well as the relative spatial information of the human pose with respect to the target object can provide informative cues for HOI detection. We contribute a Pose-based Modular Network (PMN) which explores the absolute pose features and relative spatial pose features to improve HOI detection and is fully compatible with existing networks. Our module consists of a branch that first processes the relative spatial pose features of each joint independently. Another branch updates the absolute pose features via fully connected graph structures. The processed pose features are then fed into an action classifier. To evaluate our proposed method, we combine the module with the state-of-the-art model named VS-GATs and obtain significant improvement on two public benchmarks: V-COCO and HICO-DET, which shows its efficacy and flexibility. Code is available at \url{https://github.com/birlrobotics/PMN}.

LGJan 7, 2020Code

Visual-Semantic Graph Attention Networks for Human-Object Interaction Detection

Zhijun Liang, Juan Rojas, Junfa Liu et al.

In scene understanding, robotics benefit from not only detecting individual scene instances but also from learning their possible interactions. Human-Object Interaction (HOI) Detection infers the action predicate on a <human, predicate, object> triplet. Contextual information has been found critical in inferring interactions. However, most works only use local features from single human-object pair for inference. Few works have studied the disambiguating contribution of subsidiary relations made available via graph networks. Similarly, few have learned to effectively leverage visual cues along with the intrinsic semantic regularities contained in HOIs. We contribute a dual-graph attention network that effectively aggregates contextual visual, spatial, and semantic information dynamically from primary human-object relations as well as subsidiary relations through attention mechanisms for strong disambiguating power. We achieve comparable results on two benchmarks: V-COCO and HICO-DET. Code is available at \url{https://github.com/birlrobotics/vs-gats}.

CVMar 11, 2020

A Graph Attention Spatio-temporal Convolutional Network for 3D Human Pose Estimation in Video

Junfa Liu, Juan Rojas, Zhijun Liang et al.

Spatio-temporal information is key to resolve occlusion and depth ambiguity in 3D pose estimation. Previous methods have focused on either temporal contexts or local-to-global architectures that embed fixed-length spatio-temporal information. To date, there have not been effective proposals to simultaneously and flexibly capture varying spatio-temporal sequences and effectively achieves real-time 3D pose estimation. In this work, we improve the learning of kinematic constraints in the human skeleton: posture, local kinematic connections, and symmetry by modeling local and global spatial information via attention mechanisms. To adapt to single- and multi-frame estimation, the dilated temporal model is employed to process varying skeleton sequences. Also, importantly, we carefully design the interleaving of spatial semantics with temporal dependencies to achieve a synergistic effect. To this end, we propose a simple yet effective graph attention spatio-temporal convolutional network (GAST-Net) that comprises of interleaved temporal convolutional and graph attention blocks. Experiments on two challenging benchmark datasets (Human3.6M and HumanEva-I) and YouTube videos demonstrate that our approach effectively mitigates depth ambiguity and self-occlusion, generalizes to half upper body estimation, and achieves competitive performance on 2D-to-3D video pose estimation. Code, video, and supplementary information is available at: \href{http://www.juanrojas.net/gast/}{http://www.juanrojas.net/gast/}