Zheheng Jiang

CV
h-index: 14
15 papers
662 citations
Novelty: 45%
AI Score: 46

15 Papers

84.0 CV Jun 3
Disentangled Fine-Grained Prototype Learning for Incomplete Image-Tabular Classification

Feixiang Zhou, Jianyang Xie, Zhuangzhi Gao et al.

The missing-modality problem poses a significant challenge in image-tabular multimodal learning across a wide range of multimedia applications, including product understanding, recommendation systems, and medical diagnosis. This challenge is particularly pronounced when the two modalities are highly heterogeneous, as images and tabular attributes differ substantially in their semantic granularity and data distributions. Existing methods learn modality-invariant representations through disentanglement and alignment over global token-averaged features, capturing only coarse cross-modal consistency and overlooking fine-grained semantic and distributional misalignment, which hampers the exploitation of complementary cues under missing modalities. To address this, we propose DFPL, a novel framework for fine-grained prototype learning. Specifically, Shared-Specific Prototype Modeling (SSPM) extracts compact and diverse shared and modality-specific prototypes, and further performs prototype-level disentanglement to suppress redundant intra-modality correlations. Additionally, we propose a Prototype-guided Fine-grained Alignment (PFA) module that jointly enforces prototype-level distribution matching and prototype-to-class semantic alignment within a unified prototype space, thereby preserving both fine-grained distributional and semantic consistency across modalities. We further introduce a Class-aware Multi-scale Aggregation (CMA) module to adaptively aggregate shared semantics and modality-specific characteristics from global and prototype levels for robust predictions. Extensive experiments on three diverse image-tabular benchmarks demonstrate the superiority of our method compared to the previous approaches under various missing-modality settings. Code will be made publicly available.

CV Apr 27, 2023
A Probabilistic Attention Model with Occlusion-aware Texture Regression for 3D Hand Reconstruction from a Single RGB Image

Zheheng Jiang, Hossein Rahmani, Sue Black et al.

Recently, deep learning based approaches have shown promising results in 3D hand reconstruction from a single RGB image. These approaches can be roughly divided into model-based approaches, which are heavily dependent on the model's parameter space, and model-free approaches, which require large numbers of 3D ground truths to reduce depth ambiguity and struggle in weakly-supervised scenarios. To overcome these issues, we propose a novel probabilistic model to achieve the robustness of model-based approaches and reduced dependence on the model's parameter space of model-free approaches. The proposed probabilistic model incorporates a model-based network as a prior-net to estimate the prior probability distribution of joints and vertices. An Attention-based Mesh Vertices Uncertainty Regression (AMVUR) model is proposed to capture dependencies among vertices and the correlation between joints and mesh vertices to improve their feature representation. We further propose a learning based occlusion-aware Hand Texture Regression model to achieve high-fidelity texture reconstruction. We demonstrate the flexibility of the proposed probabilistic model to be trained in both supervised and weakly-supervised scenarios. The experimental results demonstrate our probabilistic model's state-of-the-art accuracy in 3D hand and texture reconstruction from a single image in both training schemes, including in the presence of severe occlusions.

CV Aug 7, 2022
Cross-Skeleton Interaction Graph Aggregation Network for Representation Learning of Mouse Social Behaviour

Feixiang Zhou, Xinyu Yang, Fang Chen et al.

Automated social behaviour analysis of mice has become an increasingly popular research area in behavioural neuroscience. Recently, pose information (i.e., locations of keypoints or skeleton) has been used to interpret social behaviours of mice. Nevertheless, effective encoding and decoding of social interaction information underlying the keypoints of mice has been rarely investigated in the existing methods. In particular, it is challenging to model complex social interactions between mice due to highly deformable body shapes and ambiguous movement patterns. To deal with the interaction modelling problem, we here propose a Cross-Skeleton Interaction Graph Aggregation Network (CS-IGANet) to learn abundant dynamics of freely interacting mice, where a Cross-Skeleton Node-level Interaction module (CS-NLI) is used to model multi-level interactions (i.e., intra-, inter- and cross-skeleton interactions). Furthermore, we design a novel Interaction-Aware Transformer (IAT) to dynamically learn the graph-level representation of social behaviours and update the node-level representation, guided by our proposed interaction-aware self-attention mechanism. Finally, to enhance the representation ability of our model, an auxiliary self-supervised learning task is proposed for measuring the similarity between cross-skeleton nodes. Experimental results on the standard CRMI13-Skeleton and our PDMB-Skeleton datasets show that our proposed model outperforms several other state-of-the-art approaches.

CV Dec 19, 2023
SMC-NCA: Semantic-guided Multi-level Contrast for Semi-supervised Temporal Action Segmentation

Feixiang Zhou, Zheheng Jiang, Huiyu Zhou et al.

Semi-supervised temporal action segmentation (SS-TAS) aims to perform frame-wise classification in long untrimmed videos, where only a fraction of videos in the training set have labels. Recent studies have shown the potential of contrastive learning in unsupervised representation learning using unlabelled data. However, learning the representation of each frame by unsupervised contrastive learning for action segmentation remains an open and challenging problem. In this paper, we propose a novel Semantic-guided Multi-level Contrast scheme with a Neighbourhood-Consistency-Aware unit (SMC-NCA) to extract strong frame-wise representations for SS-TAS. Specifically, for representation learning, SMC is first used to explore intra- and inter-information variations in a unified and contrastive way, based on action-specific semantic information and temporal information highlighting relations between actions. Then, the NCA module, which is responsible for enforcing spatial consistency between neighbourhoods centered at different frames to alleviate over-segmentation issues, works alongside SMC for semi-supervised learning (SSL). Our SMC outperforms the other state-of-the-art methods on three benchmarks, offering improvements of up to 17.8% and 12.6% in terms of Edit distance and accuracy, respectively. Additionally, the NCA unit results in significantly better segmentation performance in the presence of only 5% labelled videos. We also demonstrate the generalizability and effectiveness of the proposed method on our Parkinson's Disease Mouse Behaviour (PDMB) dataset. Code is available at https://github.com/FeixiangZhou/SMC-NCA.

CV Dec 21, 2023
3D Points Splatting for Real-Time Dynamic Hand Reconstruction

Zheheng Jiang, Hossein Rahmani, Sue Black et al.

We present 3D Points Splatting Hand Reconstruction (3D-PSHR), a real-time and photo-realistic hand reconstruction approach. We propose a self-adaptive canonical points upsampling strategy to achieve high-resolution hand geometry representation. This is followed by a self-adaptive deformation that deforms the hand from the canonical space to the target pose, adapting to the dynamic changing of canonical points which, in contrast to the common practice of subdividing the MANO model, offers greater flexibility and results in improved geometry fitting. To model texture, we disentangle the appearance color into the intrinsic albedo and pose-aware shading, which are learned through a Context-Attention module. Moreover, our approach allows the geometric and the appearance models to be trained simultaneously in an end-to-end manner. We demonstrate that our method is capable of producing animatable, photorealistic and relightable hand reconstructions using multiple datasets, including monocular videos captured with handheld smartphones and large-scale multi-view videos featuring various hand poses. We also demonstrate that our approach achieves real-time rendering speeds while simultaneously maintaining superior performance compared to existing state-of-the-art methods.

CV Oct 12, 2025
Action-Dynamics Modeling and Cross-Temporal Interaction for Online Action Understanding

Xinyu Yang, Zheheng Jiang, Feixiang Zhou et al.

Action understanding, encompassing action detection and anticipation, plays a crucial role in numerous practical applications. However, untrimmed videos are often characterized by substantial redundant information and noise. Moreover, in modeling action understanding, the influence of the agent's intention on the action is often overlooked. Motivated by these issues, we propose a novel framework called the State-Specific Model (SSM), designed to unify and enhance both action detection and anticipation tasks. In the proposed framework, the Critical State-Based Memory Compression module compresses frame sequences into critical states, reducing information redundancy. The Action Pattern Learning module constructs a state-transition graph with multi-dimensional edges to model action dynamics in complex scenarios, on the basis of which potential future cues can be generated to represent intention. Furthermore, our Cross-Temporal Interaction module models the mutual influence between intentions and past as well as current information through cross-temporal interactions, thereby refining present and future features and ultimately realizing simultaneous action detection and anticipation. Extensive experiments on multiple benchmark datasets -- including EPIC-Kitchens-100, THUMOS'14, TVSeries, and the introduced Parkinson's Disease Mouse Behaviour (PDMB) dataset -- demonstrate the superior performance of our proposed framework compared to other state-of-the-art approaches. These results highlight the importance of action dynamics learning and cross-temporal interactions, laying a foundation for future action understanding research.

CV Dec 1, 2020
Structured Context Enhancement Network for Mouse Pose Estimation

Feixiang Zhou, Zheheng Jiang, Zhihua Liu et al.

Automated analysis of mouse behaviours is crucial for many applications in neuroscience. However, quantifying mouse behaviours from videos or images remains a challenging problem, where pose estimation plays an important role in describing mouse behaviours. Although deep learning based methods have made promising advances in human pose estimation, they cannot be directly applied to pose estimation of mice because of physiological differences. In particular, since the mouse body is highly deformable, it is challenging to accurately locate different keypoints on the mouse body. In this paper, we propose a novel Hourglass network based model, namely Graphical Model based Structured Context Enhancement Network (GM-SCENet), in which two effective modules, i.e., Structured Context Mixer (SCM) and Cascaded Multi-Level Supervision (CMLS), are implemented in sequence. SCM can adaptively learn and enhance the proposed structured context information of each mouse part by a novel graphical model that takes into account the motion difference between body parts. Then, the CMLS module is designed to jointly train the proposed SCM and the Hourglass network by generating multi-level information, increasing the robustness of the whole network. Using the multi-level prediction information from SCM and CMLS, we develop an inference method to ensure the accuracy of the localisation results. Finally, we evaluate our proposed approach against several baselines...

CV Nov 4, 2020
Multi-view Mouse Social Behaviour Recognition with Deep Graphical Model

Zheheng Jiang, Feixiang Zhou, Aite Zhao et al.

Home-cage social behaviour analysis of mice is an invaluable tool to assess therapeutic efficacy of neurodegenerative diseases. Despite tremendous efforts made within the research community, single-camera video recordings are mainly used for such analysis. Because of the potential to create rich descriptions of mouse social behaviours, the use of multi-view video recordings for rodent observations is receiving increasing attention. However, identifying social behaviours from various views is still challenging due to the lack of correspondence across data sources. To address this problem, we here propose a novel multi-view latent-attention and dynamic discriminative model that jointly learns view-specific and view-shared sub-structures, where the former captures unique dynamics of each view whilst the latter encodes the interaction between the views. Furthermore, a novel multi-view latent-attention variational autoencoder model is introduced in learning the acquired features, enabling us to learn discriminative features in each view. Experimental results on the standard CRMI13 and our multi-view Parkinson's Disease Mouse Behaviour (PDMB) datasets demonstrate that our model outperforms other state-of-the-art technologies and effectively deals with the imbalanced data problem.

CV Aug 21, 2020
Perceptual underwater image enhancement with deep learning and physical priors

Long Chen, Zheheng Jiang, Lei Tong et al.

Underwater image enhancement, as a pre-processing step to improve the accuracy of the following object detection task, has drawn considerable attention in the field of underwater navigation and ocean exploration. However, most of the existing underwater image enhancement strategies tend to consider enhancement and detection as two independent modules with no interaction, and the practice of separate optimization does not always help the underwater object detection task. In this paper, we propose two perceptual enhancement models, each of which uses a deep enhancement model with a detection perceptor. The detection perceptor provides coherent information in the form of gradients to the enhancement model, guiding the enhancement model to generate patch level visually pleasing images or detection favourable images. In addition, due to the lack of training data, a hybrid underwater image synthesis model, which fuses physical priors and data-driven cues, is proposed to synthesize training data and generalise our enhancement model for real-world underwater images. Experimental results show the superiority of our proposed method over several state-of-the-art methods on both real-world and synthetic underwater datasets.

IV Jul 18, 2020
Deep Learning Based Brain Tumor Segmentation: A Survey

Zhihua Liu, Lei Tong, Zheheng Jiang et al.

Brain tumor segmentation is one of the most challenging problems in medical image analysis. The goal of brain tumor segmentation is to generate accurate delineation of brain tumor regions. In recent years, deep learning methods have shown promising performance in solving various computer vision problems, such as image classification, object detection and semantic segmentation. A number of deep learning based methods have been applied to brain tumor segmentation and achieved promising results. Considering the remarkable breakthroughs made by state-of-the-art technologies, we use this survey to provide a comprehensive study of recently developed deep learning based brain tumor segmentation techniques. More than 100 scientific papers are selected and discussed in this survey, extensively covering technical aspects such as network architecture design, segmentation under imbalanced conditions, and multi-modality processes. We also provide insightful discussions for future development directions.

CV Jul 15, 2020
CANet: Context Aware Network for 3D Brain Glioma Segmentation

Zhihua Liu, Lei Tong, Long Chen et al.

Automated segmentation of brain glioma plays an active role in diagnostic decisions, progression monitoring and surgery planning. Based on deep neural networks, previous studies have shown promising technologies for brain glioma segmentation. However, these approaches lack powerful strategies to incorporate contextual information of tumor cells and their surroundings, which has been proven to be a fundamental cue for dealing with local ambiguity. In this work, we propose a novel approach named Context-Aware Network (CANet) for brain glioma segmentation. CANet captures high-dimensional and discriminative features with contexts from both the convolutional space and feature interaction graphs. We further propose context-guided attentive conditional random fields which can selectively aggregate features. We evaluate our method using the publicly accessible brain glioma segmentation datasets BRATS2017, BRATS2018 and BRATS2019. The experimental results show that the proposed algorithm has better or competitive performance against several state-of-the-art approaches under different segmentation metrics on the training and validation sets.

CV Jun 29, 2020
A Benchmark dataset for both underwater image enhancement and underwater object detection

Long Chen, Lei Tong, Feixiang Zhou et al.

Underwater image enhancement is an important vision task due to its significance in marine engineering and aquatic robotics. It usually works as a pre-processing step to improve the performance of high-level vision tasks such as underwater object detection. Even though many previous works show that underwater image enhancement algorithms can boost the detection accuracy of detectors, no work has specifically focused on investigating the relationship between these two tasks. This is mainly because existing underwater datasets lack either bounding box annotations or high-quality reference images, on which detection accuracy or image quality assessment metrics are calculated. To investigate how underwater image enhancement methods influence the subsequent underwater object detection task, in this paper we provide a large-scale underwater object detection dataset with both bounding box annotations and high-quality reference images, namely the OUC dataset. The OUC dataset provides a platform for researchers to comprehensively study the influence of underwater image enhancement algorithms on the underwater object detection task.

CV May 23, 2020
Underwater object detection using Invert Multi-Class Adaboost with deep learning

Long Chen, Zhihua Liu, Lei Tong et al.

In recent years, deep learning based methods have achieved promising performance in standard object detection. However, these methods lack sufficient capabilities to handle underwater object detection due to two challenges: (1) objects in real applications are usually small and their images are blurry, and (2) images in the underwater datasets and real applications are accompanied by heterogeneous noise. To address these two problems, we first propose a novel neural network architecture, namely Sample-WeIghted hyPEr Network (SWIPENet), for small object detection. SWIPENet consists of high-resolution and semantically rich Hyper Feature Maps which can significantly improve small object detection accuracy. In addition, we propose a novel sample-weighted loss function which can model sample weights for SWIPENet, which uses a novel sample re-weighting algorithm, namely Invert Multi-Class Adaboost (IMA), to reduce the influence of noise on the proposed SWIPENet. Experiments on two underwater robot picking contest datasets, URPC2017 and URPC2018, show that the proposed SWIPENet+IMA framework achieves better detection accuracy than several state-of-the-art object detection approaches.

CV Jun 6, 2019
Detection and Tracking of Multiple Mice Using Part Proposal Networks

Zheheng Jiang, Zhihua Liu, Long Chen et al.

The study of mouse social behaviours has been increasingly undertaken in neuroscience research. However, automated quantification of mouse behaviours from the videos of interacting mice is still a challenging problem, where object tracking plays a key role in locating mice in their living spaces. Artificial markers are often applied for multiple mice tracking, which are intrusive and consequently interfere with the movements of mice in a dynamic environment. In this paper, we propose a novel method to continuously track several mice and individual parts without requiring any specific tagging. Firstly, we propose an efficient and robust deep learning based mouse part detection scheme to generate part candidates. Subsequently, we propose a novel Bayesian Integer Linear Programming Model that jointly assigns the part candidates to individual targets with necessary geometric constraints whilst establishing pair-wise association between the detected parts. There is no publicly available dataset in the research community that provides a quantitative test-bed for the part detection and tracking of multiple mice, and we here introduce a new challenging Multi-Mice PartsTrack dataset that is made of complex behaviours and actions. Finally, we evaluate our proposed approach against several baselines on our new datasets, where the results show that our method outperforms the other state-of-the-art approaches in terms of accuracy.

LG Jun 2, 2019
Cost-sensitive Boosting Pruning Trees for depression detection on Twitter

Lei Tong, Zhihua Liu, Zheheng Jiang et al.

Depression is one of the most common mental health disorders, and a large number of depressed people commit suicide each year. Potential depression sufferers usually do not consult psychological doctors because they feel ashamed or are unaware of any depression, which may result in severe delay of diagnosis and treatment. In the meantime, evidence shows that social media data provides valuable clues about physical and mental health conditions. In this paper, we argue that it is feasible to identify depression at an early stage by mining online social behaviours. Our approach, which is innovative to the practice of depression detection, does not rely on the extraction of numerous or complicated features to achieve accurate depression detection. Instead, we propose a novel classifier, namely, Cost-sensitive Boosting Pruning Trees (CBPT), which demonstrates a strong classification ability on two publicly accessible Twitter depression detection datasets. To comprehensively evaluate the classification capability of the CBPT, we use three additional datasets from the UCI machine learning repository, and the CBPT obtains appealing classification results against several state-of-the-art boosting algorithms. Finally, we comprehensively explore the factors that influence model prediction, and the results show that our proposed framework is promising for identifying Twitter users with depression.