Guizhong Liu

CV
h-index13
20papers
251citations
Novelty53%
AI Score37

20 Papers

AIMay 19, 2022
Image Augmentation Based Momentum Memory Intrinsic Reward for Sparse Reward Visual Scenes

Zheng Fang, Biao Zhao, Guizhong Liu

Many scenes in real life can be abstracted to the sparse reward visual scenes, where it is difficult for an agent to tackle the task under the condition of only accepting images and sparse rewards. We propose to decompose this problem into two sub-problems: the visual representation and the sparse reward. To address them, a novel framework IAMMIR combining the self-supervised representation learning with the intrinsic motivation is presented. For visual representation, a representation driven by a combination of the imageaugmented forward dynamics and the reward is acquired. For sparse rewards, a new type of intrinsic reward is designed, the Momentum Memory Intrinsic Reward (MMIR). It utilizes the difference of the outputs from the current model (online network) and the historical model (target network) to present the agent's state familiarity. Our method is evaluated on the visual navigation task with sparse rewards in Vizdoom. Experiments demonstrate that our method achieves the state of the art performance in sample efficiency, at least 2 times faster than the existing methods reaching 100% success rate.

LGJul 4, 2023
Relation-aware graph structure embedding with co-contrastive learning for drug-drug interaction prediction

Mengying Jiang, Guizhong Liu, Biao Zhao et al.

Relation-aware graph structure embedding is promising for predicting multi-relational drug-drug interactions (DDIs). Typically, most existing methods begin by constructing a multi-relational DDI graph and then learning relation-aware graph structure embeddings (RaGSEs) of drugs from the DDI graph. Nevertheless, most existing approaches are usually limited in learning RaGSEs of new drugs, leading to serious over-fitting when the test DDIs involve such drugs. To alleviate this issue, we propose a novel DDI prediction method based on relation-aware graph structure embedding with co-contrastive learning, RaGSECo. The proposed RaGSECo constructs two heterogeneous drug graphs: a multi-relational DDI graph and a multi-attribute drug-drug similarity (DDS) graph. The two graphs are used respectively for learning and propagating the RaGSEs of drugs, aiming to ensure all drugs, including new ones, can possess effective RaGSEs. Additionally, we present a novel co-contrastive learning module to learn drug-pairs (DPs) representations. This mechanism learns DP representations from two distinct views (interaction and similarity views) and encourages these views to supervise each other collaboratively to obtain more discriminative DP representations. We evaluate the effectiveness of our RaGSECo on three different tasks using two real datasets. The experimental results demonstrate that RaGSECo outperforms existing state-of-the-art prediction methods.

AIJun 9, 2025Code
Curriculum Learning With Counterfactual Group Relative Policy Advantage For Multi-Agent Reinforcement Learning

Weiqiang Jin, Hongyang Du, Guizhong Liu et al.

Multi-agent reinforcement learning (MARL) has achieved strong performance in cooperative adversarial tasks. However, most existing methods typically train agents against fixed opponent strategies and rely on such meta-static difficulty conditions, which limits their adaptability to changing environments and often leads to suboptimal policies. Inspired by the success of curriculum learning (CL) in supervised tasks, we propose a dynamic CL framework for MARL that employs an self-adaptive difficulty adjustment mechanism. This mechanism continuously modulates opponent strength based on real-time agent training performance, allowing agents to progressively learn from easier to more challenging scenarios. However, the dynamic nature of CL introduces instability due to nonstationary environments and sparse global rewards. To address this challenge, we develop a Counterfactual Group Relative Policy Advantage (CGRPA), which is tightly coupled with the curriculum by providing intrinsic credit signals that reflect each agent's impact under evolving task demands. CGRPA constructs a counterfactual advantage function that isolates individual contributions within group behavior, facilitating more reliable policy updates throughout the curriculum. CGRPA evaluates each agent's contribution through constructing counterfactual action advantage function, providing intrinsic rewards that enhance credit assignment and stabilize learning under non-stationary conditions. Extensive experiments demonstrate that our method improves both training stability and final performance, achieving competitive results against state-of-the-art methods. The code is available at https://github.com/NICE-HKU/CL2MARL-SMAC.

CLOct 25, 2021Code
Improving Embedded Knowledge Graph Multi-hop Question Answering by introducing Relational Chain Reasoning

Weiqiang Jin, Biao Zhao, Hang Yu et al.

Knowledge Graph Question Answering (KGQA) aims to answer user-questions from a knowledge graph (KG) by identifying the reasoning relations between topic entity and answer. As a complex branch task of KGQA, multi-hop KGQA requires reasoning over the multi-hop relational chain preserved in KG to arrive at the right answer. Despite recent successes, the existing works on answering multi-hop complex questions still face the following challenges: i) The absence of an explicit relational chain order reflected in user-question stems from a misunderstanding of a user's intentions. ii) Incorrectly capturing relational types on weak supervision of which dataset lacks intermediate reasoning chain annotations due to expensive labeling cost. iii) Failing to consider implicit relations between the topic entity and the answer implied in structured KG because of limited neighborhoods size constraint in subgraph retrieval-based algorithms.To address these issues in multi-hop KGQA, we propose a novel model herein, namely Relational Chain based Embedded KGQA (Rce-KGQA), which simultaneously utilizes the explicit relational chain revealed in natural language question and the implicit relational chain stored in structured KG. Our extensive empirical study on three open-domain benchmarks proves that our method significantly outperforms the state-of-the-art counterparts like GraftNet, PullNet and EmbedKGQA. Comprehensive ablation experiments also verify the effectiveness of our method on the multi-hop KGQA task. We have made our model's source code available at github: https://github.com/albert-jin/Rce-KGQA.

LGJun 30, 2023
Landmark Guided Active Exploration with State-specific Balance Coefficient

Fei Cui, Jiaojiao Fang, Mengke Yang et al.

Goal-conditioned hierarchical reinforcement learning (GCHRL) decomposes long-horizon tasks into sub-tasks through a hierarchical framework and it has demonstrated promising results across a variety of domains. However, the high-level policy's action space is often excessively large, presenting a significant challenge to effective exploration and resulting in potentially inefficient training. In this paper, we design a measure of prospect for sub-goals by planning in the goal space based on the goal-conditioned value function. Building upon the measure of prospect, we propose a landmark-guided exploration strategy by integrating the measures of prospect and novelty which aims to guide the agent to explore efficiently and improve sample efficiency. In order to dynamically consider the impact of prospect and novelty on exploration, we introduce a state-specific balance coefficient to balance the significance of prospect and novelty. The experimental results demonstrate that our proposed exploration strategy significantly outperforms the baseline methods across multiple tasks.

LGMar 6, 2024
Self-Attention Empowered Graph Convolutional Network for Structure Learning and Node Embedding

Mengying Jiang, Guizhong Liu, Yuanchao Su et al.

In representation learning on graph-structured data, many popular graph neural networks (GNNs) fail to capture long-range dependencies, leading to performance degradation. Furthermore, this weakness is magnified when the concerned graph is characterized by heterophily (low homophily). To solve this issue, this paper proposes a novel graph learning framework called the graph convolutional network with self-attention (GCN-SA). The proposed scheme exhibits an exceptional generalization capability in node-level representation learning. The proposed GCN-SA contains two enhancements corresponding to edges and node features. For edges, we utilize a self-attention mechanism to design a stable and effective graph-structure-learning module that can capture the internal correlation between any pair of nodes. This graph-structure-learning module can identify reliable neighbors for each node from the entire graph. Regarding the node features, we modify the transformer block to make it more applicable to enable GCN to fuse valuable information from the entire graph. These two enhancements work in distinct ways to help our GCN-SA capture long-range dependencies, enabling it to perform representation learning on graphs with varying levels of homophily. The experimental results on benchmark datasets demonstrate the effectiveness of the proposed GCN-SA. Compared to other outstanding GNN counterparts, the proposed GCN-SA is competitive.

LGFeb 28, 2024
Hierarchical Multi-Relational Graph Representation Learning for Large-Scale Prediction of Drug-Drug Interactions

Mengying Jiang, Guizhong Liu, Yuanchao Su et al.

Most existing methods for predicting drug-drug interactions (DDI) predominantly concentrate on capturing the explicit relationships among drugs, overlooking the valuable implicit correlations present between drug pairs (DPs), which leads to weak predictions. To address this issue, this paper introduces a hierarchical multi-relational graph representation learning (HMGRL) approach. Within the framework of HMGRL, we leverage a wealth of drug-related heterogeneous data sources to construct heterogeneous graphs, where nodes represent drugs and edges denote clear and various associations. The relational graph convolutional network (RGCN) is employed to capture diverse explicit relationships between drugs from these heterogeneous graphs. Additionally, a multi-view differentiable spectral clustering (MVDSC) module is developed to capture multiple valuable implicit correlations between DPs. Within the MVDSC, we utilize multiple DP features to construct graphs, where nodes represent DPs and edges denote different implicit correlations. Subsequently, multiple DP representations are generated through graph cutting, each emphasizing distinct implicit correlations. The graph-cutting strategy enables our HMGRL to identify strongly connected communities of graphs, thereby reducing the fusion of irrelevant features. By combining every representation view of a DP, we create high-level DP representations for predicting DDIs. Two genuine datasets spanning three distinct tasks are adopted to gauge the efficacy of our HMGRL. Experimental outcomes unequivocally indicate that HMGRL surpasses several leading-edge methods in performance.

CVApr 17, 2024
State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend

Fei Cui, Jiaojiao Fang, Xiaojiang Wu et al.

Stochastic video prediction enables the consideration of uncertainty in future motion, thereby providing a better reflection of the dynamic nature of the environment. Stochastic video prediction methods based on image auto-regressive recurrent models need to feed their predictions back into the latent space. Conversely, the state-space models, which decouple frame synthesis and temporal prediction, proves to be more efficient. However, inferring long-term temporal information about motion and generalizing to dynamic scenarios under non-stationary assumptions remains an unresolved challenge. In this paper, we propose a state-space decomposition stochastic video prediction model that decomposes the overall video frame generation into deterministic appearance prediction and stochastic motion prediction. Through adaptive decomposition, the model's generalization capability to dynamic scenarios is enhanced. In the context of motion prediction, obtaining a prior on the long-term trend of future motion is crucial. Thus, in the stochastic motion prediction branch, we infer the long-term motion trend from conditional frames to guide the generation of future frames that exhibit high consistency with the conditional frames. Experimental results demonstrate that our model outperforms baselines on multiple datasets.

CVAug 9, 2021
Self-supervised Learning of Occlusion Aware Flow Guided 3D Geometry Perception with Adaptive Cross Weighted Loss from Monocular Videos

Jiaojiao Fang, Guizhong Liu

Self-supervised deep learning-based 3D scene understanding methods can overcome the difficulty of acquiring the densely labeled ground-truth and have made a lot of advances. However, occlusions and moving objects are still some of the major limitations. In this paper, we explore the learnable occlusion aware optical flow guided self-supervised depth and camera pose estimation by an adaptive cross weighted loss to address the above limitations. Firstly, we explore to train the learnable occlusion mask fused optical flow network by an occlusion-aware photometric loss with the temporally supplemental information and backward-forward consistency of adjacent views. And then, we design an adaptive cross-weighted loss between the depth-pose and optical flow loss of the geometric and photometric error to distinguish the moving objects which violate the static scene assumption. Our method shows promising results on KITTI, Make3D, and Cityscapes datasets under multiple tasks. We also show good generalization ability under a variety of challenging scenarios.

CVAug 4, 2021
Self-Supervised Learning of Depth and Ego-Motion from Video by Alternative Training and Geometric Constraints from 3D to 2D

Jiaojiao Fang, Guizhong Liu

Self-supervised learning of depth and ego-motion from unlabeled monocular video has acquired promising results and drawn extensive attention. Most existing methods jointly train the depth and pose networks by photometric consistency of adjacent frames based on the principle of structure-from-motion (SFM). However, the coupling relationship of the depth and pose networks seriously influences the learning performance, and the re-projection relations is sensitive to scale ambiguity, especially for pose learning. In this paper, we aim to improve the depth-pose learning performance without the auxiliary tasks and address the above issues by alternative training each task and incorporating the epipolar geometric constraints into the Iterative Closest Point (ICP) based point clouds match process. Distinct from jointly training the depth and pose networks, our key idea is to better utilize the mutual dependency of these two tasks by alternatively training each network with respective losses while fixing the other. We also design a log-scale 3D structural consistency loss to put more emphasis on the smaller depth values during training. To makes the optimization easier, we further incorporate the epipolar geometry into the ICP based learning process for pose learning. Extensive experiments on various benchmarks datasets indicate the superiority of our algorithm over the state-of-the-art self-supervised methods.

CVJun 21, 2021
Trainable Class Prototypes for Few-Shot Learning

Jianyi Li, Guizhong Liu

Metric learning is a widely used method for few shot learning in which the quality of prototypes plays a key role in the algorithm. In this paper we propose the trainable prototypes for distance measure instead of the artificial ones within the meta-training and task-training framework. Also to avoid the disadvantages that the episodic meta-training brought, we adopt non-episodic meta-training based on self-supervised learning. Overall we solve the few-shot tasks in two phases: meta-training a transferable feature extractor via self-supervised learning and training the prototypes for metric classification. In addition, the simple attention mechanism is used in both meta-training and task-training. Our method achieves state-of-the-art performance in a variety of established few-shot tasks on the standard few-shot visual classification dataset, with about 20% increase compared to the available unsupervised few-shot learning methods.

LGMay 28, 2021
GCN-SL: Graph Convolutional Networks with Structure Learning for Graphs under Heterophily

Mengying Jiang, Guizhong Liu, Yuanchao Su et al.

In representation learning on the graph-structured data, under heterophily (or low homophily), many popular GNNs may fail to capture long-range dependencies, which leads to their performance degradation. To solve the above-mentioned issue, we propose a graph convolutional networks with structure learning (GCN-SL), and furthermore, the proposed approach can be applied to node classification. The proposed GCN-SL contains two improvements: corresponding to node features and edges, respectively. In the aspect of node features, we propose an efficient-spectral-clustering (ESC) and an ESC with anchors (ESC-ANCH) algorithms to efficiently aggregate feature representations from all similar nodes. In the aspect of edges, we build a re-connected adjacency matrix by using a special data preprocessing technique and similarity learning, and the re-connected adjacency matrix can be optimized directly along with GCN-SL parameters. Considering that the original adjacency matrix may provide misleading information for aggregation in GCN, especially the graphs being with a low level of homophily. The proposed GCN-SL can aggregate feature representations from nearby nodes via re-connected adjacency matrix and is applied to graphs with various levels of homophily. Experimental results on a wide range of benchmark datasets illustrate that the proposed GCN-SL outperforms the stateof-the-art GNN counterparts.

AIMar 14, 2021
R-GSN: The Relation-based Graph Similar Network for Heterogeneous Graph

Xinliang Wu, Mengying Jiang, Guizhong Liu

Heterogeneous graph is a kind of data structure widely existing in real life. Nowadays, the research of graph neural network on heterogeneous graph has become more and more popular. The existing heterogeneous graph neural network algorithms mainly have two ideas, one is based on meta-path and the other is not. The idea based on meta-path often requires a lot of manual preprocessing, at the same time it is difficult to extend to large scale graphs. In this paper, we proposed the general heterogeneous message passing paradigm and designed R-GSN that does not need meta-path, which is much improved compared to the baseline R-GCN. Experiments have shown that our R-GSN algorithm achieves the state-of-the-art performance on the ogbn-mag large scale heterogeneous graph dataset.

CVAug 23, 2020
Few-Shot Image Classification via Contrastive Self-Supervised Learning

Jianyi Li, Guizhong Liu

Most previous few-shot learning algorithms are based on meta-training with fake few-shot tasks as training samples, where large labeled base classes are required. The trained model is also limited by the type of tasks. In this paper we propose a new paradigm of unsupervised few-shot learning to repair the deficiencies. We solve the few-shot tasks in two phases: meta-training a transferable feature extractor via contrastive self-supervised learning and training a classifier using graph aggregation, self-distillation and manifold augmentation. Once meta-trained, the model can be used in any type of tasks with a task-dependent classifier training. Our method achieves state of-the-art performance in a variety of established few-shot tasks on the standard few-shot visual classification datasets, with an 8- 28% increase compared to the available unsupervised few-shot learning methods.

MMSep 9, 2019
Hit Ratio Driven Mobile Edge Caching Scheme for Video on Demand Services

Xing Chen, Lijun He, Shang Xu et al.

More and more scholars focus on mobile edge computing (MEC) technology, because the strong storage and computing capabilities of MEC servers can reduce the long transmission delay, bandwidth waste, energy consumption, and privacy leaks in the data transmission process. In this paper, we study the cache placement problem to determine how to cache videos and which videos to be cached in a mobile edge computing system. First, we derive the video request probability by taking into account video popularity, user preference and the characteristic of video representations. Second, based on the acquired request probability, we formulate a cache placement problem with the objective to maximize the cache hit ratio subject to the storage capacity constraints. Finally, in order to solve the formulated problem, we transform it into a grouping knapsack problem and develop a dynamic programming algorithm to obtain the optimal caching strategy. Simulation results show that the proposed algorithm can greatly improve the cache hit ratio.

CVSep 3, 2019
Unsupervised Video Depth Estimation Based on Ego-motion and Disparity Consensus

Lingtao Zhou, Jiaojiao Fang, Guizhong Liu

Unsupervised learning based depth estimation methods have received more and more attention as they do not need vast quantities of densely labeled data for training which are touch to acquire. In this paper, we propose a novel unsupervised monocular video depth estimation method in natural scenes by taking advantage of the state-of-the-art method of Zhou et al. which jointly estimates depth and camera motion. Our method advances beyond the baseline method by three aspects: 1) we add an additional signal as supervision to the baseline method by incorporating left-right binocular images reconstruction loss based on the estimated disparities, thus the left frame can be reconstructed by the temporal frames and right frames of stereo vision; 2) the network is trained by jointly using two kinds of view syntheses loss and left-right disparity consistency regularization to estimate depth and pose simultaneously; 3) we use the edge aware smooth L2 regularization to smooth the depth map while preserving the contour of the target. Extensive experiments on the KITTI autonomous driving dataset and Make3D dataset indicate the superiority of our algorithm in training efficiency. We can achieve competitive results with the baseline by only 3/5 times training data. The experimental results also show that our method even outperforms the classical supervised methods that using either ground truth depth or given pose for training.

CVSep 1, 2019
3D Bounding Box Estimation for Autonomous Vehicles by Cascaded Geometric Constraints and Depurated 2D Detections Using 3D Results

Jiaojiao Fang, Lingtao Zhou, Guizhong Liu

3D object detection is one of the most important tasks in 3D vision perceptual system of autonomous vehicles. In this paper, we propose a novel two stage 3D object detection method aimed at get the optimal solution of object location in 3D space based on regressing two additional 3D object properties by a deep convolutional neural network and combined with cascaded geometric constraints between the 2D and 3D boxes. First, we modify the existing 3D properties regressing network by adding two additional components, viewpoints classification and the center projection of the 3D bounding box s bottom face. Second, we use the predicted center projection combined with similar triangle constraint to acquire an initial 3D bounding box by a closed-form solution. Then, the location predicted by previous step is used as the initial value of the over-determined equations constructed by 2D and 3D boxes fitting constraint with the configuration determined with the classified viewpoint. Finally, we use the recovered physical world information by the 3D detections to filter out the false detection and false alarm in 2D detections. We compare our method with the state-of-the-arts on the KITTI dataset show that although conceptually simple, our method outperforms more complex and computational expensive methods not only by improving the overall precision of 3D detections, but also increasing the orientation estimation precision. Furthermore our method can deal with the truncated objects to some extent and remove the false alarm and false detections in both 2D and 3D detections.

CVSep 1, 2019
Flow Guided Short-term Trackers with Cascade Detection for Long-term Tracking

Han Wu, Xueyuan Yang, Yong Yang et al.

Object tracking has been studied for decades, but most of the existing works are focused on the short-term tracking. For a long sequence, the object is often fully occluded or out of view for a long time, and existing short-term object tracking algorithms often lose the target, and it is difficult to re-catch the target even if it reappears again. In this paper a novel long-term object tracking algorithm flow_MDNet_RPN is proposed, in which a tracking result judgement module and a detection module are added to the short-term object tracking algorithm. Experiments show that the proposed long-term tracking algorithm is effective to the problem of target disappearance.

CVSep 1, 2019
Multiple Object Tracking with Motion and Appearance Cues

Weiqiang Li, Jiatong Mu, Guizhong Liu

Due to better video quality and higher frame rate, the performance of multiple object tracking issues has been greatly improved in recent years. However, in real application scenarios, camera motion and noisy per frame detection results degrade the performance of trackers significantly. High-speed and high-quality multiple object trackers are still in urgent demand. In this paper, we propose a new multiple object tracker following the popular tracking-by-detection scheme. We tackle the camera motion problem with an optical flow network and utilize an auxiliary tracker to deal with the missing detection problem. Besides, we use both the appearance and motion information to improve the matching quality. The experimental results on the VisDrone-MOT dataset show that our approach can improve the performance of multiple object tracking significantly while achieving a high efficiency.

CVFeb 8, 2019
A Single-shot Object Detector with Feature Aggragation and Enhancement

Weiqiang Li, Guizhong Liu

For many real applications, it is equally important to detect objects accurately and quickly. In this paper, we propose an accurate and efficient single shot object detector with feature aggregation and enhancement (FAENet). Our motivation is to enhance and exploit the shallow and deep feature maps of the whole network simultaneously. To achieve it we introduce a pair of novel feature aggregation modules and two feature enhancement blocks, and integrate them into the original structure of SSD. Extensive experiments on both the PASCAL VOC and MS COCO datasets demonstrate that the proposed method achieves much higher accuracy than SSD. In addition, our method performs better than the state-of-the-art one-stage detector RefineDet on small objects and can run at a faster speed.