Kewei Liang

CV
4papers
123citations
Novelty50%
AI Score28

4 Papers

CVFeb 6, 2023
PatchDCT: Patch Refinement for High Quality Instance Segmentation

Qinrou Wen, Jirui Yang, Xue Yang et al.

High-quality instance segmentation has shown emerging importance in computer vision. Without any refinement, DCT-Mask directly generates high-resolution masks by compressed vectors. To further refine masks obtained by compressed vectors, we propose for the first time a compressed vector based multi-stage refinement framework. However, the vanilla combination does not bring significant gains, because changes in some elements of the DCT vector will affect the prediction of the entire mask. Thus, we propose a simple and novel method named PatchDCT, which separates the mask decoded from a DCT vector into several patches and refines each patch by the designed classifier and regressor. Specifically, the classifier is used to distinguish mixed patches from all patches, and to correct previously mispredicted foreground and background patches. In contrast, the regressor is used for DCT vector prediction of mixed patches, further refining the segmentation quality at boundary locations. Experiments on COCO show that our method achieves 2.0%, 3.2%, 4.5% AP and 3.4%, 5.3%, 7.0% Boundary AP improvements over Mask-RCNN on COCO, LVIS, and Cityscapes, respectively. It also surpasses DCT-Mask by 0.7%, 1.1%, 1.3% AP and 0.9%, 1.7%, 4.2% Boundary AP on COCO, LVIS and Cityscapes. Besides, the performance of PatchDCT is also competitive with other state-of-the-art methods.

CVApr 11, 2023
Multi-Graph Convolution Network for Pose Forecasting

Hongwei Ren, Yuhong Shi, Kewei Liang

Recently, there has been a growing interest in predicting human motion, which involves forecasting future body poses based on observed pose sequences. This task is complex due to modeling spatial and temporal relationships. The most commonly used models for this task are autoregressive models, such as recurrent neural networks (RNNs) or variants, and Transformer Networks. However, RNNs have several drawbacks, such as vanishing or exploding gradients. Other researchers have attempted to solve the communication problem in the spatial dimension by integrating Graph Convolutional Networks (GCN) and Long Short-Term Memory (LSTM) models. These works deal with temporal and spatial information separately, which limits the effectiveness. To fix this problem, we propose a novel approach called the multi-graph convolution network (MGCN) for 3D human pose forecasting. This model simultaneously captures spatial and temporal information by introducing an augmented graph for pose sequences. Multiple frames give multiple parts, joined together in a single graph instance. Furthermore, we also explore the influence of natural structure and sequence-aware attention to our model. In our experimental evaluation of the large-scale benchmark datasets, Human3.6M, AMSS and 3DPW, MGCN outperforms the state-of-the-art in pose prediction.

CVNov 19, 2020Code
DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

Xing Shen, Jirui Yang, Chunbo Wei et al.

Binary grid mask representation is broadly used in instance segmentation. A representative instantiation is Mask R-CNN which predicts masks on a $28\times 28$ binary grid. Generally, a low-resolution grid is not sufficient to capture the details, while a high-resolution grid dramatically increases the training complexity. In this paper, we propose a new mask representation by applying the discrete cosine transform(DCT) to encode the high-resolution binary grid mask into a compact vector. Our method, termed DCT-Mask, could be easily integrated into most pixel-based instance segmentation methods. Without any bells and whistles, DCT-Mask yields significant gains on different frameworks, backbones, datasets, and training schedules. It does not require any pre-processing or pre-training, and almost no harm to the running speed. Especially, for higher-quality annotations and more complex backbones, our method has a greater improvement. Moreover, we analyze the performance of our method from the perspective of the quality of mask representation. The main reason why DCT-Mask works well is that it obtains a high-quality mask representation with low complexity. Code is available at https://github.com/aliyun/DCT-Mask.git.

CVOct 18, 2020
Tracklets Predicting Based Adaptive Graph Tracking

Chaobing Shan, Chunbo Wei, Bing Deng et al.

Most of the existing tracking methods link the detected boxes to the tracklets using a linear combination of feature cosine distances and box overlap. But the problem of inconsistent features of an object in two different frames still exists. In addition, when extracting features, only appearance information is utilized, neither the location relationship nor the information of the tracklets is considered. We present an accurate and end-to-end learning framework for multi-object tracking, namely \textbf{TPAGT}. It re-extracts the features of the tracklets in the current frame based on motion predicting, which is the key to solve the problem of features inconsistent. The adaptive graph neural network in TPAGT is adopted to fuse locations, appearance, and historical information, and plays an important role in distinguishing different objects. In the training phase, we propose the balanced MSE LOSS to successfully overcome the unbalanced samples. Experiments show that our method reaches state-of-the-art performance. It achieves 76.5\% MOTA on the MOT16 challenge and 76.2\% MOTA on the MOT17 challenge.