Qiang Guo

CV
h-index1
19papers
204citations
Novelty47%
AI Score47

19 Papers

CVNov 4, 2022Code
Tensor Robust PCA with Nonconvex and Nonlocal Regularization

Xiaoyu Geng, Qiang Guo, Shuaixiong Hui et al.

Tensor robust principal component analysis (TRPCA) is a classical way for low-rank tensor recovery, which minimizes the convex surrogate of tensor rank by shrinking each tensor singular value equally. However, for real-world visual data, large singular values represent more significant information than small singular values. In this paper, we propose a nonconvex TRPCA (N-TRPCA) model based on the tensor adjustable logarithmic norm. Unlike TRPCA, our N-TRPCA can adaptively shrink small singular values more and shrink large singular values less. In addition, TRPCA assumes that the whole data tensor is of low rank. This assumption is hardly satisfied in practice for natural visual data, restricting the capability of TRPCA to recover the edges and texture details from noisy images and videos. To this end, we integrate nonlocal self-similarity into N-TRPCA, and further develop a nonconvex and nonlocal TRPCA (NN-TRPCA) model. Specifically, similar nonlocal patches are grouped as a tensor and then each group tensor is recovered by our N-TRPCA. Since the patches in one group are highly correlated, all group tensors have strong low-rank property, leading to an improvement of recovery performance. Experimental results demonstrate that the proposed NN-TRPCA outperforms existing TRPCA methods in visual data recovery. The demo code is available at https://github.com/qguo2010/NN-TRPCA.

ROMay 15Code
PCASim: Promptable Closed-loop Adversarial Simulation for Urban Traffic Environment

Chuancheng Zhang, Zhenhao Wang, Kaizheng Li et al.

Real-world autonomous driving, particularly in urban environments with numerous corner cases, requires rigorous testing to ensure product safety and robustness. However, few studies have explored integrating adversarial scenario generation with the training of safety agents in closed-loop testing, enabling efficient co-evolution and mutual enhancement of both. To address this challenge, an adversarial behavior knowledge repository is constructed by applying rule-based filtering to an open-source dataset, combined with knowledge retrieval modules tailored for simulation environments. A large language model (LLM) is employed to integrate knowledge-, data-, and adversarial-driven approaches, generating safety-critical traffic scenarios customized to user needs. Additionally, while evaluating the generated scenarios, we employ reinforcement learning models to train the behaviors of different types of vehicles, thereby enriching scenario diversity beyond existing datasets while preserving realism. Experimental results demonstrate that the proposed framework improves the accuracy of domain-specific language generation by 12\%. Moreover, the success rate of newly generated scenario transformations increases by 8\%, while obstacle-avoidance capability is enhanced by 30\%. For the complete manuscript, please refer to: https://zhenhaooo.github.io/PCASim.github.io/

CVSep 7, 2024
Medical Image Segmentation via Single-Source Domain Generalization with Random Amplitude Spectrum Synthesis

Qiang Qiao, Wenyu Wang, Meixia Qu et al.

The field of medical image segmentation is challenged by domain generalization (DG) due to domain shifts in clinical datasets. The DG challenge is exacerbated by the scarcity of medical data and privacy concerns. Traditional single-source domain generalization (SSDG) methods primarily rely on stacking data augmentation techniques to minimize domain discrepancies. In this paper, we propose Random Amplitude Spectrum Synthesis (RASS) as a training augmentation for medical images. RASS enhances model generalization by simulating distribution changes from a frequency perspective. This strategy introduces variability by applying amplitude-dependent perturbations to ensure broad coverage of potential domain variations. Furthermore, we propose random mask shuffle and reconstruction components, which can enhance the ability of the backbone to process structural information and increase resilience intra- and cross-domain changes. The proposed Random Amplitude Spectrum Synthesis for Single-Source Domain Generalization (RAS^4DG) is validated on 3D fetal brain images and 2D fundus photography, and achieves an improved DG segmentation performance compared to other SSDG models.

CVOct 15, 2023
Efficient and Effective Deep Multi-view Subspace Clustering

Yuxiu Lin, Hui Liu, Ren Wang et al.

Recent multi-view subspace clustering achieves impressive results utilizing deep networks, where the self-expressive correlation is typically modeled by a fully connected (FC) layer. However, they still suffer from two limitations. i) The parameter scale of the FC layer is quadratic to sample numbers, resulting in high time and memory costs that significantly degrade their feasibility in large-scale datasets. ii) It is under-explored to extract a unified representation that simultaneously satisfies minimal sufficiency and discriminability. To this end, we propose a novel deep framework, termed Efficient and Effective deep Multi-View Subspace Clustering (E$^2$MVSC). Instead of a parameterized FC layer, we design a Relation-Metric Net that decouples network parameter scale from sample numbers for greater computational efficiency. Most importantly, the proposed method devises a multi-type auto-encoder to explicitly decouple consistent, complementary, and superfluous information from every view, which is supervised by a soft clustering assignment similarity constraint. Following information bottleneck theory and the maximal coding rate reduction principle, a sufficient yet minimal unified representation can be obtained, as well as pursuing intra-cluster aggregation and inter-cluster separability within it. Extensive experiments show that E$^2$MVSC yields comparable results to existing methods and achieves state-of-the-art performance in various types of multi-view datasets.

IVApr 21, 2021Code
NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

Ren Yang, Radu Timofte, Jing Liu et al.

This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at a fixed bit-rate. Besides, the quality enhancement of Tracks 1 and 3 targets at improving the fidelity (PSNR), and Track 2 targets at enhancing the perceptual quality. The three tracks totally attract 482 registrations. In the test phase, 12 teams, 8 teams and 11 teams submitted the final results of Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of video quality enhancement. The homepage of the challenge: https://github.com/RenYang-home/NTIRE21_VEnh

CVMay 8, 2020Code
TSDM: Tracking by SiamRPN++ with a Depth-refiner and a Mask-generator

Pengyao Zhao, Quanli Liu, Wei Wang et al.

In a generic object tracking, depth (D) information provides informative cues for foreground-background separation and target bounding box regression. However, so far, few trackers have used depth information to play the important role aforementioned due to the lack of a suitable model. In this paper, a RGB-D tracker named TSDM is proposed, which is composed of a Mask-generator (M-g), SiamRPN++ and a Depth-refiner (D-r). The M-g generates the background masks, and updates them as the target 3D position changes. The D-r optimizes the target bounding box estimated by SiamRPN++, based on the spatial depth distribution difference between the target and the surrounding background. Extensive evaluation on the Princeton Tracking Benchmark and the Visual Object Tracking challenge shows that our tracker outperforms the state-of-the-art by a large margin while achieving 23 FPS. In addition, a light-weight variant can run at 31 FPS and thus it is practical for real world applications. Code and models of TSDM are available at https://github.com/lql-team/TSDM.

CVAug 20, 2024
Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended?

Chen Liang, Qiang Guo, Xiaochao Qu et al.

Video segmentation aims at partitioning video sequences into meaningful segments based on objects or regions of interest within frames. Current video segmentation models are often derived from image segmentation techniques, which struggle to cope with small-scale or class-imbalanced video datasets. This leads to inconsistent segmentation results across frames. To address these issues, we propose a training strategy Masked Video Consistency, which enhances spatial and temporal feature aggregation. MVC introduces a training strategy that randomly masks image patches, compelling the network to predict the entire semantic segmentation, thus improving contextual information integration. Additionally, we introduce Object Masked Attention (OMA) to optimize the cross-attention mechanism by reducing the impact of irrelevant queries, thereby enhancing temporal modeling capabilities. Our approach, integrated into the latest decoupled universal video segmentation framework, achieves state-of-the-art performance across five datasets for three video segmentation tasks, demonstrating significant improvements over previous methods without increasing model parameters.

CVMar 29, 2024
MCNet: A crowd denstity estimation network based on integrating multiscale attention module

Qiang Guo, Rubo Zhang, Di Zhao

Aiming at the metro video surveillance system has not been able to effectively solve the metro crowd density estimation problem, a Metro Crowd density estimation Network (called MCNet) is proposed to automatically classify crowd density level of passengers. Firstly, an Integrating Multi-scale Attention (IMA) module is proposed to enhance the ability of the plain classifiers to extract semantic crowd texture features to accommodate to the characteristics of the crowd texture feature. The innovation of the IMA module is to fuse the dilation convolution, multiscale feature extraction and attention mechanism to obtain multi-scale crowd feature activation from a larger receptive field with lower computational cost, and to strengthen the crowds activation state of convolutional features in top layers. Secondly, a novel lightweight crowd texture feature extraction network is proposed, which can directly process video frames and automatically extract texture features for crowd density estimation, while its faster image processing speed and fewer network parameters make it flexible to be deployed on embedded platforms with limited hardware resources. Finally, this paper integrates IMA module and the lightweight crowd texture feature extraction network to construct the MCNet, and validate the feasibility of this network on image classification dataset: Cifar10 and four crowd density datasets: PETS2009, Mall, QUT and SH_METRO to validate the MCNet whether can be a suitable solution for crowd density estimation in metro video surveillance where there are image processing challenges such as high density, high occlusion, perspective distortion and limited hardware resources.

CVAug 2, 2025
A Full-Stage Refined Proposal Algorithm for Suppressing False Positives in Two-Stage CNN-Based Detection Methods

Qiang Guo, Rubo Zhang, Bingbing Zhang et al.

False positives in pedestrian detection remain a challenge that has yet to be effectively resolved. To address this issue, this paper proposes a Full-stage Refined Proposal (FRP) algorithm aimed at eliminating these false positives within a two-stage CNN-based pedestrian detection framework. The main innovation of this work lies in employing various pedestrian feature re-evaluation strategies to filter out low-quality pedestrian proposals during both the training and testing stages. Specifically, in the training phase, the Training mode FRP algorithm (TFRP) introduces a novel approach for validating pedestrian proposals to effectively guide the model training process, thereby constructing a model with strong capabilities for false positive suppression. During the inference phase, two innovative strategies are implemented: the Classifier-guided FRP (CFRP) algorithm integrates a pedestrian classifier into the proposal generation pipeline to yield high-quality proposals through pedestrian feature evaluation, and the Split-proposal FRP (SFRP) algorithm vertically divides all proposals, sending both the original and the sub-region proposals to the subsequent subnetwork to evaluate their confidence scores, filtering out those with lower sub-region pedestrian confidence scores. As a result, the proposed algorithm enhances the model's ability to suppress pedestrian false positives across all stages. Various experiments conducted on multiple benchmarks and the SY-Metro datasets demonstrate that the model, supported by different combinations of the FRP algorithm, can effectively eliminate false positives to varying extents. Furthermore, experiments conducted on embedded platforms underscore the algorithm's effectiveness in enhancing the comprehensive pedestrian detection capabilities of the small pedestrian detector in resource-constrained edge devices.

CVMay 24, 2024
A PST Algorithm for FPs Suppression in Two-stage CNN Detection Methods

Qiang Guo

Pedestrian detection has been a hot spot in computer vision over the past decades due to the wide spectrum of promising applications, the major challenge of which is False Positives (FPs) that occur during pedestrian detection. The emergence various Convolutional Neural Network-based detection strategies substantially enhance the pedestrian detection accuracy but still not well solve this problem. This paper deeply analysis the detection framework of the two-stage CNN detection methods and find out false positives in detection results is due to its training strategy miss classify some false proposals, thus weakens the classification capability of following subnetwork and hardly to suppress false ones. To solve this problem, This paper proposes a pedestrian-sensitive training algorithm to effectively help two-stage CNN detection methods learn to distinguish the pedestrian and non-pedestrian samples and suppress the false positives in final detection results. The core of the proposed training algorithm is to redesign the training proposal generating pipeline of the two-stage CNN detection methods, which can avoid a certain number of false ones that mislead its training process. With the help of the proposed algorithm, the detection accuracy of the MetroNext, an smaller and accurate metro passenger detector, is further improved, which further decreases false ones in its metro passengers detection results. Based on various challenging benchmark datasets, experiment results have demonstrated that feasibility of the proposed algorithm to improve pedestrian detection accuracy by removing the false positives. Compared with the competitors, MetroNext-PST demonstrates better overall prediction performance in accuracy, total number of parameters, and inference time, thus it can become a practical solution for hunting pedestrian tailored for mobile and edge devices.

CVJun 7, 2024
Semantic Segmentation on VSPW Dataset through Masked Video Consistency

Chen Liang, Qiang Guo, Chongkai Yu et al.

Pixel-level Video Understanding requires effectively integrating three-dimensional data in both spatial and temporal dimensions to learn accurate and stable semantic information from continuous frames. However, existing advanced models on the VSPW dataset have not fully modeled spatiotemporal relationships. In this paper, we present our solution for the PVUW competition, where we introduce masked video consistency (MVC) based on existing models. MVC enforces the consistency between predictions of masked frames where random patches are withheld. The model needs to learn the segmentation results of the masked parts through the context of images and the relationship between preceding and succeeding frames of the video. Additionally, we employed test-time augmentation, model aggeregation and a multimodal model-based post-processing method. Our approach achieves 67.27% mIoU performance on the VSPW dataset, ranking 2nd place in the PVUW2024 challenge VSS track.

CVMay 16, 2018
Multi-task Learning for Macromolecule Classification, Segmentation and Coarse Structural Recovery in Cryo-Tomography

Chang Liu, Xiangrui Zeng, Kaiwen Wang et al.

Cellular Electron Cryo-Tomography (CECT) is a powerful 3D imaging tool for studying the native structure and organization of macromolecules inside single cells. For systematic recognition and recovery of macromolecular structures captured by CECT, methods for several important tasks such as subtomogram classification and semantic segmentation have been developed. However, the recognition and recovery of macromolecular structures are still very difficult due to high molecular structural diversity, crowding molecular environment, and the imaging limitations of CECT. In this paper, we propose a novel multi-task 3D convolutional neural network model for simultaneous classification, segmentation, and coarse structural recovery of macromolecules of interest in subtomograms. In our model, the learned image features of one task are shared and thereby mutually reinforce the learning of other tasks. Evaluated on realistically simulated and experimental CECT data, our multi-task learning model outperformed all single-task learning methods for classification and segmentation. In addition, we demonstrate that our model can generalize to discover, segment and recover novel structures that do not exist in the training data.

QMApr 4, 2018
An integration of fast alignment and maximum-likelihood methods for electron subtomogram averaging and classification

Yixiu Zhao, Xiangrui Zeng, Qiang Guo et al.

Motivation: Cellular Electron CryoTomography (CECT) is an emerging 3D imaging technique that visualizes subcellular organization of single cells at submolecular resolution and in near-native state. CECT captures large numbers of macromolecular complexes of highly diverse structures and abundances. However, the structural complexity and imaging limits complicate the systematic de novo structural recovery and recognition of these macromolecular complexes. Efficient and accurate reference-free subtomogram averaging and classification represent the most critical tasks for such analysis. Existing subtomogram alignment based methods are prone to the missing wedge effects and low signal-to-noise ratio (SNR). Moreover, existing maximum-likelihood based methods rely on integration operations, which are in principle computationally infeasible for accurate calculation. Results: Built on existing works, we propose an integrated method, Fast Alignment Maximum Likelihood method (FAML), which uses fast subtomogram alignment to sample sub-optimal rigid transformations. The transformations are then used to approximate integrals for maximum-likelihood update of subtomogram averages through expectation-maximization algorithm. Our tests on simulated and experimental subtomograms showed that, compared to our previously developed fast alignment method (FA), FAML is significantly more robust to noise and missing wedge effects with moderate increases of computation cost.Besides, FAML performs well with significantly fewer input subtomograms when the FA method fails. Therefore, FAML can serve as a key component for improved construction of initial structural models from macromolecules captured by CECT.

QMJan 31, 2018
Feature Decomposition Based Saliency Detection in Electron Cryo-Tomograms

Bo Zhou, Qiang Guo, Xiangrui Zeng et al.

Electron Cryo-Tomography (ECT) allows 3D visualization of subcellular structures at the submolecular resolution in close to the native state. However, due to the high degree of structural complexity and imaging limits, the automatic segmentation of cellular components from ECT images is very difficult. To complement and speed up existing segmentation methods, it is desirable to develop a generic cell component segmentation method that is 1) not specific to particular types of cellular components, 2) able to segment unknown cellular components, 3) fully unsupervised and does not rely on the availability of training data. As an important step towards this goal, in this paper, we propose a saliency detection method that computes the likelihood that a subregion in a tomogram stands out from the background. Our method consists of four steps: supervoxel over-segmentation, feature extraction, feature matrix decomposition, and computation of saliency. The method produces a distribution map that represents the regions' saliency in tomograms. Our experiments show that our method can successfully label most salient regions detected by a human observer, and able to filter out regions not containing cellular components. Therefore, our method can remove the majority of the background region, and significantly speed up the subsequent processing of segmentation and recognition of cellular components captured by ECT.

CVJun 15, 2016
High-speed real-time single-pixel microscopy based on Fourier sampling

Qiang Guo, Hongwei Chen, Yuxi Wang et al.

Single-pixel cameras based on the concepts of compressed sensing (CS) leverage the inherent structure of images to retrieve them with far fewer measurements and operate efficiently over a significantly broader spectral range than conventional silicon-based cameras. Recently, photonic time-stretch (PTS) technique facilitates the emergence of high-speed single-pixel cameras. A significant breakthrough in imaging speed of single-pixel cameras enables observation of fast dynamic phenomena. However, according to CS theory, image reconstruction is an iterative process that consumes enormous amounts of computational time and cannot be performed in real time. To address this challenge, we propose a novel single-pixel imaging technique that can produce high-quality images through rapid acquisition of their effective spatial Fourier spectrum. We employ phase-shifting sinusoidal structured illumination instead of random illumination for spectrum acquisition and apply inverse Fourier transform to the obtained spectrum for image restoration. We evaluate the performance of our prototype system by recognizing quick response (QR) codes and flow cytometric screening of cells. A frame rate of 625 kHz and a compression ratio of 10% are experimentally demonstrated in accordance with the recognition rate of the QR code. An imaging flow cytometer enabling high-content screening with an unprecedented throughput of 100,000 cells/s is also demonstrated. For real-time imaging applications, the proposed single-pixel microscope can significantly reduce the time required for image reconstruction by two orders of magnitude, which can be widely applied in industrial quality control and label-free biomedical imaging.

DBMar 24, 2016
Web Data Knowledge Extraction

Juan M. Tirado, Ovidiu Serban, Qiang Guo et al.

A constantly growing amount of information is available through the web. Unfortunately, extracting useful content from this massive amount of data still remains an open issue. The lack of standard data models and structures forces developers to create adhoc solutions from the scratch. The figure of the expert is still needed in many situations where developers do not have the correct background knowledge. This forces developers to spend time acquiring the needed background from the expert. In other directions, there are promising solutions employing machine learning techniques. However, increasing accuracy requires an increase in system complexity that cannot be endured in many projects. In this work, we approach the web knowledge extraction problem using an expertcentric methodology. This methodology defines a set of configurable, extendible and independent components that permit the reutilisation of large pieces of code among projects. Our methodology differs from similar solutions in its expert-driven design. This design, makes it possible for subject-matter expert to drive the knowledge extraction for a given set of documents. Additionally, we propose the utilization of machine assisted solutions that guide the expert during this process. To demonstrate the capabilities of our methodology, we present a real use case scenario in which public procurement data is extracted from the web-based repositories of several public institutions across Europe. We provide insightful details about the challenges we had to deal with in this use case and additional discussions about how to apply our methodology.

CVJul 30, 2015
Agglomerative clustering and collectiveness measure via exponent generating function

Wei-Ya Ren, Shuo-Hao Li, Qiang Guo et al.

The key in agglomerative clustering is to define the affinity measure between two sets. A novel agglomerative clustering method is proposed by utilizing the path integral to define the affinity measure. Firstly, the path integral descriptor of an edge, a node and a set is computed by path integral and exponent generating function. Then, the affinity measure between two sets is obtained by path integral descriptor of sets. Several good properties of the path integral descriptor is proposed in this paper. In addition, we give the physical interpretation of the proposed path integral descriptor of a set. The proposed path integral descriptor of a set can be regard as the collectiveness measure of a set, which can be a moving system such as human crowd, sheep herd and so on. Self-driven particle (SDP) model is used to test the ability of the proposed method in measuring collectiveness.

IRJul 24, 2014
Ultra accurate collaborative information filtering via directed user similarity

Qiang Guo, Wen-Jun Song, Jian-Guo Liu

A key challenge of the collaborative filtering (CF) information filtering is how to obtain the reliable and accurate results with the help of peers' recommendation. Since the similarities from small-degree users to large-degree users would be larger than the ones opposite direction, the large-degree users' selections are recommended extensively by the traditional second-order CF algorithms. By considering the users' similarity direction and the second-order correlations to depress the influence of mainstream preferences, we present the directed second-order CF (HDCF) algorithm specifically to address the challenge of accuracy and diversity of the CF algorithm. The numerical results for two benchmark data sets, MovieLens and Netflix, show that the accuracy of the new algorithm outperforms the state-of-the-art CF algorithms. Comparing with the CF algorithm based on random-walks proposed in the Ref.7, the average ranking score could reach 0.0767 and 0.0402, which is enhanced by 27.3\% and 19.1\% for MovieLens and Netflix respectively. In addition, the diversity, precision and recall are also enhanced greatly. Without relying on any context-specific information, tuning the similarity direction of CF algorithms could obtain accurate and diverse recommendations. This work suggests that the user similarity direction is an important factor to improve the personalized recommendation performance.

DATA-ANJan 30, 2012
Solving the accuracy-diversity dilemma via directed random walks

Jian-Guo Liu, Kerui Shi, Qiang Guo

Random walks have been successfully used to measure user or object similarities in collaborative filtering (CF) recommender systems, which is of high accuracy but low diversity. A key challenge of CF system is that the reliably accurate results are obtained with the help of peers' recommendation, but the most useful individual recommendations are hard to be found among diverse niche objects. In this paper we investigate the direction effect of the random walk on user similarity measurements and find that the user similarity, calculated by directed random walks, is reverse to the initial node's degree. Since the ratio of small-degree users to large-degree users is very large in real data sets, the large-degree users' selections are recommended extensively by traditional CF algorithms. By tuning the user similarity direction from neighbors to the target user, we introduce a new algorithm specifically to address the challenge of diversity of CF and show how it can be used to solve the accuracy-diversity dilemma. Without relying on any context-specific information, we are able to obtain accurate and diverse recommendations, which outperforms the state-of-the-art CF methods. This work suggests that the random walk direction is an important factor to improve the personalized recommendation performance.