h-index: 23
Papers: 43
Citations: 1,733
Novelty: 48%
AI Score: 57

43 Papers

CV · May 27 · Code
ViCA: Efficient Multimodal LLMs with Vision-Only Cross-Attention

Wenjie Liu, Hao Wu, Xin Qiu et al.

Modern multimodal large language models (MLLMs) adopt a unified self-attention design that processes visual and textual tokens at every Transformer layer, incurring substantial computational overhead. In this work, we revisit the necessity of such dense visual processing and show that projected visual embeddings are already well-aligned with the language space, while effective vision-language interaction occurs in only a small subset of layers. Based on these insights, we propose ViCA (Vision-only Cross-Attention), a minimal MLLM architecture in which visual tokens bypass all self-attention and feed-forward layers, interacting with text solely through sparse cross-attention at selected layers. Extensive evaluations across three MLLM backbones, nine multimodal benchmarks, and 26 pruning-based baselines show that ViCA preserves 98% of baseline accuracy while reducing visual-side computation to 4%, consistently achieving superior performance-efficiency trade-offs. Moreover, ViCA provides a regular, hardware-friendly inference pipeline that yields over 3.5x speedup in single-batch inference and over 10x speedup in multi-batch inference, reducing visual grounding to near-zero overhead compared with text-only LLMs. It is also orthogonal to token pruning methods and can be seamlessly combined for further efficiency gains. Our code is available at https://github.com/EIT-NLP/ViCA.

CL · Dec 2, 2022
SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition

Yichong Leng, Xu Tan, Wenjie Liu et al. · microsoft-research

Error correction in automatic speech recognition (ASR) aims to correct those incorrect words in sentences generated by ASR models. Since recent ASR models usually have low word error rate (WER), to avoid affecting originally correct tokens, error correction models should only modify incorrect words, and therefore detecting incorrect words is important for error correction. Previous works on error correction either implicitly detect error words through target-source attention or CTC (connectionist temporal classification) loss, or explicitly locate specific deletion/substitution/insertion errors. However, implicit error detection does not provide clear signal about which tokens are incorrect and explicit error detection suffers from low detection accuracy. In this paper, we propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit error detection. Specifically, we first detect whether a token is correct or not through a probability produced by a dedicatedly designed language model, and then design a constrained CTC loss that only duplicates the detected incorrect tokens to let the decoder focus on the correction of error tokens. Compared with implicit error detection with CTC loss, SoftCorrect provides explicit signal about which words are incorrect and thus does not need to duplicate every token but only incorrect tokens; compared with explicit error detection, SoftCorrect does not detect specific deletion/substitution/insertion errors but just leaves it to CTC loss. Experiments on AISHELL-1 and Aidatatang datasets show that SoftCorrect achieves 26.1% and 9.4% CER reduction respectively, outperforming previous works by a large margin, while still enjoying fast speed of parallel generation.

CV · Jan 11, 2023
Elevation Estimation-Driven Building 3D Reconstruction from Single-View Remote Sensing Imagery

Yongqiang Mao, Kaiqiang Chen, Liangjin Zhao et al.

Building 3D reconstruction from remote sensing images has a wide range of applications in smart cities, photogrammetry and other fields. Methods for automatic 3D urban building modeling typically employ multi-view images as input to algorithms to recover point clouds and 3D models of buildings. However, such models rely heavily on multi-view images of buildings, which are time-intensive and limit the applicability and practicality of the models. To solve these issues, we focus on designing an efficient DSM estimation-driven reconstruction framework (Building3D), which aims to reconstruct 3D building models from the input single-view remote sensing image. First, we propose a Semantic Flow Field-guided DSM Estimation (SFFDE) network, which utilizes the proposed concept of elevation semantic flow to achieve the registration of local and global features. Specifically, in order to make the network semantics globally aware, we propose an Elevation Semantic Globalization (ESG) module to realize the semantic globalization of instances. Further, in order to alleviate the semantic span of global features and original local features, we propose a Local-to-Global Elevation Semantic Registration (L2G-ESR) module based on elevation semantic flow. Our Building3D is rooted in the SFFDE network for building elevation prediction, synchronized with a building extraction network for building masks, and then sequentially performs point cloud reconstruction, surface reconstruction (or CityGML model reconstruction). On this basis, our Building3D can optionally generate CityGML models or surface mesh models of the buildings. Extensive experiments on ISPRS Vaihingen and DFC2019 datasets on the DSM estimation task show that our SFFDE significantly improves upon state-of-the-arts. Furthermore, our Building3D achieves impressive results in the 3D point cloud and 3D model reconstruction process.

NA · Nov 9, 2018
Optimal error estimates for Chebyshev approximations of functions with limited regularity in fractional Sobolev-type spaces

Wenjie Liu, Li-Lian Wang, Huiyuan Li

In this paper, we introduce a new theoretical framework built upon fractional Sobolev-type spaces involving Riemann-Liouville (RL) fractional integrals/derivatives, which arises naturally from exact representations of Chebyshev expansion coefficients, for optimal error estimates of Chebyshev approximations to functions with limited regularity. The essential pieces of the puzzle for the error analysis include (i) fractional integration by parts (under the weakest possible conditions), and (ii) generalised Gegenbauer functions of fractional degree (GGF-Fs): a new family of special functions with notable fractional calculus properties. Under this framework, we are able to estimate the optimal decay rate of Chebyshev expansion coefficients for a large class of functions with interior and endpoint singularities, which are deemed suboptimal or complicated to characterize in existing literature. We can then derive optimal error estimates for spectral expansions and the related Chebyshev interpolation and quadrature measured in various norms, and also improve the available results in usual Sobolev spaces of integer regularity exponents in several senses. As a by-product, this study results in some analytically perspicuous formulas, particularly on GGF-Fs, which are potentially useful in spectral algorithms. The idea and analysis techniques can be extended to general Jacobi spectral approximations.

CV · Oct 1, 2023
Quantum image edge detection based on eight-direction Sobel operator for NEQR

Wenjie Liu, Lu Wang

Quantum Sobel edge detection (QSED) is a class of algorithms that perform image edge detection with quantum mechanisms and can address the real-time processing problem encountered by classical algorithms. However, the existing QSED algorithms only consider two- or four-direction Sobel operators, which leads to a certain loss of edge detail information in some high-definition images. In this paper, a novel QSED algorithm based on the eight-direction Sobel operator is proposed, which not only reduces the loss of edge information, but also simultaneously calculates the gradient values in eight directions for all pixels in a quantum image. In addition, the concrete quantum circuits, which consist of gradient calculation, non-maximum suppression, double threshold detection and edge tracking units, are designed in detail. For a 2^n × 2^n image with q gray scale, the complexity of our algorithm can be reduced to O(n^2 + q^2), which is lower than other existing classical or quantum algorithms. The simulation experiment demonstrates that our algorithm can detect more edge information, especially diagonal edges, than the two- and four-direction QSED algorithms.
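As a purely classical point of reference for the eight-direction operator mentioned in the abstract, the sketch below builds eight 3×3 kernels by rotating the outer ring of the standard Sobel kernel in 45-degree steps and takes the largest response per pixel. It is not the paper's quantum circuit; the names `RING`, `BASE`, `eight_direction_kernels` and `max_directional_gradient` are hypothetical, and the exact kernels used in the paper may differ.

```python
import numpy as np

# Outer-ring positions of a 3x3 kernel, listed clockwise from the top-left.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

# Standard Sobel kernel for the horizontal gradient (0-degree direction).
BASE = np.array([[-1, 0, 1],
                 [-2, 0, 2],
                 [-1, 0, 1]])


def eight_direction_kernels():
    """Return 8 kernels: BASE with its outer ring rotated in 45-degree steps."""
    ring_values = [BASE[r, c] for r, c in RING]
    kernels = []
    for shift in range(8):
        kernel = np.zeros((3, 3), dtype=int)
        for i, (r, c) in enumerate(RING):
            kernel[r, c] = ring_values[(i - shift) % 8]
        kernels.append(kernel)
    return kernels


def max_directional_gradient(image):
    """Correlate the image with all 8 kernels; keep the largest response.

    Only interior pixels are evaluated, so the output is 2 smaller per axis.
    """
    image = np.asarray(image, dtype=int)
    h, w = image.shape
    responses = []
    for kernel in eight_direction_kernels():
        resp = np.zeros((h - 2, w - 2), dtype=int)
        for dr in range(3):
            for dc in range(3):
                resp += kernel[dr, dc] * image[dr:dr + h - 2, dc:dc + w - 2]
        responses.append(resp)
    return np.max(np.stack(responses), axis=0)
```

The two- and four-direction variants correspond to using only a subset of these kernels, which is why diagonal edges are the first to be lost.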

NA · Nov 19, 2018
Uniform bounds and asymptotics of Generalized Gegenbauer functions of fractional degree

Wenjie Liu, Li-Lian Wang

The generalised Gegenbauer functions of fractional degree (GGF-Fs), denoted by ${}^{r\!}G^{(λ)}_ν(x)$ (right GGF-Fs) and ${}^{l}G^{(λ)}_ν(x)$ (left GGF-Fs) with $x\in (-1,1),$ $λ>-1/2$ and real $ν\ge 0,$ are special functions (usually non-polynomials), which are defined upon the hypergeometric representation of the classical Gegenbauer polynomial by allowing integer degree to be real fractional degree. Remarkably, the GGF-Fs become indispensable for optimal error estimates of polynomial approximation to singular functions, and have intimate relations with several families of nonstandard basis functions recently introduced for solving fractional differential equations. However, some properties of GGF-Fs, which are important pieces for the analysis and applications, are unknown or under explored. The purposes of this paper are twofold. The first is to show that for $λ,ν>0$ and $x=\cosθ$ with $θ\in (0,π),$ \begin{equation*}\label{IntRep-0N} (\sin φ)^λ\,{}^{r\!}G_ν^{(λ)}(\cos φ)= \frac{2^λΓ(λ+1/2)}{\sqrtπ {(ν+λ)^λ}} \, {\cos ((ν+λ)φ- λπ/2)} +{\mathcal R}_ν^{(λ)} (φ), \end{equation*} and derive the precise expression of the "residual" term ${\mathcal R}_ν^{(λ)} (φ).$ With this at our disposal, we obtain the bounds of GGF-Fs uniform in $ν.$ Under an appropriate weight function, the bounds are uniform for $θ\in [0,π]$ as well. Moreover, we can study the asymptotics of GGF-Fs with large fractional degree $ν.$ The second is to present miscellaneous properties of GGF-Fs for better understanding of this family of useful special functions.
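A quick sanity check of the leading term in the displayed formula, worked for $λ=1$. It assumes the normalisation ${}^{r\!}G^{(λ)}_ν(1)=1$, i.e. ${}^{r\!}G^{(λ)}_ν(x)={}_2F_1(-ν,ν+2λ;λ+1/2;(1-x)/2)$, which is not stated in the abstract and is taken here as an assumption.

```latex
% Assumed normalisation: {}^{r}G^{(\lambda)}_\nu(1) = 1.
% For \lambda = 1 the hypergeometric representation reduces to
\[
  {}^{r\!}G^{(1)}_{\nu}(\cos\varphi)
  = \frac{\sin\big((\nu+1)\varphi\big)}{(\nu+1)\,\sin\varphi},
  \qquad \varphi\in(0,\pi).
\]
% Leading term of the formula with \lambda = 1, using \Gamma(3/2) = \sqrt{\pi}/2:
\[
  \frac{2\,\Gamma(3/2)}{\sqrt{\pi}\,(\nu+1)}\,
  \cos\!\Big((\nu+1)\varphi - \frac{\pi}{2}\Big)
  = \frac{\sin\big((\nu+1)\varphi\big)}{\nu+1}
  = \sin\varphi\;{}^{r\!}G^{(1)}_{\nu}(\cos\varphi).
\]
% Hence, under this normalisation, \mathcal{R}^{(1)}_{\nu}(\varphi) = 0.
```

Under that assumption the leading term is exact for $λ=1$, which is consistent with the residual being a correction that depends on $λ$.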

CL · Nov 23, 2022
Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction

Kai Shen, Yichong Leng, Xu Tan et al.

Text error correction aims to correct the errors in text sequences such as those typed by humans or generated by speech recognition models. Previous error correction methods usually take the source (incorrect) sentence as encoder input and generate the target (correct) sentence through the decoder. Since the error rate of the incorrect sentence is usually low (e.g., 10%), the correction model can only learn to correct on limited error tokens but trivially copy on most tokens (correct tokens), which harms the effective training of error correction. In this paper, we argue that the correct tokens should be better utilized to facilitate effective training and then propose a simple yet effective masking strategy to achieve this goal. Specifically, we randomly mask out a part of the correct tokens in the source sentence and let the model learn to not only correct the original error tokens but also predict the masked tokens based on their context information. Our method enjoys several advantages: 1) it alleviates trivial copy; 2) it leverages effective training signals from correct tokens; 3) it is a plug-and-play module and can be applied to different models and tasks. Experiments on spelling error correction and speech recognition error correction on Mandarin datasets and grammar error correction on English datasets with both autoregressive and non-autoregressive generation models show that our method improves the correction accuracy consistently.
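The masking strategy described above can be sketched in a few lines. This is a minimal illustration, assuming the source and target sentences are already aligned token by token (as in substitution-only spelling correction); the function name `mask_correct_tokens`, the `mask_prob` default and the `[MASK]` symbol are hypothetical choices, not values taken from the paper.

```python
import random


def mask_correct_tokens(source, target, mask_prob=0.15,
                        mask_token="[MASK]", seed=None):
    """Randomly replace *correct* source tokens with a mask symbol.

    A source token counts as correct when it equals the aligned target token.
    Erroneous tokens are always kept, so the model still has to correct them.
    """
    if len(source) != len(target):
        raise ValueError("this sketch assumes token-aligned source and target")
    rng = random.Random(seed)
    masked = []
    for src_tok, tgt_tok in zip(source, target):
        if src_tok == tgt_tok and rng.random() < mask_prob:
            masked.append(mask_token)  # model must predict it from context
        else:
            masked.append(src_tok)     # error token, or unmasked correct token
    return masked
```

The training target stays the full correct sentence, so the masked positions add a prediction signal on tokens that would otherwise only be copied.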

CV · Dec 6, 2022
An advanced YOLOv3 method for small object detection

Baokai Liu, Fengjie He, Shiqiang Du et al.

Small object detection has important application value in the fields of autonomous driving and drone scene analysis. As one of the most advanced object detection algorithms, YOLOv3 faces challenges when detecting small objects, such as detection failures for small and occluded objects. To solve these problems, an improved YOLOv3 algorithm for small object detection is proposed. In the proposed method, the dilated convolutions mish (DCM) module is introduced into the backbone network of YOLOv3 to improve the feature expression ability by fusing the feature maps of different receptive fields. In the neck network of YOLOv3, the convolutional block attention module (CBAM) and a multi-level fusion module are introduced to select the important information for small object detection in the shallow network, suppress the non-critical information, and fuse the feature maps of different scales, so as to improve the detection accuracy of the algorithm. In addition, the Soft-NMS and Complete-IoU (CIoU) strategies are applied to candidate frame screening, which improves the accuracy of the algorithm for the detection of occluded objects. The ablation experiment on the MS COCO2017 object detection task proves the effectiveness of several modules introduced in this paper for small object detection. The experimental results on the MS COCO2017, VOC2007, and VOC2012 datasets show that the Average Precision (AP) of this method is 16.5%, 8.71%, and 9.68% higher than that of YOLOv3, respectively.
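For readers unfamiliar with Soft-NMS, the sketch below shows the Gaussian variant in plain Python: instead of discarding boxes that overlap the current best box, their scores are decayed by `exp(-IoU^2 / sigma)`. It illustrates the general technique only; the parameter defaults and the function names `iou` and `soft_nms` are assumptions, and the paper's own configuration is not given in the abstract.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS. Returns (index, final_score) pairs in pick order."""
    candidates = [(score, idx) for idx, score in enumerate(scores)]
    kept = []
    while candidates:
        # Pick the highest-scoring remaining box.
        best_pos = max(range(len(candidates)), key=lambda j: candidates[j][0])
        best_score, best_idx = candidates.pop(best_pos)
        if best_score < score_threshold:
            break
        kept.append((best_idx, best_score))
        # Decay the remaining scores according to their overlap with it.
        decayed = []
        for score, idx in candidates:
            overlap = iou(boxes[best_idx], boxes[idx])
            new_score = score * math.exp(-(overlap ** 2) / sigma)
            if new_score >= score_threshold:
                decayed.append((new_score, idx))
        candidates = decayed
    return kept
```

Because heavily overlapping boxes are down-weighted rather than removed, a partially occluded object that sits close to another detection can still survive with a reduced score.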

QUANT-PH · Oct 2, 2023
Quantum Image Segmentation Based on Grayscale Morphology

Wenjie Liu, Lu Wang, Mengmeng Cui

The classical image segmentation algorithm based on grayscale morphology can effectively segment images with uneven illumination, but as the amount of image data grows, real-time processing becomes a problem. In order to solve this problem, a quantum image segmentation algorithm is proposed in this paper, which can use quantum mechanisms to simultaneously perform morphological operations on all pixels in a grayscale image, and then quickly segment the image into a binary image. In addition, several quantum circuit units, including dilation, erosion, bottom hat transformation, top hat transformation, etc., are designed in detail, and then combined to construct the complete quantum circuits for segmenting NEQR images. For a 2^n × 2^n image with q grayscale levels, the complexity of our algorithm can be reduced to O(n^2+q), which is an exponential speedup over the classical counterparts. Finally, the experiment is conducted on IBM Q to show the feasibility of our algorithm in the noisy intermediate-scale quantum (NISQ) era.
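The classical operations named in the abstract (dilation, erosion, top hat, bottom hat) can be sketched with NumPy as follows. This is the classical counterpart only, assuming a flat square structuring element and edge-replicated borders; the function names and the final thresholding step in `segment` are illustrative assumptions rather than the paper's circuit design.

```python
import numpy as np


def _windows(image, size):
    """Stack all shifted copies of the image covered by a size x size window."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape
    return np.stack([padded[r:r + h, c:c + w]
                     for r in range(size) for c in range(size)])


def erode(image, size=3):
    """Grayscale erosion: minimum over the neighbourhood."""
    return _windows(np.asarray(image, dtype=int), size).min(axis=0)


def dilate(image, size=3):
    """Grayscale dilation: maximum over the neighbourhood."""
    return _windows(np.asarray(image, dtype=int), size).max(axis=0)


def top_hat(image, size=3):
    """Image minus its opening: keeps small bright details."""
    image = np.asarray(image, dtype=int)
    return image - dilate(erode(image, size), size)


def bottom_hat(image, size=3):
    """Closing minus the image: keeps small dark details."""
    image = np.asarray(image, dtype=int)
    return erode(dilate(image, size), size) - image


def segment(image, threshold, size=3):
    """Binarize the top-hat result with a single global threshold."""
    return (top_hat(image, size) > threshold).astype(np.uint8)
```

Subtracting the opening removes slowly varying background, which is why a single global threshold becomes usable even under uneven illumination.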

LG · Oct 9, 2022
Coresets for Wasserstein Distributionally Robust Optimization Problems

Ruomin Huang, Jiawei Huang, Wenjie Liu et al.

Wasserstein distributionally robust optimization (WDRO) is a popular model to enhance the robustness of machine learning with ambiguous data. However, the complexity of WDRO can be prohibitive in practice since solving its "minimax" formulation requires a great amount of computation. Recently, several fast WDRO training algorithms for some specific machine learning tasks (e.g., logistic regression) have been developed. However, the research on designing efficient algorithms for general large-scale WDROs is still quite limited, to the best of our knowledge. The coreset is an important tool for compressing large datasets, and thus it has been widely applied to reduce the computational complexities for many optimization problems. In this paper, we introduce a unified framework to construct the $ε$-coreset for the general WDRO problems. Though it is challenging to obtain a conventional coreset for WDRO due to the uncertainty issue of ambiguous data, we show that we can compute a "dual coreset" by using the strong duality property of WDRO. Also, the error introduced by the dual coreset can be theoretically guaranteed for the original WDRO objective. To construct the dual coreset, we propose a novel grid sampling approach that is particularly suitable for the dual formulation of WDRO. Finally, we implement our coreset approach and illustrate its effectiveness for several WDRO problems in the experiments.

QUANT-PH · Oct 2, 2023
An improved two-threshold quantum segmentation algorithm for NEQR image

Lu Wang, Zhiliang Deng, Wenjie Liu

A quantum image segmentation algorithm divides a quantum image into several parts, but most existing algorithms use more quantum resources (qubits) or cannot process complex images. In this paper, an improved two-threshold quantum segmentation algorithm for NEQR images is proposed, which can segment a complex gray-scale image into a clear ternary image by using fewer qubits and can be scaled to use n thresholds for n + 1 segmentations. In addition, a feasible quantum comparator is designed to distinguish the gray-scale values with two thresholds, and then a scalable quantum circuit is designed to segment the NEQR image. For a 2^n × 2^n image with q gray-scale levels, the quantum cost of our algorithm can be reduced to 60q-6, which is lower than other existing quantum algorithms and does not increase as the image size increases. The experiment on IBM Q demonstrates that our algorithm can effectively segment the image.
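The classical analogue of the two-threshold comparison is a simple binning step: each gray value is compared against the sorted thresholds and assigned a segment label, so n thresholds give n + 1 segments. The sketch below uses `numpy.digitize` for this; the function name `threshold_segment` and the convention that a value equal to a threshold falls into the upper segment are assumptions made for illustration.

```python
import numpy as np


def threshold_segment(image, thresholds):
    """Label each pixel with the index of the interval its gray value falls in.

    With thresholds t1 < t2 the labels are:
      0 for value < t1, 1 for t1 <= value < t2, 2 for value >= t2.
    """
    thresholds = np.sort(np.asarray(thresholds))
    return np.digitize(np.asarray(image), thresholds)
```

For example, `threshold_segment([[10, 100, 200]], [50, 150])` labels the three pixels 0, 1 and 2, giving the ternary image described in the abstract.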

QUANT-PH · Oct 2, 2023
A quantum segmentation algorithm based on local adaptive threshold for NEQR image

Lu Wang, Wenjie Liu

The classical image segmentation algorithm based on local adaptive threshold can effectively segment images with uneven illumination, but with the increase of the image data, the real-time problem gradually emerges. In this paper, a quantum segmentation algorithm based on local adaptive threshold for NEQR image is proposed, which can use quantum mechanism to simultaneously compute local thresholds for all pixels in a gray-scale image and quickly segment the image into a binary image. In addition, several quantum circuit units, including median calculation, quantum binarization, etc. are designed in detail, and then a complete quantum circuit is designed to segment NEQR images by using fewer qubits and quantum gates. For a $2^n\times 2^n$ image with q gray-scale levels, the complexity of our algorithm can be reduced to $O(n^2+q)$, which is an exponential speedup compared to the classic counterparts. Finally, the experiment is conducted on IBM Q to show the feasibility of our algorithm in the noisy intermediate-scale quantum (NISQ) era.
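A classical sketch of local adaptive thresholding with a median-based threshold is given below for orientation. It assumes the local threshold of a pixel is the median of its square neighbourhood, with edge-replicated borders and an optional `offset`; these details and the function names are illustrative assumptions, since the abstract does not specify them.

```python
import numpy as np


def local_median(image, size=3):
    """Median of the size x size neighbourhood around every pixel."""
    image = np.asarray(image)
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape
    windows = np.stack([padded[r:r + h, c:c + w]
                        for r in range(size) for c in range(size)])
    return np.median(windows, axis=0)


def adaptive_binarize(image, size=3, offset=0):
    """Set a pixel to 1 when it exceeds its own local median plus offset."""
    image = np.asarray(image)
    return (image > local_median(image, size) + offset).astype(np.uint8)
```

Because every pixel is compared with a threshold computed from its own neighbourhood, a brightness gradient across the image does not shift the decision the way a single global threshold would.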

IV · Sep 24, 2023
Solving Low-Dose CT Reconstruction via GAN with Local Coherence

Wenjie Liu

The Computed Tomography (CT) for diagnosis of lesions in human internal organs is one of the most fundamental topics in medical imaging. Low-dose CT, which offers reduced radiation exposure, is preferred over standard-dose CT, and therefore its reconstruction approaches have been extensively studied. However, current low-dose CT reconstruction techniques mainly rely on model-based methods or deep-learning-based techniques, which often ignore the coherence and smoothness for sequential CT slices. To address this issue, we propose a novel approach using generative adversarial networks (GANs) with enhanced local coherence. The proposed method can capture the local coherence of adjacent images by optical flow, which yields significant improvements in the precision and stability of the constructed images. We evaluate our proposed method on real datasets and the experimental results suggest that it can outperform existing state-of-the-art reconstruction approaches significantly.

QUANT-PH · Sep 23, 2023
A Unitary Weights Based One-Iteration Quantum Perceptron Algorithm for Non-Ideal Training Sets

Wenjie Liu, Peipei Gao, Yuxiang Wang et al.

In order to solve the problem of non-ideal training sets (i.e., the less-complete or over-complete sets) and implement one-iteration learning, a novel efficient quantum perceptron algorithm based on unitary weights is proposed, where the singular value decomposition of the total weight matrix from the training set is calculated so that the weight matrix becomes unitary. The example validation of quantum gates {H, S, T, CNOT, Toffoli, Fredkin} shows that our algorithm can accurately implement arbitrary quantum gates within one iteration. The performance comparison between our algorithm and other quantum perceptron algorithms demonstrates the advantages of our algorithm in terms of applicability, accuracy, and availability. To further validate the applicability of our algorithm, a quantum composite gate which consists of several basic quantum gates is also illustrated.
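The SVD step can be illustrated numerically. The sketch below assumes the total weight matrix is the sum of outer products |output⟩⟨input| over the training pairs (an assumption, since the abstract does not define it) and then replaces the singular values by ones, i.e. it returns U·V† from the decomposition W = U·Σ·V†. The function name `unitary_weights` is hypothetical.

```python
import numpy as np


def unitary_weights(pairs):
    """Build a unitary from (input_state, output_state) training pairs.

    Assumed construction: W = sum |out><in| over all pairs, then
    W = U diag(s) V^dagger via SVD and the result is U V^dagger.
    """
    dim = len(pairs[0][0])
    total = np.zeros((dim, dim), dtype=complex)
    for state_in, state_out in pairs:
        ket_out = np.asarray(state_out, dtype=complex)
        bra_in = np.conj(np.asarray(state_in, dtype=complex))
        total += np.outer(ket_out, bra_in)
    u, _, vh = np.linalg.svd(total)
    return u @ vh  # singular values replaced by 1 -> unitary matrix
```

As a usage example, training on the over-complete set {|0⟩→|+⟩, |1⟩→|−⟩, |+⟩→|0⟩} gives a total weight matrix that is not unitary, yet the SVD step recovers the Hadamard gate in a single pass.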

CV · Jan 28, 2023
Towards Accurate Acne Detection via Decoupled Sequential Detection Head

Xin Wei, Lei Zhang, Jianwei Zhang et al.

Accurate acne detection plays a crucial role in acquiring precise diagnosis and conducting proper therapy. However, the ambiguous boundaries and arbitrary dimensions of acne lesions severely limit the performance of existing methods. In this paper, we address these challenges via a novel Decoupled Sequential Detection Head (DSDH), which can be easily adopted by mainstream two-stage detectors. DSDH brings two simple but effective improvements to acne detection. Firstly, the offset and scaling tasks are explicitly introduced, and their incompatibility is settled by our task-decouple mechanism, which improves the capability of predicting the location and size of acne lesions. Second, we propose the task-sequence mechanism, and execute offset and scaling sequentially to gain a more comprehensive insight into the dimensions of acne lesions. In addition, we build a high-quality acne detection dataset named ACNE-DET to verify the effectiveness of DSDH. Experiments on ACNE-DET and the public benchmark ACNE04 show that our method outperforms the state-of-the-art methods by significant margins. Our code and dataset are publicly available at (temporarily anonymous).

ET · Sep 30, 2023
A quantum system control method based on enhanced reinforcement learning

Wenjie Liu, Bosi Wang, Jihao Fan et al.

Traditional quantum system control methods often face different constraints, and tend to cause both leakage and stochastic control errors under the condition of limited resources. Reinforcement learning has been shown to be an efficient way to complete the quantum system control task. To learn a satisfactory control strategy under the condition of limited resources, a quantum system control method based on enhanced reinforcement learning (QSC-ERL) is proposed. The states and actions in reinforcement learning are mapped to quantum states and control operations in quantum systems. By using new enhanced neural networks, reinforcement learning can quickly achieve the maximization of long-term cumulative rewards, and a quantum state can be evolved accurately from an initial state to a target state. According to the number of candidate unitary operations, the three-switch control is used for simulation experiments. Compared with other methods, QSC-ERL achieves learning control of quantum systems with fidelity close to 1, and takes fewer episodes for quantum state evolution under the condition of limited resources.

CV · Sep 7, 2022
A Data-dependent Approach for High Dimensional (Robust) Wasserstein Alignment

Hu Ding, Wenjie Liu, Mingquan Ye

Many real-world problems can be formulated as the alignment between two geometric patterns. Previously, a great amount of research has focused on the alignment of 2D or 3D patterns in the field of computer vision. Recently, the alignment problem in high dimensions has found several novel applications in practice. However, the research is still rather limited in the algorithmic aspect. To the best of our knowledge, most existing approaches are just simple extensions of their counterparts for 2D and 3D cases, and often suffer from issues such as high computational complexity. In this paper, we propose an effective framework to compress high dimensional geometric patterns. Any existing alignment method can be applied to the compressed geometric patterns and the time complexity can be significantly reduced. Our idea is inspired by the observation that high dimensional data often has a low intrinsic dimension. Our framework is a "data-dependent" approach whose complexity depends on the intrinsic dimension of the input data. Our experimental results reveal that running the alignment algorithm on compressed patterns can achieve similar quality, compared with the results on the original patterns, but the runtimes (including the time cost of compression) are substantially lower.

LG · Oct 1, 2023
Quantum-Based Feature Selection for Multi-classification Problem in Complex Systems with Edge Computing

Wenjie Liu, Junxiu Chen, Yuxiang Wang et al.

The complex systems with edge computing require a huge amount of multi-feature data to extract appropriate insights for their decision making, so it is important to find a feasible feature selection method to improve the computational efficiency and save the resource consumption. In this paper, a quantum-based feature selection algorithm for the multi-classification problem, namely, QReliefF, is proposed, which can effectively reduce the complexity of algorithm and improve its computational efficiency. First, all features of each sample are encoded into a quantum state by performing operations CMP and R_y, and then the amplitude estimation is applied to calculate the similarity between any two quantum states (i.e., two samples). According to the similarities, the Grover-Long method is utilized to find the nearest k neighbor samples, and then the weight vector is updated. After a certain number of iterations through the above process, the desired features can be selected with regards to the final weight vector and the threshold τ. Compared with the classical ReliefF algorithm, our algorithm reduces the complexity of similarity calculation from O(MN) to O(M), the complexity of finding the nearest neighbor from O(M) to O(sqrt(M)), and resource consumption from O(MN) to O(MlogN). Meanwhile, compared with the quantum Relief algorithm, our algorithm is superior in finding the nearest neighbor, reducing the complexity from O(M) to O(sqrt(M)). Finally, in order to verify the feasibility of our algorithm, a simulation experiment based on Rigetti with a simple example is performed.

QUANT-PH · Sep 29, 2023
A Quantum States Preparation Method Based on Difference-Driven Reinforcement Learning

Wenjie Liu, Jing Xu, Bosi Wang

Due to the large state space of the two-qubit system, and the adoption of a ladder reward function in the existing quantum state preparation methods, the convergence speed is slow and it is difficult to prepare the desired target quantum state with high fidelity under limited conditions. To solve the above problems, a difference-driven reinforcement learning (RL) algorithm for quantum state preparation of a two-qubit system is proposed by improving the reward function and action selection strategy. Firstly, a model is constructed for the problem of preparing quantum states of a two-qubit system, with restrictions on the type of quantum gates and the time for quantum state evolution. In the preparation process, a weighted differential dynamic reward function is designed to help the algorithm quickly obtain the maximum expected cumulative reward. Then, an adaptive ε-greedy action selection strategy is adopted to achieve a balance between exploration and exploitation to a certain extent, thereby improving the fidelity of the final quantum state. The simulation results show that the proposed algorithm can prepare quantum states with high fidelity under limited conditions. Compared with other algorithms, it has different degrees of improvement in convergence speed and fidelity of the final quantum state.

OC · May 21
Output regulation via input-output data

Andrea Bisoffi, Wenjie Liu, Zhongjie Hu et al.

From a multi-input-multi-output (MIMO) discrete-time linear system, we collect input-output data affected by noise in the form of an unknown exosignal and, from these data points (without knowledge of the system model), we design a feedback controller that asymptotically annihilates the effect of that exosignal on the output. This amounts to solving an output regulation problem purely from input-output data, for MIMO linear systems. The design of the controller corresponds to a semidefinite program and is pursued on a suitable auxiliary system. Such design carries over from the auxiliary system to the original one by a rigorous examination of the relation between the solutions of the two systems.

LG · Sep 30, 2023
A hybrid quantum-classical conditional generative adversarial network algorithm for human-centered paradigm in cloud

Wenjie Liu, Ying Zhang, Zhiliang Deng et al.

As an emerging field that aims to bridge the gap between human activities and computing systems, human-centered computing (HCC) in cloud, edge, and fog environments has had a huge impact on artificial intelligence algorithms. The quantum generative adversarial network (QGAN) is considered to be one of the quantum machine learning algorithms with great application prospects, and it should also be improved to conform to the human-centered paradigm. The generation process of QGAN is relatively random and the generated model does not conform to the human-centered concept, so it is not quite suitable for real scenarios. In order to solve these problems, a hybrid quantum-classical conditional generative adversarial network (QCGAN) algorithm is proposed, which is a knowledge-driven human-computer interaction computing mode that can be implemented in the cloud. The purposes of stabilizing the generation process and realizing the interaction between human and computing process are achieved by inputting artificial conditional information in the generator and discriminator. The generator uses the parameterized quantum circuit with an all-to-all connected topology, which facilitates the tuning of network parameters during the training process. The discriminator uses the classical neural network, which effectively avoids the "input bottleneck" of quantum machine learning. Finally, the BAS training set is selected to conduct an experiment on the quantum cloud computing platform. The result shows that the QCGAN algorithm can effectively converge to the Nash equilibrium point after training and perform human-centered classification generation tasks.

CV · Feb 9
Analysis of Converged 3D Gaussian Splatting Solutions: Density Effects and Prediction Limit

Zhendong Wang, Cihan Ruan, Jingchuan Xiao et al.

We investigate what structure emerges in 3D Gaussian Splatting (3DGS) solutions from standard multi-view optimization. We term these Rendering-Optimal References (RORs) and analyze their statistical properties, revealing stable patterns: mixture-structured scales and bimodal radiance across diverse scenes. To understand what determines these parameters, we apply learnability probes by training predictors to reconstruct RORs from point clouds without rendering supervision. Our analysis uncovers fundamental density-stratification. Dense regions exhibit geometry-correlated parameters amenable to render-free prediction, while sparse regions show systematic failure across architectures. We formalize this through variance decomposition, demonstrating that visibility heterogeneity creates covariance-dominated coupling between geometric and appearance parameters in sparse regions. This reveals the dual character of RORs: geometric primitives where point clouds suffice, and view synthesis primitives where multi-view constraints are essential. We provide density-aware strategies that improve training robustness and discuss architectural implications for systems that adaptively balance feed-forward prediction and rendering-based refinement.

QUANT-PH · Aug 21, 2023
A Block-Ring connected Topology of Parameterized Quantum Circuits

Wenjie Liu, Qingshan Wu

Selecting an efficient topology for parameterized quantum circuits (PQCs) is essential in variational quantum algorithms (VQAs). However, current circuits have problems: too many parameters make optimization difficult, or performance is hard to guarantee. How to reduce the number of parameters (the number of single-qubit rotation gates and 2-qubit gates) in PQCs without reducing performance has become a new challenge. To solve this problem, we propose a novel topology, called Block-Ring (BR) topology, to construct PQCs. This topology allocates all qubits to several blocks; an all-to-all mode is adopted inside each block and a ring mode is applied to connect different blocks. Compared with pure all-to-all topology circuits, which have the best power, the BR topology has similar performance while the number of parameters and 2-qubit gates is reduced from O(n^2) to O(mn), where m is a hyperparameter that we set. In addition, we compare the BR topology with other topology circuits in terms of expressibility and entangling capability. Considering the effects of different 2-qubit gates on circuits, we also distinguish between controlled X-rotation gates and controlled Z-rotation gates. Finally, the 1- and 2-layer configurations of PQCs are taken into consideration as well, which shows the BR topology's performance improvement for multilayer circuits.
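To make the O(n^2) versus O(mn) comparison concrete, the sketch below enumerates the qubit pairs that would carry a 2-qubit gate in one layer. It rests on assumptions not stated in the abstract: m is read as the block size, n is divisible by m, and neighbouring blocks are joined by a single gate between the last qubit of one block and the first qubit of the next. The function names are hypothetical.

```python
from itertools import combinations


def all_to_all_pairs(n):
    """Every pair of the n qubits: n * (n - 1) / 2 two-qubit gates."""
    return list(combinations(range(n), 2))


def block_ring_pairs(n, m):
    """Qubit pairs for an assumed Block-Ring layout with blocks of size m."""
    if n % m != 0:
        raise ValueError("this sketch assumes n is divisible by m")
    blocks = [list(range(start, start + m)) for start in range(0, n, m)]
    pairs = []
    for block in blocks:
        pairs.extend(combinations(block, 2))  # all-to-all inside a block
    if len(blocks) > 1:
        for i, block in enumerate(blocks):
            nxt = blocks[(i + 1) % len(blocks)]
            pairs.append((block[-1], nxt[0]))  # ring link between blocks
    return pairs
```

With n = 12 and m = 3 this gives 16 pairs, compared with 66 for the all-to-all layout; in general the count under these assumptions is n(m − 1)/2 + n/m, which grows linearly in n for fixed m.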

LGNov 11, 2025
Dual-Kernel Graph Community Contrastive Learning

Xiang Chen, Kun Yue, Wenjie Liu et al.

Graph Contrastive Learning (GCL) has emerged as a powerful paradigm for training Graph Neural Networks (GNNs) in the absence of task-specific labels. However, its scalability on large-scale graphs is hindered by the intensive message passing mechanism of GNN and the quadratic computational complexity of contrastive loss over positive and negative node pairs. To address these issues, we propose an efficient GCL framework that transforms the input graph into a compact network of interconnected node sets while preserving structural information across communities. We firstly introduce a kernelized graph community contrastive loss with linear complexity, enabling effective information transfer among node sets to capture hierarchical structural information of the graph. We then incorporate a knowledge distillation technique into the decoupled GNN architecture to accelerate inference while maintaining strong generalization performance. Extensive experiments on sixteen real-world datasets of varying scales demonstrate that our method outperforms state-of-the-art GCL baselines in both effectiveness and scalability.

CVOct 1, 2023
A quantum moving target segmentation algorithm for grayscale video

Wenjie Liu, Lu Wang, Qingshan Wu

Moving target segmentation (MTS) aims to segment out moving targets in a video. However, classical algorithms face the huge challenge of real-time processing in the current video era. Some scholars have successfully demonstrated quantum advantages in some video processing tasks, but not in moving target segmentation. In this paper, a quantum moving target segmentation algorithm for grayscale video is proposed, which can use quantum mechanisms to simultaneously calculate the difference of all pixels in all adjacent frames and then quickly segment out the moving target. In addition, a feasible quantum comparator is designed to compare the grayscale values with the threshold. Then several quantum circuit units, including three-frame difference, binarization and AND operation, are designed in detail and combined to construct the complete quantum circuits for segmenting the moving target. For a quantum video with $2^m$ frames (every frame is a $2^n\times 2^n$ image with $q$ grayscale levels), the complexity of our algorithm can be reduced to $O(n^2 + q)$. Compared with the classical counterpart, it is an exponential speedup, while its complexity is also superior to that of the existing quantum algorithms. Finally, an experiment is conducted on IBM Q to show the feasibility of our algorithm in the noisy intermediate-scale quantum (NISQ) era.
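
For orientation, the classical pipeline that the abstract's circuit units mirror (three-frame difference, binarization, AND) can be sketched as below. This is a plain classical illustration, not the quantum algorithm: the function name is hypothetical, frames are assumed to be equally sized nested lists of grayscale integers, and the strict `>` comparison against the threshold is an assumption.

```python
def three_frame_difference(prev, curr, nxt, threshold):
    """Classical three-frame difference on grayscale frames.

    A pixel is marked as moving (1) only if it differs from BOTH the
    previous and the next frame by more than `threshold`.
    """
    mask = []
    for row_p, row_c, row_n in zip(prev, curr, nxt):
        mask_row = []
        for p, c, n in zip(row_p, row_c, row_n):
            # Binarize the two absolute frame differences.
            moved_before = abs(c - p) > threshold
            moved_after = abs(n - c) > threshold
            # AND the two binary maps.
            mask_row.append(1 if (moved_before and moved_after) else 0)
        mask.append(mask_row)
    return mask


if __name__ == "__main__":
    prev = [[0, 0], [0, 0]]
    curr = [[0, 200], [0, 0]]
    nxt = [[0, 0], [0, 200]]
    print(three_frame_difference(prev, curr, nxt, threshold=50))
```

The classical version visits every pixel of every frame triple in turn, which is the cost the abstract contrasts with the quantum circuit's simultaneous evaluation.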

NAMay 11
PCELM: Perturbation-Correction Extreme Learning Machine for the Stefan problem

Wenjie Liu, Siyuan Lang, Zhiyue Zhang

For Stefan problems, characterized by moving boundaries and discontinuous coefficients due to phase changes, the inherent nonconvexity of the objective functional frequently causes optimization difficulty in randomized neural network approximations. To address this, we propose a Perturbation-Correction Extreme Learning Machine (PCELM) framework, built upon the extreme learning machine framework. This method first establishes a basic approximation during an initialization step by minimizing the original nonconvex residual, typically achieving only moderate accuracy. Then, in a subsequent correction step, it determines a correction term by solving a subproblem based on a perturbation expansion around this basic approximation, thereby transforming it into a convex optimization problem for the output coefficients that ensures rapid convergence. We further provide a rigorous convexity analysis, demonstrating that the PCELM method solves a convex subproblem. Numerical experiments on various Stefan problems, including multi-phase and multi-dimensional Stefan problems, confirm that the proposed PCELM method successfully overcomes optimization plateaus, with the correction step consistently delivering a significant improvement of 2-6 orders of magnitude in the relative L2 accuracy.

ROMar 13
Safety-guaranteed and Goal-oriented Semantic Sensing, Communication, and Control for Robotics

Wenchao Wu, Shutong Chen, Wenjie Liu et al.

Wirelessly-connected robotic systems empower robots with real-time intelligence by leveraging remote computing resources for decision-making. However, the data exchange between robots and base stations often overwhelms communication links, introducing latency that undermines real-time response. To tackle this, goal-oriented semantic communication (GSC) has been introduced into wirelessly-connected robotic systems to extract and transmit only goal-relevant semantic representations, enhancing communication efficiency and task effectiveness. However, existing GSC approaches focus primarily on optimizing effectiveness metrics while overlooking safety requirements, which should be treated as the top priority in real-world robotic systems. To bridge this gap, we propose safety-guaranteed and goal-oriented semantic communication for wirelessly-connected robotic systems, aiming to maximize robotic task effectiveness subject to practical operational safety requirements. We first summarize the general safety requirements and effectiveness metrics across typical robotic tasks, including robot arm grasping, unmanned aerial vehicle (UAV)-assisted tasks, and multi-robot exploration. We then systematically analyze the unique safety and effectiveness challenges faced by wirelessly-connected robotic systems in sensing, communication, and control. Based on these, we further present potential safety-guaranteed and goal-oriented sensing, communication, and control solutions. Finally, a UAV target tracking case study validates that our proposed GSC solutions can significantly improve safety rate and tracking success rate by more than 2 times and 4.5 times, respectively.

CVJul 2, 2025Code
DocShaDiffusion: Diffusion Model in Latent Space for Document Image Shadow Removal

Wenjie Liu, Bingshu Wang, Ze Wang et al.

Document shadow removal is a crucial task in the field of document image enhancement. However, existing methods tend to remove shadows with constant color background and ignore color shadows. In this paper, we first design a diffusion model in latent space for document image shadow removal, called DocShaDiffusion. It translates shadow images from pixel space to latent space, enabling the model to more easily capture essential features. To address the issue of color shadows, we design a shadow soft-mask generation module (SSGM). It is able to produce accurate shadow masks and add noise specifically into shadow regions. Guided by the shadow mask, a shadow mask-aware guided diffusion module (SMGDM) is proposed to remove shadows from document images by supervising the diffusion and denoising process. We also propose a shadow-robust perceptual feature loss to preserve details and structures in document images. Moreover, we develop a large-scale synthetic document color shadow removal dataset (SDCSRD). It simulates the distribution of realistic color shadows and provides strong support for the training of models. Experiments on three public datasets validate the proposed method's superiority over the state of the art. Our code and dataset will be publicly available.

CVFeb 24
Bridging Physically Based Rendering and Diffusion Models with Stochastic Differential Equation

Junwei Shu, Wenjie Liu, Changgu Chen et al.

Diffusion-based image generators excel at producing realistic content from text or image conditions, but they offer only limited explicit control over low-level, physically grounded shading and material properties. In contrast, physically based rendering (PBR) offers fine-grained physical control but lacks prompt-driven flexibility. Although these two paradigms originate from distinct communities, both share a common evolution -- from noisy observations to clean images. In this paper, we propose a unified stochastic formulation that bridges Monte Carlo rendering and diffusion-based generative modeling. First, a general stochastic differential equation (SDE) formulation for Monte Carlo integration under the Central Limit Theorem is modeled. Through instantiation via physically based path tracing, we convert it into a physically grounded SDE representation. Moreover, we provide a systematic analysis of how the physical characteristics of path tracing can be extended to existing diffusion models from the perspective of noise variance. Extensive experiments across multiple tasks show that our method can exert physically grounded control over diffusion-generated results, covering tasks such as rendering and material editing.

CVMay 14, 2024
The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

Lingdong Kong, Shaoyuan Xie, Hanjiang Hu et al. · tsinghua

In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition laid down a gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios. Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research in this field.

LGJul 2, 2025
AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training

Zhenyu Han, Ansheng You, Haibo Wang et al.

Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks, while task-separated RL frameworks face challenges in complex dataflows and the corresponding resource idling and workload imbalance. Moreover, most existing frameworks are tightly coupled with LLM training or inference engines, making it difficult to support custom-designed engines. To address these challenges, we propose AsyncFlow, an asynchronous streaming RL framework for efficient post-training. Specifically, we introduce a distributed data storage and transfer module that provides a unified data management and fine-grained scheduling capability in a fully streamed manner. This architecture inherently facilitates automated pipeline overlapping among RL tasks and dynamic load balancing. Moreover, we propose a producer-consumer-based asynchronous workflow engineered to minimize computational idleness by strategically deferring the parameter update process within staleness thresholds. Finally, the core capability of AsyncFlow is architecturally decoupled from underlying training and inference engines and encapsulated by service-oriented user interfaces, offering a modular and customizable user experience. Extensive experiments demonstrate an average of 1.59 throughput improvement compared with the state-of-the-art baseline. The architecture presented in this work provides actionable insights for next-generation RL training system designs.

CVMar 28, 2025
ABC-GS: Alignment-Based Controllable Style Transfer for 3D Gaussian Splatting

Wenjie Liu, Zhongliang Liu, Xiaoyan Yang et al.

3D scene stylization approaches based on Neural Radiance Fields (NeRF) achieve promising results by optimizing with Nearest Neighbor Feature Matching (NNFM) loss. However, NNFM loss does not consider global style information. In addition, the implicit representation of NeRF limits fine-grained control over the resulting scenes. In this paper, we introduce ABC-GS, a novel framework based on 3D Gaussian Splatting to achieve high-quality 3D style transfer. To this end, a controllable matching stage is designed to achieve precise alignment between scene content and style features through segmentation masks. Moreover, a style transfer loss function based on feature alignment is proposed to ensure that the outcomes of style transfer accurately reflect the global style of the reference image. Furthermore, the original geometric information of the scene is preserved with the depth loss and Gaussian regularization terms. Extensive experiments show that our ABC-GS provides controllability of style transfer and achieves stylization results that are more faithfully aligned with the global style of the chosen artistic reference. Our homepage is available at https://vpx-ecnu.github.io/ABC-GS-website.

CVSep 23, 2025
Source-Free Domain Adaptive Semantic Segmentation of Remote Sensing Images with Diffusion-Guided Label Enrichment

Wenjie Liu, Hongmin Liu, Lixin Zhang et al.

Research on unsupervised domain adaptation (UDA) for semantic segmentation of remote sensing images has been extensively conducted. However, research on how to achieve domain adaptation in practical scenarios where source domain data is inaccessible, namely source-free domain adaptation (SFDA), remains limited. Self-training has been widely used in SFDA, which requires obtaining as many high-quality pseudo-labels as possible to train models on target domain data. Most existing methods optimize the entire pseudo-label set to obtain more supervisory information. However, as pseudo-label sets often contain substantial noise, simultaneously optimizing all labels is challenging. This limitation undermines the effectiveness of optimization approaches and thus restricts the performance of self-training. To address this, we propose a novel pseudo-label optimization framework called Diffusion-Guided Label Enrichment (DGLE), which starts from a few easily obtained high-quality pseudo-labels and propagates them to a complete set of pseudo-labels while ensuring the quality of newly generated labels. First, a pseudo-label fusion method based on confidence filtering and super-resolution enhancement is proposed, which utilizes cross-validation of details and contextual information to obtain a small number of high-quality pseudo-labels as initial seeds. Then, we leverage the diffusion model to propagate incomplete seed pseudo-labels with irregular distributions, owing to its strong denoising capability for randomly distributed noise and powerful modeling capacity for complex distributions, thereby generating complete and high-quality pseudo-labels. This method effectively avoids the difficulty of directly optimizing the complete set of pseudo-labels, significantly improves the quality of pseudo-labels, and thus enhances the model's performance in the target domain.

CVMay 21, 2025
GT^2-GS: Geometry-aware Texture Transfer for Gaussian Splatting

Wenjie Liu, Zhongliang Liu, Junwei Shu et al.

Transferring 2D textures to 3D modalities is of great significance for improving the efficiency of multimedia content creation. Existing approaches have rarely focused on transferring image textures onto 3D representations. 3D style transfer methods are capable of transferring abstract artistic styles to 3D scenes. However, these methods often overlook the geometric information of the scene, which makes it challenging to achieve high-quality 3D texture transfer results. In this paper, we present GT^2-GS, a geometry-aware texture transfer framework for Gaussian Splatting. From the perspective of matching texture features with geometric information in rendered views, we identify the issue of insufficient texture features and propose a geometry-aware texture augmentation module to expand the texture feature set. Moreover, a geometry-consistent texture loss is proposed to optimize texture features into the scene representation. This loss function incorporates both camera pose and 3D geometric information of the scene, enabling controllable texture-oriented appearance editing. Finally, a geometry preservation strategy is introduced. By alternating between the texture transfer and geometry correction stages over multiple iterations, this strategy achieves a balance between learning texture features and preserving geometric integrity. Extensive experiments demonstrate the effectiveness and controllability of our method. Through geometric awareness, our approach achieves texture transfer results that better align with human visual perception. Our homepage is available at https://vpx-ecnu.github.io/GT2-GS-website.

CVMay 10, 2023
Multi-stage Progressive Reasoning for Dunhuang Murals Inpainting

Wenjie Liu, Baokai Liu, Shiqiang Du et al.

Dunhuang murals suffer from fading, breakage, surface brittleness and extensive peeling caused by prolonged environmental erosion. Image inpainting techniques are widely used in the field of digital mural inpainting. Generally speaking, mural inpainting tasks with large damaged areas are challenging for any image inpainting method. In this paper, we design a multi-stage progressive reasoning network (MPR-Net) containing global to local receptive fields for mural inpainting. This network is capable of recursively inferring the damage boundary and progressively tightening the regional texture constraints. Moreover, to adaptively fuse plentiful information at various scales of murals, a multi-scale feature aggregation module (MFA) is designed to strengthen the capability to select the significant features. The execution of the model is similar to the process of a mural restorer (i.e., inpainting the structure of the damaged mural globally first and then adding the local texture details). Our method has been evaluated through both qualitative and quantitative experiments, and the results demonstrate that it outperforms state-of-the-art image inpainting methods.

LGDec 5, 2021
A Novel Sequential Coreset Method for Gradient Descent Algorithms

Jiawei Huang, Ruomin Huang, Wenjie Liu et al.

A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational complexity. Coreset is a popular data compression technique that has been extensively studied before. However, most existing coreset methods are problem-dependent and cannot be used as a general tool for a broader range of applications. A key obstacle is that they often rely on the pseudo-dimension and total sensitivity bound that can be very high or hard to obtain. In this paper, based on the "locality" property of gradient descent algorithms, we propose a new framework, termed "sequential coreset", which effectively avoids these obstacles. Moreover, our method is particularly suitable for sparse optimization, where the coreset size can be further reduced to be only poly-logarithmically dependent on the dimension. In practice, the experimental results suggest that our method can save a large amount of running time compared with the baseline algorithms.

CLSep 29, 2021
FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition

Yichong Leng, Xu Tan, Rui Wang et al.

Error correction is widely used in automatic speech recognition (ASR) to post-process the generated sentence, and can further reduce the word error rate (WER). Although multiple candidates are generated by an ASR system through beam search, current error correction approaches can only correct one sentence at a time, failing to leverage the voting effect from multiple candidates to better detect and correct error tokens. In this work, we propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy. FastCorrect 2 adopts non-autoregressive generation for fast inference, which consists of an encoder that processes multiple source sentences and a decoder that generates the target sentence in parallel from the adjusted source sentence, where the adjustment is based on the predicted duration of each source token. However, there are some issues when handling multiple source sentences. First, it is non-trivial to leverage the voting effect from multiple source sentences since they usually vary in length. Thus, we propose a novel alignment algorithm to maximize the degree of token alignment among multiple sentences in terms of token and pronunciation similarity. Second, the decoder can only take one adjusted source sentence as input, while there are multiple source sentences. Thus, we develop a candidate predictor to detect the most suitable candidate for the decoder. Experiments on our inhouse dataset and AISHELL-1 show that FastCorrect 2 can further reduce the WER over the previous correction model with single candidate by 3.2% and 2.6%, demonstrating the effectiveness of leveraging multiple candidates in ASR error correction. FastCorrect 2 achieves better performance than the cascaded re-scoring and correction pipeline and can serve as a unified post-processing module for ASR.

LGFeb 28, 2021
Is Simple Uniform Sampling Effective for Center-Based Clustering with Outliers: When and Why?

Jiawei Huang, Wenjie Liu, Hu Ding

Real-world datasets often contain outliers, and the presence of outliers can make clustering problems much more challenging. In this paper, we propose a simple uniform sampling framework for solving three representative center-based clustering problems with outliers: $k$-center/median/means clustering with outliers. Our analysis is fundamentally different from the previous (uniform and non-uniform) sampling based ideas. To explain the effectiveness of uniform sampling in theory, we introduce a measure of "significance" and prove that the performance of our framework depends on the significance degree of the given instance. In particular, the sample size can be independent of the input data size $n$ and the dimensionality $d$, if we assume the given instance is "significant", which is in fact a fairly reasonable assumption in practice. Due to its simplicity, the uniform sampling approach also enjoys several significant advantages over the non-uniform sampling approaches in practice. To the best of our knowledge, this is the first work that systematically studies the effectiveness of uniform sampling from both theoretical and experimental aspects.

QUANT-PHFeb 1, 2020
A Quantum-based Database Query Scheme for Privacy Preservation in Cloud Environment

Wenjie Liu, Peipei Gao, Zhihao Liu et al.

Cloud computing is a powerful and popular information technology paradigm that enables data service outsourcing and provides higher-level services with minimal management effort. However, protecting data privacy when a user accesses sensitive cloud data is still a key challenge. Privacy-preserving database query allows the user to retrieve a data item from the cloud database without revealing the information of the queried data item, while limiting the user's ability to access other items. In this study, in order to achieve privacy preservation and reduce the communication complexity, a quantum-based database query scheme for privacy preservation in cloud environment is developed. Specifically, all the data items of the database are first encrypted by different keys to protect the server's privacy, and in order to guarantee the clients' privacy, the server is required to transmit all these encrypted data items to the client with the oblivious transfer strategy. Besides, two oracle operations, a modified Grover iteration, and a special offset encryption mechanism are combined to ensure that the client can correctly query the desired data item. Finally, performance evaluation is conducted to validate the correctness, privacy, and efficiency of our proposed scheme.

CVJul 24, 2018
Feature Fusion through Multitask CNN for Large-scale Remote Sensing Image Segmentation

Shihao Sun, Lei Yang, Wenjie Liu et al.

In recent years, Fully Convolutional Networks (FCNs) have been widely used in various semantic segmentation tasks, including multi-modal remote sensing imagery. How to fuse multi-modal data to improve the segmentation performance has always been a research hotspot. In this paper, a novel end-to-end fully convolutional neural network is proposed for semantic segmentation of natural color, infrared imagery and Digital Surface Models (DSM). It is based on a modified DeepUNet and performs the segmentation in a multi-task way. The channels are clustered into groups and processed on different task pipelines. After a series of segmentation and fusion steps, their shared features and private features are successfully merged together. Experimental results show that the feature fusion network is efficient, and our approach achieves good performance in the ISPRS Semantic Labeling Contest (2D).

CVSep 1, 2017
DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation

Ruirui Li, Wenjie Liu, Lei Yang et al.

Semantic segmentation is a fundamental research topic in remote sensing image processing. Because of the complex maritime environment, sea-land segmentation is a challenging task. Although neural networks have achieved excellent performance in semantic segmentation in recent years, there are few works using CNNs for sea-land segmentation, and the results could be further improved. This paper proposes a novel deep convolutional neural network named DeepUNet. Like the U-Net, its structure has a contracting path and an expansive path to get high-resolution output. Unlike the U-Net, however, the DeepUNet uses DownBlocks instead of convolution layers in the contracting path and uses UpBlocks in the expansive path. The two novel blocks bring two new connections, the U-connection and the Plus connection, which are introduced to get more precise segmentation results. To verify our network architecture, we built a new challenging sea-land dataset and compared the DeepUNet on it with the SegNet and the U-Net. Experimental results show that DeepUNet achieves good performance compared with other architectures, especially in high-resolution remote sensing imagery.

NAJun 24, 2017
High-order implicit Galerkin-Legendre spectral method for the two-dimensional Schrodinger equation

Wenjie Liu, Boying Wu

In this paper, we propose a Galerkin-Legendre spectral method with an implicit Runge-Kutta method for solving the unsteady two-dimensional Schrodinger equation with nonhomogeneous Dirichlet boundary conditions and an initial condition. We apply a Galerkin-Legendre spectral method for discretizing spatial derivatives, and then employ the implicit Runge-Kutta method for the time integration of the resulting linear first-order system of ordinary differential equations in the complex domain. We derive the spectral rate of convergence for the proposed method in the L^2-norm for the semidiscrete formulation. Numerical experiments show that our formulation has high-order accuracy and exponential rates of convergence in space.

CRDec 19, 2013
Quantum Private Comparison: A Review

Wenjie Liu, Chao Liu, Haibin Wang et al.

As an important branch of quantum secure multiparty computation, quantum private comparison (QPC) has attracted more and more attention recently. In this paper, according to the quantum implementation mechanism that QPC protocols use, we divide them into three categories: the quantum cryptography QPC, the superdense coding QPC, and the entanglement swapping QPC. Then, a more in-depth analysis of the research progress, design ideas, and substantive characteristics of each QPC category is carried out. Finally, the applications of QPC and quantum secure multi-party computation issues are discussed and, in addition, three possible mainstream research directions are pointed out.