CVMay 26
SIMPC: Learning Self-Induced Mirror-Point Consistency for Unsupervised Point Cloud DenoisingChengwei Zhang, Xueyi Zhang, Tao Jiang et al.
In point clouds, noise directly perturbs point coordinates that encode both spatial location and geometry, making one-to-one correspondence construction more challenging than in images. Existing methods impose statistical mappings across noisy variants via noise or optimal transport, but suffer from correspondence ambiguity. In this work, we propose Self-Induced Mirror-Point Consistency (SIMPC) to learn deterministic correspondences between points and the underlying surface in an unsupervised manner. For each noisy point, SIMPC generates a mirror-point on the opposite side of the underlying surface, guided by geometric priors during the denoising process. By encouraging consistency between the denoising targets of the original point and its mirror counterpart, SIMPC effectively localizes the position of underlying surface. Extensive experiments on synthetic and real-world datasets demonstrate that SIMPC significantly outperforms state-of-the-art unsupervised methods and surpasses several strong supervised counterparts.
CHEM-PHNov 30, 2023
Transfer Learning across Different Chemical Domains: Virtual Screening of Organic Materials with Deep Learning Models Pretrained on Small Molecule and Chemical Reaction DataChengwei Zhang, Yushuang Zhai, Ziyang Gong et al.
Machine learning is becoming a preferred method for the virtual screening of organic materials due to its cost-effectiveness over traditional computationally demanding techniques. However, the scarcity of labeled data for organic materials poses a significant challenge for training advanced machine learning models. This study showcases the potential of utilizing databases of drug-like small molecules and chemical reactions to pretrain the BERT model, enhancing its performance in the virtual screening of organic materials. By fine-tuning the BERT models with data from five virtual screening tasks, the version pretrained with the USPTO-SMILES dataset achieved R2 scores exceeding 0.94 for three tasks and over 0.81 for two others. This performance surpasses that of models pretrained on the small molecule or organic materials databases and outperforms three traditional machine learning models trained directly on virtual screening data. The success of the USPTO-SMILES pretrained BERT model can be attributed to the diverse array of organic building blocks in the USPTO database, offering a broader exploration of the chemical space. The study further suggests that accessing a reaction database with a wider range of reactions than the USPTO could further enhance model performance. Overall, this research validates the feasibility of applying transfer learning across different chemical domains for the efficient virtual screening of organic materials.
CVNov 18, 2024Code
DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving ScenesChensheng Peng, Chengwei Zhang, Yixiao Wang et al. · berkeley
We present DeSiRe-GS, a self-supervised gaussian splatting representation, enabling effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios. Our approach employs a two-stage optimization pipeline of dynamic street Gaussians. In the first stage, we extract 2D motion masks based on the observation that 3D Gaussian Splatting inherently can reconstruct only the static regions in dynamic environments. These extracted 2D motion priors are then mapped into the Gaussian space in a differentiable manner, leveraging an efficient formulation of dynamic Gaussians in the second stage. Combined with the introduced geometric regularizations, our method are able to address the over-fitting issues caused by data sparsity in autonomous driving, reconstructing physically plausible Gaussians that align with object surfaces rather than floating in air. Furthermore, we introduce temporal cross-view consistency to ensure coherence across time and viewpoints, resulting in high-quality surface reconstruction. Comprehensive experiments demonstrate the efficiency and effectiveness of DeSiRe-GS, surpassing prior self-supervised arts and achieving accuracy comparable to methods relying on external 3D bounding box annotations. Code is available at https://github.com/chengweialan/DeSiRe-GS
ITMay 17
Variational Diffusion Channel DecoderChengwei Zhang, Yifan Du, Siyu Liao
Neural channel decoder, as a data-driven channel decoding strategy, has shown very promising improvement on error-correcting capability over the classical methods. However, the success of those deep learning-based decoder comes at the cost of drastically increased model storage and computational complexity, hindering their practical adoptions in real-world time-sensitive resource-sensitive communication and storage systems. To address this challenge, we propose an efficient variational diffusion model-based channel decoder, which effectively integrates the domain-specific belief propagation process to the modern diffusion model. By reaping the low-cost benefits of belief propagation and strong learning capability of diffusion model, our proposed neural decoder simultaneously achieves very low cost and high error-correcting performance. Experimental results show that, compared with the state-of-the-art neural channel decoders, our model provides a feasible solution for practical deployment via achieving the best decoding performance with significantly reduced computational cost and model size.
ROMay 17
Beyond Geometry: Efficient Topologically-Grounded Navigation in Complex 3D EnvironmentsYifan Du, Chengwei Zhang, Siyu Liao et al.
Ground robot navigation in complex 3D environments is often hindered by geometric ambiguity, where non-traversable structures such as furniture share local geometric properties with navigable ground. Furthermore, the computational cost of searching massive voxel spaces remains a significant challenge. To address these issues, we present a surface extraction framework that constructs a reduced state space of physically reachable standing positions by enforcing ground support, overhead clearance, and seed-based connectivity constraints. Evaluation across five Matterport3D indoor scenes and three PCT benchmark scenes demonstrates over 80\% state space reduction and sub-millisecond A* search on the Matterport3D scenes, with 100\% planning success across all 300 tested queries.
MLApr 13, 2025
AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature ReuseZichao Yu, Zhen Zou, Guojiang Shao et al.
Diffusion models have demonstrated remarkable success in generative tasks, yet their iterative denoising process results in slow inference, limiting their practicality. While existing acceleration methods exploit the well-known U-shaped similarity pattern between adjacent steps through caching mechanisms, they lack theoretical foundation and rely on simplistic computation reuse, often leading to performance degradation. In this work, we provide a theoretical understanding by analyzing the denoising process through the second-order Adams-Bashforth method, revealing a linear relationship between the outputs of consecutive steps. This analysis explains why the outputs of adjacent steps exhibit a U-shaped pattern. Furthermore, extending Adams-Bashforth method to higher order, we propose a novel caching-based acceleration approach for diffusion models, instead of directly reusing cached results, with a truncation error bound of only \(O(h^k)\) where $h$ is the step size. Extensive validation across diverse image and video diffusion models (including HunyuanVideo and FLUX.1-dev) with various schedulers demonstrates our method's effectiveness in achieving nearly $3\times$ speedup while maintaining original performance levels, offering a practical real-time solution without compromising generation quality.
CLMay 30, 2025
Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error CorrectionYangui Fang, Baixu Cheng, Jing Peng et al.
Automatic Speech Recognition (ASR) error correction aims to correct recognition errors while preserving accurate text. Although traditional approaches demonstrate moderate effectiveness, LLMs offer a paradigm that eliminates the need for training and labeled data. However, directly using LLMs will encounter hallucinations problem, which may lead to the modification of the correct text. To address this problem, we propose the Reliable LLM Correction Framework (RLLM-CF), which consists of three stages: (1) error pre-detection, (2) chain-of-thought sub-tasks iterative correction, and (3) reasoning process verification. The advantage of our method is that it does not require additional information or fine-tuning of the model, and ensures the correctness of the LLM correction under multi-pass programming. Experiments on AISHELL-1, AISHELL-2, and Librispeech show that the GPT-4o model enhanced by our framework achieves 21%, 11%, 9%, and 11.4% relative reductions in CER/WER.
CLDec 5, 2024
Representation Purification for End-to-End Speech TranslationChengwei Zhang, Yue Zhou, Rui Zhao et al.
Speech-to-text translation (ST) is a cross-modal task that involves converting spoken language into text in a different language. Previous research primarily focused on enhancing speech translation by facilitating knowledge transfer from machine translation, exploring various methods to bridge the gap between speech and text modalities. Despite substantial progress made, factors in speech that are not relevant to translation content, such as timbre and rhythm, often limit the efficiency of knowledge transfer. In this paper, we conceptualize speech representation as a combination of content-agnostic and content-relevant factors. We examine the impact of content-agnostic factors on translation performance through preliminary experiments and observe a significant performance deterioration when content-agnostic perturbations are introduced to speech signals. To address this issue, we propose a \textbf{S}peech \textbf{R}epresentation \textbf{P}urification with \textbf{S}upervision \textbf{E}nhancement (SRPSE) framework, which excludes the content-agnostic components within speech representations to mitigate their negative impact on ST. Experiments on MuST-C and CoVoST-2 datasets demonstrate that SRPSE significantly improves translation performance across all translation directions in three settings and achieves preeminent performance under a \textit{transcript-free} setting.
CVMar 14, 2025
Towards Privacy-preserved Pre-training of Remote Sensing Foundation Models with Federated Mutual-guidance LearningJieyi Tan, Chengwei Zhang, Bo Dang et al.
Traditional Remote Sensing Foundation models (RSFMs) are pre-trained with a data-centralized paradigm, through self-supervision on large-scale curated remote sensing data. For each institution, however, pre-training RSFMs with limited data in a standalone manner may lead to suboptimal performance, while aggregating remote sensing data from multiple institutions for centralized pre-training raises privacy concerns. Seeking for collaboration is a promising solution to resolve this dilemma, where multiple institutions can collaboratively train RSFMs without sharing private data. In this paper, we propose a novel privacy-preserved pre-training framework (FedSense), which enables multiple institutions to collaboratively train RSFMs without sharing private data. However, it is a non-trivial task hindered by a vicious cycle, which results from model drift by remote sensing data heterogeneity and high communication overhead. To break this vicious cycle, we introduce Federated Mutual-guidance Learning. Specifically, we propose a Server-to-Clients Guidance (SCG) mechanism to guide clients updates towards global-flatness optimal solutions. Additionally, we propose a Clients-to-Server Guidance (CSG) mechanism to inject local knowledge into the server by low-bit communication. Extensive experiments on four downstream tasks demonstrate the effectiveness of our FedSense in both full-precision and communication-reduced scenarios, showcasing remarkable communication efficiency and performance gains.
MAMar 11, 2025
Enhancing Traffic Signal Control through Model-based Reinforcement Learning and Policy ReuseYihong Li, Chengwei Zhang, Furui Zhan et al.
Multi-agent reinforcement learning (MARL) has shown significant potential in traffic signal control (TSC). However, current MARL-based methods often suffer from insufficient generalization due to the fixed traffic patterns and road network conditions used during training. This limitation results in poor adaptability to new traffic scenarios, leading to high retraining costs and complex deployment. To address this challenge, we propose two algorithms: PLight and PRLight. PLight employs a model-based reinforcement learning approach, pretraining control policies and environment models using predefined source-domain traffic scenarios. The environment model predicts the state transitions, which facilitates the comparison of environmental features. PRLight further enhances adaptability by adaptively selecting pre-trained PLight agents based on the similarity between the source and target domains to accelerate the learning process in the target domain. We evaluated the algorithms through two transfer settings: (1) adaptability to different traffic scenarios within the same road network, and (2) generalization across different road networks. The results show that PRLight significantly reduces the adaptation time compared to learning from scratch in new TSC scenarios, achieving optimal performance using similarities between available and target scenarios.
CVAug 16, 2025
Deep Learning For Point Cloud Denoising: A SurveyChengwei Zhang, Xueyi Zhang, Mingrui Lao et al.
Real-world environment-derived point clouds invariably exhibit noise across varying modalities and intensities. Hence, point cloud denoising (PCD) is essential as a preprocessing step to improve downstream task performance. Deep learning (DL)-based PCD models, known for their strong representation capabilities and flexible architectures, have surpassed traditional methods in denoising performance. To our best knowledge, despite recent advances in performance, no comprehensive survey systematically summarizes the developments of DL-based PCD. To fill the gap, this paper seeks to identify key challenges in DL-based PCD, summarizes the main contributions of existing methods, and proposes a taxonomy tailored to denoising tasks. To achieve this goal, we formulate PCD as a two-step process: outlier removal and surface noise restoration, encompassing most scenarios and requirements of PCD. Additionally, we compare methods in terms of similarities, differences, and respective advantages. Finally, we discuss research limitations and future directions, offering insights for further advancements in PCD.
LGMar 10, 2025
Using a single actor to output personalized policy for different intersectionsKailing Zhou, Chengwei Zhang, Furui Zhan et al.
Recently, with the development of Multi-agent reinforcement learning (MARL), adaptive traffic signal control (ATSC) has achieved satisfactory results. In traffic scenarios with multiple intersections, MARL treats each intersection as an agent and optimizes traffic signal control strategies through learning and real-time decision-making. Considering that observation distributions of intersections might be different in real-world scenarios, shared parameter methods might lack diversity and thus lead to high generalization requirements in the shared-policy network. A typical solution is to increase the size of network parameters. However, simply increasing the scale of the network does not necessarily improve policy generalization, which is validated in our experiments. Accordingly, an approach that considers both the personalization of intersections and the efficiency of parameter sharing is required. To this end, we propose Hyper-Action Multi-Head Proximal Policy Optimization (HAMH-PPO), a Centralized Training with Decentralized Execution (CTDE) MARL method that utilizes a shared PPO policy network to deliver personalized policies for intersections with non-iid observation distributions. The centralized critic in HAMH-PPO uses graph attention units to calculate the graph representations of all intersections and outputs a set of value estimates with multiple output heads for each intersection. The decentralized execution actor takes the local observation history as input and output distributions of action as well as a so-called hyper-action to balance the multiple values estimated from the centralized critic to further guide the updating of TSC policies. The combination of hyper-action and multi-head values enables multiple agents to share a single actor-critic while achieving personalized policies.
LGApr 22, 2021
Independent Reinforcement Learning for Weakly Cooperative Multiagent Traffic Control ProblemChengwei Zhang, Shan Jin, Wanli Xue et al.
The adaptive traffic signal control (ATSC) problem can be modeled as a multiagent cooperative game among urban intersections, where intersections cooperate to optimize their common goal. Recently, reinforcement learning (RL) has achieved marked successes in managing sequential decision making problems, which motivates us to apply RL in the ASTC problem. Here we use independent reinforcement learning (IRL) to solve a complex traffic cooperative control problem in this study. One of the largest challenges of this problem is that the observation information of intersection is typically partially observable, which limits the learning performance of IRL algorithms. To this, we model the traffic control problem as a partially observable weak cooperative traffic model (PO-WCTM) to optimize the overall traffic situation of a group of intersections. Different from a traditional IRL task that averages the returns of all agents in fully cooperative games, the learning goal of each intersection in PO-WCTM is to reduce the cooperative difficulty of learning, which is also consistent with the traffic environment hypothesis. We also propose an IRL algorithm called Cooperative Important Lenient Double DQN (CIL-DDQN), which extends Double DQN (DDQN) algorithm using two mechanisms: the forgetful experience mechanism and the lenient weight training mechanism. The former mechanism decreases the importance of experiences stored in the experience reply buffer, which deals with the problem of experience failure caused by the strategy change of other agents. The latter mechanism increases the weight experiences with high estimation and `leniently' trains the DDQN neural network, which improves the probability of the selection of cooperative joint strategies. Experimental results show that CIL-DDQN outperforms other methods in almost all performance indicators of the traffic control problem.
AIJan 30, 2021
MUSE: Multi-Scale Temporal Features Evolution for Knowledge TracingChengwei Zhang, Yangzhou Jiang, Wei Zhang et al.
Transformer based knowledge tracing model is an extensively studied problem in the field of computer-aided education. By integrating temporal features into the encoder-decoder structure, transformers can processes the exercise information and student response information in a natural way. However, current state-of-the-art transformer-based variants still share two limitations. First, extremely long temporal features cannot well handled as the complexity of self-attention mechanism is O(n2). Second, existing approaches track the knowledge drifts under fixed a window size, without considering different temporal-ranges. To conquer these problems, we propose MUSE, which is equipped with multi-scale temporal sensor unit, that takes either local or global temporal features into consideration. The proposed model is capable to capture the dynamic changes in users knowledge states at different temporal-ranges, and provides an efficient and powerful way to combine local and global features to make predictions. Our method won the 5-th place over 3,395 teams in the Riiid AIEd Challenge 2020.
CVMay 27, 2020
SPIN: Structure-Preserving Inner Offset Network for Scene Text RecognitionChengwei Zhang, Yunlu Xu, Zhanzhan Cheng et al.
Arbitrary text appearance poses a great challenge in scene text recognition tasks. Existing works mostly handle with the problem in consideration of the shape distortion, including perspective distortions, line curvature or other style variations. Therefore, methods based on spatial transformers are extensively studied. However, chromatic difficulties in complex scenes have not been paid much attention on. In this work, we introduce a new learnable geometric-unrelated module, the Structure-Preserving Inner Offset Network (SPIN), which allows the color manipulation of source data within the network. This differentiable module can be inserted before any recognition architecture to ease the downstream tasks, giving neural networks the ability to actively transform input intensity rather than the existing spatial rectification. It can also serve as a complementary module to known spatial transformations and work in both independent and collaborative ways with them. Extensive experiments show that the use of SPIN results in a significant improvement on multiple text recognition benchmarks compared to the state-of-the-arts.
CVAug 7, 2019
Adversarial Seeded Sequence Growing for Weakly-Supervised Temporal Action LocalizationChengwei Zhang, Yunlu Xu, Zhanzhan Cheng et al.
Temporal action localization is an important yet challenging research topic due to its various applications. Since the frame-level or segment-level annotations of untrimmed videos require amounts of labor expenditure, studies on the weakly-supervised action detection have been springing up. However, most of existing frameworks rely on Class Activation Sequence (CAS) to localize actions by minimizing the video-level classification loss, which exploits the most discriminative parts of actions but ignores the minor regions. In this paper, we propose a novel weakly-supervised framework by adversarial learning of two modules for eliminating such demerits. Specifically, the first module is designed as a well-designed Seeded Sequence Growing (SSG) Network for progressively extending seed regions (namely the highly reliable regions initialized by a CAS-based framework) to their expected boundaries. The second module is a specific classifier for mining trivial or incomplete action regions, which is trained on the shared features after erasing the seeded regions activated by SSG. In this way, a whole network composed of these two modules can be trained in an adversarial manner. The goal of the adversary is to mine features that are difficult for the action classifier. That is, erasion from SSG will force the classifier to discover minor or even new action regions on the input feature sequence, and the classifier will drive the seeds to grow, alternately. At last, we could obtain the action locations and categories from the well-trained SSG and the classifier. Extensive experiments on two public benchmarks THUMOS'14 and ActivityNet1.3 demonstrate the impressive performance of our proposed method compared with the state-of-the-arts.
CVNov 19, 2018
Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action DetectionYunlu Xu, Chengwei Zhang, Zhanzhan Cheng et al.
This paper proposes a segregated temporal assembly recurrent (STAR) network for weakly-supervised multiple action detection. The model learns from untrimmed videos with only supervision of video-level labels and makes prediction of intervals of multiple actions. Specifically, we first assemble video clips according to class labels by an attention mechanism that learns class-variable attention weights and thus helps the noise relieving from background or other actions. Secondly, we build temporal relationship between actions by feeding the assembled features into an enhanced recurrent neural network. Finally, we transform the output of recurrent neural network into the corresponding action distribution. In order to generate more precise temporal proposals, we design a score term called segregated temporal gradient-weighted class activation mapping (ST-GradCAM) fused with attention weights. Experiments on THUMOS'14 and ActivityNet1.3 datasets show that our approach outperforms the state-of-the-art weakly-supervised method, and performs at par with the fully-supervised counterparts.
AISep 18, 2018
SCC-rFMQ Learning in Cooperative Markov Games with Continuous ActionsChengwei Zhang, Xiaohong Li, Jianye Hao et al.
Although many reinforcement learning methods have been proposed for learning the optimal solutions in single-agent continuous-action domains, multiagent coordination domains with continuous actions have received relatively few investigations. In this paper, we propose an independent learner hierarchical method, named Sample Continuous Coordination with recursive Frequency Maximum Q-Value (SCC-rFMQ), which divides the cooperative problem with continuous actions into two layers. The first layer samples a finite set of actions from the continuous action spaces by a re-sampling mechanism with variable exploratory rates, and the second layer evaluates the actions in the sampled action set and updates the policy using a reinforcement learning cooperative method. By constructing cooperative mechanisms at both levels, SCC-rFMQ can handle cooperative problems in continuous action cooperative Markov games effectively. The effectiveness of SCC-rFMQ is experimentally demonstrated on two well-designed games, i.e., a continuous version of the climbing game and a cooperative version of the boat problem. Experimental results show that SCC-rFMQ outperforms other reinforcement learning algorithms.
AIMar 8, 2018
SA-IGA: A Multiagent Reinforcement Learning Method Towards Socially Optimal OutcomesChengwei Zhang, Xiaohong Li, Jianye Hao et al.
In multiagent environments, the capability of learning is important for an agent to behave appropriately in face of unknown opponents and dynamic environment. From the system designer's perspective, it is desirable if the agents can learn to coordinate towards socially optimal outcomes, while also avoiding being exploited by selfish opponents. To this end, we propose a novel gradient ascent based algorithm (SA-IGA) which augments the basic gradient-ascent algorithm by incorporating social awareness into the policy update process. We theoretically analyze the learning dynamics of SA-IGA using dynamical system theory and SA-IGA is shown to have linear dynamics for a wide range of games including symmetric games. The learning dynamics of two representative games (the prisoner's dilemma game and the coordination game) are analyzed in details. Based on the idea of SA-IGA, we further propose a practical multiagent learning algorithm, called SA-PGA, based on Q-learning update rule. Simulation results show that SA-PGA agent can achieve higher social welfare than previous social-optimality oriented Conditional Joint Action Learner (CJAL) and also is robust against individually rational opponents by reaching Nash equilibrium solutions.
ROMay 16, 2017
Kinematics, workspace and singularity analysis of a multi-mode parallel robotDamien Chablat, Xianwen Kong, Chengwei Zhang
A family of reconfigurable parallel robots can change motion modes by passing through constraint singularities by locking and releasing some passive joints of the robot. This paper is about the kinematics, the workspace and singularity analysis of a 3-PRPiR parallel robot involving lockable Pi and R (revolute) joints. Here a Pi joint may act as a 1-DOF planar parallelogram if its lock-able P (prismatic) joint is locked or a 2-DOF RR serial chain if its lockable P joint is released. The operation modes of the robot include a 3T operation modes to three 2T1R operation modes with two different directions of the rotation axis of the moving platform. The inverse kinematics and forward kinematics of the robot in each operation modes are dealt with in detail. The workspace analysis of the robot allow us to know the regions of the workspace that the robot can reach in each operation mode. A prototype built at Heriot-Watt University is used to illustrate the results of this work.