Cheng Wang

CV
h-index: 124
Papers: 226
Citations: 11,495
Novelty: 50%
AI Score: 62

226 Papers

CV · Jun 8, 2023 · Code
SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth

Zelin Liu, Xinggang Wang, Cheng Wang et al. · amazon-science

Exploring robust and efficient association methods has always been an important issue in multiple-object tracking (MOT). Although existing tracking methods have achieved impressive performance, congestion and frequent occlusions remain challenging problems in multi-object tracking. We reveal that performing sparse decomposition on dense scenes is a crucial step in improving the association of occluded targets. To this end, we first propose a pseudo-depth estimation method for obtaining the relative depth of targets from 2D images. Second, we design a depth cascading matching (DCM) algorithm, which uses the obtained depth information to convert a dense target set into multiple sparse target subsets and performs data association on these sparse subsets in order from near to far. By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack. SparseTrack provides a new perspective on the challenging problem of MOT in crowded scenes. Using only IoU matching, SparseTrack achieves performance comparable to state-of-the-art (SOTA) methods on the MOT17 and MOT20 benchmarks. Code and models are publicly available at https://github.com/hustvl/SparseTrack.
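The sketch below shows one way depth-cascaded association could work. It rests on several assumptions:

- Pseudo-depth is the vertical gap between a box's bottom edge and the bottom of the image.
- Depth levels are equal-width bins between the smallest and largest observed pseudo-depth.
- Unmatched tracks and detections carry over to the next, farther level.

Greedy IoU assignment stands in for Hungarian matching. The names `pseudo_depth`, `depth_cascade_match` and `greedy_iou_match`, and the default `levels`/`thresh` values, are hypothetical and not taken from the SparseTrack code.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels, y grows downward


def pseudo_depth(box: Box, image_height: float) -> float:
    """Assumed pseudo-depth: gap between the box bottom edge and the image bottom.

    A larger value is taken to mean the target is farther from the camera.
    """
    return image_height - box[3]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def depth_bins(boxes: Sequence[Box], image_height: float, levels: int,
               d_min: float, d_max: float) -> List[List[int]]:
    """Split box indices into `levels` equal-width depth bins, nearest bin first."""
    width = (d_max - d_min) / levels if d_max > d_min else 1.0
    bins: List[List[int]] = [[] for _ in range(levels)]
    for i, box in enumerate(boxes):
        k = int((pseudo_depth(box, image_height) - d_min) / width)
        bins[min(k, levels - 1)].append(i)
    return bins


def greedy_iou_match(track_ids, det_ids, tracks, dets, thresh):
    """Greedy IoU assignment, used here as a stand-in for Hungarian matching."""
    pairs = sorted(((iou(tracks[t], dets[d]), t, d) for t in track_ids for d in det_ids),
                   reverse=True)
    used_t, used_d, matches = set(), set(), []
    for score, t, d in pairs:
        if score < thresh:
            break  # pairs are sorted, so every remaining score is lower
        if t not in used_t and d not in used_d:
            used_t.add(t)
            used_d.add(d)
            matches.append((t, d))
    left_t = [t for t in track_ids if t not in used_t]
    left_d = [d for d in det_ids if d not in used_d]
    return matches, left_t, left_d


def depth_cascade_match(tracks: List[Box], dets: List[Box], image_height: float,
                        levels: int = 3, thresh: float = 0.3) -> List[Tuple[int, int]]:
    """Match tracks to detections level by level, from near to far."""
    depths = [pseudo_depth(b, image_height) for b in tracks + dets]
    if not depths:
        return []
    d_min, d_max = min(depths), max(depths)
    t_bins = depth_bins(tracks, image_height, levels, d_min, d_max)
    d_bins = depth_bins(dets, image_height, levels, d_min, d_max)
    matches: List[Tuple[int, int]] = []
    left_t: List[int] = []
    left_d: List[int] = []
    for k in range(levels):
        # Unmatched leftovers from nearer levels join the next, farther level.
        found, left_t, left_d = greedy_iou_match(left_t + t_bins[k], left_d + d_bins[k],
                                                 tracks, dets, thresh)
        matches.extend(found)
    return matches
```

Each level only compares targets at similar pseudo-depths, so a distant, partly occluded target is never a candidate for a nearby box. This is the intuition behind decomposing a dense scene into sparse subsets.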

CV · Mar 4, 2023 · Code
Virtual Sparse Convolution for Multimodal 3D Object Detection

Hai Wu, Chenglu Wen, Shaoshuai Shi et al.

Recently, virtual/pseudo-point-based 3D object detection, which seamlessly fuses RGB images and LiDAR data through depth completion, has gained great attention. However, virtual points generated from an image are very dense, introducing a huge amount of redundant computation during detection. Meanwhile, noise introduced by inaccurate depth completion significantly degrades detection precision. This paper proposes a fast yet effective backbone, termed VirConvNet, based on a new operator VirConv (Virtual Sparse Convolution), for virtual-point-based 3D object detection. VirConv consists of two key designs: (1) StVD (Stochastic Voxel Discard) and (2) NRConv (Noise-Resistant Submanifold Convolution). StVD alleviates the computation problem by discarding large numbers of nearby redundant voxels. NRConv tackles the noise problem by encoding voxel features in both 2D image space and 3D LiDAR space. By integrating VirConv, we first develop an efficient pipeline, VirConv-L, based on an early fusion design. Then, we build a high-precision pipeline, VirConv-T, based on a transformed refinement scheme. Finally, we develop a semi-supervised pipeline, VirConv-S, based on a pseudo-label framework. On the KITTI car 3D detection test leaderboard, our VirConv-L achieves 85% AP with a fast running time of 56 ms. Our VirConv-T and VirConv-S attain 86.3% and 87.2% AP and currently rank 2nd and 1st, respectively. The code is available at https://github.com/hailanyi/VirConv.
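The sketch below illustrates the StVD idea only. It keeps every voxel beyond a hypothetical `near_range` and keeps each nearby voxel with a hypothetical probability `keep_prob`. The abstract does not describe the actual StVD sampling scheme or its parameters, so both the distance rule and the default values are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Voxel = Tuple[float, float, float]  # voxel center (x, y, z) in metres, sensor at origin


def stochastic_voxel_discard(voxels: Sequence[Voxel], near_range: float = 30.0,
                             keep_prob: float = 0.1,
                             seed: Optional[int] = None) -> List[Voxel]:
    """Randomly drop dense nearby voxels while always keeping sparse distant ones.

    `near_range` and `keep_prob` are illustrative values, not the paper's settings.
    """
    rng = random.Random(seed)
    kept = []
    for v in voxels:
        distance = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
        # Far voxels are always kept; near voxels survive with probability keep_prob.
        if distance >= near_range or rng.random() < keep_prob:
            kept.append(v)
    return kept
```

The design reflects the motivation stated above: virtual points are densest close to the sensor, so thinning only the near range removes most of the redundant computation.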

CHEM-PH · Oct 31, 2023 · Code
MLatom 3: Platform for machine learning-enhanced computational chemistry simulations and workflows

Pavlo O. Dral, Fuchun Ge, Yi-Fan Hou et al.

Machine learning (ML) is increasingly becoming a common tool in computational chemistry. At the same time, the rapid development of ML methods requires a flexible software framework for designing custom workflows. MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations and to create complex workflows. This open-source package offers users plenty of choice. They can run simulations with command-line options, input files, or scripts that use MLatom as a Python package, either on their own computers or on the online XACS cloud computing service at XACScloud.com. Computational chemists can calculate energies and thermochemical properties, optimize geometries, run molecular and quantum dynamics, and simulate (ro)vibrational, one-photon UV/vis absorption, and two-photon absorption spectra with ML, quantum mechanical, and combined models. Users can choose from an extensive library of methods containing pre-trained ML models and quantum mechanical approximations such as AIQM1, which approaches coupled-cluster accuracy. Developers can build their own models using various ML algorithms. The great flexibility of MLatom is largely due to its extensive use of interfaces to many state-of-the-art software packages and libraries.

LG · Nov 25, 2022 · Code
FedGS: Federated Graph-based Sampling with Arbitrary Client Availability

Zheng Wang, Xiaoliang Fan, Jianzhong Qi et al.

While federated learning has shown strong results in optimizing a machine learning model without direct access to the original data, its performance may be hindered by intermittent client availability, which slows down convergence and biases the final learned model. Achieving both stable and bias-free training under arbitrary client availability poses significant challenges. To address these challenges, we propose a framework named Federated Graph-based Sampling (FedGS), which simultaneously stabilizes the global model update and mitigates long-term bias under arbitrary client availability. First, we model the data correlations of clients with a Data-Distribution-Dependency Graph (3DG). The 3DG helps keep the data of the sampled clients far apart from each other, which is theoretically shown to improve the approximation to the optimal model update. Second, subject to the constraint that the sampled clients are far apart in data distribution, we further minimize the variance of the number of times each client is sampled, to mitigate long-term bias. To validate the effectiveness of FedGS, we conduct experiments on three datasets under a comprehensive set of seven client availability modes. Our experimental results confirm FedGS's advantage both in enabling a fair client-sampling scheme and in improving model performance under arbitrary client availability. Our code is available at https://github.com/WwZzz/FedGS.
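The count-variance objective can be illustrated with a greedy heuristic: in each round, select the available clients that have been sampled least often so far. This sketch deliberately omits the 3DG distance constraint and is not FedGS's actual optimization. The names `sample_least_sampled`, `count_variance` and `simulate` are hypothetical, and `availability` lists which client ids are reachable in each round.

```python
from typing import Dict, List, Sequence


def sample_least_sampled(available: Sequence[int], counts: Dict[int, int],
                         m: int) -> List[int]:
    """Pick up to m available clients with the fewest past selections (ties: lower id)."""
    chosen = sorted(available, key=lambda c: (counts[c], c))[:m]
    for c in chosen:
        counts[c] += 1
    return chosen


def count_variance(counts: Dict[int, int]) -> float:
    """Population variance of how often each client has been sampled."""
    values = list(counts.values())
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def simulate(availability: Sequence[Sequence[int]], num_clients: int,
             m: int) -> Dict[int, int]:
    """Run the greedy sampler over a sequence of per-round availability sets."""
    counts = {c: 0 for c in range(num_clients)}
    for available in availability:
        sample_least_sampled(available, counts, m)
    return counts
```

With every client always available, the greedy rule spreads selections perfectly evenly. Under intermittent availability it can only favour clients that happen to be online, which is why the variance generally stays above zero.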

CV · Aug 31, 2023 · Code
Decoupled Local Aggregation for Point Cloud Learning

Binjie Chen, Yunzhou Xia, Yu Zang et al.

The unstructured nature of point clouds demands that local aggregation be adaptive to different local structures. Previous methods meet this demand by explicitly embedding spatial relations into each aggregation process. Although this coupled approach has been shown to be effective in generating clear semantics, it can greatly slow down aggregation because of repeated relation learning and redundant computation to mix directional and point features. In this work, we propose to decouple the explicit modelling of spatial relations from local aggregation. We theoretically prove that basic neighbor pooling operations can also function without loss of clarity in feature fusion, as long as essential spatial information has been encoded in point features. As an instantiation of decoupled local aggregation, we present DeLA, a lightweight point network. At each learning stage, DeLA first forms relative spatial encodings and then uses only pointwise convolutions plus edge max-pooling for local aggregation. Further, a regularization term is employed to reduce potential ambiguity through the prediction of relative coordinates. Though conceptually simple, DeLA achieves state-of-the-art performance with reduced or comparable latency in experiments on five classic benchmarks. Specifically, DeLA achieves over 90% overall accuracy on ScanObjectNN and 74% mIoU on S3DIS Area 5. Our code is available at https://github.com/Matrix-ASC/DeLA.
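The sketch below shows decoupled local aggregation in plain Python. A shared pointwise linear map (the equivalent of a 1x1 convolution) is followed by channel-wise max-pooling over each point's k nearest neighbours, which we use as a simple stand-in for edge max-pooling. The input features are assumed to already carry the relative spatial encodings, so no spatial relation is learned inside the aggregation step. The brute-force `knn` and the other function names are illustrative, not DeLA's implementation.

```python
import math
from typing import List, Sequence

Point = Sequence[float]
Features = List[List[float]]


def knn(points: Sequence[Point], k: int) -> List[List[int]]:
    """Brute-force k nearest neighbours; each point counts as its own neighbour."""
    return [sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))[:k]
            for p in points]


def pointwise_linear(features: Features, weight: Features,
                     bias: Sequence[float]) -> Features:
    """Apply the same linear map to every point (a pointwise / 1x1 convolution)."""
    return [[sum(w * x for w, x in zip(row, f)) + b for row, b in zip(weight, bias)]
            for f in features]


def neighbor_max_pool(features: Features, neighbors: List[List[int]]) -> Features:
    """Channel-wise max over each point's neighbourhood."""
    channels = len(features[0])
    return [[max(features[j][c] for j in nbrs) for c in range(channels)]
            for nbrs in neighbors]


def decoupled_aggregation(points, features, weight, bias, k):
    """Pointwise transform, then neighbour max-pooling; no learned relation term."""
    return neighbor_max_pool(pointwise_linear(features, weight, bias), knn(points, k))
```

The aggregation step itself contains no per-neighbour relation network, only a shared linear map and a max. This is where the latency saving described above would come from.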

MA · Nov 3, 2023 · Code
RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization

Siqi Shen, Chennan Ma, Chao Li et al.

Multi-agent systems are characterized by environmental uncertainty, varying policies of agents, and partial observability, which result in significant risks. In the context of Multi-Agent Reinforcement Learning (MARL), learning coordinated and decentralized policies that are sensitive to risk is challenging. To formulate the coordination requirements in risk-sensitive MARL, we introduce the Risk-sensitive Individual-Global-Max (RIGM) principle as a generalization of the Individual-Global-Max (IGM) and Distributional IGM (DIGM) principles. This principle requires that the collection of each agent's risk-sensitive action selections be equivalent to the risk-sensitive action selection of the central policy. Current MARL value factorization methods do not satisfy the RIGM principle for common risk metrics such as Value at Risk (VaR) or distorted risk measures. Therefore, we propose RiskQ to address this limitation. RiskQ models the joint return distribution by modeling its quantiles as weighted quantile mixtures of per-agent return distribution utilities. RiskQ satisfies the RIGM principle for the VaR and distorted risk metrics. Extensive experiments show that RiskQ obtains promising performance. The source code of RiskQ is available at https://github.com/xmu-rl-3dv/RiskQ.
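The sketch below illustrates why a weighted quantile mixture can satisfy RIGM for VaR. It assumes fixed non-negative mixture weights and tabulated per-agent quantiles. Under those assumptions the joint quantile at level $\tau$ is $\sum_i w_i q_i(\tau, a_i)$. Maximizing it over joint actions therefore decomposes into each agent maximizing its own quantile, so the decentralized and centralized VaR choices coincide. The table layout and function names are hypothetical.

```python
import itertools
from typing import Sequence, Tuple

# per_agent[i][a][t]: agent i's return quantile for action a at quantile index t
QuantileTables = Sequence[Sequence[Sequence[float]]]


def joint_quantile(per_agent: QuantileTables, weights: Sequence[float],
                   joint_action: Tuple[int, ...], t: int) -> float:
    """Joint return quantile modeled as a weighted mixture of per-agent quantiles."""
    return sum(w * per_agent[i][a][t]
               for i, (w, a) in enumerate(zip(weights, joint_action)))


def decentralized_var_actions(per_agent: QuantileTables, t: int) -> Tuple[int, ...]:
    """Each agent greedily maximizes its own VaR at quantile index t."""
    return tuple(max(range(len(q)), key=lambda a, q=q: q[a][t]) for q in per_agent)


def centralized_var_action(per_agent: QuantileTables, weights: Sequence[float],
                           t: int) -> Tuple[int, ...]:
    """Brute-force argmax of the joint VaR over every joint action."""
    joint_actions = itertools.product(*(range(len(q)) for q in per_agent))
    return max(joint_actions, key=lambda ja: joint_quantile(per_agent, weights, ja, t))
```

The equivalence relies on the weights being non-negative. A negative weight would reverse one agent's preference inside the sum and break the decomposition.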

LG · Feb 6, 2023 · Code
INCREASE: Inductive Graph Representation Learning for Spatio-Temporal Kriging

Chuanpan Zheng, Xiaoliang Fan, Cheng Wang et al.

Spatio-temporal kriging is an important problem in web and social applications, such as the Web or the Internet of Things, where things (e.g., sensors) connected into a web often come with spatial and temporal properties. It aims to infer knowledge for (the things at) unobserved locations using the data from (the things at) observed locations during a given time period of interest. This problem essentially requires inductive learning. Once trained, the model should be able to perform kriging for different locations, including newly given ones, without retraining. However, it is challenging to obtain accurate kriging results because of heterogeneous spatial relations and diverse temporal patterns. In this paper, we propose a novel inductive graph representation learning model for spatio-temporal kriging. We first encode heterogeneous spatial relations between the unobserved and observed locations by their spatial proximity, functional similarity, and transition probability. Based on each relation, we accurately aggregate the information of the most correlated observed locations to produce inductive representations for the unobserved locations, by jointly modeling their similarities and differences. Then, we design relation-aware gated recurrent unit (GRU) networks to adaptively capture the temporal correlations in the generated sequence representations for each relation. Finally, we propose a multi-relation attention mechanism to dynamically fuse the complex spatio-temporal information at different time steps from multiple relations to compute the kriging output. Experimental results on three real-world datasets show that our proposed model consistently outperforms state-of-the-art methods, and the advantage is more significant when there are fewer observed locations. Our code is available at https://github.com/zhengchuanpan/INCREASE.

CV · Oct 13, 2022
HSurf-Net: Normal Estimation for 3D Point Clouds by Learning Hyper Surfaces

Qing Li, Yu-Shen Liu, Jin-San Cheng et al. · tsinghua

We propose a novel normal estimation method called HSurf-Net, which can accurately predict normals from point clouds with noise and density variations. Previous methods focus on learning point weights to fit neighborhoods into a geometric surface approximated by a polynomial function with a predefined order, based on which normals are estimated. However, fitting surfaces explicitly from raw point clouds suffers from overfitting or underfitting caused by inappropriate polynomial orders and outliers, which significantly limits the performance of existing methods. To address these issues, we introduce hyper surface fitting to implicitly learn hyper surfaces, which are represented by multi-layer perceptron (MLP) layers that take point features as input and output surface patterns in a high-dimensional feature space. We introduce a novel space transformation module, which consists of a sequence of local aggregation layers and global shift layers, to learn an optimal feature space, and a relative position encoding module to effectively convert point clouds into the learned feature space. Our model learns hyper surfaces from the noiseless features and directly predicts normal vectors. We jointly optimize the MLP weights and module parameters in a data-driven manner so that the model adaptively finds the most suitable surface pattern for various points. Experimental results show that HSurf-Net achieves state-of-the-art performance on the synthetic shape dataset and on the real-world indoor and outdoor scene datasets. The code, data and pretrained models are publicly available.

CV · Mar 28, 2023
OpenInst: A Simple Query-Based Method for Open-World Instance Segmentation

Cheng Wang, Guoli Wang, Qian Zhang et al. · amazon-science

Open-world instance segmentation has recently gained significant popularity due to its importance in many real-world applications, such as autonomous driving, robot perception, and remote sensing. However, previous methods have either produced unsatisfactory results or relied on complex systems and paradigms. We wonder whether there is a simple way to obtain state-of-the-art results. Fortunately, we have identified two observations that help us achieve the best of both worlds: 1) query-based methods demonstrate superiority over dense proposal-based methods in open-world instance segmentation, and 2) learning localization cues is sufficient for open-world instance segmentation. Based on these observations, we propose a simple query-based method named OpenInst for open-world instance segmentation. OpenInst leverages advanced query-based methods like QueryInst and focuses on learning localization cues. Notably, OpenInst is an extremely simple and straightforward framework without any auxiliary modules or post-processing, yet it achieves state-of-the-art results on multiple benchmarks. Specifically, in the COCO$\to$UVO scenario, OpenInst achieves a mask AR of 53.3, outperforming the previous best methods by 2.0 AR with a simpler structure. We hope that OpenInst can serve as a solid baseline for future research in this area.

AR · Jun 4
Space-CIM: Enabling Compute-In-Memory Accelerators for Thermally-Constrained Space Platforms

Sohan Salahuddin Mugdho, Md. Shahedul Hasan, Cheng Wang

The rapid growth in compute demand from artificial intelligence (AI) has driven a massive surge in data center construction, precipitating an energy and sustainability crisis. Motivated by the abundant solar energy in outer space and the recent sharp reduction in space launch costs, orbital data centers are emerging as a potential pathway for the future scaling of AI compute infrastructure. While the cold background of vacuum seems appealing for cooling, computing systems operating in space have no convection and must ultimately rely on radiative cooling, which requires large-area radiators. Such limitations in thermal management pose a significant challenge for deploying standard liquid- or air-cooled computers in space. In this work, we investigate the impact of thermal constraints in space on both graphics processing units (GPUs) with high-bandwidth memory (HBM) and emerging compute-in-memory (CIM) accelerators. We develop a radiator-in-the-loop co-design methodology that directly links the permitted system TOPS (tera-operations per second) with the practical radiator cooling capacity in space. Our thermal simulations reveal that the separately located GPU die and HBMs create severe thermal hotspots under limited radiator capacity, necessitating GPU thermal throttling. In contrast, CIM accelerators exhibit a much more uniform heat distribution and consistently outperform GPUs in TOPS/W across a wide range of radiator budgets. We systematically evaluate the performance of CIM and GPUs across various AI workloads and demonstrate that CIM has a magnified advantage for deployment in space under realistic thermal constraints.
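The sketch below is a first-order estimate of the link between radiator size and permitted compute. It assumes an ideal flat radiator that rejects all system power by thermal radiation, $Q = \varepsilon \sigma A (T_r^4 - T_s^4)$ (Stefan-Boltzmann law). The sink is taken to be near the 3 K cosmic background, and absorbed solar or planetary flux is ignored. Under these assumptions the permitted throughput is simply efficiency (TOPS/W) times radiated power. This is why TOPS/W becomes the deciding metric once the radiator budget is fixed. It is not the paper's radiator-in-the-loop methodology or thermal model, and the function names are hypothetical.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def radiator_capacity_w(area_m2: float, emissivity: float,
                        t_radiator_k: float, t_sink_k: float = 3.0) -> float:
    """Net power (W) an ideal radiator rejects to a cold sink by radiation alone."""
    return emissivity * SIGMA * area_m2 * (t_radiator_k ** 4 - t_sink_k ** 4)


def permitted_tops(capacity_w: float, tops_per_watt: float) -> float:
    """Throughput allowed if all dissipated power must leave through the radiator."""
    return capacity_w * tops_per_watt


def required_area_m2(target_tops: float, tops_per_watt: float, emissivity: float,
                     t_radiator_k: float, t_sink_k: float = 3.0) -> float:
    """Radiator area needed to sustain target_tops at the given efficiency."""
    power_w = target_tops / tops_per_watt
    return power_w / (emissivity * SIGMA * (t_radiator_k ** 4 - t_sink_k ** 4))
```

For example, a 1 m² black radiator at 300 K facing a 0 K sink rejects about 459 W. Doubling an accelerator's TOPS/W doubles the permitted TOPS for that same radiator.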

SI · May 21
Fostering cultural change in research through innovative knowledge sharing, evaluation, and community engagement strategies

Junsuk Rho, Jinn-Kong Sheu, Andrew Forbes et al.

Scientific research needs a system that better values rigorous, reusable contributions. Although open knowledge and FAIR (findable, accessible, interoperable, and reusable) principles, along with coalitions and infrastructures, are accelerating reform, evaluation still often defaults to standardized metrics such as the h-index and journal impact factor. This misalignment still incentivizes quantity over quality, undermining integrity and reproducibility, and making it harder for communities to learn from and build on existing work. In this perspective, we bring together a global community of researchers, funding institutions, industrial partners, and publishers from 14 countries across 5 continents to advance ongoing debates on open science and research evaluation. Our contribution to research practice is twofold. First, we offer an integrative conceptual framework, an open knowledge system, that links knowledge production, validation, assessment, and reuse into a single ecosystem view. Second, we translate it into practical recommendations across key stakeholder roles (researchers, institutions/evaluators, funders, and publishers). By shifting attention from papers and bibliometrics toward reusable knowledge contributions and their validation, the framework highlights concrete levers for cultural change (what to share, when/how to validate, how to support reuse, and what to reward). It also offers a practical lens that stakeholders can use to diagnose misaligned incentives and to design reforms that make high-quality, cumulative contributions visible and valued.

NA · Aug 7, 2012
Convergence Analysis of a Second Order Convex Splitting Scheme for the Modified Phase Field Crystal Equation

Arvind Baskaran, John Lowengrub, Cheng Wang et al.

In this paper we provide a detailed convergence analysis for an unconditionally energy stable, second-order accurate convex splitting scheme for the Modified Phase Field Crystal equation, a generalized damped wave equation for which the usual Phase Field Crystal equation is a special degenerate case. The fully discrete, fully second-order finite difference scheme in question was derived in a recent work [2]. Introducing a new variable $\psi$, corresponding to the temporal derivative of the phase variable $\phi$, could reduce the accuracy of the formal consistency estimate because of the hyperbolic nature of the equation. A higher-order consistency analysis by asymptotic expansion is performed to overcome this difficulty. In turn, second-order convergence in both time and space is established in a discrete $L^\infty(0,T; H^3)$ norm.

NA · Aug 8, 2012
Energy Stable and Efficient Finite-Difference Nonlinear Multigrid Schemes for the Modified Phase Field Crystal Equation

Arvind Baskaran, Peng Zhou, Zhengzheng Hu et al.

In this paper we present two unconditionally energy stable finite difference schemes for the Modified Phase Field Crystal (MPFC) equation, a sixth-order nonlinear damped wave equation, of which the purely parabolic Phase Field Crystal (PFC) model can be viewed as a special case. The first is a convex splitting scheme based on an appropriate decomposition of the discrete energy; it is first order accurate in time and second order accurate in space. The second is a new, fully second-order scheme that also respects the convex splitting of the energy. Both schemes are nonlinear but may be formulated from the gradients of strictly convex, coercive functionals. Thus, both are uniquely solvable regardless of the time and space step sizes. The schemes are solved by efficient nonlinear multigrid methods. Numerical results are presented demonstrating the accuracy, energy stability, efficiency, and practical utility of the schemes. In particular, we show that our multigrid solvers enjoy optimal or nearly optimal complexity in the solution of the nonlinear schemes.
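The convex splitting idea can be seen on a scalar toy problem: the gradient flow of $E(\phi) = \phi^4/4 - \phi^2/2$. The energy is split as $E = E_c - E_e$, with convex parts $E_c = \phi^4/4$ (treated implicitly) and $E_e = \phi^2/2$ (treated explicitly). This gives the update $\phi^{n+1} + \Delta t\,(\phi^{n+1})^3 = \phi^n + \Delta t\,\phi^n$. Its left-hand side is strictly increasing, so the update is uniquely solvable. Convexity of $E_c$ and $E_e$ gives $E(\phi^{n+1}) - E(\phi^n) \le -(\phi^{n+1}-\phi^n)^2/\Delta t \le 0$ for every $\Delta t > 0$. This is only an ODE illustration of the principle, not the MPFC scheme, and bisection is used in place of a multigrid solver.

```python
def energy(phi: float) -> float:
    """Double-well energy E = phi^4/4 - phi^2/2 (minima at phi = +1 and -1)."""
    return 0.25 * phi ** 4 - 0.5 * phi ** 2


def convex_splitting_step(phi: float, dt: float, iters: int = 200) -> float:
    """Solve phi_new + dt*phi_new**3 = phi + dt*phi (convex part implicit, concave explicit)."""
    rhs = phi + dt * phi
    # For dt >= 0, |x + dt*x**3| >= |x|, so the unique root lies in [-|rhs|, |rhs|].
    lo, hi = -abs(rhs), abs(rhs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + dt * mid ** 3 < rhs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def explicit_euler_step(phi: float, dt: float) -> float:
    """Fully explicit update, shown for contrast; it is not energy stable for large dt."""
    return phi - dt * (phi ** 3 - phi)
```

Starting from $\phi = 2$ with $\Delta t = 10$, the convex splitting iterates decrease the energy monotonically and settle at $\phi = 1$. A single explicit step with $\Delta t = 1$ jumps to $\phi = -4$, where the energy has grown from 2 to 56.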

MA · Aug 2, 2022
Deep Reinforcement Learning for Multi-Agent Interaction

Ibrahim H. Ahmed, Cillian Brewitt, Ignacio Carlucho et al. · microsoft-research

The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.

NA · Jun 8, 2016
Convergence Analysis and Error Estimates for a Second Order Accurate Finite Element Method for the Cahn-Hilliard-Navier-Stokes System

Amanda E. Diegel, Cheng Wang, Xiaoming Wang et al.

In this paper, we present a novel second order in time mixed finite element scheme for the Cahn-Hilliard-Navier-Stokes equations with matched densities. The scheme combines a standard second order Crank-Nicolson method for the Navier-Stokes equations and a modification of the Crank-Nicolson method for the Cahn-Hilliard equation. In particular, a second order Adams-Bashforth extrapolation and a trapezoidal rule are included to help preserve the energy stability natural to the Cahn-Hilliard equation. We show that our scheme is unconditionally energy stable with respect to a modification of the continuous free energy of the PDE system. Specifically, the discrete phase variable is shown to be bounded in $\ell^\infty \left(0,T;L^\infty\right)$ and the discrete chemical potential bounded in $\ell^\infty \left(0,T;L^2\right)$, for any time and space step sizes, in two and three dimensions, and for any finite final time $T$. We subsequently prove that these variables, along with the fluid velocity, converge with optimal rates in the appropriate energy norms in both two and three dimensions.

NA · Dec 17, 2017
An energy stable fourth order finite difference scheme for the Cahn-Hilliard equation

Kelong Cheng, Wenqiang Feng, Cheng Wang et al.

In this paper we propose and analyze an energy stable numerical scheme for the Cahn-Hilliard equation, with second order accuracy in time and a fourth order finite difference approximation in space. In particular, the truncation error of the long stencil fourth order finite difference approximation, over a uniform numerical grid with a periodic boundary condition, is analyzed with the help of discrete Fourier analysis instead of the standard Taylor expansion. This in turn results in a reduced regularity requirement for the test function. In the temporal approximation, we apply a second order BDF stencil, combined with a second order extrapolation formula applied to the concave diffusion term, as well as a second order artificial Douglas-Dupont regularization term, for the sake of energy stability. As a result, unique solvability and energy stability are established for the proposed numerical scheme, and an optimal rate convergence analysis is derived in the $\ell^\infty (0,T; \ell^2) \cap \ell^2 (0,T; H_h^2)$ norm. A few numerical experiments are presented, which confirm the robustness and accuracy of the proposed scheme.
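The sketch below illustrates the fourth-order spatial approximation with the standard five-point formula on a uniform periodic grid:

$D_h^2 u_i = (-u_{i+2} + 16u_{i+1} - 30u_i + 16u_{i-1} - u_{i-2})/(12h^2)$

We assume this is the stencil meant by "long stencil". Applied to a Fourier mode $e^{ikx}$, its symbol is $(-2\cos 2kh + 32\cos kh - 30)/(12h^2) = -k^2 + k^6h^4/90 + O(h^6)$. This is the kind of statement the discrete Fourier analysis works with. For $\sin x$, the error should therefore shrink by a factor of about 16 when $h$ is halved.

```python
import math
from typing import List, Sequence


def d2_fourth_order_periodic(u: Sequence[float], h: float) -> List[float]:
    """Five-point fourth-order approximation of u'' on a uniform periodic grid."""
    n = len(u)
    return [(-u[(i + 2) % n] + 16 * u[(i + 1) % n] - 30 * u[i]
             + 16 * u[(i - 1) % n] - u[(i - 2) % n]) / (12 * h * h)
            for i in range(n)]


def symbol(k: int, h: float) -> float:
    """Eigenvalue of the stencil on the Fourier mode exp(i k x); the exact value is -k**2."""
    return (-2 * math.cos(2 * k * h) + 32 * math.cos(k * h) - 30) / (12 * h * h)


def max_error(n: int) -> float:
    """Max-norm error for u = sin(x) on [0, 2*pi), where the exact u'' is -sin(x)."""
    h = 2 * math.pi / n
    x = [i * h for i in range(n)]
    approx = d2_fourth_order_periodic([math.sin(xi) for xi in x], h)
    return max(abs(a + math.sin(xi)) for a, xi in zip(approx, x))
```

Python's `%` operator returns a non-negative index for negative offsets, which implements the periodic wrap-around without ghost points.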

NA · Oct 21, 2016
Convergence Analysis for Second Order Accurate Convex Splitting Schemes for the Periodic Nonlocal Allen-Cahn and Cahn-Hilliard Equations

Zhen Guan, John Lowengrub, Cheng Wang

In this paper we provide a detailed convergence analysis for fully discrete second order (in both time and space) numerical schemes for the nonlocal Allen-Cahn (nAC) and nonlocal Cahn-Hilliard (nCH) equations. The unconditional unique solvability and energy stability ensure $\ell^4$ stability. The convergence analysis for the nAC equation follows the standard procedure of consistency and stability estimates for the numerical error function. For the nCH equation, due to the complicated form of the nonlinear term, a careful expansion of its discrete gradient is undertaken, and an $H^{-1}$ inner product estimate of this nonlinear numerical error is derived to establish convergence. In addition, an a priori $W^{1,\infty}$ bound of the numerical solution at the discrete level is needed in the error estimate. Such a bound can be obtained by performing a higher order consistency analysis using asymptotic expansions for the numerical solution. Following the technique originally proposed by Strang (e.g., 1964), instead of the standard comparison between the exact and numerical solutions, an error estimate between the numerical solution and the constructed approximate solution yields an $O( s^3 + h^4)$ convergence in the $\ell^\infty (0, T; \ell^2)$ norm, which leads to the necessary bound under a standard constraint $s \le C h$. We also prove convergence of the scheme in the maximum norm under the same constraint.

CV · Apr 13 · Code
LEADER: Learning Reliable Local-to-Global Correspondences for LiDAR Relocalization

Jianshi Wu, Minghang Zhu, Dunqiang Liu et al.

LiDAR relocalization has attracted increasing attention as it can deliver accurate 6-DoF pose estimation in complex 3D environments. Recent learning-based regression methods offer efficient solutions by directly predicting global poses without the need for explicit map storage. However, these methods often struggle in challenging scenes because they treat all predicted points equally, which makes them vulnerable to noise and outliers. In this paper, we propose LEADER, a robust LiDAR-based relocalization framework enhanced by a simple yet effective geometric encoder. Specifically, we first present a Robust Projection-based Geometric Encoder architecture that captures multi-scale geometric features to enhance the descriptiveness of the geometric representation. A Truncated Relative Reliability loss is then formulated to model point-wise ambiguity and mitigate the influence of unreliable predictions. Extensive experiments on the Oxford RobotCar and NCLT datasets demonstrate that LEADER outperforms state-of-the-art methods, achieving 24.1% and 73.9% relative reductions in position error over existing techniques, respectively. The source code is released at https://github.com/JiansW/LEADER.

NA · Mar 8, 2019
A third order exponential time differencing numerical scheme for no-slope-selection epitaxial thin film model with energy stability

Kelong Cheng, Zhonghua Qiao, Cheng Wang

In this paper we propose and analyze a (temporally) third order accurate exponential time differencing (ETD) numerical scheme for the no-slope-selection (NSS) equation of the epitaxial thin film growth model, with Fourier pseudo-spectral discretization in space. A linear splitting is applied to the physical model, and an ETD-based multistep approximation is used for time integration of the corresponding equation. In addition, a third order accurate Douglas-Dupont regularization term, in the form of $-A \Delta t^2 \phi_0(L_N) \Delta_N^2 (u^{n+1} - u^n)$, is added to the numerical scheme. A careful Fourier eigenvalue analysis yields energy stability in a modified form, and a theoretical justification of the coefficient $A$ becomes available. As a result of this energy stability analysis, a uniform-in-time bound of the numerical energy is obtained. Moreover, the optimal rate convergence analysis and error estimate are derived in detail, in the $\ell^\infty (0,T; H_h^1) \cap \ell^2 (0,T; H_h^3)$ norm, with the help of a careful eigenvalue bound estimate combined with the nonlinear analysis for the NSS model. This convergence estimate is the first such result for a third order accurate scheme for a gradient flow. Some numerical simulation results are presented to demonstrate the efficiency of the numerical scheme and the third order convergence. The long time simulation results for $\varepsilon=0.02$ (up to $T=3 \times 10^5$) indicate a logarithmic law for the energy decay, as well as power laws for the growth of the surface roughness and the mound width. In particular, the power indices for the growth of the surface roughness and the mound width produced by the third order numerical scheme are more accurate than those produced by certain second order energy stable schemes in the existing literature.

NA · Nov 9, 2016
A Second Order Energy Stable Scheme for the Cahn-Hilliard-Hele-Shaw Equations

Wenbin Chen, Wenqiang Feng, Yuan Liu et al.

We present a second-order-in-time finite difference scheme for the Cahn-Hilliard-Hele-Shaw equations. This numerical method is uniquely solvable and unconditionally energy stable. At each time step, this scheme leads to a system of nonlinear equations that can be efficiently solved by a nonlinear multigrid solver. Owing to the energy stability, we derive an $\ell^2 (0,T; H_h^3)$ stability of the numerical scheme. To overcome the difficulty associated with the convection term $\nabla \cdot (ϕ\boldsymbol{u})$, we perform an $\ell^\infty (0,T; H_h^1)$ error estimate instead of the classical $\ell^\infty (0,T; \ell^2)$ one to obtain the optimal rate convergence analysis. In addition, various numerical simulations are carried out, which demonstrate the accuracy and efficiency of the proposed numerical scheme.

CV · Nov 30, 2023 · Code
E2PNet: Event to Point Cloud Registration with Spatio-Temporal Representation Learning

Xiuhong Lin, Changjie Qiu, Zhipeng Cai et al.

Event cameras have emerged as a promising vision sensor in recent years due to their unparalleled temporal resolution and dynamic range. While registration of 2D RGB images to 3D point clouds is a long-standing problem in computer vision, no prior work has studied 2D-3D registration for event cameras. To this end, we propose E2PNet, the first learning-based method for event-to-point cloud registration. The core of E2PNet is a novel feature representation network called Event-Points-to-Tensor (EP2T), which encodes event data into a 2D grid-shaped feature tensor. This grid-shaped feature allows mature RGB-based frameworks to be used directly for event-to-point cloud registration, without changing hyper-parameters or the training procedure. EP2T treats the event input as spatio-temporal point clouds. Unlike standard 3D learning architectures that treat all dimensions of point clouds equally, the novel sampling and information aggregation modules in EP2T are designed to handle the inhomogeneity of the spatial and temporal dimensions. Experiments on the MVSEC and VECtor datasets demonstrate the superiority of E2PNet over hand-crafted and other learning-based methods. Compared to RGB-based registration, E2PNet is more robust to extreme illumination or fast motion due to the use of event data. Beyond 2D-3D registration, we also show the potential of EP2T for other vision tasks such as flow estimation, event-to-image reconstruction and object recognition. The source code can be found at: https://github.com/Xmu-qcj/E2PNet.

CV · Mar 17 · Code
AW-MoE: All-Weather Mixture of Experts for Robust Multi-Modal 3D Object Detection

Hongwei Lin, Xun Huang, Chenglu Wen et al.

Robust 3D object detection under adverse weather conditions is crucial for autonomous driving. However, most existing methods simply combine samples from all weather conditions for training while overlooking data distribution discrepancies across different weather scenarios, leading to performance conflicts. To address this issue, we introduce AW-MoE, a framework that integrates Mixture of Experts (MoE) into weather-robust multi-modal 3D object detection. AW-MoE incorporates Image-guided Weather-aware Routing (IWR), which leverages the superior discriminability of image features across weather conditions and their invariance to scene variations for precise weather classification. Based on this classification, IWR selects the top-K most relevant Weather-Specific Experts (WSE), which handle data discrepancies and aim to ensure optimal detection under all weather conditions. Additionally, we propose a Unified Dual-Modal Augmentation (UDMA) for synchronous LiDAR and 4D Radar dual-modal data augmentation that preserves the realism of scenes. Extensive experiments on the real-world dataset demonstrate that AW-MoE achieves an improvement of about 15% in adverse-weather performance over state-of-the-art methods, while incurring negligible inference overhead. Moreover, integrating AW-MoE into established baseline detectors yields performance improvements surpassing current state-of-the-art methods. These results show the effectiveness and strong scalability of AW-MoE. We will release the code publicly at https://github.com/windlinsherlock/AW-MoE.
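The sketch below shows top-K expert routing under several assumptions:

- Weather logits come from an image-based classifier, which is not modeled here.
- The logits are converted to probabilities with a softmax, and the K most probable weather-specific experts are selected.
- The selected probabilities are renormalized and used to blend the experts' outputs.

The experts are hypothetical scalar-valued callables standing in for detection heads. AW-MoE's real routing inputs, expert architecture and fusion rule may differ.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (the max logit is subtracted before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(weather_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert index, renormalized weight) for the k most probable weathers."""
    probs = softmax(weather_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


def mixture_output(x, experts: Sequence[Callable[[object], float]],
                   routes: List[Tuple[int, float]]) -> float:
    """Blend only the selected experts; unselected experts are never evaluated."""
    return sum(w * experts[i](x) for i, w in routes)
```

Only K experts run per input, so the extra cost over a single detector scales with K rather than with the number of weather conditions. This is one plausible reading of the "negligible inference overhead" claim.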

CVMay 12
SOAR: Regression-based LiDAR Relocalization for UAVs

Hengyu Mu, Jianshi Wu, Yuxin Guo et al.

Regression-based LiDAR relocalization has recently emerged as a promising solution for high-precision positioning in GNSS-denied environments. However, these methods are primarily tailored to autonomous driving and exhibit significantly degraded accuracy in unmanned aerial vehicle (UAV) scenarios due to arbitrary pose variations and irregular flight paths. In this paper, we propose SOAR, a regression-based LiDAR relocalization framework for UAVs. Specifically, we introduce a locality-preserving sliding window attention module with locally invariant positional encoding to capture discriminative geometric structures that are robust to viewpoint changes. A coordinate-independent feature initialization module is further designed to eliminate sensitivity to global transformations. Furthermore, most existing UAV datasets are ill-suited for evaluating LiDAR relocalization in real-world settings because they lack synchronized LiDAR scans, accurate 6-DoF poses, or multiple traversals. We therefore construct a large-scale UAV LiDAR localization dataset with 4 scenes and 13 irregular paths exhibiting rotation and altitude variations, providing a more realistic benchmark for UAVs. Extensive experiments demonstrate that our method achieves state-of-the-art performance on UAVLoc, improving the localization success rate by 40% and reducing the mean error by more than 10 m. Our code and dataset will be released soon.

NANov 24, 2016
Preconditioned Steepest Descent Methods for some Nonlinear Elliptic Equations Involving p-Laplacian Terms

Wenqiang Feng, Abner J. Salgado, Cheng Wang et al.

We describe and analyze preconditioned steepest descent (PSD) solvers for fourth- and sixth-order nonlinear elliptic equations that include p-Laplacian terms on periodic domains in 2 and 3 dimensions. The highest- and lowest-order terms of the equations are constant-coefficient, positive linear operators, which suggests a natural preconditioning strategy. Such nonlinear elliptic equations often arise from the time discretization of parabolic equations that model various biological and physical phenomena, in particular liquid crystals, thin film epitaxial growth, and phase transformations. The analyses of the schemes involve the characterization of the strictly convex energies associated with the equations. We first give a general framework for PSD in generic Hilbert spaces. Under certain reasonable assumptions on the linear preconditioner, a geometric convergence rate is shown for the nonlinear PSD iteration. We then apply the general theory to the fourth- and sixth-order problems of interest, making use of Sobolev embedding and regularity results to confirm the appropriateness of our preconditioners for the regularized p-Laplacian problems. Our results include a sharper theoretical convergence result for p-Laplacian systems than what may be found in existing works. We demonstrate rigorously how to apply the theory in the finite-dimensional setting using finite difference discretization methods. Numerical simulations for some important physical application problems, including thin film epitaxy with slope selection and the square phase field crystal model, are carried out to verify the efficiency of the scheme.
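The PSD idea (precondition with the constant-coefficient linear part, then do an exact line search on the convex energy) can be shown on a toy 1D periodic problem. This is a minimal sketch, not the paper's solver: the energy E(u) = 1/2 u·Au + 1/4 Σ u_i⁴ - f·u with A = -Δ_h + I is an assumption, and its convex quartic term only stands in for the nonlinear (p-Laplacian) part. A plays the role of the highest- plus lowest-order linear operators and is used as the preconditioner.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def periodic_laplacian(n: int, h: float) -> Matrix:
    """Dense matrix of -Delta_h: second-order central differences, periodic (n >= 3)."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 2.0 / h**2
        L[i][(i - 1) % n] -= 1.0 / h**2
        L[i][(i + 1) % n] -= 1.0 / h**2
    return L


def matvec(A: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def solve(A: Matrix, b: Vector) -> Vector:
    """Gaussian elimination with partial pivoting (fine for small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def psd(f: Vector, h: float, tol: float = 1e-10, max_iter: int = 200) -> Tuple[Vector, int]:
    """Minimize E(u) = 1/2 u.Au + 1/4 sum(u^4) - f.u with A = -Delta_h + I.

    The preconditioner is the linear operator A itself.
    """
    n = len(f)
    L = periodic_laplacian(n, h)
    A = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    u = [0.0] * n
    for it in range(max_iter):
        Au = matvec(A, u)
        g = [Au[i] + u[i] ** 3 - f[i] for i in range(n)]  # gradient of E
        if max(abs(gi) for gi in g) < tol:
            return u, it
        d = solve(A, [-gi for gi in g])  # preconditioned search direction
        dAd = dot(d, matvec(A, d))
        c = dot(d, Au) - dot(f, d)
        # Exact line search: Newton iterations on phi'(a) = grad E(u + a d) . d
        a = 0.0
        for _ in range(50):
            w = [u[i] + a * d[i] for i in range(n)]
            dphi = c + a * dAd + sum(w[i] ** 3 * d[i] for i in range(n))
            d2phi = dAd + sum(3.0 * w[i] ** 2 * d[i] ** 2 for i in range(n))
            step = dphi / d2phi
            a -= step
            if abs(step) < 1e-14:
                break
        u = [u[i] + a * d[i] for i in range(n)]
    return u, max_iter
```

Because the preconditioned Hessian is I + A⁻¹·3diag(u²), whose spectrum stays close to 1 when the nonlinearity is mild, the iteration count stays small, which mirrors the geometric convergence the abstract describes.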

NANov 19, 2016
Convergence Analysis and Numerical Implementation of a Second Order Numerical Scheme for the Three-Dimensional Phase Field Crystal Equation

Lixiu Dong, Wenqiang Feng, Cheng Wang et al.

In this paper we analyze and implement a second-order-in-time numerical scheme for the three-dimensional phase field crystal (PFC) equation. The numerical scheme was proposed in [46], where its unique solvability and unconditional energy stability were established; however, its convergence analysis has remained open. We present a detailed convergence analysis in this article, in which the maximum norm estimate of the numerical solution over grid points plays an essential role. Moreover, we outline a detailed multigrid method to solve the highly nonlinear numerical scheme over a cubic domain and present various three-dimensional numerical results, including a numerical convergence test, a complexity test of the multigrid solver, and a polycrystal growth simulation.

CLMay 24Code
STREAM: A Data-Centric Framework for Mining High-Value Task-Oriented Dialogues from Streaming Media

Liang Xue, Haoyu Liu, Cheng Wang et al.

Large language models for vertical domains are bottlenecked by the scarcity of complex, domain-specific task-oriented dialogues. Existing data acquisition pipelines face a persistent trilemma: expert annotation is expensive, real-world service conversations are constrained by privacy and commercial restrictions, and static corpora quickly become temporally stale. We propose Stream, a data-centric framework that leverages publicly available streaming media (live streams and short videos) to synthesize high-value service dialogues at scale. Stream mines authentic interaction signals from noisy streams and synthesizes conversations by integrating role-grounded persona construction with Conversational Blueprint construction; it further adopts retrieval-augmented generation (RAG) to support knowledge-aware responses. Based on Stream, we release StreamDial, a large-scale multi-domain dataset covering the Automotive, Restaurant, and Hotel domains. StreamDial contains 87,498 dialogue sessions and 1,497,320 turns in total, with an average of 17.11 turns per session and a comparable scale across domains. Each session is organized as a structured quadruplet $\langle P_u, P_a, B, H \rangle$ that pairs the dialogue history with explicit user/agent personas and a Conversational Blueprint, capturing realistic service behaviors such as requirement mining, constraint conflicts, negotiation, and recovery. Evaluations with automatic judges and downstream tasks show that StreamDial improves intrinsic dialogue quality over strong baselines and that models trained with StreamDial improve Dialogue State Tracking across backbones; we further report a completed human-evaluation set and encouraging multilingual transfer on Qwen3-8B under a controlled training budget. The data is released at https://github.com/hitxueliang/DialogDataSetBySTREAM.
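The quadruplet $\langle P_u, P_a, B, H \rangle$ maps naturally onto a small record type. This is a minimal sketch: the class and field names (`DialogueSession`, `user_persona`, `blueprint`, and so on) and their types are hypothetical and do not reflect StreamDial's actual file schema. As an arithmetic check, the reported average follows from 1,497,320 / 87,498 ≈ 17.11 turns per session.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Turn:
    speaker: str  # hypothetical convention: "user" or "agent"
    text: str


@dataclass
class DialogueSession:
    """Hypothetical container for one <P_u, P_a, B, H> quadruplet."""
    user_persona: Dict[str, str]   # P_u
    agent_persona: Dict[str, str]  # P_a
    blueprint: List[str]           # B: e.g. ordered stages such as requirement mining, negotiation
    history: List[Turn] = field(default_factory=list)  # H: the dialogue turns


def average_turns(sessions: List[DialogueSession]) -> float:
    """Mean number of turns per session, the statistic reported as 17.11."""
    return sum(len(s.history) for s in sessions) / len(sessions)
```

Keeping the personas and blueprint alongside the history, rather than flattening everything into text, is what lets downstream tasks condition on the explicit roles and plan.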

ARJun 1
CRAM-ER: Error-Resilient Spintronic Computational Random Access Memory for Scalable In-Memory Computation

Sohan Salahuddin Mugdho, Md. Shahedul Hasan, Brahmdutta Dixit et al.

Deep neural networks (DNNs) have achieved state-of-the-art performance across diverse domains. However, typical von Neumann compute paradigms face severe memory bottlenecks. Emerging near-memory and compute-in-memory approaches alleviate this but incur significant peripheral overhead. Computational Random Access Memory (CRAM) based on MRAM enables in-situ logic without peripheral overhead, offering a dense, energy-efficient solution. However, probabilistic MRAM switching induces gate-level errors that limit the scalability and reliability of CRAM for accelerating DNNs. Moreover, the large number of sequential MRAM writes severely constrains CRAM throughput. To address these challenges, we propose an error-resilient CRAM (CRAM-ER) architecture for scalable in-memory matrix-vector multiplications (MVMs). Our error-aware hardware-software co-design framework leverages a hybrid spintronic-CRAM + CMOS adder-tree architecture to mitigate the impact of device-level errors, demonstrating MVM functionality with high area and energy efficiency. We further develop error-aware model fine-tuning and fine-grained error correction for enhanced error resilience. Evaluations of the CMOS+spintronic hybrid architecture on DNN benchmarks show near-lossless accuracy while reducing CRAM latency by up to two orders of magnitude, outperforming CPU/GPU with high-bandwidth DRAM in both energy efficiency and energy-delay product.

CVSep 3, 2024Code
When 3D Partial Points Meets SAM: Tooth Point Cloud Segmentation with Sparse Labels

Yifan Liu, Wuyang Li, Cheng Wang et al.

Tooth point cloud segmentation is a fundamental task in many orthodontic applications. Current research mainly focuses on fully supervised learning, which demands expensive and tedious manual point-wise annotation. Although recent weakly supervised alternatives use weak labels for 3D segmentation and achieve promising results, they tend to fail when the labels are extremely sparse. Inspired by the powerful promptable segmentation capability of the Segment Anything Model (SAM), we propose a framework named SAMTooth that leverages this capability to complement extremely sparse supervision. To automatically generate appropriate point prompts for SAM, we propose a novel Confidence-aware Prompt Generation strategy, in which coarse category predictions are aggregated with confidence-aware filtering. Furthermore, to fully exploit the structural and shape cues in SAM's outputs for assisting 3D feature learning, we propose Mask-guided Representation Learning, which re-projects the tooth masks generated by SAM into 3D space and constrains the points of different teeth to have distinct representations. To demonstrate the effectiveness of the framework, we conduct experiments on a public dataset and find that, with only 0.1% annotations (one point per tooth), our method surpasses recent weakly supervised methods by a large margin, and its performance is even comparable to that of recent fully supervised methods, showcasing the significant potential of applying SAM to 3D perception tasks with sparse labels. Code is available at https://github.com/CUHK-AIM-Group/SAMTooth.
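One way to picture "coarse category predictions aggregated with confidence-aware filtering" is to keep only confidently labeled points per class and collapse them into a single 2D prompt. This is a minimal sketch under explicit assumptions: per-point class probabilities and 2D projected coordinates are given, the threshold `tau` and the confidence-weighted centroid are hypothetical choices, and SAMTooth's actual aggregation rule may differ.

```python
from typing import Dict, List, Sequence, Tuple


def generate_prompts(probs: Sequence[Sequence[float]],
                     coords_2d: Sequence[Tuple[float, float]],
                     tau: float) -> Dict[int, Tuple[float, float]]:
    """Return one 2D point prompt per predicted class (hypothetical rule).

    probs[i][c]: coarse probability that point i belongs to class c.
    coords_2d[i]: projection of point i onto the image plane given to SAM.
    Points whose top-class confidence is below tau are discarded.
    """
    sums: Dict[int, List[float]] = {}  # class -> [sum w*x, sum w*y, sum w]
    for p, (x, y) in zip(probs, coords_2d):
        c = max(range(len(p)), key=lambda k: p[k])
        conf = p[c]
        if conf < tau:
            continue  # confidence-aware filtering
        acc = sums.setdefault(c, [0.0, 0.0, 0.0])
        acc[0] += conf * x
        acc[1] += conf * y
        acc[2] += conf
    # Confidence-weighted centroid of the surviving points of each class.
    return {c: (sx / w, sy / w) for c, (sx, sy, w) in sums.items()}
```

Filtering before averaging keeps a single mislabeled, low-confidence point far from the tooth (for example at (100, 100)) from dragging the prompt off the target.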

CVMar 26Code
V2U4Real: A Real-world Large-scale Dataset for Vehicle-to-UAV Cooperative Perception

Weijia Li, Haoen Xiang, Tianxu Wang et al.

Modern autonomous vehicle perception systems are often constrained by occlusions, blind spots, and limited sensing range. While existing cooperative perception paradigms, such as Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I), have demonstrated their effectiveness in mitigating these challenges, they remain limited to ground-level collaboration and cannot fully address large-scale occlusions or long-range perception in complex environments. To advance research in cross-view cooperative perception, we present V2U4Real, the first large-scale real-world multi-modal dataset for Vehicle-to-UAV (V2U) cooperative object perception. V2U4Real is collected by a ground vehicle and a UAV equipped with multi-view LiDARs and RGB cameras. The dataset covers urban streets, university campuses, and rural roads under diverse traffic scenarios, comprising over 56K LiDAR frames, 56K multi-view camera images, and 700K annotated 3D bounding boxes across four classes. To support a wide range of research tasks, we establish benchmarks for single-agent 3D object detection, cooperative 3D object detection, and object tracking. Comprehensive evaluations of several state-of-the-art models demonstrate the effectiveness of V2U cooperation in enhancing perception robustness and long-range awareness. The V2U4Real dataset and codebase are available at https://github.com/VjiaLi/V2U4Real.

CVApr 9, 2023Code
DSMNet: Deep High-precision 3D Surface Modeling from Sparse Point Cloud Frames

Changjie Qiu, Zhiyong Wang, Xiuhong Lin et al.

Existing point cloud modeling datasets primarily express modeling precision through pose or trajectory precision rather than through the point cloud modeling effect itself. To address this limitation, we first construct a LiDAR system with an optical stage and use it to build HPMB, a High-Precision, Multi-Beam, real-world dataset. Second, we propose a modeling evaluation method based on HPMB for object-level modeling. In addition, existing point cloud modeling methods tend to generate continuous skeletons of the global environment and therefore pay little attention to the shape of complex objects. To tackle this challenge, we propose a novel learning-based joint framework, DSMNet, for high-precision 3D surface modeling from sparse point cloud frames. DSMNet comprises density-aware Point Cloud Registration (PCR) and geometry-aware Point Cloud Sampling (PCS) to effectively learn the implicit structural features of sparse point clouds. Extensive experiments demonstrate that DSMNet outperforms state-of-the-art methods in PCS and PCR on the Multi-View Partial Point Cloud (MVP) database. Furthermore, experiments on the open-source KITTI dataset and our proposed HPMB dataset show that DSMNet generalizes as a post-processing step for Simultaneous Localization And Mapping (SLAM), improving modeling precision in environments with sparse point clouds.

NAMay 22, 2011
Long time stability of a classical efficient scheme for two dimensional Navier-Stokes equations

Sigal Gottlieb, Florentina Tone, Cheng Wang et al.

We prove that a popular classical implicit-explicit scheme for the 2D incompressible Navier-Stokes equations (NSE), which treats the viscous term implicitly and the nonlinear advection term explicitly, is long-time stable with periodic boundary conditions, provided that the time step is sufficiently small. The long-time stability in the $L^2$ and $H^1$ norms further leads to the convergence of the global attractors and invariant measures of the scheme to those of the NSE itself as the time step vanishes. Both semi-discrete-in-time and fully discrete schemes with either Galerkin Fourier spectral or collocation Fourier spectral methods are considered.
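The implicit-viscous / explicit-advection splitting is easiest to see in Fourier space, where the implicit solve is a per-mode division. This is a minimal sketch that swaps the 2D Navier-Stokes equations for the 1D viscous Burgers equation u_t + (u²/2)_x = ν u_xx on a 2π-periodic grid (an assumption for brevity), uses a collocation (pseudo-spectral) treatment of the nonlinear term, and replaces the FFT with a naive DFT so it runs with the standard library only.

```python
import cmath
import math
from typing import List


def dft(u: List[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(u)
    return [sum(u[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]


def idft(U: List[complex]) -> List[complex]:
    n = len(U)
    return [sum(U[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]


def wavenumbers(n: int) -> List[int]:
    """Integer wavenumbers 0, 1, ..., n/2 - 1, -n/2, ..., -1 (n even)."""
    return [k if k < n // 2 else k - n for k in range(n)]


def imex_step(u_hat: List[complex], nu: float, dt: float) -> List[complex]:
    """One semi-implicit step: advection explicit, viscosity implicit."""
    n = len(u_hat)
    ks = wavenumbers(n)
    u = [z.real for z in idft(u_hat)]                  # values at collocation points
    flux_hat = dft([complex(0.5 * v * v) for v in u])  # nonlinear flux u^2/2 on the grid
    # Explicit advection N = d/dx (u^2/2); the Nyquist-mode derivative is set to 0.
    n_hat = [0j if k == -(n // 2) else 1j * k * w for k, w in zip(ks, flux_hat)]
    # Implicit diffusion: (1 + dt*nu*k^2) u_hat^{n+1} = u_hat^n - dt * N_hat^n
    return [(uh - dt * nh) / (1.0 + dt * nu * k * k)
            for uh, nh, k in zip(u_hat, n_hat, ks)]
```

The implicit factor 1/(1 + dt·ν·k²) damps high wavenumbers without a stiff time-step restriction, while the explicit advection term is what makes a small-time-step condition necessary, consistent with the "sufficiently small" time step in the stability result.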

SYJun 26, 2018
Completely Distributed Guaranteed-performance Consensualization for High-order Multiagent Systems with Switching Topologies

Jianxiang Xi, Cheng Wang, Hao Liu et al.

Guaranteed-performance consensualization for high-order linear and nonlinear multiagent systems with switching topologies is realized in a completely distributed manner, in the sense that the consensus design criteria are independent of interaction topologies and switching motions. This paper first proposes an adaptive consensus protocol with guaranteed-performance constraints and switching topologies, in which interaction weights among neighboring agents are adaptively adjusted and state errors among all agents can be regulated. Then, a new translation-adaptive strategy is shown to realize completely distributed guaranteed-performance consensus control, and an adaptive guaranteed-performance consensualization criterion is given on the basis of the Riccati inequality. Furthermore, an approach to regulate the consensus control gain and the guaranteed-performance cost is proposed in terms of linear matrix inequalities. Moreover, the main conclusions for linear multiagent systems are extended to Lipschitz nonlinear cases. Finally, two numerical examples are provided to demonstrate the theoretical results.
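The core mechanism of adaptively adjusted interaction weights can be illustrated in its simplest setting. This is a minimal sketch, not the paper's protocol: agents are first-order scalars, the topology is fixed, the adaptation law dw_ij/dt = κ(x_j - x_i)² is a common textbook choice assumed here, and the high-order dynamics, Riccati-based gains, switching, and guaranteed-performance cost are all omitted. Forward Euler is used for integration.

```python
from typing import List, Tuple


def adaptive_consensus(x0: List[float], adj: List[List[int]], kappa: float,
                       dt: float, steps: int) -> Tuple[List[float], List[List[float]]]:
    """Simulate x_i' = sum_j w_ij (x_j - x_i) with w_ij' = kappa * (x_j - x_i)^2.

    adj must be symmetric; weights start at 1 on every edge and only grow,
    so no global knowledge of the topology is needed to pick a gain.
    """
    n = len(x0)
    x = list(x0)
    w = [[1.0 if adj[i][j] else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        dx = [sum(w[i][j] * (x[j] - x[i]) for j in range(n)) for i in range(n)]
        for i in range(n):
            for j in range(n):
                if adj[i][j]:
                    w[i][j] += dt * kappa * (x[j] - x[i]) ** 2  # symmetric update
        x = [x[i] + dt * dx[i] for i in range(n)]
    return x, w
```

Because the weights stay symmetric, the agents' average is preserved, so on a connected graph all states converge to the initial mean.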

CVNov 22, 2022
Transformation-Equivariant 3D Object Detection for Autonomous Driving

Hai Wu, Chenglu Wen, Wei Li et al.

3D object detection has received increasing attention in autonomous driving. Objects in 3D scenes are distributed with diverse orientations. Ordinary detectors do not explicitly model variations due to rotation and reflection transformations; consequently, large networks and extensive data augmentation are required for robust detection. Recent equivariant networks explicitly model these transformation variations by applying shared networks to multiple transformed point clouds, showing great potential in object geometry modeling. However, such networks are difficult to apply to 3D object detection in autonomous driving because of their large computational cost and slow inference speed. In this work, we present TED, an efficient Transformation-Equivariant 3D Detector that overcomes these computation-cost and speed issues. TED first applies a sparse convolution backbone to extract multi-channel transformation-equivariant voxel features, and then aligns and aggregates these equivariant features into lightweight and compact representations for high-performance 3D object detection. On the highly competitive KITTI 3D car detection leaderboard, TED ranked 1st among all submissions with competitive efficiency.
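The principle of "applying shared networks to multiple transformed point clouds" can be shown with a toy network and the four-element group of 90° rotations in the plane. This is a minimal sketch, not TED: the "network" is a hypothetical scalar function (max x-coordinate), there are no voxels or reflections, and max-pooling over the group channel stands in for TED's alignment and aggregation. It shows that the stacked per-rotation features shift cyclically when the input is rotated (equivariance) and that pooling over them removes the dependence on orientation (invariance).

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def rotate90(p: Point, times: int) -> Point:
    """Rotate a 2D point by times * 90 degrees counter-clockwise (exact in floats)."""
    x, y = p
    for _ in range(times % 4):
        x, y = -y, x
    return (x, y)


def toy_net(points: Sequence[Point]) -> float:
    """Hypothetical stand-in network that is NOT rotation invariant: max x-coordinate."""
    return max(x for x, _ in points)


def transformed_features(points: Sequence[Point],
                         net: Callable[[Sequence[Point]], float]) -> List[float]:
    """Apply one shared network to every rotated copy of the input.

    Rotating the input by 90 degrees cyclically shifts this list, i.e. the
    stacked features are equivariant to the rotation group.
    """
    return [net([rotate90(p, r) for p in points]) for r in range(4)]


def invariant_feature(points: Sequence[Point],
                      net: Callable[[Sequence[Point]], float]) -> float:
    # Max over the group channel is unchanged by cyclic shifts, giving invariance.
    return max(transformed_features(points, net))
```

Running the shared network once per transformed copy is exactly the source of the cost the abstract mentions; the sketch needs four passes for only four rotations.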

ROMay 31
Tether-Aware Dynamic Collision Avoidance for USV-HROV Systems

Yang Gu, Ziyang Hong, Xuanlin Chen et al.

Heterogeneous marine robotic systems composed of an unmanned surface vehicle (USV) and a hybrid remotely operated vehicle (HROV) have shown great potential for subsea cable inspection. In such missions, the USV tracks the HROV at the surface while supplying power and communication through an umbilical tether. However, dynamic collision avoidance for the USV during HROV tracking is challenging because the submerged tether may scrape against passing vessels, while evasive maneuvers can enlarge the USV-HROV separation, thereby increasing the likelihood of tether tautness and compromising HROV operations. To address these challenges, this work proposes a tether-aware dynamic collision avoidance method for a USV tracking an HROV. First, a tether safety-aware planar domain is introduced to represent the three-dimensional collision risk between the tether and obstacle vessels without an explicit tether shape model. Second, a tether tautness-aware velocity obstacle method is developed to achieve safe avoidance while reducing the likelihood of tether tautness. Finally, the method is integrated with line-of-sight guidance to coordinate HROV tracking and collision avoidance. Gazebo-based simulations show that the proposed method avoids dynamic obstacle vessels while maintaining tether safety and reducing the likelihood of tether tautness during USV evasive maneuvers.
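The paper's method builds on the velocity obstacle (VO) idea, whose plain form fits in a few lines. This is a minimal sketch of the standard VO test with a finite time horizon, assuming constant-velocity obstacles and disc-shaped safety regions whose radii are summed into `radius`; it contains none of the paper's tether safety-aware domain or tautness terms. `choose_velocity` is a hypothetical selection rule that picks the safe candidate closest to the preferred velocity.

```python
import math
from typing import Optional, Sequence, Tuple

Vec = Tuple[float, float]


def in_velocity_obstacle(p_own: Vec, v_own: Vec, p_obs: Vec, v_obs: Vec,
                         radius: float, horizon: float) -> bool:
    """True if keeping v_own brings the two discs closer than `radius`
    within `horizon` seconds, assuming the obstacle keeps constant velocity."""
    px, py = p_obs[0] - p_own[0], p_obs[1] - p_own[1]  # relative position
    vx, vy = v_own[0] - v_obs[0], v_own[1] - v_obs[1]  # relative velocity
    vv = vx * vx + vy * vy
    # Time of closest approach, clamped to [0, horizon].
    t = 0.0 if vv == 0.0 else max(0.0, min(horizon, (px * vx + py * vy) / vv))
    dx, dy = px - t * vx, py - t * vy
    return math.hypot(dx, dy) < radius


def choose_velocity(p_own: Vec, v_pref: Vec, p_obs: Vec, v_obs: Vec,
                    radius: float, horizon: float,
                    candidates: Sequence[Vec]) -> Optional[Vec]:
    """Pick the candidate outside the VO that is closest to the preferred velocity."""
    safe = [v for v in candidates
            if not in_velocity_obstacle(p_own, v, p_obs, v_obs, radius, horizon)]
    if not safe:
        return None
    return min(safe, key=lambda v: (v[0] - v_pref[0]) ** 2 + (v[1] - v_pref[1]) ** 2)
```

A tether-tautness-aware variant would change the selection cost so that candidates which enlarge the USV-HROV separation are penalized; that term is specific to the paper and is not reproduced here.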

SYFeb 22, 2018
Dynamic Output Feedback Guaranteed-Cost Synchronization for Multiagent Networks with Given Cost Budgets

Jianxiang Xi, Cheng Wang, Hao Liu et al.

This paper addresses distributed guaranteed-cost synchronization problems for general high-order linear multiagent networks. Existing works on guaranteed-cost synchronization usually require full state information from neighboring agents and cannot specify the cost budget in advance. For both leaderless and leader-following interaction topologies, this paper first proposes a dynamic output feedback synchronization protocol with guaranteed-cost constraints, which realizes a tradeoff design between energy consumption and synchronization regulation performance under a given cost budget. Then, according to the different structural features of the interaction topologies, leaderless and leader-following guaranteed-cost synchronization analysis and design criteria are presented, respectively, and an algorithm is proposed to deal with the impacts of nonlinear terms by using both the synchronization analysis and design criteria. In particular, an explicit expression of the synchronization function is given for leaderless cases, which is independent of the protocol states and the given cost budget. Finally, numerical examples are presented to demonstrate the theoretical results.

LGJun 21, 2023
FLGo: A Fully Customizable Federated Learning Platform

Zheng Wang, Xiaoliang Fan, Zhaopeng Peng et al.

Federated learning (FL) has found numerous applications in healthcare, finance, and IoT scenarios. Many existing FL frameworks offer a range of benchmarks to evaluate the performance of FL under realistic conditions. However, customizing simulations to accommodate application-specific settings, data heterogeneity, and system heterogeneity typically remains unnecessarily complicated. This creates significant hurdles for traditional ML researchers exploring the use of FL, while also compromising the shareability of code across FL frameworks. To address this issue, we propose FLGo, a novel lightweight FL platform that facilitates cross-application FL studies with a high degree of shareability. Our platform offers 40+ benchmarks, 20+ algorithms, and 2 system simulators as out-of-the-box plugins. We also provide user-friendly APIs for quickly customizing new plugins that can be readily shared and reused for improved reproducibility. Finally, we develop a range of experimental tools, including parallel acceleration, an experiment tracker and analyzer, and parameter auto-tuning. FLGo is maintained at flgo-xmu.github.io.
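For readers new to FL, the kind of algorithm such a platform packages as a plugin can be sketched in a few lines. This is a minimal sketch of generic FedAvg (local gradient steps, then a server-side average weighted by client data size) on a toy one-parameter least-squares model; it does not use or imitate FLGo's API, and the function names, learning rate, and epoch count are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pairs for fitting y ≈ w * x


def local_training(w: float, data: Sequence[Sample], lr: float, epochs: int) -> float:
    """Full-batch gradient descent on one client's mean squared error."""
    for _ in range(epochs):
        g = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * g
    return w


def run_fedavg(clients: List[Sequence[Sample]], rounds: int,
               lr: float = 0.05, epochs: int = 5) -> float:
    """Each round: broadcast w, train locally, average weighted by client size."""
    w = 0.0
    sizes = [len(c) for c in clients]
    for _ in range(rounds):
        local = [local_training(w, c, lr, epochs) for c in clients]
        w = sum(wi * n for wi, n in zip(local, sizes)) / sum(sizes)
    return w
```

Giving the two clients different input ranges (for example x in [1, 2] versus [0.5, 1]) is a small-scale version of the data heterogeneity the abstract says simulations must accommodate.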

CVMar 16, 2023
SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments

Yudi Dai, Yitai Lin, Xiping Lin et al.

We present SLOPER4D, a novel scene-aware dataset collected in large urban environments to facilitate research on global human pose estimation (GHPE) with human-scene interaction in the wild. Employing a head-mounted device integrated with a LiDAR and a camera, we record the activities of 12 human subjects in 10 diverse urban scenes from an egocentric view. Frame-wise annotations for 2D key points, 3D pose parameters, and global translations are provided, together with reconstructed scene point clouds. To obtain accurate 3D ground truth in such large dynamic scenes, we propose a joint optimization method that fits local SMPL meshes to the scene and fine-tunes the camera calibration frame by frame during dynamic motions, resulting in plausible and scene-natural 3D human poses. In total, SLOPER4D consists of 15 sequences of human motions, each with a trajectory length of more than 200 meters (up to 1,300 meters) covering an area of more than 2,000 m² (up to 13,000 m²); the dataset includes more than 100K LiDAR frames, 300K video frames, and 500K IMU-based motion frames. With SLOPER4D, we provide a detailed analysis of two critical tasks, camera-based 3D HPE and LiDAR-based 3D HPE in urban environments, and benchmark a new task, GHPE. The in-depth analysis demonstrates that SLOPER4D poses significant challenges to existing methods and opens up substantial research opportunities. The dataset and code are released at http://www.lidarhumanmotion.net/sloper4d/

NAJul 2, 2018
Numerical methods for Porous Medium Equation by an Energetic Variational Approach

Chenghua Duan, Chun Liu, Cheng Wang et al.

We study numerical methods for the porous medium equation (PME). Two important characteristics, the finite speed of propagation of the free boundary and the potential waiting time, make the problem difficult to handle. Based on different dissipative energy laws, we develop two numerical schemes by an energetic variational approach. First, taking $f \log f$ as the total energy form of the dissipative law, we obtain the trajectory equation and then construct a fully discrete scheme. The scheme is proved to be uniquely solvable on an admissible convex set by taking advantage of the singularity of the total energy. Next, taking $\frac{1}{2 f}$ as the total energy form of the dissipation law, we construct a linear numerical scheme for the corresponding trajectory equation. Both schemes preserve the corresponding discrete dissipation law. Meanwhile, under some smoothness assumptions, both schemes are proved, by a higher-order expansion technique, to be second-order convergent in space and first-order convergent in time. Each scheme yields a good approximation of the solution and the free boundary, and no oscillation is observed in the numerical solution around the free boundary. Furthermore, the waiting time problem, a well-known difficulty for existing methods, can be treated naturally. Due to its linear nature, the second scheme is more efficient.

CVMar 28, 2022
LiDARCap: Long-range Marker-less 3D Human Motion Capture with LiDAR Point Clouds

Jialian Li, Jingyi Zhang, Zhiyong Wang et al.

Existing motion capture datasets are largely short-range and cannot yet meet the needs of long-range applications. To overcome this limitation, we propose LiDARHuman26M, a new human motion capture dataset captured by LiDAR at a much longer range. Our dataset also includes ground-truth human motions acquired by an IMU system and synchronized RGB images. We further present a strong baseline method, LiDARCap, for human motion capture from LiDAR point clouds. Specifically, we first use PointNet++ to encode point features and then employ an inverse kinematics solver and an SMPL optimizer to regress the pose by hierarchically aggregating the temporally encoded features. Quantitative and qualitative experiments show that our method outperforms techniques based only on RGB images. Ablation experiments demonstrate that our dataset is challenging and worthy of further research. Finally, experiments on the KITTI Dataset and the Waymo Open Dataset show that our method generalizes to different LiDAR sensor settings.

NAOct 8, 2016
An Energy Stable Finite-Difference Scheme for Functionalized Cahn-Hilliard Equation and its Convergence Analysis

Wenqiang Feng, Zhen Guan, John Lowengrub et al.

We present and analyze an unconditionally energy stable and convergent finite difference scheme for the Functionalized Cahn-Hilliard equation. One key difficulty associated with energy stability stems from the fact that one nonlinear energy functional term in the expansion is neither convex nor concave. To overcome this subtle difficulty, we add two auxiliary terms to make the combined term convex, which in turn yields a convex-concave decomposition of the physical energy. As a result, an application of the convex splitting methodology ensures both the unique solvability and the unconditional energy stability of the proposed numerical scheme. To handle the 4-Laplacian solver in an $H^{-1}$ gradient flow at each time step, we apply an efficient preconditioned steepest descent algorithm to solve the corresponding nonlinear systems. In addition, a global-in-time $H_{\rm per}^2$ stability of the numerical scheme is established at a theoretical level, which in turn ensures the full-order convergence analysis of the scheme. A few numerical results are presented, which confirm the stability and accuracy of the proposed numerical scheme.

AIApr 21Code
EvoMaster: A Foundational Evolving Agent Framework for Agentic Science at Scale

Xinyu Zhu, Yuzhu Cai, Zexi Liu et al.

The convergence of large language models and agents is catalyzing a new era of scientific discovery: Agentic Science. While the scientific method is inherently iterative, existing agent frameworks are predominantly static, narrowly scoped, and lack the capacity to learn from trial and error. To bridge this gap, we present EvoMaster, a foundational evolving agent framework engineered specifically for Agentic Science at Scale. Driven by the core principle of continuous self-evolution, EvoMaster enables agents to iteratively refine hypotheses, critique their own work, and progressively accumulate knowledge across experimental cycles, mirroring human scientific inquiry. Crucially, as a domain-agnostic base harness, EvoMaster is easy to scale up, enabling developers to build and deploy self-evolving scientific agents for arbitrary disciplines in approximately 100 lines of code. Built upon EvoMaster, we incubated the SciMaster ecosystem across domains such as machine learning, physics, and general science. Evaluations on four authoritative benchmarks (Humanity's Last Exam, MLE-Bench Lite, BrowseComp, and FrontierScience) demonstrate that EvoMaster achieves state-of-the-art scores of 41.1%, 75.8%, 73.3%, and 53.3%, respectively. It consistently outperforms the general-purpose baseline OpenClaw, with relative improvements ranging from +159% to +316%, validating its efficacy and generality as a foundational framework for the next generation of autonomous scientific discovery. EvoMaster is available at https://github.com/sjtu-sai-agents/EvoMaster.

NAMar 26, 2018
Numerical Complete Solution for Random Genetic Drift by Energetic Variational Approach

Chenghua Duan, Chun Liu, Cheng Wang et al.

In this paper, we focus on numerical solutions for the random genetic drift problem, which is governed by a degenerate convection-dominated parabolic equation. Due to the fixation phenomenon of genes, Dirac delta singularities develop at the boundary points as time evolves. Based on an energetic variational approach (EnVarA), which balances the maximal dissipation principle (MDP) and the least action principle (LAP), we obtain the trajectory equation. In turn, a numerical scheme is proposed using a convex splitting technique, with its unique solvability (on a convex set) and energy decay property (in time) justified at a theoretical level. Numerical examples are presented for cases of pure drift and drift with semi-selection. The remarkable advantage of this method is its ability to capture the Dirac delta singularity close to machine precision on any equidistant grid.

CVMar 17, 2022
HSC4D: Human-centered 4D Scene Capture in Large-scale Indoor-outdoor Space Using Wearable IMUs and LiDAR

Yudi Dai, Yitai Lin, Chenglu Wen et al.

We propose Human-centered 4D Scene Capture (HSC4D) to accurately and efficiently create a dynamic digital world containing large-scale indoor-outdoor scenes, diverse human motions, and rich interactions between humans and environments. Using only body-mounted IMUs and LiDAR, HSC4D is space-free, with no constraints from external devices, and map-free, requiring no pre-built maps. Because IMUs can capture human poses but drift over long periods of use, while LiDAR is stable for global localization but coarse for local positions and orientations, HSC4D makes the two sensors complement each other through joint optimization and achieves promising results for long-term capture. Relationships between humans and environments are also explored to make their interactions more realistic. To facilitate downstream tasks such as AR, VR, robotics, and autonomous driving, we propose a dataset containing three large scenes (1k-5k m²) with accurate dynamic human motions and locations. Diverse scenarios (climbing gym, multi-story building, slope, etc.) and challenging human activities (exercising, walking up/down stairs, climbing, etc.) demonstrate the effectiveness and generalization ability of HSC4D. The dataset and code are available at http://www.lidarhumanmotion.net/hsc4d/.

IVJun 21, 2023
Encoding Enhanced Complex CNN for Accurate and Highly Accelerated MRI

Zimeng Li, Sa Xiao, Cheng Wang et al. · amazon-science

Magnetic resonance imaging (MRI) using hyperpolarized noble gases provides a way to visualize the structure and function of the human lung, but long imaging times limit its broad research and clinical applications. Deep learning has demonstrated great potential for accelerating MRI by reconstructing images from undersampled data. However, most existing deep convolutional neural networks (CNNs) directly apply square convolution to k-space data without considering the inherent properties of k-space sampling, limiting k-space learning efficiency and image reconstruction quality. In this work, we propose an encoding enhanced (EN2) complex CNN for highly undersampled pulmonary MRI reconstruction. EN2 employs convolution along either the frequency- or phase-encoding direction, resembling the mechanisms of k-space sampling, to maximize the utilization of the encoding correlation and integrity within a row or column of k-space. We also employ complex convolution to learn rich representations from the complex k-space data. In addition, we develop a feature-strengthened modularized unit to further boost reconstruction performance. Experiments demonstrate that our approach can accurately reconstruct hyperpolarized ¹²⁹Xe and ¹H lung MRI from 6-fold undersampled k-space data and provide lung function measurements with minimal biases compared with fully sampled images. These results demonstrate the effectiveness of the proposed algorithmic components and indicate that the proposed approach could be used for accelerated pulmonary MRI in research and in the clinical care of patients with lung disease.
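The contrast between square convolution and convolution "along either the frequency- or phase-encoding direction" is a matter of kernel shape: a 1 × k kernel only mixes samples within one k-space row (or column). This is a minimal sketch under assumptions: a single fixed complex kernel rather than learned layers, zero padding to keep the shape, cross-correlation as in most deep learning "conv" layers, and rows taken as the frequency-encoding direction. It is not the EN2 architecture.

```python
from typing import List, Sequence


def conv1d_along_rows(kspace: Sequence[Sequence[complex]],
                      kernel: Sequence[complex]) -> List[List[complex]]:
    """Apply one complex 1D kernel within each row (one encoding direction).

    Zero padding keeps the output shape equal to the input shape.
    """
    half = len(kernel) // 2
    out = []
    for row in kspace:
        n = len(row)
        new_row = []
        for i in range(n):
            acc = 0j
            for t, kv in enumerate(kernel):
                j = i + t - half
                if 0 <= j < n:
                    acc += kv * row[j]  # complex product mixes real and imaginary parts
            new_row.append(acc)
        out.append(new_row)
    return out


def conv1d_along_cols(kspace: Sequence[Sequence[complex]],
                      kernel: Sequence[complex]) -> List[List[complex]]:
    """Same operation along the other encoding direction (columns)."""
    transposed = [list(col) for col in zip(*kspace)]
    return [list(col) for col in zip(*conv1d_along_rows(transposed, kernel))]
```

Treating each sample as a single complex number (rather than two unrelated real channels) is what the abstract means by complex convolution: the multiplication couples magnitude and phase in the same way the acquisition does.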

CVOct 31, 2025Code
NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding

Wei Xu, Cheng Wang, Dingkang Liang et al.

Underwater exploration offers critical insights into our planet and is attracting increasing attention for its broader applications in resource exploration, national security, and other areas. We study underwater scene understanding methods, which aim to enable automated underwater exploration. Underwater scene understanding requires multi-task perception at multiple granularities. However, the absence of large-scale underwater multi-task instruction-tuning datasets hinders progress in this research. To bridge this gap, we construct NautData, a dataset of 1.45M image-text pairs supporting eight underwater scene understanding tasks, which enables the development and thorough evaluation of underwater scene understanding models. Underwater image degradation is a widely recognized challenge that interferes with underwater tasks. To make underwater scene understanding more robust, we introduce physical priors derived from underwater imaging models and propose a plug-and-play vision feature enhancement (VFE) module that explicitly restores clear underwater information. We integrate this module into the well-known baselines LLaVA-1.5 and Qwen2.5-VL to build our underwater LMM, NAUTILUS. Experiments on NautData and public underwater datasets demonstrate the effectiveness of the VFE module, which consistently improves both baselines on most supported tasks, demonstrating the superiority of NAUTILUS for underwater scene understanding. Data and models are available at https://github.com/H-EmbodVis/NAUTILUS.
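
One widely used simplified underwater image formation model treats each colour channel as clear scene radiance attenuated by a transmission $t = e^{-\beta d}$ plus backscattered veiling light. The sketch below applies and inverts that model per pixel. It only illustrates the kind of physical prior mentioned here, under the assumption that this particular model is representative; it is not the VFE module, which enhances vision features rather than pixels. The coefficients, depth, and pixel values are made-up.

```python
import math


def transmission(depth_m, beta):
    """Beer-Lambert style transmission t = exp(-beta * d) for one colour channel."""
    return math.exp(-beta * depth_m)


def degrade(clear, backlight, t):
    """Simplified formation model: observed = clear * t + backlight * (1 - t)."""
    return clear * t + backlight * (1.0 - t)


def restore(observed, backlight, t, t_min=0.1):
    """Invert the model; t is clamped from below so the division cannot blow up."""
    t = max(t, t_min)
    return (observed - backlight * (1.0 - t)) / t


# Hypothetical per-channel coefficients: in clear water, red is attenuated fastest.
betas = {"r": 0.6, "g": 0.2, "b": 0.1}
clear_pixel = {"r": 0.8, "g": 0.5, "b": 0.4}
backlight = {"r": 0.05, "g": 0.35, "b": 0.45}
depth = 3.0
observed = {c: degrade(clear_pixel[c], backlight[c], transmission(depth, betas[c])) for c in betas}
recovered = {c: restore(observed[c], backlight[c], transmission(depth, betas[c])) for c in betas}
print(observed, recovered)
```

The clamp `t_min` trades exactness for stability: where transmission is very low, the inversion would otherwise amplify sensor noise without bound.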

CVMar 31, 2023
CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-scene Interactions

Ming Yan, Xin Wang, Yudi Dai et al.

Motion capture is a long-standing research problem. Although it has been studied for decades, most research focuses on ground-based movements such as walking, sitting, and dancing. Off-ground actions such as climbing are largely overlooked. Climbing is an important type of action in sports and firefighting, yet it is challenging to capture because of its complex back poses, intricate human-scene interactions, and difficult global localization. The research community lacks an in-depth understanding of climbing due to the lack of dedicated datasets. To address this limitation, we collect CIMI4D, a large rock CLImbing MotIon dataset from 12 people climbing 13 different climbing walls. The dataset consists of around 180,000 frames of pose inertial measurements, LiDAR point clouds, RGB videos, high-precision static point cloud scenes, and reconstructed scene meshes. Moreover, we annotate, frame by frame, the rock holds that climbers touch, to facilitate a detailed exploration of human-scene interaction. The core of this dataset is a blending optimization process that corrects poses that drift or are distorted by magnetic conditions. To evaluate the merit of CIMI4D, we perform four tasks: human pose estimation with and without scene constraints, pose prediction, and pose generation. The experimental results show that CIMI4D poses great challenges to existing methods and opens up extensive research opportunities. We share the dataset with the research community at http://www.lidarhumanmotion.net/cimi4d/.

NAMar 10, 2016
A discrete-ordinate discontinuous-streamline diffusion method for the radiative transfer equation

Cheng Wang, Qiwei Sheng, Weimin Han

The radiative transfer equation (RTE) arises in many areas of science and engineering. In this paper, we propose and investigate a discrete-ordinate discontinuous-streamline diffusion (DODSD) method for solving the RTE, which combines the discrete-ordinate technique with the discontinuous-streamline diffusion method. Unlike the discrete-ordinate discontinuous Galerkin (DODG) method for the RTE, DODSD adds an artificial diffusion parameter to the test functions in the spatial discretization. Stability and error estimates in certain norms are proved. Numerical results show that the proposed method can yield more accurate approximations than the DODG method.
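
For reference, a common form of the steady-state RTE and its discrete-ordinate system is sketched below. The notation (intensity $u$, direction $\boldsymbol{\omega}$ on the unit sphere $S$, extinction and scattering coefficients $\sigma_t$ and $\sigma_s$, phase function $g$, quadrature weights $w_j$, element parameter $\delta_K$) is assumed here rather than taken from the paper, and the full DODSD bilinear form, including its element-boundary (upwind) terms and the paper's specific parameter choice, is not reproduced.

$$\boldsymbol{\omega}\cdot\nabla u(\mathbf{x},\boldsymbol{\omega}) + \sigma_t(\mathbf{x})\,u(\mathbf{x},\boldsymbol{\omega}) = \sigma_s(\mathbf{x})\int_{S} g(\boldsymbol{\omega}\cdot\boldsymbol{\omega}')\,u(\mathbf{x},\boldsymbol{\omega}')\,\mathrm{d}\sigma(\boldsymbol{\omega}') + f(\mathbf{x},\boldsymbol{\omega})$$

The discrete-ordinate technique replaces the angular integral with a quadrature over directions $\boldsymbol{\omega}_1,\dots,\boldsymbol{\omega}_J$, giving a coupled system for $u_i(\mathbf{x}) \approx u(\mathbf{x},\boldsymbol{\omega}_i)$:

$$\boldsymbol{\omega}_i\cdot\nabla u_i + \sigma_t\,u_i = \sigma_s\sum_{j=1}^{J} w_j\, g(\boldsymbol{\omega}_i\cdot\boldsymbol{\omega}_j)\,u_j + f_i,\qquad i = 1,\dots,J.$$

In a streamline-diffusion discretization of the $i$-th equation, each test function $v$ on a mesh element $K$ is replaced by $v + \delta_K\,\boldsymbol{\omega}_i\cdot\nabla v$, where $\delta_K > 0$ is the artificial diffusion parameter (typically taken proportional to the local mesh size $h_K$); setting $\delta_K = 0$ recovers a plain DG-type test space.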

CVAug 7, 2024
L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection

Xun Huang, Ziyu Xu, Hai Wu et al.

LiDAR-based vision systems are integral to 3D object detection, which is crucial for autonomous navigation. However, their performance degrades in adverse weather because the quality of LiDAR point clouds deteriorates. Fusing LiDAR with weather-robust 4D radar is a promising way to mitigate this problem. However, fusing LiDAR and 4D radar is challenging because the two sensors differ significantly in data quality and in how much they degrade in adverse weather. To address these issues, we introduce L4DR, a weather-robust 3D object detection method that effectively fuses LiDAR and 4D radar. L4DR includes Multi-Modal Encoding (MME) and Foreground-Aware Denoising (FAD) techniques to reconcile the sensor gaps, which constitutes the first exploration of the complementarity of early fusion between LiDAR and 4D radar. Additionally, we design an Inter-Modal and Intra-Modal (IM$^2$) parallel feature extraction backbone coupled with a Multi-Scale Gated Fusion (MSGF) module to counteract the varying degrees of sensor degradation under adverse weather conditions. Experimental evaluation on the VoD dataset with simulated fog shows that L4DR adapts better to changing weather conditions. It delivers a significant performance increase under different fog levels, improving 3D mAP by up to 20.0% over the traditional LiDAR-only approach. Moreover, results on the K-Radar dataset confirm that L4DR consistently improves performance in real-world adverse weather conditions.
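
A generic gated fusion, in which a sigmoid gate decides per channel how much to trust each sensor, can be sketched as follows. This is a single-scale, per-channel simplification chosen for illustration: the gate parameterization and the names `w_l`, `w_r`, and `bias` are assumptions standing in for learned weights, not the MSGF design, which the abstract describes only as multi-scale and gated.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_fusion(f_lidar, f_radar, w_l, w_r, bias):
    """Per-channel gate g = sigmoid(w_l*f_l + w_r*f_r + b); output = g*f_l + (1-g)*f_r.

    All parameters are hypothetical stand-ins for learned weights.
    """
    fused = []
    for fl, fr, a, b, c in zip(f_lidar, f_radar, w_l, w_r, bias):
        g = sigmoid(a * fl + b * fr + c)
        fused.append(g * fl + (1.0 - g) * fr)
    return fused


# Channel 0: zero weights and bias give g = 0.5, a plain average.
# Channel 1: a strongly negative bias drives g toward 0, i.e. toward the radar feature.
print(gated_fusion([2.0, 1.0], [4.0, 3.0], [0.0, 0.0], [0.0, 0.0], [0.0, -50.0]))
```

Because the gate depends on both inputs, a trained gate can shift weight toward the radar branch when the LiDAR features look degraded, which is the behaviour a weather-adaptive fusion needs.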

ARAug 3, 2023
Evaluation of STT-MRAM as a Scratchpad for Training in ML Accelerators

Sourjya Roy, Cheng Wang, Anand Raghunathan

Progress in artificial intelligence and machine learning over the past decade has been driven by the ability to train ever larger deep neural networks (DNNs), leading to compute demand that far exceeds the growth in hardware performance afforded by Moore's law. Training DNNs is an extremely memory-intensive process that requires storing not just the model weights but also the activations and gradients for an entire minibatch. The need for high-density, low-leakage on-chip memory motivates the exploration of emerging non-volatile memory for training accelerators. Spin-Transfer-Torque MRAM (STT-MRAM) offers several desirable properties for training accelerators, including 3-4x higher density than SRAM, significantly reduced leakage power, high endurance, and reasonable access time. However, MRAM write operations require high energy and latency because of the need to ensure reliable switching. In this study, we perform a comprehensive device-to-system evaluation and co-optimization of STT-MRAM for efficient ML training accelerator design. We devised a cross-layer simulation framework to evaluate the effectiveness of STT-MRAM as a scratchpad replacing SRAM in a systolic-array-based DNN accelerator. To address the inefficiency of writes in STT-MRAM, we propose reducing the write voltage and duration. To evaluate the resulting accuracy-efficiency trade-off, we conduct a thorough analysis of the error tolerance of input activations, weights, and errors during training. We propose heterogeneous memory configurations that enable training to converge with good accuracy. We show that MRAM provides up to 15-22x improvement in system-level energy across a suite of DNN benchmarks under iso-capacity and iso-area scenarios. Further optimizing STT-MRAM write operations can provide over 2x improvement in write energy with minimal degradation in application-level training accuracy.
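
The error-tolerance analysis can be illustrated by injecting bit flips into float32 values as a stand-in for write failures caused by lowering the write voltage or pulse duration. This sketch assumes a uniform, independent per-bit error probability `p_bit_error`, which is a simplification: the study derives its trade-offs from a cross-layer device-to-system framework, and real write-error rates need not be uniform across bit positions or written values. All function names are hypothetical.

```python
import random
import struct


def float_to_bits(x):
    """Reinterpret a Python float as an IEEE-754 float32 bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def inject_write_errors(values, p_bit_error, rng=None):
    """Flip each of the 32 bits of each float32 value independently with probability p."""
    if rng is None:
        rng = random.Random(0)
    out = []
    for v in values:
        bits = float_to_bits(v)
        for k in range(32):
            if rng.random() < p_bit_error:
                bits ^= 1 << k
        out.append(bits_to_float(bits))
    return out


# Usage: perturb a few (made-up) weights at a hypothetical per-bit error rate.
weights = [0.125, -1.5, 3.0]
print(inject_write_errors(weights, 1e-3, random.Random(42)))
```

Applying the injector separately to activations, weights, and gradients (errors) at different rates is one way to probe which data types tolerate cheaper, less reliable writes, the question behind the heterogeneous memory configurations described here.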

AIFeb 21, 2023
Causal Explanations for Sequential Decision-Making in Multi-Agent Systems

Balint Gyevnar, Cheng Wang, Christopher G. Lucas et al.

We present CEMA (Causal Explanations in Multi-Agent systems), a framework for creating causal natural-language explanations of an agent's decisions in dynamic sequential multi-agent systems, with the goal of building more trustworthy autonomous agents. Unlike prior work that assumes a fixed causal structure, CEMA requires only a probabilistic model for forward-simulating the state of the system. Using such a model, CEMA simulates counterfactual worlds that identify the salient causes behind the agent's decisions. We evaluate CEMA on motion planning for autonomous driving and test it in diverse simulated scenarios. We show that CEMA correctly and robustly identifies the causes behind the agent's decisions, even when many other agents are present, and we show via a user study that CEMA's explanations have a positive effect on participants' trust in autonomous vehicles and are rated as highly as high-quality baseline explanations elicited from other participants. We release the collected explanations with annotations as the HEADD dataset.
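
The idea of using a probabilistic forward simulator to find salient causes can be sketched as follows: intervene on one candidate factor at a time, resample the agent's decision in the counterfactual world, and score each factor by how often the decision changes. This is a simplified illustration, not CEMA's procedure; the simulator interface, the intervention dictionary, the change-frequency score, and the toy driving scenario are all hypothetical.

```python
import random


def causal_salience(simulate, factual_state, interventions, decision, n_samples=200, seed=0):
    """Rank candidate causes by how often intervening on them changes the decision.

    simulate(state, rng) -> a decision sampled from a probabilistic forward model.
    interventions: dict mapping a cause name to a function that returns a
    counterfactual copy of the state.
    """
    rng = random.Random(seed)
    scores = {}
    for name, intervene in interventions.items():
        cf_state = intervene(factual_state)
        changed = sum(simulate(cf_state, rng) != decision for _ in range(n_samples))
        scores[name] = changed / n_samples
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def toy_sim(state, rng):
    """Hypothetical driver: usually brakes when the lead vehicle slows; ignores the radio."""
    if state["lead_slowing"]:
        return "brake" if rng.random() < 0.95 else "cruise"
    return "cruise"


factual = {"lead_slowing": True, "radio_on": True}
interventions = {
    "lead_slowing": lambda s: {**s, "lead_slowing": False},
    "radio_on": lambda s: {**s, "radio_on": False},
}
print(causal_salience(toy_sim, factual, interventions, "brake")[0])  # ('lead_slowing', 1.0)
```

Removing the slowing lead vehicle flips every sampled decision, while switching off the radio changes only the simulator's inherent randomness, so the score separates the salient cause from an irrelevant one.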