Mengshi Zhang

SE
h-index13
11papers
251citations
Novelty36%
AI Score48

11 Papers

70.3ROMay 27
World Models for Robotic Manipulation: A Survey

Fangyuan Wang, Ziyuan Wang, Guorui Pei et al.

Robotic manipulation depends on the ability to anticipate how actions reshape objects, contacts, and scene geometry before execution. Learned world models provide this capability by predicting task-relevant future evolution under robot intervention, yet the term now spans latent dynamics models, action-conditioned video generators, three- and four-dimensional scene predictors, physics-informed simulators, and predictive modules inside vision-language-action systems. This breadth has fragmented the literature and obscured the design choices that matter for manipulation. We survey world models for robotic manipulation through three questions: what future representation is predicted, how prediction is connected to action, and when prediction is used in the robot-learning pipeline. We operationally define a world model as an action-conditioned predictive system and distinguish it from perception modules, inverse models, policies, rewards, and value functions. We then organize existing work into five representation families, develop a functional taxonomy that separates integrated prediction-action models from explicit predictive planners, and characterize infrastructure roles including synthetic experience generation, candidate filtering, search-based evaluation, learned environments, and outcome verification. We further map these roles across pretraining, post-training, and inference adaptation, review 34 manipulation datasets, and synthesize evaluation protocols for predictive fidelity, task performance, and simulator reliability. This survey shows that world models are evolving from task-specific dynamics predictors into predictive infrastructure for robot learning, while exposing open challenges in contact modeling, hallucination control, action alignment, and benchmarking under closed-loop use.

38.6ROMay 27
Learning a Kinodynamic Trajectory Manifold for Impact-Aware Compliant Catching of Fast-Moving Objects

Guorui Pei, Mengshi Zhang, Xi Chen et al.

Fast catching of free-flying objects is difficult because of short reaction time, impact uncertainty, and kinodynamic constraints. We use reinforcement learning in simulation to collect successful catching trajectories and learn a low-dimensional kinodynamic trajectory manifold. At run time, the estimated object initial state is mapped directly to a reference catching trajectory without online nonlinear optimization. The trajectory is tracked with compliant control near contact for improved impact absorption and capture stability.

IRJul 17, 2023
Evaluating and Enhancing Robustness of Deep Recommendation Systems Against Hardware Errors

Dongning Ma, Xun Jiao, Fred Lin et al.

Deep recommendation systems (DRS) heavily depend on specialized HPC hardware and accelerators to optimize energy, efficiency, and recommendation quality. Despite the growing number of hardware errors observed in large-scale fleet systems where DRS are deployed, the robustness of DRS has been largely overlooked. This paper presents the first systematic study of DRS robustness against hardware errors. We develop Terrorch, a user-friendly, efficient and flexible error injection framework on top of the widely-used PyTorch. We evaluate a wide range of models and datasets and observe that the DRS robustness against hardware errors is influenced by various factors from model parameters to input characteristics. We also explore 3 error mitigation methods including algorithm based fault tolerance (ABFT), activation clipping and selective bit protection (SBP). We find that applying activation clipping can recover up to 30% of the degraded AUC-ROC score, making it a promising mitigation method.

31.9AIMay 15
GRID: Graph Representation of Intelligence Data for Security Text Knowledge Graph Construction

Liangyi Huang, Zichen Liu, Fei Shao et al.

Security knowledge graphs can provide computable external memory for security agents, but constructing them from long-form cyber threat intelligence (CTI) remains difficult: LLMs often lack grounded security-domain knowledge, and end-to-end document-to-graph training is hard to supervise with cheap, stable rewards. We present GRID (Graph Representation of Intelligence Data), an end-to-end framework for security text knowledge graph construction. GRID first builds security-domain supervision from CTI articles by creating traceable article-graph alignments through graph extraction and knowledge-graph-conditioned text revision. It then turns document-to-graph learning into a scripted task bank combining four-option multi-select questions with triple-level regex matching targets, yielding more stable task-specific rewards than repeatedly scoring full graph outputs with an LLM judge. Using this supervision pipeline, we train two Qwen3-4B-Instruct-2507-based 4B extractors: a primary Task-bank Reward model and a secondary End2End Reward model with LLM-as-judge precision/recall rewards. On 249 CTI articles from GRID, CASIE, CTINexus, MalKG, and SecureNLP, the Task-bank Reward model with the ontology-guided GRID extraction pipeline reaches 84.62% source-averaged precision, 64.91% source-averaged recall, and 68.53% Avg F1, achieving the best source-averaged recall and near-top Avg F1 with lower token usage and deployment cost. The End2End Reward model reaches 76.91% precision, 53.85% recall, and 58.06% Avg F1. Further analyses show that task-bank rewards can be built once offline and reused across later post-training runs, outperforming online End2End LLM-as-judge reward and weaker alternatives such as Choice-only Reward and End2End SFT without RL.

IVAug 27, 2025
UltraEar: a multicentric, large-scale database combining ultra-high-resolution computed tomography and clinical data for ear diseases

Ruowei Tang, Pengfei Zhao, Xiaoguang Li et al.

Ear diseases affect billions of people worldwide, leading to substantial health and socioeconomic burdens. Computed tomography (CT) plays a pivotal role in accurate diagnosis, treatment planning, and outcome evaluation. The objective of this study is to present the establishment and design of UltraEar Database, a large-scale, multicentric repository of isotropic 0.1 mm ultra-high-resolution CT (U-HRCT) images and associated clinical data dedicated to ear diseases. UltraEar recruits patients from 11 tertiary hospitals between October 2020 and October 2035, integrating U-HRCT images, structured CT reports, and comprehensive clinical information, including demographics, audiometric profiles, surgical records, and pathological findings. A broad spectrum of otologic disorders is covered, such as otitis media, cholesteatoma, ossicular chain malformation, temporal bone fracture, inner ear malformation, cochlear aperture stenosis, enlarged vestibular aqueduct, and sigmoid sinus bony deficiency. Standardized preprocessing pipelines have been developed for geometric calibration, image annotation, and multi-structure segmentation. All personal identifiers in DICOM headers and metadata are removed or anonymized to ensure compliance with data privacy regulation. Data collection and curation are coordinated through monthly expert panel meetings, with secure storage on an offline cloud system. UltraEar provides an unprecedented ultra-high-resolution reference atlas with both technical fidelity and clinical relevance. This resource has significant potential to advance radiological research, enable development and validation of AI algorithms, serve as an educational tool for training in otologic imaging, and support multi-institutional collaborative studies. UltraEar will be continuously updated and expanded, ensuring long-term accessibility and usability for the global otologic research community.

SEJun 23, 2021
Testing of Autonomous Driving Systems: Where Are We and Where Should We Go?

Guannan Lou, Yao Deng, Xi Zheng et al.

Autonomous driving has shown great potential to reform modern transportation. Yet its reliability and safety have drawn a lot of attention and concerns. Compared with traditional software systems, autonomous driving systems (ADSs) often use deep neural networks in tandem with logic-based modules. This new paradigm poses unique challenges for software testing. Despite the recent development of new ADS testing techniques, it is not clear to what extent those techniques have addressed the needs of ADS practitioners. To fill this gap, we present the first comprehensive study to identify the current practices and needs of ADS testing. We conducted semi-structured interviews with developers from 10 autonomous driving companies and surveyed 100 developers who have worked on autonomous driving systems. A systematic analysis of the interview and survey data revealed 7 common practices and 4 emerging needs of autonomous driving testing. Through a comprehensive literature review, we developed a taxonomy of existing ADS testing techniques and analyzed the gap between ADS research and practitioners' needs. Finally, we proposed several future directions for SE researchers, such as developing test reduction techniques to accelerate simulation-based ADS testing.

SEApr 9, 2021
Self-Boosted Automated Program Repair

Samuel Benton, Mengshi Zhang, Xia Li et al.

Program repair is an integral part of every software system's life-cycle but can be extremely challenging. To date, researchers have proposed various automated program repair (APR) techniques to reduce efforts of manual debugging. However, given a real-world buggy program, a typical APR technique usually generates a large number of patches, each of which needs to be validated against the original test suite which incurs extremely high computation costs. Although existing APR techniques have already leveraged various static and/or dynamic information to find the desired patches faster, they are still rather costly. In a recent work, researchers proposed unified debugging to leverage the patch execution information during APR to help boost fault localization; in this way,the application scope of APR techniques can be extended to all possible bugs, e.g., the patch execution information during APR can help with manual repair of the bugs that cannot be automatically fixed. Inspired by unified debugging, this work proposes SeAPR (Self-Boosted Automated Program Repair), the first technique to leverage the earlier patch execution information during APR to help boost automated repair itself on-the-fly. Our basic intuition is that patches similar to earlier high-quality/low-quality patches should be promoted/degraded to speed up the detection of the desired patches. This experimental study on 12 state-of-the-art APR systems demonstrates that, overall, SeAPR can substantially reduce the number of patch executions with negligible overhead. Our study also investigates the impact of various configurations on SeAPR. Lastly, our study demonstrates that SeAPR can even leverage the patch execution information from other APR tools from the same buggy program to further boost APR.

LGMar 12, 2021
SCEI: A Smart-Contract Driven Edge Intelligence Framework for IoT Systems

Chenhao Xu, Jiaqi Ge, Yong Li et al.

Federated learning (FL) enables collaborative training of a shared model on edge devices while maintaining data privacy. FL is effective when dealing with independent and identically distributed (iid) datasets, but struggles with non-iid datasets. Various personalized approaches have been proposed, but such approaches fail to handle underlying shifts in data distribution, such as data distribution skew commonly observed in real-world scenarios (e.g., driver behavior in smart transportation systems changing across time and location). Additionally, trust concerns among unacquainted devices and security concerns with the centralized aggregator pose additional challenges. To address these challenges, this paper presents a dynamically optimized personal deep learning scheme based on blockchain and federated learning. Specifically, the innovative smart contract implemented in the blockchain allows distributed edge devices to reach a consensus on the optimal weights of personalized models. Experimental evaluations using multiple models and real-world datasets demonstrate that the proposed scheme achieves higher accuracy and faster convergence compared to traditional federated and personalized learning approaches.

SYApr 7, 2019
Integration of Nonlinear Disturbance Observer within Proxy-based Sliding Mode Control for Pneumatic Muscle Actuators

Yu Cao, Jian Huang, Dongrui Wu et al.

This paper presents an integration of nonlinear disturbance observer within proxy-based sliding mode control (IDO-PSMC) approach for Pneumatic Muscle Actuators (PMAs). Due to the nonlinearities, uncertainties, hysteresis, and time-varying characteristics of the PMA, the model parameters are difficult to be identified accurately, which results in unmeasurable uncertainties and disturbances of the system. To solve this problem, a novel design of proxy-based sliding mode controller (PSMC) combined with a nonlinear disturbance observer (DO) is used for the tracking control of the PMA. Our approach combines both the merits of the PSMC and the DO so that it is effective in both reducing the ``chattering" phenomenon and improving the system robustness. A constrained Firefly Algorithm is used to search for the optimal control parameters. Based on the Lyapunov theorem, the states of the PMA are shown to be globally uniformly ultimately bounded. Extensive experiments were conducted to verify the superior performance of our approach, in multiple tracking scenarios.

SEJul 27, 2018
Symbolic Execution for Deep Neural Networks

Divya Gopinath, Kaiyuan Wang, Mengshi Zhang et al.

Deep Neural Networks (DNN) are increasingly used in a variety of applications, many of them with substantial safety and security concerns. This paper introduces DeepCheck, a new approach for validating DNNs based on core ideas from program analysis, specifically from symbolic execution. The idea is to translate a DNN into an imperative program, thereby enabling program analysis to assist with DNN validation. A basic translation however creates programs that are very complex to analyze. DeepCheck introduces novel techniques for lightweight symbolic analysis of DNNs and applies them in the context of image classification to address two challenging problems in DNN analysis: 1) identification of important pixels (for attribution and adversarial generation); and 2) creation of 1-pixel and 2-pixel attacks. Experimental results using the MNIST data-set show that DeepCheck's lightweight symbolic analysis provides a valuable tool for DNN validation.

SEFeb 7, 2018
DeepRoad: GAN-based Metamorphic Autonomous Driving System Testing

Mengshi Zhang, Yuqun Zhang, Lingming Zhang et al.

While Deep Neural Networks (DNNs) have established the fundamentals of DNN-based autonomous driving systems, they may exhibit erroneous behaviors and cause fatal accidents. To resolve the safety issues of autonomous driving systems, a recent set of testing techniques have been designed to automatically generate test cases, e.g., new input images transformed from the original ones. Unfortunately, many such generated input images often render inferior authenticity, lacking accurate semantic information of the driving scenes and hence compromising the resulting efficacy and reliability. In this paper, we propose DeepRoad, an unsupervised framework to automatically generate large amounts of accurate driving scenes to test the consistency of DNN-based autonomous driving systems across different scenes. In particular, DeepRoad delivers driving scenes with various weather conditions (including those with rather extreme conditions) by applying the Generative Adversarial Networks (GANs) along with the corresponding real-world weather scenes. Moreover, we have implemented DeepRoad to test three well-recognized DNN-based autonomous driving systems. Experimental results demonstrate that DeepRoad can detect thousands of behavioral inconsistencies in these systems.