Wenzheng Zhao

RO
h-index: 16
Papers: 9
Citations: 340
Novelty: 46%
AI Score: 49

9 Papers

CV Dec 18, 2025
Kling-Omni Technical Report

Kling Team, Jialu Chen, Yuanzheng Ci et al.

We present Kling-Omni, a generalist generative framework designed to synthesize high-fidelity videos directly from multimodal visual language inputs. Adopting an end-to-end perspective, Kling-Omni bridges the functional separation among diverse video generation, editing, and intelligent reasoning tasks, integrating them into a holistic system. Unlike disjointed pipeline approaches, Kling-Omni supports a diverse range of user inputs, including text instructions, reference images, and video contexts, processing them into a unified multimodal representation to deliver cinematic-quality and highly intelligent video content creation. To support these capabilities, we constructed a comprehensive data system that serves as the foundation for multimodal video creation. The framework is further empowered by efficient large-scale pre-training strategies and infrastructure optimizations for inference. Comprehensive evaluations reveal that Kling-Omni demonstrates exceptional capabilities in in-context generation, reasoning-based editing, and multimodal instruction following. Moving beyond a content creation tool, we believe Kling-Omni is a pivotal advancement toward multimodal world simulators capable of perceiving, reasoning about, generating, and interacting with dynamic and complex worlds.

RO Apr 1
An Edge-Host-Cloud Architecture for Robot-Agnostic, Caregiver-in-the-Loop Personalized Cognitive Exercise: Multi-Site Deployment in Dementia Care

Wenzheng Zhao, Ruth Palan Lopez, Shu Fen Wung et al.

We present Speaking Memories, a distributed, stakeholder-in-the-loop robotic interaction platform for personalized cognitive exercise support. Rather than a single robot-centric system, Speaking Memories is designed as a generalizable robotics architecture that integrates caregiver-authored knowledge, local edge intelligence, and embodied robotic agents into a unified socio-technical loop. The platform fuses auditory, visual, and textual signals to enable emotion-aware, personalized dialogue, while decoupling multimodal perception and reasoning from robot-specific hardware through a local edge interaction server. This design achieves low-latency, privacy-preserving operation and supports scalable deployment across heterogeneous robotic embodiments. Caregivers and family members contribute structured biographical knowledge via a secure cloud portal, which conditions downstream dialogue policies and enables longitudinal personalization across interaction sessions. Beyond real-time interaction, the system incorporates an automated multimodal evaluation layer that continuously analyzes user responses, affective cues, and engagement patterns, producing structured interaction metrics at scale. These metrics support systematic assessment of interaction quality, enable data-driven model fine-tuning, and lay the foundation for future clinician- and caregiver-informed personalization and intervention planning. We evaluate the platform through real-world deployments, measuring end-to-end latency, dialogue coherence, interaction stability, and stakeholder-reported usability and engagement. Results demonstrate sub-6-second response latency, robust multimodal synchronization, and consistently positive feedback from both participants and caregivers. Furthermore, subsets of the dataset can be shared upon request, subject to participant consent and IRB constraints.

LG May 1, 2022
Accurate non-stationary short-term traffic flow prediction method

Wenzheng Zhao

Precise and timely traffic flow prediction plays a critical role in developing intelligent transportation systems and has attracted considerable attention in recent decades. Despite the significant progress in this area brought by deep learning, challenges remain. Traffic flows usually change dramatically in a short period, which prevents current methods from accurately capturing the future trend and is likely to cause overfitting, leading to unsatisfactory accuracy. To this end, this paper proposes a Long Short-Term Memory (LSTM) based method that can forecast short-term traffic flow precisely and avoid local optimum problems during training. Specifically, instead of using the non-stationary raw traffic data directly, we first decompose them into sub-components, each of which is less noisy than the original input. Afterward, Sample Entropy (SE) is employed to merge similar components to reduce the computation cost. The merged features are fed into the LSTM, and we then introduce a spatiotemporal module to consider the neighboring relationships in the recombined signals to avoid strong autocorrelation. During training, we utilize the Grey Wolf Optimizer (GWO) to optimize the parameters of the LSTM, which overcomes the overfitting issue. We conduct experiments on a UK public highway traffic flow dataset, and the results show that the proposed method performs favorably against other state-of-the-art methods, with better adaptation to extreme outliers, delay effects, and trend-changing responses.
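The Sample Entropy step in this abstract can be illustrated with a small sketch. It is not the paper's implementation: the tolerance `r` is treated as an absolute value, and the helper `merge_by_entropy` with its `gap` threshold is a hypothetical merging rule chosen only to show how components with similar entropy might be summed together.

```python
import math


def sample_entropy(series, m=2, r=0.2):
    """Sample Entropy SampEn(m, r) of a 1-D sequence.

    Assumption: r is an absolute tolerance (it is often set relative
    to the standard deviation of the series instead).
    """
    n = len(series)
    if n <= m + 1:
        raise ValueError("series is too short for the chosen m")

    def count_matches(length):
        # Use the same n - m template start positions for both lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between two templates.
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)


def merge_by_entropy(components, gap=0.1, m=2, r=0.2):
    """Hypothetical merging rule: sort components by Sample Entropy and
    sum each component into the previous group when its entropy is
    within `gap` of the preceding component's entropy."""
    scored = sorted(
        ((sample_entropy(c, m, r), c) for c in components),
        key=lambda pair: pair[0],
    )
    groups = []
    last = None
    for entropy, component in scored:
        if last is not None and entropy - last <= gap:
            groups[-1] = [x + y for x, y in zip(groups[-1], component)]
        else:
            groups.append(list(component))
        last = entropy
    return groups
```

A perfectly regular series has zero Sample Entropy, so regular components end up in the same group, while noisier components stay separate and keep their own model input.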

RO Mar 12
Bridging the Awareness Gap: Socially Mediated State Externalization for Transparent Distributed Home Robots

Wenzheng Zhao, Manideep Duggi, Fengpei Yuan

Distributed multi-robot systems for the home often require robots to operate out of the user's sight, creating a state awareness gap that can diminish trust and perceived transparency and control. This paper investigates whether real-time, socially mediated state externalization can bridge this gap without compromising task performance. We developed a system in which a co-located social mediator robot (Pepper) externalizes the hidden execution states of an out-of-sight mobile manipulator (Stretch 3) for voice-driven object retrieval and delivery, with task-level states synchronized and externalized through verbal updates and a visual progress display. In a counterbalanced within-subject study (N=30), we compared a baseline of Autonomous Hidden Execution against Socially Mediated State Externalization. Our results show that externalization significantly increases user task-focused attention (from 15.8% to 84.6%, p<.001) and substantially improves perceived perspicuity, dependability, stimulation, and attractiveness (all p<.001). Furthermore, 83% of participants preferred the externalized condition, and this improvement in user experience was achieved without a statistically significant increase in end-to-end task completion time (p=.271). The results suggest that socially mediated state externalization is an effective architectural mechanism for designing more transparent and trustworthy distributed robot systems, ultimately enhancing user experience without sacrificing performance in distributed home robot deployments.

CV Mar 12
SafeScreen: A Safety-First Screening Framework for Personalized Video Retrieval for Vulnerable Users

Wenzheng Zhao, Madhava Kalyan Gadiputi, Fengpei Yuan

Open-domain video platforms offer rich, personalized content that could support health, caregiving, and educational applications, but their engagement-optimized recommendation algorithms can expose vulnerable users to inappropriate or harmful material. These risks are especially acute in child-directed and care settings (e.g., dementia care), where content must satisfy individualized safety constraints before being shown. We introduce SafeScreen, a safety-first video screening framework that retrieves and presents personalized video while enforcing individualized safety constraints. Rather than ranking videos by relevance or popularity, SafeScreen treats safety as a prerequisite and performs sequential approval or rejection of candidate videos through an automated pipeline. SafeScreen integrates three key components: (i) profile-driven extraction of individualized safety criteria, (ii) evidence-grounded assessments via adaptive question generation and multimodal VideoRAG analysis, and (iii) LLM-based decision-making that verifies safety, appropriateness, and relevance before content exposure. This design enables explainable, real-time screening of uncurated video repositories without relying on precomputed safety labels. We evaluate SafeScreen in a dementia-care reminiscence case study using 30 synthetic patient profiles and 90 test queries. Results demonstrate that SafeScreen prioritizes safety over engagement, diverging from YouTube's engagement-optimized rankings in 80-93% of cases, while maintaining high levels of safety coverage, sensibleness, and groundedness, as validated by both LLM-based evaluation and domain experts.
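The idea of treating safety as a prerequisite, rather than as one ranking signal among many, can be sketched as a sequential gate. This is only an illustration under stated assumptions: the function `screen_first_safe`, the dictionary layout of a candidate video, and the predicate checks are all hypothetical, and simple predicates stand in for the profile-driven criteria, VideoRAG evidence, and LLM-based decisions described in the abstract.

```python
from typing import Callable, Iterable, Optional

# A check takes a candidate video (a plain dict here) and returns
# True when the video passes that check.
Check = Callable[[dict], bool]


def screen_first_safe(candidates: Iterable[dict], checks: Iterable[Check]) -> Optional[dict]:
    """Return the first candidate that passes every check, or None.

    Candidates are examined one at a time in the order given; a video
    is approved only if all checks pass, so a highly ranked but unsafe
    video is rejected instead of being shown.
    """
    checks = list(checks)
    for video in candidates:
        if all(check(video) for check in checks):
            return video
    return None
```

In use, the checks would be ordered so that the safety predicate runs first, followed by appropriateness and relevance, and the caller falls back to showing nothing when `None` is returned.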

CL May 11, 2023
Improving Continual Relation Extraction by Distinguishing Analogous Semantics

Wenzheng Zhao, Yuanning Cui, Wei Hu

Continual relation extraction (RE) aims to learn constantly emerging relations while avoiding forgetting the learned relations. Existing works store a small number of typical samples to re-train the model for alleviating forgetting. However, repeatedly replaying these samples may cause the overfitting problem. We conduct an empirical study on existing works and observe that their performance is severely affected by analogous relations. To address this issue, we propose a novel continual extraction model for analogous relations. Specifically, we design memory-insensitive relation prototypes and memory augmentation to overcome the overfitting problem. We also introduce integrated training and focal knowledge distillation to enhance the performance on analogous relations. Experimental results show the superiority of our model and demonstrate its effectiveness in distinguishing analogous relations and overcoming overfitting.

RO Jan 12, 2022
Coverage Path Planning for Robotic Quality Inspection with Control on Measurement Uncertainty

Yinhua Liu, Wenzheng Zhao, Hongpeng Liu et al.

Optical scanning gauges mounted on robots are commonly used in quality inspection, such as verifying the dimensional specification of sheet structures. Coverage path planning (CPP) significantly influences the accuracy and efficiency of robotic quality inspection. Traditional CPP strategies focus on minimizing the number of viewpoints or the traveling distance of robots under the condition of full coverage inspection. The measurement uncertainty when collecting the scanning data is rarely considered in free-form surface inspection. To address this problem, a novel CPP method with an optimal viewpoint sampling strategy is proposed to incorporate the measurement uncertainty of key measurement points (MPs) into free-form surface inspection. First, the feasible ranges of measurement uncertainty are calculated based on the tolerance specifications of the MPs. The initial feasible viewpoint set is generated considering the measurement uncertainty and the visibility of MPs. Then, the inspection cost function is built to evaluate the number of selected viewpoints and the average measurement uncertainty in the fields of view (FOVs) of all the selected viewpoints. Afterward, an enhanced rapidly-exploring random tree (RRT*) algorithm is proposed for viewpoint sampling using the inspection cost function and CPP optimization. Case studies, including simulation tests and inspection experiments, have been conducted to evaluate the effectiveness of the proposed method. Results show that the scanning precision of key MPs is significantly improved compared with the benchmark method.

RO Jun 15, 2021
Task Allocation and Coordinated Motion Planning for Autonomous Multi-Robot Optical Inspection Systems

Yinhua Liu, Wenzheng Zhao, Tim Lutz et al.

Autonomous multi-robot optical inspection systems are increasingly applied for obtaining inline measurements in process monitoring and quality control. Numerous methods for path planning and robotic coordination have been developed for static and dynamic environments and applied to different fields. However, these approaches may not work for autonomous multi-robot optical inspection systems due to the fast computation requirements of inline optimization, the unique characteristics of robotic end-effector orientations, and complex large-scale free-form product surfaces. This paper proposes a novel task allocation methodology for coordinated motion planning of multi-robot inspection. Specifically, (1) a local robust inspection task allocation is proposed to achieve efficient and well-balanced measurement assignment among robots; (2) collision-free path planning and coordinated motion planning are developed via dynamic searching in robotic coordinate space and perturbation of probe poses or local paths in the conflicting robots. A case study shows that the proposed approach can mitigate the risk of collisions between robots and environments, resolve conflicts among robots, and reduce the inspection cycle time significantly and consistently.

RO May 15, 2020
Optimal Path Planning for Automated Dimensional Inspection of Free-Form Surfaces

Yinhua Liu, Wenzheng Zhao, Rui Sun et al.

Structural dimensional inspection is vital for process monitoring, quality control, and fault diagnosis in the mass production of auto bodies. Compared with non-contact measurement, the high-precision five-axis measuring machine with a touch-trigger probe is a preferred choice for data collection. It can assist manufacturers in performing accurate inspections quickly. With the increase of free-form surfaces and diverse surface orientations in auto body design, existing inspection approaches cannot efficiently capture some new critical features in the curvature of auto bodies. Therefore, new path planning methods are needed for automated dimensional inspection of free-form surfaces. This paper proposes an optimal path planning system for automated programming of measuring point inspection by incorporating probe rotations and effective collision detection. Specifically, the methodological contributions include: (i) a dynamic searching volume-based algorithm is developed to detect potential collisions in the local path between measurement points; (ii) a local path generation method is proposed that integrates the probe trajectory and the stylus rotation, and an inspection time matrix is then proposed to quantify the measuring time of diverse local paths; (iii) an optimization approach for the global inspection path over all critical points on the auto body is developed to minimize the total inspection time. A case study has been conducted on an auto body to verify the performance of the proposed method. Results show that the collision-free path for the free-form auto body can be generated automatically with off-line programming, and that the proposed method produces about 40% fewer dummy points and needs 32% less movement time in the auto body inspection process.
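The link between an inspection time matrix and a global path that minimizes total inspection time can be shown with a toy example. This sketch is an assumption-laden illustration, not the paper's optimizer: it uses exhaustive search, which is only practical for a handful of points, and the names `tour_time` and `best_inspection_order` are hypothetical. Entry `time_matrix[a][b]` is taken to be the measuring time of the local path from point `a` to point `b`, and the matrix may be asymmetric.

```python
from itertools import permutations


def tour_time(order, time_matrix):
    """Total time of visiting the points in `order`, summing the
    local-path time between each consecutive pair."""
    return sum(time_matrix[a][b] for a, b in zip(order, order[1:]))


def best_inspection_order(time_matrix, start=0):
    """Exhaustively search every visiting order that begins at `start`
    and return (order, total_time) for the fastest one.

    Assumption: the path is open, so the probe does not return to the
    starting point after the last measurement.
    """
    n = len(time_matrix)
    others = [i for i in range(n) if i != start]
    best = min(
        ((start,) + perm for perm in permutations(others)),
        key=lambda order: tour_time(order, time_matrix),
    )
    return list(best), tour_time(best, time_matrix)
```

For the four-point matrix `[[0, 2, 9, 10], [1, 0, 6, 4], [15, 7, 0, 8], [6, 3, 12, 0]]`, the search returns the order `[0, 1, 2, 3]` with a total time of 16. Realistic point counts would need a heuristic or a dedicated solver, since the number of orders grows factorially.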