Kuo Zhang

RO
h-index16
4papers
113citations
Novelty54%
AI Score41

4 Papers

ROMar 9, 2022
Regularized Deep Signed Distance Fields for Reactive Motion Generation

Puze Liu, Kuo Zhang, Davide Tateo et al.

Autonomous robots should operate in real-world dynamic environments and collaborate with humans in tight spaces. A key component for allowing robots to leave structured lab and manufacturing settings is their ability to evaluate online and real-time collisions with the world around them. Distance-based constraints are fundamental for enabling robots to plan their actions and act safely, protecting both humans and their hardware. However, different applications require different distance resolutions, leading to various heuristic approaches for measuring distance fields w.r.t. obstacles, which are computationally expensive and hinder their application in dynamic obstacle avoidance use-cases. We propose Regularized Deep Signed Distance Fields (ReDSDF), a single neural implicit function that can compute smooth distance fields at any scale, with fine-grained resolution over high-dimensional manifolds and articulated bodies like humans, thanks to our effective data generation and a simple inductive bias during training. We demonstrate the effectiveness of our approach in representative simulated tasks for whole-body control (WBC) and safe Human-Robot Interaction (HRI) in shared workspaces. Finally, we provide proof of concept of a real-world application in a HRI handover task with a mobile manipulator robot.

ROSep 27, 2022
Safe Reinforcement Learning of Dynamic High-Dimensional Robotic Tasks: Navigation, Manipulation, Interaction

Puze Liu, Kuo Zhang, Davide Tateo et al.

Safety is a crucial property of every robotic platform: any control policy should always comply with actuator limits and avoid collisions with the environment and humans. In reinforcement learning, safety is even more fundamental for exploring an environment without causing any damage. While there are many proposed solutions to the safe exploration problem, only a few of them can deal with the complexity of the real world. This paper introduces a new formulation of safe exploration for reinforcement learning of various robotic tasks. Our approach applies to a wide class of robotic platforms and enforces safety even under complex collision constraints learned from data by exploring the tangent space of the constraint manifold. Our proposed approach achieves state-of-the-art performance in simulated high-dimensional and dynamic tasks while avoiding collisions with the environment. We show safe real-world deployment of our learned controller on a TIAGo++ robot, achieving remarkable performance in manipulation and human-robot interaction tasks.

CVDec 15, 2025
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Team Seedance, Heyi Chen, Siyan Chen et al.

Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundational model engineered specifically for native, joint audio-video generation. Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional audio-visual synchronization and superior generation quality. To ensure practical utility, we implement meticulous post-training optimizations, including Supervised Fine-Tuning (SFT) on high-quality datasets and Reinforcement Learning from Human Feedback (RLHF) with multi-dimensional reward models. Furthermore, we introduce an acceleration framework that boosts inference speed by over 10X. Seedance 1.5 pro distinguishes itself through precise multilingual and dialect lip-syncing, dynamic cinematic camera control, and enhanced narrative coherence, positioning it as a robust engine for professional-grade content creation. Seedance 1.5 pro is now accessible on Volcano Engine at https://console.volcengine.com/ark/region:ark+cn-beijing/experience/vision?type=GenVideo.

LGJul 10, 2025
An Automated Classifier of Harmful Brain Activities for Clinical Usage Based on a Vision-Inspired Pre-trained Framework

Yulin Sun, Xiaopeng Si, Runnan He et al.

Timely identification of harmful brain activities via electroencephalography (EEG) is critical for brain disease diagnosis and treatment, which remains limited application due to inter-rater variability, resource constraints, and poor generalizability of existing artificial intelligence (AI) models. In this study, a convolutional neural network model, VIPEEGNet, was developed and validated using EEGs recorded from Massachusetts General Hospital/Harvard Medical School. The VIPEEGNet was developed and validated using two independent datasets, collected between 2006 and 2020. The development cohort included EEG recordings from 1950 patients, with 106,800 EEG segments annotated by at least one experts (ranging from 1 to 28). The online testing cohort consisted of EEG segments from a subset of an additional 1,532 patients, each annotated by at least 10 experts. For the development cohort (n=1950), the VIPEEGNet achieved high accuracy, with an AUROC for binary classification of seizure, LPD, GPD, LRDA, GRDA, and "other" categories at 0.972 (95% CI, 0.957-0.988), 0.962 (95% CI, 0.954-0.970), 0.972 (95% CI, 0.960-0.984), 0.938 (95% CI, 0.917-0.959), 0.949 (95% CI, 0.941-0.957), and 0.930 (95% CI, 0.926-0.935). For multi classification, the sensitivity of VIPEEGNET for the six categories ranges from 36.8% to 88.2% and the precision ranges from 55.6% to 80.4%, and performance similar to human experts. Notably, the external validation showed Kullback-Leibler Divergence (KLD)of 0.223 and 0.273, ranking top 2 among the existing 2,767 competing algorithms, while we only used 2.8% of the parameters of the first-ranked algorithm.