Hai Zhu

RO
18papers
617citations
Novelty49%
AI Score55

18 Papers

CLMar 9, 2023Code
BeamAttack: Generating High-quality Textual Adversarial Examples through Beam Search and Mixed Semantic Spaces

Hai Zhu, Qingyang Zhao, Yuren Wu

Natural language processing models based on neural networks are vulnerable to adversarial examples. These adversarial examples are imperceptible to human readers but can mislead models to make the wrong predictions. In a black-box setting, attacker can fool the model without knowing model's parameters and architecture. Previous works on word-level attacks widely use single semantic space and greedy search as a search strategy. However, these methods fail to balance the attack success rate, quality of adversarial examples and time consumption. In this paper, we propose BeamAttack, a textual attack algorithm that makes use of mixed semantic spaces and improved beam search to craft high-quality adversarial examples. Extensive experiments demonstrate that BeamAttack can improve attack success rate while saving numerous queries and time, e.g., improving at most 7\% attack success rate than greedy search when attacking the examples from MR dataset. Compared with heuristic search, BeamAttack can save at most 85\% model queries and achieve a competitive attack success rate. The adversarial examples crafted by BeamAttack are highly transferable and can effectively improve model's robustness during adversarial training. Code is available at https://github.com/zhuhai-ustc/beamattack/tree/master

CLAug 1, 2023
LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack

Hai Zhu, Zhaoqing Yang, Weiwei Shang et al.

Natural language processing models are vulnerable to adversarial examples. Previous textual adversarial attacks adopt gradients or confidence scores to calculate word importance ranking and generate adversarial examples. However, this information is unavailable in the real world. Therefore, we focus on a more realistic and challenging setting, named hard-label attack, in which the attacker can only query the model and obtain a discrete prediction label. Existing hard-label attack algorithms tend to initialize adversarial examples by random substitution and then utilize complex heuristic algorithms to optimize the adversarial perturbation. These methods require a lot of model queries and the attack success rate is restricted by adversary initialization. In this paper, we propose a novel hard-label attack algorithm named LimeAttack, which leverages a local explainable method to approximate word importance ranking, and then adopts beam search to find the optimal solution. Extensive experiments show that LimeAttack achieves the better attacking performance compared with existing hard-label attack under the same query budget. In addition, we evaluate the effectiveness of LimeAttack on large language models, and results indicate that adversarial examples remain a significant threat to large language models. The adversarial examples crafted by LimeAttack are highly transferable and effectively improve model robustness in adversarial training.

52.9ROApr 9
Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models

Xingyu Xia, Lekai Zhou, Yujie Tang et al.

Aerial vision-and-language navigation (Aerial VLN) aims to enable unmanned aerial vehicles (UAVs) to interpret natural language instructions and autonomously navigate complex three-dimensional environments by grounding language in visual perception. This survey provides a critical and analytical review of the Aerial VLN field, with particular attention to the recent integration of large language models (LLMs) and vision-language models (VLMs). We first formally introduce the Aerial VLN problem and define two interaction paradigms: single-instruction and dialog-based, as foundational axes. We then organize the body of Aerial VLN methods into a taxonomy of five architectural categories: sequence-to-sequence and attention-based methods, end-to-end LLM/VLM methods, hierarchical methods, multi-agent methods, and dialog-based navigation methods. For each category, we systematically analyze design rationales, technical trade-offs, and reported performance. We critically assess the evaluation infrastructure for Aerial VLN, including datasets, simulation platforms, and metrics, and identify their gaps in scale, environmental diversity, real-world grounding, and metric coverage. We consolidate cross-method comparisons on shared benchmarks and analyze key architectural trade-offs, including discrete versus continuous actions, end-to-end versus hierarchical designs, and the simulation-to-reality gap. Finally, we synthesize seven concrete open problems: long-horizon instruction grounding, viewpoint robustness, scalable spatial representation, continuous 6-DoF action execution, onboard deployment, benchmark standardization, and multi-UAV swarm navigation, with specific research directions grounded in the evidence presented throughout the survey.

32.7ROApr 17
Robust Real-Time Coordination of CAVs: A Distributed Optimization Framework under Uncertainty

Haojie Bai, Tingting Zhang, Cong Guo et al.

Achieving both safety guarantees and real-time performance in cooperative vehicle coordination remains a fundamental challenge, particularly in dynamic and uncertain environments. Existing methods often suffer from insufficient uncertainty treatment in safety modeling, which intertwines with the heavy computational burden under complex multi-vehicle coupling. This paper presents a novel coordination framework that resolves this challenge through three key innovations: 1) direct control of vehicles' trajectory distributions during coordination, formulated as a robust cooperative planning problem with adaptive enhanced safety constraints, ensuring a specified level of safety regarding the uncertainty of the interactive trajectory, 2) a fully parallel ADMM-based distributed trajectory negotiation (ADMM-DTN) algorithm that efficiently solves the optimization problem while allowing configurable negotiation rounds to balance solution quality and computational resources, and 3) an interactive attention mechanism that selectively focuses on critical interactive participants to further enhance computational efficiency. Simulation results demonstrate that our framework achieves significant advantages in safety (reducing collision rates by up to 40.79\% in various scenarios) and real-time performance compared to representative benchmarks, while maintaining strong scalability with increasing vehicle numbers. The proposed interactive attention mechanism further reduces the computational demand by 15.4\%. Real-world experiments further validate robustness and real-time feasibility with unexpected dynamic obstacles, demonstrating reliable coordination in complex traffic scenes. The experiment demo could be found at https://youtu.be/4PZwBnCsb6Q.

75.3NAApr 8
Slip optimization on arbitrary 3D microswimmers: a reduced-dimension and boundary-integral framework

Marc Bonnet, Kausik Das, Shravan Veerapaneni et al.

This article presents a computational framework for determining the optimal slip velocity of a microswimmer with arbitrary three-dimensional geometry suspended in a viscous fluid. The objective is to minimize the hydrodynamic power dissipation required to maintain unit speed along the net swimming direction. By exploiting the linearity of the Stokes equations and the Lorentz reciprocal theorem, we derive an explicit linear operator that maps the tangential surface slip velocity to the resulting rigid-body translational and rotational velocities, effectively decoupling the hydrodynamic boundary value problem from the optimization loop. The a priori infinite-dimensional search space for the slip optimization is reduced to the finite dimension $r$ of rigid-body motions by finding an appropriate subspace of the operator's domain. This reduces the PDE-constrained optimization to a low-dimensional programming problem that can be solved at negligible computational cost once the system matrices are assembled. The optimization algorithm requires 2$r$ auxiliary flow problems that are solved numerically using a high-order boundary integral method. We validate the accuracy of the proposed method and present optimal slip profiles and swimming trajectories for a variety of microswimmer shapes. We investigate the effect of some common geometrical symmetries of the swimmer shape on the resulting optimal motion, and in particular present a modified version of the slip optimization algorithm for axisymmetric shapes, where tangential rigid-body velocities may occur

61.4IRMar 24
GateSID: Adaptive Gating for Semantic-Collaborative Alignment in Cold-Start Recommendation

Hai Zhu, Yantao Yu, Lei Shen et al.

In cold-start scenarios, the scarcity of collaborative signals for new items exacerbates the Matthew effect, which undermines platform diversity and remains a persistent challenge in real-world recommender systems. Existing methods typically enhance collaborative signals with semantic information, but they often suffer from a collaborative-semantic tradeoff: collaborative signals are effective for popular items but unreliable for cold-start items, whereas over-reliance on semantic information may obscure meaningful collaborative differences. To address this issue, we propose GateSID, a framework that uses an adaptive gating network to dynamically balance semantic and collaborative signals according to item maturity. Specifically, we first discretize multimodal features into hierarchical Semantic IDs using Residual Quantized VAE. Building on this representation, we design two key components: (1) Gating-Fused Shared Attention, which fuses intra-modal attention distributions with item-level gating weights derived from embeddings and statistical features; and (2) Gate-Regulated Contrastive Alignment, which adaptively calibrates cross-modal alignment, enforcing stronger semantic-behavior consistency for cold-start items while relaxing the constraint for popular items to preserve reliable collaborative signals. Extensive offline experiments on large-scale industrial datasets demonstrate that GateSID consistently outperforms strong baselines. Online A/B tests further confirm its practical value, yielding +2.6% GMV, +1.1% CTR, and +1.6% orders with less than 5 ms additional latency.

60.1CRMar 13
Bipartite Randomized Response Mechanism for Local Differential Privacy

Shun Zhang, Hai Zhu, Zhili Chen et al.

With the increasing importance of data privacy, Local Differential Privacy (LDP) has recently become a strong measure of privacy for protecting each user's privacy from data analysts without relying on a trusted third party. In this paper, we consider the problem of high-utility differentially private release. Given a domain of items and a distance-defined utility function, our goal is to design a differentially private mechanism that releases an item with the global expected error as small as possible. The most common LDP mechanism for this task is the Generalized Randomized Response (GRR) mechanism that treats all candidate items equally except for the true item. In this paper, we introduce Bipartite Randomized Response mechanism (BRR), which adaptively divides all candidate items into two parts by utility rankings. In the local search phase, we confirm how many high-utility candidates to be assigned with high release probability, which gives the locally optimal bipartite classification of all candidates. For preserving LDP, the global search phase uniformly selects the smallest number of dynamic high-utility candidates obtained locally. In particular, we give explicit formulas on the uniform number of dynamic high-utility candidates. The global expected error of our BRR can theoretically deliver a decrease with an asymptotically exact ratio, and when the privacy budget is set to $3$ the expected error can be reduced by $66.4\%$. Extensive experiments demonstrate that BRR outperforms the state-of-the-art methods across the standard metrics and datasets.

ROFeb 15, 2022
Improving Pedestrian Prediction Models with Self-Supervised Continual Learning

Luzia Knoedler, Chadi Salmi, Hai Zhu et al.

Autonomous mobile robots require accurate human motion predictions to safely and efficiently navigate among pedestrians, whose behavior may adapt to environmental changes. This paper introduces a self-supervised continual learning framework to improve data-driven pedestrian prediction models online across various scenarios continuously. In particular, we exploit online streams of pedestrian data, commonly available from the robot's detection and tracking pipeline, to refine the prediction model and its performance in unseen scenarios. To avoid the forgetting of previously learned concepts, a problem known as catastrophic forgetting, our framework includes a regularization loss to penalize changes of model parameters that are important for previous scenarios and retrains on a set of previous examples to retain past knowledge. Experimental results on real and simulation data show that our approach can improve prediction performance in unseen scenarios while retaining knowledge from seen scenarios when compared to naively training the prediction model online.

ROJan 11, 2022
Decentralized Probabilistic Multi-Robot Collision Avoidance Using Buffered Uncertainty-Aware Voronoi Cells

Hai Zhu, Bruno Brito, Javier Alonso-Mora

In this paper, we present a decentralized and communication-free collision avoidance approach for multi-robot systems that accounts for both robot localization and sensing uncertainties. The approach relies on the computation of an uncertainty-aware safe region for each robot to navigate among other robots and static obstacles in the environment, under the assumption of Gaussian-distributed uncertainty. In particular, at each time step, we construct a chance-constrained buffered uncertainty-aware Voronoi cell (B-UAVC) for each robot given a specified collision probability threshold. Probabilistic collision avoidance is achieved by constraining the motion of each robot to be within its corresponding B-UAVC, i.e. the collision probability between the robots and obstacles remains below the specified threshold. The proposed approach is decentralized, communication-free, scalable with the number of robots and robust to robots' localization and sensing uncertainties. We applied the approach to single-integrator, double-integrator, differential-drive robots, and robots with general nonlinear dynamics. Extensive simulations and experiments with a team of ground vehicles, quadrotors, and heterogeneous robot teams are performed to analyze and validate the proposed approach.

SDOct 11, 2021
Pitch Preservation In Singing Voice Synthesis

Shujun Liu, Hai Zhu, Kun Wang et al.

Suffering from limited singing voice corpus, existing singing voice synthesis (SVS) methods that build encoder-decoder neural networks to directly generate spectrogram could lead to out-of-tune issues during the inference phase. To attenuate these issues, this paper presents a novel acoustic model with independent pitch encoder and phoneme encoder, which disentangles the phoneme and pitch information from music score to fully utilize the corpus. Specifically, according to equal temperament theory, the pitch encoder is constrained by a pitch metric loss that maps distances between adjacent input pitches into corresponding frequency multiples between the encoder outputs. For the phoneme encoder, based on the analysis that same phonemes corresponding to varying pitches can produce similar pronunciations, this encoder is followed by an adversarially trained pitch classifier to enforce the identical phonemes with different pitches mapping into the same phoneme feature space. By these means, the sparse phonemes and pitches in original input spaces can be transformed into more compact feature spaces respectively, where the same elements cluster closely and cooperate mutually to enhance synthesis quality. Then, the outputs of the two encoders are summed together to pass through the following decoder in the acoustic model. Experimental results indicate that the proposed approaches can characterize intrinsic structure between pitch inputs to obtain better pitch synthesis accuracy and achieve superior singing synthesis performance against the advanced baseline system.

ROMar 17, 2021
Online Informative Path Planning for Active Information Gathering of a 3D Surface

Hai Zhu, Jen Jen Chung, Nicholas R. J. Lawrance et al.

This paper presents an online informative path planning approach for active information gathering on three-dimensional surfaces using aerial robots. Most existing works on surface inspection focus on planning a path offline that can provide full coverage of the surface, which inherently assumes the surface information is uniformly distributed hence ignoring potential spatial correlations of the information field. In this paper, we utilize manifold Gaussian processes (mGPs) with geodesic kernel functions for mapping surface information fields and plan informative paths online in a receding horizon manner. Our approach actively plans information-gathering paths based on recent observations that respect dynamic constraints of the vehicle and a total flight time budget. We provide planning results for simulated temperature modeling for simple and complex 3D surface geometries (a cylinder and an aircraft model). We demonstrate that our informative planning method outperforms traditional approaches such as 3D coverage planning and random exploration, both in reconstruction error and information-theoretic metrics. We also show that by taking spatial correlations of the information field into planning using mGPs, the information gathering efficiency is significantly improved.

ROFeb 10, 2021
Learning Interaction-Aware Trajectory Predictions for Decentralized Multi-Robot Motion Planning in Dynamic Environments

Hai Zhu, Francisco Martinez Claramunt, Bruno Brito et al.

This paper presents a data-driven decentralized trajectory optimization approach for multi-robot motion planning in dynamic environments. When navigating in a shared space, each robot needs accurate motion predictions of neighboring robots to achieve predictive collision avoidance. These motion predictions can be obtained among robots by sharing their future planned trajectories with each other via communication. However, such communication may not be available nor reliable in practice. In this paper, we introduce a novel trajectory prediction model based on recurrent neural networks (RNN) that can learn multi-robot motion behaviors from demonstrated trajectories generated using a centralized sequential planner. The learned model can run efficiently online for each robot and provide interaction-aware trajectory predictions of its neighbors based on observations of their history states. We then incorporate the trajectory prediction model into a decentralized model predictive control (MPC) framework for multi-robot collision avoidance. Simulation results show that our decentralized approach can achieve a comparable level of performance to a centralized planner while being communication-free and scalable to a large number of robots. We also validate our approach with a team of quadrotors in real-world experiments.

NIDec 10, 2020
Edge Computing Assisted Autonomous Flight for UAV: Synergies between Vision and Communications

Quan Chen, Hai Zhu, Lei Yang et al.

Autonomous flight for UAVs relies on visual information for avoiding obstacles and ensuring a safe collision-free flight. In addition to visual clues, safe UAVs often need connectivity with the ground station. In this paper, we study the synergies between vision and communications for edge computing-enabled UAV flight. By proposing a framework of Edge Computing Assisted Autonomous Flight (ECAAF), we illustrate that vision and communications can interact with and assist each other with the aid of edge computing and offloading, and further speed up the UAV mission completion. ECAAF consists of three functionalities that are discussed in detail: edge computing for 3D map acquisition, radio map construction from the 3D map, and online trajectory planning. During ECAAF, the interactions of communication capacity, video offloading, 3D map quality, and channel state of the trajectory form a positive feedback loop. Simulation results verify that the proposed method can improve mission performance by enhancing connectivity. Finally, we conclude with some future research directions.

ROOct 18, 2020
Social-VRNN: One-Shot Multi-modal Trajectory Prediction for Interacting Pedestrians

Bruno Brito, Hai Zhu, Wei Pan et al.

Prediction of human motions is key for safe navigation of autonomous robots among humans. In cluttered environments, several motion hypotheses may exist for a pedestrian, due to its interactions with the environment and other pedestrians. Previous works for estimating multiple motion hypotheses require a large number of samples which limits their applicability in real-time motion planning. In this paper, we present a variational learning approach for interaction-aware and multi-modal trajectory prediction based on deep generative neural networks. Our approach can achieve faster convergence and requires significantly fewer samples comparing to state-of-the-art methods. Experimental results on real and simulation data show that our model can effectively learn to infer different trajectories. We compare our method with three baseline approaches and present performance results demonstrating that our generative model can achieve higher accuracy for trajectory prediction by producing diverse trajectories.

ROSep 25, 2020
With Whom to Communicate: Learning Efficient Communication for Multi-Robot Collision Avoidance

Álvaro Serra-Gómez, Bruno Brito, Hai Zhu et al.

Decentralized multi-robot systems typically perform coordinated motion planning by constantly broadcasting their intentions as a means to cope with the lack of a central system coordinating the efforts of all robots. Especially in complex dynamic environments, the coordination boost allowed by communication is critical to avoid collisions between cooperating robots. However, the risk of collision between a pair of robots fluctuates through their motion and communication is not always needed. Additionally, constant communication makes much of the still valuable information shared in previous time steps redundant. This paper presents an efficient communication method that solves the problem of "when" and with "whom" to communicate in multi-robot collision avoidance scenarios. In this approach, every robot learns to reason about other robots' states and considers the risk of future collisions before asking for the trajectory plans of other robots. We evaluate and verify the proposed communication strategy in simulation with four quadrotors and compare it with three baseline strategies: non-communicating, broadcasting and a distance-based method broadcasting information with quadrotors within a predefined distance.

ROFeb 12, 2020
Robust Vision-based Obstacle Avoidance for Micro Aerial Vehicles in Dynamic Environments

Jiahao Lin, Hai Zhu, Javier Alonso-Mora

In this paper, we present an on-board vision-based approach for avoidance of moving obstacles in dynamic environments. Our approach relies on an efficient obstacle detection and tracking algorithm based on depth image pairs, which provides the estimated position, velocity and size of the obstacles. Robust collision avoidance is achieved by formulating a chance-constrained model predictive controller (CC-MPC) to ensure that the collision probability between the micro aerial vehicle (MAV) and each moving obstacle is below a specified threshold. The method takes into account MAV dynamics, state estimation and obstacle sensing uncertainties. The proposed approach is implemented on a quadrotor equipped with a stereo camera and is tested in a variety of environments, showing effective on-line collision avoidance of moving obstacles.

AIMar 21, 2019
Iteratively Learning Embeddings and Rules for Knowledge Graph Reasoning

Wen Zhang, Bibek Paudel, Liang Wang et al.

Reasoning is essential for the development of large knowledge graphs, especially for completion, which aims to infer new triples based on existing ones. Both rules and embeddings can be used for knowledge graph reasoning and they have their own advantages and difficulties. Rule-based reasoning is accurate and explainable but rule learning with searching over the graph always suffers from efficiency due to huge search space. Embedding-based reasoning is more scalable and efficient as the reasoning is conducted via computation between embeddings, but it has difficulty learning good representations for sparse entities because a good embedding relies heavily on data richness. Based on this observation, in this paper we explore how embedding and rule learning can be combined together and complement each other's difficulties with their advantages. We propose a novel framework IterE iteratively learning embeddings and rules, in which rules are learned from embeddings with proper pruning strategy and embeddings are learned from existing triples and new triples inferred by rules. Evaluations on embedding qualities of IterE show that rules help improve the quality of sparse entity embeddings and their link prediction results. We also evaluate the efficiency of rule learning and quality of rules from IterE compared with AMIE+, showing that IterE is capable of generating high quality rules more efficiently. Experiments show that iteratively learning embeddings and rules benefit each other during learning and prediction.