Xusheng Luo

h-index7

17papers

315citations

Novelty51%

AI Score47

Ranked #57,900 of 205,806 authors (top 28%)#1,700 in RO (top 22%)

17 Papers

95.0CVJun 2Code

VLESA: Vision-Language Embodied Safety Agent for Human Activity Monitoring

Hanjiang Hu, Yiyuan Pan, Jiaxing Li et al.

As AI systems increasingly assist humans in physical tasks, ensuring safety becomes paramount -- physical actions carry immediate and irreversible consequences that digital errors do not. We introduce the Vision-Language Embodied Safety Agent (VLESA), a framework that monitors human activities from egocentric video and triggers real-time safety interventions when dangerous actions are predicted. VLESA addresses intent-dependent safety where identical actions can be safe or dangerous depending on context. A dataset pairing egocentric frames with goal-conditioned safety annotations is introduced, enabling a goal-conditioned safety Q-filter trained via GRPO that evaluates actions with respect to inferred intent without retraining. On top of that, an intent-action prediction agent is proposed to jointly infer goals and predict future actions from video. On the ASIMOV-2.0 benchmark, VLESA achieves higher intervention accuracy at the exact ground-truth frame compared to baselines, while the GRPO-trained Q-filter improves action safety by over 41 percentage points through goal-conditioned constrained decoding. Code is available at https://github.com/HanjiangHu/VLESA.

ROAug 20, 2023

Decomposition-based Hierarchical Task Allocation and Planning for Multi-Robots under Hierarchical Temporal Logic Specifications

Xusheng Luo, Shaojun Xu, Ruixuan Liu et al.

Past research into robotic planning with temporal logic specifications, notably Linear Temporal Logic (LTL), was largely based on a single formula for individual or groups of robots. But with increasing task complexity, LTL formulas unavoidably grow lengthy, complicating interpretation and specification generation, and straining the computational capacities of the planners. A recent development has been the hierarchical representation of LTL~\cite{luo2024simultaneous} that contains multiple temporal logic specifications, providing a more interpretable framework. However, the proposed planning algorithm assumes the independence of robots within each specification, limiting their application to multi-robot coordination with complex temporal constraints. In this work, we formulated a decomposition-based hierarchical framework. At the high level, each specification is first decomposed into a set of atomic sub-tasks. We further infer the temporal relations among the sub-tasks of different specifications to construct a task network. Subsequently, a Mixed Integer Linear Program is used to assign sub-tasks to various robots. At the lower level, domain-specific controllers are employed to execute sub-tasks. Our approach was experimentally applied to domains of navigation and manipulation. The simulation demonstrated that our approach can find better solutions using less runtimes.

CVJul 31, 2024

Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods

Xusheng Luo, Tianhao Wei, Simin Liu et al.

This work addresses the certification of the local robustness of vision-based two-stage 6D object pose estimation. The two-stage method for object pose estimation achieves superior accuracy by first employing deep neural network-driven keypoint regression and then applying a Perspective-n-Point (PnP) technique. Despite advancements, the certification of these methods' robustness remains scarce. This research aims to fill this gap with a focus on their local robustness on the system level--the capacity to maintain robust estimations amidst semantic input perturbations. The core idea is to transform the certification of local robustness into neural network verification for classification tasks. The challenge is to develop model, input, and output specifications that align with off-the-shelf verification tools. To facilitate verification, we modify the keypoint detection model by substituting nonlinear operations with those more amenable to the verification processes. Instead of injecting random noise into images, as is common, we employ a convex hull representation of images as input specifications to more accurately depict semantic perturbations. Furthermore, by conducting a sensitivity analysis, we propagate the robustness criteria from pose to keypoint accuracy, and then formulating an optimal error threshold allocation problem that allows for the setting of a maximally permissible keypoint deviation thresholds. Viewing each pixel as an individual class, these thresholds result in linear, classification-akin output specifications. Under certain conditions, we demonstrate that the main components of our certification framework are both sound and complete, and validate its effects through extensive evaluations on realistic perturbations. To our knowledge, this is the first study to certify the robustness of large-scale, keypoint-based pose estimation given images in real-world scenarios.

ROAug 15, 2024

Nl2Hltl2Plan: Scaling Up Natural Language Understanding for Multi-Robots Through Hierarchical Temporal Logic Task Representation

Shaojun Xu, Xusheng Luo, Yutong Huang et al.

To enable non-experts to specify long-horizon, multi-robot collaborative tasks, language models are increasingly used to translate natural language commands into formal specifications. However, because translation can occur in multiple ways, such translations may lack accuracy or lead to inefficient multi-robot planning. Our key insight is that concise hierarchical specifications can simplify planning while remaining straightforward to derive from human instructions. We propose Nl2Hltl2Plan, a framework that translates natural language commands into hierarchical Linear Temporal Logic (LTL) and solves the corresponding planning problem. The translation involves two steps leveraging Large Language Models (LLMs). First, an LLM transforms instructions into a Hierarchical Task Tree, capturing logical and temporal relations. Next, a fine-tuned LLM converts sub-tasks into flat LTL formulas, which are aggregated into hierarchical specifications, with the lowest level corresponding to ordered robot actions. These specifications are then used with off-the-shelf planners. Our Nl2Hltl2Plan demonstrates the potential of LLMs in hierarchical reasoning for multi-robot task planning. Evaluations in simulation and real-world experiments with human participants show that Nl2Hltl2Plan outperforms existing methods, handling more complex instructions while achieving higher success rates and lower costs in task allocation and planning. Additional details are available at https://nl2hltl2plan.github.io .

72.3ROMar 26

Emergent Neural Automaton Policies: Learning Symbolic Structure from Visuomotor Trajectories

Yiyuan Pan, Xusheng Luo, Hanjiang Hu et al.

Scaling robot learning to long-horizon tasks remains a formidable challenge. While end-to-end policies often lack the structural priors needed for effective long-term reasoning, traditional neuro-symbolic methods rely heavily on hand-crafted symbolic priors. To address the issue, we introduce ENAP (Emergent Neural Automaton Policy), a framework that allows a bi-level neuro-symbolic policy adaptively emerge from visuomotor demonstrations. Specifically, we first employ adaptive clustering and an extension of the L* algorithm to infer a Mealy state machine from visuomotor data, which serves as an interpretable high-level planner capturing latent task modes. Then, this discrete structure guides a low-level reactive residual network to learn precise continuous control via behavior cloning (BC). By explicitly modeling the task structure with discrete transitions and continuous residuals, ENAP achieves high sample efficiency and interpretability without requiring task-specific labels. Extensive experiments on complex manipulation and long-horizon tasks demonstrate that ENAP outperforms state-of-the-art (SoTA) end-to-end VLA policies by up to 27% in low-data regimes, while offering a structured representation of robotic intent (Fig. 1).

ROJan 8, 2024

Simultaneous Task Allocation and Planning for Multi-Robots under Hierarchical Temporal Logic Specifications

Xusheng Luo, Changliu Liu

Research in robotic planning with temporal logic specifications, such as Linear Temporal Logic (LTL), has relied on single formulas. However, as task complexity increases, LTL formulas become lengthy, making them difficult to interpret and generate, and straining the computational capacities of planners. To address this, we introduce a hierarchical structure for a widely used specification type -- LTL on finite traces (LTL$_f$). The resulting language, termed H-LTL$_f$, is defined with both its syntax and semantics. We further prove that H-LTL$_f$ is more expressive than its standard "flat" counterparts. Moreover, we conducted a user study that compared the standard LTL$_f$ with our hierarchical version and found that users could more easily comprehend complex tasks using the hierarchical structure. We develop a search-based approach to synthesize plans for multi-robot systems, achieving simultaneous task allocation and planning. This method approximates the search space by loosely interconnected sub-spaces, each corresponding to an LTL$_f$ specification. The search primarily focuses on a single sub-space, transitioning to another under conditions determined by the decomposition of automata. We develop multiple heuristics to significantly expedite the search. Our theoretical analysis, conducted under mild assumptions, addresses completeness and optimality. Compared to existing methods used in various simulators for service tasks, our approach improves planning times while maintaining comparable solution quality.

LGJun 30, 2024

ModelVerification.jl: a Comprehensive Toolbox for Formally Verifying Deep Neural Networks

Tianhao Wei, Hanjiang Hu, Luca Marzari et al.

Deep Neural Networks (DNN) are crucial in approximating nonlinear functions across diverse applications, ranging from image classification to control. Verifying specific input-output properties can be a highly challenging task due to the lack of a single, self-contained framework that allows a complete range of verification types. To this end, we present \texttt{ModelVerification.jl (MV)}, the first comprehensive, cutting-edge toolbox that contains a suite of state-of-the-art methods for verifying different types of DNNs and safety specifications. This versatile toolbox is designed to empower developers and machine learning practitioners with robust tools for verifying and ensuring the trustworthiness of their DNN models.

SYMar 8, 2021

Formal Verification of Stochastic Systems with ReLU Neural Network Controllers

Shiqi Sun, Yan Zhang, Xusheng Luo et al.

In this work, we address the problem of formal safety verification for stochastic cyber-physical systems (CPS) equipped with ReLU neural network (NN) controllers. Our goal is to find the set of initial states from where, with a predetermined confidence, the system will not reach an unsafe configuration within a specified time horizon. Specifically, we consider discrete-time LTI systems with Gaussian noise, which we abstract by a suitable graph. Then, we formulate a Satisfiability Modulo Convex (SMC) problem to estimate upper bounds on the transition probabilities between nodes in the graph. Using this abstraction, we propose a method to compute tight bounds on the safety probabilities of nodes in this graph, despite possible over-approximations of the transition probabilities between these nodes. Additionally, using the proposed SMC formula, we devise a heuristic method to refine the abstraction of the system in order to further improve the estimated safety bounds. Finally, we corroborate the efficacy of the proposed method with simulation results considering a robot navigation example and comparison against a state-of-the-art verification scheme.

ROJan 14, 2021

Temporal Logic Task Allocation in Heterogeneous Multi-Robot Systems

Xusheng Luo, Michael M. Zavlanos

In this paper, we consider the problem of optimally allocating tasks, expressed as global Linear Temporal Logic (LTL) specifications, to teams of heterogeneous mobile robots. The robots are classified in different types that capture their different capabilities, and each task may require robots of multiple types. The specific robots assigned to each task are immaterial, as long as they are of the desired type. Given a discrete workspace, our goal is to design paths, i.e., sequences of discrete states, for the robots so that the LTL specification is satisfied. To obtain a scalable solution to this complex temporal logic task allocation problem, we propose a hierarchical approach that first allocates specific robots to tasks using the information about the tasks contained in the Nondeterministic Buchi Automaton (NBA) that captures the LTL specification, and then designs low-level executable plans for the robots that respect the high-level assignment. Specifically, we first prune and relax the NBA by removing all negative atomic propositions. This step is motivated by "lazy collision checking" methods in robotics and allows to simplify the planning problem by checking constraint satisfaction only when needed. Then, we extract sequences of subtasks from the relaxed NBA along with their temporal orders, and formulate a Mixed Integer Linear Program (MILP) to allocate these subtasks to the robots. Finally, we define generalized multi-robot path planning problems to obtain low-level executable robot plans that satisfy both the high-level task allocation and the temporal constraints captured by the negative atomic propositions in the original NBA. We show that our method is complete for a subclass of LTL that covers a broad range of tasks and present numerical simulations demonstrating that it can generate paths with lower cost, considerably faster than existing methods.

RONov 3, 2020

Human-in-the-Loop Robot Planning with Non-Contextual Bandit Feedback

Yijie Zhou, Yan Zhang, Xusheng Luo et al.

In this paper, we consider a robot navigation problem in environments populated by humans. The goal is to determine collision-free and dynamically feasible trajectories that also maximize human satisfaction. This is because they may drive the robot close to humans that need help with their work or because they may keep the robot away from humans when it can interfere with human sight or work. In practice, human satisfaction is subjective and hard to describe mathematically. As a result, the planning problem we consider in this paper may lack important contextual information. To address this challenge, we propose a semi-supervised Bayesian Optimization (BO) method to design globally optimal robot trajectories using non-contextual bandit human feedback in the form of complaints or satisfaction ratings that express how satisfactory a trajectory is, without revealing the reason. Since trajectory planning is typically a high-dimensional optimization problem in the space of waypoints that define a trajectory, BO may require prohibitively many queries for human feedback to return a good solution. To this end, we use an autoencoder to reduce the high-dimensional problem space into a low dimensional latent space, which we update using human feedback. Moreover, we improve the exploration efficiency of BO by biasing the search for new trajectories towards dynamically feasible and collision-free trajectories obtained using off-the-shelf motion planners. We demonstrate the efficiency of our proposed trajectory planning method in a scenario with humans that have diversified and unknown demands.

IRMar 30, 2020

AliCoCo: Alibaba E-commerce Cognitive Concept Net

Xusheng Luo, Luxin Liu, Yonghua Yang et al.

One of the ultimate goals of e-commerce platforms is to satisfy various shopping needs for their customers. Much efforts are devoted to creating taxonomies or ontologies in e-commerce towards this goal. However, user needs in e-commerce are still not well defined, and none of the existing ontologies has the enough depth and breadth for universal user needs understanding. The semantic gap in-between prevents shopping experience from being more intelligent. In this paper, we propose to construct a large-scale e-commerce cognitive concept net named "AliCoCo", which is practiced in Alibaba, the largest Chinese e-commerce platform in the world. We formally define user needs in e-commerce, then conceptualize them as nodes in the net. We present details on how AliCoCo is constructed semi-automatically and its successful, ongoing and potential applications in e-commerce.

ROMar 2, 2020

Socially-Aware Robot Planning via Bandit Human Feedback

Xusheng Luo, Yan Zhang, Michael M. Zavlanos

In this paper, we consider the problem of designing collision-free, dynamically feasible, and socially-aware trajectories for robots operating in environments populated by humans. We define trajectories to be social-aware if they do not interfere with humans in any way that causes discomfort. In this paper, discomfort is defined broadly and, depending on specific individuals, it can result from the robot being too close to a human or from interfering with human sight or tasks. Moreover, we assume that human feedback is a bandit feedback indicating a complaint or no complaint on the part of the robot trajectory that interferes with the humans, and it does not reveal any contextual information about the locations of the humans or the reason for a complaint. Finally, we assume that humans can move in the obstacle-free space and, as a result, human utility can change. We formulate this planning problem as an online optimization problem that minimizes the social value of the time-varying robot trajectory, defined by the total number of incurred human complaints. As the human utility is unknown, we employ zeroth order, or derivative-free, optimization methods to solve this problem, which we combine with off-the-shelf motion planners to satisfy the dynamic feasibility and collision-free specifications of the resulting trajectories. To the best of our knowledge, this is a new framework for socially-aware robot planning that is not restricted to avoiding collisions with humans but, instead, focuses on increasing the social value of the robot trajectories using only bandit human feedback.

IROct 8, 2019

Conceptualize and Infer User Needs in E-commerce

Xusheng Luo, Yonghua Yang, Kenny Q. Zhu et al.

Understanding latent user needs beneath shopping behaviors is critical to e-commercial applications. Without a proper definition of user needs in e-commerce, most industry solutions are not driven directly by user needs at current stage, which prevents them from further improving user satisfaction. Representing implicit user needs explicitly as nodes like "outdoor barbecue" or "keep warm for kids" in a knowledge graph, provides new imagination for various e- commerce applications. Backed by such an e-commerce knowledge graph, we propose a supervised learning algorithm to conceptualize user needs from their transaction history as "concept" nodes in the graph and infer those concepts for each user through a deep attentive model. Offline experiments demonstrate the effectiveness and stability of our model, and online industry strength tests show substantial advantages of such user needs understanding.

ROSep 2, 2019

An Abstraction-Free Method for Multi-Robot Temporal Logic Optimal Control Synthesis

Xusheng Luo, Yiannis Kantaros, Michael M. Zavlanos

The majority of existing Linear Temporal Logic (LTL) planning methods rely on the construction of a discrete product automaton, that combines a discrete abstraction of robot mobility and a B$\ddot{\text{u}}$chi automaton that captures the LTL specification. Representing this product automaton as a graph and using graph search techniques, optimal plans that satisfy the LTL task can be synthesized. However, constructing expressive discrete abstractions makes the synthesis problem computationally intractable. In this paper, we propose a new sampling-based LTL planning algorithm that does not require any discrete abstraction of robot mobility. Instead, it incrementally builds trees that explore the product state-space, until a maximum number of iterations is reached or a feasible plan is found. The use of trees makes data storage and graph search tractable, which significantly increases the scalability of our algorithm. To accelerate the construction of feasible plans, we introduce bias in the sampling process which is guided by transitions in the B$\ddot{\text{u}}$chi automaton that belong to the shortest path to the accepting states. We show that our planning algorithm, with and without bias, is probabilistically complete and asymptotically optimal. Finally, we present numerical experiments showing that our method outperforms relevant temporal logic planning methods.

OCMar 25, 2019

An Optimal Graph-Search Method for Secure State Estimation

Xusheng Luo, Miroslav Pajic, Michael M. Zavlanos

The growing complexity of modern Cyber-Physical Systems (CPS) and the frequent communication between their components make them vulnerable to malicious attacks. As a result, secure state estimation is a critical requirement for the control of these systems. Many existing secure state estimation methods suffer from combinatorial complexity which grows with the number of states and sensors in the system. This complexity can be mitigated using optimization-based methods that relax the original state estimation problem, although at the cost of optimality as these methods often identify attack-free sensors as attacked. In this paper, we propose a new optimal graph-search algorithm to correctly identify malicious attacks and to securely estimate the states even in large-scale CPS modeled as linear time-invariant systems. The graph consists of layers, each one containing two nodes capturing a truth assignment of any given sensor, and directed edges connecting adjacent layers only. Then, our algorithm searches the layers of this graph incrementally, favoring directions at higher layers with more attack-free assignments, while actively managing a repository of nodes to be expanded at later iterations. The proposed search bias and the ability to revisit nodes in the repository and self-correct, allow our graph-search algorithm to reach the optimal assignment faster and tackle larger problems. We show that our algorithm is complete and optimal provided that process and measurement noises do not dominate the attack signal. Moreover, we provide numerical simulations that demonstrate the ability of our algorithm to correctly identify attacked sensors and securely reconstruct the state. Our simulations show that our method outperforms existing algorithms both in terms of optimality and execution time.

CLMar 30, 2018

Automatic Generation of Chinese Short Product Titles for Mobile Display

Yu Gong, Xusheng Luo, Kenny Q. Zhu et al.

This paper studies the problem of automatically extracting a short title from a manually written longer description of E-commerce products for display on mobile devices. It is a new extractive summarization problem on short text inputs, for which we propose a feature-enriched network model, combining three different categories of features in parallel. Experimental results show that our framework significantly outperforms several baselines by a substantial gain of 4.5%. Moreover, we produce an extractive summarization dataset for E-commerce short texts and will release it to the research community.

CLMar 30, 2018

Deep Cascade Multi-task Learning for Slot Filling in Online Shopping Assistant

Yu Gong, Xusheng Luo, Yu Zhu et al.

Slot filling is a critical task in natural language understanding (NLU) for dialog systems. State-of-the-art approaches treat it as a sequence labeling problem and adopt such models as BiLSTM-CRF. While these models work relatively well on standard benchmark datasets, they face challenges in the context of E-commerce where the slot labels are more informative and carry richer expressions. In this work, inspired by the unique structure of E-commerce knowledge base, we propose a novel multi-task model with cascade and residual connections, which jointly learns segment tagging, named entity tagging and slot filling. Experiments show the effectiveness of the proposed cascade and residual structures. Our model has a 14.6% advantage in F1 score over the strong baseline methods on a new Chinese E-commerce shopping assistant dataset, while achieving competitive accuracies on a standard dataset. Furthermore, online test deployed on such dominant E-commerce platform shows 130% improvement on accuracy of understanding user utterances. Our model has already gone into production in the E-commerce platform.