Xiang Yin

CV
h-index19
71papers
2,867citations
Novelty50%
AI Score59

71 Papers

ASJun 6, 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

Ziyue Jiang, Yi Ren, Zhenhui Ye et al.

Scaling text-to-speech to a large and wild dataset has been proven to be highly effective in achieving timbre and speech style generalization, particularly in zero-shot TTS. However, previous works usually encode speech into latent using audio codec and use autoregressive language models or diffusion models to generate it, which ignores the intrinsic nature of speech and may lead to inferior or uncontrollable results. We argue that speech can be decomposed into several attributes (e.g., content, timbre, prosody, and phase) and each of them should be modeled using a module with appropriate inductive biases. From this perspective, we carefully design a novel and large zero-shot TTS system called Mega-TTS, which is trained with large-scale wild data and models different attributes in different ways: 1) Instead of using latent encoded by audio codec as the intermediate feature, we still choose spectrogram as it separates the phase and other attributes very well. Phase can be appropriately constructed by the GAN-based vocoder and does not need to be modeled by the language model. 2) We model the timbre using global vectors since timbre is a global attribute that changes slowly over time. 3) We further use a VQGAN-based acoustic model to generate the spectrogram and a latent code language model to fit the distribution of prosody, since prosody changes quickly over time in a sentence, and language models can capture both local and long-range dependencies. We scale Mega-TTS to multi-domain datasets with 20K hours of speech and evaluate its performance on unseen speakers. Experimental results demonstrate that Mega-TTS surpasses state-of-the-art TTS systems on zero-shot TTS, speech editing, and cross-lingual TTS tasks, with superior naturalness, robustness, and speaker similarity due to the proper inductive bias of each module. Audio samples are available at https://mega-tts.github.io/demo-page.

CVJun 4, 2023Code
Detector Guidance for Multi-Object Text-to-Image Generation

Luping Liu, Zijian Zhang, Yi Ren et al.

Diffusion models have demonstrated impressive performance in text-to-image generation. They utilize a text encoder and cross-attention blocks to infuse textual information into images at a pixel level. However, their capability to generate images with text containing multiple objects is still restricted. Previous works identify the problem of information mixing in the CLIP text encoder and introduce the T5 text encoder or incorporate strong prior knowledge to assist with the alignment. We find that mixing problems also occur on the image side and in the cross-attention blocks. The noisy images can cause different objects to appear similar, and the cross-attention blocks inject information at a pixel level, leading to leakage of global object understanding and resulting in object mixing. In this paper, we introduce Detector Guidance (DG), which integrates a latent object detection model to separate different objects during the generation process. DG first performs latent object detection on cross-attention maps (CAMs) to obtain object information. Based on this information, DG then masks conflicting prompts and enhances related prompts by manipulating the following CAMs. We evaluate the effectiveness of DG using Stable Diffusion on COCO, CC, and a novel multi-related object benchmark, MRO. Human evaluations demonstrate that DG provides an 8-22\% advantage in preventing the amalgamation of conflicting concepts and ensuring that each object possesses its unique region without any human involvement and additional iterations. Our implementation is available at \url{https://github.com/luping-liu/Detector-Guidance}.

SDJan 30, 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models

Rongjie Huang, Jiawei Huang, Dongchao Yang et al.

Large-scale multimodal generative modeling has created milestones in text-to-image and text-to-video generation. Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long continuous audio data. In this work, we propose Make-An-Audio with a prompt-enhanced diffusion model that addresses these gaps by 1) introducing pseudo prompt enhancement with a distill-then-reprogram approach, it alleviates data scarcity with orders of magnitude concept compositions by using language-free audios; 2) leveraging spectrogram autoencoder to predict the self-supervised audio representation instead of waveforms. Together with robust contrastive language-audio pretraining (CLAP) representations, Make-An-Audio achieves state-of-the-art results in both objective and subjective benchmark evaluation. Moreover, we present its controllability and generalization for X-to-Audio with "No Modality Left Behind", for the first time unlocking the ability to generate high-definition, high-fidelity audios given a user-defined modality input. Audio samples are available at https://Text-to-Audio.github.io

CLJul 31, 2024Code
Generative Expressive Conversational Speech Synthesis

Rui Liu, Yifan Hu, Yi Ren et al.

Conversational Speech Synthesis (CSS) aims to express a target utterance with the proper speaking style in a user-agent conversation setting. Existing CSS methods employ effective multi-modal context modeling techniques to achieve empathy understanding and expression. However, they often need to design complex network architectures and meticulously optimize the modules within them. In addition, due to the limitations of small-scale datasets containing scripted recording styles, they often fail to simulate real natural conversational styles. To address the above issues, we propose a novel generative expressive CSS system, termed GPT-Talker.We transform the multimodal information of the multi-turn dialogue history into discrete token sequences and seamlessly integrate them to form a comprehensive user-agent dialogue context. Leveraging the power of GPT, we predict the token sequence, that includes both semantic and style knowledge, of response for the agent. After that, the expressive conversational speech is synthesized by the conversation-enriched VITS to deliver feedback to the user.Furthermore, we propose a large-scale Natural CSS Dataset called NCSSD, that includes both naturally recorded conversational speech in improvised styles and dialogues extracted from TV shows. It encompasses both Chinese and English languages, with a total duration of 236 hours.We conducted comprehensive experiments on the reliability of the NCSSD and the effectiveness of our GPT-Talker. Both subjective and objective evaluations demonstrate that our model outperforms other state-of-the-art CSS systems significantly in terms of naturalness and expressiveness. The Code, Dataset, and Pre-trained Model are available at: https://github.com/AI-S2-Lab/GPT-Talker.

CVOct 16, 2023Code
SoybeanNet: Transformer-Based Convolutional Neural Network for Soybean Pod Counting from Unmanned Aerial Vehicle (UAV) Images

Jiajia Li, Raju Thada Magar, Dong Chen et al.

Soybeans are a critical source of food, protein and oil, and thus have received extensive research aimed at enhancing their yield, refining cultivation practices, and advancing soybean breeding techniques. Within this context, soybean pod counting plays an essential role in understanding and optimizing production. Despite recent advancements, the development of a robust pod-counting algorithm capable of performing effectively in real-field conditions remains a significant challenge This paper presents a pioneering work of accurate soybean pod counting utilizing unmanned aerial vehicle (UAV) images captured from actual soybean fields in Michigan, USA. Specifically, this paper presents SoybeanNet, a novel point-based counting network that harnesses powerful transformer backbones for simultaneous soybean pod counting and localization with high accuracy. In addition, a new dataset of UAV-acquired images for soybean pod counting was created and open-sourced, consisting of 113 drone images with more than 260k manually annotated soybean pods captured under natural lighting conditions. Through comprehensive evaluations, SoybeanNet demonstrated superior performance over five state-of-the-art approaches when tested on the collected images. Remarkably, SoybeanNet achieved a counting accuracy of $84.51\%$ when tested on the testing dataset, attesting to its efficacy in real-world scenarios. The publication also provides both the source code (\url{https://github.com/JiajiaLi04/Soybean-Pod-Counting-from-UAV-Images}) and the labeled soybean dataset (\url{https://www.kaggle.com/datasets/jiajiali/uav-based-soybean-pod-images}), offering a valuable resource for future research endeavors in soybean pod counting and related fields.

SYJan 17, 2017
Cloud Resource Allocation for Cloud-Based Automotive Applications

Zhaojian Li, Tianshu Chu, Ilya V. Kolmanovsky et al.

There is a rapidly growing interest in the use of cloud computing for automotive vehicles to facilitate computation and data intensive tasks. Efficient utilization of on-demand cloud resources holds a significant potential to improve future vehicle safety, comfort, and fuel economy. In the meanwhile, issues like cyber security and resource allocation pose great challenges. In this paper, we treat the resource allocation problem for cloud-based automotive systems. Both private and public cloud paradigms are considered where a private cloud provides an internal, company-owned internet service dedicated to its own vehicles while a public cloud serves all subscribed vehicles. This paper establishes comprehensive models of cloud resource provisioning for both private and public cloud- based automotive systems. Complications such as stochastic communication delays and task deadlines are explicitly considered. In particular, a centralized resource provisioning model is developed for private cloud and chance constrained optimization is exploited to utilize the cloud resources for best Quality of Services. On the other hand, a decentralized auction-based model is developed for public cloud and reinforcement learning is employed to obtain an optimal bidding policy for a "selfish" agent. Numerical examples are presented to illustrate the effectiveness of the developed techniques.

SYFeb 25, 2019
On Approximate Opacity of Cyber-Physical Systems

Xiang Yin, Majid Zamani

Opacity is an important information-flow security property in the analysis of cyber-physical systems. It captures the plausible deniability of the system's secret behavior in the presence of an intruder that may access the information flow. Existing works on opacity only consider non-metric systems by assuming that the intruder can always distinguish two different outputs precisely. In this paper, we extend the concept of opacity to systems whose output sets are equipped with metrics. Such systems are widely used in the modeling of many real-world systems whose measurements are physical signals. A new concept called approximate opacity is proposed in order to quantitatively evaluate the security guarantee level with respect to the measurement precision of the intruder. Then we propose a new simulation-type relation, called approximate opacity preserving simulation relation, which characterizes how close two systems are in terms of the satisfaction of approximate opacity. This allows us to verify approximate opacity for large-scale, or even infinite systems, using their abstractions. We also discuss how to construct approximate opacity preserving symbolic models for a class of discrete-time control systems. Our results extend the definitions and analysis techniques for opacity from non-metric systems to metric systems.

AINov 21, 2022
Explaining Random Forests using Bipolar Argumentation and Markov Networks (Technical Report)

Nico Potyka, Xiang Yin, Francesca Toni

Random forests are decision tree ensembles that can be used to solve a variety of machine learning problems. However, as the number of trees and their individual size can be large, their decision making process is often incomprehensible. In order to reason about the decision process, we propose representing it as an argumentation problem. We generalize sufficient and necessary argumentative explanations using a Markov network encoding, discuss the relevance of these explanations and establish relationships to families of abductive explanations from the literature. As the complexity of the explanation problems is high, we discuss a probabilistic approximation algorithm and present first experimental results.

SYMar 11, 2019
A New Microscopic Traffic Model Using a Spring-Mass-Damper-Clutch System

Zhaojian Li, Firas Khasawneh, Xiang Yin et al.

Microscopic traffic models describe how cars interact with their neighbors in an uninterrupted traffic flow and are frequently used for reference in advanced vehicle control design. In this paper, we propose a novel mechanical system inspired microscopic traffic model using a mass-spring-damper-clutch system. This model naturally captures the ego vehicle's resistance to large relative speed and deviation from a (driver and speed dependent) desired relative distance when following the lead vehicle. Comparing to existing car following (CF) models, this model offers physically interpretable insights on the underlying CF dynamics, and is able to characterize the impact of the ego vehicle on the lead vehicle, which is neglected in existing CF models. Thanks to the nonlinear wave propagation analysis techniques for mechanical systems, the proposed model therefore has great scalability so that multiple mass-spring-damper-clutch system can be chained to study the macroscopic traffic flow. We investigate the stability of the proposed model on the system parameters and the time delay using spectral element method. We also develop a parallel recursive least square with inverse QR decomposition (PRLS-IQR) algorithm to identify the model parameters online. These real-time estimated parameters can be used to predict the driving trajectory that can be incorporated in advanced vehicle longitudinal control systems for improved safety and fuel efficiency. The PRLS-IQR is computationally efficient and numerically stable so it is suitable for online implementation. The traffic model and the parameter identification algorithm are validated on both simulations and naturalistic driving data from multiple drivers. Promising performance is demonstrated.

ROMay 20Code
STEAM: A Training-Free Congestion-Aware Enhancement Framework for Decentralized Multi-Agent Path Finding

Mingyang Feng, Mengnuo Zhang, Shaoyuan Li et al.

We propose STEAM (Spatial, Temporal, and Emergent congestion Awareness for MAPF), a training-free test-time enhancement framework for learning-based decentralized Multi-Agent Path Finding (MAPF) in discrete environments. Given a pretrained decentralized policy, STEAM requires no retraining, architectural modification, or replacement by a centralized planner. Instead, it injects lightweight congestion-aware guidance into the original policy execution. STEAM first rolls out the shortest paths induced by the current cost-to-go maps to identify potential future congestion hotspots. Spatially avoidable congestion is mitigated by updating agent-specific cost-to-go information, while spatially unavoidable bottlenecks are handled through temporal logit correction. In addition, emergent local congestion is reduced by a density-aware logit correction based on neighboring agents' corrected cost-to-go maps. Extensive experiments on representative learning-based decentralized MAPF algorithms show that STEAM consistently improves success rate, makespan, and solution cost, with success-rate gains of up to 60% and only minor computational overhead. The implementation is available at https://anonymous.4open.science/r/STEAM-MAPF-7A62.

CVFeb 9Code
ALIVE: Animate Your World with Lifelike Audio-Video Generation

Ying Guo, Qijun Gan, Yifu Zhang et al.

Video generation is rapidly evolving towards unified audio-video generation. In this paper, we present ALIVE, a generation model that adapts a pretrained Text-to-Video (T2V) model to Sora-style audio-video generation and animation. In particular, the model unlocks the Text-to-Video&Audio (T2VA) and Reference-to-Video&Audio (animation) capabilities compared to the T2V foundation models. To support the audio-visual synchronization and reference animation, we augment the popular MMDiT architecture with a joint audio-video branch which includes TA-CrossAttn for temporally-aligned cross-modal fusion and UniTemp-RoPE for precise audio-visual alignment. Meanwhile, a comprehensive data pipeline consisting of audio-video captioning, quality control, etc., is carefully designed to collect high-quality finetuning data. Additionally, we introduce a new benchmark to perform a comprehensive model test and comparison. After continue pretraining and finetuning on million-level high-quality data, ALIVE demonstrates outstanding performance, consistently outperforming open-source models and matching or surpassing state-of-the-art commercial solutions. With detailed recipes and benchmarks, we hope ALIVE helps the community develop audio-video generation models more efficiently. Official page: https://github.com/FoundationVision/Alive.

AIJul 25, 2023
Argument Attribution Explanations in Quantitative Bipolar Argumentation Frameworks (Technical Report)

Xiang Yin, Nico Potyka, Francesca Toni

Argumentative explainable AI has been advocated by several in recent years, with an increasing interest on explaining the reasoning outcomes of Argumentation Frameworks (AFs). While there is a considerable body of research on qualitatively explaining the reasoning outcomes of AFs with debates/disputes/dialogues in the spirit of extension-based semantics, explaining the quantitative reasoning outcomes of AFs under gradual semantics has not received much attention, despite widespread use in applications. In this paper, we contribute to filling this gap by proposing a novel theory of Argument Attribution Explanations (AAEs) by incorporating the spirit of feature attribution from machine learning in the context of Quantitative Bipolar Argumentation Frameworks (QBAFs): whereas feature attribution is used to determine the influence of features towards outputs of machine learning models, AAEs are used to determine the influence of arguments towards topic arguments of interest. We study desirable properties of AAEs, including some new ones and some partially adapted from the literature to our setting. To demonstrate the applicability of our AAEs in practice, we conclude by carrying out two case studies in the scenarios of fake news detection and movie recommender systems.

SYOct 8, 2017
Complexity of Detectability, Opacity and A-Diagnosability for Modular Discrete Event Systems

Tomáš Masopust, Xiang Yin

We study the complexity of deciding whether a modular discrete event system is detectable (resp. opaque, A-diagnosable). Detectability arises in the state estimation of discrete event systems, opacity is related to the privacy and security analysis, and A-diagnosability appears in the fault diagnosis of stochastic discrete event systems. Previously, deciding weak detectability (opacity, A-diagnosability) for monolithic systems was shown to be PSPACE-complete. In this paper, we study the complexity of deciding weak detectability (opacity, A-diagnosability) for modular systems. We show that the complexities of these problems are significantly worse than in the monolithic case. Namely, we show that deciding modular weak detectability (opacity, A-diagnosability) is EXPSPACE-complete. We further discuss a special case where all unobservable events are private, and show that in this case the problems are PSPACE-complete. Consequently, if the systems are all fully observable, then deciding weak detectability (opacity) for modular systems is PSPACE-complete.

LONov 29, 2018
Deciding Detectability for Labeled Petri Nets

Tomas Masopust, Xiang Yin

Detectability of discrete event systems (DESs) is a property to determine a priori whether the current and subsequent states can be determined based on observations. In this paper, we investigate the verification of two detectability properties -- strong detectability and weak detectability -- for DESs modeled by labeled Petri nets. Strong detectability requires that we can always determine, after a finite number of observations, the current and subsequent markings of the system, while weak detectability requires that we can determine, after a finite number of observations, the current and subsequent markings for some trajectories of the system. We show that for DESs modeled by labeled Petri nets, checking strong detectability is decidable whereas checking weak detectability is undecidable. Our results extend the existing studies on the verification of detectability from finite-state automata to labeled Petri nets. As a consequence, we strengthen a result on checking current-state opacity for labeled Petri nets.

LGMay 19, 2022
Towards a Theory of Faithfulness: Faithful Explanations of Differentiable Classifiers over Continuous Data

Nico Potyka, Xiang Yin, Francesca Toni

There is broad agreement in the literature that explanation methods should be faithful to the model that they explain, but faithfulness remains a rather vague term. We revisit faithfulness in the context of continuous data and propose two formal definitions of faithfulness for feature attribution methods. Qualitative faithfulness demands that scores reflect the true qualitative effect (positive vs. negative) of the feature on the model and quanitative faithfulness that the magnitude of scores reflect the true quantitative effect. We discuss under which conditions these requirements can be satisfied to which extent (local vs global). As an application of the conceptual idea, we look at differentiable classifiers over continuous data and characterize Gradient-scores as follows: every qualitatively faithful feature attribution method is qualitatively equivalent to Gradient-scores. Furthermore, if an attribution method is quantitatively faithful in the sense that changes of the output of the classifier are proportional to the scores of features, then it is either equivalent to gradient-scoring or it is based on an inferior approximation of the classifier. To illustrate the practical relevance of the theory, we experimentally demonstrate that popular attribution methods can fail to give faithful explanations in the setting where the data is continuous and the classifier differentiable.

SDAug 8, 2024
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency

Jiawei Huang, Chen Zhang, Yi Ren et al.

Voice conversion aims to modify the source speaker's voice to resemble the target speaker while preserving the original speech content. Despite notable advancements in voice conversion these days, multi-lingual voice conversion (including both monolingual and cross-lingual scenarios) has yet to be extensively studied. It faces two main challenges: 1) the considerable variability in prosody and articulation habits across languages; and 2) the rarity of paired multi-lingual datasets from the same speaker. In this paper, we propose MulliVC, a novel voice conversion system that only converts timbre and keeps original content and source language prosody without multi-lingual paired data. Specifically, each training step of MulliVC contains three substeps: In step one the model is trained with monolingual speech data; then, steps two and three take inspiration from back translation, construct a cyclical process to disentangle the timbre and other information (content, prosody, and other language-related information) in the absence of multi-lingual data from the same speaker. Both objective and subjective results indicate that MulliVC significantly surpasses other methods in both monolingual and cross-lingual contexts, demonstrating the system's efficacy and the viability of the three-step approach with cycle consistency. Audio samples can be found on our demo page (mullivc.github.io).

CVAug 15, 2022
Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective

Pengfei Wei, Lingdong Kong, Xinghua Qu et al.

Unsupervised video domain adaptation is a practical yet challenging task. In this work, for the first time, we tackle it from a disentanglement view. Our key idea is to handle the spatial and temporal domain divergence separately through disentanglement. Specifically, we consider the generation of cross-domain videos from two sets of latent factors, one encoding the static information and another encoding the dynamic information. A Transfer Sequential VAE (TranSVAE) framework is then developed to model such generation. To better serve for adaptation, we propose several objectives to constrain the latent factors. With these constraints, the spatial divergence can be readily removed by disentangling the static domain-specific information out, and the temporal divergence is further reduced from both frame- and video-levels through adversarial learning. Extensive experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE compared with several state-of-the-art approaches. Code is publicly available.

ROMay 12
SAGAS: Semantic-Aware Graph-Assisted Stitching for Offline Temporal Logic Planning

Ruijia Liu, Ancheng Hou, Xiang Yin

Linear Temporal Logic (LTL) provides a rigorous framework for specifying long-horizon robotic tasks, yet existing approaches face a trade-off: model-based synthesis relies on accurate labeled transition systems, whereas learning-based methods often require online interaction, task-specific rewards, or specification-conditioned training. We study LTL-specified robotic planning and execution in a stricter offline, model-free setting, where the agent is given only fixed, task-agnostic trajectory fragments, with no dynamics model, task demonstrations, or online data collection. To address this setting, we propose SAGAS, a framework that combines the compositionality of symbolic synthesis with the data-driven reachability structure learned from offline trajectories. SAGAS first learns a reusable latent reachability graph and a frozen goal-conditioned executor from fragmented offline data. For each new LTL formula, it performs task-time semantic graph augmentation to ground state-defined propositions on the learned graph, and applies Büchi product search to synthesize a cost-aware accepting prefix--suffix waypoint plan executed by the frozen executor. By shifting formula-specific reasoning from policy learning to test-time graph augmentation and symbolic search, SAGAS enables zero-shot generalization to unseen, data-supported LTL specifications without task-specific reward design, policy retraining, or online interaction. Experiments on LTL task suites constructed from OGBench locomotion domains show that this design produces executable and cost-efficient prefix--suffix behaviors for diverse unseen LTL tasks from fragmented offline data.

CVJun 6, 2023
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis

Zhenhui Ye, Ziyue Jiang, Yi Ren et al.

We are interested in a novel task, namely low-resource text-to-talking avatar. Given only a few-minute-long talking person video with the audio track as the training data and arbitrary texts as the driving input, we aim to synthesize high-quality talking portrait videos corresponding to the input text. This task has broad application prospects in the digital human industry but has not been technically achieved yet due to two challenges: (1) It is challenging to mimic the timbre from out-of-domain audio for a traditional multi-speaker Text-to-Speech system. (2) It is hard to render high-fidelity and lip-synchronized talking avatars with limited training data. In this paper, we introduce Adaptive Text-to-Talking Avatar (Ada-TTA), which (1) designs a generic zero-shot multi-speaker TTS model that well disentangles the text content, timbre, and prosody; and (2) embraces recent advances in neural rendering to achieve realistic audio-driven talking face video generation. With these designs, our method overcomes the aforementioned two challenges and achieves to generate identity-preserving speech and realistic talking person video. Experiments demonstrate that our method could synthesize realistic, identity-preserving, and audio-visual synchronized talking avatar videos.

CVAug 29, 2023
C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model

Longbin Ji, Pengfei Wei, Yi Ren et al.

Co-speech gesture generation is crucial for automatic digital avatar animation. However, existing methods suffer from issues such as unstable training and temporal inconsistency, particularly in generating high-fidelity and comprehensive gestures. Additionally, these methods lack effective control over speaker identity and temporal editing of the generated gestures. Focusing on capturing temporal latent information and applying practical controlling, we propose a Controllable Co-speech Gesture Generation framework, named C2G2. Specifically, we propose a two-stage temporal dependency enhancement strategy motivated by latent diffusion models. We further introduce two key features to C2G2, namely a speaker-specific decoder to generate speaker-related real-length skeletons and a repainting strategy for flexible gesture generation/editing. Extensive experiments on benchmark gesture datasets verify the effectiveness of our proposed C2G2 compared with several state-of-the-art baselines. The link of the project demo page can be found at https://c2g2-gesture.github.io/c2_gesture

SYMay 25
Deterministic and Nonblocking Supervisory Control of Discrete Event Systems under Cyber Attacks

Feng Lin, Caisheng Wang, Jun Chen et al.

We investigate deterministic and nonblocking supervisory control of discrete event systems under cyber-attacks using the ALTER (Attack Language for Transition-basEd Replacement) model. While prior works consider supervisory control that achieves either the large (upper bound) language or small (lower bound) language separately, deterministic supervisory control achieves both large language and small language at the same time to ensure that the language generated by the supervised system is unique and deterministic. We introduce two new concepts of CA-D-controllability and CA-D-observability and prove that they are necessary and sufficient for the existence of a deterministic supervisor. For nonblocking supervisory control, the objective is to ensure that the supervised system can always reach marked states under any attack scenario. We prove that relative closure, CA-D-controllability, and CA-D-observability together are necessary and sufficient for the existence of a nonblocking supervisor. We further develop methods to verify CA-D-controllability and CA-D-observability. We also illustrate our results using a robotic system example.

CLMar 2, 2023
LiteG2P: A fast, light and high accuracy model for grapheme-to-phoneme conversion

Chunfeng Wang, Peisong Huang, Yuxiang Zou et al.

As a key component of automated speech recognition (ASR) and the front-end in text-to-speech (TTS), grapheme-to-phoneme (G2P) plays the role of converting letters to their corresponding pronunciations. Existing methods are either slow or poor in performance, and are limited in application scenarios, particularly in the process of on-device inference. In this paper, we integrate the advantages of both expert knowledge and connectionist temporal classification (CTC) based neural network and propose a novel method named LiteG2P which is fast, light and theoretically parallel. With the carefully leading design, LiteG2P can be applied both on cloud and on device. Experimental results on the CMU dataset show that the performance of the proposed method is superior to the state-of-the-art CTC based method with 10 times fewer parameters, and even comparable to the state-of-the-art Transformer-based sequence-to-sequence model with less parameters and 33 times less computation.

CLJun 10, 2022
A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation

Junhui Zhang, Wudi Bao, Junjie Pan et al.

Chinese dialects are different variations of Chinese and can be considered as different languages in the same language family with Mandarin. Though they all use Chinese characters, the pronunciations, grammar and idioms can vary significantly, and even local speakers may find it hard to input correct written forms of dialect. Besides, using Mandarin text as text-to-speech inputs would generate speech with poor naturalness. In this paper, we propose a novel Chinese dialect TTS frontend with a translation module, which converts Mandarin text into dialectic expressions to improve the intelligibility and naturalness of synthesized speech. A non-autoregressive neural machine translation model with various tricks is proposed for the translation task. It is the first known work to incorporate translation with TTS frontend. Experiments on Cantonese show the proposed model improves 2.56 BLEU and TTS improves 0.27 MOS with Mandarin inputs.

AIMay 7Code
ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning

Bowen Ye, Zhijian Li, Junyue Huang et al.

Signal Temporal Logic (STL) is an expressive formal language for specifying spatio-temporal requirements over real-valued, real-time signals. It has been widely used for the verification and synthesis of autonomous systems and cyber-physical systems. In practice, however, users often express their requirements in natural language rather than in structured STL formulas, making natural-language-to-STL translation a critical yet challenging task. Manual specification requires temporal-logic expertise and cannot scale, while prompting commercial LLM APIs incurs substantial token costs and may expose sensitive system requirements to third-party services, raising privacy concerns for industrial deployment. To address these challenges, we present \textsc{ReasonSTL}, a tool-augmented framework that adapts local open-source language models for natural-language-to-STL generation. \textsc{ReasonSTL} decomposes the translation process into explicit reasoning, deterministic tool calls, and structured formula construction. We further introduce process-rewarded training to supervise both tool-use trajectories and final formulas, together with \textsc{STL-Bench}, a bilingual, computation-aware benchmark grounded in real-world signals. Experiments show that a 4B model trained with \textsc{ReasonSTL} achieves state-of-the-art performance in both automatic metrics and human evaluations, demonstrating that \textsc{ReasonSTL} provides a transparent, low-cost, and privacy-preserving alternative for formal specification drafting.

ROMay 22
Signal Temporal Logic Motion Planning via Graphs of Convex Sets

Yu Chen, Ancheng Hou, Mingyang Feng et al.

This paper investigates continuous-time motion planning under Signal Temporal Logic (STL) specifications. The goal is to generate smooth robot trajectories that satisfy high-level logical and timing requirements while respecting low-level motion constraints. To this end, we propose an efficient framework that combines timed-automata reasoning with graphs of convex sets (GCS). An STL specification is first represented by a timed automaton, which is then coupled with a convex decomposition of the configuration space to form a joint transition system encoding both task progress and region occupancy. Based on this joint transition system, the STL motion-planning problem is reformulated as a shortest-path problem over a GCS, whose solution induces a smooth Bézier-spline trajectory satisfying the STL specification, smoothness requirements, and velocity bounds. We establish the soundness of the proposed formulation and analyze its computational complexity, showing that, once the timed automaton and convex decomposition are fixed, the convex relaxation scales polynomially with the configuration-space dimension and the Bézier degree. We further develop a compact timed-automaton construction for an expressive STL fragment using dedicated templates and Boolean composition. Numerical experiments on low-dimensional benchmarks, a $3$-D quadrotor, a $30$-DoF humanoid, and a hardware experiment on a UR-3 robot arm demonstrate that the proposed method efficiently solves complex STL motion-planning problems and produces smooth executable trajectories.

CLDec 12, 2022
Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features

Junhui Zhang, Junjie Pan, Xiang Yin et al.

Speech-to-speech translation directly translates a speech utterance to another between different languages, and has great potential in tasks such as simultaneous interpretation. State-of-art models usually contains an auxiliary module for phoneme sequences prediction, and this requires textual annotation of the training dataset. We propose a direct speech-to-speech translation model which can be trained without any textual annotation or content information. Instead of introducing an auxiliary phoneme prediction task in the model, we propose to use bottleneck features as intermediate training objectives for our model to ensure the translation performance of the system. Experiments on Mandarin-Cantonese speech translation demonstrate the feasibility of the proposed approach and the performance can match a cascaded system with respect of translation and synthesis qualities.

AIJul 11, 2024
CE-QArg: Counterfactual Explanations for Quantitative Bipolar Argumentation Frameworks (Technical Report)

Xiang Yin, Nico Potyka, Francesca Toni

There is a growing interest in understanding arguments' strength in Quantitative Bipolar Argumentation Frameworks (QBAFs). Most existing studies focus on attribution-based methods that explain an argument's strength by assigning importance scores to other arguments but fail to explain how to change the current strength to a desired one. To solve this issue, we introduce counterfactual explanations for QBAFs. We discuss problem variants and propose an iterative algorithm named Counterfactual Explanations for Quantitative bipolar Argumentation frameworks (CE-QArg). CE-QArg can identify valid and cost-effective counterfactual explanations based on two core modules, polarity and priority, which help determine the updating direction and magnitude for each argument, respectively. We discuss some formal properties of our counterfactual explanations and empirically evaluate CE-QArg on randomly generated QBAFs.

CLDec 19, 2023Code
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling

Rui Liu, Yifan Hu, Yi Ren et al.

Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting. While recognising the significance of CSS task, the prior studies have not thoroughly investigated the emotional expressiveness problems due to the scarcity of emotional conversational datasets and the difficulty of stateful emotion modeling. In this paper, we propose a novel emotional CSS model, termed ECSS, that includes two main components: 1) to enhance emotion understanding, we introduce a heterogeneous graph-based emotional context modeling mechanism, which takes the multi-source dialogue history as input to model the dialogue context and learn the emotion cues from the context; 2) to achieve emotion rendering, we employ a contrastive learning-based emotion renderer module to infer the accurate emotion style for the target utterance. To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity, and annotate additional emotional information on the existing conversational dataset (DailyTalk). Both objective and subjective evaluations suggest that our model outperforms the baseline models in understanding and rendering emotions. These evaluations also underscore the importance of comprehensive emotional annotations. Code and audio samples can be found at: https://github.com/walker-hyf/ECSS.

AISep 9, 2024
Applying Attribution Explanations in Truth-Discovery Quantitative Bipolar Argumentation Frameworks

Xiang Yin, Nico Potyka, Francesca Toni

Explaining the strength of arguments under gradual semantics is receiving increasing attention. For example, various studies in the literature offer explanations by computing the attribution scores of arguments or edges in Quantitative Bipolar Argumentation Frameworks (QBAFs). These explanations, known as Argument Attribution Explanations (AAEs) and Relation Attribution Explanations (RAEs), commonly employ removal-based and Shapley-based techniques for computing the attribution scores. While AAEs and RAEs have proven useful in several applications with acyclic QBAFs, they remain largely unexplored for cyclic QBAFs. Furthermore, existing applications tend to focus solely on either AAEs or RAEs, but do not compare them directly. In this paper, we apply both AAEs and RAEs, to Truth Discovery QBAFs (TD-QBAFs), which assess the trustworthiness of sources (e.g., websites) and their claims (e.g., the severity of a virus), and feature complex cycles. We find that both AAEs and RAEs can provide interesting explanations and can give non-trivial and surprising insights.

AIMay 18
Evaluating the Utility of Personal Health Records in Personalized Health AI

Rory Sayres, Kejia Chen, Ayush Jain et al.

Patient-managed Personal Health Records (PHRs) promises to empower patients to better understand their health; but information in the record is complex, potentially hindering insights. In this study, we assess the potential of large language models (LLMs, Gemini 3.0 Flash) to provide helpful answers to user health queries, when provided clinical data from PHRs as context. A total of 2,257 user queries were drawn from 3 different distributions to represent patient questions: shorter web search queries, longer questions derived from templates of chatbot conversations, and questions patients asked to their healthcare team (patient calls). Queries were matched with de-identified PHRs (from a pool of 1,945). Gemini responses were generated (1) without PHR context; (2) with a basic summary of demographics, conditions, and medications; (3) with full, extensive clinical notes. For evaluation, we leveraged an existing rating framework (SHARP), and developed a new framework for specific error modes when interpreting PHRs. Evaluation was performed using autoraters for the full set, and with clinician ratings for a subset (n=95), with both sets of raters knowing the full PHR context. We see significant improvements in the helpfulness of answers to all question types with PHR data (p < 0.001, paired t-test). We also observe potential gains in safety, accuracy, relevance and personalization of answers. Our PHR evaluation framework further identifies gaps in LLM understanding of particular aspects of complex PHRs, such as temporal disorientation, and rare but meaningful confabulations. These results suggest potential for PHR data to help people with a wide range of user needs; and provide a framework for monitoring for gaps in LLM answers based on PHR context. This study motivates further work to assess and realize potential benefits to users from understanding their health records.

SYApr 18
Chance-Constrained Neural MPC under Uncontrollable Agents via Sequential Convex Programming

Shuqi Wang, Mingyang Feng, Yu Chen et al.

This work investigates the challenge of ensuring safety guarantees in the presence of uncontrollable agents, whose behaviors are stochastic and depend on both their own and the system's states. We present a neural model predictive control (MPC) framework that predicts the trajectory of the uncontrollable agent using a predictor learned from offline data. To provide formal probabilistic guarantees on prediction errors despite policy-induced distribution shifts, we propose a region-wise robust conformal prediction scheme to construct time-dependent uncertainty bounds, which are integrated into the MPC formulation. To solve the resulting non-convex, discontinuous optimization problem, we propose a two-loop iterative sequential convex programming algorithm. The inner loop solves convexified subproblems with fixed error bounds, while the outer loop refines these bounds based on updated control sequences. We establish convergence guarantees and analyze the optimality of the algorithm. We illustrate our method with an autonomous driving scenario involving interactive pedestrians. Experimental results demonstrate that our approach achieves superior safety and efficiency compared to baseline methods, with success rates exceeding 99.5% while maintaining higher average speeds in multi-pedestrian scenarios.

CVMay 14, 2024Code
MetaFruit Meets Foundation Models: Leveraging a Comprehensive Multi-Fruit Dataset for Advancing Agricultural Foundation Models

Jiajia Li, Kyle Lammers, Xunyuan Yin et al.

Fruit harvesting poses a significant labor and financial burden for the industry, highlighting the critical need for advancements in robotic harvesting solutions. Machine vision-based fruit detection has been recognized as a crucial component for robust identification of fruits to guide robotic manipulation. Despite considerable progress in leveraging deep learning and machine learning techniques for fruit detection, a common shortfall is the inability to swiftly extend the developed models across different orchards and/or various fruit species. Additionally, the limited availability of pertinent data further compounds these challenges. In this work, we introduce MetaFruit, the largest publicly available multi-class fruit dataset, comprising 4,248 images and 248,015 manually labeled instances across diverse U.S. orchards. Furthermore, this study proposes an innovative open-set fruit detection system leveraging advanced Vision Foundation Models (VFMs) for fruit detection that can adeptly identify a wide array of fruit types under varying orchard conditions. This system not only demonstrates remarkable adaptability in learning from minimal data through few-shot learning but also shows the ability to interpret human instructions for subtle detection tasks. The performance of the developed foundation model is comprehensively evaluated using several metrics, which outperforms the existing state-of-the-art algorithms in both our MetaFruit dataset and other open-sourced fruit datasets, thereby setting a new benchmark in the field of agricultural technology and robotic harvesting. The MetaFruit dataset and detection framework are open-sourced to foster future research in vision-based fruit harvesting, marking a significant stride toward addressing the urgent needs of the agricultural sector.

AIMay 2
Zero-Shot Signal Temporal Logic Planning with Disjunctive Branch Selection in Dynamic Semantic Maps

Bowen Ye, Ancheng Hou, Junyue Huang et al.

Signal Temporal Logic (STL) offers verifiable task specifications and is crucial for safety-critical control. Yet STL planning remains challenging: exact optimization-based methods are often too slow, and learning-based methods struggle to generalize across varying environments. We propose a zero-shot STL planning solver for variable-map environments that generates feasible trajectories without retraining. By integrating a map-conditioned Transformer architecture with a lightweight heuristic, our approach effectively handles complex disjunctive (OR) subformulas. Furthermore, we leverage Transitive Reinforcement Learning (TRL) to ensure consistent temporal grounding and logical coherence across decomposed sub-tasks. Experiments on dynamic semantic maps with diverse obstacle layouts demonstrate consistent gains, highlighting the framework's superior zero-shot generalization to changing environments and broad STL coverage.

CVApr 3Code
Can Nano Banana 2 Replace Traditional Image Restoration Models? An Evaluation of Its Performance on Image Restoration Tasks

Weixiong Sun, Xiang Yin, Chao Dong

Recent advances in generative AI raise the question of whether general-purpose image editing models can serve as unified solutions for image restoration. In this work, we conduct a systematic evaluation of Nano Banana 2 for image restoration across diverse scenes and degradation types. Our results show that prompt design plays a critical role, where concise prompts with explicit fidelity constraints achieve the best trade-off between reconstruction accuracy and perceptual quality. Compared with state-of-the-art restoration models, Nano Banana 2 achieves superior performance in full-reference metrics while remaining competitive in perceptual quality, which is further supported by user studies. We also observe strong generalization in challenging scenarios, such as small faces, dense crowds, and severe degradations. However, the model remains sensitive to prompt formulation and may require iterative refinement for optimal results. Overall, our findings suggest that general-purpose generative models hold strong potential as unified image restoration solvers, while highlighting the importance of controllability and robustness. All test results are available on https://github.com/yxyuanxiao/NanoBanana2TestOnIR.

CLDec 1, 2025
Latent Debate: A Surrogate Framework for Interpreting LLM Thinking

Lihu Chen, Xiang Yin, Francesca Toni

Understanding the internal thinking process of Large Language Models (LLMs) and the cause of hallucinations remains a key challenge. To this end, we introduce latent debate, a novel framework for interpreting model predictions through the lens of implicit internal arguments. Unlike the current work of self-consistency and multi-agent debate, which relies on explicit debates among multiple answers or multiple models, latent debate captures the hidden supporting and attacking signals that arise within a single model during a single inference. We first present a model- and task-agnostic conceptual framework, and then instantiate it symbolically to approximate the thinking process of LLMs on True/False prediction tasks. Empirical studies demonstrate that latent debate is a faithful structured surrogate model that has highly consistent predictions with the original LLM. Beyond interpretability, we demonstrate that latent debate provides a strong baseline for hallucination detection. Further analysis reveals strong correlations between hallucinations and debate patterns, such as a high degree of latent debates in the middle layers is linked to a higher risk of hallucinations. These findings position latent debate as a potential framework for understanding internal mechanisms of LLMs, especially for scenarios where internal (dis)agreements appear during the inference steps.

RONov 19, 2025Code
RRT*former: Environment-Aware Sampling-Based Motion Planning using Transformer

Mingyang Feng, Shaoyuan Li, Xiang Yin

We investigate the sampling-based optimal path planning problem for robotics in complex and dynamic environments. Most existing sampling-based algorithms neglect environmental information or the information from previous samples. Yet, these pieces of information are highly informative, as leveraging them can provide better heuristics when sampling the next state. In this paper, we propose a novel sampling-based planning algorithm, called \emph{RRT*former}, which integrates the standard RRT* algorithm with a Transformer network in a novel way. Specifically, the Transformer is used to extract features from the environment and leverage information from previous samples to better guide the sampling process. Our extensive experiments demonstrate that, compared to existing sampling-based approaches such as RRT*, Neural RRT*, and their variants, our algorithm achieves considerable improvements in both the optimality of the path and sampling efficiency. The code for our implementation is available on https://github.com/fengmingyang666/RRTformer.

ROApr 20
DAG-STL: A Hierarchical Framework for Zero-Shot Trajectory Planning under Signal Temporal Logic Specifications

Ruijia Liu, Ancheng Hou, Xiao Yu et al.

Signal Temporal Logic (STL) is a powerful language for specifying temporally structured robotic tasks. Planning executable trajectories under STL constraints remains difficult when system dynamics and environment structure are not analytically available. Existing methods typically either assume explicit models or learn task-specific behaviors, limiting zero-shot generalization to unseen STL tasks. In this work, we study offline STL planning under unknown dynamics using only task-agnostic trajectory data. Our central design philosophy is to separate logical reasoning from trajectory realization. We instantiate this idea in DAG-STL, a hierarchical framework that converts long-horizon STL planning into three stages. It first decomposes an STL formula into reachability and invariance progress conditions linked by shared timing constraints. It then allocates timed waypoints using learned reachability-time estimates. Finally, it synthesizes trajectories between these waypoints with a diffusion-based generator. This decomposition--allocation--generation pipeline reduces global planning to shorter, better-supported subproblems. To bridge the gap between planning-level correctness and execution-level feasibility, we further introduce a rollout-free dynamic consistency metric, an anytime refinement search procedure for improving multiple allocation hypotheses under finite budgets, and a hierarchical online replanning mechanism for execution-time recovery. Experiments in Maze2D, OGBench AntMaze, and the Cube domain show that DAG-STL substantially outperforms direct robustness-guided diffusion on complex long-horizon STL tasks and generalizes across navigation and manipulation settings. In a custom environment with an optimization-based reference, DAG-STL recovers most model-solvable tasks while retaining a clear computational advantage over direct optimization based on the explicit system model.

SYApr 14
Output-Feedback Safe Control of Discrete-Time Stochastic Systems with Chance Constraints

Jianing Zhao, Zhuoting Cai, Xiang Yin

In this paper, we investigate safety-critical control problem of discrete-time stochastic systems with incomplete information, where safety constraints must be enforced using state estimates obtained from noisy measurements. We develop an output-feedback control barrier function (CBF) framework based on an expectation-based discrete-time barrier condition that explicitly incorporates estimation uncertainty through the evolving belief over the state. To enable real-time implementation, we derive deterministic sufficient conditions that conservatively enforce the expectation-based CBF by bounding the expectation with computable functions of the belief statistics using Jensen inequalities. The resulting safety filter is formulated as a tractable optimization problem compatible with standard online controllers. Numerical simulations demonstrate that the proposed output-feedback approach achieves fast online computation while providing reliable safety performance in the presence of process noise and measurement uncertainty.

ASFeb 26, 2025
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis

Ziyue Jiang, Yi Ren, Ruiqi Li et al.

While recent zero-shot text-to-speech (TTS) models have significantly improved speech quality and expressiveness, mainstream systems still suffer from issues related to speech-text alignment modeling: 1) models without explicit speech-text alignment modeling exhibit less robustness, especially for hard sentences in practical applications; 2) predefined alignment-based models suffer from naturalness constraints of forced alignments. This paper introduces \textit{MegaTTS 3}, a TTS system featuring an innovative sparse alignment algorithm that guides the latent diffusion transformer (DiT). Specifically, we provide sparse alignment boundaries to MegaTTS 3 to reduce the difficulty of alignment without limiting the search space, thereby achieving high naturalness. Moreover, we employ a multi-condition classifier-free guidance strategy for accent intensity adjustment and adopt the piecewise rectified flow technique to accelerate the generation process. Experiments demonstrate that MegaTTS 3 achieves state-of-the-art zero-shot TTS speech quality and supports highly flexible control over accent intensity. Notably, our system can generate high-quality one-minute speech with only 8 sampling steps. Audio samples are available at https://sditdemo.github.io/sditdemo/.

CVFeb 7, 2025
HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation

Qijun Gan, Yi Ren, Chen Zhang et al.

Human motion video generation has advanced significantly, while existing methods still struggle with accurately rendering detailed body parts like hands and faces, especially in long sequences and intricate motions. Current approaches also rely on fixed resolution and struggle to maintain visual consistency. To address these limitations, we propose HumanDiT, a pose-guided Diffusion Transformer (DiT)-based framework trained on a large and wild dataset containing 14,000 hours of high-quality video to produce high-fidelity videos with fine-grained body rendering. Specifically, (i) HumanDiT, built on DiT, supports numerous video resolutions and variable sequence lengths, facilitating learning for long-sequence video generation; (ii) we introduce a prefix-latent reference strategy to maintain personalized characteristics across extended sequences. Furthermore, during inference, HumanDiT leverages Keypoint-DiT to generate subsequent pose sequences, facilitating video continuation from static images or existing videos. It also utilizes a Pose Adapter to enable pose transfer with given sequences. Extensive experiments demonstrate its superior performance in generating long-form, pose-accurate videos across diverse scenarios.

CLMay 3, 2024
Argumentative Large Language Models for Explainable and Contestable Claim Verification

Gabriel Freedman, Adam Dejl, Deniz Gorur et al.

The profusion of knowledge encoded in large language models (LLMs) and their ability to apply this knowledge zero-shot in a range of settings makes them promising candidates for use in decision-making. However, they are currently limited by their inability to provide outputs which can be faithfully explained and effectively contested to correct mistakes. In this paper, we attempt to reconcile these strengths and weaknesses by introducing \emph{argumentative LLMs (ArgLLMs)}, a method for augmenting LLMs with argumentative reasoning. Concretely, ArgLLMs construct argumentation frameworks, which then serve as the basis for formal reasoning in support of decision-making. The interpretable nature of these argumentation frameworks and formal reasoning means that any decision made by ArgLLMs may be explained and contested. We evaluate ArgLLMs' performance experimentally in comparison with state-of-the-art techniques, in the context of the decision-making task of claim verification. We also define novel properties to characterise contestability and assess ArgLLMs formally in terms of these properties.

AIMay 17, 2024
Contestable AI needs Computational Argumentation

Francesco Leofante, Hamed Ayoobi, Adam Dejl et al.

AI has become pervasive in recent years, but state-of-the-art approaches predominantly neglect the need for AI systems to be contestable. Instead, contestability is advocated by AI guidelines (e.g. by the OECD) and regulation of automated decision-making (e.g. GDPR). In this position paper we explore how contestability can be achieved computationally in and for AI. We argue that contestable AI requires dynamic (human-machine and/or machine-machine) explainability and decision-making processes, whereby machines can (i) interact with humans and/or other machines to progressively explain their outputs and/or their reasoning as well as assess grounds for contestation provided by these humans and/or other machines, and (ii) revise their decision-making processes to redress any issues successfully raised during contestation. Given that much of the current AI landscape is tailored to static AIs, the need to accommodate contestability will require a radical rethinking, that, we argue, computational argumentation is ideally suited to support.

AIApr 22, 2024
Explaining Arguments' Strength: Unveiling the Role of Attacks and Supports (Technical Report)

Xiang Yin, Potyka Nico, Francesca Toni

Quantitatively explaining the strength of arguments under gradual semantics has recently received increasing attention. Specifically, several works in the literature provide quantitative explanations by computing the attribution scores of arguments. These works disregard the importance of attacks and supports, even though they play an essential role when explaining arguments' strength. In this paper, we propose a novel theory of Relation Attribution Explanations (RAEs), adapting Shapley values from game theory to offer fine-grained insights into the role of attacks and supports in quantitative bipolar argumentation towards obtaining the arguments' strength. We show that RAEs satisfy several desirable properties. We also propose a probabilistic algorithm to approximate RAEs efficiently. Finally, we show the application value of RAEs in fraud detection and large language models case studies.

ROJan 23, 2025
Zero-Shot Trajectory Planning for Signal Temporal Logic Tasks

Ruijia Liu, Ancheng Hou, Xiao Yu et al.

Signal Temporal Logic (STL) is a powerful specification language for describing complex temporal behaviors of continuous signals, making it well-suited for high-level robotic task descriptions. However, generating executable plans for STL tasks is challenging, as it requires consideration of the coupling between the task specification and the system dynamics. Existing approaches either follow a model-based setting that explicitly requires knowledge of the system dynamics or adopt a task-oriented data-driven approach to learn plans for specific tasks. In this work, we address the problem of generating executable STL plans for systems with unknown dynamics. We propose a hierarchical planning framework that enables zero-shot generalization to new STL tasks by leveraging only task-agnostic trajectory data during offline training. The framework consists of three key components: (i) decomposing the STL specification into several progresses and time constraints, (ii) searching for timed waypoints that satisfy all progresses under time constraints, and (iii) generating trajectory segments using a pre-trained diffusion model and stitching them into complete trajectories. We formally prove that our method guarantees STL satisfaction, and simulation results demonstrate its effectiveness in generating dynamically feasible trajectories across diverse long-horizon STL tasks.

SYApr 5
Certificates Synthesis for A Class of Observational Properties in Stochastic Systems: A Unified Approach

Bohan Cui, Jianing Zhao, Yu Chen et al.

In this paper, we investigate the probabilistic formal verification of stochastic dynamical systems over continuous state spaces. Motivated by problems in state estimation and information-flow security, we introduce the notion of observational properties, which characterize the inferences an external observer can draw from system outputs. These properties are formulated as probabilistic hyperproperties based on HyperLTL over finite traces, yielding a unified framework that subsumes several existing notions studied separately in the literature. We reduce the verification problem to reachability analysis over an augmented structure that integrates the system dynamics with an automaton representation of the specification. Building on this construction, we develop stochastic barrier certificates that provide probabilistic guarantees for property satisfaction while avoiding explicit state-space discretization. The effectiveness of the proposed framework is demonstrated through a case study.

SYApr 5
Opacity Enforcing Supervisory Control with a Priori Unknown Supervisors

Bohan Cui, Ziyue Ma, Alessandro Giua et al.

We investigate the enforcement of opacity in discrete-event systems via supervisory control. A system is said to be opaque if a passive intruder can never unambiguously infer whether the system is in a secret state through its observations. In this context, the intruder's knowledge about the supervisor plays a critical role in both problem formulation and solvability. Existing studies typically assume that the policy of the supervisor is either fully unknown to the intruder or fully known a priori, the latter leading to severe technical challenges and unresolved problems under incomparable observations. This paper investigates opacity supervisory control under a new intermediate information setting, which we refer to as the a priori unknown supervisor setting. In this setting, the supervisor's internal realization is not publicly available, but the intruder can partially infer its behavior by eavesdropping on the control decisions issued online during system execution. We formalize the intruder's information-flow under both observation-triggered and decision-triggered decision-issuance mechanisms and define the corresponding notions of opacity. We provide sound and complete algorithms for synthesizing opacity-enforcing supervisors without imposing any restrictions on the observable or controllable event sets. By constructing an information-state structure that embeds the supervisor's estimate of the intruder's belief, the synthesis problem is reduced to a safety game. Finally, we show that, under strictly finer intruder observations, the proposed setting coincides with the standard a priori known supervisor model.

AIJul 15, 2025
Contestability in Quantitative Argumentation

Xiang Yin, Nico Potyka, Antonio Rago et al.

Contestable AI requires that AI-driven decisions align with human preferences. While various forms of argumentation have been shown to support contestability, Edge-Weighted Quantitative Bipolar Argumentation Frameworks (EW-QBAFs) have received little attention. In this work, we show how EW-QBAFs can be deployed for this purpose. Specifically, we introduce the contestability problem for EW-QBAFs, which asks how to modify edge weights (e.g., preferences) to achieve a desired strength for a specific argument of interest (i.e., a topic argument). To address this problem, we propose gradient-based relation attribution explanations (G-RAEs), which quantify the sensitivity of the topic argument's strength to changes in individual edge weights, thus providing interpretable guidance for weight adjustments towards contestability. Building on G-RAEs, we develop an iterative algorithm that progressively adjusts the edge weights to attain the desired strength. We evaluate our approach experimentally on synthetic EW-QBAFs that simulate the structural characteristics of personalised recommender systems and multi-layer perceptrons, and demonstrate that it can solve the problem effectively.

ROMar 31
GraSP-STL: A Graph-Based Framework for Zero-Shot Signal Temporal Logic Planning via Offline Goal-Conditioned Reinforcement Learning

Ancheng Hou, Ruijia Liu, Xiang Yin

This paper studies offline, zero-shot planning under Signal Temporal Logic (STL) specifications. We assume access only to an offline dataset of state-action-state transitions collected by a task-agnostic behavior policy, with no analytical dynamics model, no further environment interaction, and no task-specific retraining. The objective is to synthesize a control strategy whose resulting trajectory satisfies an arbitrary unseen STL specification. To this end, we propose GraSP-STL, a graph-search-based framework for zero-shot STL planning from offline trajectories. The method learns a goal-conditioned value function from offline data and uses it to induce a finite-horizon reachability metric over the state space. Based on this metric, it constructs a directed graph abstraction whose nodes represent representative states and whose edges encode feasible short-horizon transitions. Planning is then formulated as a graph search over waypoint sequences, evaluated using arithmetic-geometric mean robustness and its interval semantics, and executed by a learned goal-conditioned policy. The proposed framework separates reusable reachability learning from task-conditioned planning, enabling zero-shot generalization to unseen STL tasks and long-horizon planning through the composition of short-horizon behaviors from offline data. Experimental results demonstrate its effectiveness on a range of offline STL planning tasks.

SYMar 29
On the Computation of Backward Reachable Sets for Max-Plus Linear Systems with Disturbances

Yuda Li, Xiang Yin

This paper investigates one-step backward reachability for uncertain max-plus linear systems with additive disturbances. Given a target set, the problem is to compute the set of states from which there exists an admissible control input such that, for all admissible disturbances, the successor state remains in the target set. This problem is closely related to safety analysis and is challenging due to the high computational complexity of existing approaches. To address this issue, we develop a computational framework based on tropical polyhedra. We assume that the target set, the control set, and the disturbance set are all represented as tropical polyhedra, and study the structural properties of the associated backward operators. In particular, we show that these operators preserve the tropical-polyhedral structure, which enables the constructive computation of reachable sets within the same framework. The proposed approach provides an effective geometric and algebraic tool for reachability analysis of uncertain max-plus linear systems. Illustrative examples are included to demonstrate the proposed method.