Eunsuk Kang

SE
h-index10
18papers
87citations
Novelty49%
AI Score53

18 Papers

SEApr 16Code
Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

Yining Hong, Yining She, Eunsuk Kang et al. · cmu

AI agents that interact with their environments through tools enable powerful applications, but in high-stakes business settings, unintended actions can cause unacceptable harm, such as privacy breaches and financial loss. Existing mitigations, such as training-based methods and neural guardrails, improve agent reliability but cannot provide guarantees. We study symbolic guardrails as a practical path toward strong safety and security guarantees for AI agents. Our three-part study includes a systematic review of 80 state-of-the-art agent safety and security benchmarks to identify the policies they evaluate, an analysis of which policy requirements can be guaranteed by symbolic guardrails, and an evaluation of how symbolic guardrails affect safety, security, and agent success on $τ^2$-Bench, CAR-bench, and MedAgentBench. We find that 85\% of benchmarks lack concrete policies, relying instead on underspecified high-level goals or common sense. Among the specified policies, 74\% of policy requirements can be enforced by symbolic guardrails, often using simple, low-cost mechanisms. These guardrails improve safety and security without sacrificing agent utility. Overall, our results suggest that symbolic guardrails are a practical and effective way to guarantee some safety and security requirements, especially for domain-specific AI agents. We release all codes and artifacts at https://github.com/hyn0027/agent-symbolic-guardrails.

CRMay 29
FASR: Automated Identification of Unsafe Control Actions in STPA

Ian Dardik, Yining She, Sam Procter et al.

The System-Theoretic Process Analysis (STPA) is a well-established hazard analysis technique that has been applied to a wide range of safety-critical systems. Despite its popularity, there is relatively little automation support for STPA, and most of its steps are carried out manually by a human analyst, which can be time consuming and error prone. This paper investigates the potential use of model-based engineering and formal methods to assist human analysts in efficiently and accurately carrying out STPA. The proposed tool, called FASR (Formalizing and Automating STPA with Robustness), enables automated, complete identification of unsafe control actions (UCAs), leveraging recent advances in robustness analysis to identify UCAs as undesirable deviations in the controller's actions. The use of the tool is demonstrated on a case study involving a Braking System Control Unit (BSCU) in an avionics system. As a preliminary exploration of the potential benefits and limitations of the tool, the paper reports on a user study involving nine participants with varying backgrounds in STPA, model-based engineering, and formal methods; the study found that most participants considered the tool a useful aid in identifying UCAs, while suggesting improvements that would make a tool such as FASR usable and applicable to a wider range of systems and analysts.

LGMay 12
Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic

Parv Kapoor, Abigail Hammer, Ashish Kapoor et al.

Runtime monitoring of autonomous systems traditionally relies on mapping continuous sensor observations to discrete logical propositions defined over low-dimensional state variables. This abstraction breaks down in perception-driven settings, where such mappings require additional learned modules that are often computationally expensive, brittle, and semantically misaligned. In this work, we propose Embedding Temporal Logic (ETL), a temporal logic that performs monitoring directly in learned embedding spaces. ETL defines predicates through distances between observed embeddings and target embeddings derived from reference observations. This formulation allows specifications to capture high-level perceptual concepts, such as similarity to visual goals or avoidance of semantic regions, that are difficult or impossible to express using traditional predicates. By composing these predicates with temporal operators, ETL naturally expresses temporally extended and sequential perceptual behaviors. We introduce ETL monitors for evaluating specifications over bounded embedding traces, along with a conformal calibration procedure that provides reliable and safety-oriented predicate evaluation. We evaluate our approach across multiple manipulation environments to show that ETL achieves strong empirical agreement with ground-truth semantics, including accurate monitoring of temporally composed behaviors.

ROJan 8, 2025
STLCG++: A Masking Approach for Differentiable Signal Temporal Logic Specification

Parv Kapoor, Kazuki Mizuta, Eunsuk Kang et al.

Signal Temporal Logic (STL) offers a concise yet expressive framework for specifying and reasoning about spatio-temporal behaviors of robotic systems. Attractively, STL admits the notion of robustness, the degree to which an input signal satisfies or violates an STL specification, thus providing a nuanced evaluation of system performance. In particular, the differentiability of STL robustness enables direct integration to robotic workflows that rely on gradient-based optimization, such as trajectory optimization and deep learning. However, existing approaches to evaluating and differentiating STL robustness rely on recurrent computations, which become inefficient with longer sequences, limiting their use in time-sensitive applications. In this paper, we present STLCG++, a masking-based approach that parallelizes STL robustness evaluation and backpropagation across timesteps, \revised{achieving more than 1000$\times$ faster computation time than the recurrent approach (STLCG++).}{achieving significant speed-ups compared to a recurrent approach.} We also introduce a smoothing technique to enable the differentiation of time interval bounds, thereby expanding STL's applicability in gradient-based optimization tasks involving spatial and temporal variables. Finally, we demonstrate STLCG++'s benefits through three robotics use cases and provide JAX and PyTorch libraries for seamless integration into modern robotics workflows. Project website with demo and code: https://uw-ctrl.github.io/stlcg/.

LGJan 3, 2025
FairSense: Long-Term Fairness Analysis of ML-Enabled Systems

Yining She, Sumon Biswas, Christian Kästner et al.

Algorithmic fairness of machine learning (ML) models has raised significant concern in the recent years. Many testing, verification, and bias mitigation techniques have been proposed to identify and reduce fairness issues in ML models. The existing methods are model-centric and designed to detect fairness issues under static settings. However, many ML-enabled systems operate in a dynamic environment where the predictive decisions made by the system impact the environment, which in turn affects future decision-making. Such a self-reinforcing feedback loop can cause fairness violations in the long term, even if the immediate outcomes are fair. In this paper, we propose a simulation-based framework called FairSense to detect and analyze long-term unfairness in ML-enabled systems. Given a fairness requirement, FairSense performs Monte-Carlo simulation to enumerate evolution traces for each system configuration. Then, FairSense performs sensitivity analysis on the space of possible configurations to understand the impact of design options and environmental factors on the long-term fairness of the system. We demonstrate FairSense's potential utility through three real-world case studies: Loan lending, opioids risk scoring, and predictive policing.

ROSep 1, 2025
Constrained Decoding for Robotics Foundation Models

Parv Kapoor, Akila Ganlath, Michael Clifford et al.

Recent advances in the development of robotic foundation models have led to promising end-to-end and general-purpose capabilities in robotic systems. Trained on vast datasets of simulated and real-world trajectories, these models map multimodal observations directly to action sequences for physical execution. Despite promising real-world capabilities, these models are still data-driven and, therefore, lack explicit notions of behavioral correctness. We address this gap by introducing SafeDec, a constrained decoding framework for autoregressive, robot foundation models that enforces invariant safety specifications on candidate action trajectories. Task-specific safety rules are expressed as Signal Temporal Logic (STL) formulas and are enforced at inference time with minimal overhead. Our method ensures that generated actions provably satisfy STL specifications under assumed dynamics at runtime without retraining , while remaining agnostic of the underlying policy. We evaluate SafeDec on tasks from the CHORES benchmark for state-of-the-art generalist policies (e.g., SPOC, Flare, PoliFormer) across hundreds of procedurally generated environments and show that our decoding-time interventions are useful not only for filtering unsafe actions but also for conditional action generation. Videos are available at constrained-robot-fms.github.io.

ROMay 6, 2025
Demonstrating ViSafe: Vision-enabled Safety for High-speed Detect and Avoid

Parv Kapoor, Ian Higgins, Nikhil Keetha et al.

Assured safe-separation is essential for achieving seamless high-density operation of airborne vehicles in a shared airspace. To equip resource-constrained aerial systems with this safety-critical capability, we present ViSafe, a high-speed vision-only airborne collision avoidance system. ViSafe offers a full-stack solution to the Detect and Avoid (DAA) problem by tightly integrating a learning-based edge-AI framework with a custom multi-camera hardware prototype designed under SWaP-C constraints. By leveraging perceptual input-focused control barrier functions (CBF) to design, encode, and enforce safety thresholds, ViSafe can provide provably safe runtime guarantees for self-separation in high-speed aerial operations. We evaluate ViSafe's performance through an extensive test campaign involving both simulated digital twins and real-world flight scenarios. By independently varying agent types, closure rates, interaction geometries, and environmental conditions (e.g., weather and lighting), we demonstrate that ViSafe consistently ensures self-separation across diverse scenarios. In first-of-its-kind real-world high-speed collision avoidance tests with closure rates reaching 144 km/h, ViSafe sets a new benchmark for vision-only autonomous collision avoidance, establishing a new standard for safety in high-speed aerial navigation.

AIMar 3, 2025
Pretrained Embeddings as a Behavior Specification Mechanism

Parv Kapoor, Abigail Hammer, Ashish Kapoor et al.

We propose an approach to formally specifying the behavioral properties of systems that rely on a perception model for interactions with the physical world. The key idea is to introduce embeddings -- mathematical representations of a real-world concept -- as a first-class construct in a specification language, where properties are expressed in terms of distances between a pair of ideal and observed embeddings. To realize this approach, we propose a new type of temporal logic called Embedding Temporal Logic (ETL), and describe how it can be used to express a wider range of properties about AI-enabled systems than previously possible. We demonstrate the applicability of ETL through a preliminary evaluation involving planning tasks in robots that are driven by foundation models; the results are promising, showing that embedding-based specifications can be used to steer a system towards desirable behaviors.

DBMar 31
Reasoning about Transactional Isolation Levels with Isolde

Manuel Barros, Alcino Cunha, Jose Pereira et al.

Most databases can be configured to operate under isolation levels weaker than serializability. These enforce fewer restrictions on the concurrent access to data and consequently allow for more performant implementations. While formal frameworks for rigorously specifying isolation levels exist, reasoning about the semantic differences between specifications remains notoriously difficult. This paper proposes a tool -- Isolde -- that can automatically generate examples that are allowed by an isolation level but disallowed by another. This simple primitive unlocks a range of useful reasoning tasks, including checking equivalence between definitions, and verifying (by refutation) subtle semantic properties of isolation levels. For example, Isolde allowed us to easily and automatically reproduce a famously elusive result from the literature and to discover a previously unknown bug in the alternative specification of a standard isolation level used in a state-of-the-art isolation checker.

CLOct 6, 2025
RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-style Contexts

Yining She, Daniel W. Peterson, Marianne Menglin Liu et al.

With the increasing adoption of large language models (LLMs), ensuring the safety of LLM systems has become a pressing concern. External LLM-based guardrail models have emerged as a popular solution to screen unsafe inputs and outputs, but they are themselves fine-tuned or prompt-engineered LLMs that are vulnerable to data distribution shifts. In this paper, taking Retrieval Augmentation Generation (RAG) as a case study, we investigated how robust LLM-based guardrails are against additional information embedded in the context. Through a systematic evaluation of 3 Llama Guards and 2 GPT-oss models, we confirmed that inserting benign documents into the guardrail context alters the judgments of input and output guardrails in around 11% and 8% of cases, making them unreliable. We separately analyzed the effect of each component in the augmented context: retrieved documents, user query, and LLM-generated response. The two mitigation methods we tested only bring minor improvements. These results expose a context-robustness gap in current guardrails and motivate training and evaluation protocols that are robust to retrieval and query composition.

SYJun 24, 2024
Tolerance of Reinforcement Learning Controllers against Deviations in Cyber Physical Systems

Changjian Zhang, Parv Kapoor, Eunsuk Kang et al.

Cyber-physical systems (CPS) with reinforcement learning (RL)-based controllers are increasingly being deployed in complex physical environments such as autonomous vehicles, the Internet-of-Things(IoT), and smart cities. An important property of a CPS is tolerance; i.e., its ability to function safely under possible disturbances and uncertainties in the actual operation. In this paper, we introduce a new, expressive notion of tolerance that describes how well a controller is capable of satisfying a desired system requirement, specified using Signal Temporal Logic (STL), under possible deviations in the system. Based on this definition, we propose a novel analysis problem, called the tolerance falsification problem, which involves finding small deviations that result in a violation of the given requirement. We present a novel, two-layer simulation-based analysis framework and a novel search heuristic for finding small tolerance violations. To evaluate our approach, we construct a set of benchmark problems where system parameters can be configured to represent different types of uncertainties and disturbancesin the system. Our evaluation shows that our falsification approach and heuristic can effectively find small tolerance violations.

SEDec 12, 2021
A Game-Theoretical Self-Adaptation Framework for Securing Software-Intensive Systems

Mingyue Zhang, Nianyu Li, Sridhar Adepu et al.

The increasing prevalence of security attacks on software-intensive systems calls for new, effective methods for detecting and responding to these attacks. As one promising approach, game theory provides analytical tools for modeling the interaction between the system and the adversarial environment and designing reliable defense. In this paper, we propose an approach for securing software-intensive systems using a rigorous game-theoretical framework. First, a self-adaptation framework is deployed on a component-based software intensive system, which periodically monitors the system for anomalous behaviors. A learning-based method is proposed to detect possible on-going attacks on the system components and predict potential threats to components. Then, an algorithm is designed to automatically build a \emph{Bayesian game} based on the system architecture (of which some components might have been compromised) once an attack is detected, in which the system components are modeled as independent players in the game. Finally, an optimal defensive policy is computed by solving the Bayesian game to achieve the best system utility, which amounts to minimizing the impact of the attack. We conduct two sets of experiments on two general benchmark tasks for security domain. Moreover, we systematically present a case study on a real-world water treatment testbed, i.e. the Secure Water Treatment System. Experiment results show the applicability and the effectiveness of our approach.

SEJul 29, 2021
Counterexample Classification

Cole Vick, Eunsuk Kang, Stavros Tripakis

In model checking, when a given model fails to satisfy the desired specification, a typical model checker provides a counterexample that illustrates how the violation occurs. In general, there exist many diverse counterexamples that exhibit distinct violating behaviors, which the user may wish to examine before deciding how to repair the model. Unfortunately, obtaining this information is challenging in existing model checkers since (1) the number of counterexamples may be too large to enumerate one by one, and (2) many of these counterexamples are redundant, in that they describe the same type of violating behavior. In this paper, we propose a technique called counterexample classification. The goal of classification is to partition the space of all counterexamples into a finite set of counterexample classes, each of which describes a distinct type of violating behavior for the given specification. These classes are then presented as a summary of possible violating behaviors in the system, freeing the user from manually having to inspect or analyze numerous counterexamples to extract the same information. We have implemented a prototype of our technique on top of an existing formal modeling and verification tool, the Alloy Analyzer, and evaluated the effectiveness of the technique on case studies involving the well-known Needham-Schroeder protocol with promising results.

SEMay 13, 2021
Feature Interactions on Steroids: On the Composition of ML Models

Christian Kästner, Eunsuk Kang, Sven Apel

The lack of specifications is a key difference between traditional software engineering and machine learning. We discuss how it drastically impacts how we think about divide-and-conquer approaches to system design, and how it impacts reuse, testing and debugging activities. Traditionally, specifications provide a cornerstone for compositional reasoning and for the divide-and-conquer strategy of how we build large and complex systems from components, but those are hard to come by for machine-learned components. While the lack of specification seems like a fundamental new problem at first sight, in fact software engineers routinely deal with iffy specifications in practice: we face weak specifications, wrong specifications, and unanticipated interactions among components and their specifications. Machine learning may push us further, but the problems are not fundamentally new. Rethinking machine-learning model composition from the perspective of the feature interaction problem, we may even teach us a thing or two on how to move forward, including the importance of integration testing, of requirements engineering, and of design.

AIAug 17, 2020
Runtime-Safety-Guided Policy Repair

Weichao Zhou, Ruihan Gao, BaekGyu Kim et al.

We study the problem of policy repair for learning-based control policies in safety-critical settings. We consider an architecture where a high-performance learning-based control policy (e.g. one trained as a neural network) is paired with a model-based safety controller. The safety controller is endowed with the abilities to predict whether the trained policy will lead the system to an unsafe state, and take over control when necessary. While this architecture can provide added safety assurances, intermittent and frequent switching between the trained policy and the safety controller can result in undesirable behaviors and reduced performance. We propose to reduce or even eliminate control switching by `repairing' the trained policy based on runtime data produced by the safety controller in a way that deviates minimally from the original policy. The key idea behind our approach is the formulation of a trajectory optimization problem that allows the joint reasoning of policy update and safety constraints. Experimental results demonstrate that our approach is effective even when the system model in the safety controller is unknown and only approximated.

SEJan 18, 2020
Teaching Software Engineering for AI-Enabled Systems

Christian Kästner, Eunsuk Kang

Software engineers have significant expertise to offer when building intelligent systems, drawing on decades of experience and methods for building systems that are scalable, responsive and robust, even when built on unreliable components. Systems with artificial-intelligence or machine-learning (ML) components raise new challenges and require careful engineering. We designed a new course to teach software-engineering skills to students with a background in ML. We specifically go beyond traditional ML courses that teach modeling techniques under artificial conditions and focus, in lecture and assignments, on realism with large and changing datasets, robust and evolvable infrastructure, and purposeful requirements engineering that considers ethics and fairness as well. We describe the course and our infrastructure and share experience and all material from teaching the course for the first time.

LGJan 30, 2019
Reliable Smart Road Signs

Muhammed O. Sayin, Chung-Wei Lin, Eunsuk Kang et al.

In this paper, we propose a game theoretical adversarial intervention detection mechanism for reliable smart road signs. A future trend in intelligent transportation systems is ``smart road signs" that incorporate smart codes (e.g., visible at infrared) on their surface to provide more detailed information to smart vehicles. Such smart codes make road sign classification problem aligned with communication settings more than conventional classification. This enables us to integrate well-established results in communication theory, e.g., error-correction methods, into road sign classification problem. Recently, vision-based road sign classification algorithms have been shown to be vulnerable against (even) small scale adversarial interventions that are imperceptible for humans. On the other hand, smart codes constructed via error-correction methods can lead to robustness against small scale intelligent or random perturbations on them. In the recognition of smart road signs, however, humans are out of the loop since they cannot see or interpret them. Therefore, there is no equivalent concept of imperceptible perturbations in order to achieve a comparable performance with humans. Robustness against small scale perturbations would not be sufficient since the attacker can attack more aggressively without such a constraint. Under a game theoretical solution concept, we seek to ensure certain measure of guarantees against even the worst case (intelligent) attackers that can perturb the signal even at large scale. We provide a randomized detection strategy based on the distance between the decoder output and the received input, i.e., error rate. Finally, we examine the performance of the proposed scheme over various scenarios.

SEMay 10, 2017
Automated Synthesis of Secure Platform Mappings

Eunsuk Kang, Stephane Lafortune, Stavros Tripakis

System development often involves decisions about how a high-level design is to be implemented using primitives from a low-level platform. Certain decisions, however, may introduce undesirable behavior into the resulting implementation, possibly leading to a violation of a desired property that has already been established at the design level. In this paper, we introduce the problem of synthesizing a property-preserving platform mapping: A set of implementation decisions ensuring that a desired property is preserved from a high-level design into a low-level platform implementation. We provide a formalization of the synthesis problem and propose a technique for synthesizing a mapping based on symbolic constraint search. We describe our prototype implementation, and a real-world case study demonstrating the application of our technique to synthesizing secure mappings for the popular web authorization protocols OAuth 1.0 and 2.0.