Saddek Bensalem

h-index36

21papers

405citations

Novelty40%

AI Score42

Ranked #86,712 of 201,326 authors (top 43%)#19,478 in LG (top 46%)

21 Papers

LOOct 7, 2011

Algorithms for Synthesizing Priorities in Component-based Systems

Chih-Hong Cheng, Saddek Bensalem, Yu-Fang Chen et al.

We present algorithms to synthesize component-based systems that are safe and deadlock-free using priorities, which define stateless-precedence between enabled actions. Our core method combines the concept of fault-localization (using safety-game) and fault-repair (using SAT for conflict resolution). For complex systems, we propose three complementary methods as preprocessing steps for priority synthesis, namely (a) data abstraction to reduce component complexities, (b) alphabet abstraction and #-deadlock to ignore components, and (c) automated assumption learning for compositional priority synthesis.

LGJul 20, 2023

What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems

Saddek Bensalem, Chih-Hong Cheng, Wei Huang et al.

Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among the challenges, it is known that a rigorous, yet practical, way of achieving safety guarantees is one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verification of such systems. Then, based on the observation that existing works cannot actually achieve provable guarantees, we promote a two-step verification method for the ultimate achievement of provable statistical guarantees.

SEMay 16, 2022

Prioritizing Corners in OoD Detectors via Symbolic String Manipulation

Chih-Hong Cheng, Changshun Wu, Emmanouil Seferis et al.

For safety assurance of deep neural networks (DNNs), out-of-distribution (OoD) monitoring techniques are essential as they filter spurious input that is distant from the training dataset. This paper studies the problem of systematically testing OoD monitors to avoid cases where an input data point is tested as in-distribution by the monitor, but the DNN produces spurious output predictions. We consider the definition of "in-distribution" characterized in the feature space by a union of hyperrectangles learned from the training dataset. Thus the testing is reduced to finding corners in hyperrectangles distant from the available training data in the feature space. Concretely, we encode the abstract location of every data point as a finite-length binary string, and the union of all binary strings is stored compactly using binary decision diagrams (BDDs). We demonstrate how to use BDDs to symbolically extract corners distant from all data points within the training set. Apart from test case generation, we explain how to use the proposed corners to fine-tune the DNN to ensure that it does not predict overly confidently. The result is evaluated over examples such as number and traffic sign recognition.

SYNov 27, 2012

Distributed Priority Synthesis

Chih-Hong Cheng, Rongjie Yan, Saddek Bensalem et al.

Given a set of interacting components with non-deterministic variable update and given safety requirements, the goal of priority synthesis is to restrict, by means of priorities, the set of possible interactions in such a way as to guarantee the given safety conditions for all possible runs. In distributed priority synthesis we are interested in obtaining local sets of priorities, which are deployed in terms of local component controllers sharing intended next moves between components in local neighborhoods only. These possible communication paths between local controllers are specified by means of a communication architecture. We formally define the problem of distributed priority synthesis in terms of a multi-player safety game between players for (angelically) selecting the next transition of the components and an environment for (demonically) updating uncontrollable variables. We analyze the complexity of the problem, and propose several optimizations including a solution-space exploration based on a diagnosis method using a nested extension of the usual attractor computation in games together with a reduction to corresponding SAT problems. When diagnosis fails, the method proposes potential candidates to guide the exploration. These optimized algorithms for solving distributed priority synthesis problems have been integrated into the VissBIP framework. An experimental validation of this implementation is performed using a range of case studies including scheduling in multicore processors and modular robotics.

LGJun 14, 2023

Towards Rigorous Design of OoD Detectors

Chih-Hong Cheng, Changshun Wu, Harald Ruess et al.

Out-of-distribution (OoD) detection techniques are instrumental for safety-related neural networks. We are arguing, however, that current performance-oriented OoD detection techniques geared towards matching metrics such as expected calibration error, are not sufficient for establishing safety claims. What is missing is a rigorous design approach for developing, verifying, and validating OoD detectors. These design principles need to be aligned with the intended functionality and the operational domain. Here, we formulate some of the key technical challenges, together with a possible way forward, for developing a rigorous and safety-related design methodology for OoD detectors.

CRJun 3, 2024Code

Safeguarding Large Language Models: A Survey

Yi Dong, Ronghui Mu, Yanghao Zhang et al.

In the burgeoning field of Large Language Models (LLMs), developing a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review on the current status of this critical mechanism. It discusses its major challenges and how it can be enhanced into a comprehensive mechanism dealing with ethical issues in various contexts. First, the paper elucidates the current landscape of safeguarding mechanisms that major LLM service providers and the open-source community employ. This is followed by the techniques to evaluate, analyze, and enhance some (un)desirable properties that a guardrail might want to enforce, such as hallucinations, fairness, privacy, and so on. Based on them, we review techniques to circumvent these controls (i.e., attacks), to defend the attacks, and to reinforce the guardrails. While the techniques mentioned above represent the current status and the active research trends, we also discuss several challenges that cannot be easily dealt with by the methods and present our vision on how to implement a comprehensive guardrail through the full consideration of multi-disciplinary approach, neural-symbolic method, and systems development lifecycle.

NINov 21, 2024

FastRAG: Retrieval Augmented Generation for Semi-structured Data

Amar Abane, Anis Bekri, Abdella Battou et al.

Efficiently processing and interpreting network data is critical for the operation of increasingly complex networks. Recent advances in Large Language Models (LLM) and Retrieval-Augmented Generation (RAG) techniques have improved data processing in network management. However, existing RAG methods like VectorRAG and GraphRAG struggle with the complexity and implicit nature of semi-structured technical data, leading to inefficiencies in time, cost, and retrieval. This paper introduces FastRAG, a novel RAG approach designed for semi-structured data. FastRAG employs schema learning and script learning to extract and structure data without needing to submit entire data sources to an LLM. It integrates text search with knowledge graph (KG) querying to improve accuracy in retrieving context-rich information. Evaluation results demonstrate that FastRAG provides accurate question answering, while improving up to 90% in time and 85% in cost compared to GraphRAG.

CVMar 27, 2024

BAM: Box Abstraction Monitors for Real-time OoD Detection in Object Detection

Changshun Wu, Weicheng He, Chih-Hong Cheng et al.

Out-of-distribution (OoD) detection techniques for deep neural networks (DNNs) become crucial thanks to their filtering of abnormal inputs, especially when DNNs are used in safety-critical applications and interact with an open and dynamic environment. Nevertheless, integrating OoD detection into state-of-the-art (SOTA) object detection DNNs poses significant challenges, partly due to the complexity introduced by the SOTA OoD construction methods, which require the modification of DNN architecture and the introduction of complex loss functions. This paper proposes a simple, yet surprisingly effective, method that requires neither retraining nor architectural change in object detection DNN, called Box Abstraction-based Monitors (BAM). The novelty of BAM stems from using a finite union of convex box abstractions to capture the learned features of objects for in-distribution (ID) data, and an important observation that features from OoD data are more likely to fall outside of these boxes. The union of convex regions within the feature space allows the formation of non-convex and interpretable decision boundaries, overcoming the limitations of VOS-like detectors without sacrificing real-time performance. Experiments integrating BAM into Faster R-CNN-based object detection DNNs demonstrate a considerably improved performance against SOTA OoD detection techniques.

LGApr 25, 2024

Runtime Monitoring and Enforcement of Conditional Fairness in Generative AIs

Chih-Hong Cheng, Changshun Wu, Xingyu Zhao et al.

The deployment of generative AI (GenAI) models raises significant fairness concerns, addressed in this paper through novel characterization and enforcement techniques specific to GenAI. Unlike standard AI performing specific tasks, GenAI's broad functionality requires ``conditional fairness'' tailored to the context being generated, such as demographic fairness in generating images of poor people versus successful business leaders. We define two fairness levels: the first evaluates fairness in generated outputs, independent of prompts and models; the second assesses inherent fairness with neutral prompts. Given the complexity of GenAI and challenges in fairness specifications, we focus on bounding the worst case, considering a GenAI system unfair if the distance between appearances of a specific group exceeds preset thresholds. We also explore combinatorial testing for assessing relative completeness in intersectional fairness. By bounding the worst case, we develop a prompt injection scheme within an agent-based framework to enforce conditional fairness with minimal intervention, validated on state-of-the-art GenAI systems.

ROFeb 14, 2024

A Digital Twin prototype for traffic sign recognition of a learning-enabled autonomous vehicle

Mohamed AbdElSalam, Loai Ali, Saddek Bensalem et al.

In this paper, we present a novel digital twin prototype for a learning-enabled self-driving vehicle. The primary objective of this digital twin is to perform traffic sign recognition and lane keeping. The digital twin architecture relies on co-simulation and uses the Functional Mock-up Interface and SystemC Transaction Level Modeling standards. The digital twin consists of four clients, i) a vehicle model that is designed in Amesim tool, ii) an environment model developed in Prescan, iii) a lane-keeping controller designed in Robot Operating System, and iv) a perception and speed control module developed in the formal modeling language of BIP (Behavior, Interaction, Priority). These clients interface with the digital twin platform, PAVE360-Veloce System Interconnect (PAVE360-VSI). PAVE360-VSI acts as the co-simulation orchestrator and is responsible for synchronization, interconnection, and data exchange through a server. The server establishes connections among the different clients and also ensures adherence to the Ethernet protocol. We conclude with illustrative digital twin simulations and recommendations for future work.

NISep 26, 2025

Bridging Language Models and Formal Methods for Intent-Driven Optical Network Design

Anis Bekri, Amar Abane, Abdella Battou et al.

Intent-Based Networking (IBN) aims to simplify network management by enabling users to specify high-level goals that drive automated network design and configuration. However, translating informal natural-language intents into formally correct optical network topologies remains challenging due to inherent ambiguity and lack of rigor in Large Language Models (LLMs). To address this, we propose a novel hybrid pipeline that integrates LLM-based intent parsing, formal methods, and Optical Retrieval-Augmented Generation (RAG). By enriching design decisions with domain-specific optical standards and systematically incorporating symbolic reasoning and verification techniques, our pipeline generates explainable, verifiable, and trustworthy optical network designs. This approach significantly advances IBN by ensuring reliability and correctness, essential for mission-critical networking tasks.

LGSep 19, 2025

Randomized Smoothing Meets Vision-Language Models

Emmanouil Seferis, Changshun Wu, Stefanos Kollias et al.

Randomized smoothing (RS) is one of the prominent techniques to ensure the correctness of machine learning models, where point-wise robustness certificates can be derived analytically. While RS is well understood for classification, its application to generative models is unclear, since their outputs are sequences rather than labels. We resolve this by connecting generative outputs to an oracle classification task and showing that RS can still be enabled: the final response can be classified as a discrete action (e.g., service-robot commands in VLAs), as harmful vs. harmless (content moderation or toxicity detection in VLMs), or even applying oracles to cluster answers into semantically equivalent ones. Provided that the error rate for the oracle classifier comparison is bounded, we develop the theory that associates the number of samples with the corresponding robustness radius. We further derive improved scaling laws analytically relating the certified radius and accuracy to the number of samples, showing that the earlier result of 2 to 3 orders of magnitude fewer samples sufficing with minimal loss remains valid even under weaker assumptions. Together, these advances make robustness certification both well-defined and computationally feasible for state-of-the-art VLMs, as validated against recent jailbreak-style adversarial attacks.

LGJun 1, 2025

LoRA-BAM: Input Filtering for Fine-tuned LLMs via Boxed Abstraction Monitors over LoRA Layers

Changshun Wu, Tianyi Duan, Saddek Bensalem et al.

Fine-tuning large language models (LLMs) improves performance on domain-specific tasks but can lead to overfitting, making them unreliable on out-of-distribution (OoD) queries. We propose LoRA-BAM - a method that adds OoD detection monitors to the LoRA layer using boxed abstraction to filter questions beyond the model's competence. Feature vectors from the fine-tuning data are extracted via the LLM and clustered. Clusters are enclosed in boxes; a question is flagged as OoD if its feature vector falls outside all boxes. To improve interpretability and robustness, we introduce a regularization loss during fine-tuning that encourages paraphrased questions to stay close in the feature space, and the enlargement of the decision boundary is based on the feature variance within a cluster. Our method complements existing defenses by providing lightweight and interpretable OoD detection.

CVMar 10, 2025

Revisiting Out-of-Distribution Detection in Real-time Object Detection: From Benchmark Pitfalls to a New Mitigation Paradigm

Changshun Wu, Weicheng He, Chih-Hong Cheng et al.

Out-of-distribution (OoD) inputs pose a persistent challenge to deep learning models, often triggering overconfident predictions on non-target objects. While prior work has primarily focused on refining scoring functions and adjusting test-time thresholds, such algorithmic improvements offer only incremental gains. We argue that a rethinking of the entire development lifecycle is needed to mitigate these risks effectively. This work addresses two overlooked dimensions of OoD detection in object detection. First, we reveal fundamental flaws in widely used evaluation benchmarks: contrary to their design intent, up to 13% of objects in the OoD test sets actually belong to in-distribution classes, and vice versa. These quality issues severely distort the reported performance of existing methods and contribute to their high false positive rates. Second, we introduce a novel training-time mitigation paradigm that operates independently of external OoD detectors. Instead of relying solely on post-hoc scoring, we fine-tune the detector using a carefully synthesized OoD dataset that semantically resembles in-distribution objects. This process shapes a defensive decision boundary by suppressing objectness on OoD objects, leading to a 91% reduction in hallucination error of a YOLO model on BDD-100K. Our methodology generalizes across detection paradigms such as YOLO, Faster R-CNN, and RT-DETR, and supports few-shot adaptation. Together, these contributions offer a principled and effective way to reduce OoD-induced hallucination in object detectors. Code and data are available at: https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood.

AIMay 19, 2023

A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation

Xiaowei Huang, Wenjie Ruan, Wei Huang et al.

Large Language Models (LLMs) have exploded a new heatwave of AI for their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their fast adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities and limitations of the LLMs, categorising them into inherent issues, attacks, and unintended bugs. Then, we consider if and how the Verification and Validation (V&V) techniques, which have been widely developed for traditional software and deep learning models such as convolutional neural networks as independent processes to check the alignment of their implementations against the specifications, can be integrated and further extended throughout the lifecycle of the LLMs to provide rigorous analysis to the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and regulations and ethical use. In total, 370+ references are considered to support the quick understanding of the safety and trustworthiness issues from the perspective of V&V. While intensive research has been conducted to identify the safety and trustworthiness issues, rigorous yet practical methods are called for to ensure the alignment of LLMs with safety and trustworthiness requirements.

LGApr 25, 2021

Customizable Reference Runtime Monitoring of Neural Networks using Resolution Boxes

Changshun Wu, Yliès Falcone, Saddek Bensalem

Classification neural networks fail to detect inputs that do not fall inside the classes they have been trained for. Runtime monitoring techniques on the neuron activation pattern can be used to detect such inputs. We present an approach for monitoring classification systems via data abstraction. Data abstraction relies on the notion of box with a resolution. Box-based abstraction consists in representing a set of values by its minimal and maximal values in each dimension. We augment boxes with a notion of resolution and define their clustering coverage, which is intuitively a quantitative metric that indicates the abstraction quality. This allows studying the effect of different clustering parameters on the constructed boxes and estimating an interval of sub-optimal parameters. Moreover, we automatically construct monitors that leverage both the correct and incorrect behaviors of a system. This allows checking the size of the monitor abstractions and analyzing the separability of the network. Monitors are obtained by combining the sub-monitors of each class of the system placed at some selected layers. Our experiments demonstrate the effectiveness of our clustering coverage estimation and show how to assess the effectiveness and precision of monitors according to the selected clustering parameter and monitored layers.

SENov 1, 2020

Institution-based Encoding and Verification of Simple UML State Machines in CASL/SPASS

Tobias Rosenberger, Saddek Bensalem, Alexander Knapp et al.

This paper provides the first correct semantical representation of UML state-machines within the logical framework of an institution (previous attempts were flawed). A novel encoding of this representation into first-order logic enables symbolic analyses through a multitude of theorem-provers. UML state-machines are central to model-based systems-engineering. Till now, state-machine analysis has been mostly restricted to model checking, which for state-machines suffers heavily from the state-space explosion problem. Symbolic reasoning, as enabled and demonstrated here, provides a powerful alternative, which can deal with large or even infinite state spaces. Full proofs are given.

SEJun 25, 2018

Proceedings of the 1st International Workshop on Methods and Tools for Rigorous System Design

Simon Bliudze, Saddek Bensalem

This volume contains the proceedings of the 1st International Workshop on Methods and Tools for Rigorous System Design (MeTRiD 2018), held on the 15th of April, 2018 in Thessaloniki, Greece as part of ETAPS 2018, the European Joint Conferences on Theory and Practice of Software. The term "Rigorous System Design" (RSD) denotes the design approach that is based on a formal, accountable and iterative process for deriving trustworthy and optimised implementations from models of application software, its execution platform and its external environment. Ideally, a system implementation is derived from a set of appropriate high-level models by applying a sequence of semantics-preserving transformations. The ambition of MeTRiD is to promote the use of formal methods, in general, and the RSD approach, in particular, in the industrial applications and, reciprocally, bring the attention of the formal methods researchers to such industrial applications in order to develop realistic case-studies and guide the evolution of tools. Striving towards this ambitious goal, we have solicited contributions of three types: - regular papers, presenting original research - case study papers, reporting the evaluation of existing modelling, analysis, transformation and code generation formalisms and tools on realistic examples of significant size - tool papers, describing new tool prototypes supporting the RSD flow and enhancements of existing ones We have received 13 submissions (7 regular, 4 tool and 2 case study papers), whereof 8 have been accepted for presentation at the workshop: - 5 regular papers - 2 tool papers - 1 case study paper In this volume, these papers are complemented by an invited paper by Joseph Sifakis.

SEMay 15, 2017

Monitoring Distributed Component-Based Systems

Hosein Nazarpour, Yliès Falcone, Mohamad Jaber et al.

This paper addresses the online monitoring of distributed component-based systems with multi-party interactions against user-provided properties expressed in linear-temporal logic and referring to global states. We consider intrinsically independent components whose interactions are partitioned on distributed controllers. In this context, the problem that arises is that a global state of the system is not available to the monitor. Instead, we attach local controllers to schedulers to retrieve the concurrent local traces. Local traces are sent to a global observer which reconstructs the set of global traces that are compatible with the local ones, in a concurrency-preserving fashion. In this context, the reconstruction of the global traces is done on-the-fly using a lattice of partial states encoding the global traces compatible with the locally-observed traces. We implemented our monitoring approach in a prototype tool called RVDIST. RVDIST executes in parallel with the distributed model and takes as input the events generated from each scheduler and outputs the evaluated computation lattice. Our experiments show that, thanks to the optimisation applied in the online monitoring algorithm, i) the size of the constructed computation lattice is insensitive to the the number of received events, (ii) the lattice size is kept reasonable and (iii) the overhead of the monitoring process is cheap.

SEDec 19, 2016

Concurrency-Preserving and Sound Monitoring of Multi-Threaded Component-Based Systems

Hosein Nazarpour, Yliès Falcone, Saddek Bensalem et al.

This paper addresses the monitoring of logic-independent linear-time user-provided properties in multi-threaded component-based systems. We consider intrinsically independent components that can be executed concurrently with a centralized coordination for multiparty interactions. In this context, the problem that arises is that a global state of the system is not available to the monitor. A naive solution to this problem would be to plug in a monitor which would force the system to synchronize in order to obtain the sequence of global states at runtime. Such a solution would defeat the whole purpose of having concurrent components. Instead, we reconstruct on-the-fly the global states by accumulating the partial states traversed by the system at runtime. We define transformations of components that preserve their semantics and concurrency and, at the same time, allow to monitor global-state properties. Moreover, we present RVMT-BIP, a prototype tool implementing the transformations for monitoring multi-threaded systems described in the BIP (Behavior, Interaction, Priority) framework, an expressive framework for the formal construction of heterogeneous systems. Our experiments on several multi-threaded BIP systems show that RVMT-BIP induces a cheap runtime overhead.

ROSep 2, 2013

A Verifiable and Correct-by-Construction Controller for Robot Functional Levels

Saddek Bensalem, Lavindra de Silva, Félix Ingrand et al.

Autonomous robots are complex systems that require the interaction and cooperation between numerous heterogeneous software components. In recent times, robots are being increasingly used for complex and safety-critical tasks, such as exploring Mars and assisting/replacing humans. Consequently, robots are becoming critical systems that must meet safety properties, in particular, logical, temporal and real-time constraints. To this end, we present an evolution of the LAAS architecture for autonomous systems, in particular its GenoM tool. This evolution relies on the BIP component-based design framework, which has been successfully used in other domains such as embedded systems. We show how we integrate BIP into our existing methodology for developing the lowest (functional) level of robots. Particularly, we discuss the componentization of the functional level, the synthesis of an execution controller for it, and how we verify whether the resulting functional level conforms to properties such as deadlock-freedom. We also show through experimentation that the verification is feasible and usable for complex, real world robotic systems, and that the BIP-based functional levels resulting from our new methodology are, despite an overhead during execution, still practical on real world robotic platforms. Our approach has been fully implemented in the LAAS architecture, and the implementation has been used in several experiments on a real robot.