Marsha Chechik

SE
h-index7
9papers
79citations
Novelty54%
AI Score46

9 Papers

SEJun 2
Automated Repair of Requirements for Cyber-Physical Systems in Simulink Requirements Tables

Aren A. Babikian, Alessio Di Sandro, Federico Formica et al.

The development of complex software systems, e.g., cyber-physical systems (CPSs), involves continuous evolution of both system implementations and their requirements. These two artifacts often proceed independently, creating a risk of misalignment. For example, a system may be updated due to implementation-level concerns, yielding a new version that no longer satisfies its original requirements. Traditional compliance recovery techniques, e.g., automated program repair, address this problem by modifying the system while assuming that requirements are correct. However, faulty, outdated or inadequate requirements are a well-documented challenge in practice, motivating the complementary task of requirement repair. In this paper, we propose a framework that leverages system execution data to repair misaligned CPS requirements, thereby restoring requirement-to-system compliance. Our approach evaluates the correctness of declarative requirements over time-based, real-valued signals expressed using the MATLAB Simulink Requirements Tables language. We evaluate seven variants of our framework on six real-world case studies covering 12 requirements. Results confirm the effectiveness of the proposed framework in producing correct and useful repaired requirements.

AIMar 12
Social, Legal, Ethical, Empathetic and Cultural Norm Operationalisation for AI Agents

Radu Calinescu, Ana Cavalcanti, Marsha Chechik et al.

As AI agents are increasingly used in high-stakes domains like healthcare and law enforcement, aligning their behaviour with social, legal, ethical, empathetic, and cultural (SLEEC) norms has become a critical engineering challenge. While international frameworks have established high-level normative principles for AI, a significant gap remains in translating these abstract principles into concrete, verifiable requirements. To address this gap, we propose a systematic SLEEC-norm operationalisation process for determining, validating, implementing, and verifying normative requirements. Furthermore, we survey the landscape of methods and tools supporting this process, and identify key remaining challenges and research avenues for addressing them. We thus establish a framework - and define a research and policy agenda - for developing AI agents that are not only functionally useful but also demonstrably aligned with human norms and values.

CVFeb 29, 2024
Assessing Visually-Continuous Corruption Robustness of Neural Networks Relative to Human Performance

Huakun Shen, Boyue Caroline Hu, Krzysztof Czarnecki et al.

While Neural Networks (NNs) have surpassed human accuracy in image classification on ImageNet, they often lack robustness against image corruption, i.e., corruption robustness. Yet such robustness is seemingly effortless for human perception. In this paper, we propose visually-continuous corruption robustness (VCR) -- an extension of corruption robustness to allow assessing it over the wide and continuous range of changes that correspond to the human perceptive quality (i.e., from the original image to the full distortion of all perceived visual information), along with two novel human-aware metrics for NN evaluation. To compare VCR of NNs with human perception, we conducted extensive experiments on 14 commonly used image corruptions with 7,718 human participants and state-of-the-art robust NN models with different training objectives (e.g., standard, adversarial, corruption robustness), different architectures (e.g., convolution NNs, vision transformers), and different amounts of training data augmentation. Our study showed that: 1) assessing robustness against continuous corruption can reveal insufficient robustness undetected by existing benchmarks; as a result, 2) the gap between NN and human robustness is larger than previously known; and finally, 3) some image corruptions have a similar impact on human perception, offering opportunities for more cost-effective robustness assessments. Our validation set with 14 image corruptions, human robustness data, and the evaluation code is provided as a toolbox and a benchmark.

SEFeb 8, 2022
If a Human Can See It, So Should Your System: Reliability Requirements for Machine Vision Components

Boyue Caroline Hu, Lina Marsso, Krzysztof Czarnecki et al.

Machine Vision Components (MVC) are becoming safety-critical. Assuring their quality, including safety, is essential for their successful deployment. Assurance relies on the availability of precisely specified and, ideally, machine-verifiable requirements. MVCs with state-of-the-art performance rely on machine learning (ML) and training data but largely lack such requirements. In this paper, we address the need for defining machine-verifiable reliability requirements for MVCs against transformations that simulate the full range of realistic and safety-critical changes in the environment. Using human performance as a baseline, we define reliability requirements as: 'if the changes in an image do not affect a human's decision, neither should they affect the MVC's.' To this end, we provide: (1) a class of safety-related image transformations; (2) reliability requirement classes to specify correctness-preservation and prediction-preservation for MVCs; (3) a method to instantiate machine-verifiable requirements from these requirements classes using human performance experiment data; (4) human performance experiment data for image recognition involving eight commonly used transformations, from about 2000 human participants; and (5) a method for automatically checking whether an MVC satisfies our requirements. Further, we show that our reliability requirements are feasible and reusable by evaluating our methods on 13 state-of-the-art pre-trained image classification models. Finally, we demonstrate that our approach detects reliability gaps in MVCs that other existing methods are unable to detect.

SEJul 16, 2021
Applying Declarative Analysis to Software Product Line Models: An Industrial Study

Ramy Shahin, Robert Hackman, Rafael Toledo et al.

Software Product Lines (SPLs) are families of related software products developed from a common set of artifacts. Most existing analysis tools can be applied to a single product at a time, but not to an entire SPL. Some tools have been redesigned/re-implemented to support the kind of variability exhibited in SPLs, but this usually takes a lot of effort, and is error-prone. Declarative analyses written in languages like Datalog have been collectively lifted to SPLs in prior work, which makes the process of applying an existing declarative analysis to a product line more straightforward. In this paper, we take an existing declarative analysis (behaviour alteration) written in the Grok declarative language, port it to Datalog, and apply it to a set of automotive software product lines from General Motors. We discuss the design of the analysis pipeline used in this process, present its scalability results, and provide a means to visualize the analysis results for a subset of products filtered by feature expression. We also reflect on some of the lessons learned throughout this project.

SEApr 30, 2021
Towards Certified Analysis of Software Product Line Safety Cases

Ramy Shahin, Sahar Kokaly, Marsha Chechik

Safety-critical software systems are in many cases designed and implemented as families of products, usually referred to as Software Product Lines (SPLs). Products within an SPL vary from each other in terms of which features they include. Applying existing analysis techniques to SPLs and their safety cases is usually challenging because of the potentially exponential number of products with respect to the number of supported features. In this paper, we present a methodology and infrastructure for certified \emph{lifting} of existing single-product safety analyses to product lines. To ensure certified safety of our infrastructure, we implement it in an interactive theorem prover, including formal definitions, lemmas, correctness criteria theorems, and proofs. We apply this infrastructure to formalize and lift a Change Impact Assessment (CIA) algorithm. We present a formal definition of the lifted algorithm, outline its correctness proof (with the full machine-checked proof available online), and discuss its implementation within a model management framework.

PLOct 1, 2020
Automatic and Efficient Variability-Aware Lifting of Functional Programs

Ramy Shahin, Marsha Chechik

A software analysis is a computer program that takes some representation of a software product as input and produces some useful information about that product as output. A software product line encompasses \emph{many} software product variants, and thus existing analyses can be applied to each of the product variations individually, but not to the entire product line as a whole. Enumerating all product variants and analyzing them one by one is usually intractable due to the combinatorial explosion of the number of product variants with respect to product line features. Several software analyses (e.g., type checkers, model checkers, data flow analyses) have been redesigned/re-implemented to support variability. This usually requires a lot of time and effort, and the variability-aware version of the analysis might have new errors/bugs that do not exist in the original one. Given an analysis program written in a functional language based on PCF, in this paper we present two approaches to transforming (lifting) it into a semantically equivalent variability-aware analysis. A light-weight approach (referred to as \emph{shallow lifting}) wraps the analysis program into a variability-aware version, exploring all combinations of its input arguments. Deep lifting, on the other hand, is a program rewriting mechanism where the syntactic constructs of the input program are rewritten into their variability-aware counterparts. Compositionally this results in an efficient program semantically equivalent to the input program, modulo variability. We present the correctness criteria for functional program lifting, together with correctness proof sketches of our program transformations. We evaluate our approach on a set of program analyses applied to the BusyBox C-language product line.

PLDec 9, 2019
Variability-aware Datalog

Ramy Shahin, Marsha Chechik

Variability-aware computing is the efficient application of programs to different sets of inputs that exhibit some variability. One example is program analyses applied to Software Product Lines (SPLs). In this paper we present the design and development of a variability-aware version of the Soufflé Datalog engine. The engine can take facts annotated with Presence Conditions (PCs) as input, and compute the PCs of its inferred facts, eliminating facts that do not exist in any valid configuration. We evaluate our variability-aware Soufflé implementation on several fact sets annotated with PCs to measure the associated overhead in terms of processing time and database size.

SEJul 4, 2019
Lifting Datalog-Based Analyses to Software Product Lines

Ramy Shahin, Marsha Chechik, Rick Salay

Applying program analyses to Software Product Lines (SPLs) has been a fundamental research problem at the intersection of Product Line Engineering and software analysis. Different attempts have been made to "lift" particular product-level analyses to run on the entire product line. In this paper, we tackle the class of Datalog-based analyses (e.g., pointer and taint analyses), study the theoretical aspects of lifting Datalog inference, and implement a lifted inference algorithm inside the Soufflé Datalog engine. We evaluate our implementation on a set of benchmark product lines. We show significant savings in processing time and fact database size (billions of times faster on one of the benchmarks) compared to brute-force analysis of each product individually.