Fuyuki Ishikawa

SE
h-index48
13papers
100citations
Novelty43%
AI Score43

13 Papers

LGMar 1, 2022
NeuRecover: Regression-Controlled Repair of Deep Neural Networks with Training History

Shogo Tokui, Susumu Tokumoto, Akihito Yoshii et al.

Systematic techniques to improve quality of deep neural networks (DNNs) are critical given the increasing demand for practical applications including safety-critical ones. The key challenge comes from the little controllability in updating DNNs. Retraining to fix some behavior often has a destructive impact on other behavior, causing regressions, i.e., the updated DNN fails with inputs correctly handled by the original one. This problem is crucial when engineers are required to investigate failures in intensive assurance activities for safety or trust. Search-based repair techniques for DNNs have potentials to tackle this challenge by enabling localized updates only on "responsible parameters" inside the DNN. However, the potentials have not been explored to realize sufficient controllability to suppress regressions in DNN repair tasks. In this paper, we propose a novel DNN repair method that makes use of the training history for judging which DNN parameters should be changed or not to suppress regressions. We implemented the method into a tool called NeuRecover and evaluated it with three datasets. Our method outperformed the existing method by achieving often less than a quarter, even a tenth in some cases, number of regressions. Our method is especially effective when the repair requirements are tight to fix specific failure types. In such cases, our method showed stably low rates (<2%) of regressions, which were in many cases a tenth of regressions caused by retraining.

NEMar 6, 2023
Using a Variational Autoencoder to Learn Valid Search Spaces of Safely Monitored Autonomous Robots for Last-Mile Delivery

Peter J. Bentley, Soo Ling Lim, Paolo Arcaini et al.

The use of autonomous robots for delivery of goods to customers is an exciting new way to provide a reliable and sustainable service. However, in the real world, autonomous robots still require human supervision for safety reasons. We tackle the realworld problem of optimizing autonomous robot timings to maximize deliveries, while ensuring that there are never too many robots running simultaneously so that they can be monitored safely. We assess the use of a recent hybrid machine-learningoptimization approach COIL (constrained optimization in learned latent space) and compare it with a baseline genetic algorithm for the purposes of exploring variations of this problem. We also investigate new methods for improving the speed and efficiency of COIL. We show that only COIL can find valid solutions where appropriate numbers of robots run simultaneously for all problem variations tested. We also show that when COIL has learned its latent representation, it can optimize 10% faster than the GA, making it a good choice for daily re-optimization of robots where delivery requests for each day are allocated to robots while maintaining safe numbers of robots running at once.

LGMay 14, 2022
Practical Insights of Repairing Model Problems on Image Classification

Akihito Yoshii, Susumu Tokumoto, Fuyuki Ishikawa

Additional training of a deep learning model can cause negative effects on the results, turning an initially positive sample into a negative one (degradation). Such degradation is possible in real-world use cases due to the diversity of sample characteristics. That is, a set of samples is a mixture of critical ones which should not be missed and less important ones. Therefore, we cannot understand the performance by accuracy alone. While existing research aims to prevent a model degradation, insights into the related methods are needed to grasp their benefits and limitations. In this talk, we will present implications derived from a comparison of methods for reducing degradation. Especially, we formulated use cases for industrial settings in terms of arrangements of a data set. The results imply that a practitioner should care about better method continuously considering dataset availability and life cycle of an AI system because of a trade-off between accuracy and preventing degradation.

ROMar 18
Safety Case Patterns for VLA-based driving systems: Insights from SimLingo

Gerhard Yu, Fuyuki Ishikawa, Oluwafemi Odu et al.

Vision-Language-Action (VLA)-based driving systems represent a significant paradigm shift in autonomous driving since, by combining traffic scene understanding, linguistic interpretation, and action generation, these systems enable more flexible, adaptive, and instruction-responsive driving behaviors. However, despite their growing adoption and potential to support socially responsible autonomous driving as well as understanding high-level human instructions, VLA-based driving systems may exhibit new types of hazardous behaviors. For instance, the integration of open-ended natural language inputs (e.g., user or navigation instructions) into the multimodal control loop, may lead to unpredictable and unsafe behaviors that could endanger vehicle occupants and pedestrians. Hence, assuring the safety of these systems is crucial to help build trust in their operations. To support this, we propose a novel safety case design approach called RAISE. Our approach introduces novel patterns tailored to instruction-based driving systems such as VLA-based driving systems, an extension of Hazard Analysis and Risk Assessment (HARA) detailing safe scenarios and their outcomes, and a design technique to create the safety cases of VLA-based driving systems. A case study on SimLingo illustrates how our approach can be used to construct rigorous, evidence-based safety claims for this emerging class of autonomous driving systems.

NEJan 31, 2024
SCAPE: Searching Conceptual Architecture Prompts using Evolution

Soo Ling Lim, Peter J Bentley, Fuyuki Ishikawa

Conceptual architecture involves a highly creative exploration of novel ideas, often taken from other disciplines as architects consider radical new forms, materials, textures and colors for buildings. While today's generative AI systems can produce remarkable results, they lack the creativity demonstrated for decades by evolutionary algorithms. SCAPE, our proposed tool, combines evolutionary search with generative AI, enabling users to explore creative and good quality designs inspired by their initial input through a simple point and click interface. SCAPE injects randomness into generative AI, and enables memory, making use of the built-in language skills of GPT-4 to vary prompts via text-based mutation and crossover. We demonstrate that compared to DALL-E 3, SCAPE enables a 67% improvement in image novelty, plus improvements in quality and effectiveness of use; we show that in just three iterations SCAPE has a 24% image novelty increase enabling effective exploration, plus optimization of images by users. We use more than 20 independent architects to assess SCAPE, who provide markedly positive feedback.

LGAug 11, 2025
FairFLRep: Fairness aware fault localization and repair of Deep Neural Networks

Moses Openja, Paolo Arcaini, Foutse Khomh et al.

Deep neural networks (DNNs) are being utilized in various aspects of our daily lives, including high-stakes decision-making applications that impact individuals. However, these systems reflect and amplify bias from the data used during training and testing, potentially resulting in biased behavior and inaccurate decisions. For instance, having different misclassification rates between white and black sub-populations. However, effectively and efficiently identifying and correcting biased behavior in DNNs is a challenge. This paper introduces FairFLRep, an automated fairness-aware fault localization and repair technique that identifies and corrects potentially bias-inducing neurons in DNN classifiers. FairFLRep focuses on adjusting neuron weights associated with sensitive attributes, such as race or gender, that contribute to unfair decisions. By analyzing the input-output relationships within the network, FairFLRep corrects neurons responsible for disparities in predictive quality parity. We evaluate FairFLRep on four image classification datasets using two DNN classifiers, and four tabular datasets with a DNN model. The results show that FairFLRep consistently outperforms existing methods in improving fairness while preserving accuracy. An ablation study confirms the importance of considering fairness during both fault localization and repair stages. Our findings also show that FairFLRep is more efficient than the baseline approaches in repairing the network.

SEMar 10, 2025
An Experience Report on Regression-Free Repair of Deep Neural Network Model

Takao Nakagawa, Susumu Tokumoto, Shogo Tokui et al.

Systems based on Deep Neural Networks (DNNs) are increasingly being used in industry. In the process of system operation, DNNs need to be updated in order to improve their performance. When updating DNNs, systems used in companies that require high reliability must have as few regressions as possible. Since the update of DNNs has a data-driven nature, it is difficult to suppress regressions as expected by developers. This paper identifies the requirements for DNN updating in industry and presents a case study using techniques to meet those requirements. In the case study, we worked on satisfying the requirement to update models trained on car images collected in Fujitsu assuming security applications without regression for a specific class. We were able to suppress regression by customizing the objective function based on NeuRecover, a DNN repair technique. Moreover, we discuss some of the challenges identified in the case study.

AISep 3, 2025
Situating AI Agents in their World: Aspective Agentic AI for Dynamic Partially Observable Information Systems

Peter J. Bentley, Soo Ling Lim, Fuyuki Ishikawa

Agentic LLM AI agents are often little more than autonomous chatbots: actors following scripts, often controlled by an unreliable director. This work introduces a bottom-up framework that situates AI agents in their environment, with all behaviors triggered by changes in their environments. It introduces the notion of aspects, similar to the idea of umwelt, where sets of agents perceive their environment differently to each other, enabling clearer control of information. We provide an illustrative implementation and show that compared to a typical architecture, which leaks up to 83% of the time, aspective agentic AI enables zero information leakage. We anticipate that this concept of specialist agents working efficiently in their own information niches can provide improvements to both security and efficiency.

CVJan 30, 2025
CLEAR: Cue Learning using Evolution for Accurate Recognition Applied to Sustainability Data Extraction

Peter J. Bentley, Soo Ling Lim, Fuyuki Ishikawa

Large Language Model (LLM) image recognition is a powerful tool for extracting data from images, but accuracy depends on providing sufficient cues in the prompt - requiring a domain expert for specialized tasks. We introduce Cue Learning using Evolution for Accurate Recognition (CLEAR), which uses a combination of LLMs and evolutionary computation to generate and optimize cues such that recognition of specialized features in images is improved. It achieves this by auto-generating a novel domain-specific representation and then using it to optimize suitable textual cues with a genetic algorithm. We apply CLEAR to the real-world task of identifying sustainability data from interior and exterior images of buildings. We investigate the effects of using a variable-length representation compared to fixed-length and show how LLM consistency can be improved by refactoring from categorical to real-valued estimates. We show that CLEAR enables higher accuracy compared to expert human recognition and human-authored prompts in every task with error rates improved by up to two orders of magnitude and an ablation study evincing solution concision.

SEAug 17, 2021
Robustifying Controller Specifications of Cyber-Physical Systems Against Perceptual Uncertainty

Tsutomu Kobayashi, Rick Salay, Ichiro Hasuo et al.

Formal reasoning on the safety of controller systems interacting with plants is complex because developers need to specify behavior while taking into account perceptual uncertainty. To address this, we propose an automated workflow that takes an Event-B model of an uncertainty-unaware controller and a specification of uncertainty as input. First, our workflow automatically injects the uncertainty into the original model to obtain an uncertainty-aware but potentially unsafe controller. Then, it automatically robustifies the controller so that it satisfies safety even under the uncertainty. The case study shows how our workflow helps developers to explore multiple levels of perceptual uncertainty. We conclude that our workflow makes design and analysis of uncertainty-aware controller systems easier and more systematic.

SEJul 22, 2021
Architecture-Guided Test Resource Allocation Via Logic

Clovis Eberhart, Akihisa Yamada, Stefan Klikovits et al.

We introduce a new logic named Quantitative Confidence Logic (QCL) that quantifies the level of confidence one has in the conclusion of a proof. By translating a fault tree representing a system's architecture to a proof, we show how to use QCL to give a solution to the test resource allocation problem that takes the given architecture into account. We implemented a tool called Astrahl and compared our results to other testing resource allocation strategies.

SEOct 2, 2019
A Mutation-based Approach for Assessing Weight Coverage of a Path Planner

Thomas Laurent, Paolo Arcaini, Fuyuki Ishikawa et al.

Autonomous cars are subjected to several different kind of inputs (other cars, road structure, etc.) and, therefore, testing the car under all possible conditions is impossible. To tackle this problem, scenario-based testing for automated driving defines categories of different scenarios that should be covered. Although this kind of coverage is a necessary condition, it still does not guarantee that any possible behaviour of the autonomous car is tested. In this paper, we consider the path planner of an autonomous car that decides, at each timestep, the short-term path to follow in the next few seconds; such decision is done by using a weighted cost function that considers different aspects (safety, comfort, etc.). In order to assess whether all the possible decisions that can be taken by the path planner are covered by a given test suite T, we propose a mutation-based approach that mutates the weights of the cost function and then checks if at least one scenario of T kills the mutant. Preliminary experiments on a manually designed test suite show that some weights are easier to cover as they consider aspects that more likely occur in a scenario, and that more complicated scenarios (that generate more complex paths) are those that allow to cover more weights.

CYJul 31, 2019
Adapting SQuaRE for Quality Assessment of Artificial Intelligence Systems

Hiroshi Kuwajima, Fuyuki Ishikawa

More and more software practitioners are tackling towards industrial applications of artificial intelligence (AI) systems, especially those based on machine learning (ML). However, many of existing principles and approaches to traditional systems do not work effectively for the system behavior obtained by training not by logical design. In addition, unique kinds of requirements are emerging such as fairness and explainability. To provide clear guidance to understand and tackle these difficulties, we present an analysis on what quality concepts we should evaluate for AI systems. We base our discussion on ISO/IEC 25000 series, known as SQuaRE, and identify how it should be adapted for the unique nature of ML and $\textit{Ethics guidelines for trustworthy AI}$ from European Commission. We thus provide holistic insights for quality of AI systems by incorporating the ML nature and AI ethics to the traditional software quality concepts.