h-index6
16papers
61citations
Novelty48%
AI Score44

16 Papers

AIAug 15, 2022
Sound and Relatively Complete Belief Hoare Logic for Statistical Hypothesis Testing Programs

Yusuke Kawamoto, Tetsuya Sato, Kohei Suenaga

We propose a new approach to formally describing the requirement for statistical inference and checking whether a program uses the statistical method appropriately. Specifically, we define belief Hoare logic (BHL) for formalizing and reasoning about the statistical beliefs acquired via hypothesis testing. This program logic is sound and relatively complete with respect to a Kripke model for hypothesis tests. We demonstrate by examples that BHL is useful for reasoning about practical issues in hypothesis testing. In our framework, we clarify the importance of prior beliefs in acquiring statistical beliefs through hypothesis testing, and discuss the whole picture of the justification of statistical inference inside and outside the program logic.

SEJul 15, 2023
Probabilistic Black-Box Checking via Active MDP Learning

Junya Shijubo, Masaki Waga, Kohei Suenaga

We introduce a novel methodology for testing stochastic black-box systems, frequently encountered in embedded systems. Our approach enhances the established black-box checking (BBC) technique to address stochastic behavior. Traditional BBC primarily involves iteratively identifying an input that breaches the system's specifications by executing the following three phases: the learning phase to construct an automaton approximating the black box's behavior, the synthesis phase to identify a candidate counterexample from the learned automaton, and the validation phase to validate the obtained candidate counterexample and the learned automaton against the original black-box system. Our method, ProbBBC, refines the conventional BBC approach by (1) employing an active Markov Decision Process (MDP) learning method during the learning phase, (2) incorporating probabilistic model checking in the synthesis phase, and (3) applying statistical hypothesis testing in the validation phase. ProbBBC uniquely integrates these techniques rather than merely substituting each method in the traditional BBC; for instance, the statistical hypothesis testing and the MDP learning procedure exchange information regarding the black-box system's observation with one another. The experiment results suggest that ProbBBC outperforms an existing method, especially for systems with limited observation.

AIOct 30, 2022
Formalizing Statistical Causality via Modal Logic

Yusuke Kawamoto, Tetsuya Sato, Kohei Suenaga

We propose a formal language for describing and explaining statistical causality. Concretely, we define Statistical Causality Language (StaCL) for expressing causal effects and specifying the requirements for causal inference. StaCL incorporates modal operators for interventions to express causal properties between probability distributions in different possible worlds in a Kripke model. We formalize axioms for probability distributions, interventions, and causal predicates using StaCL formulas. These axioms are expressive enough to derive the rules of Pearl's do-calculus. Finally, we demonstrate by examples that StaCL can be used to specify and explain the correctness of statistical causal inference.

CVOct 31, 2022
BOREx: Bayesian-Optimization--Based Refinement of Saliency Map for Image- and Video-Classification Models

Atsushi Kikuchi, Kotaro Uchida, Masaki Waga et al.

Explaining a classification result produced by an image- and video-classification model is one of the important but challenging issues in computer vision. Many methods have been proposed for producing heat-map--based explanations for this purpose, including ones based on the white-box approach that uses the internal information of a model (e.g., LRP, Grad-CAM, and Grad-CAM++) and ones based on the black-box approach that does not use any internal information (e.g., LIME, SHAP, and RISE). We propose a new black-box method BOREx (Bayesian Optimization for Refinement of visual model Explanation) to refine a heat map produced by any method. Our observation is that a heat-map--based explanation can be seen as a prior for an explanation method based on Bayesian optimization. Based on this observation, BOREx conducts Gaussian process regression (GPR) to estimate the saliency of each pixel in a given image starting from the one produced by another explanation method. Our experiments statistically demonstrate that the refinement by BOREx improves low-quality heat maps for image- and video-classification results.

PLApr 24
Ownership Refinement Types for Pointer Arithmetic and Nested Arrays

Yusuke Fujiwara, Yusuke Matsushita, Kohei Suenaga et al.

Tanaka et al. proposed a type system for verifying functional correctness properties of programs that use arrays and pointer arithmetic. Their system extends ConSORT -- a type system combining fractional ownership and refinement types for imperative program verification -- with support for pointer arithmetic. Their idea was to extend fractional ownership so that it can depend on an array index. Their formulation, however, does not handle nested arrays, which are essential for representing practical data structures such as matrices. We extend Tanaka et al.'s type system to support nested arrays by generalizing the notion of ownership to be able to refer to the indices of the outer arrays and prove the soundness of the extended type system. We have implemented a verifier based on the proposed type system and demonstrated that it can verify the correctness of programs that manipulate nested arrays, which were beyond the reach of Tanaka et al.

CLFeb 11
SoftMatcha 2: A Fast and Soft Pattern Matcher for Trillion-Scale Corpora

Masataka Yoneda, Yusuke Matsushita, Go Kamoda et al.

We present an ultra-fast and flexible search algorithm that enables search over trillion-scale natural language corpora in under 0.3 seconds while handling semantic variations (substitution, insertion, and deletion). Our approach employs string matching based on suffix arrays that scales well with corpus size. To mitigate the combinatorial explosion induced by the semantic relaxation of queries, our method is built on two key algorithmic ideas: fast exact lookup enabled by a disk-aware design, and dynamic corpus-aware pruning. We theoretically show that the proposed method suppresses exponential growth in the search space with respect to query length by leveraging statistical properties of natural language. In experiments on FineWeb-Edu (Lozhkov et al., 2024) (1.4T tokens), we show that our method achieves significantly lower search latency than existing methods: infini-gram (Liu et al., 2024), infini-gram mini (Xu et al., 2025), and SoftMatcha (Deguchi et al., 2025). As a practical application, we demonstrate that our method identifies benchmark contamination in training corpora, unidentified by existing approaches. We also provide an online demo of fast, soft search across corpora in seven languages.

CLMar 5, 2025
SoftMatcha: A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches

Hiroyuki Deguchi, Go Kamoda, Yusuke Matsushita et al.

Researchers and practitioners in natural language processing and computational linguistics frequently observe and analyze the real language usage in large-scale corpora. For that purpose, they often employ off-the-shelf pattern-matching tools, such as grep, and keyword-in-context concordancers, which is widely used in corpus linguistics for gathering examples. Nonetheless, these existing techniques rely on surface-level string matching, and thus they suffer from the major limitation of not being able to handle orthographic variations and paraphrasing -- notable and common phenomena in any natural language. In addition, existing continuous approaches such as dense vector search tend to be overly coarse, often retrieving texts that are unrelated but share similar topics. Given these challenges, we propose a novel algorithm that achieves \emph{soft} (or semantic) yet efficient pattern matching by relaxing a surface-level matching with word embeddings. Our algorithm is highly scalable with respect to the size of the corpus text utilizing inverted indexes. We have prepared an efficient implementation, and we provide an accessible web tool. Our experiments demonstrate that the proposed method (i) can execute searches on billion-scale corpora in less than a second, which is comparable in speed to surface-level string matching and dense vector search; (ii) can extract harmful instances that semantically match queries from a large set of English and Japanese Wikipedia articles; and (iii) can be effectively applied to corpus-linguistic analyses of Latin, a language with highly diverse inflections.

PLAug 30, 2021
HELMHOLTZ: A Verifier for Tezos Smart Contracts Based on Refinement Types

Yuki Nishida, Hiromasa Saito, Ran Chen et al.

A smart contract is a program executed on a blockchain, based on which many cryptocurrencies are implemented, and is being used for automating transactions. Due to the large amount of money that smart contracts deal with, there is a surging demand for a method that can statically and formally verify them. This article describes our type-based static verification tool HELMHOLTZ for Michelson, which is a statically typed stack-based language for writing smart contracts that are executed on the blockchain platform Tezos. HELMHOLTZ is designed on top of our extension of Michelson's type system with refinement types. HELMHOLTZ takes a Michelson program annotated with a user-defined specification written in the form of a refinement type as input; it then typechecks the program against the specification based on the refinement type system, discharging the generated verification conditions with the SMT solver Z3. We briefly introduce our refinement type system for the core calculus Mini-Michelson of Michelson, which incorporates the characteristic features such as compound datatypes (e.g., lists and pairs), higher-order functions, and invocation of another contract. \HELMHOLTZ{} successfully verifies several practical Michelson programs, including one that transfers money to an account and that checks a digital signature.

AIJul 16, 2021
Learning Heuristics for Template-based CEGIS of Loop Invariants with Reinforcement Learning

Minchao Wu, Takeshi Tsukada, Hiroshi Unno et al.

Loop-invariant synthesis is the basis of program verification. Due to the undecidability of the problem in general, a tool for invariant synthesis necessarily uses heuristics. Despite the common belief that the design of heuristics is vital for the performance of a synthesizer, heuristics are often engineered by their developers based on experience and intuition, sometimes in an \emph{ad-hoc} manner. In this work, we propose an approach to systematically learning heuristics for template-based CounterExample-Guided Inductive Synthesis (CEGIS) with reinforcement learning. As a concrete example, we implement the approach on top of PCSat, which is an invariant synthesizer based on template-based CEGIS. Experiments show that PCSat guided by the heuristics learned by our framework not only outperforms existing state-of-the-art CEGIS-based solvers such as HoICE and the neural solver Code2Inv, but also has slight advantages over non-CEGIS-based solvers such as Eldarica and Spacer in linear Constrained Horn Clause (CHC) solving.

PLJun 9, 2021
Verification of a Merkle Patricia Tree Library Using F*

Sota Sato, Ryotaro Banno, Jun Furuse et al.

A Merkle tree is a data structure for representing a key-value store as a tree. Each node of a Merkle tree is equipped with a hash value computed from those of their descendants. A Merkle tree is often used for representing a state of a blockchain system since it can be used for efficiently auditing the state in a trustless manner. Due to the safety-critical nature of blockchains, ensuring the correctness of their implementation is paramount. We show our formally verified implementation of the core part of Plebeia using F*. Plebeia is a library to manipulate an extension of Merkle trees (called Plebeia trees). It is being implemented as a part of the storage system of the Tezos blockchain system. To this end, we gradually ported Plebeia to F*; the OCaml code extracted from the modules ported to F* is linked with the unverified part of Plebeia. By this gradual porting process, we can obtain a working code from our partially verified implementation of Plebeia; we confirmed that the binary passes all the unit tests of Plebeia. More specifically, we verified the following properties on the implementation of Plebeia: (1) Each tree-manipulating function preserves the invariants on the data structure of a Plebeia tree and satisfies the functional requirements as a nested key-value store; (2) Each function for serializing/deserializing a Plebeia tree to/from the low-level storage is implemented correctly; and (3) The hash function for a Plebeia tree is relatively collision-resistant with respect to the cryptographic safety of the blake2b hash function. During porting Plebeia to F*, we found a bug in an old version of Plebeia, which was overlooked by the tests bundled with the original implementation. To the best of our knowledge, this is the first work that verifies a production-level implementation of a Merkle-tree library by F*.

LGJan 5, 2021
Control-Data Separation and Logical Condition Propagation for Efficient Inference on Probabilistic Programs

Ichiro Hasuo, Yuichiro Oyabu, Clovis Eberhart et al.

We present a novel sampling framework for probabilistic programs. The framework combines two recent ideas -- \emph{control-data separation} and \emph{logical condition propagation} -- in a nontrivial manner so that the two ideas boost the benefits of each other. We implemented our algorithm on top of Anglican. The experimental results demonstrate our algorithm's efficiency, especially for programs with while loops and rare observations.

CVOct 6, 2020
Visualizing Color-wise Saliency of Black-Box Image Classification Models

Yuhki Hatakeyama, Hiroki Sakuma, Yoshinori Konishi et al.

Image classification based on machine learning is being commonly used. However, a classification result given by an advanced method, including deep learning, is often hard to interpret. This problem of interpretability is one of the major obstacles in deploying a trained model in safety-critical systems. Several techniques have been proposed to address this problem; one of which is RISE, which explains a classification result by a heatmap, called a saliency map, which explains the significance of each pixel. We propose MC-RISE (Multi-Color RISE), which is an enhancement of RISE to take color information into account in an explanation. Our method not only shows the saliency of each pixel in a given image as the original RISE does, but the significance of color components of each pixel; a saliency map with color information is useful especially in the domain where the color information matters (e.g., traffic-sign recognition). We implemented MC-RISE and evaluate them using two datasets (GTSRB and ImageNet) to demonstrate the effectiveness of our methods in comparison with existing techniques for interpreting image classification results.

PLOct 9, 2019
Generalized Property-Directed Reachability for Hybrid Systems

Kohei Suenaga, Takuya Ishizawa

Generalized property-directed reachability (GPDR) belongs to the family of the model-checking techniques called IC3/PDR. It has been successfully applied to software verification; for example, it is the core of Spacer, a state-of-the-art Horn-clause solver bundled with Z3. However, it has yet to be applied to hybrid systems, which involve a continuous evolution of values over time. As the first step towards GPDR- based model checking for hybrid systems, this paper formalizes HGPDR, an adaptation of GPDR to hybrid systems, and proves its soundness. We also implemented a semi-automated proof-of-concept verifier, which allows a user to provide hints to guide verification steps.

AIMay 30, 2018
Automated proof synthesis for propositional logic with deep neural networks

Taro Sekiyama, Kohei Suenaga

This work explores the application of deep learning, a machine learning technique that uses deep neural networks (DNN) in its core, to an automated theorem proving (ATP) problem. To this end, we construct a statistical model which quantifies the likelihood that a proof is indeed a correct one of a given proposition. Based on this model, we give a proof-synthesis procedure that searches for a proof in the order of the likelihood. This procedure uses an estimator of the likelihood of an inference rule being applied at each step of a proof. As an implementation of the estimator, we propose a proposition-to-proof architecture, which is a DNN tailored to the automated proof synthesis problem. To empirically demonstrate its usefulness, we apply our model to synthesize proofs of propositional logic. We train the proposition-to-proof model using a training dataset of proposition-proof pairs. The evaluation against a benchmark set shows the very high accuracy and an improvement to the recent work of neural proof synthesis.

PLJun 20, 2017
Towards Proof Synthesis Guided by Neural Machine Translation for Intuitionistic Propositional Logic

Taro Sekiyama, Akifumi Imanishi, Kohei Suenaga

Inspired by the recent evolution of deep neural networks (DNNs) in machine learning, we explore their application to PL-related topics. This paper is the first step towards this goal; we propose a proof-synthesis method for the negation-free propositional logic in which we use a DNN to obtain a guide of proof search. The idea is to view the proof-synthesis problem as a translation from a proposition to its proof. We train seq2seq, which is a popular network in neural machine translation, so that it generates a proof encoded as a $λ$-term of a given proposition. We implement the whole framework and empirically observe that a generated proof term is close to a correct proof in terms of the tree edit distance of AST. This observation justifies using the output from a trained seq2seq model as a guide for proof search.

PLApr 25, 2016
Generalized Homogeneous Polynomials for Efficient Template-Based Nonlinear Invariant Synthesis

Kensuke Kojima, Minoru Kinoshita, Kohei Suenaga

The template-based method is one of the most successful approaches to algebraic invariant synthesis. In this method, an algorithm designates a template polynomial p over program variables, generates constraints for p=0 to be an invariant, and solves the generated constraints. However, this approach often suffers from an increasing template size if the degree of a template polynomial is too high. We propose a technique to make template-based methods more efficient. Our technique is based on the following finding: If an algebraic invariant exists, then there is a specific algebraic invariant that we call a generalized homogeneous algebraic invariant that is often smaller. This finding justifies using only a smaller template that corresponds to a generalized homogeneous algebraic invariant. Concretely, we state our finding above formally based on the abstract semantics of an imperative program proposed by Cachera et al. Then, we modify their template-based invariant synthesis so that it generates only generalized homogeneous algebraic invariants. This modification is proved to be sound. Furthermore, we also empirically demonstrate the merit of the restriction to generalized homogeneous algebraic invariants. Our implementation outperforms that of Cachera et al. for programs that require a higher-degree template.