Tomohiro I

CR
h-index1
5papers
19citations
Novelty45%
AI Score38

5 Papers

31.4DSMay 14
Almost succinct representation of maximal palindromes

Takuya Mieno, Tomohiro I

Palindromes are strings that read the same forward and backward. The computation of palindromic structures within strings is a fundamental problem in string algorithms, being motivated by potential applications in formal language theory and bioinformatics. Although the number of palindromic factors in a string of length $n$ can be quadratic, they can be implicitly represented in $O(n \log n)$ bits of space by storing the lengths of all maximal palindromes in an integer array, which can be computed in $O(n)$ time [Manacher, 1975]. In this paper, for any positive constant $ε< 1$, we propose a novel $(3(1+ε)n + o(n))$-bit representation of all maximal palindromes in a string, which enables $O(1)$-time retrieval of the length of the maximal palindrome centered at any given position. The data structure can be constructed in $O(n)$ time from the input string of length $n$. Since Manacher's algorithm and the notion of maximal palindromes are widely utilized for solving numerous problems involving palindromic structures, our compact representation will accelerate the development of more space-efficient solutions to such problems. Indeed, as the first application of our compact representation of maximal palindromes, we present a data structure of size $O(n)$ bits that can compute the longest palindrome appearing in any given factor of a string of length $n$ in $O(\log n)$ time.

CRMay 19, 2025
Outsourced Privacy-Preserving Feature Selection Based on Fully Homomorphic Encryption

Koki Wakiyama, Tomohiro I, Hiroshi Sakamoto

Feature selection is a technique that extracts a meaningful subset from a set of features in training data. When the training data is large-scale, appropriate feature selection enables the removal of redundant features, which can improve generalization performance, accelerate the training process, and enhance the interpretability of the model. This study proposes a privacy-preserving computation model for feature selection. Generally, when the data owner and analyst are the same, there is no need to conceal the private information. However, when they are different parties or when multiple owners exist, an appropriate privacy-preserving framework is required. Although various private feature selection algorithms, they all require two or more computing parties and do not guarantee security in environments where no external party can be fully trusted. To address this issue, we propose the first outsourcing algorithm for feature selection using fully homomorphic encryption. Compared to a prior two-party algorithm, our result improves the time and space complexity O(kn^2) to O(kn log^3 n) and O(kn), where k and n denote the number of features and data samples, respectively. We also implemented the proposed algorithm and conducted comparative experiments with the naive one. The experimental result shows the efficiency of our method even with small datasets.

CLFeb 28, 2022
LCP-dropout: Compression-based Multiple Subword Segmentation for Neural Machine Translation

Keita Nonaka, Kazutaka Yamanouchi, Tomohiro I et al.

In this study, we propose a simple and effective preprocessing method for subword segmentation based on a data compression algorithm. Compression-based subword segmentation has recently attracted significant attention as a preprocessing method for training data in Neural Machine Translation. Among them, BPE/BPE-dropout is one of the fastest and most effective method compared to conventional approaches. However, compression-based approach has a drawback in that generating multiple segmentations is difficult due to the determinism. To overcome this difficulty, we focus on a probabilistic string algorithm, called locally-consistent parsing (LCP), that has been applied to achieve optimum compression. Employing the probabilistic mechanism of LCP, we propose LCP-dropout for multiple subword segmentation that improves BPE/BPE-dropout, and show that it outperforms various baselines in learning from especially small training data.

CROct 11, 2021
Privacy-Preserving Feature Selection with Fully Homomorphic Encryption

Shinji Ono, Jun Takata, Masaharu Kataoka et al.

For the feature selection problem, we propose an efficient privacy-preserving algorithm. Let $D$, $F$, and $C$ be data, feature, and class sets, respectively, where the feature value $x(F_i)$ and the class label $x(C)$ are given for each $x\in D$ and $F_i \in F$. For a triple $(D,F,C)$, the feature selection problem is to find a consistent and minimal subset $F' \subseteq F$, where `consistent' means that, for any $x,y\in D$, $x(C)=y(C)$ if $x(F_i)=y(F_i)$ for $F_i\in F'$, and `minimal' means that any proper subset of $F'$ is no longer consistent. On distributed datasets, we consider feature selection as a privacy-preserving problem: Assume that semi-honest parties $\textsf A$ and $\textsf B$ have their own personal $D_{\textsf A}$ and $D_{\textsf B}$. The goal is to solve the feature selection problem for $D_{\textsf A}\cup D_{\textsf B}$ without revealing their privacy. In this paper, we propose a secure and efficient algorithm based on fully homomorphic encryption, and we implement our algorithm to show its effectiveness for various practical data. The proposed algorithm is the first one that can directly simulate the CWC (Combination of Weakest Components) algorithm on ciphertext, which is one of the best performers for the feature selection problem on the plaintext.

CRNov 25, 2019
Faster Privacy-Preserving Computation of Edit Distance with Moves

Yohei Yoshimoto, Masaharu Kataoka, Yoshimasa Takabatake et al.

We consider an efficient two-party protocol for securely computing the similarity of strings w.r.t. an extended edit distance measure. Here, two parties possessing strings $x$ and $y$, respectively, want to jointly compute an approximate value for $\mathrm{EDM}(x,y)$, the minimum number of edit operations including substring moves needed to transform $x$ into $y$, without revealing any private information. Recently, the first secure two-party protocol for this was proposed, based on homomorphic encryption, but this approach is not suitable for long strings due to its high communication and round complexities. In this paper, we propose an improved algorithm that significantly reduces the round complexity without sacrificing its cryptographic strength. We examine the performance of our algorithm for DNA sequences compared to previous one.