Sayaka Shiota

CV
h-index18
20papers
334citations
Novelty40%
AI Score27

20 Papers

CVJun 11, 2022
Access Control of Semantic Segmentation Models Using Encrypted Feature Maps

Hiroki Ito, AprilPyone MaungMaung, Sayaka Shiota et al.

In this paper, we propose an access control method with a secret key for semantic segmentation models for the first time so that unauthorized users without a secret key cannot benefit from the performance of trained models. The method enables us not only to provide a high segmentation performance to authorized users but to also degrade the performance for unauthorized users. We first point out that, for the application of semantic segmentation, conventional access control methods which use encrypted images for classification tasks are not directly applicable due to performance degradation. Accordingly, in this paper, selected feature maps are encrypted with a secret key for training and testing models, instead of input images. In an experiment, the protected models allowed authorized users to obtain almost the same performance as that of non-protected models but also with robustness against unauthorized access without a key.

CRJul 26, 2023
Enhanced Security against Adversarial Examples Using a Random Ensemble of Encrypted Vision Transformer Models

Ryota Iijima, Miki Tanaka, Sayaka Shiota et al.

Deep neural networks (DNNs) are well known to be vulnerable to adversarial examples (AEs). In addition, AEs have adversarial transferability, which means AEs generated for a source model can fool another black-box model (target model) with a non-trivial probability. In previous studies, it was confirmed that the vision transformer (ViT) is more robust against the property of adversarial transferability than convolutional neural network (CNN) models such as ConvMixer, and moreover encrypted ViT is more robust than ViT without any encryption. In this article, we propose a random ensemble of encrypted ViT models to achieve much more robust models. In experiments, the proposed scheme is verified to be more robust against not only black-box attacks but also white-box ones than convention methods.

CVSep 5, 2023
Domain Adaptation for Efficiently Fine-tuning Vision Transformer with Encrypted Images

Teru Nagamori, Sayaka Shiota, Hitoshi Kiya

In recent years, deep neural networks (DNNs) trained with transformed data have been applied to various applications such as privacy-preserving learning, access control, and adversarial defenses. However, the use of transformed data decreases the performance of models. Accordingly, in this paper, we propose a novel method for fine-tuning models with transformed images under the use of the vision transformer (ViT). The proposed domain adaptation method does not cause the accuracy degradation of models, and it is carried out on the basis of the embedding structure of ViT. In experiments, we confirmed that the proposed method prevents accuracy degradation even when using encrypted images with the CIFAR-10 and CIFAR-100 datasets.

SDDec 17, 2021Code
JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification

Shinnosuke Takamichi, Ludwig Kürzinger, Takaaki Saeki et al.

In this paper, we construct a new Japanese speech corpus called "JTubeSpeech." Although recent end-to-end learning requires large-size speech corpora, open-sourced such corpora for languages other than English have not yet been established. In this paper, we describe the construction of a corpus from YouTube videos and subtitles for speech recognition and speaker verification. Our method can automatically filter the videos and subtitles with almost no language-dependent processes. We consistently employ Connectionist Temporal Classification (CTC)-based techniques for automatic speech recognition (ASR) and a speaker variation-based method for automatic speaker verification (ASV). We build 1) a large-scale Japanese ASR benchmark with more than 1,300 hours of data and 2) 900 hours of data for Japanese ASV.

AIFeb 11, 2024
A Random Ensemble of Encrypted Vision Transformers for Adversarially Robust Defense

Ryota Iijima, Sayaka Shiota, Hitoshi Kiya

Deep neural networks (DNNs) are well known to be vulnerable to adversarial examples (AEs). In previous studies, the use of models encrypted with a secret key was demonstrated to be robust against white-box attacks, but not against black-box ones. In this paper, we propose a novel method using the vision transformer (ViT) that is a random ensemble of encrypted models for enhancing robustness against both white-box and black-box attacks. In addition, a benchmark attack method, called AutoAttack, is applied to models to test adversarial robustness objectively. In experiments, the method was demonstrated to be robust against not only white-box attacks but also black-box ones in an image classification task on the CIFAR-10 and ImageNet datasets. The method was also compared with the state-of-the-art in a standardized benchmark for adversarial robustness, RobustBench, and it was verified to outperform conventional defenses in terms of clean accuracy and robust accuracy.

CVJan 10, 2024
Efficient Fine-Tuning with Domain Adaptation for Privacy-Preserving Vision Transformer

Teru Nagamori, Sayaka Shiota, Hitoshi Kiya

We propose a novel method for privacy-preserving deep neural networks (DNNs) with the Vision Transformer (ViT). The method allows us not only to train models and test with visually protected images but to also avoid the performance degradation caused from the use of encrypted images, whereas conventional methods cannot avoid the influence of image encryption. A domain adaptation method is used to efficiently fine-tune ViT with encrypted images. In experiments, the method is demonstrated to outperform conventional methods in an image classification task on the CIFAR-10 and ImageNet datasets in terms of classification accuracy.

CLJun 2, 2024
YODAS: Youtube-Oriented Dataset for Audio and Speech

Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki et al.

In this study, we introduce YODAS (YouTube-Oriented Dataset for Audio and Speech), a large-scale, multilingual dataset comprising currently over 500k hours of speech data in more than 100 languages, sourced from both labeled and unlabeled YouTube speech datasets. The labeled subsets, including manual or automatic subtitles, facilitate supervised model training. Conversely, the unlabeled subsets are apt for self-supervised learning applications. YODAS is distinctive as the first publicly available dataset of its scale, and it is distributed under a Creative Commons license. We introduce the collection methodology utilized for YODAS, which contributes to the large-scale speech dataset construction. Subsequently, we provide a comprehensive analysis of speech, text contained within the dataset. Finally, we describe the speech recognition baselines over the top-15 languages.

CRJan 5, 2024
A Random Ensemble of Encrypted models for Enhancing Robustness against Adversarial Examples

Ryota Iijima, Sayaka Shiota, Hitoshi Kiya

Deep neural networks (DNNs) are well known to be vulnerable to adversarial examples (AEs). In addition, AEs have adversarial transferability, which means AEs generated for a source model can fool another black-box model (target model) with a non-trivial probability. In previous studies, it was confirmed that the vision transformer (ViT) is more robust against the property of adversarial transferability than convolutional neural network (CNN) models such as ConvMixer, and moreover encrypted ViT is more robust than ViT without any encryption. In this article, we propose a random ensemble of encrypted ViT models to achieve much more robust models. In experiments, the proposed scheme is verified to be more robust against not only black-box attacks but also white-box ones than convention methods.

CVFeb 5, 2022
Adversarial Detector with Robust Classifier

Takayuki Osakabe, Maungmaung Aprilpyone, Sayaka Shiota et al.

Deep neural network (DNN) models are wellknown to easily misclassify prediction results by using input images with small perturbations, called adversarial examples. In this paper, we propose a novel adversarial detector, which consists of a robust classifier and a plain one, to highly detect adversarial examples. The proposed adversarial detector is carried out in accordance with the logits of plain and robust classifiers. In an experiment, the proposed detector is demonstrated to outperform a state-of-the-art detector without any robust classifier.

CVJan 26, 2022
An Overview of Compressible and Learnable Image Transformation with Secret Key and Its Applications

Hitoshi Kiya, AprilPyone MaungMaung, Yuma Kinoshita et al.

This article presents an overview of image transformation with a secret key and its applications. Image transformation with a secret key enables us not only to protect visual information on plain images but also to embed unique features controlled with a key into images. In addition, numerous encryption methods can generate encrypted images that are compressible and learnable for machine learning. Various applications of such transformation have been developed by using these properties. In this paper, we focus on a class of image transformation referred to as learnable image encryption, which is applicable to privacy-preserving machine learning and adversarially robust defense. Detailed descriptions of both transformation algorithms and performances are provided. Moreover, we discuss robustness against various attacks.

CVAug 4, 2021
A universal detector of CNN-generated images using properties of checkerboard artifacts in the frequency domain

Miki Tanaka, Sayaka Shiota, Hitoshi Kiya

We propose a novel universal detector for detecting images generated by using CNNs. In this paper, properties of checkerboard artifacts in CNN-generated images are considered, and the spectrum of images is enhanced in accordance with the properties. Next, a classifier is trained by using the enhanced spectrums to judge a query image to be a CNN-generated ones or not. In addition, an ensemble of the proposed detector with emphasized spectrums and a conventional detector is proposed to improve the performance of these methods. In an experiment, the proposed ensemble is demonstrated to outperform a state-of-the-art method under some conditions.

CRJul 17, 2020
A Privacy-Preserving Machine Learning Scheme Using EtC Images

Ayana Kawamura, Yuma Kinoshita, Takayuki Nakachi et al.

We propose a privacy-preserving machine learning scheme with encryption-then-compression (EtC) images, where EtC images are images encrypted by using a block-based encryption method proposed for EtC systems with JPEG compression. In this paper, a novel property of EtC images is first discussed, although EtC ones was already shown to be compressible as a property. The novel property allows us to directly apply EtC images to machine learning algorithms non-specialized for computing encrypted data. In addition, the proposed scheme is demonstrated to provide no degradation in the performance of some typical machine learning algorithms including the support vector machine algorithm with kernel trick and random forests under the use of z-score normalization. A number of facial recognition experiments with are carried out to confirm the effectiveness of the proposed scheme.

IVAug 22, 2019
An Image Fusion Scheme for Single-Shot High Dynamic Range Imaging with Spatially Varying Exposures

Chihiro Go, Yuma Kinoshita, Sayaka Shiota et al.

This paper proposes a novel multi-exposure image fusion (MEF) scheme for single-shot high dynamic range imaging with spatially varying exposures (SVE). Single-shot imaging with SVE enables us not only to produce images without color saturation regions from a single-shot image, but also to avoid ghost artifacts in the producing ones. However, the number of exposures is generally limited to two, and moreover it is difficult to decide the optimum exposure values before the photographing. In the proposed scheme, a scene segmentation method is applied to input multi-exposure images, and then the luminance of the input images is adjusted according to both of the number of scenes and the relationship between exposure values and pixel values. The proposed method with the luminance adjustment allows us to improve the above two issues. In this paper, we focus on dual-ISO imaging as one of single-shot imaging. In an experiment, the proposed scheme is demonstrated to be effective for single-shot high dynamic range imaging with SVE, compared with conventional MEF schemes with exposure compensation.

IVAug 1, 2019
Single-Shot High Dynamic Range Imaging with Spatially Varying Exposures Considering Hue Distortion

Chihiro Go, Yuma Kinoshita, Sayaka Shiota et al.

We proposes a novel single-shot high dynamic range imaging scheme with spatially varying exposures (SVE) considering hue distortion. Single-shot imaging with SVE enables us to capture multi-exposure images from a single-shot image, so high dynamic range images can be produced without ghost artifacts. However, SVE images have some pixels at which a range supported by camera sensors is exceeded. Therefore, generated images have some color distortion, so that conventional imaging with SVE has never considered the influence of this range limitation. To overcome this issue, we consider estimating the correct hue of a scene from raw images, and propose a method with the estimated hue information for correcting the hue of SVE images on the constant hue plain in the RGB color space.

CVNov 8, 2018
A Retinex-based Image Enhancement Scheme with Noise Aware Shadow-up Function

Chien Cheng Chien, Yuma Kinoshita, Sayaka Shiota et al.

This paper proposes a novel image contrast enhancement method based on both a noise aware shadow-up function and Retinex (retina and cortex) decomposition. Under low light conditions, images taken by digital cameras have low contrast in dark or bright regions. This is due to a limited dynamic range that imaging sensors have. For this reason, various contrast enhancement methods have been proposed. Our proposed method can enhance the contrast of images without not only over-enhancement but also noise amplification. In the proposed method, an image is decomposed into illumination layer and reflectance layer based on the retinex theory, and lightness information of the illumination layer is adjusted. A shadow-up function is used for preventing over-enhancement. The proposed mapping function, designed by using a noise aware histogram, allows not only to enhance contrast of dark region, but also to avoid amplifying noise, even under strong noise environments.

CRSep 19, 2018
Privacy-Preserving SVM Computing by Using Random Unitary Transformation

Takahiro Maekawa, Takayuki Nakachi, Sayaka Shiota et al.

A privacy-preserving Support Vector Machine (SVM) computing scheme is proposed in this paper. Cloud computing has been spreading in many fields. However, the cloud computing has some serious issues for end users, such as unauthorized use and leak of data, and privacy compromise. We focus on templates protected by using a random unitary transformation, and consider some properties of the protected templates for secure SVM computing, where templates mean features extracted from data. The proposed scheme enables us not only to protect templates, but also to have the same performance as that of unprotected templates under some useful kernel functions. Moreover, it can be directly carried out by using well-known SVM algorithms, without preparing any algorithms specialized for secure SVM computing. In the experiments, the proposed scheme is applied to a face-based authentication algorithm with SVM classifiers to confirm the effectiveness.

CVAug 1, 2018
A Pseudo Multi-Exposure Fusion Method Using Single Image

Yuma Kinoshita, Sayaka Shiota, Hitoshi Kiya

This paper proposes a novel pseudo multi-exposure image fusion method based on a single image. Multi-exposure image fusion is used to produce images without saturation regions, by using photos with different exposures. However, it is difficult to take photos suited for the multi-exposure image fusion when we take a photo of dynamic scenes or record a video. In addition, the multi-exposure image fusion cannot be applied to existing images with a single exposure or videos. The proposed method enables us to produce pseudo multi-exposure images from a single image. To produce multi-exposure images, the proposed method utilizes the relationship between the exposure values and pixel values, which is obtained by assuming that a digital camera has a linear response function. Moreover, it is shown that the use of a local contrast enhancement method allows us to produce pseudo multi-exposure images with higher quality. Most of conventional multi-exposure image fusion methods are also applicable to the proposed multi-exposure images. Experimental results show the effectiveness of the proposed method by comparing the proposed one with conventional ones.

CVJun 23, 2018
Multi-Exposure Image Fusion Based on Exposure Compensation

Yuma Kinoshita, Taichi Yoshida, Sayaka Shiota et al.

This paper proposes a novel multi-exposure image fusion method based on exposure compensation. Multi-exposure image fusion is a method to produce images without color saturation regions, by using photos with different exposures. However, in conventional works, it is unclear how to determine appropriate exposure values, and moreover, it is difficult to set appropriate exposure values at the time of photographing due to time constraints. In the proposed method, the luminance of the input multi-exposure images is adjusted on the basis of the relationship between exposure values and pixel values, where the relationship is obtained by assuming that a digital camera has a linear response function. The use of a local contrast enhancement method is also considered to improve input multi-exposure images. The compensated images are finally combined by one of existing multi-exposure image fusion methods. In some experiments, the effectiveness of the proposed method are evaluated in terms of the tone mapped image quality index, statistical naturalness, and discrete entropy, by comparing the proposed one with conventional ones.

CVJun 7, 2018
Super-Resolution using Convolutional Neural Networks without Any Checkerboard Artifacts

Yusuke Sugawara, Sayaka Shiota, Hitoshi Kiya

It is well-known that a number of excellent super-resolution (SR) methods using convolutional neural networks (CNNs) generate checkerboard artifacts. A condition to avoid the checkerboard artifacts is proposed in this paper. So far, checkerboard artifacts have been mainly studied for linear multirate systems, but the condition to avoid checkerboard artifacts can not be applied to CNNs due to the non-linearity of CNNs. We extend the avoiding condition for CNNs, and apply the proposed structure to some typical SR methods to confirm the effectiveness of the new scheme. Experiment results demonstrate that the proposed structure can perfectly avoid to generate checkerboard artifacts under two loss conditions: mean square error and perceptual loss, while keeping excellent properties that the SR methods have.

CVMay 29, 2018
Automatic Exposure Compensation for Multi-Exposure Image Fusion

Yuma Kinoshita, Sayaka Shiota, Hitoshi Kiya

This paper proposes a novel luminance adjustment method based on automatic exposure compensation for multi-exposure image fusion. Multi-exposure image fusion is a method to produce images without saturation regions, by using photos with different exposures. In conventional works, it has been pointed out that the quality of those multi-exposure images can be improved by adjusting the luminance of them. However, how to determine the degree of adjustment has never been discussed. This paper therefore proposes a way to automatically determines the degree on the basis of the luminance distribution of input multi-exposure images. Moreover, new weights, called "simple weights", for image fusion are also considered for the proposed luminance adjustment method. Experimental results show that the multi-exposure images adjusted by the proposed method have better quality than the input multi-exposure ones in terms of the well-exposedness. It is also confirmed that the proposed simple weights provide the highest score of statistical naturalness and discrete entropy in all fusion methods.