LGNov 22, 2022
Compiler Provenance Recovery for Multi-CPU Architectures Using a Centrifuge MechanismYuhei Otsubo, Akira Otsuka, Mamoru Mimura
Bit-stream recognition (BSR) has many applications, such as forensic investigations, detection of copyright infringement, and malware analysis. We propose the first BSR that takes a bare input bit-stream and outputs a class label without any preprocessing. To achieve our goal, we propose a centrifuge mechanism, where the upstream layers (sub-net) capture global features and tell the downstream layers (main-net) to switch the focus, even if a part of the input bit-stream has the same value. We applied the centrifuge mechanism to compiler provenance recovery, a type of BSR, and achieved excellent classification. Additionally, downstream transfer learning (DTL), one of the learning methods we propose for the centrifuge mechanism, pre-trains the main-net using the sub-net's ground truth instead of the sub-net's output. We found that sub-predictions made by DTL tend to be highly accurate when the sub-label classification contributes to the essence of the main prediction.
LGMar 4, 2024
Robustness bounds on the successful adversarial examples in probabilistic models: Implications from Gaussian processesHiroaki Maeshima, Akira Otsuka
Adversarial example (AE) is an attack method for machine learning, which is crafted by adding imperceptible perturbation to the data inducing misclassification. In the current paper, we investigated the upper bound of the probability of successful AEs based on the Gaussian Process (GP) classification, a probabilistic inference model. We proved a new upper bound of the probability of a successful AE attack that depends on AE's perturbation norm, the kernel function used in GP, and the distance of the closest pair with different labels in the training dataset. Surprisingly, the upper bound is determined regardless of the distribution of the sample dataset. We showed that our theoretical result was confirmed through the experiment using ImageNet. In addition, we showed that changing the parameters of the kernel function induces a change of the upper bound of the probability of successful AEs.
CROct 30, 2020
Evaluation of vulnerability reproducibility in container-based Cyber RangeRyotaro Nakata, Akira Otsuka
A cyber range, a practical and highly educational information security exercise system, is difficult to implement in educational institutions because of the high cost of implementing and maintaining it. Therefore, there is a need for a cyber range that can be adopted and maintained at a low cost. Recently, container type virtualization is gaining attention as it can create a high-speed and high-density exercise environment. However, existing researches have not clearly shown the advantages of container virtualization for building exercise environments. And it is not clear whether the sufficient vulnerabilities are reproducible, which is required to conduct incident scenarios in cyber range. In this paper, we compare container virtualization with existing virtualization type and confirm that the amount of memory, CPU, and storage consumption can be reduced to less than 1/10 of the conventional virtualization methods. We also compare and verify the reproducibility of the vulnerabilities used in common exercise scenarios and confirm that 99.3% of the vulnerabilities are reproducible. The container-based cyber range can be used as a new standard to replace existing methods.
CRJun 14, 2018
o-glasses: Visualizing x86 Code from Binary Using a 1d-CNNYuhei Otsubo, Akira Otsuka, Mamoru Mimura et al.
Malicious document files used in targeted attacks often contain a small program called shellcode. It is often hard to prepare a runnable environment for dynamic analysis of these document files because they exploit specific vulnerabilities. In these cases, it is necessary to identify the position of the shellcode in each document file to analyze it. If the exploit code uses executable scripts such as JavaScript and Flash, it is not so hard to locate the shellcode. On the other hand, it is sometimes almost impossible to locate the shellcode when it does not contain any JavaScript or Flash but consists of native x86 code only. Binary fragment classification is often applied to visualize the location of regions of interest, and shellcode must contain at least a small fragment of x86 native code even if most of it is obfuscated, such as, a decoder for the obfuscated body of the shellcode. In this paper, we propose a novel method, o-glasses, to visualize the shellcode by recognizing the x86 native code using a specially designed one-dimensional convolutional neural network (1d-CNN). The fragment size needs to be as small as the minimum size of the x86 native code in the whole shellcode. Our results show that a 16-instruction-sequence (approximately 48 bytes on average) is sufficient for the code fragment visualization. Our method, o-glasses (1d-CNN), outperforms other methods in that it recognizes x86 native code with a surprisingly high F-measure rate (about 99.95%).