Masaki Hashimoto

6.6CLMar 24

Foundational Study on Authorship Attribution of Japanese Web Reviews for Actor Analysis

Hiroshi Matsubara, Shingo Matsugaya, Taichi Aoki et al.

This study investigates the applicability of authorship attribution based on stylistic features to support actor analysis in threat intelligence. As a foundational step toward future application to dark web forums, we conducted experiments using Japanese review data from clear web sources. We constructed datasets from Rakuten Ichiba reviews and compared four methods: TF-IDF with logistic regression (TF-IDF+LR), BERT embeddings with logistic regression (BERT-Emb+LR), BERT fine-tuning (BERT-FT), and metric learning with $k$-nearest neighbors (Metric+kNN). Results showed that BERT-FT achieved the best performance; however, training became unstable as the number of authors scaled to several hundred, where TF-IDF+LR proved superior in terms of accuracy, stability, and computational cost. Furthermore, Top-$k$ evaluation demonstrated the utility of candidate screening, and error analysis revealed that boilerplate text, topic dependency, and short text length were primary factors causing misclassification.

CRMay 27, 2025

Uncovering Black-hat SEO based fake E-commerce scam groups from their redirectors and websites

Makoto Shimamura, Shingo Matsugaya, Keisuke Sakai et al.

While law enforcements agencies and cybercrime researchers are working hard, fake E-commerce scam is still a big threat to Internet users. One of the major techniques to victimize users is luring them by black-hat search-engine-optimization (SEO); making search engines display their lure pages as if these were placed on compromised websites and then redirecting visitors to malicious sites. In this study, we focus on the threat actors conduct fake E-commerce scam with this strategy. Our previous study looked at the connection between some malware families used for black-hat SEO to enlighten threat actors and their infrastructures, however it shows only a limited part of the whole picture because we could not find all SEO malware samples from limited sources. In this paper, we aim to identify and analyze threat actor groups using a large dataset of fake E-commerce sites collected by Japan Cybercrime Control Center, which we believe is of higher quality. It includes 692,865 fake EC sites gathered from redirectors over two and a half years, from May 20, 2022 to Dec. 31, 2024. We analyzed the links between these sites using Maltego, a well-known link analysis tool, and tailored programs. We also conducted time series analysis to track group changes in the groups. According to the analysis, we estimate that 17 relatively large groups were active during the dataset period and some of them were active throughout the period.

Masaki Hashimoto

2 Papers