Enhanced Consistency Bi-directional GAN (CBiGAN) for Malware Anomaly Detection
For cybersecurity practitioners, this work provides a practical, scalable anomaly detection approach for diverse malware file formats that requires neither handcrafted features nor dynamic analysis.
This work applies a Consistency Bi-directional GAN (CBiGAN) to malware anomaly detection by transforming executables into visual encodings and using reconstruction discrepancies to flag anomalies. The method achieves stable AUC performance across multiple datasets, including a self-collected corpus spanning 214 malware families, while keeping the pipeline scalable and lightweight.
Static malware analysis remains a core technique in cybersecurity because it can assess potentially malicious software without executing it. Nevertheless, many existing static approaches rely on handcrafted features or curated datasets that may not generalize well to evolving malware distributions. In this work, we investigate an alternative representation that operates directly on raw binary content. Executable files are transformed into visual encodings that preserve local structural relationships, enabling the use of deep learning models without requiring semantic disassembly or dynamic behavior profiling. This study explores the use of a Consistency Bi-directional Generative Adversarial Network (CBiGAN) as an anomaly detection framework rather than as a generative model. The method enforces consistency between latent encodings and reconstructions, so that deviations from the learned benign structure can be quantified through reconstruction discrepancies. Importantly, the approach does not introduce a new generative architecture; instead, it evaluates how consistency-based generative modeling can be applied at scale to heterogeneous malware data. The proposed framework is evaluated on multiple datasets comprising both Portable Executable (PE) and Object Linking and Embedding (OLE) files, including a large self-collected corpus spanning 214 malware families. Results demonstrate stable detection performance in terms of Area Under the Curve (AUC) while maintaining a unified and computationally lightweight processing pipeline. These findings suggest that consistency-based generative modeling offers a practical and scalable direction for malware anomaly detection across diverse file formats and threat families.
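The pipeline described above (visual encoding of raw bytes, reconstruction through an encoder/generator pair, and discrepancy-based scoring evaluated by AUC) can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: the 64 by 64 image size, the zero-padding/truncation rule, the mean absolute (L1) pixel discrepancy, and the helpers `toy_encode`, `toy_generate`, and `demo_auc` are all hypothetical. The two toy helpers only stand in for a trained CBiGAN encoder E and generator G; a full CBiGAN anomaly score may also include latent-consistency or discriminator-feature terms, which this sketch omits.

```python
import random
from typing import Callable, List, Sequence

Image = List[List[float]]
WIDTH = HEIGHT = 64  # assumed image size; not taken from the paper


def bytes_to_image(data: bytes, width: int = WIDTH, height: int = HEIGHT) -> Image:
    """Lay raw bytes out row by row as a grayscale image scaled to [0, 1].

    Assumed convention: shorter files are zero-padded, longer files truncated.
    """
    n = width * height
    buf = data[:n] + bytes(max(0, n - len(data)))
    return [[buf[r * width + c] / 255.0 for c in range(width)] for r in range(height)]


def toy_encode(x: Image) -> List[float]:
    """Hypothetical stand-in for a trained encoder E: one latent value (row mean) per row."""
    return [sum(row) / len(row) for row in x]


def toy_generate(z: List[float], width: int = WIDTH) -> Image:
    """Hypothetical stand-in for a trained generator G: repeat each latent value across its row."""
    return [[v] * width for v in z]


def anomaly_score(x: Image,
                  encode: Callable[[Image], List[float]],
                  generate: Callable[[List[float]], Image]) -> float:
    """Mean absolute pixel difference between x and its reconstruction G(E(x))."""
    x_hat = generate(encode(x))
    total = sum(abs(a - b) for row, row_hat in zip(x, x_hat) for a, b in zip(row, row_hat))
    return total / (len(x) * len(x[0]))


def roc_auc(benign_scores: Sequence[float], malicious_scores: Sequence[float]) -> float:
    """AUC as the probability that a malicious sample outscores a benign one (ties count 0.5)."""
    wins = 0.0
    for m in malicious_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malicious_scores) * len(benign_scores))


def demo_auc(seed: int = 0, n: int = 20) -> float:
    """Score synthetic files: row-constant 'benign' bytes vs. uniformly random 'malicious' bytes."""
    rng = random.Random(seed)
    benign = [b"".join(bytes([rng.randrange(256)]) * WIDTH for _ in range(HEIGHT))
              for _ in range(n)]
    malicious = [bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT)) for _ in range(n)]

    def score(f: bytes) -> float:
        return anomaly_score(bytes_to_image(f), toy_encode, toy_generate)

    return roc_auc([score(f) for f in benign], [score(f) for f in malicious])


if __name__ == "__main__":
    print(f"AUC on synthetic data: {demo_auc():.3f}")
```

On the synthetic data in `demo_auc`, the highly regular "benign" files reconstruct almost perfectly while random byte files do not, so the toy AUC is 1.0. This only illustrates the scoring mechanics; it says nothing about the results reported for the actual datasets.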