CV · Apr 13

Byte-level generative predictions for forensics multimedia carving

arXiv:2604.11010 · 15.4 · h-index: 2
AI Analysis

For digital forensic investigators, this work introduces a generative method for multimedia carving, addressing the limitation of traditional discriminative models that cannot reconstruct missing data.

This paper tackles the challenge of recovering fragmented multimedia files without file system metadata by proposing a generative approach using bGPT, a byte-level transformer for next-byte prediction. The model generates likely fragment continuations, achieving effective byte-level pattern prediction for fragment matching in unallocated disk space.

Digital forensic investigations often face significant challenges when recovering fragmented multimedia files that lack file system metadata. Traditional file carving relies on signatures and on discriminative deep learning models for fragment classification, but these methods cannot reconstruct or predict missing data. We propose a generative approach to multimedia carving using bGPT, a byte-level transformer designed for next-byte prediction. By feeding partial BMP image data into the model, we simulate the generation of likely fragment continuations. We evaluate the fidelity of these predictions using four metrics: cosine similarity, structural similarity index (SSIM), chi-square distance, and Jensen-Shannon divergence (JSD). Our findings demonstrate that generative models can effectively predict byte-level patterns to support fragment matching in unallocated disk space.
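To make the evaluation step concrete, here is a minimal sketch of how three of the named metrics (cosine similarity, chi-square distance, JSD) could be computed between a predicted fragment and a candidate fragment. It is not the paper's implementation. It assumes the comparison is made on normalized 256-bin byte histograms, uses one common symmetric form of the chi-square distance, and uses base-2 logarithms for JSD; the function names are hypothetical. SSIM is left out because it needs the bytes decoded into a 2D image.

```python
import numpy as np


def byte_histogram(data: bytes) -> np.ndarray:
    """Normalized 256-bin histogram of byte values (assumed representation)."""
    counts = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
    return counts / max(counts.sum(), 1)


def cosine_similarity(p: np.ndarray, q: np.ndarray) -> float:
    """Cosine of the angle between two histograms; 1.0 means same direction."""
    denom = np.linalg.norm(p) * np.linalg.norm(q)
    return float(p @ q / denom) if denom else 0.0


def chi_square_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric chi-square distance: 0.5 * sum((p - q)^2 / (p + q))."""
    s = p + q
    mask = s > 0  # skip bins that are empty in both histograms
    return float(0.5 * np.sum((p[mask] - q[mask]) ** 2 / s[mask]))


def js_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence in bits (bounded by 0 and 1)."""
    m = 0.5 * (p + q)

    def kl(a: np.ndarray, b: np.ndarray) -> float:
        mask = a > 0  # 0 * log(0) is treated as 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical fragments: a model's predicted continuation and a disk candidate.
predicted = bytes([10, 10, 20, 30, 30, 30])
candidate = bytes([10, 20, 20, 30, 30, 40])
p, q = byte_histogram(predicted), byte_histogram(candidate)
scores = {
    "cosine": cosine_similarity(p, q),
    "chi_square": chi_square_distance(p, q),
    "jsd": js_divergence(p, q),
}
```

Under these assumptions, a candidate that matches the prediction well has cosine similarity near 1 and chi-square distance and JSD near 0, so candidates can be ranked by any of the three scores.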
