CRAISE | Feb 18, 2025

Innamark: A Whitespace Replacement Information-Hiding Method

arXiv:2502.12710v3 | 3 citations | h-index: 6 | IEEE Access
Originality: Incremental advance
AI Analysis

The paper addresses the need for imperceptible information hiding in text, for applications such as watermarking. The advance is incremental because it builds on existing whitespace-based techniques.

Motivated by the difficulty of distinguishing human-written from LLM-generated text, the paper introduces Innamark, a method that hides information by replacing ordinary whitespace with visually similar Unicode whitespace characters. The substitution preserves both the semantics and the character count of the cover text. Experiments on 1,000,000 Wikipedia articles are used to demonstrate the method's robustness and imperceptibility.

Large language models (LLMs) have gained significant popularity in recent years. Differentiating between a text written by a human and one generated by an LLM has become almost impossible. Information-hiding techniques such as digital watermarking or steganography can help by embedding information inside text in a form that is unlikely to be noticed. However, existing techniques, such as linguistic-based or format-based methods, change the semantics or cannot be applied to pure, unformatted text. In this paper, we introduce a novel method for information hiding called Innamark, which can conceal any byte-encoded sequence within a sufficiently long cover text. This method is implemented as a multi-platform library using the Kotlin programming language, which is accompanied by a command-line tool and a web interface. By substituting conventional whitespace characters with visually similar Unicode whitespace characters, our proposed scheme preserves the semantics of the cover text without changing the number of characters. Furthermore, we propose a specified structure for secret messages that enables configurable compression, encryption, hashing, and error correction. An experimental benchmark comparison on a dataset of 1 000 000 Wikipedia articles compares ten algorithms. The results demonstrate the robustness of our proposed Innamark method in various applications and the imperceptibility of its watermarks to humans. We discuss the limits to the embedding capacity and robustness of the algorithm and how these could be addressed in future work.
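
The whitespace-substitution idea described in the abstract can be illustrated with a minimal Python sketch. This is not the Innamark library (which is written in Kotlin) and does not reproduce its message structure, compression, encryption, hashing, or error correction. The four-character alphabet, the 2-bits-per-space encoding, and the function names `embed` and `extract` are all assumptions chosen for illustration; the paper's actual character set and encoding may differ.

```python
# Hypothetical alphabet of four look-alike Unicode spaces, one per 2-bit symbol.
# Assumption for illustration only; not taken from the Innamark paper.
ALPHABET = ["\u2004", "\u2005", "\u2006", "\u2009"]
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}


def embed(cover: str, payload: bytes) -> str:
    """Replace ordinary spaces in `cover` with look-alike spaces encoding `payload`."""
    # Split each byte into four 2-bit symbols, most significant bits first.
    symbols = [(b >> shift) & 0b11 for b in payload for shift in (6, 4, 2, 0)]
    if cover.count(" ") < len(symbols):
        raise ValueError("cover text has too few spaces for this payload")
    remaining = iter(symbols)
    out = []
    for ch in cover:
        if ch == " ":
            sym = next(remaining, None)
            # Once the payload is exhausted, leave the remaining spaces untouched.
            out.append(ch if sym is None else ALPHABET[sym])
        else:
            out.append(ch)
    return "".join(out)


def extract(text: str) -> bytes:
    """Recover the payload from the look-alike spaces found in `text`."""
    symbols = [DECODE[ch] for ch in text if ch in DECODE]
    usable = len(symbols) - len(symbols) % 4  # ignore an incomplete trailing byte
    out = bytearray()
    for i in range(0, usable, 4):
        a, b, c, d = symbols[i:i + 4]
        out.append(a << 6 | b << 4 | c << 2 | d)
    return bytes(out)


cover = "the quick brown fox jumps over the lazy dog and runs far away"
marked = embed(cover, b"Hi")
```

Because every replacement is one character for one character, `len(marked)` equals `len(cover)`, and `extract(marked)` returns the original bytes. Under this assumed encoding each payload byte consumes four spaces, which shows why the cover text must be "sufficiently long" relative to the hidden message.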

