Embedding Trust at Scale: Physics-Aware Neural Watermarking for Secure and Verifiable Data Pipelines
This provides a scalable tool for data provenance and auditability in scientific domains, though it is incremental as it builds on existing watermarking methods with a neural approach.
The paper tackles the problem of ensuring data integrity in scientific workflows by developing a neural watermarking framework that invisibly embeds binary messages into high-dimensional fields like climate and fluid data, achieving >98% bit accuracy and sub-1% MSE fidelity under lossy transformations.
We present a robust neural watermarking framework for scientific data integrity, targeting high-dimensional fields common in climate modeling and fluid simulations. Using a convolutional autoencoder, binary messages are invisibly embedded into structured data such as temperature, vorticity, and geopotential. Our method ensures watermark persistence under lossy transformations - including noise injection, cropping, and compression - while maintaining near-original fidelity (sub-1\% MSE). Compared to classical singular value decomposition (SVD)-based watermarking, our approach achieves $>$98\% bit accuracy and visually indistinguishable reconstructions across ERA5 and Navier-Stokes datasets. This system offers a scalable, model-compatible tool for data provenance, auditability, and traceability in high-performance scientific workflows, and contributes to the broader goal of securing AI systems through verifiable, physics-aware watermarking. We evaluate on physically grounded scientific datasets as a representative stress-test; the framework extends naturally to other structured domains such as satellite imagery and autonomous-vehicle perception streams.