Prach Chantasantitam, Adam Ilyas Caulfield, Vasisht Duddu et al.
Machine learning property attestations allow provers (e.g., model providers or owners) to attest properties of their models/datasets to verifiers (e.g., regulators, customers), enabling accountability towards regulations and policies. But, current approaches do not support generative models or large datasets. We present PAL*M, a property attestation framework for large generative models, illustrated using large language models. PAL*M defines properties across training and inference, leverages confidential virtual machines with security-aware GPUs for coverage of CPU-GPU operations, and proposes using incremental multiset hashing over memory-mapped datasets to efficiently track their integrity. We implement PAL*M on Intel TDX+NVIDIA H100 and evaluate it using state-of-the-art models and datasets, showing PAL*M is efficient, incurring < 11% overhead for common operations. Finally, we use the Tamarin Prover symbolic verification tool to formally model PAL*M's property attestation protocol, confirming that its security guarantees are upheld under the defined threat model.