Trusting What You Cannot See: Auditable Fine-Tuning and Inference for Proprietary AI
This work matters to clients and organizations that delegate LLM fine-tuning and inference to cloud providers: it offers a practical way to close the trust gap, and the security risks that come with it, caused by the inability to audit these proprietary processes.
The paper addresses the challenge of verifying the integrity of fine-tuning and inference for large language models (LLMs) when these processes are delegated to cloud providers. The authors introduce AFTUNE, a framework that uses lightweight recording and spot-check mechanisms to generate verifiable execution traces, enabling clients to audit whether the processes adhered to the agreed configurations while incurring practical computational overhead.
Cloud-based infrastructures have become the dominant platform for deploying large models, particularly large language models (LLMs). Fine-tuning and inference are increasingly delegated to cloud providers for simplified deployment and access to proprietary models, yet this creates a fundamental trust gap: although cryptographic and trusted execution environment (TEE)-based verification techniques exist, the scale of modern LLMs makes them prohibitively expensive, leaving clients unable to audit these processes in practice. This lack of transparency creates concrete security risks that can silently compromise service integrity. We present AFTUNE, an auditable and verifiable framework that ensures the computational integrity of cloud-based fine-tuning and inference. AFTUNE incorporates a lightweight recording and spot-check mechanism that produces verifiable execution traces. These traces enable clients to later audit whether the training and inference processes followed the agreed configurations. Our evaluation shows that AFTUNE imposes practical computational overhead while enabling selective and efficient verification, demonstrating that trustworthy model services are achievable in today's cloud environments.
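The summary does not specify how AFTUNE records traces or selects spot checks, so the following is only a generic sketch of the record-then-spot-check idea, not the paper's method. It assumes a provider that keeps each step's input checkpoint, hash-chains a per-step record (input hash, output hash, config hash) into a single commitment it hands to the client, and a client that later re-executes only a random sample of steps. All names (`train_step`, `run_and_record`, `audit`, `detection_probability`) are hypothetical, the integer "weights" are a toy stand-in for model state, and the sketch assumes steps are exactly reproducible, which real GPU training often is not without extra care.

```python
import hashlib
import json
import random
from math import comb


def digest(obj) -> str:
    """SHA-256 over a canonical JSON encoding of a (toy) state or record."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def train_step(weights, cfg, step):
    """Hypothetical deterministic stand-in for one fine-tuning step."""
    return [(w * cfg["mult"] + step) % cfg["mod"] for w in weights]


def run_and_record(weights, cfg, n_steps, tamper_steps=()):
    """Provider side: run the steps, keep input checkpoints, hash-chain a trace."""
    checkpoints, trace, chain = [], [], "genesis"
    for step in range(n_steps):
        checkpoints.append(weights)
        out = train_step(weights, cfg, step)
        if step in tamper_steps:  # simulate a silent deviation from the agreed config
            out = [w + 1 for w in out]
        entry = {"step": step, "in": digest(weights), "out": digest(out), "cfg": digest(cfg)}
        chain = digest([chain, entry])  # commitment the client receives up front
        trace.append(entry)
        weights = out
    return checkpoints, trace, chain


def audit(checkpoints, trace, chain, cfg, sample):
    """Client side: check the commitment cheaply, then re-execute only sampled steps."""
    recomputed = "genesis"
    for entry in trace:
        recomputed = digest([recomputed, entry])
    if recomputed != chain:
        return False  # trace was altered after the commitment
    for prev, cur in zip(trace, trace[1:]):
        if prev["out"] != cur["in"]:
            return False  # state handed between steps is discontinuous
    for step in sample:
        entry = trace[step]
        if entry["in"] != digest(checkpoints[step]) or entry["cfg"] != digest(cfg):
            return False  # wrong checkpoint or wrong configuration
        if entry["out"] != digest(train_step(checkpoints[step], cfg, step)):
            return False  # honest re-execution disagrees with the recorded output
    return True


def detection_probability(n_steps, n_bad, n_sampled):
    """Chance a uniform sample (without replacement) hits at least one deviated step."""
    return 1 - comb(n_steps - n_bad, n_sampled) / comb(n_steps, n_sampled)


if __name__ == "__main__":
    cfg = {"mult": 3, "mod": 1009}
    sample = sorted(random.Random(0).sample(range(20), 5))
    honest = run_and_record([1, 2, 3, 4], cfg, 20)
    print("honest run passes:", audit(*honest, cfg, sample))
    cheat = run_and_record([1, 2, 3, 4], cfg, 20, tamper_steps={sample[0]})
    print("tampered run passes:", audit(*cheat, cfg, sample))
    print("P(detect 1 bad step of 20 with 5 checks):", detection_probability(20, 1, 5))
```

The client's cost scales with the number of sampled steps rather than the full run, which is the trade-off spot-checking makes: a deviation on a step that is never sampled goes unnoticed, so the detection probability (here 1 - C(n - k, s) / C(n, s) for k deviated steps out of n and s checks) grows with both the sample size and the number of steps the provider tampers with.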