AI LODec 18, 2025

TOGGLE: Temporal Logic-Guided Large Language Model Compression for Edge

arXiv:2512.16855v13.3h-index: 152025 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Originality Incremental advance

AI Analysis

This work addresses the deployment of LLMs on resource-constrained edge devices by providing a novel, verifiable compression method, though it is incremental in applying formal methods to a specific domain.

The paper tackles the problem of compressing large language models for edge deployment by proposing TOGGLE, a framework that uses Signal Temporal Logic to enforce linguistic properties during compression, achieving up to 3.3x reduction in computational costs and 68.8% reduction in model size while preserving specified constraints.

Large Language Models (LLMs) deliver exceptional performance across natural language tasks but demand substantial computational resources, limiting their deployment on resource-constrained edge devices. Existing compression techniques, such as quantization and pruning, often degrade critical linguistic properties and lack formal guarantees for preserving model behavior. We propose Temporal Logic-Guided Large Language Model Compression (TOGGLE), a novel framework that leverages Signal Temporal Logic (STL) to formally specify and enforce linguistic properties during compression. TOGGLE employs an STL robustness-guided Bayesian optimization to systematically explore layer-wise quantization and pruning configurations, generating compressed models that formally satisfy specified linguistic constraints without retraining or fine-tuning. Evaluating TOGGLE on four LLM architectures (GPT-2, DeepSeek-V2 7B, LLaMA 3 8B, and Mistral 7B), we achieve up to 3.3x reduction in computational costs (FLOPs) and up to a 68.8% reduction in model size while satisfying all linguistic properties. TOGGLE represents the first integration of formal methods into LLM compression, enabling efficient, verifiable deployment of LLMs on edge hardware.

View on arXiv PDF

Similar