Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification
This work addresses energy efficiency for transformer deployment in resource-constrained environments, but its contribution is incremental: it applies existing optimization methods to a specific domain.
The study addresses the computational demands of transformer models in time series classification by investigating optimization techniques such as pruning and quantization. It finds that static quantization reduces energy consumption by 29.14% and that L1 pruning improves inference speed by 63% with minimal accuracy loss.
The increasing computational demands of transformer models in time series classification necessitate effective optimization strategies for energy-efficient deployment. Our study presents a systematic investigation of optimization techniques, focusing on structured pruning and quantization methods for transformer architectures. Through extensive experimentation on three distinct datasets (RefrigerationDevices, ElectricDevices, and PLAID), we quantitatively evaluate model performance and energy efficiency across different transformer configurations. Our experimental results demonstrate that static quantization reduces energy consumption by 29.14% while maintaining classification performance, and L1 pruning achieves a 63% improvement in inference speed with minimal accuracy degradation. Our findings provide valuable insights into the effectiveness of optimization strategies for transformer-based time series classification, establishing a foundation for efficient model deployment in resource-constrained environments.
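To make the two techniques named in the abstract concrete, the following is a minimal sketch in plain Python, not the study's implementation. It assumes that "L1 pruning" means zeroing whole weight rows with the smallest L1 norm (one common form of structured pruning), and that "static quantization" means fixing an 8-bit affine scale and zero point from calibration data before inference. The function names (`l1_prune_rows`, `calibrate`, `quantize`, `dequantize`) and the toy values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the abstract does not specify the implementation.

def l1_prune_rows(weight, amount):
    """Zero the fraction `amount` of rows with the smallest L1 norm.

    Assumption: pruning granularity is one row of a 2-D weight matrix.
    """
    norms = [sum(abs(w) for w in row) for row in weight]
    n_prune = int(round(amount * len(weight)))
    # Indices of the rows with the smallest L1 norms.
    drop = set(sorted(range(len(weight)), key=lambda i: norms[i])[:n_prune])
    return [[0.0] * len(row) if i in drop else list(row)
            for i, row in enumerate(weight)]


def calibrate(samples, num_bits=8):
    """Derive a fixed affine scale and zero point from calibration values.

    "Static" here means these parameters are computed once, ahead of
    inference, rather than per input.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Include 0.0 in the range so that zero is exactly representable.
    lo, hi = min(min(samples), 0.0), max(max(samples), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against an all-zero range
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Map floats to unsigned integers, clamping to the representable range."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, int(round(v / scale)) + zero_point)) for v in x]


def dequantize(q, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [(v - zero_point) * scale for v in q]


# Toy usage with made-up values (not data from the study).
weights = [[0.5, -0.5], [0.1, 0.05], [-2.0, 1.0], [0.2, -0.1]]
pruned = l1_prune_rows(weights, amount=0.5)  # zeros the two lowest-norm rows

scale, zero_point = calibrate([-1.0, 0.0, 0.5, 1.0])
codes = quantize([-0.3, 0.0, 0.7], scale, zero_point)
approx = dequantize(codes, scale, zero_point)
```

Zeroing rows, as done here, does not by itself speed up inference. In practice the pruned rows would be physically removed, or the runtime would have to exploit the sparsity. Likewise, energy savings from quantization depend on hardware support for integer arithmetic, which this sketch does not model.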