LG AI May 16, 2024

HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models

Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Aaron Klein, Lennart Purucker, Joerg K. H. Franke, Frank Hutter

arXiv:2405.10299v3 12.510 citations h-index: 18 Has Code NIPS

Originality: Incremental advance

AI Analysis

This paper addresses hardware-aware model selection for researchers and practitioners, offering a tool for efficiently evaluating trade-offs between model quality and hardware cost. It is an incremental advance because it builds on existing NAS and benchmarking methods.

The paper tackles the challenge of optimizing language model configurations under hardware constraints by introducing HW-GPT-Bench, a benchmark that uses surrogate predictions to approximate hardware metrics such as latency and energy across 13 devices for GPT-2 architectures with up to 1.55B parameters, so that multi-objective optimization runs can be simulated in seconds.

The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance. Identifying optimal model configurations under specific hardware constraints is becoming essential but remains challenging due to the computational load of exhaustive training and evaluation on multiple devices. To address this, we introduce HW-GPT-Bench, a hardware-aware benchmark that uses surrogate predictions to approximate various hardware metrics across 13 devices for architectures in the GPT-2 family with up to 1.55B parameters. Our surrogates, via calibrated predictions and reliable uncertainty estimates, faithfully model the heteroscedastic noise inherent in the energy and latency measurements. To estimate perplexity, we employ weight-sharing techniques from Neural Architecture Search (NAS), inheriting pretrained weights from the largest GPT-2 model. Finally, we demonstrate the utility of HW-GPT-Bench by simulating optimization trajectories of various multi-objective optimization algorithms in just a few seconds.
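To make the multi-objective setting concrete, the following is a minimal sketch of how surrogate predictions with architecture-dependent (heteroscedastic) noise could feed a Pareto-front computation over perplexity and latency. It is not the HW-GPT-Bench API: the function names `sample_latency` and `pareto_front`, the Gaussian noise model, and all numbers in `candidates` are assumptions chosen for illustration only.

```python
import random


def sample_latency(mean_ms, std_ms, rng):
    """Draw one noisy latency (assumed Gaussian); the spread varies per architecture."""
    return max(0.0, rng.gauss(mean_ms, std_ms))


def pareto_front(points):
    """Return the points not dominated by any other point (both objectives minimized)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return sorted(front)


# Hypothetical surrogate outputs, made up for this sketch:
# (perplexity, predicted mean latency in ms, predicted latency std in ms)
candidates = [
    (18.0, 40.0, 4.0),
    (20.0, 25.0, 1.5),
    (22.0, 30.0, 2.0),
    (25.0, 12.0, 0.5),
]

rng = random.Random(0)  # fixed seed so the simulated measurements are reproducible
observed = [(ppl, sample_latency(mu, sigma, rng)) for ppl, mu, sigma in candidates]
front = pareto_front(observed)

# Front computed on the predicted means only, ignoring the noise
mean_front = pareto_front([(ppl, mu) for ppl, mu, _ in candidates])
```

Because every query is a cheap function call rather than a training run or an on-device measurement, a loop like this can be repeated many times, which is the property that lets optimization trajectories be simulated quickly.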
