DC · AI · Jan 25, 2024

The Case for Co-Designing Model Architectures with Hardware

Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar Panda

arXiv:2401.14489v2 · 10.314 citations · Has Code · ICPP

Originality: Incremental advance

AI Analysis

This work addresses performance bottlenecks for users training and deploying transformer models on GPUs, offering incremental improvements through hardware-aware design.

The paper tackles the problem of inefficient deep learning model designs that overlook hardware implications, providing guidelines for optimizing transformer model shapes to improve runtime performance, resulting in up to 39% higher throughput while preserving accuracy.

While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked when designing new deep learning (DL) models. As a consequence, modifying a DL model to be more amenable to the target hardware can significantly improve the runtime performance of DL training and inference. In this paper, we provide a set of guidelines for users to maximize the runtime performance of their transformer models. These guidelines have been created by carefully considering the impact of various model hyperparameters controlling model shape on the efficiency of the underlying computation kernels executed on the GPU. We find the throughput of models with efficient model shapes is up to 39% higher while preserving accuracy compared to models with a similar number of parameters but with unoptimized shapes.
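As a rough illustration of what checking a "model shape" against hardware could look like, the sketch below tests whether a few transformer dimensions are multiples of a chosen alignment and pads the vocabulary up to the next multiple. The alignment of 64, the helper names `round_up` and `shape_report`, and the example dimensions are all assumptions made for illustration; they are not taken from the abstract above, and the right alignment depends on the GPU and data type.

```python
def round_up(value: int, multiple: int) -> int:
    """Round `value` up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def shape_report(hidden_size: int, num_heads: int, vocab_size: int,
                 multiple: int = 64) -> dict:
    """Report which dimensions are aligned to `multiple`.

    `multiple=64` is an assumed alignment for illustration only.
    """
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    head_dim = hidden_size // num_heads
    return {
        "head_dim": head_dim,
        "head_dim_aligned": head_dim % multiple == 0,
        "hidden_aligned": hidden_size % multiple == 0,
        "vocab_aligned": vocab_size % multiple == 0,
        # Vocabulary size padded up to the next aligned value.
        "padded_vocab": round_up(vocab_size, multiple),
    }


if __name__ == "__main__":
    # Hypothetical configuration: 2560 hidden units, 32 heads, 50257 tokens.
    for key, value in shape_report(2560, 32, 50257).items():
        print(f"{key}: {value}")
```

In this hypothetical configuration the hidden size is aligned, but the per-head dimension and the vocabulary size are not. That is the kind of mismatch a shape check is meant to surface before training.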
