LG AI PF

Sep 3, 2025

A Study of GPU Scaling Efficiency for Artificial Intelligence Training

David Cortes, Carlos Juiz, Belen Bermejo

arXiv:2509.03263v1

h-index: 1

Originality: Synthesis-oriented

AI Analysis

This work addresses efficiency challenges in GPU-based AI training that matter to both researchers and industry, but it appears incremental because it builds on existing benchmarks such as MLPerf.

The study analyzes how efficiently GPU resources scale when training large-scale deep learning models and identifies configurations that balance performance, GPU usage, and efficiency. The results point to a break-even point at which training times can be reduced while efficiency is maximized.

Training large-scale deep learning models has become a key challenge for the scientific community and industry. While the massive use of GPUs can significantly speed up training times, this approach has a negative impact on efficiency. In this article, we present a detailed analysis of the times reported by MLPerf Training v4.1 on four workloads: BERT, Llama2 LoRA, RetinaNet, and Stable Diffusion, showing that there are configurations that optimise the relationship between performance, GPU usage, and efficiency. The results point to a break-even point that allows training times to be reduced while maximising efficiency.
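
The trade-off described in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions of speedup (baseline time divided by time at n GPUs) and scaling efficiency (speedup divided by the growth in GPU count); the paper may define these differently. The timing values are hypothetical and are not MLPerf Training v4.1 results, and the function names and the 0.8 efficiency threshold are illustrative choices rather than the authors' method.

```python
def scaling_table(times):
    """Return (gpus, speedup, efficiency) rows.

    times: dict mapping GPU count -> training time in minutes.
    The smallest GPU count is used as the baseline (assumption).
    """
    base_n = min(times)
    base_t = times[base_n]
    rows = []
    for n in sorted(times):
        speedup = base_t / times[n]           # how much faster than baseline
        efficiency = speedup / (n / base_n)   # 1.0 means perfect linear scaling
        rows.append((n, speedup, efficiency))
    return rows


def largest_efficient_config(times, min_efficiency=0.8):
    """Largest GPU count whose efficiency stays at or above the threshold.

    This is one possible reading of a "break-even point", not the paper's
    definition.
    """
    ok = [n for n, _, eff in scaling_table(times) if eff >= min_efficiency]
    return max(ok)


# Hypothetical timings (minutes), for illustration only.
hypothetical_times = {8: 100.0, 16: 55.0, 32: 32.0, 64: 22.0}

if __name__ == "__main__":
    for n, speedup, eff in scaling_table(hypothetical_times):
        print(f"{n:>3} GPUs  speedup={speedup:.2f}  efficiency={eff:.2f}")
    print("chosen config:", largest_efficient_config(hypothetical_times))
```

With these made-up numbers, each doubling of GPUs still shortens training, but efficiency falls with every step, so a threshold on efficiency selects an intermediate configuration rather than the largest one.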
