LG · AI · DC · Nov 26, 2025

GPU Memory Prediction for Multimodal Model Training

arXiv:2512.07853v1
Originality: Incremental advance
AI Analysis

This paper addresses GPU memory prediction for multimodal models in agentic AI systems. It is an incremental improvement over existing methods, which target only unimodal models.

The paper tackles the problem of GPU out-of-memory errors during training by predicting peak GPU memory usage for multimodal models, achieving an average MAPE of ~8.7%.
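MAPE (mean absolute percentage error) averages the relative error |actual − predicted| / actual over all evaluated configurations and reports it as a percentage. An average MAPE of ~8.7% therefore means the predicted peak memory is off by about 8.7% of the measured value on average. The minimal Python sketch below computes the metric. The measured and predicted values are hypothetical illustration numbers, not results from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("actual values must be non-zero")
        total += abs((a - p) / a)  # relative error for one configuration
    return 100.0 * total / len(actual)


# Hypothetical measured vs. predicted peak GPU memory in GiB (illustrative only).
measured = [20.0, 40.0, 10.0]
predicted = [22.0, 38.0, 10.5]
print(f"MAPE = {mape(measured, predicted):.2f}%")
```

Because each error is divided by the measured value, MAPE weights an error of 1 GiB on a 10 GiB job more heavily than the same error on a 40 GiB job.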

As deep learning models in agentic AI systems grow in scale and complexity, their GPU memory requirements increase and often exceed the available GPU memory capacity, which leads to out-of-memory (OoM) errors. An OoM error interrupts the entire training run and wastes substantial computational resources. Accurate prediction of GPU memory usage is therefore essential for preventing OoM. However, previous studies focus only on unimodal architectures and fail to generalize to multimodal models, even though multimodal models are a common choice in agentic AI systems. To address this limitation, we propose a framework that predicts peak GPU memory usage by analyzing the model architecture and training behavior of multimodal models. Specifically, the framework decomposes a multimodal model into its constituent layers and applies factorization to estimate the memory usage of each layer. Our evaluation shows that the framework achieves high prediction accuracy, with an average MAPE of ~8.7%.
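The abstract does not define the factorization, so the Python sketch below assumes one plausible reading. Each layer's memory is written as a product of factors (element count × bytes per element, times batch size for activations). The per-layer estimates are then summed into a peak estimate and compared against GPU capacity to flag a likely OoM. The `Layer` class, the byte constants (bf16 weights and gradients plus fp32 Adam states), the layer sizes, and the "all activations alive at once" peak rule are all hypothetical simplifications, not the paper's method. Real peaks also include the CUDA context, allocator fragmentation, and temporary buffers.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    params: int     # trainable parameters in this layer (hypothetical)
    act_elems: int  # activation elements kept for backward, per sample (hypothetical)


# Assumed mixed-precision Adam: bf16 weight + bf16 grad (2 B each),
# fp32 master weight + two fp32 Adam moments (4 B each).
BYTES_PER_PARAM = 2 + 2 + 3 * 4
BYTES_PER_ACT = 2  # assumed bf16 activations


def layer_bytes(layer: Layer, batch_size: int) -> int:
    # Each term is a product of factors: count x bytes/element (x batch size).
    static = layer.params * BYTES_PER_PARAM
    activations = layer.act_elems * batch_size * BYTES_PER_ACT
    return static + activations


def predict_peak_bytes(layers, batch_size: int) -> int:
    # Crude bound: all static state plus every layer's saved activations are
    # resident together at the end of the forward pass (no checkpointing).
    return sum(layer_bytes(layer, batch_size) for layer in layers)


def will_oom(layers, batch_size: int, gpu_bytes: int) -> bool:
    return predict_peak_bytes(layers, batch_size) > gpu_bytes


# Hypothetical multimodal model: vision encoder -> projector -> language model.
model = (
    [Layer(f"vision_block_{i}", 7_000_000, 600_000) for i in range(4)]
    + [Layer("projector", 2_000_000, 50_000)]
    + [Layer(f"llm_block_{i}", 20_000_000, 1_000_000) for i in range(8)]
)

peak = predict_peak_bytes(model, batch_size=8)
print(f"predicted peak: {peak / 2**30:.2f} GiB")
print("OoM on a 24 GiB GPU?", will_oom(model, 8, 24 * 2**30))
```

Decomposing the model per layer lets the heterogeneous encoders of a multimodal model each use their own parameter and activation counts, instead of a single formula tuned to one unimodal architecture.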

