LG AI OC · Apr 24, 2025

A multilevel approach to accelerate the training of Transformers

Guillaume Lauga, Maël Chaumette, Edgar Desainte-Maréville, Étienne Lasalle, Arthur Lebeurrier

arXiv:2504.18590v1 · 4.1 · h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses slow Transformer training, a concern for researchers and practitioners, but appears incremental because it builds on existing ODE interpretations of these architectures.

The paper tackles slow Transformer training with a multilevel approach: starting from an ODE interpretation of the architecture, it varies the discretization of the resulting ODE Transformer to speed up training, and validates the method experimentally against the standard training procedure.

In this article, we investigate the potential of multilevel approaches to accelerate the training of transformer architectures. Using an ordinary differential equation (ODE) interpretation of these architectures, we propose an appropriate way of varying the discretization of these ODE Transformers in order to accelerate the training. We validate our approach experimentally by a comparison with the standard training procedure.
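To make "varying the discretization" concrete, here is a minimal sketch, assuming the common reading of a residual block as one forward Euler step, x_{k+1} = x_k + h f(x_k, θ_k) with h = T/N, so that a network with N layers discretizes dx/dt = f(x, θ(t)) on [0, T]. The scalar `residual_branch`, and the duplicate/average operators `prolong` and `restrict`, are hypothetical illustrations chosen for simplicity; they are not the authors' operators, and nothing here trains a model.

```python
import math
from typing import List


def residual_branch(x: float, theta: float) -> float:
    # Hypothetical scalar stand-in for an attention/MLP block: f(x, theta) = tanh(theta * x).
    return math.tanh(theta * x)


def forward(x0: float, thetas: List[float], T: float = 1.0) -> float:
    # A depth-N residual network read as forward Euler on dx/dt = f(x, theta(t)), t in [0, T]:
    # x_{k+1} = x_k + h * f(x_k, theta_k) with step h = T / N.
    h = T / len(thetas)
    x = x0
    for theta in thetas:
        x = x + h * residual_branch(x, theta)
    return x


def prolong(thetas: List[float]) -> List[float]:
    # Coarse -> fine (hypothetical operator): copy each layer's parameters into two
    # consecutive layers. Depth doubles, so forward() uses half the step size while
    # discretizing the same piecewise-constant theta(t).
    return [theta for theta in thetas for _ in range(2)]


def restrict(thetas: List[float]) -> List[float]:
    # Fine -> coarse (hypothetical operator): average each pair of consecutive layers.
    if len(thetas) % 2:
        raise ValueError("restriction needs an even number of layers")
    return [(thetas[i] + thetas[i + 1]) / 2 for i in range(0, len(thetas), 2)]


# Usage: build a hierarchy of depths 4, 8, ..., 256 from the same coarse parameters.
coarse = [0.5, -0.3, 0.8, 0.1]
levels = [coarse]
for _ in range(6):
    levels.append(prolong(levels[-1]))

for thetas in levels:
    # Outputs of successive depths approach the ODE solution as the step h shrinks.
    print(len(thetas), forward(1.0, thetas))
```

Under these assumptions, the gap between outputs at successive depths shrinks roughly in proportion to h, which is why a cheap shallow model can serve as a starting point for a deeper one. A coarse-to-fine schedule would train at a low depth, prolong, and continue training at the finer level.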

