Nikolai Solmsdorf

2.5 DC May 8

A Scalable Recipe on SuperMUC-NG Phase 2: Efficient Large-Scale Training of Language Models

Ajay Navilarekal Rajgopal, Nikolai Solmsdorf

Large Language Models (LLMs) continue to demonstrate superior performance with increasing scale, yet training models with billions to trillions of parameters requires staggering computational resources; for example, a one-trillion-parameter GPT-style model requires an estimated 120 million exaflops. This challenge necessitates efficient distributed training strategies on cutting-edge High-Performance Computing (HPC) infrastructure. In this work, we explore the SuperMUC-NG Phase 2 (SMNG-P2) system at the Leibniz Supercomputing Centre (LRZ) in Garching, Germany, which is equipped with Intel Data Center GPU Max 1550 accelerators, to extract the necessary computational power. We enable and investigate a comprehensive recipe of parallel training techniques, including tensor parallelism, pipeline parallelism, and sharded data parallelism, which are essential for training LLMs up to the 175-billion-parameter scale on SMNG-P2. Through empirical assessment and extensive hyperparameter tuning, we analyze the complex interplay among these techniques and determine their impact on GPU computational efficiency. We identify an optimized combined strategy that yields high throughput and enables the efficient training of LLMs of varying sizes. Specifically, for the 175B model, we achieved a per-tile throughput of 10% of the theoretical peak per-tile bf16 FLOPs using an out-of-the-box, publicly available software stack built from standard distributions without further modification. This approach ensures broad accessibility, as our methodology can be replicated by any user on the SMNG-P2 system without the need for porting or specialized software engineering. Furthermore, we achieved 93% weak scaling efficiency and 82% strong scaling efficiency on 128 nodes. This scalable recipe provides a crucial blueprint for efficiently utilizing advanced exascale systems for next-generation foundational model development.
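The abstract reports weak and strong scaling efficiencies without stating how they are computed. The sketch below uses the conventional time-based definitions, which may differ from the ones the authors applied. The function names and all timing values are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def weak_scaling_efficiency(base_time: float, scaled_time: float) -> float:
    """Weak scaling: the work per node stays fixed while nodes are added.

    Ideal behaviour keeps the step time constant, so the efficiency is
    the baseline step time divided by the step time at the larger scale.
    """
    return base_time / scaled_time


def strong_scaling_efficiency(
    base_time: float, base_nodes: int, scaled_time: float, scaled_nodes: int
) -> float:
    """Strong scaling: the total work stays fixed while nodes are added.

    Ideal behaviour cuts the step time in proportion to the node count,
    so the efficiency is the observed speedup divided by the ideal speedup.
    """
    speedup = base_time / scaled_time
    ideal_speedup = scaled_nodes / base_nodes
    return speedup / ideal_speedup


# Hypothetical timings (seconds per training step), not taken from the paper.
weak = weak_scaling_efficiency(base_time=10.0, scaled_time=12.5)
strong = strong_scaling_efficiency(
    base_time=100.0, base_nodes=1, scaled_time=15.625, scaled_nodes=8
)
print(f"weak scaling efficiency:   {weak:.2f}")
print(f"strong scaling efficiency: {strong:.2f}")
```

Under these definitions, an efficiency of 1.0 means perfect scaling, and values below 1.0 indicate time lost to communication, pipeline bubbles, or load imbalance.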

CL Sep 28, 2021

Active Learning for Argument Mining: A Practical Approach

Nikolai Solmsdorf, Dietrich Trautmann, Hinrich Schütze

Despite considerable recent progress, the creation of well-balanced and diverse resources remains a time-consuming and costly challenge in Argument Mining. Active Learning reduces the amount of data necessary for training machine learning models by querying the most informative samples for annotation and is therefore a promising method for resource creation. In a large-scale comparison of several Active Learning methods, we show that Active Learning considerably decreases the effort necessary to achieve good deep learning performance on the task of Argument Unit Recognition and Classification (AURC).
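The idea of querying the most informative samples can be illustrated with least-confidence sampling, one common Active Learning strategy. This is a minimal sketch and not necessarily one of the methods compared in the paper; the function name and the example probabilities are hypothetical.

```python
def least_confidence_query(probabilities, k):
    """Pick the k unlabeled samples the model is least confident about.

    probabilities: one list of predicted class probabilities per sample.
    Returns the sample indices ordered from most to least uncertain.
    """
    # Uncertainty is one minus the probability of the most likely class.
    scores = [1.0 - max(p) for p in probabilities]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical model outputs for three unlabeled sentences (three classes each).
pool_probabilities = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # very uncertain prediction
    [0.60, 0.30, 0.10],  # moderately uncertain prediction
]
selected = least_confidence_query(pool_probabilities, k=2)
print(selected)
```

In a full Active Learning loop, the selected samples would be sent to annotators, added to the labeled set, and the model retrained before the next query round.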

2 Papers