AI · Jul 11, 2024

SoupLM: Model Integration in Large Language and Multi-Modal Models

arXiv:2407.08196v1 · 1 citation · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the resource-intensive training problem facing AI researchers and practitioners. Its advance is incremental: it builds on existing models, and its contribution is a new strategy for assembling them.

The authors tackle the high computational cost and complexity of training multiple large language model (LLM) variants by proposing SoupLM, a method that assembles existing LLMs such as LLaMA, Vicuna, and LLaVA into a single multimodal model. The result is cost-efficient integration without repetitive training.

Training large language models (LLMs) and multimodal LLMs requires significant computing resources, and existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks. For instance, LLaMA, Vicuna, and LLaVA are three LLM variants trained with LLaMA base models using very different training recipes, tasks, and data modalities. The training cost and complexity for such LLM variants grow rapidly. In this study, we propose to use a soup strategy to assemble these LLM variants into a single well-generalized multimodal LLM (SoupLM) in a cost-efficient manner. Assembling these LLM variants efficiently brings the knowledge and specialities learned from different domains and data modalities into one integrated model (e.g., the chatbot speciality from user-shared conversations for Vicuna, and the visual capacity from vision-language data for LLaVA), thereby avoiding the computing costs of repetitive training on several different domains. We propose a series of soup strategies to systematically benchmark performance gains across various configurations, and probe the soup behavior across base models in the interpolation space.
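The abstract does not spell out the soup strategies themselves, so the following is only a minimal sketch of the simplest form such a strategy could take: a weighted average of parameters from checkpoints that share one architecture. The function name `soup`, the `Weights` type, and the toy checkpoints are hypothetical and chosen for illustration; real checkpoints would hold tensors rather than short lists of floats, and the paper's own strategies may differ.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def soup(checkpoints: Sequence[Weights], coeffs: Sequence[float]) -> Weights:
    """Linearly interpolate checkpoints that share the same parameter layout."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if len(checkpoints) != len(coeffs):
        raise ValueError("need exactly one coefficient per checkpoint")
    if abs(sum(coeffs) - 1.0) > 1e-8:
        raise ValueError("coefficients must sum to 1")

    names = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must have identical parameter names")

    merged: Weights = {}
    for name in names:
        params = [ckpt[name] for ckpt in checkpoints]
        size = len(params[0])
        if any(len(p) != size for p in params):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Each merged value is the coefficient-weighted sum across checkpoints.
        merged[name] = [
            sum(c * p[i] for c, p in zip(coeffs, params)) for i in range(size)
        ]
    return merged


# Toy example: two variants fine-tuned from the same base (values are made up).
chat_variant = {"layer.weight": [1.0, 2.0]}
vision_variant = {"layer.weight": [3.0, 6.0]}
merged = soup([chat_variant, vision_variant], [0.5, 0.5])
```

Sweeping the coefficients (for example from `[1.0, 0.0]` to `[0.0, 1.0]`) is one way to probe behavior along the interpolation path between two variants, which is the kind of analysis the abstract alludes to.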

