CCoE: A Compact and Efficient LLM Framework with Multi-Expert Collaboration for Resource-Limited Settings
This provides a resource-efficient solution for deploying multi-domain LLMs in resource-limited settings, though it is incremental as it builds on existing multi-expert and modular approaches.
The paper tackles the challenge of scaling large language models for multiple downstream domains under resource constraints by introducing the CCoE architecture, which integrates domain-specific experts into a unified model, achieving comparable performance to domain-specific models while reducing memory usage by 61.3% and improving inference efficiency by 0.76x.
Large Language Models (LLMs) have achieved exceptional performance across diverse domains through training on massive datasets. However, scaling LLMs to support multiple downstream domain applications remains a significant challenge, especially under resource constraints. Existing approaches often struggle to balance performance across multiple domains with resource efficiency, limiting their broader applicability. To address this, we introduce the CCoE architecture, a modular framework that seamlessly integrates domain-specific experts into a unified LLM. By leveraging independently trained expert subnetworks on a shared backbone partition, CCoE achieves state-of-the-art performance while significantly reducing the resource requirements for multi-expert deployments. Furthermore, rule-based gating and expert planning in CCoE enable flexible task allocation, promoting expert collaboration to handle complex reasoning tasks. CCoE not only reduces inference costs but also provides a flexible and scalable solution for integrating domain expertise across diverse applications. Experiments on five domains demonstrate that CCoE achieves comparable performance to current domain-specific LLMs. Moreover, compared to existing multi-domain model ensemble methods, CCoE reduces memory usage by 61.3%, while improving inference efficiency by 0.76x over parameter-efficient multi-expert integration approaches.