Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
The paper addresses safety generalization for multilingual LLMs, though the contribution appears incremental because it builds on existing merging techniques.
The paper tackles the safe use of Large Language Models in multilingual settings by exploring model merging for diverse multi-task learning. Compared with mixing data, objective-based merging improves general performance by up to 8% and safety by up to 10%. Language-based merging, which merges monolingually fine-tuned models, yields a 4% increase in general performance and a 7% reduction in harm across all languages relative to the data-mixing baseline trained on the same data.
Large Language Models (LLMs) have been adopted and deployed worldwide for a broad variety of applications. However, ensuring their safe use remains a significant challenge. Preference training and safety measures often overfit to harms prevalent in Western-centric datasets, and safety protocols frequently fail to extend to multilingual settings. In this work, we explore model merging in a diverse multi-task setting, combining safety and general-purpose tasks within a multilingual context. Each language introduces unique and varied learning challenges across tasks. We find that objective-based merging is more effective than mixing data, with improvements of up to 8% and 10% in general performance and safety respectively. We also find that language-based merging is highly effective: by merging monolingually fine-tuned models, we achieve a 4% increase in general performance and a 7% reduction in harm across all languages on top of the data-mixing method using the same available data. Overall, our comprehensive study of merging approaches provides a useful framework for building strong and safe multilingual models.
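To make the contrast with data mixing concrete, here is a minimal sketch of what merging separately fine-tuned models can look like. The abstract does not say which merging algorithms the paper uses, so the sketch assumes the simplest one, weighted linear averaging of parameters. The function name `linear_merge`, the toy checkpoints, and the coefficients are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# A toy "checkpoint": parameter name -> flat list of weights.
# Real checkpoints hold tensors; plain lists keep the sketch dependency-free.
Weights = Dict[str, List[float]]


def linear_merge(models: List[Weights], coeffs: List[float]) -> Weights:
    """Weighted average of parameters across models (assumed merging rule)."""
    if not models or len(models) != len(coeffs):
        raise ValueError("need one coefficient per model")
    total = sum(coeffs)
    if total == 0:
        raise ValueError("coefficients must not sum to zero")
    # Normalize so the merged weights stay on the same scale as the inputs.
    norm = [c / total for c in coeffs]

    merged: Weights = {}
    for name, reference in models[0].items():
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(norm, models))
            for i in range(len(reference))
        ]
    return merged


# Hypothetical objective-based merge: one model tuned for safety,
# one tuned for general-purpose tasks, both from the same base model.
safety_model = {"layer.weight": [1.0, 2.0]}
general_model = {"layer.weight": [3.0, 6.0]}

merged = linear_merge([safety_model, general_model], [0.5, 0.5])
print(merged)  # {'layer.weight': [2.0, 4.0]}
```

Under the same assumption, a language-based merge would pass one monolingually fine-tuned checkpoint per language instead of one checkpoint per objective. Data mixing, by contrast, would train a single model on the combined datasets and involve no parameter averaging.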