Chengyu Duan

LGAug 30, 2024

MoRe Fine-Tuning with 10x Fewer Parameters

Wenxuan Tan, Nicholas Roberts, Tzu-Heng Huang et al.

Parameter-efficient fine-tuning (PEFT) techniques have unlocked the potential to cheaply and easily specialize large pretrained models. However, the most prominent approaches, like low-rank adapters (LoRA), depend on heuristics or rules-of-thumb for their architectural choices -- potentially limiting their performance for new models and architectures. This limitation suggests that techniques from neural architecture search could be used to obtain optimal adapter architectures, but these are often expensive and difficult to implement. We address this challenge with Monarch Rectangular Fine-tuning (MoRe), a simple framework to search over adapter architectures that relies on the Monarch matrix class. Theoretically, we show that MoRe is more expressive than LoRA. Empirically, our approach is more parameter-efficient and performant than state-of-the-art PEFTs on a range of tasks and models, with as few as 5\% of LoRA's parameters.

LGJun 2, 2024

Pretrained Hybrids with MAD Skills

Nicholas Roberts, Samuel Guo, Zhiqi Gao et al.

While Transformers underpin modern large language models (LMs), there is a growing list of alternative architectures with new capabilities, promises, and tradeoffs. This makes choosing the right LM architecture challenging. Recently proposed hybrid architectures seek a best-of-all-worlds approach that reaps the benefits of all architectures. Hybrid design is difficult for two reasons: it requires manual expert-driven search, and new hybrids must be trained from scratch. We propose Manticore, a framework that addresses these challenges by automating the design of hybrid architectures while reusing pretrained models to create pretrained hybrids. Our approach augments ideas from differentiable Neural Architecture Search (NAS) by incorporating simple projectors that translate features between pretrained blocks from different architectures. We then fine-tune hybrids that combine pretrained models from different architecture families -- such as the GPT series and Mamba -- end-to-end. With Manticore, we enable LM selection without training multiple models, the construction of pretrained hybrids from existing pretrained models, and the ability to program pretrained hybrids to have certain capabilities. Manticore hybrids match existing manually designed hybrids, achieve strong performance on Long Range Arena, and improve on pretrained transformers and state space models on various natural language tasks.

Chengyu Duan

2 Papers