GATS: Gather-Attend-Scatter
This work addresses the need for flexible tools to combine foundation models in AI systems, though the contribution appears incremental because it builds on existing integration methods.
The paper tackles the problem of integrating large-scale pretrained models into multimodal networks by introducing GATS, a module that combines frozen or trainable models so that the resulting system can process and generate information across modalities at different rates, without requiring the component models to be fine-tuned.
As the AI community increasingly adopts large-scale models, it is crucial to develop general and flexible tools to integrate them. We introduce Gather-Attend-Scatter (GATS), a novel module that enables seamless combination of pretrained foundation models, both trainable and frozen, into larger multimodal networks. GATS empowers AI systems to process and generate information across multiple modalities at different rates. In contrast to traditional fine-tuning, GATS allows the original component models to remain frozen, avoiding the risk of losing important knowledge acquired during the pretraining phase. We demonstrate the utility and versatility of GATS with a few experiments across games, robotics, and multimodal input-output systems.
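The abstract names three stages but does not spell out their mechanics, so the following is only a minimal sketch of what a gather-attend-scatter step could look like, not the authors' implementation. It assumes that "gather" collects activations from each component model, "attend" mixes them with self-attention, and "scatter" adds the result back to the activation it came from. The function names (`gather`, `attend`, `scatter`, `gats_step`), the identity attention projections, and the shared embedding width are all hypothetical choices made for illustration. Each model contributes a different number of tokens, standing in for modalities that run at different rates.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gather(activations):
    # Flatten per-model token lists into one sequence and remember
    # which model and position each token came from.
    tokens, owners = [], []
    for name, acts in activations.items():
        for i, vec in enumerate(acts):
            tokens.append(vec)
            owners.append((name, i))
    return tokens, owners

def attend(tokens):
    # Single-head self-attention with identity projections (assumption:
    # a real module would use learned query/key/value weights).
    scale = math.sqrt(len(tokens[0]))
    mixed = []
    for q in tokens:
        weights = softmax([dot(q, k) / scale for k in tokens])
        mixed.append([sum(w * v[d] for w, v in zip(weights, tokens))
                      for d in range(len(q))])
    return mixed

def scatter(activations, updates, owners):
    # Add each attended vector back onto the activation it was gathered
    # from, leaving the input dictionary untouched.
    result = {name: [list(v) for v in acts] for name, acts in activations.items()}
    for upd, (name, i) in zip(updates, owners):
        result[name][i] = [a + u for a, u in zip(result[name][i], upd)]
    return result

def gats_step(activations):
    tokens, owners = gather(activations)
    return scatter(activations, attend(tokens), owners)

# Hypothetical activations: one vision token and three language tokens.
activations = {
    "vision": [[1.0, 0.0]],
    "language": [[0.0, 1.0], [1.0, 1.0], [0.5, 0.5]],
}
fused = gats_step(activations)
```

In this sketch the component models themselves are never modified: only their activations are read and adjusted, which is one way the frozen-model property described in the abstract could be preserved. Models with different hidden sizes would need a projection into a common width before the attend stage, which is omitted here.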