Efficient Multitask Learning in Small Language Models Through Upside-Down Reinforcement Learning
The approach offers an efficient alternative to large language models for resource-constrained and real-time applications, though the contribution is incremental because it builds on existing methods such as reinforcement learning and data distillation.
The paper tackles the problem of enabling small language models (SLMs) to perform competitively on multitask prompt generation with reduced computational resources, achieving relevance scores within 5% of state-of-the-art models while being up to 80 times smaller.
In this work, we demonstrate that small language models (SLMs), specifically a 100M-parameter GPT-2 model, can achieve competitive performance in multitask prompt generation while requiring only a fraction of the computational resources needed by large language models (LLMs). We train the SLM through a novel combination of upside-down reinforcement learning and synthetic data distillation from a powerful LLM, Llama-3. The resulting model achieves relevance scores within 5% of state-of-the-art models, including Llama-3, Qwen2, and Mistral, despite being up to 80 times smaller, which makes it highly suitable for resource-constrained and real-time applications. This study highlights the potential of SLMs as efficient multitask learners in multimodal settings, providing a promising alternative to LLMs for scalable, low-latency deployments.
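The abstract does not spell out how upside-down reinforcement learning is applied, so the following is only a minimal sketch of the general idea under stated assumptions: rather than optimizing a reward directly, each teacher-generated sample is tagged with the score it received, and the small model is trained with ordinary supervised learning to produce the output given that score as a command. At inference time the highest score is requested. The `ScoredSample` type, the `<task=...>` and `<relevance=...>` control-token format, and the five-bucket discretization are all hypothetical choices for illustration, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ScoredSample:
    """One synthetic example from a teacher LLM (hypothetical schema)."""
    task: str         # task name, e.g. "caption"
    context: str      # input the prompt is generated for
    output: str       # teacher-generated prompt
    relevance: float  # assumed score in [0, 1] assigned to the output


def bucket(relevance: float, n_buckets: int = 5) -> int:
    """Discretize a relevance score into an integer command level."""
    clipped = min(max(relevance, 0.0), 1.0)
    # A score of exactly 1.0 falls into the top bucket, not one past it.
    return min(int(clipped * n_buckets), n_buckets - 1)


def to_training_text(sample: ScoredSample, n_buckets: int = 5) -> str:
    """Prefix the sample with the reward it achieved (the 'command')."""
    level = bucket(sample.relevance, n_buckets)
    command = f"<task={sample.task}> <relevance={level}>"
    return f"{command} {sample.context} => {sample.output}"


def inference_prefix(task: str, context: str, n_buckets: int = 5) -> str:
    """At inference time, ask for the highest relevance level."""
    return f"<task={task}> <relevance={n_buckets - 1}> {context} =>"


if __name__ == "__main__":
    sample = ScoredSample(
        task="caption",
        context="a red bike",
        output="A red bicycle leaning on a wall",
        relevance=0.95,
    )
    print(to_training_text(sample))
    print(inference_prefix("caption", "a red bike"))
```

Strings produced by `to_training_text` would serve as the fine-tuning corpus for a causal language model, and `inference_prefix` builds the text that the trained model is asked to continue. Keeping low-scoring samples in the corpus, tagged with low levels, is what distinguishes this setup from simply filtering for the best teacher outputs.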