Functionality learning through specification instructions
This work addresses the challenge of fine-tuning language models for targeted capabilities without harming other functionalities. The contribution is incremental: it applies instruction-based methods to model optimization.
The paper tackles the problem of improving language models' performance on specific functionalities, such as robustness and fairness, by introducing specification instructions. It finds that larger models (roughly 3B parameters and above) can benefit from these instructions and even generalize certain desirable behaviors across functionalities, while smaller models struggle to follow them.
Test suites assess the performance of natural language processing models on specific functionalities: cases of interest involving model robustness, fairness, or particular linguistic capabilities. This paper introduces specification instructions: text descriptions specifying fine-grained, task-specific behaviors. For each functionality in a suite, we generate an instruction that describes it. We combine the specification instructions into specification-augmented prompts, which we feed to language models pre-trained on natural instruction data. We conduct experiments to measure how optimizing for some functionalities may negatively impact functionalities that are not covered by the specification set. Our analyses across four tasks and models of diverse sizes and families show that smaller models struggle to follow specification instructions. However, larger models (roughly 3B parameters or more) can benefit from specifications and, surprisingly, even generalize certain desirable behaviors across functionalities.
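The abstract describes prompt construction and the held-out evaluation only at a high level. The following is a minimal Python sketch under stated assumptions: a specification-augmented prompt is taken to be the task instruction, followed by a bulleted list of specification instructions, followed by the test input. The class `Functionality`, the helpers `build_prompt`, `split_functionalities`, and `pass_rate`, the prompt template, and the sentiment-style example instructions are all hypothetical illustrations, not the paper's actual format or data. Leaving some functionalities out of the prompt mimics the setting where effects on functionalities not covered by the specification set are measured.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Functionality:
    name: str         # hypothetical identifier for a test-suite functionality
    instruction: str  # specification instruction describing the desired behavior


def build_prompt(task_instruction: str, specs: Sequence[Functionality], test_input: str) -> str:
    """Combine a task instruction and specification instructions into one prompt.

    The layout (task, bulleted specifications, input, output slot) is an
    assumed template chosen for illustration only.
    """
    spec_lines = "\n".join(f"- {s.instruction}" for s in specs)
    return (
        f"{task_instruction}\n"
        f"Specifications:\n{spec_lines}\n"
        f"Input: {test_input}\n"
        f"Output:"
    )


def split_functionalities(
    functionalities: Sequence[Functionality], held_out: Set[str]
) -> Tuple[List[Functionality], List[Functionality]]:
    """Split into functionalities described in the prompt and held-out ones."""
    seen = [f for f in functionalities if f.name not in held_out]
    unseen = [f for f in functionalities if f.name in held_out]
    return seen, unseen


def pass_rate(predictions: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of test cases whose prediction matches the expected label."""
    if len(predictions) != len(expected) or not expected:
        raise ValueError("need equally sized, non-empty sequences")
    return sum(p == e for p, e in zip(predictions, expected)) / len(expected)


# Hypothetical sentiment-style functionalities, for illustration only.
suite = [
    Functionality("negation", "If a positive statement is negated, the sentiment is negative."),
    Functionality("typos", "Small spelling errors should not change the predicted sentiment."),
    Functionality("names", "Changing a person's name should not change the predicted sentiment."),
]

# Hold out "names": its instruction never appears in the prompt, so model
# behavior on its test cases reflects generalization, not instruction following.
seen, unseen = split_functionalities(suite, held_out={"names"})
prompt = build_prompt(
    "Classify the sentiment of the input as positive or negative.",
    seen,
    "I don't love this movie.",
)
print(prompt)
```

In this sketch, the prompt is sent to an instruction-tuned model, and the model's outputs on the test cases of each functionality are scored separately with `pass_rate`. Comparing scores on `seen` and `unseen` functionalities gives a rough picture of whether optimizing for the specified behaviors helps or hurts the ones left out.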