DoubleCCA: Improving Foundation Model Group Robustness with Random Sentence Embeddings
The work addresses group-robustness issues in foundation models, though it appears incremental because it builds on existing sentence-embedding and CCA techniques.
The paper tackles group-based biases in foundation models by proposing DoubleCCA, a method that uses random sentence embeddings and Canonical Correlation Analysis to enrich text embeddings, resulting in improved performance and robustness across various tasks and datasets.
This paper presents a novel method to improve the robustness of foundation models to group-based biases. We propose a simple yet effective method, called DoubleCCA, that leverages random sentences and Canonical Correlation Analysis (CCA) to enrich the text embeddings of the foundation model. First, we generate random sentences that augment the original prompts by extending them with random words or character sequences. Second, we use an additional sentence embedding model to generate a different set of text embeddings for these random sentences. We then apply CCA twice to align the representations and reconstruct them in the original representation space. We demonstrate the effectiveness of our method on a variety of tasks and datasets, showing that it outperforms existing methods in terms of both performance and robustness. Our method is simple to implement and can be easily integrated into existing models, making it a practical solution for improving the robustness of foundation models to group-based biases.
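To make the pipeline described above concrete, the following is a minimal sketch of a single align-and-reconstruct pass, not the paper's implementation. The abstract does not specify how the two CCA applications are composed, so only one pass is shown. Everything here is an assumption made for illustration: the example prompts, the helper names (`augment_prompt`, `toy_embed`, `fit_cca`), the character-count "embedders" standing in for the foundation model's text encoder and the additional sentence embedding model, the regularization value, and the choice to average the two canonical projections before mapping back with a pseudo-inverse.

```python
import random
import string

import numpy as np


def augment_prompt(prompt, n_variants, rng, n_tokens=3):
    """Extend a prompt with random character sequences (hypothetical scheme)."""
    variants = []
    for _ in range(n_variants):
        tokens = [
            "".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 8)))
            for _ in range(n_tokens)
        ]
        variants.append(prompt + " " + " ".join(tokens))
    return variants


def toy_embed(sentences, proj):
    """Stand-in for a real text encoder: character counts, then a fixed projection."""
    alphabet = string.ascii_lowercase + " "
    counts = np.array(
        [[s.lower().count(ch) for ch in alphabet] for s in sentences], dtype=float
    )
    return np.tanh(counts @ proj)


def _inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def fit_cca(X, Y, k, reg=1e-3):
    """Regularized linear CCA via SVD of the whitened cross-covariance."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    Wx = Kx @ U[:, :k]      # projection for the first view
    Wy = Ky @ Vt.T[:, :k]   # projection for the second view
    return mx, my, Wx, Wy, s[:k]


rng = random.Random(0)
np_rng = np.random.default_rng(0)

# Step 1: random sentences built from the original prompts (example prompts are made up).
prompts = ["a photo of a dog", "a photo of a cat"]
sentences = [v for p in prompts for v in augment_prompt(p, 100, rng)]

# Step 2: two embedding views of the same sentences (toy stand-ins for the two models).
X = toy_embed(sentences, 0.1 * np_rng.standard_normal((27, 16)))
Y = toy_embed(sentences, 0.1 * np_rng.standard_normal((27, 12)))

# Step 3: align the views with CCA, then map back to the first view's space.
mx, my, Wx, Wy, corr = fit_cca(X, Y, k=4)
Z = 0.5 * ((X - mx) @ Wx + (Y - my) @ Wy)   # assumed fusion: average the projections
X_enriched = Z @ np.linalg.pinv(Wx) + mx    # reconstruction in the original space

print("canonical correlations:", np.round(corr, 3))
print("enriched embedding shape:", X_enriched.shape)
```

In practice `toy_embed` would be replaced by the actual encoders, and the fusion and reconstruction steps should follow the paper's definition rather than the averaging assumed here.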