Contrast Is All You Need
This work addresses data-scarce classification in the legal domain and offers practitioners an incremental improvement in data efficiency and interpretability.
The study tackled legal provision classification with scarce and imbalanced labeled data by comparing contrastive learning (SetFit) to vanilla fine-tuning. It found that SetFit performed better while using fewer training samples and based its decisions more confidently on legally informative features.
In this study, we analyze data-scarce classification scenarios, where the available labeled legal data is small and imbalanced, which can hurt the quality of the results. On a legal provision classification task, we compare two fine-tuning objectives: SetFit (Sentence Transformer Fine-tuning), a contrastive learning setup, and a vanilla fine-tuning setup. Additionally, we use LIME (Local Interpretable Model-agnostic Explanations) to extract and compare the features that contributed to each model's classification decisions. The results show that the contrastive setup with SetFit performed better than vanilla fine-tuning while using a fraction of the training samples. The LIME results show that the contrastive learning approach helps boost both positive and negative features that are legally informative and contribute to the classification results. Thus, a model fine-tuned with a contrastive objective seems to base its decisions more confidently on legally informative features.
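One intuition for why a contrastive objective can cope with few labeled samples is that it trains on pairs of examples rather than on single examples, so a handful of labeled provisions yields many training signals. The sketch below is illustrative only and is not the authors' pipeline or the SetFit library's API: the helper names (`make_contrastive_pairs`, `cosine_similarity_loss`), the toy provisions, and the labels `termination` and `governing_law` are all hypothetical. It enumerates every unordered pair (same label gives target 1.0, different labels give target 0.0) and scores a pair of embeddings with a mean-squared-error loss on their cosine similarity, in the spirit of SetFit's contrastive stage.

```python
from itertools import combinations
import math


def make_contrastive_pairs(texts, labels):
    """Hypothetical helper: enumerate every unordered pair of labeled examples.

    Same-label pairs get target 1.0 (embeddings should be pulled together);
    different-label pairs get target 0.0 (embeddings should be pushed apart).
    """
    pairs = []
    for (t1, y1), (t2, y2) in combinations(zip(texts, labels), 2):
        pairs.append((t1, t2, 1.0 if y1 == y2 else 0.0))
    return pairs


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cosine_similarity_loss(pair_embeddings, targets):
    """Hypothetical helper: mean squared error between cosine similarity and target."""
    errors = [(cosine(u, v) - t) ** 2 for (u, v), t in zip(pair_embeddings, targets)]
    return sum(errors) / len(errors)


# Toy, made-up provisions with hypothetical labels (three per class).
texts = [
    "Either party may terminate this Agreement upon thirty days' notice.",
    "This Agreement terminates automatically upon the Company's insolvency.",
    "The Buyer may terminate this Agreement for material breach.",
    "This Agreement shall be governed by the laws of the State of New York.",
    "The laws of England and Wales govern this Agreement.",
    "This Agreement is governed by Delaware law.",
]
labels = ["termination"] * 3 + ["governing_law"] * 3

pairs = make_contrastive_pairs(texts, labels)
n_pos = sum(1 for _, _, target in pairs if target == 1.0)
print(len(pairs), n_pos, len(pairs) - n_pos)  # 15 6 9
```

Six labeled examples already produce 15 training pairs (6 positive, 9 negative), and the pair count grows quadratically with the number of examples. Exhaustive enumeration is used here only to keep the example deterministic; the actual SetFit implementation controls how pairs are sampled. In SetFit, a classification head is then trained on the embeddings produced by the contrastively fine-tuned Sentence Transformer.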