IR · AI · Jan 9, 2025

Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval via Bagging and SVR Ensembles

arXiv:2501.05018v1 · 1 citation · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses legal information retrieval for practitioners, but the advance is incremental: it builds on existing methods without a major breakthrough.

The paper tackles legal document retrieval by applying Support Vector Regression ensembles and bagging to the German Dataset for Legal Information Retrieval. Without training any deep learning models, it reaches a recall of 0.849, compared with baselines of 0.803 and 0.829.

We introduce a retrieval approach leveraging Support Vector Regression (SVR) ensembles, bootstrap aggregation (bagging), and embedding spaces on the German Dataset for Legal Information Retrieval (GerDaLIR). By conceptualizing the retrieval task in terms of multiple binary needle-in-a-haystack subtasks, we show improved recall over the baselines (0.849 > 0.803 | 0.829) using our voting ensemble, suggesting promising initial results, without training or fine-tuning any deep learning models. Our approach holds potential for further enhancement, particularly through refining the encoding models and optimizing hyperparameters.
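To make the voting and recall terminology concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how member votes are combined or at which cutoff recall is measured, so the top-k voting rule, the tie-breaking, and the helper names (`top_k`, `vote`, `recall_at_k`) are all assumptions made for illustration. The per-member scores are toy values standing in for the output of trained SVR models.

```python
from collections import Counter


def top_k(scores, k):
    """Return the k document ids with the highest scores (ties broken by id)."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc for doc, _ in ordered[:k]]


def vote(member_scores, k):
    """Each ensemble member votes for its own top-k documents.

    The final list holds the k documents with the most votes
    (an assumed aggregation rule, not taken from the paper).
    """
    votes = Counter()
    for scores in member_scores:
        votes.update(top_k(scores, k))
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc for doc, _ in ranked[:k]]


def recall_at_k(retrieved, relevant):
    """Fraction of the relevant documents that appear in the retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Hypothetical scores from three ensemble members for a single query.
member_scores = [
    {"d1": 0.9, "d2": 0.4, "d3": 0.1, "d4": 0.3},
    {"d1": 0.2, "d2": 0.8, "d3": 0.7, "d4": 0.1},
    {"d1": 0.6, "d2": 0.5, "d3": 0.2, "d4": 0.9},
]

retrieved = vote(member_scores, k=2)
recall = recall_at_k(retrieved, relevant={"d1"})
```

In this toy case the second member ranks the relevant document `d1` poorly, yet the other two members still vote it into the final list. That is the intuition behind aggregating several weak scorers rather than relying on one.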

Code Implementations: 1 repo