CL
Oct 15, 2024

A Cross-Lingual Statutory Article Retrieval Dataset for Taiwan Legal Studies

arXiv:2410.11450v1
h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work aims to improve access to legal information for non-native speakers in Taiwan, but its contribution is incremental: it consists of dataset creation and baseline methods.

The paper tackles the problem of legal information retrieval in multilingual settings by introducing a cross-lingual statutory article retrieval dataset for Taiwanese laws, with spoken-language-style inquiries in English and Chinese, and provides LLM-based baselines for evaluation.

This paper introduces a cross-lingual statutory article retrieval (SAR) dataset designed to enhance legal information retrieval in multilingual settings. Our dataset features spoken-language-style legal inquiries in English, paired with corresponding Chinese versions and relevant statutes, covering all Taiwanese civil, criminal, and administrative laws. This dataset aims to improve access to legal information for non-native speakers, particularly for foreign nationals in Taiwan. We propose several LLM-based methods as baselines for evaluating retrieval effectiveness, focusing on mitigating translation errors and improving cross-lingual retrieval performance. Our work provides a valuable resource for developing inclusive legal information retrieval systems.

Code Implementations: 1 repo
