CL · AI · Apr 8

Luwen Technical Report

arXiv:2604.06737 · 82.71 citations · h-index: 5 · Has Code
AI Analysis

This work adapts general-purpose language models to the specialized legal domain. It is an incremental improvement whose applications are domain-specific.

The paper tackles the challenge of applying large language models to the legal domain by developing Luwen, an open-source Chinese legal language model, which outperforms strong baselines on five legal tasks, including legal judgment prediction and judicial examination.

Large language models have demonstrated remarkable capabilities across a wide range of natural language processing tasks, yet their application in the legal domain remains challenging due to the specialized terminology, complex reasoning requirements, and rapidly evolving legal knowledge involved. In this paper, we present Luwen, an open-source Chinese legal language model built upon the Baichuan foundation model through three key techniques: continual pre-training on a large-scale legal corpus, supervised fine-tuning with carefully curated legal instruction data, and retrieval-augmented generation integrated with a comprehensive legal knowledge base. We evaluate Luwen on five representative legal tasks spanning both prediction and generation settings, including legal judgment prediction, judicial examination, legal text summarization, law article question answering, and judicial decision reasoning. Experimental results show that Luwen outperforms several strong baselines, demonstrating the effectiveness of our approach in adapting general-purpose language models to the legal domain.
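Of the three techniques named in the abstract, retrieval-augmented generation is the easiest to illustrate in a few lines. The following is a minimal sketch under stated assumptions, not Luwen's actual pipeline: the knowledge base entries are made-up placeholders rather than real law articles, the bag-of-words cosine retriever stands in for whatever retriever the authors use, and the names `KNOWLEDGE_BASE`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base; these are invented placeholder
# texts, not quotations from any real statute.
KNOWLEDGE_BASE = [
    "Article A: A contract is formed when the acceptance becomes effective.",
    "Article B: A person who steals property shall be sentenced according to the amount involved.",
    "Article C: An employer shall pay wages to the worker in full and on time.",
]


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip(".,:;?!").lower() for w in text.split())
    return [w for w in words if w]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        docs,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, docs, k=1):
    """Prepend the retrieved passages to the question.

    The resulting string would be passed to a language model; the
    model call itself is outside the scope of this sketch.
    """
    context = "\n".join(retrieve(query, docs, k))
    return f"Reference articles:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("When must an employer pay wages?", KNOWLEDGE_BASE))
```

In this sketch the retrieved passage is simply concatenated ahead of the question, so the model can ground its answer in the cited text instead of relying only on what it memorized during training.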

