CL, Jun 7, 2024

The Russian Legislative Corpus

arXiv:2406.04855v2
AI Analysis

This paper provides a large-scale dataset for legal and linguistic research on Russian law, but the contribution is incremental: it applies existing corpus-building methods to new data.

The researchers compiled a comprehensive corpus of Russian legislation from 1991 to 2023, containing 281,413 texts and 176,523,268 tokens. It is released in two versions: minimally preprocessed original text and a morphosyntactically annotated version for linguistic analysis.

We present the comprehensive Russian primary and secondary legislation corpus covering 1991 to 2023. The corpus collects all 281,413 texts (176,523,268 tokens) of non-secret federal regulations and acts, along with their metadata. The corpus has two versions: the original text with minimal preprocessing, and a version prepared for linguistic analysis with morphosyntactic markup.

Code Implementations: 1 repo