The State of Commercial Automatic French Legal Speech Recognition Systems and their Impact on Court Reporters et al
This paper addresses the high cost and limited availability of court reporters for transcribing legal proceedings in Quebec and Canada. Its contribution is incremental, since it evaluates existing ASR systems rather than proposing new methods.
The paper benchmarks three Automatic Speech Recognition (ASR) models on French legal speech, evaluating them with Word Error Rate (WER) and a new Sonnex Distance metric. It finds that the models show promise but require further refinement to meet the needs of the legal domain.
In Quebec and Canadian courts, the transcription of court proceedings is a critical task for appeal purposes and must be certified by an official court reporter. The limited availability of qualified reporters and the high costs associated with manual transcription underscore the need for more efficient solutions. This paper examines the potential of Automatic Speech Recognition (ASR) systems to assist court reporters in transcribing legal proceedings. We benchmark three ASR models, including commercial and open-source options, on their ability to recognize French legal speech using a curated dataset. Our study evaluates the performance of these systems using the Word Error Rate (WER) metric and introduces the Sonnex Distance to account for phonetic accuracy. We also explore the broader implications of ASR adoption on court reporters, copyists, the legal system, and litigants, identifying both positive and negative impacts. The findings suggest that while current ASR systems show promise, they require further refinement to meet the specific needs of the legal domain.
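For readers unfamiliar with the WER metric named above, the following is a minimal sketch of its standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate`, the lowercase-and-split-on-whitespace normalization, and the example sentences are assumptions made for illustration; the abstract does not describe the paper's own text normalization, and the Sonnex Distance is not reproduced here because the abstract does not define it.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words.

    Normalization here (lowercasing, whitespace tokenization) is an
    illustrative assumption, not the paper's documented procedure.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of five reference words.
reference = "le tribunal rejette la demande"
hypothesis = "le tribunal rejet la demande"
print(word_error_rate(reference, hypothesis))
```

In this hypothetical pair, "rejette" and "rejet" sound similar, yet WER counts the substitution as one full word error. This is the kind of case in which a measure that accounts for phonetic accuracy, as the abstract says the Sonnex Distance is meant to do, could give a different picture than WER alone.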