CLAIMar 5

AI-Assisted Moot Courts: Simulating Justice-Specific Questioning in Oral Arguments

arXiv:2603.04718v1
Originality Incremental advance
AI Analysis

This research addresses the problem of effectively preparing law students and practicing attorneys for judicial questioning in oral arguments, offering an AI-assisted training tool. It is an incremental step in applying AI to legal education.

This paper explores the use of AI models to simulate justice-specific questioning for moot court training, leveraging a dataset of U.S. Supreme Court oral argument transcripts. The authors found that simulated questions were often perceived as realistic by human annotators and achieved high recall of ground truth substantive legal issues, despite limitations in question diversity and a tendency towards sycophancy.

In oral arguments, judges probe attorneys with questions about the factual record, legal claims, and the strength of their arguments. To prepare for this questioning, both law schools and practicing attorneys rely on moot courts: practice simulations of appellate hearings. Leveraging a dataset of U.S. Supreme Court oral argument transcripts, we examine whether AI models can effectively simulate justice-specific questioning for moot court-style training. Evaluating oral argument simulation is challenging because there is no single correct question for any given turn. Instead, effective questioning should reflect a combination of desirable qualities, such as anticipating substantive legal issues, detecting logical weaknesses, and maintaining an appropriately adversarial tone. We introduce a two-layer evaluation framework that assesses both the realism and pedagogical usefulness of simulated questions using complementary proxy metrics. We construct and evaluate both prompt-based and agentic oral argument simulators. We find that simulated questions are often perceived as realistic by human annotators and achieve high recall of ground truth substantive legal issues. However, models still face substantial shortcomings, including low diversity in question types and sycophancy. Importantly, these shortcomings would remain undetected under naive evaluation approaches.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes