Laura Stegner

CYJan 15

The Conversational Exam: A Scalable Assessment Design for the AI Era

Lorena A. Barba, Laura Stegner

Traditional assessment methods collapse when students use generative AI to complete work without genuine engagement, creating an illusion of competence where they believe they're learning but aren't. This paper presents the conversational exam -- a scalable oral examination format that restores assessment validity by having students code live while explaining their reasoning. Drawing on human-computer interaction principles, we examined 58 students in small groups across just two days, demonstrating that oral exams can scale to typical class sizes. The format combines authentic practice (students work with documentation and supervised AI access) with inherent validity (real-time performance cannot be faked). We provide detailed implementation guidance to help instructors adapt this approach, offering a practical path forward when many educators feel paralyzed between banning AI entirely or accepting that valid assessment is impossible.

SEFeb 4, 2019

Paracosm: A Language and Tool for Testing Autonomous Driving Systems

Rupak Majumdar, Aman Mathur, Marcus Pirron et al.

Systematic testing of autonomous vehicles operating in complex real-world scenarios is a difficult and expensive problem. We present Paracosm, a reactive language for writing test scenarios for autonomous driving systems. Paracosm allows users to programmatically describe complex driving situations with specific visual features, e.g., road layout in an urban environment, as well as reactive temporal behaviors of cars and pedestrians. Paracosm programs are executed on top of a game engine that provides realistic physics simulation and visual rendering. The infrastructure allows systematic exploration of the state space, both for visual features (lighting, shadows, fog) and for reactive interactions with the environment (pedestrians, other traffic). We define a notion of test coverage for Paracosm configurations based on combinatorial testing and low dispersion sequences. Paracosm comes with an automatic test case generator that uses random sampling for discrete parameters and deterministic quasi-Monte Carlo generation for continuous parameters. Through an empirical evaluation, we demonstrate the modeling and testing capabilities of Paracosm on a suite of autonomous driving systems implemented using deep neural networks developed in research and education. We show how Paracosm can expose incorrect behaviors or degraded performance.

Laura Stegner

2 Papers