Bita Behrouzi

h-index59
2papers

2 Papers

AISep 15, 2025
Advancing Medical Artificial Intelligence Using a Century of Cases

Thomas A. Buckley, Riccardo Conci, Peter G. Brodeur et al.

BACKGROUND: For over a century, the New England Journal of Medicine Clinicopathological Conferences (CPCs) have tested the reasoning of expert physicians and, recently, artificial intelligence (AI). However, prior AI evaluations have focused on final diagnoses without addressing the multifaceted reasoning and presentation skills required of expert discussants. METHODS: Using 7102 CPCs (1923-2025) and 1021 Image Challenges (2006-2025), we conducted extensive physician annotation and automated processing to create CPC-Bench, a physician-validated benchmark spanning 10 text-based and multimodal tasks, against which we evaluated leading large language models (LLMs). Then, we developed "Dr. CaBot," an AI discussant designed to produce written and slide-based video presentations using only the case presentation, modeling the role of the human expert in these cases. RESULTS: When challenged with 377 contemporary CPCs, o3 (OpenAI) ranked the final diagnosis first in 60% of cases and within the top ten in 84% of cases, outperforming a 20-physician baseline; next-test selection accuracy reached 98%. Event-level physician annotations quantified AI diagnostic accuracy per unit of information. Performance was lower on literature search and image tasks; o3 and Gemini 2.5 Pro (Google) achieved 67% accuracy on image challenges. In blinded comparisons of CaBot vs. human expert-generated text, physicians misclassified the source of the differential in 46 of 62 (74%) of trials, and scored CaBot more favorably across quality dimensions. To promote research, we are releasing CaBot and CPC-Bench. CONCLUSIONS: LLMs exceed physician performance on complex text-based differential diagnosis and convincingly emulate expert medical presentations, but image interpretation and literature retrieval remain weaker. CPC-Bench and CaBot may enable transparent and continued tracking of progress in medical AI.

LGNov 15, 2017
Costate-focused models for reinforcement learning

Bita Behrouzi, Xuefei Liu, Douglas Tweed

Many recent algorithms for reinforcement learning are model-free and founded on the Bellman equation. Here we present a method founded on the costate equation and models of the state dynamics. We use the costate -- the gradient of cost with respect to state -- to improve the policy and also to "focus" the model, training it to detect and mimic those features of the environment that are most relevant to its task. We show that this method can handle difficult time-optimal control problems, driving deterministic or stochastic mechanical systems quickly to a target. On these tasks it works well compared to deep deterministic policy gradient, a recent Bellman method. And because it creates a model, the costate method can also learn from mental practice.