Jarod Lévy

LG
h-index13
4papers
29citations
Novelty46%
AI Score44

4 Papers

LGMay 8Code
NeuralBench: A Unifying Framework to Benchmark NeuroAI Models

Hubert Banville, Stéphane d'Ascoli, Simon Dahan et al.

Deep learning and large public datasets have recently catalyzed the proliferation of AI models for processing brain recordings. However, systematically evaluating these models remains a challenge: not only do the preprocessing pipelines, training and finetuning approaches largely vary across studies, but their downstream evaluation is often limited to small sets of tasks and/or datasets. Here, we present NeuralBench: a unified framework for benchmarking AI models of brain activity. We accompany this framework with NeuralBench-EEG v1.0 -- a large EEG benchmark that includes 36 electroencephalography (EEG) tasks and 14 deep learning architectures, and is evaluated on 94 datasets accessed through a standardized interface. This first EEG-focused release already highlights two main findings. First, current foundation models only marginally outperform task-specific models. Second, a large set of tasks (e.g. cognitive decoding, clinical predictions) remain highly challenging, even for the best models. Critically, NeuralBench is designed for the integration of new tasks, datasets, models, and neuroimaging modalities, as illustrated by preliminary extensions to MEG and fMRI datasets and models. Through this white paper, we invite the community to expand this open-source framework and work together toward a unified benchmarking standard for neuroimaging models.

LGMay 11
DANCE: Detect and Classify Events in EEG

Jarod Lévy, Hubert Banville, Jérémy Rapin et al.

Event identification in continuous neural recordings is a critical task in neuroscience. Decoding in EEG is dominated by classifying windows aligned to known event onsets. However, while available in controlled experiments, such onsets are absent in continuous real-world monitoring. Here, we introduce DANCE, a deep learning pipeline that frames neural decoding as a set-prediction problem and jointly detects and classifies events directly from raw, unaligned signals. Evaluated separately on ten datasets curated from the literature with a wide variety of event types (ranging from milliseconds to minutes in duration), our model outperforms existing methods on a broad range of cognitive, clinical and BCI tasks. This single architecture establishes a new state of the art in the competitive task of seizure monitoring and matches the accuracy of onset-informed models for BCI tasks. Overall, our method marks a step towards end-to-end asynchronous neural decoding models

SPFeb 18, 2025
Brain-to-Text Decoding: A Non-invasive Approach via Typing

Jarod Lévy, Mingfang Zhang, Svetlana Pinet et al.

Modern neuroprostheses can now restore communication in patients who have lost the ability to speak or move. However, these invasive devices entail risks inherent to neurosurgery. Here, we introduce a non-invasive method to decode the production of sentences from brain activity and demonstrate its efficacy in a cohort of 35 healthy volunteers. For this, we present Brain2Qwerty, a new deep learning architecture trained to decode sentences from either electro- (EEG) or magneto-encephalography (MEG), while participants typed briefly memorized sentences on a QWERTY keyboard. With MEG, Brain2Qwerty reaches, on average, a character-error-rate (CER) of 32% and substantially outperforms EEG (CER: 67%). For the best participants, the model achieves a CER of 19%, and can perfectly decode a variety of sentences outside of the training set. While error analyses suggest that decoding depends on motor processes, the analysis of typographical errors suggests that it also involves higher-level cognitive factors. Overall, these results narrow the gap between invasive and non-invasive methods and thus open the path for developing safe brain-computer interfaces for non-communicating patients.

AIDec 20, 2024
Benchmarking LLMs and SLMs for patient reported outcomes

Matteo Marengo, Jarod Lévy, Jean-Emmanuel Bibault

LLMs have transformed the execution of numerous tasks, including those in the medical domain. Among these, summarizing patient-reported outcomes (PROs) into concise natural language reports is of particular interest to clinicians, as it enables them to focus on critical patient concerns and spend more time in meaningful discussions. While existing work with LLMs like GPT-4 has shown impressive results, real breakthroughs could arise from leveraging SLMs as they offer the advantage of being deployable locally, ensuring patient data privacy and compliance with healthcare regulations. This study benchmarks several SLMs against LLMs for summarizing patient-reported Q\&A forms in the context of radiotherapy. Using various metrics, we evaluate their precision and reliability. The findings highlight both the promise and limitations of SLMs for high-stakes medical tasks, fostering more efficient and privacy-preserving AI-driven healthcare solutions.