Tim Bary

LG
h-index: 36
4 papers
2 citations
Novelty: 53%
AI Score: 39

4 Papers

LG · Oct 4, 2025 · Code
Optimizing Resources for On-the-Fly Label Estimation with Multiple Unknown Medical Experts

Tim Bary, Tiffanie Godelaine, Axel Abels et al.

Accurate ground truth estimation in medical screening programs often relies on coalitions of experts and peer second opinions. Algorithms that efficiently aggregate noisy annotations can enhance screening workflows, particularly when data arrive continuously and expert proficiency is initially unknown. However, existing algorithms do not meet the requirements for seamless integration into screening pipelines. We therefore propose an adaptive approach for real-time annotation that (I) supports on-the-fly labeling of incoming data, (II) operates without prior knowledge of medical experts or pre-labeled data, and (III) dynamically queries additional experts based on the latent difficulty of each instance. The method incrementally gathers expert opinions until a confidence threshold is met, providing accurate labels with reduced annotation overhead. We evaluate our approach on three multi-annotator classification datasets across different modalities. Results show that our adaptive querying strategy reduces the number of expert queries by up to 50% while achieving accuracy comparable to a non-adaptive baseline. Our code is available at https://github.com/tbary/MEDICS
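The stopping rule described in this abstract (gather opinions until a confidence threshold is met) can be illustrated with a minimal sketch. This is not the MEDICS algorithm: the abstract says expert proficiency is initially unknown, whereas the sketch assumes every expert shares one fixed accuracy. The names `adaptive_label`, `assumed_accuracy`, and `threshold` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple


def adaptive_label(
    experts: Sequence[Callable[[], int]],
    n_classes: int,
    assumed_accuracy: float = 0.8,  # assumption: one shared, known accuracy
    threshold: float = 0.95,
) -> Tuple[int, float, int]:
    """Query experts one by one until the top label's posterior reaches the threshold.

    Returns (label, posterior of that label, number of experts queried).
    """
    posterior = [1.0 / n_classes] * n_classes  # uniform prior over labels
    # Probability mass an expert puts on each wrong label under the assumed accuracy.
    wrong = (1.0 - assumed_accuracy) / (n_classes - 1)
    n_queries = 0
    for ask in experts:
        vote = ask()
        n_queries += 1
        # Bayesian update: multiply by the likelihood of the vote, then renormalize.
        posterior = [
            p * (assumed_accuracy if k == vote else wrong)
            for k, p in enumerate(posterior)
        ]
        total = sum(posterior)
        posterior = [p / total for p in posterior]
        if max(posterior) >= threshold:
            break  # confident enough: stop querying further experts
    best = max(range(n_classes), key=posterior.__getitem__)
    return best, posterior[best], n_queries
```

Under these assumptions, an instance on which experts agree stops early, while an instance with conflicting votes exhausts the expert pool, which is the intuition behind querying more experts on harder instances.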

SP · Sep 23, 2024
Designing Pre-training Datasets from Unlabeled Data for EEG Classification with Transformers

Tim Bary, Benoit Macq

Transformer neural networks require a large amount of labeled data to train effectively. Such data is often scarce in electroencephalography, as annotations made by medical experts are costly. Self-supervised pre-training on unlabeled data is therefore needed beforehand. In this paper, we present a way to design several labeled datasets from unlabeled electroencephalogram (EEG) data. These can then be used to pre-train transformers to learn representations of EEG signals. We tested this method on an epileptic seizure forecasting task on the Temple University Seizure Detection Corpus using a Multi-channel Vision Transformer. Our results suggest that 1) models pre-trained using our approach train significantly faster, reducing fine-tuning duration by more than 50% for the specific task, and 2) pre-trained models exhibit improved accuracy, increasing from 90.93% to 92.16%, as well as a higher AUC, rising from 0.9648 to 0.9702, compared to non-pre-trained models.
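The abstract does not say which pretext tasks the authors design, so the following is only a generic sketch of how a labeled dataset can be derived from unlabeled signals. It uses a temporal-order task chosen purely as an assumption for illustration: each window either keeps its order (label 0) or has its two halves swapped (label 1). The function name `make_pretext_dataset` and its parameters are hypothetical.

```python
import random
from typing import List, Tuple

Window = List[List[float]]  # channels x samples


def make_pretext_dataset(
    recording: Window, window_len: int, seed: int = 0
) -> List[Tuple[Window, int]]:
    """Cut an unlabeled multi-channel recording into windows with pretext labels.

    Label 1 means the two halves of the window were swapped in time,
    label 0 means the window was left untouched.
    """
    rng = random.Random(seed)
    n_samples = len(recording[0])
    dataset = []
    # Non-overlapping windows; a trailing partial window is dropped.
    for start in range(0, n_samples - window_len + 1, window_len):
        window = [channel[start:start + window_len] for channel in recording]
        label = rng.randint(0, 1)
        if label == 1:
            half = window_len // 2
            # Apply the same swap to every channel so they stay aligned.
            window = [channel[half:] + channel[:half] for channel in window]
        dataset.append((window, label))
    return dataset
```

A model pre-trained to predict such labels needs no expert annotation, which is the general appeal of building pretext datasets before fine-tuning on a scarce labeled task.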

LG · Sep 16, 2025
No Need for Learning to Defer? A Training Free Deferral Framework to Multiple Experts through Conformal Prediction

Tim Bary, Benoît Macq, Louis Petit

AI systems often fail to deliver reliable predictions across all inputs, prompting the need for hybrid human-AI decision-making. Existing Learning to Defer (L2D) approaches address this by training deferral models, but these are sensitive to changes in expert composition and require significant retraining if experts change. We propose a training-free, model- and expert-agnostic framework for expert deferral based on conformal prediction. Our method uses the prediction set generated by a conformal predictor to identify label-specific uncertainty and selects the most discriminative expert using a segregativity criterion, measuring how well an expert distinguishes between the remaining plausible labels. Experiments on CIFAR10-H and ImageNet16-H show that our method consistently outperforms both the standalone model and the strongest expert, with accuracies attaining $99.57\pm0.10\%$ and $99.40\pm0.52\%$, while reducing expert workload by up to a factor of $11$. The method remains robust under degraded expert performance and shows a gradual performance drop in low-information settings. These results suggest a scalable, retraining-free alternative to L2D for real-world human-AI collaboration.
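The deferral flow in this abstract (build a conformal prediction set, then pick an expert who separates the remaining labels) can be sketched as follows. The set construction is standard split conformal prediction with the score 1 minus the probability of the true label. The abstract does not define the segregativity criterion, so `discrimination_score` below is a hypothetical stand-in (mean per-label accuracy over the plausible labels), and the per-expert confusion matrices are assumed to be available.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

Matrix = List[List[float]]  # confusion[true][answered], each row sums to 1


def conformal_threshold(
    cal_probs: Sequence[Sequence[float]], cal_labels: Sequence[int], alpha: float
) -> float:
    """Split-conformal quantile of the scores 1 - p(true label) on calibration data."""
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = max(math.ceil((n + 1) * (1.0 - alpha)), 1)
    # Too few calibration points for this alpha: keep every label (scores are <= 1).
    return 1.0 if rank > n else scores[rank - 1]


def prediction_set(probs: Sequence[float], qhat: float) -> List[int]:
    """Labels whose nonconformity score does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= qhat]


def discrimination_score(confusion: Matrix, labels: Sequence[int]) -> float:
    """Hypothetical stand-in for segregativity: mean accuracy on the plausible labels."""
    return sum(confusion[y][y] for y in labels) / len(labels)


def decide(
    probs: Sequence[float], qhat: float, experts: Dict[str, Matrix]
) -> Tuple[str, Union[int, str]]:
    """Return ("model", label) when the set is unambiguous, else ("defer", expert name)."""
    labels = prediction_set(probs, qhat)
    if len(labels) <= 1:
        return "model", max(range(len(probs)), key=lambda k: probs[k])
    best = max(experts, key=lambda name: discrimination_score(experts[name], labels))
    return "defer", best
```

Nothing here is trained on expert behavior, so swapping an expert only means swapping a confusion matrix, which mirrors the retraining-free property the abstract emphasizes.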

CV · Sep 24, 2025
Data-Efficient Stream-Based Active Distillation for Scalable Edge Model Deployment

Dani Manjah, Tim Bary, Benoît Gérin et al.

Edge camera-based systems are continuously expanding, facing ever-evolving environments that require regular model updates. In practice, complex teacher models are run on a central server to annotate data, which is then used to train smaller models tailored to the edge devices with limited computational power. This work explores how to select the most useful images for training to maximize model quality while keeping transmission costs low. Our work shows that, for a similar training load (i.e., iterations), a high-confidence stream-based strategy coupled with a diversity-based approach produces a high-quality model with minimal dataset queries.
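One way to picture a high-confidence stream-based strategy coupled with a diversity criterion is the sketch below. The abstract does not specify how the two are combined, so the rule used here is an assumption: a frame is kept only if the teacher's confidence is high enough and its embedding is far enough from every frame already kept. The names `select_stream`, `conf_threshold`, `min_distance`, and `budget` are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Frame = Tuple[str, float, Sequence[float]]  # (frame id, teacher confidence, embedding)


def select_stream(
    frames: Iterable[Frame],
    conf_threshold: float = 0.8,
    min_distance: float = 0.5,
    budget: int = 100,
) -> List[str]:
    """Pick frames from a stream in a single pass, without revisiting past frames."""
    selected: List[Tuple[str, Sequence[float]]] = []
    for frame_id, confidence, embedding in frames:
        if len(selected) >= budget:
            break  # transmission budget exhausted
        if confidence < conf_threshold:
            continue  # teacher pseudo-label is not trusted enough
        # Diversity check: skip near-duplicates of frames already selected.
        if all(math.dist(embedding, kept) >= min_distance for _, kept in selected):
            selected.append((frame_id, embedding))
    return [frame_id for frame_id, _ in selected]
```

For example, with frames `a` and `b` almost identical, `c` low-confidence, and `d` confident and distinct, the sketch keeps `a` and `d`, so the training set stays small while covering different scenes.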