Sebastian Otalora

CV
h-index: 4
3 papers
5 citations
Novelty: 42%
AI Score: 38

3 Papers

CV | Oct 22, 2025 | Code
XBench: A Comprehensive Benchmark for Visual-Language Explanations in Chest Radiography

Haozhe Luo, Shelley Zixin Shu, Ziyu Zhou et al.

Vision-language models (VLMs) have recently shown remarkable zero-shot performance in medical image understanding, yet their grounding ability, the extent to which textual concepts align with visual evidence, remains underexplored. In the medical domain, reliable grounding is essential for interpretability and clinical adoption. In this work, we present the first systematic benchmark for evaluating cross-modal interpretability in chest X-rays across seven CLIP-style VLM variants. We generate visual explanations using cross-attention and similarity-based localization maps, and quantitatively assess their alignment with radiologist-annotated regions across multiple pathologies. Our analysis reveals that: (1) while all VLM variants demonstrate reasonable localization for large and well-defined pathologies, their performance substantially degrades for small or diffuse lesions; (2) models pretrained on chest X-ray-specific datasets exhibit improved alignment compared to those trained on general-domain data; and (3) the overall recognition ability and grounding ability of a model are strongly correlated. These findings underscore that current VLMs, despite their strong recognition ability, still fall short in clinically reliable grounding, highlighting the need for targeted interpretability benchmarks before deployment in medical practice. XBench code is available at https://github.com/Roypic/Benchmarkingattention
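
As an illustration of what "alignment with radiologist-annotated regions" can mean in practice, the sketch below binarizes a localization map and computes its intersection-over-union (IoU) with a binary annotation mask. The abstract does not name the metric XBench uses, so IoU, the threshold value, and the function names `threshold_map` and `iou` are assumptions chosen for illustration only.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[int]]


def threshold_map(heatmap: Grid, threshold: float) -> Mask:
    """Binarize a localization map: 1 where the value is >= threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in heatmap]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-union of two binary masks of identical shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += p & t
            union += p | t
    # Two empty masks have no overlap to measure; return 0.0 by convention here.
    return intersection / union if union else 0.0


# Hypothetical 2x2 localization map and annotation mask.
heatmap = [[0.9, 0.2],
           [0.8, 0.1]]
annotation = [[1, 0],
              [0, 0]]

score = iou(threshold_map(heatmap, 0.5), annotation)
print(score)
```

A higher score means the highlighted region overlaps the annotated region more closely; small or diffuse lesions tend to be harder to match under any overlap-based metric of this kind.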

CV | Aug 4, 2020 | Code
Learning Interpretable Microscopic Features of Tumor by Multi-task Adversarial CNNs To Improve Generalization

Mara Graziani, Sebastian Otalora, Stephane Marchand-Maillet et al.

Adopting Convolutional Neural Networks (CNNs) in the daily routine of primary diagnosis requires not only near-perfect precision, but also transparency and a sufficient degree of generalization to data acquisition shifts. Existing CNN models act as black boxes and give physicians no assurance that important diagnostic features are used by the model. Building on successful existing techniques such as multi-task learning, domain adversarial training and concept-based interpretability, this paper addresses the challenge of introducing diagnostic factors in the training objectives. Here we show that our architecture, by learning end-to-end an uncertainty-based weighted combination of multi-task and adversarial losses, is encouraged to focus on pathology features such as density and pleomorphism of nuclei (i.e. variations in size and appearance), while discarding misleading features such as staining differences. Our results on breast lymph node tissue show significantly improved generalization in the detection of tumorous tissue, with best average AUC 0.89 (0.01) against the baseline AUC 0.86 (0.005). By applying the interpretability technique of linearly probing intermediate representations, we also demonstrate that interpretable pathology features such as nuclei density are learned by the proposed CNN architecture, confirming the increased transparency of this model. This result is a starting point towards building interpretable multi-task architectures that are robust to data heterogeneity. Our code is available at https://github.com/maragraziani/multitask_adversarial
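
To make the idea of an uncertainty-based weighting of several losses concrete, here is a minimal sketch of one widely used formulation, in which each loss L_i is scaled by exp(-s_i) and a regularizing term s_i is added, with s_i a learnable log-variance per task. The abstract does not give the exact form used in the paper, so this formulation, the function name `uncertainty_weighted_loss`, and the example values are assumptions, and the adversarial (domain) loss is treated here simply as one more term.

```python
import math
from typing import Sequence


def uncertainty_weighted_loss(losses: Sequence[float],
                              log_vars: Sequence[float]) -> float:
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i.

    `log_vars` holds one log-variance s_i per task. In a real training loop
    these would be trainable parameters; here they are plain floats.
    """
    if len(losses) != len(log_vars):
        raise ValueError("losses and log_vars must have the same length")
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


# Hypothetical loss values for a main task, an auxiliary task, and a domain term.
task_losses = [0.7, 1.2, 0.4]

# With all log-variances at zero, every task has weight 1.
print(uncertainty_weighted_loss(task_losses, [0.0, 0.0, 0.0]))

# Raising a task's log-variance lowers that task's weight in the total.
print(uncertainty_weighted_loss(task_losses, [0.0, 1.0, 0.0]))
```

The additive s_i term keeps the optimizer from driving every weight to zero: a task can be down-weighted only at the cost of a larger penalty.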

NI | Jun 11, 2019
DeepFloat: Resource-Efficient Dynamic Management of Vehicular Floating Content

Gaetano Manzo, Sebastian Otalora, Marco Ajmone Marsan et al.

Opportunistic communications are expected to play a crucial role in enabling context-aware vehicular services. A widely investigated opportunistic communication paradigm for storing a piece of content probabilistically in a geographical area is Floating Content (FC). A key issue in the practical deployment of FC is how to tune content replication and caching so as to achieve a target performance (in terms of the mean fraction of users possessing the content in a given region of space) while minimizing the use of bandwidth and host memory. Fully distributed, distance-based approaches prove highly inefficient and may not meet the performance target, while centralized, model-based approaches do not perform well in realistic, inhomogeneous settings. In this work, we present a data-driven centralized approach to resource-efficient, QoS-aware dynamic management of FC. We propose a Deep Learning strategy, which employs a Convolutional Neural Network (CNN) to capture the relationships between patterns of user mobility, of content diffusion and replication, and FC performance in terms of resource utilization and of content availability within a given area. Numerical evaluations show the effectiveness of our approach in deriving strategies which efficiently modulate the FC operation in space and effectively adapt to mobility pattern changes over time.