LG May 11
Remember to Forget: Gated Adaptive Positional Encoding
Riccardo Ali, Alessio Borgi, Christopher Irwin et al.
Rotary Positional Encoding (RoPE) is widely used in modern large language models. However, when sequences are extended beyond the range seen during training, rotary phases can enter out-of-distribution regimes, leading to spurious long-range alignments, diffuse attention, and degraded retrieval. Existing remedies only partially address these failures, as they often trade local positional resolution for long-context stability. We propose GAPE (Gated Adaptive Positional Encoding), a drop-in augmentation for positional encodings that introduces a content-aware bias directly into the attention logits while preserving the rotary geometry. GAPE decouples distance-based suppression from token importance through a query-dependent gate that contracts irrelevant context and a key-dependent gate that preserves salient distant tokens. We prove that protected tokens remain accessible, while the attention mass assigned to unprotected distant tokens decays as a function of the query gate. We further show that GAPE can be implemented within standard scaled dot-product attention. We validate these properties empirically, finding that GAPE consistently yields sharper attention and improved long-context robustness over rotary baselines across both synthetic retrieval and long-context benchmarks.
LG Feb 23, 2025
Entropy-Lens: The Information Signature of Transformer Computations
Riccardo Ali, Francesco Caso, Christopher Irwin et al.
Transformer models map input token sequences to output token distributions, layer by layer. While most interpretability work focuses on internal latent representations, we study the evolution of these token-level distributions directly in vocabulary space. However, such distributions are high-dimensional and defined on an unordered support, making common descriptors like moments or cumulants ill-suited. We address this by computing the Shannon entropy of each intermediate predicted distribution, yielding one interpretable scalar per layer. The resulting sequence, the entropy profile, serves as a compact, information-theoretic signature of the model's computation. We introduce Entropy-Lens, a model-agnostic framework that extracts entropy profiles from frozen, off-the-shelf transformers. We show that these profiles (i) reveal family-specific computation patterns invariant under depth rescaling, (ii) are predictive of prompt type and task format, and (iii) correlate with output correctness. We further show that Rényi entropies yield similar results within a broad range of $\alpha$ values, justifying the use of Shannon entropy as a stable and principled summary. Our results hold across different transformers, without requiring gradients, fine-tuning, or access to model internals.
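The entropy profile described above can be illustrated with a minimal sketch. It assumes the per-layer vocabulary logits are already available as an array of shape (layers, vocabulary size); how those logits are obtained from a given model is not specified here, and the names `entropy_profile`, `renyi_entropy`, and `layer_logits` are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last (vocabulary) axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def shannon_entropy(p, eps=1e-12):
    # H(p) = -sum_i p_i log p_i, in nats; eps guards against log(0).
    return -(p * np.log(p + eps)).sum(axis=-1)

def renyi_entropy(p, alpha, eps=1e-12):
    # H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha); alpha -> 1 recovers Shannon.
    if np.isclose(alpha, 1.0):
        return shannon_entropy(p, eps)
    return np.log((p ** alpha).sum(axis=-1) + eps) / (1.0 - alpha)

def entropy_profile(layer_logits):
    # One scalar per layer: the entropy of that layer's predicted distribution.
    return shannon_entropy(softmax(layer_logits))

# Synthetic stand-in for real model outputs (assumption: 4 layers, vocabulary of 50).
rng = np.random.default_rng(0)
num_layers, vocab_size = 4, 50
scales = np.arange(1, num_layers + 1)[:, None]
layer_logits = rng.normal(size=(num_layers, vocab_size)) * scales
profile = entropy_profile(layer_logits)
print(profile)
```

Each entry of `profile` lies between 0 and the log of the vocabulary size, which makes profiles from different layers directly comparable.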
CV Oct 15, 2024
Beyond Labels: A Self-Supervised Framework with Masked Autoencoders and Random Cropping for Breast Cancer Subtype Classification
Annalisa Chiocchetti, Marco Dossena, Christopher Irwin et al.
This work contributes to breast cancer subtype classification using histopathological images. We utilize masked autoencoders (MAEs) to learn a self-supervised embedding tailored for computer vision tasks in this domain. This embedding captures informative representations of histopathological data, facilitating feature learning without extensive labeled datasets. During pre-training, we investigate a random cropping technique that automatically generates a large dataset from whole-slide images (WSIs). Additionally, we assess the performance of linear probes on multi-class classification of cancer subtypes using the representations learned by the MAE. Our approach aims to achieve strong performance on downstream tasks by leveraging the complementary strengths of vision transformers (ViTs) and autoencoders. We evaluate our model's performance on the BRACS dataset and compare it with existing benchmarks.
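As a rough illustration of the random cropping step, the sketch below samples fixed-size patches at uniformly random positions from a large image array. It assumes the slide region is already loaded in memory as a NumPy array; the function name `random_crops`, the crop size, and the synthetic image are hypothetical and do not reflect the authors' actual pipeline.

```python
import numpy as np

def random_crops(image, crop_size, num_crops, rng):
    """Sample `num_crops` square patches at uniformly random positions."""
    height, width = image.shape[:2]
    if crop_size > height or crop_size > width:
        raise ValueError("crop_size must not exceed the image dimensions")
    # Upper bound is exclusive, so +1 allows a crop flush with the border.
    tops = rng.integers(0, height - crop_size + 1, size=num_crops)
    lefts = rng.integers(0, width - crop_size + 1, size=num_crops)
    patches = [
        image[top:top + crop_size, left:left + crop_size]
        for top, left in zip(tops, lefts)
    ]
    return np.stack(patches)

# Synthetic stand-in for a slide region (assumption: 1024x1536 RGB, uint8).
rng = np.random.default_rng(0)
slide = rng.integers(0, 256, size=(1024, 1536, 3), dtype=np.uint8)
patches = random_crops(slide, crop_size=224, num_crops=8, rng=rng)
print(patches.shape)
```

Because positions are drawn independently, repeated calls yield an effectively unlimited stream of unlabeled patches for pre-training.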
CL Aug 10, 2025
HealthBranches: Synthesizing Clinically-Grounded Question Answering Datasets via Decision Pathways
Cristian Cosentino, Annamaria Defilippo, Marco Dossena et al.
HealthBranches is a novel benchmark dataset for medical Question-Answering (Q&A), specifically designed to evaluate complex reasoning in Large Language Models (LLMs). This dataset is generated through a semi-automated pipeline that transforms explicit decision pathways from medical sources into realistic patient cases with associated questions and answers. Covering 4,063 case studies across 17 healthcare topics, each data point is based on clinically validated reasoning chains. HealthBranches supports both open-ended and multiple-choice question formats and uniquely includes the full reasoning path for each Q&A. Its structured design enables robust evaluation of LLMs' multi-step inference capabilities, including their performance in structured Retrieval-Augmented Generation (RAG) contexts. HealthBranches establishes a foundation for the development of more trustworthy, interpretable, and clinically reliable LLMs in high-stakes domains while also serving as a valuable resource for educational purposes.
LG Jun 28, 2024
Graph Neural Networks for Gut Microbiome Metaomic data: A preliminary work
Christopher Irwin, Flavio Mignone, Stefania Montani et al.
The gut microbiome, crucial for human health, presents challenges in analyzing its complex metaomic data due to high dimensionality and sparsity. Traditional methods struggle to capture its intricate relationships. We investigate graph neural networks (GNNs) for this task, aiming to derive meaningful representations of individual gut microbiomes. Unlike methods relying solely on taxa abundance, we directly leverage phylogenetic relationships to obtain a generalized encoder for taxa networks. The representations learned by the encoder are then used to train a model that predicts phenotypes such as Inflammatory Bowel Disease (IBD).
LG Mar 13, 2024
Structural Positional Encoding for knowledge integration in transformer-based medical process monitoring
Christopher Irwin, Marco Dossena, Giorgio Leonardi et al.
Predictive process monitoring is a process mining task aimed at forecasting information about a running process trace, such as the correct next activity to be executed. In medical domains, predictive process monitoring can provide valuable decision support in atypical and nontrivial situations. Decision support and quality assessment in medicine cannot ignore domain knowledge if they are to be grounded in all the available information (which is not limited to data) and to be truly acceptable to end users. In this paper, we propose a predictive process monitoring approach relying on a transformer, a deep learning architecture based on the attention mechanism. A major contribution of our work lies in the incorporation of ontological domain-specific knowledge, carried out through a graph positional encoding technique. The paper presents and discusses the encouraging experimental results we are collecting in the domain of stroke management.