Francesco Caso

LG
h-index7
5papers
13citations
Novelty56%
AI Score47

5 Papers

97.8CLMay 23Code
The Path Matters: Learning a Token-Commitment Policy for Diffusion Language Models

Bohang Sun, Max Zhu, Francesco Caso et al.

Diffusion large language models promise faster generation by refining many token positions in parallel, but this parallelism introduces a hidden control problem: which proposed tokens should be transferred into the partially decoded sequence at each step? We refer to this decision as token commitment. Existing frozen-generator decoders largely rely on hand-designed confidence rules or block-specific acceptance filters. We argue that token commitment can instead be learned as a reusable trace-state policy. We introduce TraceLock, a lightweight plug-in controller that instantiates this policy for a frozen diffusion language model. Since oracle commitment times are unavailable, TraceLock derives self-supervision from future stability: at decoding step t, a proposed token for position i is labeled stable if it matches the final token at position i after the full decoding trace completes. The controller scores variable-length trace states and decides which active token proposals should be committed to the partially decoded sequence. Once trained for a given frozen backbone, the controller can be deployed across local-window widths, generation lengths, and step budgets without retraining or per-setting calibration. Experiments on question answering, mathematical reasoning, and code generation show that TraceLock improves the quality-step tradeoff over heuristic and learned baselines, with particularly stable behavior under cross-setting deployment. Diagnostic analyses show that its decisions are not reducible to scalar confidence, suggesting that frozen diffusion language models expose a learnable space of commitment trajectories beyond confidence-based decoding. Code is available at https://github.com/BobSun98/TraceLock.

LGJun 1, 2023
Renormalized Graph Representations for Node Classification

Francesco Caso, Giovanni Trappolini, Andrea Bacciu et al.

Graph neural networks process information on graphs represented at a given resolution scale. We analyze the effect of using different coarse-grained graph resolutions, obtained through the Laplacian renormalization group theory, on node classification tasks. At the theory's core is grouping nodes connected by significant information flow at a given time scale. Representations of the graph at different scales encode interaction information at different ranges. We specifically experiment using representations at the characteristic scale of the graph's mesoscopic structures. We provide the models with the original graph and the graph represented at the characteristic resolution scale and compare them to models that can only access the original graph. Our results showed that models with access to both the original graph and the characteristic scale graph can achieve statistically significant improvements in test accuracy.

LGFeb 22, 2024Code
Link Prediction with Physics-Inspired Graph Neural Networks

Andrea Giuseppe Di Francesco, Francesco Caso, Maria Sofia Bucarelli et al.

The message-passing mechanism underlying Graph Neural Networks (GNNs) is not naturally suited for heterophilic datasets, where adjacent nodes often have different labels. Most solutions to this problem remain confined to the task of node classification. In this article, we focus on the valuable task of link prediction under heterophily, an interesting problem for recommendation systems, social network analysis, and other applications. GNNs like GRAFF have improved node classification under heterophily by incorporating physics biases in the architecture. Similarly, we propose GRAFF-LP, an extension of GRAFF for link prediction. We show that GRAFF-LP effectively discriminates existing from non-existing edges by learning implicitly to separate the edge gradients. Based on this information, we propose a new readout function inspired by physics. Remarkably, this new function not only enhances the performance of GRAFF-LP but also improves that of other baseline models, leading us to reconsider how every link prediction experiment has been conducted so far. Finally, we provide evidence that even simple GNNs did not experience greater difficulty in predicting heterophilic links compared to homophilic ones. This leads us to believe in the necessity for heterophily measures specifically tailored for link prediction, distinct from those used in node classification. The code and appendix are available at https://github.com/difra100/Link_Prediction_with_PIGNN_IJCNN.

LGFeb 23, 2025
Entropy-Lens: The Information Signature of Transformer Computations

Riccardo Ali, Francesco Caso, Christopher Irwin et al.

Transformer models map input token sequences to output token distributions, layer by layer. While most interpretability work focuses on internal latent representations, we study the evolution of these token-level distributions directly in vocabulary space. However, such distributions are high-dimensional and defined on an unordered support, making common descriptors like moments or cumulants ill-suited. We address this by computing the Shannon entropy of each intermediate predicted distribution, yielding one interpretable scalar per layer. The resulting sequence, the entropy profile, serves as a compact, information-theoretic signature of the model's computation. We introduce Entropy-Lens, a model-agnostic framework that extracts entropy profiles from frozen, off-the-shelf transformers. We show that these profiles (i) reveal family-specific computation patterns invariant under depth rescaling, (ii) are predictive of prompt type and task format, and (iii) correlate with output correctness. We further show that Rényi entropies yield similar results within a broad range of $α$ values, justifying the use of Shannon entropy as a stable and principled summary. Our results hold across different transformers, without requiring gradients, fine-tuning, or access to model internals.

LGOct 18, 2025
Symmetry and Generalisation in Neural Approximations of Renormalisation Transformations

Cassidy Ashworth, Pietro Liò, Francesco Caso

Deep learning models have proven enormously successful at using multiple layers of representation to learn relevant features of structured data. Encoding physical symmetries into these models can improve performance on difficult tasks, and recent work has motivated the principle of parameter symmetry breaking and restoration as a unifying mechanism underlying their hierarchical learning dynamics. We evaluate the role of parameter symmetry and network expressivity in the generalisation behaviour of neural networks when learning a real-space renormalisation group (RG) transformation, using the central limit theorem (CLT) as a test case map. We consider simple multilayer perceptrons (MLPs) and graph neural networks (GNNs), and vary weight symmetries and activation functions across architectures. Our results reveal a competition between symmetry constraints and expressivity, with overly complex or overconstrained models generalising poorly. We analytically demonstrate this poor generalisation behaviour for certain constrained MLP architectures by recasting the CLT as a cumulant recursion relation and making use of an established framework to propagate cumulants through MLPs. We also empirically validate an extension of this framework from MLPs to GNNs, elucidating the internal information processing performed by these more complex models. These findings offer new insight into the learning dynamics of symmetric networks and their limitations in modelling structured physical transformations.