Matteo Zavatteri

CR
h-index21
4papers
23citations
Novelty45%
AI Score42

4 Papers

AIMay 27
Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

Matteo Gioele Collu, Riccardo Conte, Alberto Giaretta et al.

In this paper, we investigate whether refusal behavior can be predicted from LLM intermediate activations before decoding using linear probes trained on residual stream activations at each transformer block. We find that refusal is linearly decodable well before the final layer, indicating that safety-relevant behavior is represented in intermediate activations before output generation. To test whether this signal is actionable, we introduce Mechanistic AutoDAN, a probe-guided variant of AutoDAN that replaces full-model fitness evaluation with partial forward passes and probe-based scoring inside a genetic prompt search loop. Across the evaluated models, our method achieves attack success rates competitive with vanilla AutoDAN while reducing per-iteration search time by up to 72%, and probe-guided prompts match or exceed AutoDAN's cross-model transfer in several configurations. We further find that the usefulness of probe guidance increases with model scale. Our results show that refusal is not only observable at the output level, but is encoded as a structured and actionable signal in intermediate LLM activations.

LGSep 1, 2025
Iterative In-Context Learning to Enhance LLMs Abstract Reasoning: The Case-Study of Algebraic Tasks

Stefano Fioravanti, Matteo Zavatteri, Roberto Confalonieri et al.

LLMs face significant challenges in systematic generalization, particularly when dealing with reasoning tasks requiring compositional rules and handling out-of-distribution examples. To address these challenges, we introduce an in-context learning methodology that improves the generalization capabilities of general purpose LLMs. Our approach employs an iterative example selection strategy, which incrementally constructs a tailored set of few-shot examples optimized to enhance model's performance on a given task. As a proof of concept, we apply this methodology to the resolution of algebraic expressions involving non-standard simplification rules, according to which the priority of addition and multiplication is changed. Our findings indicate that LLMs exhibit limited proficiency in these mathematical tasks. We further demonstrate that LLMs reasoning benefits from our iterative shot selection prompting strategy integrated with explicit reasoning instructions. Crucially, our experiments reveal that some LLMs achieve better generalization performances when prompted with simpler few-shot examples rather than complex ones following the test data distribution.

CRDec 20, 2015
Security Constraints in Temporal Role-Based Access-Controlled Workflows (Extended Version)

Carlo Combi, Luca Viganó, Matteo Zavatteri

Workflows and role-based access control models need to be suitably merged, in order to allow users to perform processes in a correct way, according to the given data access policies and the temporal constraints. Given a mapping between workflow models and simple temporal networks with uncertainty, we discuss a mapping between role temporalities and simple temporal networks, and how to connect the two resulting networks to make explicit who can do what, when. If the connected network is still executable, we show how to compute the set of authorized users for each task. Finally, we define security constraints (to prevent users from doing unauthorized actions) and security constraint propagation rules (to propagate security constraints at runtime). We also provide an algorithm to check whether a set of propagation rules is safe, and we extend an existing execution algorithm to take into account these new security aspects.

CRMay 27, 2014
Non-collaborative Attackers and How and Where to Defend Flawed Security Protocols (Extended Version)

Michele Peroli, Luca Viganò, Matteo Zavatteri

Security protocols are often found to be flawed after their deployment. We present an approach that aims at the neutralization or mitigation of the attacks to flawed protocols: it avoids the complete dismissal of the interested protocol and allows honest agents to continue to use it until a corrected version is released. Our approach is based on the knowledge of the network topology, which we model as a graph, and on the consequent possibility of creating an interference to an ongoing attack of a Dolev-Yao attacker, by means of non-collaboration actuated by ad-hoc benign attackers that play the role of network guardians. Such guardians, positioned in strategical points of the network, have the task of monitoring the messages in transit and discovering at runtime, through particular types of inference, whether an attack is ongoing, interrupting the run of the protocol in the positive case. We study not only how but also where we can attempt to defend flawed security protocols: we investigate the different network topologies that make security protocol defense feasible and illustrate our approach by means of concrete examples.