Zoran Tiganj

h-index14

8papers

55citations

Novelty57%

AI Score54

Ranked #10,725 of 194,257 authors (top 6%)#2,786 in LG (top 7%)

8 Papers

8.6CLApr 1Code

Temporal Dependencies in In-Context Learning: The Role of Induction Heads

Anooshka Bajaj, Deven Mahesh Mistry, Sahaj Singh Maini et al.

Large language models (LLMs) exhibit strong in-context learning capabilities, but how they track and retrieve information from context remains underexplored. Drawing on the free recall paradigm in cognitive science (where participants recall list items in any order), we show that several open-source LLMs consistently display a serial-recall-like pattern, assigning peak probability to tokens that immediately follow a repeated token in the input sequence. Through systematic ablation experiments, we show that induction heads, specialized attention heads that attend to the token following a previous occurrence of the current token, play an important role in this phenomenon. Removing heads with a high induction score substantially reduces the +1 lag bias, whereas ablating random heads does not reproduce the same reduction. We also show that removing heads with high induction scores impairs the performance of models prompted to do serial recall using few-shot learning to a larger extent than removing random heads. Our findings highlight a mechanistically specific connection between induction heads and temporal context processing in transformers, suggesting that these heads are especially important for ordered retrieval and serial-recall-like behavior during in-context learning.

12.1MAApr 2

High Volatility and Action Bias Distinguish LLMs from Humans in Group Coordination

Sahaj Singh Maini, Robert L. Goldstone, Zoran Tiganj

Humans exhibit remarkable abilities to coordinate in groups. As large language models (LLMs) become more capable, it remains an open question whether they can demonstrate comparable adaptive coordination and whether they use the same strategies as humans. To investigate this, we compare LLM and human performance on a common-interest game with imperfect monitoring: Group Binary Search. In this n-player game, participants need to coordinate their actions to achieve a common objective. Players independently submit numerical values in an effort to collectively sum to a randomly assigned target number. Without direct communication, they rely on group feedback to iteratively adjust their submissions until they reach the target number. Our findings show that, unlike humans who adapt and stabilize their behavior over time, LLMs often fail to improve across games and exhibit excessive switching, which impairs group convergence. Moreover, richer feedback (e.g., numerical error magnitude) benefits humans substantially but has small effects on LLMs. Taken together, by grounding the analysis in human baselines and mechanism-level metrics, including reactivity scaling, switching dynamics, and learning across games, we point to differences in human and LLM groups and provide a behaviorally grounded diagnostic for closing the coordination gap.

9.6CRApr 20

Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks

Md Rysul Kabir, Zoran Tiganj

Open-weight language models can be rendered unsafe through several distinct interventions, but the resulting models may differ substantially in capabilities, behavioral profile, and internal failure mode. We study behavioral and mechanistic properties of jailbroken models across three unsafe routes: harmful supervised fine-tuning (SFT), harmful reinforcement learning with verifiable rewards (RLVR), and refusal-suppressing abliteration. All three routes achieve near-ceiling harmful compliance, but they diverge once we move beyond direct harmfulness. RLVR-jailbroken models show minimal degradation and preserve explicit harm recognition in a structured self-audit: they are able to identify harmful prompts and describe how a safe LLM should respond, yet they comply with the harmful request. With RLVR, harmful behavior is strongly suppressed by a reflective safety scaffold: when a harmful prompt is prepended with an instruction to reflect on safety standards, harmful behavior drops close to the baseline. Category-specific RLVR jailbreaks generalize broadly across harmfulness domains. Models jailbroken with SFT show the largest collapse in explicit safety judgments, the highest behavioral drift, and a substantial capability loss on standard benchmarks. Abliteration is family-dependent in both self-audit and response to a reflective safety scaffold. Mechanistic and repair analyses further separate the routes: abliteration is consistent with localized refusal-feature deletion, RLVR with preserved safety geometry but retargeted policy behavior, and SFT with broader distributed drift. Targeted repair partially recovers RLVR-jailbroken models, but has little effect on SFT-jailbroken models. Together, these results show that jailbreaks can produce vastly different properties despite similar harmfulness, with models jailbroken via RLVR showing remarkable similarity to the base model.

23.3LGFeb 9, 2025

Emergence of Episodic Memory in Transformers: Characterizing Changes in Temporal Structure of Attention Scores During Training

Deven Mahesh Mistry, Anooshka Bajaj, Yash Aggarwal et al.

We investigate in-context temporal biases in attention heads and transformer outputs. Using cognitive science methodologies, we analyze attention scores and outputs of the GPT-2 models of varying sizes. Across attention heads, we observe effects characteristic of human episodic memory, including temporal contiguity, primacy and recency. Transformer outputs demonstrate a tendency toward in-context serial recall. Importantly, this effect is eliminated after the ablation of the induction heads, which are the driving force behind the contiguity effect. Our findings offer insights into how transformers organize information temporally during in-context learning, shedding light on their similarities and differences with human memory and learning.

4.2AIDec 19, 2024Code

Deep reinforcement learning with time-scale invariant memory

Md Rysul Kabir, James Mochizuki-Freeman, Zoran Tiganj

The ability to estimate temporal relationships is critical for both animals and artificial agents. Cognitive science and neuroscience provide remarkable insights into behavioral and neural aspects of temporal credit assignment. In particular, scale invariance of learning dynamics, observed in behavior and supported by neural data, is one of the key principles that governs animal perception: proportional rescaling of temporal relationships does not alter the overall learning efficiency. Here we integrate a computational neuroscience model of scale invariant memory into deep reinforcement learning (RL) agents. We first provide a theoretical analysis and then demonstrate through experiments that such agents can learn robustly across a wide range of temporal scales, unlike agents built with commonly used recurrent memory architectures such as LSTM. This result illustrates that incorporating computational principles from neuroscience and cognitive science into deep neural networks can enhance adaptability to complex temporal dynamics, mirroring some of the core properties of human learning.

6.5LGJul 9, 2021Code

A deep convolutional neural network that is invariant to time rescaling

Brandon G. Jacques, Zoran Tiganj, Aakash Sarkar et al.

Human learners can readily understand speech, or a melody, when it is presented slower or faster than usual. Although deep convolutional neural networks (CNNs) are extremely powerful in extracting information from time series, they require explicit training to generalize to different time scales. This paper presents a deep CNN that incorporates a temporal representation inspired by recent findings from neuroscience. In the mammalian brain, time is represented by populations of neurons with temporal receptive fields. Critically, the peaks of the receptive fields form a geometric series, such that the population codes a set of temporal basis functions over log time. Because memory for the recent past is a function of log time, rescaling the input results in translation of the memory. The Scale-Invariant Temporal History Convolution network (SITHCon) builds a convolutional layer over this logarithmically-distributed temporal memory. A max-pool operation results in a network that is invariant to rescalings of time modulo edge effects. We compare performance of SITHCon to a Temporal Convolution Network (TCN). Although both networks can learn classification and regression problems on both univariate and multivariate time series f(t), only SITHCon generalizes to rescalings f(at). This property, inspired by findings from contemporary neuroscience and consistent with findings from cognitive psychology, may enable networks that learn with fewer training examples, fewer weights and that generalize more robustly to out of sample data.

4.4LGApr 9, 2021Code

DeepSITH: Efficient Learning via Decomposition of What and When Across Time Scales

Brandon Jacques, Zoran Tiganj, Marc W. Howard et al.

Extracting temporal relationships over a range of scales is a hallmark of human perception and cognition -- and thus it is a critical feature of machine learning applied to real-world problems. Neural networks are either plagued by the exploding/vanishing gradient problem in recurrent neural networks (RNNs) or must adjust their parameters to learn the relevant time scales (e.g., in LSTMs). This paper introduces DeepSITH, a network comprising biologically-inspired Scale-Invariant Temporal History (SITH) modules in series with dense connections between layers. SITH modules respond to their inputs with a geometrically-spaced set of time constants, enabling the DeepSITH network to learn problems along a continuum of time-scales. We compare DeepSITH to LSTMs and other recent RNNs on several time series prediction and decoding tasks. DeepSITH achieves state-of-the-art performance on these problems.

6.6AIFeb 18, 2018

Estimating scale-invariant future in continuous time

Zoran Tiganj, Samuel J. Gershman, Per B. Sederberg et al.

Natural learners must compute an estimate of future outcomes that follow from a stimulus in continuous time. Widely used reinforcement learning algorithms discretize continuous time and estimate either transition functions from one step to the next (model-based algorithms) or a scalar value of exponentially-discounted future reward using the Bellman equation (model-free algorithms). An important drawback of model-based algorithms is that computational cost grows linearly with the amount of time to be simulated. On the other hand, an important drawback of model-free algorithms is the need to select a time-scale required for exponential discounting. We present a computational mechanism, developed based on work in psychology and neuroscience, for computing a scale-invariant timeline of future outcomes. This mechanism efficiently computes an estimate of inputs as a function of future time on a logarithmically-compressed scale, and can be used to generate a scale-invariant power-law-discounted estimate of expected future reward. The representation of future time retains information about what will happen when. The entire timeline can be constructed in a single parallel operation which generates concrete behavioral and neural predictions. This computational mechanism could be incorporated into future reinforcement learning algorithms.