LG, Sep 29, 2022
Scaling Laws for a Multi-Agent Reinforcement Learning Model
Oren Neumann, Claudius Gros
The recent observation of neural power-law scaling relations has made a significant impact in the field of deep learning. As a consequence, substantial attention has been devoted to describing scaling laws, although mostly for supervised learning and only to a lesser extent for reinforcement learning frameworks. In this paper we present an extensive study of performance scaling for a cornerstone reinforcement learning algorithm, AlphaZero. On the basis of a relationship between Elo rating, playing strength and power-law scaling, we train AlphaZero agents on the games Connect Four and Pentago and analyze their performance. We find that player strength scales as a power law in neural network parameter count when not bottlenecked by available compute, and as a power of compute when training optimally sized agents. We observe nearly identical scaling exponents for both games. Combining the two observed scaling laws, we obtain a power law relating optimal size to compute, similar to those observed for language models. We find that the predicted scaling of optimal neural network size fits our data for both games. This scaling law implies that previously published state-of-the-art game-playing models are significantly smaller than their optimal size, given the respective compute budgets. We also show that large AlphaZero models are more sample efficient, performing better than smaller models with the same amount of training data.
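The link between Elo rating and power-law scaling can be made concrete. A minimal sketch, assuming the standard Elo convention (expected score $1/(1+10^{-\Delta E/400})$) and a playing strength defined as $\gamma = 10^{E/400}$, so that A beats B with probability $\gamma_A/(\gamma_A+\gamma_B)$. If $\gamma \propto N^{\alpha}$ in parameter count $N$, Elo is linear in $\log_{10} N$ with slope $400\alpha$, and $\alpha$ follows from a straight-line fit. The function names and the data below are illustrative and synthetic, not results from the paper.

```python
import numpy as np

def elo_win_probability(elo_a: float, elo_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))

def fit_strength_exponent(params: np.ndarray, elo: np.ndarray) -> float:
    """Fit alpha in gamma ∝ N**alpha, with strength gamma = 10**(Elo / 400).

    Taking log10 gives Elo = 400 * alpha * log10(N) + const, a straight line.
    """
    slope, _intercept = np.polyfit(np.log10(params), elo, deg=1)
    return slope / 400.0

# Synthetic illustration (hypothetical, not data from the paper):
# alpha = 0.9 plus Gaussian Elo noise with a standard deviation of 10 points.
rng = np.random.default_rng(0)
params = np.logspace(3, 7, 20)
elo = 400.0 * 0.9 * np.log10(params) + rng.normal(0.0, 10.0, size=params.size)
print(f"fitted alpha = {fit_strength_exponent(params, elo):.3f}")
```

Working in Elo rather than win rates keeps the fit linear; the win probability between any two agents can be recovered afterwards with `elo_win_probability`.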
LG, Jul 26, 2024
Reorganizing attention-space geometry with expressive attention
Claudius Gros
Attention regulates information transfer between tokens. For this, query and key vectors are compared, typically in terms of a scalar product, $\mathbf{Q}^T\mathbf{K}$, together with a subsequent softmax normalization. In geometric terms, the standard dot-product attention (DPA) leads to large/small attention weights for parallel/antiparallel queries and keys. Here we study expressive attention (EA), which is based on $(\mathbf{Q}^T\mathbf{K})^2$, the squared dot product. In this case, attention is enhanced when query and key are either parallel or antiparallel, and suppressed for orthogonal configurations. EA can be introduced into any attention-based code without additional compute costs or memory requirements. For a series of autoregressive prediction tasks, we find that expressive attention performs at least as well as vanilla DPA. As task complexity increases, EA is observed to outperform DPA by increasing margins, which also holds for multi-task settings. For a given model size, EA manages to achieve 100% performance for a range of complexity levels not accessible to DPA. Our results show that it is possible to reorganize the geometry of the matching condition in the space of attention heads without loss of performance.
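The geometric difference between DPA and EA shows up already for a single query. A minimal sketch, assuming EA simply squares the (scaled) scalar product before the softmax; the exact normalization used in the paper may differ. For a parallel, an antiparallel and an orthogonal key, DPA favors only the parallel key, while EA weights parallel and antiparallel keys equally and suppresses the orthogonal one.

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

def dot_product_attention_weights(q: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """Standard DPA: softmax of the scaled scalar products q . k."""
    d = q.shape[-1]
    return softmax(keys @ q / np.sqrt(d))

def expressive_attention_weights(q: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """EA sketch: softmax of the squared scaled scalar product (q . k)**2.

    Assumption: the 1/sqrt(d) scaling is applied before squaring.
    """
    d = q.shape[-1]
    return softmax((keys @ q / np.sqrt(d)) ** 2)

q = np.array([2.0, 0.0])
keys = np.array([
    [2.0, 0.0],   # parallel to q
    [-2.0, 0.0],  # antiparallel to q
    [0.0, 2.0],   # orthogonal to q
])
print("DPA:", dot_product_attention_weights(q, keys).round(4))
print("EA: ", expressive_attention_weights(q, keys).round(4))
```

Since squaring does not change tensor shapes, swapping the score function is the only change needed in an existing attention implementation, consistent with the claim of no extra compute or memory.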
LG, Dec 16, 2024
AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws
Oren Neumann, Claudius Gros
Neural scaling laws are observed in a range of domains, to date with no universal understanding of why they occur. Recent theories suggest that loss power laws arise from Zipf's law, a power law observed in domains like natural language. One theory suggests that language scaling laws emerge when Zipf-distributed task quanta are learned in descending order of frequency. In this paper we examine power-law scaling in AlphaZero, a reinforcement learning algorithm, using a model of language-model scaling. We find that game states in training and inference data follow Zipf's law, which is known to arise from the tree structure of the environment, and examine the correlation between scaling-law and Zipf's-law exponents. In agreement with the quanta scaling model, we find that agents optimize state loss in descending order of frequency, even though this order scales inversely with modelling complexity. We also find that inverse scaling, the failure of models to improve with size, is correlated with unusual Zipf curves where end-game states are among the most frequent states. We show evidence that larger models shift their focus to these less important states, sacrificing their understanding of important early-game states.
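The quanta scaling argument can be checked with a short calculation. A minimal sketch, assuming quantum frequencies follow a Zipf law $p_k \propto k^{-(1+\alpha)}$ and that a model which has learned the $n$ most frequent quanta incurs loss only on the remaining ones. The residual loss is then the tail mass $\sum_{k>n} p_k \propto n^{-\alpha}$. The number of quanta and the exponents are illustrative choices.

```python
import numpy as np

def residual_loss(alpha: float, learned: np.ndarray, num_quanta: int = 10**6) -> np.ndarray:
    """Frequency mass of quanta not yet learned, assuming quanta are learned
    in descending order of frequency and p_k ∝ k**-(1 + alpha)."""
    ranks = np.arange(1, num_quanta + 1, dtype=float)
    freqs = ranks ** -(1.0 + alpha)
    freqs /= freqs.sum()
    # tail[n] = sum of freqs[n:], the mass of quanta ranked n+1, n+2, ...
    tail = np.cumsum(freqs[::-1])[::-1]
    return tail[learned]

def zipf_loss_exponent(alpha: float) -> float:
    """Slope of log(loss) versus log(number of learned quanta), sign flipped."""
    learned = np.unique(np.logspace(1, 3, 30).astype(int))
    loss = residual_loss(alpha, learned)
    slope, _intercept = np.polyfit(np.log(learned), np.log(loss), deg=1)
    return -slope

for alpha in (0.5, 1.0):
    print(f"Zipf exponent 1+{alpha}: loss exponent ≈ {zipf_loss_exponent(alpha):.3f}")
```

The recovered loss exponent tracks $\alpha$, which is the mechanism by which a Zipf exponent is expected to set a scaling-law exponent in this model.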
LG, Aug 6, 2025
Small transformer architectures for task switching
Claudius Gros
The rapid progress seen in terms of large-scale generative AI is largely based on the attention mechanism. It is, conversely, non-trivial to conceive small-scale applications for which attention-based architectures outperform traditional approaches, such as multi-layer perceptrons or recurrent networks. We examine this problem in the context of 'task switching'. In this framework, models work on ongoing token sequences, with the current task being determined by stochastically interspersed control tokens. We show that standard transformers cannot solve a basic task-switching reference model based on finite-domain arithmetic, which contains subtasks dedicated to increment / addition / reverse copy / context (IARC). We show that transformers, long short-term memory recurrent networks (LSTM), and plain multi-layer perceptrons (MLPs) achieve similar, but only modest, prediction accuracies. We enlarge our comparative study by including an extension of the standard transformer architecture to its non-translation-invariant counterpart, the cisformer, and an alternative attention mechanism, extensive attention. A combination of the two is found to be the only model able to achieve considerable performance levels, of around 95%. Our results indicate that the workings of attention can be understood better, and even improved, when comparing qualitatively different formulations in task-switching settings.
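The task-switching setup can be illustrated with a toy data generator. A minimal sketch under stated assumptions: only two hypothetical subtasks are included (increment and addition modulo a small number); the control-token names `INC`/`ADD`, the modulus and the switching probability are illustrative; and reverse copy and context, the remaining IARC subtasks, are omitted because their exact rules are not given here.

```python
import random
from typing import List, Union

Token = Union[int, str]

def generate_task_switching_sequence(length: int, modulus: int = 16,
                                     switch_prob: float = 0.1,
                                     seed: int = 0) -> List[Token]:
    """Hypothetical toy token stream: numbers mod `modulus`, with control
    tokens that stochastically switch between two subtasks.

    INC: next number = previous number + 1 (mod modulus)
    ADD: next number = sum of the two previous numbers (mod modulus)
    """
    rng = random.Random(seed)
    tasks = ["INC", "ADD"]
    task = rng.choice(tasks)
    seed_numbers = [rng.randrange(modulus), rng.randrange(modulus)]
    seq: List[Token] = [task] + seed_numbers
    numbers = list(seed_numbers)   # number tokens only, control tokens skipped
    while len(seq) < length:
        if rng.random() < switch_prob:
            task = rng.choice(tasks)   # control token; may repeat the current task
            seq.append(task)
            continue
        if task == "INC":
            nxt = (numbers[-1] + 1) % modulus
        else:
            nxt = (numbers[-1] + numbers[-2]) % modulus
        numbers.append(nxt)
        seq.append(nxt)
    return seq

print(generate_task_switching_sequence(24, seed=3))
```

A model trained on such streams must infer the active task from the most recent control token, which can lie arbitrarily far back in the context.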
AI, Nov 20, 2025
From generative AI to the brain: five takeaways
Claudius Gros
The big strides seen in generative AI are based not on somewhat obscure algorithms, but on clearly defined generative principles. The resulting concrete implementations have proven themselves in large numbers of applications. We suggest that it is imperative to thoroughly investigate which of these generative principles may also be operative in the brain, and hence relevant for cognitive neuroscience. In addition, ML research has led to a range of interesting characterizations of neural information processing systems. We discuss five examples that illustrate how much neuroscience could potentially learn from ML research: the shortcomings of world modelling, the generation of thought processes, attention, neural scaling laws, and quantization.
NC, Nov 30, 2021
Emotions as abstract evaluation criteria in biological and artificial intelligences
Claudius Gros
Biological as well as advanced artificial intelligences (AIs) need to decide which goals to pursue. We review nature's solution to the time allocation problem, which is based on a continuously readjusted categorical weighting mechanism that we experience introspectively as emotions. One observes phylogenetically that the available number of emotional states increases hand in hand with the cognitive capabilities of animals, and that rising levels of intelligence entail ever larger sets of behavioral options. In this view, our ability to experience a multitude of potentially conflicting feelings is not a leftover of a more primitive heritage, but a generic mechanism for attributing values to behavioral options that cannot be specified at birth. Emotions are hence essential for understanding the mind. For concreteness, we propose and discuss a framework which mimics emotions on a functional level. Based on time allocation via emotional stationarity (TAES), emotions are implemented as abstract criteria, such as satisfaction, challenge and boredom, which serve to evaluate activities that have been carried out. The resulting timeline of experienced emotions is compared with the 'character' of the agent, which is defined in terms of a preferred distribution of emotional states. The long-term goal of the agent, to align experience with character, is achieved by optimizing the frequency with which individual tasks are selected. Upon optimization, the statistics of emotion experience becomes stationary.
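The TAES goal of aligning experienced emotions with a preferred distribution can be pictured as an optimization over task-selection frequencies. This is a minimal sketch under stated assumptions, not the update rule of the paper. Each task is assumed to evoke a fixed, hypothetical distribution over three emotions (satisfaction, challenge, boredom). The standard EM update for mixture weights is swapped in to make the experienced distribution approach the 'character'.

```python
import numpy as np

def align_task_frequencies(emission: np.ndarray, character: np.ndarray,
                           iterations: int = 5000) -> np.ndarray:
    """Return task-selection frequencies f such that the experienced emotion
    distribution f @ emission approaches the target distribution `character`.

    emission[i, e] is a hypothetical probability that task i evokes emotion e.
    The multiplicative step is the EM update for mixture weights; it keeps f
    normalized and never increases KL(character || f @ emission).
    """
    num_tasks = emission.shape[0]
    freqs = np.full(num_tasks, 1.0 / num_tasks)   # start with uniform task selection
    for _ in range(iterations):
        experienced = freqs @ emission
        freqs = freqs * (emission @ (character / experienced))
    return freqs

# Rows: tasks; columns: satisfaction, challenge, boredom (illustrative numbers).
emission = np.array([[0.7, 0.2, 0.1],
                     [0.1, 0.7, 0.2],
                     [0.2, 0.1, 0.7]])
character = np.array([0.42, 0.33, 0.25])   # preferred emotion distribution
freqs = align_task_frequencies(emission, character)
print("task frequencies:", freqs.round(3))
print("experienced emotions:", (freqs @ emission).round(3))
```

Once the frequencies have converged, repeated updates leave them unchanged, a simple analog of the stationary emotion statistics described above.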
AI, Jan 26, 2021
Investment vs. reward in a competitive knapsack problem
Oren Neumann, Claudius Gros
Natural selection drives species to develop brains, with sizes that increase with the complexity of the tasks to be tackled. Our goal is to investigate the balance between the metabolic costs of larger brains and the advantage they provide in solving general and combinatorial problems. Defining advantage as performance relative to competitors, we use a two-player game based on the knapsack problem. Within this framework, two opponents compete over shared resources, with the goal of collecting more resources than the opponent. Neural nets of varying sizes are trained using a variant of the AlphaGo Zero algorithm. A surprisingly simple relation, $N_A/(N_A+N_B)$, is found for the relative win rate of a net with $N_A$ neurons against one with $N_B$. Success increases linearly with investments in additional resources when the network sizes are very different, i.e. when $N_A \ll N_B$, with returns diminishing when both networks become comparable in size.
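The quoted relation makes the diminishing returns explicit: the gain in win rate per additional neuron is $\partial_{N_A}\,[N_A/(N_A+N_B)] = N_B/(N_A+N_B)^2$, which is close to $1/N_B$ for $N_A \ll N_B$ and drops to $1/(4N_B)$ at $N_A = N_B$. A short sketch (function names are illustrative):

```python
def relative_win_rate(n_a: float, n_b: float) -> float:
    """Win rate of a net with n_a neurons against one with n_b: N_A / (N_A + N_B)."""
    return n_a / (n_a + n_b)

def marginal_return(n_a: float, n_b: float) -> float:
    """Gain in win rate per additional neuron of A: N_B / (N_A + N_B)**2."""
    return n_b / (n_a + n_b) ** 2

n_b = 100
for n_a in (1, 10, 100, 1000):
    print(f"N_A={n_a:5d}  win rate={relative_win_rate(n_a, n_b):.3f}  "
          f"marginal return={marginal_return(n_a, n_b):.5f}")
```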
AI, Sep 25, 2019
A generic framework for task selection driven by synthetic emotions
Claudius Gros
Given a certain complexity level, humanized agents may select from a wide range of possible tasks, with each activity corresponding to a transient goal. In general there will be no overarching credit assignment scheme that allows the available options to be compared with respect to expected utilities. For this situation we propose a task selection framework that is based on time allocation via emotional stationarity (TAES). Emotions are argued to correspond to abstract criteria, such as satisfaction, challenge and boredom, along which activities that have been carried out can be evaluated. The resulting timeline of experienced emotions is then compared with the 'character' of the agent, which is defined in terms of a preferred distribution of emotional states. The long-term goal of the agent, to align experience with character, is achieved by optimizing the frequency with which the individual tasks are selected. Upon optimization, the statistics of emotion experience becomes stationary.
AO, May 17, 2019
When the goal is to generate a series of activities: A self-organized simulated robot arm
Tim Koglin, Bulcsú Sándor, Claudius Gros
Behavior is characterized by sequences of goal-oriented activities, such as food uptake, socializing and resting. Classically, one would define for each task a corresponding satisfaction level, with the agent engaging, at a given time, in the activity having the lowest satisfaction level. Alternatively, one may consider that the agent follows the overarching objective of generating sequences of distinct activities. Achieving a balanced distribution of activities would then be the primary goal, rather than mastering a specific task. In this setting, the agent would show two types of behavior, task-oriented and task-searching phases, with the latter interspersed between the former. We study the emergence of autonomous task switching for the case of a simulated robot arm. In this setting, grasping one of several moving objects corresponds to a specific activity. Overall, the arm should follow a given object temporarily and then move away, in order to search for a new target and reengage. We show that this behavior can be generated robustly when modeling the arm as an adaptive dynamical system. In this approach, the dissipation function is time dependent. The arm is in a dissipative state when searching for a nearby object, dissipating energy on approach. Once the arm is close, the dissipation function starts to increase, with the eventual sign change implying that the arm takes up energy and wanders off. The resulting explorative state ends when the dissipation function becomes negative again and the arm selects a new target. We believe that our approach may be generalized to generate self-organized sequences of activities in general.
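Energy uptake near a target and dissipation away from it can be illustrated in one dimension. A minimal sketch, not the arm model of the paper: $x$ is a hypothetical signed displacement from the target, and the friction coefficient $\gamma(x)=\mu(x^2-1)$ depends on position rather than on time. It is positive far from the target (energy is dissipated on approach) and negative close to it (energy is taken up, and the system moves away again). This is the van der Pol oscillator, whose self-sustained approach-and-leave cycle is a simple analog of alternating task-oriented and task-searching phases.

```python
import numpy as np

def simulate_approach_and_leave(mu: float = 1.0, x0: float = 0.1,
                                dt: float = 1e-3, t_max: float = 60.0) -> np.ndarray:
    """Hypothetical 1D stand-in: x'' = -x - gamma(x) * x', gamma(x) = mu * (x**2 - 1).

    gamma > 0 for |x| > 1 (dissipation), gamma < 0 for |x| < 1 (energy uptake).
    Integrated with semi-implicit Euler; returns the trajectory x(t).
    """
    steps = int(t_max / dt)
    x, v = x0, 0.0
    trajectory = np.empty(steps)
    for i in range(steps):
        gamma = mu * (x * x - 1.0)
        v += dt * (-x - gamma * v)   # restoring force toward the target plus friction
        x += dt * v                  # position update uses the new velocity
        trajectory[i] = x
    return trajectory

traj = simulate_approach_and_leave()
print("late-time amplitude:", np.abs(traj[-20000:]).max().round(3))
```

Starting close to the target, the system is pushed away; far from it, it is drawn back, so it neither settles on the target nor escapes.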
AO, Jun 25, 2018
Kick control: using the attracting states arising within the sensorimotor loop of self-organized robots as motor primitives
Bulcsú Sándor, Michael Nowak, Tim Koglin et al.
Self-organized robots may develop attracting states within the sensorimotor loop, that is, within the phase space of neural activity, body, and environmental variables. Fixed points, limit cycles, and chaotic attractors correspond in this setting to a non-moving robot, to directed locomotion, and to irregular locomotion, respectively. Short higher-order control commands may hence be used to kick the system robustly from one self-organized attractor into the basin of attraction of a different attractor, a concept termed here kick control. The individual sensorimotor states serve in this context as highly compliant motor primitives. We study different implementations of kick control for simulated and real-world wheeled robots, for which the dynamics of the individual wheels is generated independently by local feedback loops. The feedback loops are mediated by rate-encoding neurons that receive exclusively proprioceptive inputs, in terms of projections of the actual rotational angle of the wheel. The changes of the neural activity are then transmitted into a rotational motion by a simulated transmission rod akin to the transmission rods used for steam locomotives. We find that the self-organized attractor landscape may be morphed both by higher-level control signals, in the spirit of kick control, and by interacting with the environment. Bumping against a wall destroys the limit cycle corresponding to forward motion, with the consequence that the dynamical variables are then attracted in phase space by the limit cycle corresponding to backward motion. The robot, which has no distance or contact sensors, hence reverses direction autonomously.
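Kick control can be illustrated with the simplest system having two attractors. A minimal sketch, assuming a one-dimensional bistable flow $\dot x = x - x^3$ (attractors at $x=\pm1$, basin boundary at $x=0$) instead of the robot's sensorimotor attractors; the kick is a hypothetical short constant input. A kick that carries the state across the basin boundary switches the attractor, while a weak kick is undone by the dynamics.

```python
def simulate_with_kick(x0: float, kick_strength: float, kick_start: float = 5.0,
                       kick_duration: float = 0.5, dt: float = 0.01,
                       t_max: float = 20.0) -> float:
    """Integrate dx/dt = x - x**3 + kick(t) with forward Euler; return the final state.

    kick(t) equals kick_strength for kick_start <= t < kick_start + kick_duration
    and zero otherwise (hypothetical control pulse).
    """
    x = x0
    for step in range(int(t_max / dt)):
        t = step * dt
        kick = kick_strength if kick_start <= t < kick_start + kick_duration else 0.0
        x += dt * (x - x**3 + kick)
    return x

print(simulate_with_kick(-1.0, kick_strength=0.2))  # weak kick: relaxes back to about -1
print(simulate_with_kick(-1.0, kick_strength=4.0))  # strong kick: ends near +1
```

Only the short pulse is externally imposed; the subsequent motion toward the new attractor is generated by the system's own dynamics, which is the point of kick control.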
NC, Aug 9, 2016
Closed-loop robots driven by short-term synaptic plasticity: Emergent explorative vs. limit-cycle locomotion
Laura Martin, Bulcsú Sándor, Claudius Gros
We examine the hypothesis that short-term synaptic plasticity (STSP) may generate self-organized motor patterns. Within the LPZRobots simulation package, we simulated sphere-shaped autonomous robots containing three weights that move along orthogonal internal rods. The position of a weight is controlled by a single neuron receiving excitatory input from the sensor measuring its actual position, and inhibitory inputs from the other two neurons. The inhibitory connections are transiently plastic, following physiologically inspired STSP rules. We find that a wide palette of motion patterns is generated through the interaction of STSP, robot, and environment (closed-loop configuration), including various forward meandering and circular motions, together with chaotic trajectories. The observed locomotion is robust with respect to additional interactions with obstacles. In the chaotic phase the robot is seemingly engaged in actively exploring its environment. We believe that our results constitute a proof of concept that transient synaptic plasticity, as described by STSP, may potentially be important for the generation of motor commands and for the emergence of complex locomotion patterns, adapting seamlessly also to unexpected environmental feedback. We observe spontaneous and collision-induced mode switching, finding in addition that locomotion may transiently follow limit cycles which are otherwise unstable. Regular locomotion corresponds to stable limit cycles in the sensorimotor loop, which may in turn be characterized by arbitrary angles of propagation. In our analysis, this degeneracy is one of the drivers of the chaotic wandering observed for selected parameter settings, which is induced by the smooth diffusion of the angle of propagation.
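STSP rules of this kind are often written in the continuous-rate form of the Tsodyks-Markram model, with a facilitation variable $u$ and a depression variable $x$ that scale the synaptic efficacy by $u\,x$. A minimal sketch, assuming this form; the exact rules and parameters used in the paper may differ, and all values below are illustrative. Under sustained presynaptic activity the efficacy settles at a level set by the presynaptic rate, which is what makes a coupling transiently plastic.

```python
def stsp_efficacy(rate: float, U: float = 0.2, tau_f: float = 0.5,
                  tau_d: float = 1.0, dt: float = 1e-3, t_max: float = 10.0) -> float:
    """Relative synaptic efficacy u*x after t_max time units of constant
    presynaptic activity `rate` (continuous Tsodyks-Markram form, forward Euler)."""
    u, x = U, 1.0  # resting state: baseline utilization, fully recovered resources
    for _ in range(int(t_max / dt)):
        du = (U - u) / tau_f + U * (1.0 - u) * rate   # facilitation
        dx = (1.0 - x) / tau_d - u * x * rate         # depression (resource depletion)
        u += dt * du
        x += dt * dx
    return u * x

def stsp_steady_state(rate: float, U: float = 0.2, tau_f: float = 0.5,
                      tau_d: float = 1.0) -> float:
    """Analytic fixed point of the same equations, for comparison."""
    u = U * (1.0 + tau_f * rate) / (1.0 + U * tau_f * rate)
    x = 1.0 / (1.0 + u * tau_d * rate)
    return u * x

for rate in (0.0, 1.0, 10.0):
    print(rate, round(stsp_efficacy(rate), 4), round(stsp_steady_state(rate), 4))
```

With these parameters depression dominates at high rates, so strongly active presynaptic neurons transiently weaken their inhibitory influence.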
NC, Nov 13, 2015
The sensorimotor loop as a dynamical system: How regular motion primitives may emerge from self-organized limit cycles
Bulcsú Sándor, Tim Jahn, Laura Martin et al.
We investigate the sensorimotor loop of simple robots simulated within the LPZRobots environment from the point of view of dynamical systems theory. For a robot with a cylindrical body and an actuator controlled by a single proprioceptive neuron, we find various types of periodic motions in terms of stable limit cycles. These are self-organized in the sense that, for a certain range of parameters, the actuator dynamics kicks in only when the barrel is already rolling, and stops otherwise. As a function of the control parameters, the stability of the resulting rolling motions generally terminates at points where fold bifurcations of limit cycles occur. We find that several branches of motion types exist for the same parameters, in terms of the relative frequencies of the barrel and of the actuator, each having its own basin of attraction in terms of initial conditions. For weak driving, stable limit cycles describing periodic and drifting back-and-forth motions are found in addition. These modes allow symmetry-breaking explorative behavior to be generated purely by the timing of an otherwise neutral signal with respect to the cyclic back-and-forth motion of the robot.
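Fold bifurcations of limit cycles, and the coexistence of attractors with separate basins, can be seen in a standard radial normal form, $\dot r = \mu r + r^3 - r^5$. A minimal sketch, not the robot model: for $-1/4<\mu<0$ a stable cycle (large $r$) coexists with an unstable cycle and a stable resting state $r=0$, and the initial condition decides which is reached. At $\mu=-1/4$ the two cycles merge and disappear, so the oscillation loses stability in a fold.

```python
import math

def limit_cycle_radii(mu: float) -> list:
    """Radii of the limit cycles of dr/dt = mu*r + r**3 - r**5 (r > 0), ascending.

    dr/dt = 0 with r > 0 gives r**4 - r**2 - mu = 0, i.e.
    r**2 = (1 -/+ sqrt(1 + 4*mu)) / 2.
    """
    disc = 1.0 + 4.0 * mu
    if disc < 0.0:
        return []          # beyond the fold at mu = -1/4: no cycles left
    root = math.sqrt(disc)
    squares = [(1.0 - root) / 2.0, (1.0 + root) / 2.0]
    return [math.sqrt(s) for s in squares if s > 0.0]

def final_radius(mu: float, r0: float, dt: float = 1e-2, t_max: float = 200.0) -> float:
    """Forward-Euler integration of the radial equation starting from r0."""
    r = r0
    for _ in range(int(t_max / dt)):
        r += dt * (mu * r + r**3 - r**5)
    return r

for mu in (0.1, -0.2, -0.3):
    print(mu, [round(r, 4) for r in limit_cycle_radii(mu)])
print(final_radius(-0.2, 0.3), final_radius(-0.2, 0.7))  # rest vs. stable cycle
```

For $\mu=-0.2$ the unstable cycle acts as the boundary between the two basins: starting inside it the motion dies out, starting outside it settles on the stable cycle.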
NC, Oct 2, 2014
Generating functionals for computational intelligence: the Fisher information as an objective function for self-limiting Hebbian learning rules
Rodrigo Echeveste, Claudius Gros
Generating functionals may guide the evolution of a dynamical system and constitute a possible route for handling the complexity of neural networks as relevant for computational intelligence. We propose and explore a new objective function from which plasticity rules for the afferent synaptic weights can be derived. The adaption rules are Hebbian, self-limiting, and result from the minimization of the Fisher information with respect to the synaptic flux. We perform a series of simulations examining the behavior of the new learning rules in various circumstances. The vector of synaptic weights aligns with the principal direction of input activities, whenever one is present. A linear discrimination is performed when there are two or more principal directions; directions having bimodal firing-rate distributions, characterized by a negative excess kurtosis, are preferred. We find robust performance, with full homeostatic adaption of the synaptic weights resulting as a by-product of the synaptic flux minimization. This self-limiting behavior allows for stable online learning for arbitrary durations. The neuron acquires new information when the statistics of input activities is changed at a certain point of the simulation, showing, however, a distinct resilience against unlearning previously acquired knowledge. Learning is fast when starting with randomly drawn synaptic weights and substantially slower when the synaptic weights are already fully adapted.
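The Fisher-information rule itself is not reproduced here. As a stand-in, a minimal sketch uses Oja's rule, a classic self-limiting Hebbian rule that shares one qualitative property described above: the weight vector aligns with the principal direction of the inputs while its norm stays bounded. The input statistics and learning rate are illustrative assumptions.

```python
import numpy as np

def train_oja(inputs: np.ndarray, learning_rate: float = 1e-3, seed: int = 0) -> np.ndarray:
    """Oja's rule (swapped in for illustration): dw = eta * y * (x - y * w), y = w . x."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=inputs.shape[1])
    w /= np.linalg.norm(w)
    for x in inputs:
        y = w @ x
        w += learning_rate * y * (x - y * w)   # Hebbian term y*x, self-limiting decay y**2 * w
    return w

rng = np.random.default_rng(1)
angle = np.deg2rad(30.0)
principal = np.array([np.cos(angle), np.sin(angle)])
minor = np.array([-np.sin(angle), np.cos(angle)])
n = 20000
inputs = (rng.normal(size=(n, 1)) * 3.0 * principal    # std 3 along the principal direction
          + rng.normal(size=(n, 1)) * 0.5 * minor)     # std 0.5 across it
w = train_oja(inputs)
print("|w| =", np.linalg.norm(w).round(3),
      " |cos(w, principal)| =", (abs(w @ principal) / np.linalg.norm(w)).round(4))
```

The decay term keeps $|w|$ near one without any explicit normalization step, which is the sense in which such rules are self-limiting.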
NC, Apr 22, 2014
Attractor Metadynamics in Adapting Neural Networks
Claudius Gros, Mathias Linkerhand, Valentin Walther
Slow adaption processes, like synaptic and intrinsic plasticity, abound in the brain and shape the landscape for the neural dynamics occurring on substantially faster timescales. At any given time the network is characterized by a set of internal parameters, which are adapting continuously, albeit slowly. This set of parameters defines the number and the location of the respective adiabatic attractors. The slow evolution of network parameters hence induces an evolving attractor landscape, a process which we term attractor metadynamics. We study the nature of the metadynamics of the attractor landscape for several continuous-time autonomous model networks. We find both first- and second-order changes in the location of adiabatic attractors and argue that the study of the continuously evolving attractor landscape constitutes a powerful tool for understanding the overall development of the neural dynamics.
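An evolving attractor landscape can be explored with a single rate neuron with self-coupling. A minimal sketch under illustrative assumptions: the fast dynamics is $\dot x = -x + \tanh(g x + b)$, and the bias $b$ plays the role of a slowly adapting parameter that is swept by hand. For small gain $g$ the adiabatic attractor shifts continuously with $b$. For large gain two attractors coexist, and the tracked attractor jumps discontinuously when its branch disappears, which shows up as hysteresis between upward and downward sweeps.

```python
import math
import numpy as np

def sweep_attractor(gain: float, biases: np.ndarray, x0: float,
                    dt: float = 0.05, relax_steps: int = 400) -> np.ndarray:
    """Track the attractor of dx/dt = -x + tanh(gain*x + b) while the slow
    parameter b is stepped through `biases` (illustrative toy model).

    At each b the fast variable relaxes from its previous value, so the
    returned positions follow the adiabatic attractor the system is on.
    """
    x = x0
    positions = np.empty(len(biases))
    for i, b in enumerate(biases):
        for _ in range(relax_steps):
            x += dt * (-x + math.tanh(gain * x + b))
        positions[i] = x
    return positions

biases = np.linspace(-1.0, 1.0, 201)   # slow parameter, stepped up or down
mid = 100                              # index of b = 0
for gain in (0.5, 2.0):
    up = sweep_attractor(gain, biases, x0=-1.0)
    down = sweep_attractor(gain, biases[::-1], x0=1.0)[::-1]
    print(f"g={gain}: x*(b=0) upward sweep {up[mid]:+.3f}, downward sweep {down[mid]:+.3f}")
```

For $g=0.5$ both sweeps agree at every $b$; for $g=2$ they disagree at $b=0$, because each sweep stays on its branch until that branch vanishes.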
NC, Oct 23, 2012
A Self-Organized Neural Comparator
Guillermo A. Ludueña, Claudius Gros
Learning algorithms generally need to be able to compare several streams of information. Neural learning architectures hence need a unit, a comparator, able to compare several inputs encoding either internal or external information, for instance predictions and sensory readings. Without the possibility of comparing predicted values to actual sensory inputs, reward evaluation and supervised learning would not be possible. Comparators are usually not implemented explicitly; the necessary comparisons are commonly performed by directly comparing the respective activities one-to-one. This implies that the characteristics of the two input streams (like size and encoding) must be provided at the time of designing the system. It is, however, plausible that biological comparators emerge from self-organizing, genetically encoded principles, which allow the system to adapt to changes in the input and in the organism. We propose an unsupervised neural circuitry in which the function of input comparison emerges via self-organization, solely from the interaction of the system with the respective inputs, without external influence or supervision. The proposed neural comparator adapts, unsupervised, according to the correlations present in the input streams. The system consists of a multilayer feed-forward neural network which follows a local output minimization (anti-Hebbian) rule for adapting the synaptic weights. The local output minimization allows the circuit to autonomously acquire the capability of comparing the neural activities received from different neural populations, which may differ in population size and in the neural encoding used. The comparator is able to compare objects never encountered before in the sensory input streams and to evaluate a measure of their similarity, even when they are differently encoded.
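The principle of output minimization can be sketched with a single linear unit, a simplification of the multilayer network described above. In this minimal sketch, the unit receives an object through two streams, the second mixed by a hypothetical linear encoding matrix. Its weights follow an anti-Hebbian rule with explicit weight normalization, so they cannot decay to zero. After training, the output is close to zero for matching pairs and clearly non-zero for mismatched pairs, also for objects not seen during training. Both streams have the same size here for simplicity.

```python
import numpy as np

def train_comparator(stream_a: np.ndarray, stream_b: np.ndarray,
                     learning_rate: float = 0.01, seed: int = 0) -> np.ndarray:
    """Anti-Hebbian output minimization for one linear unit y = w . [a, b]."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=stream_a.shape[1] + stream_b.shape[1])
    w /= np.linalg.norm(w)
    for a, b in zip(stream_a, stream_b):
        x = np.concatenate([a, b])
        y = w @ x
        w -= learning_rate * y * x   # anti-Hebbian step: reduces |y| for this input
        w /= np.linalg.norm(w)       # normalization prevents the trivial solution w = 0
    return w

def comparator_output(w: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Absolute output for a batch of (a, b) pairs; small values signal a match."""
    return np.abs(np.concatenate([a, b], axis=1) @ w)

rng = np.random.default_rng(1)
encoding = np.array([[1.0, 0.5, 0.0],   # hypothetical encoding used by stream b
                     [0.0, 1.0, 0.5],
                     [0.5, 0.0, 1.0]])
objects = rng.normal(size=(5000, 3))
w = train_comparator(objects, objects @ encoding.T)

new_objects = rng.normal(size=(1000, 3))     # never seen during training
other_objects = rng.normal(size=(1000, 3))
matched = comparator_output(w, new_objects, new_objects @ encoding.T).mean()
mismatched = comparator_output(w, new_objects, other_objects @ encoding.T).mean()
print(f"mean |y|: matched {matched:.2e}, mismatched {mismatched:.2f}")
```

Minimizing the output drives the weights into the subspace where the two encodings of the same object cancel, so no one-to-one wiring between the streams has to be specified in advance.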