AIMar 19Code
NeuroGame Transformer: Gibbs-Inspired Attention Driven by Game Theory and Statistical PhysicsDjamel Bouchaffra, Fayçal Ykhlef, Hanene Azzag et al.
Standard attention mechanisms in transformers are limited by their pairwise formulation, which hinders the modeling of higher-order dependencies among tokens. We introduce the NeuroGame Transformer (NGT) to overcome this by reconceptualizing attention through a dual perspective: tokens are treated simultaneously as players in a cooperative game and as interacting spins in a statistical physics system. Token importance is quantified using two complementary game-theoretic concepts -- Shapley values for global, permutation-based attribution and Banzhaf indices for local, coalition-level influence. These are combined via a learnable gating parameter to form an external magnetic field, while pairwise interaction potentials capture synergistic relationships. The system's energy follows an Ising Hamiltonian, with attention weights emerging as marginal probabilities under the Gibbs distribution, efficiently computed via mean-field equations. To ensure scalability despite the exponential coalition space, we develop importance-weighted Monte Carlo estimators with Gibbs-distributed weights. This approach avoids explicit exponential factors, ensuring numerical stability for long sequences. We provide theoretical convergence guarantees and characterize the fairness-sensitivity trade-off governed by the interpolation parameter. Experimental results demonstrate that the NeuroGame Transformer achieves strong performance across SNLI, and MNLI-matched, outperforming some major efficient transformer baselines. On SNLI, it attains a test accuracy of 86.4\% (with a peak validation accuracy of 86.6\%), surpassing ALBERT-Base and remaining highly competitive with RoBERTa-Base. Code is available at https://github.com/dbouchaffra/NeuroGame-Transformer.
GTMay 25
Coalition Free Energy and Adaptive Precision in Multi-Agent CooperationDjamel Bouchaffra, Faycal Ykhlef, Mustapha Lebbah et al.
Cooperative multi-agent systems require robust mechanisms for credit assignment under uncertainty. Here we introduce a variational framework, termed the Game-Theoretic Free Energy Principle (GT-FEP), that models coalition formation through a Gibbs distribution over interacting agents. Within this framework, we derive a precision-dependent formulation of cooperative credit assignment and show that an agent's Shapley value exhibits a non-monotonic relationship with sensory precision beta, reflecting a trade-off between noisy inference and overconfident local estimation. Motivated by this observation, we propose Adaptive Precision Control (APC), an online adaptation algorithm that dynamically adjusts observation precision using local estimates of cooperative contribution. We evaluate APC on real-world Swiss roundabout trajectory datasets and on a multi-agent control task derived from the same trajectories. Across both settings, APC adapts to changing noise conditions online and achieves performance comparable to the best fixed precision without prior tuning. Our results connect variational inference, cooperative game theory, and adaptive multi-agent coordination, and suggest that precision adaptation can improve robust cooperation under uncertainty.
CVSep 7, 2024
Adaptative Context Normalization: A Boost for Deep Learning in Image ProcessingBilal Faye, Hanane Azzag, Mustapha Lebbah et al.
Deep Neural network learning for image processing faces major challenges related to changes in distribution across layers, which disrupt model convergence and performance. Activation normalization methods, such as Batch Normalization (BN), have revolutionized this field, but they rely on the simplified assumption that data distribution can be modelled by a single Gaussian distribution. To overcome these limitations, Mixture Normalization (MN) introduced an approach based on a Gaussian Mixture Model (GMM), assuming multiple components to model the data. However, this method entails substantial computational requirements associated with the use of Expectation-Maximization algorithm to estimate parameters of each Gaussian components. To address this issue, we introduce Adaptative Context Normalization (ACN), a novel supervised approach that introduces the concept of "context", which groups together a set of data with similar characteristics. Data belonging to the same context are normalized using the same parameters, enabling local representation based on contexts. For each context, the normalized parameters, as the model weights are learned during the backpropagation phase. ACN not only ensures speed, convergence, and superior performance compared to BN and MN but also presents a fresh perspective that underscores its particular efficacy in the field of image processing.
CVMar 14, 2023
Context Normalization Layer with ApplicationsBilal Faye, Mohamed-Djallel Dilmi, Hanane Azzag et al.
Normalization is a pre-processing step that converts the data into a more usable representation. As part of the deep neural networks (DNNs), the batch normalization (BN) technique uses normalization to address the problem of internal covariate shift. It can be packaged as general modules, which have been extensively integrated into various DNNs, to stabilize and accelerate training, presumably leading to improved generalization. However, the effect of BN is dependent on the mini-batch size and it does not take into account any groups or clusters that may exist in the dataset when estimating population statistics. This study proposes a new normalization technique, called context normalization, for image data. This approach adjusts the scaling of features based on the characteristics of each sample, which improves the model's convergence speed and performance by adapting the data values to the context of the target task. The effectiveness of context normalization is demonstrated on various datasets, and its performance is compared to other standard normalization techniques.
MLDec 18, 2023Code
Distributed Collapsed Gibbs Sampler for Dirichlet Process Mixture Models in Federated LearningReda Khoufache, Mustapha Lebbah, Hanene Azzag et al.
Dirichlet Process Mixture Models (DPMMs) are widely used to address clustering problems. Their main advantage lies in their ability to automatically estimate the number of clusters during the inference process through the Bayesian non-parametric framework. However, the inference becomes considerably slow as the dataset size increases. This paper proposes a new distributed Markov Chain Monte Carlo (MCMC) inference method for DPMMs (DisCGS) using sufficient statistics. Our approach uses the collapsed Gibbs sampler and is specifically designed to work on distributed data across independent and heterogeneous machines, which habilitates its use in horizontal federated learning. Our method achieves highly promising results and notable scalability. For instance, with a dataset of 100K data points, the centralized algorithm requires approximately 12 hours to complete 100 iterations while our approach achieves the same number of iterations in just 3 minutes, reducing the execution time by a factor of 200 without compromising clustering performance. The code source is publicly available at https://github.com/redakhoufache/DisCGS.
AIMay 10
A Game Theoretic Free Energy Analysis of Higher Order Synergy in Attention Heads of Large Language ModelsDjamel Bouchaffra
Large language models rely on multihead attention, but interactions among heads remain poorly understood. We apply the Game Theoretic Free Energy Principle (GTFEP): a framework casting multiagent systems as distributed variational inference to analyze attention heads as bounded rational agents. According to GTFEP, each head minimizes its variational free energy, and collective behavior follows a Gibbs distribution over coalition structures whose energy is decomposed into Harsanyi dividends. Using a tractable approximation (uniform prior, deterministic dynamics), coalition free energy reduces to joint Shannon entropy of discretized head outputs (argmax key index). Pairwise dividends become mutual information (nonnegative), while triple dividends correspond to interaction information and can be negative. On BERT, GPT2, and Llama with GSM8K, triple dividends are consistently negative, revealing higher order redundancy. The Nash FEP correspondence guarantees that stationary points of collective free energy are epsilon Nash equilibria; thus, heads with negligible contribution can be pruned with minimal performance loss. Pruning heads with low marginal contribution reduces computational cost with minimal performance loss: for example, pruning 20% of heads in GPT2 reduces FLOPs by 18%, increases throughput by 22%, and raises perplexity only modestly (from 28.4 to 33.4 on GSM8K). Our work shows GTFEP provides a principled foundation for analyzing and optimizing transformer architectures.
AIApr 30
A Collective Variational Principle Unifying Bayesian Inference, Game Theory, and ThermodynamicsDjamel Bouchaffra, Faycal Ykhlef, Mustapha Lebbah et al.
Collective intelligence emerges across biological, physical, and artificial systems without central coordination, yet a unifying principle governing such behaviour remains elusive. The Free Energy Principle explains how individual agents adapt through variational inference, while game theory formalises strategic interactions. Here we introduce the Game-Theoretic Free Energy Principle, a unified framework showing that multi-agent systems performing local free-energy minimisation implicitly implement a stochastic game. We prove that, under bounded rationality and local information constraints, stationary points of collective free energy correspond to approximate Nash equilibria of an induced game. Conversely, a broad class of cooperative games admits a variational representation in which equilibria arise as Gibbs distributions over coalitions, establishing a bridge between Bayesian inference and strategic interaction. To characterise higher-order effects, we introduce a free-energy formulation of the Harsanyi dividend, isolating irreducible multi-agent synergy. This yields a predictive theory of cooperation, including a falsifiable non-monotonic relationship between sensory precision and agent influence. We validate this prediction across neural, biological, and artificial multi-agent systems. These results identify a common variational principle underlying inference, thermodynamics, and game-theoretic equilibrium.
LGMar 25, 2024
Enhancing Neural Network Representations with Prior Knowledge-Based NormalizationBilal Faye, Hanane Azzag, Mustapha Lebbah et al.
Deep learning models face persistent challenges in training, particularly due to internal covariate shift and label shift. While single-mode normalization methods like Batch Normalization partially address these issues, they are constrained by batch size dependencies and limiting distributional assumptions. Multi-mode normalization techniques mitigate these limitations but struggle with computational demands when handling diverse Gaussian distributions. In this paper, we introduce a new approach to multi-mode normalization that leverages prior knowledge to improve neural network representations. Our method organizes data into predefined structures, or "contexts", prior to training and normalizes based on these contexts, with two variants: Context Normalization (CN) and Context Normalization - Extended (CN-X). When contexts are unavailable, we introduce Adaptive Context Normalization (ACN), which dynamically builds contexts in the latent space during training. Across tasks in image classification, domain adaptation, and image generation, our methods demonstrate superior convergence and performance.
LGMar 7, 2024
Lightweight Cross-Modal Representation LearningBilal Faye, Hanane Azzag, Mustapha Lebbah et al.
Low-cost cross-modal representation learning is crucial for deriving semantic representations across diverse modalities such as text, audio, images, and video. Traditional approaches typically depend on large specialized models trained from scratch, requiring extensive datasets and resulting in high resource and time costs. To overcome these challenges, we introduce a novel approach named Lightweight Cross-Modal Representation Learning (LightCRL). This method uses a single neural network titled Deep Fusion Encoder (DFE), which projects data from multiple modalities into a shared latent representation space. This reduces the overall parameter count while still delivering robust performance comparable to more complex systems.
LGOct 16, 2024
Game Theory Meets Statistical Mechanics in Deep Learning DesignDjamel Bouchaffra, Fayçal Ykhlef, Bilal Faye et al.
We present a novel deep graphical representation that seamlessly merges principles of game theory with laws of statistical mechanics. It performs feature extraction, dimensionality reduction, and pattern classification within a single learning framework. Our approach draws an analogy between neurons in a network and players in a game theory model. Furthermore, each neuron viewed as a classical particle (subject to statistical physics' laws) is mapped to a set of actions representing specific activation value, and neural network layers are conceptualized as games in a sequential cooperative game theory setting. The feed-forward process in deep learning is interpreted as a sequential game, where each game comprises a set of players. During training, neurons are iteratively evaluated and filtered based on their contributions to a payoff function, which is quantified using the Shapley value driven by an energy function. Each set of neurons that significantly contributes to the payoff function forms a strong coalition. These neurons are the only ones permitted to propagate the information forward to the next layers. We applied this methodology to the task of facial age estimation and gender classification. Experimental results demonstrate that our approach outperforms both multi-layer perceptron and convolutional neural network models in terms of efficiency and accuracy.