Pradeep Kr. Banerjee

LG
h-index: 14
11 papers
235 citations
Novelty: 43%
AI Score: 44

11 Papers

LG · Aug 6, 2022
Oversquashing in GNNs through the lens of information contraction and graph expansion

Pradeep Kr. Banerjee, Kedar Karhadkar, Yu Guang Wang et al. · cmu

The quality of signal propagation in message-passing graph neural networks (GNNs) strongly influences their expressivity, as recent works have observed. In particular, for prediction tasks relying on long-range interactions, recursive aggregation of node features can lead to an undesired phenomenon called "oversquashing". We present a framework for analyzing oversquashing based on information contraction. Our analysis is guided by a model of reliable computation due to von Neumann that lends a new insight into oversquashing as signal quenching in noisy computation graphs. Building on this, we propose a graph rewiring algorithm aimed at alleviating oversquashing. Our algorithm employs a random local edge flip primitive motivated by an expander graph construction. We compare the spectral expansion properties of our algorithm with those of an existing curvature-based non-local rewiring strategy. Synthetic experiments show that while our algorithm in general has a slower rate of expansion, it is overall computationally cheaper, preserves the node degrees exactly and never disconnects the graph.
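To make the "random local edge flip" idea concrete, here is a minimal sketch of one common degree-preserving flip: pick an edge (u, v), a neighbor i of u and a neighbor j of v, then swap the edges (i, u), (j, v) for (i, v), (j, u). This is an illustration under stated assumptions, not necessarily the exact primitive used in the paper; the function names `local_edge_flip` and `is_connected` and the cycle-graph example are hypothetical.

```python
import random

def local_edge_flip(adj, rng):
    """Try one random local flip around a random edge (u, v).

    adj maps each node to the set of its neighbors (simple undirected graph).
    Returns True if a flip was applied, False if no valid flip existed.
    """
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    u, v = rng.choice(edges)
    # Candidates are chosen so that the new edges create no multi-edges.
    cand_i = [i for i in adj[u] if i != v and i not in adj[v]]
    cand_j = [j for j in adj[v] if j != u and j not in adj[u]]
    if not cand_i or not cand_j:
        return False
    i, j = rng.choice(cand_i), rng.choice(cand_j)
    # Remove (i, u) and (j, v) ...
    adj[u].remove(i); adj[i].remove(u)
    adj[v].remove(j); adj[j].remove(v)
    # ... and add (i, v) and (j, u). The edge (u, v) itself is kept,
    # so every node keeps its degree and the graph stays connected.
    adj[i].add(v); adj[v].add(i)
    adj[j].add(u); adj[u].add(j)
    return True

def is_connected(adj):
    """Depth-first search connectivity check."""
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(adj)

# Hypothetical example: repeated flips on a 10-node cycle graph.
n = 10
adj = {k: {(k - 1) % n, (k + 1) % n} for k in range(n)}
rng = random.Random(0)
flips = sum(local_edge_flip(adj, rng) for _ in range(50))
degrees = [len(adj[k]) for k in adj]
```

Because the flip only re-routes two edges through the retained edge (u, v), the two invariants mentioned in the abstract (exact degree preservation and connectivity) hold by construction in this sketch.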

LG · Oct 21, 2022
FoSR: First-order spectral rewiring for addressing oversquashing in GNNs

Kedar Karhadkar, Pradeep Kr. Banerjee, Guido Montúfar

Graph neural networks (GNNs) are able to leverage the structure of graph data by passing messages along the edges of the graph. While this allows GNNs to learn features depending on the graph structure, for certain graph topologies it leads to inefficient information propagation and a problem known as oversquashing. This has recently been linked with the curvature and spectral gap of the graph. On the other hand, adding edges to the message-passing graph can lead to increasingly similar node representations and a problem known as oversmoothing. We propose a computationally efficient algorithm that prevents oversquashing by systematically adding edges to the graph based on spectral expansion. We combine this with a relational architecture, which lets the GNN preserve the original graph structure and provably prevents oversmoothing. We find experimentally that our algorithm outperforms existing graph rewiring methods in several graph classification tasks.

LG · Feb 6
Dynamics-Aligned Shared Hypernetworks for Zero-Shot Actuator Inversion

Jan Benad, Pradeep Kr. Banerjee, Frank Röder et al.

Zero-shot generalization in contextual reinforcement learning remains a core challenge, particularly when the context is latent and must be inferred from data. A canonical failure mode is actuator inversion, where identical actions produce opposite physical effects under a latent binary context. We propose DMA*-SH, a framework where a single hypernetwork, trained solely via dynamics prediction, generates a small set of adapter weights shared across the dynamics model, policy, and action-value function. This shared modulation imparts an inductive bias matched to actuator inversion, while input/output normalization and random input masking stabilize context inference, promoting directionally concentrated representations. We provide theoretical support via an expressivity separation result for hypernetwork modulation, and a variance decomposition with policy-gradient variance bounds that formalize how within-mode compression improves learning under actuator inversion. For evaluation, we introduce the Actuator Inversion Benchmark (AIB), a suite of environments designed to isolate discontinuous context-to-dynamics interactions. On AIB's held-out actuator-inversion tasks, DMA*-SH achieves zero-shot generalization, outperforming domain randomization by 111.8% and surpassing a standard context-aware baseline by 16.1%.

LG · Feb 19
Learning a Latent Pulse Shape Interface for Photoinjector Laser Systems

Alexander Klemps, Denis Ilia, Pradeep Kr. Banerjee et al.

Controlling the longitudinal laser pulse shape in photoinjectors of Free-Electron Lasers is a powerful lever for optimizing electron beam quality, but systematic exploration of the vast design space is limited by the cost of brute-force pulse propagation simulations. We present a generative modeling framework based on Wasserstein Autoencoders to learn a differentiable latent interface between pulse shaping and downstream beam dynamics. Our empirical findings show that the learned latent space is continuous and interpretable while maintaining high-fidelity reconstructions. Pulse families such as higher-order Gaussians trace coherent trajectories, while standardizing the temporal pulse lengths shows a latent organization correlated with pulse energy. Analysis via principal components and Gaussian Mixture Models reveals a well-behaved latent geometry, enabling smooth transitions between distinct pulse types via linear interpolation. The model generalizes from simulated data to real experimental pulse measurements, accurately reconstructing pulses and embedding them consistently into the learned manifold. Overall, the approach reduces reliance on expensive pulse-propagation simulations and facilitates downstream beam dynamics simulation and analysis.

LG · Aug 27, 2025
Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization

Frank Röder, Jan Benad, Manfred Eppe et al.

Real-world reinforcement learning demands adaptation to unseen environmental conditions without costly retraining. Contextual Markov Decision Processes (cMDP) model this challenge, but existing methods often require explicit context variables (e.g., friction, gravity), limiting their use when contexts are latent or hard to measure. We introduce Dynamics-Aligned Latent Imagination (DALI), a framework integrated within the Dreamer architecture that infers latent context representations from agent-environment interactions. By training a self-supervised encoder to predict forward dynamics, DALI generates actionable representations conditioning the world model and policy, bridging perception and control. We theoretically prove this encoder is essential for efficient context inference and robust generalization. DALI's latent space enables counterfactual consistency: Perturbing a gravity-encoding dimension alters imagined rollouts in physically plausible ways. On challenging cMDP benchmarks, DALI achieves significant gains over context-unaware baselines, often surpassing context-aware baselines in extrapolation tasks, enabling zero-shot generalization to unseen contextual variations.

LG · Oct 23, 2021
Learning curves for Gaussian process regression with power-law priors and targets

Hui Jin, Pradeep Kr. Banerjee, Guido Montúfar

We characterize the power-law asymptotics of learning curves for Gaussian process regression (GPR) under the assumption that the eigenspectrum of the prior and the eigenexpansion coefficients of the target function follow a power law. Under similar assumptions, we leverage the equivalence between GPR and kernel ridge regression (KRR) to characterize the generalization error of KRR. Infinitely wide neural networks can be related to GPR with respect to the neural network GP kernel and the neural tangent kernel, which in several cases is known to have a power-law spectrum. Hence our methods can be applied to study the generalization error of infinitely wide neural networks. We present toy experiments demonstrating the theory.
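The GPR/KRR equivalence invoked above can be stated in one line. The notation here is an assumption for illustration and not taken from the paper: $k$ is the kernel, $X = (x_1, \dots, x_n)$ the training inputs, $y$ the vector of targets, $K = k(X, X)$ the Gram matrix, $\sigma^2$ the Gaussian noise variance, $\lambda$ the ridge parameter, and $\mathcal{H}$ the reproducing kernel Hilbert space of $k$.

$$
\bar{f}_{\mathrm{GPR}}(x) = k(x, X)\,\bigl(K + \sigma^2 I\bigr)^{-1} y,
\qquad
\hat{f}_{\mathrm{KRR}} = \arg\min_{f \in \mathcal{H}} \sum_{i=1}^{n} \bigl(f(x_i) - y_i\bigr)^2 + \lambda \|f\|_{\mathcal{H}}^2
\;\Longrightarrow\;
\hat{f}_{\mathrm{KRR}}(x) = k(x, X)\,\bigl(K + \lambda I\bigr)^{-1} y .
$$

Under this convention the GP posterior mean and the KRR predictor coincide when $\lambda = \sigma^2$, which is why a learning-curve result for one carries over to the other.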

LG · May 4, 2021
Information Complexity and Generalization Bounds

Pradeep Kr. Banerjee, Guido Montúfar

We present a unifying picture of PAC-Bayesian and mutual information-based upper bounds on the generalization error of randomized learning algorithms. As we show, Tong Zhang's information exponential inequality (IEI) gives a general recipe for constructing bounds of both flavors. We show that several important results in the literature can be obtained as simple corollaries of the IEI under different assumptions on the loss function. Moreover, we obtain new bounds for data-dependent priors and unbounded loss functions. Optimizing the bounds gives rise to variants of the Gibbs algorithm, for which we discuss two practical examples for learning with neural networks, namely, Entropy-SGD and PAC-Bayes-SGD. Further, we use an Occam's factor argument to show a PAC-Bayesian bound that incorporates second-order curvature information of the training loss.

IT · Jan 23, 2019
Unique Information and Secret Key Decompositions

Johannes Rauh, Pradeep Kr. Banerjee, Eckehard Olbrich et al.

The unique information ($UI$) is an information measure that quantifies a deviation from the Blackwell order. We have recently shown that this quantity is an upper bound on the one-way secret key rate. In this paper, we prove a triangle inequality for the $UI$, which implies that the $UI$ is never greater than one of the best known upper bounds on the two-way secret key rate. We conjecture that the $UI$ lower bounds the two-way rate and discuss implications of the conjecture.

IT · Oct 27, 2018
The Variational Deficiency Bottleneck

Pradeep Kr. Banerjee, Guido Montúfar

We introduce a bottleneck method for learning data representations based on information deficiency, rather than the more traditional information sufficiency. A variational upper bound allows us to implement this method efficiently. The bound itself is bounded above by the variational information bottleneck objective, and the two methods coincide in the regime of single-shot Monte Carlo approximations. The notion of deficiency provides a principled way of approximating complicated channels by relatively simpler ones. We show that the deficiency of one channel with respect to another has an operational interpretation in terms of the optimal risk gap of decision problems, capturing classification as a special case. Experiments demonstrate that the deficiency bottleneck can provide advantages in terms of minimal sufficiency as measured by information bottleneck curves, while retaining robust test performance in classification tasks.

IT · Jul 13, 2018
Unique Informations and Deficiencies

Pradeep Kr. Banerjee, Eckehard Olbrich, Jürgen Jost et al.

Given two channels that convey information about the same random variable, we introduce two measures of the unique information of one channel with respect to the other. The two quantities are based on the notion of generalized weighted Le Cam deficiencies and differ on whether one channel can approximate the other by a randomization at either its input or output. We relate the proposed quantities to an existing measure of unique information which we call the minimum-synergy unique information. We give an operational interpretation of the latter in terms of an upper bound on the one-way secret key rate and discuss the role of the unique informations in the context of nonnegative mutual information decompositions into unique, redundant and synergistic components.

SD · Apr 30, 2015
Noise Sensitivity of Teager-Kaiser Energy Operators and Their Ratios

Pradeep Kr. Banerjee, Nirmal B. Chakrabarti

The Teager-Kaiser energy operator (TKO) belongs to a class of autocorrelators and their linear combinations that can track the instantaneous energy of a nonstationary sinusoidal signal source. TKO-based monocomponent AM-FM demodulation algorithms work under the basic assumption that the operator outputs are always positive. In the absence of noise, this is assured for pure sinusoidal inputs and the instantaneous property is also guaranteed. Noise invalidates both of these, particularly under small signal conditions. Post-detection filtering and thresholding are of use to reestablish these at the cost of some time to acquire. Key questions are: (a) how many samples must one use and (b) how much noise power at the detector input can one tolerate. Results of a study of the role of delay and the limits imposed by additive Gaussian noise are presented along with the computation of the cumulants and probability density functions of the individual quadratic forms and their ratios.
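As a small illustration of the noise-free claim above, the sketch below applies the standard discrete Teager-Kaiser operator, psi[n] = x[n]^2 - x[n-1] x[n+1], to a pure sinusoid A cos(omega n + phase). For such an input the output is the constant A^2 sin^2(omega), hence always positive. The function name `teager_kaiser` and the signal parameters are hypothetical choices for this example, not values from the paper.

```python
import math

def teager_kaiser(x):
    """Discrete Teager-Kaiser operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output has two fewer samples than the input because the operator
    needs one neighbor on each side.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Hypothetical noise-free test signal: A * cos(omega * n + phase).
amplitude, omega, phase = 2.0, 0.3, 0.7
signal = [amplitude * math.cos(omega * n + phase) for n in range(64)]

energy = teager_kaiser(signal)

# Closed form for a pure sinusoid: A^2 * sin(omega)^2 at every sample.
expected = amplitude ** 2 * math.sin(omega) ** 2
```

Adding noise to `signal` before calling `teager_kaiser` is an easy way to see the positivity assumption fail at low signal-to-noise ratios, which is the regime the abstract analyzes.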