Pedro Cisneros-Velarde

LG
h-index10
13papers
41citations
Novelty54%
AI Score47

13 Papers

LGSep 29, 2022
Restricted Strong Convexity of Deep Learning Models with Smooth Activations

Arindam Banerjee, Pedro Cisneros-Velarde, Libin Zhu et al.

We consider the problem of optimization of deep learning models with smooth activation functions. While there exist influential results on the problem from the ``near initialization'' perspective, we shed considerable new light on the problem. In particular, we make two key technical contributions for such models with $L$ layers, $m$ width, and $σ_0^2$ initialization variance. First, for suitable $σ_0^2$, we establish a $O(\frac{\text{poly}(L)}{\sqrt{m}})$ upper bound on the spectral norm of the Hessian of such models, considerably sharpening prior results. Second, we introduce a new analysis of optimization based on Restricted Strong Convexity (RSC) which holds as long as the squared norm of the average gradient of predictors is $Ω(\frac{\text{poly}(L)}{\sqrt{m}})$ for the square loss. We also present results for more general losses. The RSC based analysis does not need the ``near initialization" perspective and guarantees geometric convergence for gradient descent (GD). To the best of our knowledge, ours is the first result on establishing geometric convergence of GD based on RSC for deep learning models, thus becoming an alternative sufficient condition for convergence that does not depend on the widely-used Neural Tangent Kernel (NTK). We share preliminary experimental results supporting our theoretical advances.

AIJun 7, 2022
Discrete State-Action Abstraction via the Successor Representation

Amnon Attali, Pedro Cisneros-Velarde, Marco Morales et al.

While the difficulty of reinforcement learning problems is typically related to the complexity of their state spaces, Abstraction proposes that solutions often lie in simpler underlying latent spaces. Prior works have focused on learning either a continuous or dense abstraction, or require a human to provide one. Information-dense representations capture features irrelevant for solving tasks, and continuous spaces can struggle to represent discrete objects. In this work we automatically learn a sparse discrete abstraction of the underlying environment. We do so using a simple end-to-end trainable model based on the successor representation and max-entropy regularization. We describe an algorithm to apply our model, named Discrete State-Action Abstraction (DSAA), which computes an action abstraction in the form of temporally extended actions, i.e., Options, to transition between discrete abstract states. Empirically, we demonstrate the effects of different exploration schemes on our resulting abstraction, and show that it is efficient for solving downstream tasks.

LGMay 31, 2022
One Policy is Enough: Parallel Exploration with a Single Policy is Near-Optimal for Reward-Free Reinforcement Learning

Pedro Cisneros-Velarde, Boxiang Lyu, Sanmi Koyejo et al.

Although parallelism has been extensively used in reinforcement learning (RL), the quantitative effects of parallel exploration are not well understood theoretically. We study the benefits of simple parallel exploration for reward-free RL in linear Markov decision processes (MDPs) and two-player zero-sum Markov games (MGs). In contrast to the existing literature, which focuses on approaches that encourage agents to explore a diverse set of policies, we show that using a single policy to guide exploration across all agents is sufficient to obtain an almost-linear speedup in all cases compared to their fully sequential counterpart. Furthermore, we demonstrate that this simple procedure is near-minimax optimal in the reward-free setting for linear MDPs. From a practical perspective, our paper shows that a single policy is sufficient and provably near-optimal for incorporating parallelism during the exploration phase.

LGSep 13, 2024
Optimization and Generalization Guarantees for Weight Normalization

Pedro Cisneros-Velarde, Zhijie Chen, Sanmi Koyejo et al.

Weight normalization (WeightNorm) is widely used in practice for the training of deep neural networks and modern deep learning libraries have built-in implementations of it. In this paper, we provide the first theoretical characterizations of both optimization and generalization of deep WeightNorm models with smooth activation functions. For optimization, from the form of the Hessian of the loss, we note that a small Hessian of the predictor leads to a tractable analysis. Thus, we bound the spectral norm of the Hessian of WeightNorm networks and show its dependence on the network width and weight normalization terms--the latter being unique to networks without WeightNorm. Then, we use this bound to establish training convergence guarantees under suitable assumptions for gradient decent. For generalization, we use WeightNorm to get a uniform convergence based generalization bound, which is independent from the width and depends sublinearly on the depth. Finally, we present experimental results which illustrate how the normalization terms and other quantities of theoretical interest relate to the training of WeightNorm networks.

LGMar 1, 2023
Finite-sample Guarantees for Nash Q-learning with Linear Function Approximation

Pedro Cisneros-Velarde, Sanmi Koyejo

Nash Q-learning may be considered one of the first and most known algorithms in multi-agent reinforcement learning (MARL) for learning policies that constitute a Nash equilibrium of an underlying general-sum Markov game. Its original proof provided asymptotic guarantees and was for the tabular case. Recently, finite-sample guarantees have been provided using more modern RL techniques for the tabular case. Our work analyzes Nash Q-learning using linear function approximation -- a representation regime introduced when the state space is large or continuous -- and provides finite-sample guarantees that indicate its sample efficiency. We find that the obtained performance nearly matches an existing efficient result for single-agent RL under the same representation and has a polynomial gap when compared to the best-known result for the tabular case.

CLFeb 5
Can One-sided Arguments Lead to Response Change in Large Language Models?

Pedro Cisneros-Velarde

Polemic questions need more than one viewpoint to express a balanced answer. Large Language Models (LLMs) can provide a balanced answer, but also take a single aligned viewpoint or refuse to answer. In this paper, we study if such initial responses can be steered to a specific viewpoint in a simple and intuitive way: by only providing one-sided arguments supporting the viewpoint. Our systematic study has three dimensions: (i) which stance is induced in the LLM response, (ii) how the polemic question is formulated, (iii) how the arguments are shown. We construct a small dataset and remarkably find that opinion steering occurs across (i)-(iii) for diverse models, number of arguments, and topics. Switching to other arguments consistently decreases opinion steering.

AIOct 10, 2025Code
GTAlign: Game-Theoretic Alignment of LLM Assistants for Social Welfare

Siqi Zhu, David Zhang, Pedro Cisneros-Velarde et al.

Large Language Models (LLMs) have achieved remarkable progress in reasoning, yet sometimes produce responses that are suboptimal for users in tasks such as writing, information seeking, or providing practical guidance. Conventional alignment practices typically assume that maximizing model reward also maximizes user welfare, but this assumption frequently fails in practice: models may over-clarify or generate overly verbose reasoning when users prefer concise answers. Such behaviors resemble the prisoner's dilemma, where individually rational choices lead to socially suboptimal outcomes. The fundamental challenge is the lack of a principled decision making mechanism that mutually benefits both the LLM and the user. We propose Game-Theoretic Alignment (GTAlign), an alignment framework that integrates game-theoretic decision making into both reasoning and training. During reasoning, the model explicitly treats user-LLM interaction as a strategic game: it constructs payoff matrices within its reasoning chain to estimate welfare for both itself and the user, and then selects actions that are mutually beneficial. During training, we introduce a social welfare reward that reinforces cooperative responses, aligning model behavior with socially efficient outcomes. In addition, we introduce an inference technique that leverages game-theoretic reasoning to dynamically adapt LLM's response when pricing policies of LLM service change. Extensive experiments demonstrate that GTAlign substantially improves reasoning efficiency, answer quality, and social welfare compared to baselines across diverse tasks. The code is available at https://github.com/ulab-uiuc/GTAlign .

CLApr 9, 2025
Bypassing Safety Guardrails in LLMs Using Humor

Pedro Cisneros-Velarde

In this paper, we show it is possible to bypass the safety guardrails of large language models (LLMs) through a humorous prompt including the unsafe request. In particular, our method does not edit the unsafe request and follows a fixed template -- it is simple to implement and does not need additional LLMs to craft prompts. Extensive experiments show the effectiveness of our method across different LLMs. We also show that both removing and adding more humor to our method can reduce its effectiveness -- excessive humor possibly distracts the LLM from fulfilling its unsafe request. Thus, we argue that LLM jailbreaking occurs when there is a proper balance between focus on the unsafe request and presence of humor.

CVOct 20, 2025
It Takes Two to Tango: Two Parallel Samplers Improve Quality in Diffusion Models for Limited Steps

Pedro Cisneros-Velarde

We consider the situation where we have a limited number of denoising steps, i.e., of evaluations of a diffusion model. We show that two parallel processors or samplers under such limitation can improve the quality of the sampled image. Particularly, the two samplers make denoising steps at successive times, and their information is appropriately integrated in the latent image. Remarkably, our method is simple both conceptually and to implement: it is plug-&-play, model agnostic, and does not require any additional fine-tuning or external models. We test our method with both automated and human evaluations for different diffusion models. We also show that a naive integration of the information from the two samplers lowers sample quality. Finally, we find that adding more parallel samplers does not necessarily improve sample quality.

LGFeb 2, 2025
Optimization for Neural Operators can Benefit from Width

Pedro Cisneros-Velarde, Bhavesh Shrimali, Arindam Banerjee

Neural Operators that directly learn mappings between function spaces, such as Deep Operator Networks (DONs) and Fourier Neural Operators (FNOs), have received considerable attention. Despite the universal approximation guarantees for DONs and FNOs, there is currently no optimization convergence guarantee for learning such networks using gradient descent (GD). In this paper, we address this open problem by presenting a unified framework for optimization based on GD and applying it to establish convergence guarantees for both DONs and FNOs. In particular, we show that the losses associated with both of these neural operators satisfy two conditions -- restricted strong convexity (RSC) and smoothness -- that guarantee a decrease on their loss values due to GD. Remarkably, these two conditions are satisfied for each neural operator due to different reasons associated with the architectural differences of the respective models. One takeaway that emerges from the theory is that wider networks should lead to better optimization convergence for both DONs and FNOs. We present empirical results on canonical operator learning problems to support our theoretical results.

MAJun 18, 2024
On the Principles behind Opinion Dynamics in Multi-Agent Systems of Large Language Models

Pedro Cisneros-Velarde

We study the evolution of opinions inside a population of interacting large language models (LLMs). Every LLM needs to decide how much funding to allocate to an item with three initial possibilities: full, partial, or no funding. We identify biases that drive the exchange of opinions based on the LLM's tendency to find consensus with the other LLM's opinion, display caution when specifying funding, and consider ethical concerns in its opinion. We find these biases are affected by the perceived absence of compelling reasons for opinion change, the perceived willingness to engage in discussion, and the distribution of allocation values. Moreover, tensions among biases can lead to the survival of funding for items with negative connotations. We also find that the final distribution of full, partial, and no funding opinions is more diverse when an LLM freely forms its opinion after an interaction than when its opinion is a multiple-choice selection among the three allocation options. In the latter case, consensus is mostly attained. When agents are aware of past opinions, they seek to maintain consistency with them, changing the opinion dynamics. Our study is performed using Llama 3 and Mistral LLMs.

OCMay 18, 2021
A Contraction Theory Approach to Optimization Algorithms from Acceleration Flows

Pedro Cisneros-Velarde, Francesco Bullo

Much recent interest has focused on the design of optimization algorithms from the discretization of an associated optimization flow, i.e., a system of differential equations (ODEs) whose trajectories solve an associated optimization problem. Such a design approach poses an important problem: how to find a principled methodology to design and discretize appropriate ODEs. This paper aims to provide a solution to this problem through the use of contraction theory. We first introduce general mathematical results that explain how contraction theory guarantees the stability of the implicit and explicit Euler integration methods. Then, we propose a novel system of ODEs, namely the Accelerated-Contracting-Nesterov flow, and use contraction theory to establish it is an optimization flow with exponential convergence rate, from which the linear convergence rate of its associated optimization algorithm is immediately established. Remarkably, a simple explicit Euler discretization of this flow corresponds to the Nesterov acceleration method. Finally, we present how our approach leads to performance guarantees in the design of optimization algorithms for time-varying optimization problems.

MLMay 22, 2019
Distributionally Robust Formulation and Model Selection for the Graphical Lasso

Pedro Cisneros-Velarde, Sang-Yun Oh, Alexander Petersen

Building on a recent framework for distributionally robust optimization, we consider estimation of the inverse covariance matrix for multivariate data. We provide a novel notion of a Wasserstein ambiguity set specifically tailored to this estimation problem, leading to a tractable class of regularized estimators. Special cases include penalized likelihood estimators for Gaussian data, specifically the graphical lasso estimator. As a consequence of this formulation, the radius of the Wasserstein ambiguity set is directly related to the regularization parameter in the estimation problem. Using this relationship, the level of robustness of the estimation procedure can be shown to correspond to the level of confidence with which the ambiguity set contains a distribution with the population covariance. Furthermore, a unique feature of our formulation is that the radius can be expressed in closed-form as a function of the ordinary sample covariance matrix. Taking advantage of this finding, we develop a simple algorithm to determine a regularization parameter for graphical lasso, using only the bootstrapped sample covariance matrices, meaning that computationally expensive repeated evaluation of the graphical lasso algorithm is not necessary. Alternatively, the distributionally robust formulation can also quantify the robustness of the corresponding estimator if one uses an off-the-shelf method such as cross-validation. Finally, we numerically study the obtained regularization criterion and analyze the robustness of other automated tuning procedures used in practice.