Pravin Nair

h-index3

8papers

116citations

Novelty53%

AI Score55

Ranked #26,577 of 201,326 authors (top 13%)#11,063 in CV (top 19%)

8 Papers

63.7CVMay 16Code

CAB: Accelerating Flow and Diffusion Sampling via Rectification and Corrected Adams-Bashforth

Anuska Roy, Pravin Nair

Flow and diffusion models achieve high-fidelity, high-resolution image synthesis, but often require many function evaluations (NFEs) at sampling time. Existing acceleration methods either require additional training through distillation or rely on training-free high-order solvers, and both can degrade sample quality at low NFE budgets. We propose CAB (Corrected Adams-Bashforth), a training-free sampler that accelerates both flow and diffusion models. CAB first transforms the sampling dynamics to a common rectified coordinate system, and then applies a multistep Adams-Bashforth predictor augmented with a simple correction term based on past velocity evaluations and therefore incurs no additional NFEs. The resulting method is simple, has the same algorithmic form across model classes, and has at least third-order local truncation error and second-order global error. Experiments on pretrained flow and diffusion models, including class-conditional and large-scale text-to-image benchmarks, show that CAB improves quality-NFE trade-offs in the low-step regime of 6-20 NFEs. It also remains competitive with strong training-free samplers at higher step counts across most tested models. The official implementation is available at https://github.com/Anuska-Roy/CAB.

21.2CVMar 27Code

Provably Contractive and High-Quality Denoisers for Convergent Restoration

Shubhi Shukla, Pravin Nair

Image restoration, the recovery of clean images from degraded measurements, has applications in various domains like surveillance, defense, and medical imaging. Despite achieving state-of-the-art (SOTA) restoration performance, existing convolutional and attention-based networks lack stability guarantees under minor shifts in input, exposing a robustness accuracy trade-off. We develop provably contractive (global Lipschitz $< 1$) denoiser networks that considerably reduce this gap. Our design composes proximal layers obtained from unfolding techniques, with Lipschitz-controlled convolutional refinements. By contractivity, our denoiser guarantees that input perturbations of strength $\|Î´\|\le\varepsilon$ induce at most $\varepsilon$ change at the output, while strong baselines such as DnCNN and Restormer can exhibit larger deviations under the same perturbations. On image denoising, the proposed model is competitive with unconstrained SOTA denoisers, reporting the tightest gap for a provably 1-Lipschitz model and establishing that such gaps are indeed achievable by contractive denoisers. Moreover, the proposed denoisers act as strong regularizers for image restoration that provably effect convergence in Plug-and-Play algorithms. Our results show that enforcing strict Lipschitz control does not inherently degrade output quality, challenging a common assumption in the literature and moving the field toward verifiable and stable vision models. Codes and pretrained models are available at https://github.com/SHUBHI1553/Contractive-Denoisers

LGOct 27, 2025

Softmax is $1/2$-Lipschitz: A tight bound across all $\ell_p$ norms

Pravin Nair

The softmax function is a basic operator in machine learning and optimization, used in classification, attention mechanisms, reinforcement learning, game theory, and problems involving log-sum-exp terms. Existing robustness guarantees of learning models and convergence analysis of optimization algorithms typically consider the softmax operator to have a Lipschitz constant of $1$ with respect to the $\ell_2$ norm. In this work, we prove that the softmax function is contractive with the Lipschitz constant $1/2$, uniformly across all $\ell_p$ norms with $p \ge 1$. We also show that the local Lipschitz constant of softmax attains $1/2$ for $p = 1$ and $p = \infty$, and for $p \in (1,\infty)$, the constant remains strictly below $1/2$ and the supremum $1/2$ is achieved only in the limit. To our knowledge, this is the first comprehensive norm-uniform analysis of softmax Lipschitz continuity. We demonstrate how the sharper constant directly improves a range of existing theoretical results on robustness and convergence. We further validate the sharpness of the $1/2$ Lipschitz constant of the softmax operator through empirical studies on attention-based architectures (ViT, GPT-2, Qwen3-8B) and on stochastic policies in reinforcement learning.

IVJun 10, 2025

Plug-and-Play Linear Attention for Pre-trained Image and Video Restoration Models

Srinivasan Kidambi, Pravin Nair

Multi-head self-attention (MHSA) has become a core component in modern computer vision models. However, its quadratic complexity with respect to input length poses a significant computational bottleneck in real-time and resource constrained environments. We propose PnP-Nystra, a Nyström based linear approximation of self-attention, developed as a plug-and-play (PnP) module that can be integrated into the pre-trained image and video restoration models without retraining. As a drop-in replacement for MHSA, PnP-Nystra enables efficient acceleration in various window-based transformer architectures, including SwinIR, Uformer, and RVRT. Our experiments across diverse image and video restoration tasks, including denoising, deblurring, and super-resolution, demonstrate that PnP-Nystra achieves a 2-4x speed-up on an NVIDIA RTX 4090 GPU and a 2-5x speed-up on CPU inference. Despite these significant gains, the method incurs a maximum PSNR drop of only 1.5 dB across all evaluated tasks. To the best of our knowledge, we are the first to demonstrate a linear attention functioning as a training-free substitute for MHSA in restoration models.

CVMar 28, 2025

Detecting Localized Deepfake Manipulations Using Action Unit-Guided Video Representations

Tharun Anand, Siva Sankar Sajeev, Pravin Nair

With rapid advancements in generative modeling, deepfake techniques are increasingly narrowing the gap between real and synthetic videos, raising serious privacy and security concerns. Beyond traditional face swapping and reenactment, an emerging trend in recent state-of-the-art deepfake generation methods involves localized edits such as subtle manipulations of specific facial features like raising eyebrows, altering eye shapes, or modifying mouth expressions. These fine-grained manipulations pose a significant challenge for existing detection models, which struggle to capture such localized variations. To the best of our knowledge, this work presents the first detection approach explicitly designed to generalize to localized edits in deepfake videos by leveraging spatiotemporal representations guided by facial action units. Our method leverages a cross-attention-based fusion of representations learned from pretext tasks like random masking and action unit detection, to create an embedding that effectively encodes subtle, localized changes. Comprehensive evaluations across multiple deepfake generation methods demonstrate that our approach, despite being trained solely on the traditional FF+ dataset, sets a new benchmark in detecting recent deepfake-generated videos with fine-grained local edits, achieving a $20\%$ improvement in accuracy over current state-of-the-art detection methods. Additionally, our method delivers competitive performance on standard datasets, highlighting its robustness and generalization across diverse types of local and global forgeries.

OCApr 21, 2021

Fixed-Point and Objective Convergence of Plug-and-Play Algorithms

Pravin Nair, Ruturaj G. Gavaskar, Kunal N. Chaudhury

A standard model for image reconstruction involves the minimization of a data-fidelity term along with a regularizer, where the optimization is performed using proximal algorithms such as ISTA and ADMM. In plug-and-play (PnP) regularization, the proximal operator (associated with the regularizer) in ISTA and ADMM is replaced by a powerful image denoiser. Although PnP regularization works surprisingly well in practice, its theoretical convergence -- whether convergence of the PnP iterates is guaranteed and if they minimize some objective function -- is not completely understood even for simple linear denoisers such as nonlocal means. In particular, while there are works where either iterate or objective convergence is established separately, a simultaneous guarantee on iterate and objective convergence is not available for any denoiser to our knowledge. In this paper, we establish both forms of convergence for a special class of linear denoisers. Notably, unlike existing works where the focus is on symmetric denoisers, our analysis covers non-symmetric denoisers such as nonlocal means and almost any convex data-fidelity. The novelty in this regard is that we make use of the convergence theory of averaged operators and we work with a special inner product (and norm) derived from the linear denoiser; the latter requires us to appropriately define the gradient and proximal operators associated with the data-fidelity term. We validate our convergence results using image reconstruction experiments.

CVJan 18, 2019

Fast High-Dimensional Kernel Filtering

Pravin Nair, Kunal N. Chaudhury

The bilateral and nonlocal means filters are instances of kernel-based filters that are popularly used in image processing. It was recently shown that fast and accurate bilateral filtering of grayscale images can be performed using a low-rank approximation of the kernel matrix. More specifically, based on the eigendecomposition of the kernel matrix, the overall filtering was approximated using spatial convolutions, for which efficient algorithms are available. Unfortunately, this technique cannot be scaled to high-dimensional data such as color and hyperspectral images. This is simply because one needs to compute/store a large matrix and perform its eigendecomposition in this case. We show how this problem can be solved using the Nyström method, which is generally used for approximating the eigendecomposition of large matrices. The resulting algorithm can also be used for nonlocal means filtering. We demonstrate the effectiveness of our proposal for bilateral and nonlocal means filtering of color and hyperspectral images. In particular, our method is shown to be competitive with state-of-the-art fast algorithms, and moreover it comes with a theoretical guarantee on the approximation error.

CVNov 6, 2018

Fast High-Dimensional Bilateral and Nonlocal Means Filtering

Pravin Nair, Kunal. N. Chaudhury

Existing fast algorithms for bilateral and nonlocal means filtering mostly work with grayscale images. They cannot easily be extended to high-dimensional data such as color and hyperspectral images, patch-based data, flow-fields, etc. In this paper, we propose a fast algorithm for high-dimensional bilateral and nonlocal means filtering. Unlike existing approaches, where the focus is on approximating the data (using quantization) or the filter kernel (via analytic expansions), we locally approximate the kernel using weighted and shifted copies of a Gaussian, where the weights and shifts are inferred from the data. The algorithm emerging from the proposed approximation essentially involves clustering and fast convolutions, and is easy to implement. Moreover, a variant of our algorithm comes with a guarantee (bound) on the approximation error, which is not enjoyed by existing algorithms. We present some results for high-dimensional bilateral and nonlocal means filtering to demonstrate the speed and accuracy of our proposal. Moreover, we also show that our algorithm can outperform state-of-the-art fast approximations in terms of accuracy and timing.