LGMay 15
On the Convergence and Straightness of Rectified FlowVansh Bansal, Saptarshi Roy, Purnamrita Sarkar et al.
Flow Matching has become a cornerstone of modern generative models like Stable Diffusion 3, largely due to the efficiency of its Rectified Flow (RF) variant. The success of RF hinges on iteratively learning straight trajectories, pushing generation towards fewer sampling steps. However, the theoretical link between path geometry and sampling efficiency has been underexplored. This paper fills this gap by introducing a novel \textit{Piecewise Straightness} parameter, $γ_{2,T}$. We establish the first Wasserstein convergence bound that explicitly links the discretization error of \textit{any} general flow-model to $γ_{2,T}$, proving that minimizing curvature is the key to achieving high-fidelity, one-step sampling. Building on this theory, we establish the first theoretical framework to analyze the straightness of RF. We begin by offering intuitive geometric arguments for simple cases before identifying sufficient conditions under which a single rectification step (1-RF) yields a perfectly straight or even a Monge optimal coupling. While whether these sufficient conditions are met depends on the problem geometry, they enable the first concrete proofs in this area. Critically, fulfilling these conditions makes the subsequent flow (2-RF) perfectly straight ($γ_{2,T}=0$). This eliminates the discretization error in our bound and makes flawless, single-step sampling possible.
MLNov 11, 2022
Thompson Sampling for High-Dimensional Sparse Linear Contextual BanditsSunrit Chakraborty, Saptarshi Roy, Ambuj Tewari
We consider the stochastic linear contextual bandit problem with high-dimensional features. We analyze the Thompson sampling algorithm using special classes of sparsity-inducing priors (e.g., spike-and-slab) to model the unknown parameter and provide a nearly optimal upper bound on the expected cumulative regret. To the best of our knowledge, this is the first work that provides theoretical guarantees of Thompson sampling in high-dimensional and sparse contextual bandits. For faster computation, we use variational inference instead of Markov Chain Monte Carlo (MCMC) to approximate the posterior distribution. Extensive simulations demonstrate the improved performance of our proposed algorithm over existing ones.
MLOct 11, 2023
On the Computational Complexity of Private High-dimensional Model SelectionSaptarshi Roy, Zehua Wang, Ambuj Tewari
We consider the problem of model selection in a high-dimensional sparse linear regression model under privacy constraints. We propose a differentially private (DP) best subset selection method with strong statistical utility properties by adopting the well-known exponential mechanism for selecting the best model. To achieve computational expediency, we propose an efficient Metropolis-Hastings algorithm and under certain regularity conditions, we establish that it enjoys polynomial mixing time to its stationary distribution. As a result, we also establish both approximate differential privacy and statistical utility for the estimates of the mixed Metropolis-Hastings chain. Finally, we perform some illustrative experiments on simulated data showing that our algorithm can quickly identify active features under reasonable privacy budget constraints.
LGJun 7, 2022
How does overparametrization affect performance on minority groups?Subha Maity, Saptarshi Roy, Songkai Xue et al.
The benefits of overparameterization for the overall performance of modern machine learning (ML) models are well known. However, the effect of overparameterization at a more granular level of data subgroups is less understood. Recent empirical studies demonstrate encouraging results: (i) when groups are not known, overparameterized models trained with empirical risk minimization (ERM) perform better on minority groups; (ii) when groups are known, ERM on data subsampled to equalize group sizes yields state-of-the-art worst-group-accuracy in the overparameterized regime. In this paper, we complement these empirical studies with a theoretical investigation of the risk of overparameterized random feature models on minority groups. In a setting in which the regression functions for the majority and minority groups are different, we show that overparameterization always improves minority group performance.
STJan 9, 2023
On Consistency and Asymptotic Normality of Least Absolute Deviation Estimators for 2-dimensional Sinusoidal ModelSaptarshi Roy, Amit Mitra, N K Archak
Estimation of the parameters of a 2-dimensional sinusoidal model is a fundamental problem in digital signal processing and time series analysis. In this paper, we propose a robust least absolute deviation (LAD) estimators for parameter estimation. The proposed methodology provides a robust alternative to non-robust estimation techniques like the least squares estimators, in situations where outliers are present in the data or in the presence of heavy tailed noise. We study important asymptotic properties of the LAD estimators and establish the strong consistency and asymptotic normality of the LAD estimators of the signal parameters of a 2-dimensional sinusoidal model. We further illustrate the advantage of using LAD estimators over least squares estimators through extensive simulation studies. Data analysis of a 2-dimensional texture data indicates practical applicability of the proposed LAD approach.
LGOct 19, 2024Code
On the Wasserstein Convergence and Straightness of Rectified FlowVansh Bansal, Saptarshi Roy, Purnamrita Sarkar et al.
Diffusion models have emerged as a powerful tool for image generation and denoising. Typically, generative models learn a trajectory between the starting noise distribution and the target data distribution. Recently Liu et al. (2023b) proposed Rectified Flow (RF), a generative model that aims to learn straight flow trajectories from noise to data using a sequence of convex optimization problems with close ties to optimal transport. If the trajectory is curved, one must use many Euler discretization steps or novel strategies, such as exponential integrators, to achieve a satisfactory generation quality. In contrast, RF has been shown to theoretically straighten the trajectory through successive rectifications, reducing the number of function evaluations (NFEs) while sampling. It has also been shown empirically that RF may improve the straightness in two rectifications if one can solve the underlying optimization problem within a sufficiently small error. In this paper, we make two contributions. First, we provide a theoretical analysis of the Wasserstein distance between the sampling distribution of RF and the target distribution. Our error rate is characterized by the number of discretization steps and a novel formulation of straightness stronger than that in the original work. Secondly, we present general conditions guaranteeing uniqueness and straightness of 1-RF, which is in line with previous empirical findings. As a byproduct of our analysis, we show that, in one dimension, RF started at the standard Gaussian distribution yields the Monge map. Additionally, we also present empirical results on both simulated and real datasets to validate our theoretical findings. The code is available at https://github.com/bansal-vansh/rectified-flow.
LGAug 20, 2024
Feature Selection from Differentially Private CorrelationsRyan Swope, Amol Khanna, Philip Doldo et al.
Data scientists often seek to identify the most important features in high-dimensional datasets. This can be done through $L_1$-regularized regression, but this can become inefficient for very high-dimensional datasets. Additionally, high-dimensional regression can leak information about individual datapoints in a dataset. In this paper, we empirically evaluate the established baseline method for feature selection with differential privacy, the two-stage selection technique, and show that it is not stable under sparsity. This makes it perform poorly on real-world datasets, so we consider a different approach to private feature selection. We employ a correlations-based order statistic to choose important features from a dataset and privatize them to ensure that the results do not leak information about individual datapoints. We find that our method significantly outperforms the established baseline for private feature selection on many datasets.
MLMay 22, 2024
FLIPHAT: Joint Differential Privacy for High Dimensional Sparse Linear BanditsSunrit Chakraborty, Saptarshi Roy, Debabrota Basu
High dimensional sparse linear bandits serve as an efficient model for sequential decision-making problems (e.g. personalized medicine), where high dimensional features (e.g. genomic data) on the users are available, but only a small subset of them are relevant. Motivated by data privacy concerns in these applications, we study the joint differentially private high dimensional sparse linear bandits, where both rewards and contexts are considered as private data. First, to quantify the cost of privacy, we derive a lower bound on the regret achievable in this setting. To further address the problem, we design a computationally efficient bandit algorithm, \textbf{F}orgetfu\textbf{L} \textbf{I}terative \textbf{P}rivate \textbf{HA}rd \textbf{T}hresholding (FLIPHAT). Along with doubling of episodes and episodic forgetting, FLIPHAT deploys a variant of Noisy Iterative Hard Thresholding (N-IHT) algorithm as a sparse linear regression oracle to ensure both privacy and regret-optimality. We show that FLIPHAT achieves optimal regret in terms of privacy parameters $ε, δ$, context dimension $d$, and time horizon $T$ up to a linear factor in model sparsity and logarithmic factor in $d$. We analyze the regret by providing a novel refined analysis of the estimation error of N-IHT, which is of parallel interest.
MLJan 21
Low-Dimensional Adaptation of Rectified Flow: A New Perspective through the Lens of Diffusion and Stochastic LocalizationSaptarshi Roy, Alessandro Rinaldo, Purnamrita Sarkar
In recent years, Rectified flow (RF) has gained considerable popularity largely due to its generation efficiency and state-of-the-art performance. In this paper, we investigate the degree to which RF automatically adapts to the intrinsic low dimensionality of the support of the target distribution to accelerate sampling. We show that, using a carefully designed choice of the time-discretization scheme and with sufficiently accurate drift estimates, the RF sampler enjoys an iteration complexity of order $O(k/\varepsilon)$ (up to log factors), where $\varepsilon$ is the precision in total variation distance and $k$ is the intrinsic dimension of the target distribution. In addition, we show that the denoising diffusion probabilistic model (DDPM) procedure is equivalent to a stochastic version of RF by establishing a novel connection between these processes and stochastic localization. Building on this connection, we further design a stochastic RF sampler that also adapts to the low-dimensionality of the target distribution under milder requirements on the accuracy of the drift estimates, and also with a specific time schedule. We illustrate with simulations on the synthetic data and text-to-image data experiments the improved performance of the proposed samplers implementing the newly designed time-discretization schedules.