Yingxue Zhou

10papers

505citations

Novelty53%

AI Score30

Ranked #145,538 of 201,326 authors (top 72%)#32,043 in LG (top 75%)

10 Papers

IRAug 28, 2023

RecMind: Large Language Model Powered Agent For Recommendation

Yancheng Wang, Ziyan Jiang, Zheng Chen et al. · amazon-science

While the recommendation system (RS) has advanced significantly through deep learning, current RS approaches usually train and fine-tune models on task-specific datasets, limiting their generalizability to new recommendation tasks and their ability to leverage external knowledge due to model scale and data size constraints. Thus, we designed an LLM-powered autonomous recommender agent, RecMind, which is capable of leveraging external knowledge, utilizing tools with careful planning to provide zero-shot personalized recommendations. We propose a Self-Inspiring algorithm to improve the planning ability. At each intermediate step, the LLM self-inspires to consider all previously explored states to plan for the next step. This mechanism greatly improves the model's ability to comprehend and utilize historical information in planning for recommendation. We evaluate RecMind's performance in various recommendation scenarios. Our experiment shows that RecMind outperforms existing zero/few-shot LLM-based recommendation baseline methods in various tasks and achieves comparable performance to a fully trained recommendation model P5.

LGJun 11, 2024

Loss Gradient Gaussian Width based Generalization and Optimization Guarantees

Arindam Banerjee, Qiaobo Li, Yingxue Zhou

Generalization and optimization guarantees on the population loss often rely on uniform convergence based analysis, typically based on the Rademacher complexity of the predictors. The rich representation power of modern models has led to concerns about this approach. In this paper, we present generalization and optimization guarantees in terms of the complexity of the gradients, as measured by the Loss Gradient Gaussian Width (LGGW). First, we introduce generalization guarantees directly in terms of the LGGW under a flexible gradient domination condition, which includes the popular PL (Polyak-Łojasiewicz) condition as a special case. Second, we show that sample reuse in iterative gradient descent does not make the empirical gradients deviate from the population gradients as long as the LGGW is small. Third, focusing on deep networks, we bound their single-sample LGGW in terms of the Gaussian width of the featurizer, i.e., the output of the last-but-one layer. To our knowledge, our generalization and optimization guarantees in terms of LGGW are the first results of its kind, and hold considerable promise towards quantitatively tight bounds for deep models.

LGJan 9, 2022

Stability Based Generalization Bounds for Exponential Family Langevin Dynamics

Arindam Banerjee, Tiancong Chen, Xinyan Li et al.

Recent years have seen advances in generalization bounds for noisy stochastic algorithms, especially stochastic gradient Langevin dynamics (SGLD) based on stability (Mou et al., 2018; Li et al., 2020) and information theoretic approaches (Xu and Raginsky, 2017; Negrea et al., 2019; Steinke and Zakynthinou, 2020). In this paper, we unify and substantially generalize stability based generalization bounds and make three technical contributions. First, we bound the generalization error in terms of expected (not uniform) stability which arguably leads to quantitatively sharper bounds. Second, as our main contribution, we introduce Exponential Family Langevin Dynamics (EFLD), a substantial generalization of SGLD, which includes noisy versions of Sign-SGD and quantized SGD as special cases. We establish data-dependent expected stability based generalization bounds for any EFLD algorithm with a O(1/n) sample dependence and dependence on gradient discrepancy rather than the norm of gradients, yielding significantly sharper bounds. Third, we establish optimization guarantees for special cases of EFLD. Further, empirical results on benchmarks illustrate that our bounds are non-vacuous, quantitatively sharper than existing bounds, and behave correctly under noisy labels.

LGFeb 26, 2021

Noisy Truncated SGD: Optimization and Generalization

Yingxue Zhou, Xinyan Li, Arindam Banerjee

Recent empirical work on stochastic gradient descent (SGD) applied to over-parameterized deep learning has shown that most gradient components over epochs are quite small. Inspired by such observations, we rigorously study properties of Truncated SGD (T-SGD), that truncates the majority of small gradient components to zeros. Considering non-convex optimization problems, we show that the convergence rate of T-SGD matches the order of vanilla SGD. We also establish the generalization error bound for T-SGD. Further, we propose Noisy Truncated SGD (NT-SGD), which adds Gaussian noise to the truncated gradients. We prove that NT-SGD has the same convergence rate as T-SGD for non-convex optimization problems. We demonstrate that with the help of noise, NT-SGD can provably escape from saddle points and requires less noise compared to previous related work. We also prove that NT-SGD achieves better generalization error bound compared to T-SGD because of the noise. Our generalization analysis is based on uniform stability and we show that additional noise in the gradient update can boost the stability. Our experiments on a variety of benchmark datasets (MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100) with various networks (VGG and ResNet) validate the theoretical properties of NT-SGD, i.e., NT-SGD matches the speed and accuracy of vanilla SGD while effectively working with sparse gradients, and can successfully escape poor local minima.

LGJul 7, 2020

Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification

Yingxue Zhou, Zhiwei Steven Wu, Arindam Banerjee

Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM). Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model. Such dependence can be problematic for over-parameterized models where $p \gg n$, the number of training samples. Existing lower bounds on private ERM show that such dependence on $p$ is inevitable in the worst case. In this paper, we circumvent the dependence on the ambient dimension by leveraging a low-dimensional structure of gradient space in deep networks -- that is, the stochastic gradients for deep nets usually stay in a low dimensional subspace in the training process. We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace, which is given by the top gradient eigenspace on a small public dataset. We provide a general sample complexity analysis on the public dataset for the gradient subspace identification problem and demonstrate that under certain low-dimensional assumptions the public sample complexity only grows logarithmically in $p$. Finally, we provide a theoretical analysis and empirical evaluations to show that our method can substantially improve the accuracy of DP-SGD in the high privacy regime (corresponding to low privacy loss $ε$).

LGJun 24, 2020

Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds

Yingxue Zhou, Xiangyi Chen, Mingyi Hong et al.

We study differentially private (DP) algorithms for stochastic non-convex optimization. In this problem, the goal is to minimize the population loss over a $p$-dimensional space given $n$ i.i.d. samples drawn from a distribution. We improve upon the population gradient bound of ${\sqrt{p}}/{\sqrt{n}}$ from prior work and obtain a sharper rate of $\sqrt[4]{p}/\sqrt{n}$. We obtain this rate by providing the first analyses on a collection of private gradient-based methods, including adaptive algorithms DP RMSProp and DP Adam. Our proof technique leverages the connection between differential privacy and adaptive data analysis to bound gradient estimation error at every iterate, which circumvents the worse generalization bound from the standard uniform convergence argument. Finally, we evaluate the proposed algorithms on two popular deep learning tasks and demonstrate the empirical advantages of DP adaptive gradient methods over standard DP SGD.

LGFeb 23, 2020

De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and Non-smooth Predictors

Arindam Banerjee, Tiancong Chen, Yingxue Zhou

In spite of several notable efforts, explaining the generalization of deterministic non-smooth deep nets, e.g., ReLU-nets, has remained challenging. Existing approaches for deterministic non-smooth deep nets typically need to bound the Lipschitz constant of such deep nets but such bounds are quite large, may even increase with the training set size yielding vacuous generalization bounds. In this paper, we present a new family of de-randomized PAC-Bayes margin bounds for deterministic non-convex and non-smooth predictors, e.g., ReLU-nets. Unlike PAC-Bayes, which applies to Bayesian predictors, the de-randomized bounds apply to deterministic predictors like ReLU-nets. A specific instantiation of the bound depends on a trade-off between the (weighted) distance of the trained weights from the initialization and the effective curvature (`flatness') of the trained predictor. To get to these bounds, we first develop a de-randomization argument for non-convex but smooth predictors, e.g., linear deep networks (LDNs), which connects the performance of the deterministic predictor with a Bayesian predictor. We then consider non-smooth predictors which for any given input realized as a smooth predictor, e.g., ReLU-nets become some LDNs for any given input, but the realized smooth predictors can be different for different inputs. For such non-smooth predictors, we introduce a new PAC-Bayes analysis which takes advantage of the smoothness of the realized predictors, e.g., LDN, for a given input, and avoids dependency on the Lipschitz constant of the non-smooth predictor. After careful de-randomization, we get a bound for the deterministic non-smooth predictor. We also establish non-uniform sample complexity results based on such bounds. Finally, we present extensive empirical results of our bounds over changing training set size and randomness in labels.

LGJul 24, 2019

Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization

Xinyan Li, Qilong Gu, Yingxue Zhou et al.

While stochastic gradient descent (SGD) and variants have been surprisingly successful for training deep nets, several aspects of the optimization dynamics and generalization are still not well understood. In this paper, we present new empirical observations and theoretical results on both the optimization dynamics and generalization behavior of SGD for deep nets based on the Hessian of the training loss and associated quantities. We consider three specific research questions: (1) what is the relationship between the Hessian of the loss and the second moment of stochastic gradients (SGs)? (2) how can we characterize the stochastic optimization dynamics of SGD with fixed and adaptive step sizes and diagonal pre-conditioning based on the first and second moments of SGs? and (3) how can we characterize a scale-invariant generalization bound of deep nets based on the Hessian of the loss, which by itself is not scale invariant? We shed light on these three questions using theoretical results supported by extensive empirical observations, with experiments on synthetic data, MNIST, and CIFAR-10, with different batch sizes, and with different difficulty levels by synthetically adding random labels.

DCFeb 21, 2016

Distributed Private Online Learning for Social Big Data Computing over Data Center Networks

Chencheng Li, Pan Zhou, Yingxue Zhou et al.

With the rapid growth of Internet technologies, cloud computing and social networks have become ubiquitous. An increasing number of people participate in social networks and massive online social data are obtained. In order to exploit knowledge from copious amounts of data obtained and predict social behavior of users, we urge to realize data mining in social networks. Almost all online websites use cloud services to effectively process the large scale of social data, which are gathered from distributed data centers. These data are so large-scale, high-dimension and widely distributed that we propose a distributed sparse online algorithm to handle them. Additionally, privacy-protection is an important point in social networks. We should not compromise the privacy of individuals in networks, while these social data are being learned for data mining. Thus we also consider the privacy problem in this article. Our simulations shows that the appropriate sparsity of data would enhance the performance of our algorithm and the privacy-preserving method does not significantly hurt the performance of the proposed algorithm.

LGSep 1, 2015

Differentially Private Online Learning for Cloud-Based Video Recommendation with Multimedia Big Data in Social Networks

Pan Zhou, Yingxue Zhou, Dapeng Wu et al.

With the rapid growth in multimedia services and the enormous offers of video contents in online social networks, users have difficulty in obtaining their interests. Therefore, various personalized recommendation systems have been proposed. However, they ignore that the accelerated proliferation of social media data has led to the big data era, which has greatly impeded the process of video recommendation. In addition, none of them has considered both the privacy of users' contexts (e,g., social status, ages and hobbies) and video service vendors' repositories, which are extremely sensitive and of significant commercial value. To handle the problems, we propose a cloud-assisted differentially private video recommendation system based on distributed online learning. In our framework, service vendors are modeled as distributed cooperative learners, recommending videos according to user's context, while simultaneously adapting the video-selection strategy based on user-click feedback to maximize total user clicks (reward). Considering the sparsity and heterogeneity of big social media data, we also propose a novel geometric differentially private model, which can greatly reduce the performance (recommendation accuracy) loss. Our simulation shows the proposed algorithms outperform other existing methods and keep a delicate balance between computing accuracy and privacy preserving level.