Yihan Zhang

h-index19

12papers

58citations

Novelty52%

AI Score44

Ranked #46,394 of 194,257 authors (top 24%)#10,679 in LG (top 27%)

12 Papers

10.7LGApr 20, 2023

Federated Compositional Deep AUC Maximization

Xinwen Zhang, Yihan Zhang, Tianbao Yang et al.

Federated learning has attracted increasing attention due to the promise of balancing privacy and large-scale learning; numerous approaches have been proposed. However, most existing approaches focus on problems with balanced data, and prediction performance is far from satisfactory for many real-world applications where the number of samples in different classes is highly imbalanced. To address this challenging problem, we developed a novel federated learning method for imbalanced data by directly optimizing the area under curve (AUC) score. In particular, we formulate the AUC maximization problem as a federated compositional minimax optimization problem, develop a local stochastic compositional gradient descent ascent with momentum algorithm, and provide bounds on the computational and communication complexities of our algorithm. To the best of our knowledge, this is the first work to achieve such favorable theoretical results. Finally, extensive experimental results confirm the efficacy of our method.

6.6STNov 21, 2022

Precise Asymptotics for Spectral Methods in Mixed Generalized Linear Models

Yihan Zhang, Marco Mondelli, Ramji Venkataramanan

In a mixed generalized linear model, the goal is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider the prototypical problem of estimating two statistically independent signals in a mixed generalized linear model with Gaussian covariates. Spectral methods are a popular class of estimators which output the top two eigenvectors of a suitable data-dependent matrix. However, despite the wide applicability, their design is still obtained via heuristic considerations, and the number of samples $n$ needed to guarantee recovery is super-linear in the signal dimension $d$. In this paper, we develop exact asymptotics on spectral methods in the challenging proportional regime in which $n, d$ grow large and their ratio converges to a finite constant. This allows us optimize the design of the spectral method, and combine it with a simple linear estimator, to minimize the estimation error. Our characterization exploits a mix of tools from random matrices, free probability and the theory of approximate message passing algorithms. Numerical simulations for mixed linear regression and phase retrieval demonstrate the advantage enabled by our analysis over existing designs of spectral methods.

5.9STAug 28, 2023

Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing

Yihan Zhang, Hong Chang Ji, Ramji Venkataramanan et al.

We consider the problem of parameter estimation in a high-dimensional generalized linear model. Spectral methods obtained via the principal eigenvector of a suitable data-dependent matrix provide a simple yet surprisingly effective solution. However, despite their wide use, a rigorous performance characterization, as well as a principled way to preprocess the data, are available only for unstructured (i.i.d.\ Gaussian and Haar orthogonal) designs. In contrast, real-world data matrices are highly structured and exhibit non-trivial correlations. To address the problem, we consider correlated Gaussian designs capturing the anisotropic nature of the features via a covariance matrix $Σ$. Our main result is a precise asymptotic characterization of the performance of spectral estimators. This allows us to identify the optimal preprocessing that minimizes the number of samples needed for parameter estimation. Surprisingly, such preprocessing is universal across a broad set of designs, which partly addresses a conjecture on optimal spectral estimators for rotationally invariant models. Our principled approach vastly improves upon previous heuristic methods, including for designs common in computational imaging and genetics. The proposed methodology, based on approximate message passing, is broadly applicable and opens the way to the precise characterization of spiked matrices and of the corresponding spectral methods in a variety of settings.

2.0LGApr 24, 2023

Can Decentralized Stochastic Minimax Optimization Algorithms Converge Linearly for Finite-Sum Nonconvex-Nonconcave Problems?

Yihan Zhang, Wenhao Jiang, Feng Zheng et al.

Decentralized minimax optimization has been actively studied in the past few years due to its application in a wide range of machine learning models. However, the current theoretical understanding of its convergence rate is far from satisfactory since existing works only focus on the nonconvex-strongly-concave problem. This motivates us to study decentralized minimax optimization algorithms for the nonconvex-nonconcave problem. To this end, we develop two novel decentralized stochastic variance-reduced gradient descent ascent algorithms for the finite-sum nonconvex-nonconcave problem that satisfies the Polyak-Łojasiewicz (PL) condition. In particular, our theoretical analyses demonstrate how to conduct local updates and perform communication to achieve the linear convergence rate. To the best of our knowledge, this is the first work achieving linear convergence rates for decentralized nonconvex-nonconcave problems. Finally, we verify the performance of our algorithms on both synthetic and real-world datasets. The experimental results confirm the efficacy of our algorithms.

7.9PRApr 29

Sharp One-Dimensional Sub-Gaussian Comparison in Convex Order

Yihan Zhang

We prove that any random variable $X$ whose moment generating function is point-wise upper bounded by that of $ G \sim \mathcal{N}(0,1) $ must be dominated by $ G/\mathbb{E}[|G|] $ in convex order, meaning $ \mathbb{E}[f(X)] \le \mathbb{E}[f(G/\mathbb{E}[|G|])] $ for all convex $f$. Equality is attained by taking $ X \sim \mathrm{Unif}(\{-1,1\}) $ and $ f(x) = |x| $.

5.1ROApr 18

Leveraging VR Robot Games to Facilitate Data Collection for Embodied Intelligence Tasks

Yihan Zhang, Ziyun Huang, Linqi Ye

Collecting embodied interaction data at scale remains costly and difficult due to the limited accessibility of conventional interfaces. We present a gamified data collection framework based on Unity that combines procedural scene generation, VR-based humanoid robot control, automatic task evaluation, and trajectory logging. A trash pick-and-place task prototype is developed to validate the full workflow.Experimental results indicate that the collected demonstrations exhibit broad coverage of the state-action space, and that increasing task difficulty leads to higher motion intensity as well as more extensive exploration of the arm's workspace. The proposed framework demonstrates that game-oriented virtual environments can serve as an effective and extensible solution for embodied data collection.

1.2STJun 6, 2022

Mean Estimation in High-Dimensional Binary Markov Gaussian Mixture Models

Yihan Zhang, Nir Weinberger

We consider a high-dimensional mean estimation problem over a binary hidden Markov model, which illuminates the interplay between memory in data, sample size, dimension, and signal strength in statistical inference. In this model, an estimator observes $n$ samples of a $d$-dimensional parameter vector $θ_{*}\in\mathbb{R}^{d}$, multiplied by a random sign $ S_i $ ($1\le i\le n$), and corrupted by isotropic standard Gaussian noise. The sequence of signs $\{S_{i}\}_{i\in[n]}\in\{-1,1\}^{n}$ is drawn from a stationary homogeneous Markov chain with flip probability $δ\in[0,1/2]$. As $δ$ varies, this model smoothly interpolates two well-studied models: the Gaussian Location Model for which $δ=0$ and the Gaussian Mixture Model for which $δ=1/2$. Assuming that the estimator knows $δ$, we establish a nearly minimax optimal (up to logarithmic factors) estimation error rate, as a function of $\|θ_{*}\|,δ,d,n$. We then provide an upper bound to the case of estimating $δ$, assuming a (possibly inaccurate) knowledge of $θ_{*}$. The bound is proved to be tight when $θ_{*}$ is an accurately known constant. These results are then combined to an algorithm which estimates $θ_{*}$ with $δ$ unknown a priori, and theoretical guarantees on its error are stated.

10.5CVSep 1, 2024

Style Transfer: From Stitching to Neural Networks

Xinhe Xu, Zhuoer Wang, Yihan Zhang et al.

This article compares two style transfer methods in image processing: the traditional method, which synthesizes new images by stitching together small patches from existing images, and a modern machine learning-based approach that uses a segmentation network to isolate foreground objects and apply style transfer solely to the background. The traditional method excels in creating artistic abstractions but can struggle with seamlessness, whereas the machine learning method preserves the integrity of foreground elements while enhancing the background, offering improved aesthetic quality and computational efficiency. Our study indicates that machine learning-based methods are more suited for real-world applications where detail preservation in foreground elements is essential.

7.7LGNov 19, 2023

On the Communication Complexity of Decentralized Bilevel Optimization

Yihan Zhang, My T. Thai, Jie Wu et al.

Stochastic bilevel optimization finds widespread applications in machine learning, including meta-learning, hyperparameter optimization, and neural architecture search. To extend stochastic bilevel optimization to distributed data, several decentralized stochastic bilevel optimization algorithms have been developed. However, existing methods often suffer from slow convergence rates and high communication costs in heterogeneous settings, limiting their applicability to real-world tasks. To address these issues, we propose two novel decentralized stochastic bilevel gradient descent algorithms based on simultaneous and alternating update strategies. Our algorithms can achieve faster convergence rates and lower communication costs than existing methods. Importantly, our convergence analyses do not rely on strong assumptions regarding heterogeneity. More importantly, our theoretical analysis clearly discloses how the additional communication required for estimating hypergradient under the heterogeneous setting affects the convergence rate. To the best of our knowledge, this is the first time such favorable theoretical results have been achieved with mild assumptions in the heterogeneous setting. Furthermore, we demonstrate how to establish the convergence rate for the alternating update strategy when combined with the variance-reduced gradient. Finally, experimental results confirm the efficacy of our algorithms.

4.3STMay 22, 2024

Matrix Denoising with Doubly Heteroscedastic Noise: Fundamental Limits and Optimal Spectral Methods

Yihan Zhang, Marco Mondelli

We study the matrix denoising problem of estimating the singular vectors of a rank-$1$ signal corrupted by noise with both column and row correlations. Existing works are either unable to pinpoint the exact asymptotic estimation error or, when they do so, the resulting approaches (e.g., based on whitening or singular value shrinkage) remain vastly suboptimal. On top of this, most of the literature has focused on the special case of estimating the left singular vector of the signal when the noise only possesses row correlation (one-sided heteroscedasticity). In contrast, our work establishes the information-theoretic and algorithmic limits of matrix denoising with doubly heteroscedastic noise. We characterize the exact asymptotic minimum mean square error, and design a novel spectral estimator with rigorous optimality guarantees: under a technical condition, it attains positive correlation with the signals whenever information-theoretically possible and, for one-sided heteroscedasticity, it also achieves the Bayes-optimal error. Numerical experiments demonstrate the significant advantage of our theoretically principled method with the state of the art. The proofs draw connections with statistical physics and approximate message passing, departing drastically from standard random matrix theory techniques.

19.0MLFeb 3, 2025

Spectral Estimators for Multi-Index Models: Precise Asymptotics and Optimal Weak Recovery

Filip Kovačević, Yihan Zhang, Marco Mondelli

Multi-index models provide a popular framework to investigate the learnability of functions with low-dimensional structure and, also due to their connections with neural networks, they have been object of recent intensive study. In this paper, we focus on recovering the subspace spanned by the signals via spectral estimators -- a family of methods routinely used in practice, often as a warm-start for iterative algorithms. Our main technical contribution is a precise asymptotic characterization of the performance of spectral methods, when sample size and input dimension grow proportionally and the dimension $p$ of the space to recover is fixed. Specifically, we locate the top-$p$ eigenvalues of the spectral matrix and establish the overlaps between the corresponding eigenvectors (which give the spectral estimators) and a basis of the signal subspace. Our analysis unveils a phase transition phenomenon in which, as the sample complexity grows, eigenvalues escape from the bulk of the spectrum and, when that happens, eigenvectors recover directions of the desired subspace. The precise characterization we put forward enables the optimization of the data preprocessing, thus allowing to identify the spectral estimator that requires the minimal sample size for weak recovery.

1.2LGNov 17, 2020

MG-GCN: Fast and Effective Learning with Mix-grained Aggregators for Training Large Graph Convolutional Networks

Tao Huang, Yihan Zhang, Jiajing Wu et al.

Graph convolutional networks (GCNs) have been employed as a kind of significant tool on many graph-based applications recently. Inspired by convolutional neural networks (CNNs), GCNs generate the embeddings of nodes by aggregating the information of their neighbors layer by layer. However, the high computational and memory cost of GCNs due to the recursive neighborhood expansion across GCN layers makes it infeasible for training on large graphs. To tackle this issue, several sampling methods during the process of information aggregation have been proposed to train GCNs in a mini-batch Stochastic Gradient Descent (SGD) manner. Nevertheless, these sampling strategies sometimes bring concerns about insufficient information collection, which may hinder the learning performance in terms of accuracy and convergence. To tackle the dilemma between accuracy and efficiency, we propose to use aggregators with different granularities to gather neighborhood information in different layers. Then, a degree-based sampling strategy, which avoids the exponential complexity, is constructed for sampling a fixed number of nodes. Combining the above two mechanisms, the proposed model, named Mix-grained GCN (MG-GCN) achieves state-of-the-art performance in terms of accuracy, training speed, convergence speed, and memory cost through a comprehensive set of experiments on four commonly used benchmark datasets and a new Ethereum dataset.