Yuefeng Han

ML
h-index 4
10 papers
29 citations
Novelty 64%
AI Score 55

10 Papers

45.3 · ML · May 19
Factor Augmented High-Dimensional SGD

Shubo Li, Yuefeng Han, Xiufan Yu

Stochastic gradient descent (SGD) is a fundamental optimization algorithm widely used in modern machine learning. In this paper, we propose Factor-Augmented SGD (FSGD), a new optimization method that leverages latent factor representations in high-dimensional learning tasks. A key novelty of FSGD is that, unlike standard two-stage dimension reduction approaches that rely on offline representation learning and full data storage, it operates purely on streaming data, which makes it scalable to large-scale and high-dimensional problems. Furthermore, we establish the first theoretical framework that explicitly incorporates latent factor estimation error into the analysis of SGD, and we prove moment convergence in the $\ell^s$ norm under decaying step sizes and mini-batch updates. Our results provide a new foundation for employing SGD reliably and scalably in high-dimensional machine learning systems.
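To make the streaming idea concrete, here is a minimal Python sketch under loudly stated assumptions: the latent factor space is tracked with Oja's rule (a stand-in, not necessarily the paper's factor estimator), the downstream task is least-squares regression on the estimated factors, and the names `fsgd_sketch` and `make_batch` as well as all step-size values are hypothetical. Each mini-batch is used once and then discarded, and the SGD step size decays as $\eta_0/(t+1)^{\alpha}$.

```python
import numpy as np

def fsgd_sketch(batches, p, k, eta_w=0.01, eta0=0.01, alpha=0.51, seed=0):
    """Hypothetical streaming factor-augmented SGD (illustration only).

    batches yields mini-batches (X, y) with X of shape (b, p) and y of shape (b,).
    W tracks a k-dimensional factor subspace with Oja's rule (a stand-in for the
    paper's factor estimator); beta is fit by SGD on the estimated factors X @ W
    with decaying step sizes eta0 / (t + 1) ** alpha. No batch is stored.
    """
    rng = np.random.default_rng(seed)
    W, _ = np.linalg.qr(rng.standard_normal((p, k)))
    beta = np.zeros(k)
    for t, (X, y) in enumerate(batches):
        b = X.shape[0]
        # Oja step with the mini-batch second-moment matrix, then re-orthonormalize.
        Q, R = np.linalg.qr(W + eta_w * X.T @ (X @ W) / b)
        W = Q * np.where(np.diag(R) < 0, -1.0, 1.0)  # undo QR sign flips
        # One SGD step on the squared loss using the current factor estimates.
        G = X @ W
        beta -= eta0 / (t + 1) ** alpha * (G.T @ (G @ beta - y) / b)
    return W, beta

# Simulated stream: x = Lam f + noise, y = f @ theta + small noise.
rng = np.random.default_rng(1)
p, k, b, T = 50, 2, 32, 500
Lam = rng.standard_normal((p, k))
theta = np.array([1.0, -2.0])

def make_batch(n):
    F = rng.standard_normal((n, k))
    return F @ Lam.T + rng.standard_normal((n, p)), F @ theta + 0.1 * rng.standard_normal(n)

W, beta = fsgd_sketch((make_batch(b) for _ in range(T)), p, k)
X_test, y_test = make_batch(2000)
mse = np.mean((X_test @ W @ beta - y_test) ** 2)
```

Fixing the QR signs keeps the factor basis from flipping between steps, so the SGD coefficients stay attached to a stable coordinate system.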

ML · Oct 18, 2024
High-Dimensional Tensor Discriminant Analysis with Incomplete Tensors

Elynn Chen, Yuefeng Han, Jiayu Li

Tensor classification is gaining importance across fields, yet handling partially observed data remains challenging. In this paper, we introduce a novel approach to tensor classification with incomplete data, framed within high-dimensional tensor linear discriminant analysis. Specifically, we consider a high-dimensional tensor predictor with missing observations under the Missing Completely at Random (MCR) assumption and employ the Tensor Gaussian Mixture Model (TGMM) to capture the relationship between the tensor predictor and class label. We propose a Tensor Linear Discriminant Analysis with Missing Data (Tensor LDA-MD) algorithm, which manages high-dimensional tensor predictors with missing entries by leveraging the decomposable low-rank structure of the discriminant tensor. Our work establishes convergence rates for the estimation error of the discriminant tensor with incomplete data and minimax optimal bounds for the misclassification rate, addressing key gaps in the literature. Additionally, we derive large deviation bounds for the generalized mode-wise sample covariance matrix and its inverse, which are crucial tools in our analysis and hold independent interest. Our method demonstrates excellent performance in simulations and real data analysis, even with significant proportions of missing data.
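As background for the classification step, the sketch below (hypothetical names `matrix_lda_rule`, `ar1`, `sample`) shows the plug-in Fisher discriminant rule for the order-2 (matrix) case of a Gaussian mixture with separable covariance, assuming the class means and mode-wise covariances are known. It does not implement Tensor LDA-MD: estimating the discriminant tensor from incomplete data under a low-rank structure is the part the paper addresses and is not reproduced here.

```python
import numpy as np

def matrix_lda_rule(X, M1, M2, S1, S2, pi1=0.5, pi2=0.5):
    """Fisher discriminant rule for two matrix-normal classes with known parameters.

    Both classes share row covariance S1 and column covariance S2 (vec-covariance
    kron(S2, S1)). Returns 2 if <X - (M1 + M2) / 2, B> + log(pi2 / pi1) > 0, else 1.
    """
    B = np.linalg.solve(S1, M2 - M1) @ np.linalg.inv(S2)  # S1^{-1} (M2 - M1) S2^{-1}
    score = np.sum((X - (M1 + M2) / 2) * B) + np.log(pi2 / pi1)
    return 2 if score > 0 else 1

def ar1(n, rho):
    """AR(1) correlation matrix with entries rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

rng = np.random.default_rng(0)
S1, S2 = ar1(4, 0.5), ar1(3, 0.3)
M1, M2 = np.zeros((4, 3)), 2.0 * np.ones((4, 3))
L1, L2 = np.linalg.cholesky(S1), np.linalg.cholesky(S2)

def sample(M):
    # X = M + L1 Z L2^T has vec-covariance kron(S2, S1).
    return M + L1 @ rng.standard_normal(M.shape) @ L2.T

labels = rng.integers(1, 3, size=2000)
preds = np.array([matrix_lda_rule(sample(M1 if c == 1 else M2), M1, M2, S1, S2)
                  for c in labels])
accuracy = np.mean(preds == labels)
```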

59.3 · ST · Mar 27
Detection Is Harder Than Estimation in Certain Regimes: Inference for Moment and Cumulant Tensors

Runshi Tang, Yuefeng Han, Anru R. Zhang

We study estimation and detection of high-order moment and cumulant tensors from $n$ i.i.d. observations of a $p$-dimensional random vector, with performance measured in tensor spectral norm. On the statistical side, we prove that under sub-Gaussianity, the minimax rate for estimating the order-$d$ moment and cumulant tensors is $\sqrt{p/n}\wedge 1$. In contrast to covariance estimation, the sample moment tensor is generally no longer rate-optimal for higher-order moments. We therefore develop an estimator that attains the minimax rate up to logarithmic factors through a convex feasibility formulation over an $\varepsilon$-net of the unit sphere. On the computational side, we study the problem of testing whether the $d$-th order cumulant tensor vanishes after whitening. Using the low-degree polynomial framework, we provide evidence that detection is computationally hard when $n\ll p^{d/2}$. At the same time, we identify a regime in which an efficiently computable estimator attains error smaller than the separation at which low-degree tests can reliably distinguish the null from the alternative. This yields the striking conclusion that computationally efficient detection can be harder than computationally efficient estimation, revealing an unusual reverse detection-estimation gap: in a broad regime, computationally efficient estimation is possible at a smaller scale than computationally efficient detection. This phenomenon arises because the computational difficulty is driven not only by the statistical model, but also by the loss function itself: tensor spectral norm is NP-hard to compute. This feature makes the open problems we pose on computational lower bounds for estimation qualitatively different from those in the existing literature. Our results therefore uncover a new kind of computational-statistical gap.
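The quantities involved can be illustrated for $d = 3$. The sketch below (hypothetical function names) forms the sample third-order cumulant, which equals the third central moment, and lower-bounds its tensor spectral norm $\max_{\|u\|=1} |T(u,u,u)|$ with randomly restarted symmetric power iterations. Because computing this norm is NP-hard, the heuristic only certifies a lower bound; it is neither the paper's minimax estimator (a convex feasibility program over an $\varepsilon$-net) nor a low-degree test.

```python
import numpy as np

def third_cumulant(X):
    """Sample third-order cumulant of the rows of X (equal to the third central moment)."""
    Xc = X - X.mean(axis=0)
    return np.einsum("ni,nj,nk->ijk", Xc, Xc, Xc) / X.shape[0]

def spectral_norm_lower_bound(T, n_starts=20, n_iter=50, seed=0):
    """Lower bound on max_{|u|=1} |T(u, u, u)|, the spectral norm of a symmetric order-3 tensor.

    Every candidate u is a unit vector, so the returned value never exceeds the true norm;
    the symmetric power iterations are only a heuristic for finding a good u.
    """
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_starts):
        u = rng.standard_normal(T.shape[0])
        u /= np.linalg.norm(u)
        for _ in range(n_iter):
            best = max(best, abs(float(np.einsum("ijk,i,j,k->", T, u, u, u))))
            g = np.einsum("ijk,j,k->i", T, u, u)  # T(., u, u)
            if np.linalg.norm(g) == 0:
                break
            u = g / np.linalg.norm(g)
    return best

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
C = third_cumulant(X)
est = spectral_norm_lower_bound(C)
```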

55.6 · ST · Apr 1
A General Framework for Computational Lower Bounds in Nontrivial Norm Approximation

Runshi Tang, Yuefeng Han, Anru R. Zhang

In this note, we propose a general framework for proving computational lower bounds in norm approximation by leveraging a reverse detection-estimation gap. The starting point is a testing problem together with an estimator whose error is significantly smaller than the corresponding computational detection threshold. We show that such a gap yields a lower bound on the approximation distortion achievable by any algorithm in the underlying computational class. In this way, reverse detection-estimation gaps can be turned into a general mechanism for certifying the hardness of approximating nontrivial norms. We apply this framework to the spectral norm of order-$d$ symmetric tensors in $\mathbb{R}^{p^d}$. Using a recently established low-degree hardness result for detecting nonzero high-order cumulant tensors, together with an efficiently computable estimator whose error is below the low-degree detection threshold, we prove that any degree-$D$ low-degree algorithm with $D \le c_d(\log p)^2$ must incur distortion at least $p^{d/4-1/2}/\operatorname{polylog}(p)$ for the tensor spectral norm. Under the low-degree conjecture, the same conclusion extends to all polynomial-time algorithms. In several important settings, this lower bound matches the best known upper bounds up to polylogarithmic factors, suggesting that the exponent $d/4-1/2$ captures a genuine computational barrier. Our results provide evidence that the difficulty of approximating the tensor spectral norm is not merely an artifact of existing techniques, but reflects a broader computational barrier.

LG · Dec 13, 2025
High-Dimensional Tensor Discriminant Analysis: Low-Rank Discriminant Structure, Representation Synergy, and Theoretical Guarantees

Elynn Chen, Yuefeng Han, Jiayu Li

High-dimensional tensor-valued predictors arise in modern applications, increasingly as learned representations from neural networks. Existing tensor classification methods rely on sparsity or Tucker structures and often lack theoretical guarantees. Motivated by empirical evidence that discriminative signals concentrate along a few multilinear components, we introduce CP low-rank structure for the discriminant tensor, a modeling perspective not previously explored. Under a Tensor Gaussian Mixture Model, we propose high-dimensional CP low-rank Tensor Discriminant Analysis (CP-TDA), which combines a Randomized Composite PCA (rc-PCA) initialization with an iterative refinement algorithm; the rc-PCA initialization is essential for handling dependent and anisotropic noise under weaker signal strength and incoherence conditions. We establish global convergence and minimax-optimal misclassification rates. To handle tensor data deviating from tensor normality, we develop the first semiparametric tensor discriminant model, in which learned tensor representations are mapped via deep generative models into a latent space tailored for CP-TDA. Misclassification risk decomposes into representation, approximation, and estimation errors. Numerical studies and real data analysis on graph classification demonstrate substantial gains over existing tensor classifiers and state-of-the-art graph neural networks, particularly in high-dimensional, small-sample regimes.

ML · Oct 1, 2025
Guaranteed Noisy CP Tensor Recovery via Riemannian Optimization on the Segre Manifold

Ke Xu, Yuefeng Han

Recovering a low-CP-rank tensor from noisy linear measurements is a central challenge in high-dimensional data analysis, with applications spanning tensor PCA, tensor regression, and beyond. We exploit the intrinsic geometry of rank-one tensors by casting the recovery task as an optimization problem over the Segre manifold, the smooth Riemannian manifold of rank-one tensors. This geometric viewpoint yields two powerful algorithms: Riemannian Gradient Descent (RGD) and Riemannian Gauss-Newton (RGN), each of which preserves feasibility at every iteration. Under mild noise assumptions, we prove that RGD converges at a local linear rate, while RGN exhibits an initial local quadratic convergence phase that transitions to a linear rate as the iterates approach the statistical noise floor. Extensive synthetic experiments validate these convergence guarantees and demonstrate the practical effectiveness of our methods.
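A minimal sketch of the RGD iteration, assuming an order-3 tensor, a Gaussian measurement operator `A`, spectral initialization, and the truncated HOSVD to multilinear rank (1, 1, 1) as the retraction; these choices and the helper names are ours and may differ from the paper's. Each step projects the Euclidean gradient onto the tangent space of the Segre manifold at the current iterate $\lambda\, a \circ b \circ c$, which is spanned by $a \circ b \circ c$ and the tensors $a' \circ b \circ c$, $a \circ b' \circ c$, $a \circ b \circ c'$ with $a' \perp a$, $b' \perp b$, $c' \perp c$, and then retracts back to a rank-one tensor, so every iterate stays feasible.

```python
import numpy as np

def outer3(a, b, c):
    return np.einsum("i,j,k->ijk", a, b, c)

def rank1_retract(Z):
    """Truncated HOSVD to multilinear rank (1, 1, 1); returns (lam, a, b, c)."""
    a, b, c = (np.linalg.svd(np.moveaxis(Z, mode, 0).reshape(Z.shape[mode], -1),
                             full_matrices=False)[0][:, 0] for mode in range(3))
    return np.einsum("ijk,i,j,k->", Z, a, b, c), a, b, c

def tangent_project(G, a, b, c):
    """Orthogonal projection of G onto the Segre tangent space at lam * outer(a, b, c), unit a, b, c."""
    ga = np.einsum("ijk,j,k->i", G, b, c)
    gb = np.einsum("ijk,i,k->j", G, a, c)
    gc = np.einsum("ijk,i,j->k", G, a, b)
    g0 = ga @ a  # <G, outer(a, b, c)>
    return (g0 * outer3(a, b, c) + outer3(ga - g0 * a, b, c)
            + outer3(a, gb - g0 * b, c) + outer3(a, b, gc - g0 * c))

def rgd_rank1(A, y, shape, step=1.0, n_iter=100):
    """Riemannian gradient descent for f(X) = ||A vec(X) - y||^2 / (2m) over rank-one X."""
    m = A.shape[0]
    lam, a, b, c = rank1_retract((A.T @ y / m).reshape(shape))  # spectral initialization
    for _ in range(n_iter):
        X = lam * outer3(a, b, c)
        G = (A.T @ (A @ X.ravel() - y) / m).reshape(shape)  # Euclidean gradient
        lam, a, b, c = rank1_retract(X - step * tangent_project(G, a, b, c))
    return lam * outer3(a, b, c)

rng = np.random.default_rng(0)
shape, m = (6, 6, 6), 1500
a0, b0, c0 = (v / np.linalg.norm(v) for v in rng.standard_normal((3, 6)))
X_true = 5.0 * outer3(a0, b0, c0)
A = rng.standard_normal((m, X_true.size))
y = A @ X_true.ravel() + 0.01 * rng.standard_normal(m)
X_hat = rgd_rank1(A, y, shape)
rel_err = np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
```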

CL · Aug 6, 2025
Factor Augmented Supervised Learning with Text Embeddings

Zhanye Luo, Yuefeng Han, Xiufan Yu

Large language models (LLMs) generate text embeddings from text data, producing vector representations that capture the semantic meaning and contextual relationships of words. However, the high dimensionality of these embeddings often impedes efficiency and drives up computational cost in downstream tasks. To address this, we propose AutoEncoder-Augmented Learning with Text (AEALT), a supervised, factor-augmented framework that incorporates dimension reduction directly into pre-trained LLM workflows. First, we extract embeddings from text documents; next, we pass them through a supervised augmented autoencoder to learn low-dimensional, task-relevant latent factors. By modeling the nonlinear structure of complex embeddings, AEALT outperforms conventional deep-learning approaches that rely on raw embeddings. We validate its broad applicability with extensive experiments on classification, anomaly detection, and prediction tasks using multiple real-world public datasets. Numerical results demonstrate that AEALT yields substantial gains over both vanilla embeddings and several standard dimension reduction methods.

ML · Aug 5, 2025
Supervised Dynamic Dimension Reduction with Deep Neural Network

Zhanye Luo, Yuefeng Han, Xiufan Yu

This paper studies the problem of dimension reduction, tailored to improving time series forecasting with high-dimensional predictors. We propose a novel Supervised Deep Dynamic Principal component analysis (SDDP) framework that incorporates the target variable and lagged observations into the factor extraction process. Assisted by a temporal neural network, we construct target-aware predictors by scaling the original predictors in a supervised manner, with larger weights assigned to predictors with stronger forecasting power. A principal component analysis is then performed on the target-aware predictors to extract the estimated SDDP factors. This supervised factor extraction not only improves predictive accuracy in the downstream forecasting task but also yields more interpretable and target-specific latent factors. Building upon SDDP, we propose a factor-augmented nonlinear dynamic forecasting model that unifies a broad family of factor-model-based forecasting approaches. To further demonstrate the broader applicability of SDDP, we extend our study to the more challenging scenario in which the predictors are only partially observed. We validate the empirical performance of the proposed method on several real-world public datasets. The results show that our algorithm achieves notable improvements in forecasting accuracy compared to state-of-the-art methods.
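The scale-then-PCA step can be sketched as follows. As an explicit simplification, each predictor is weighted by the absolute value of its univariate OLS slope for the target $h$ steps ahead, rather than by the temporal neural network used in SDDP; the function name and the simulated data are hypothetical. In the toy data, an irrelevant factor dominates the predictor variance, so plain PCA recovers it first, while the supervised weighting recovers the factor that drives the target.

```python
import numpy as np

def supervised_pca_factors(X, y, k, h=1):
    """Scale-then-PCA sketch (hypothetical simplification of SDDP).

    X: (T, p) predictors, y: (T,) target. Each predictor is weighted by the absolute
    OLS slope of y[t + h] on X[t, j]; PCA on the reweighted predictors gives k factors.
    """
    Xc = X - X.mean(axis=0)
    Xl, yl = Xc[:-h], y[h:] - y[h:].mean()  # pair predictors at t with the target at t + h
    slopes = Xl.T @ yl / np.sum(Xl ** 2, axis=0)
    Xs = Xc * np.abs(slopes)  # target-aware predictors
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ Vt[:k].T  # (T, k) factor estimates

rng = np.random.default_rng(0)
T, p = 500, 40
f1, f2 = rng.standard_normal(T), rng.standard_normal(T)
# 20 predictors load on f1; 20 load more strongly on the irrelevant factor f2.
X = np.hstack([f1[:, None] + rng.standard_normal((T, 20)),
               3.0 * f2[:, None] + rng.standard_normal((T, 20))])
y = np.zeros(T)
y[1:] = f1[:-1] + 0.5 * rng.standard_normal(T - 1)  # one-step-ahead target depends on f1
sddp = supervised_pca_factors(X, y, k=1)[:, 0]
Xc = X - X.mean(axis=0)
pca = Xc @ np.linalg.svd(Xc, full_matrices=False)[2][0]  # unsupervised first PC scores
```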

LG · Oct 27, 2024
TEAFormers: TEnsor-Augmented Transformers for Multi-Dimensional Time Series Forecasting

Linghang Kong, Elynn Chen, Yuzhou Chen et al.

Multi-dimensional time series data, such as matrix and tensor-variate time series, are increasingly prevalent in fields such as economics, finance, and climate science. Traditional Transformer models, though adept with sequential data, do not effectively preserve these multi-dimensional structures, as their internal operations in effect flatten multi-dimensional observations into vectors, thereby losing critical multi-dimensional relationships and patterns. To address this, we introduce the Tensor-Augmented Transformer (TEAFormer), a novel method that incorporates tensor expansion and compression within the Transformer framework to maintain and leverage the inherent multi-dimensional structures, thus reducing computational costs and improving prediction accuracy. The core feature of the TEAFormer, the Tensor-Augmentation (TEA) module, utilizes tensor expansion to enhance multi-view feature learning and tensor compression for efficient information aggregation and reduced computational load. The TEA module is not just a specific model architecture but a versatile component that is highly compatible with the attention mechanism and the encoder-decoder structure of Transformers, making it adaptable to existing Transformer architectures. Our comprehensive experiments, which integrate the TEA module into three popular time series Transformer models across three real-world benchmarks, show significant performance enhancements, highlighting the potential of TEAFormers for cutting-edge time series forecasting.

ML · Aug 10, 2021
Tensor Principal Component Analysis in High Dimensional CP Models

Yuefeng Han, Cun-Hui Zhang

The CP decomposition for high dimensional non-orthogonal spiked tensors is an important problem with broad applications across many disciplines. However, previous works with theoretical guarantees typically assume restrictive incoherence conditions on the basis vectors for the CP components. In this paper, we propose new computationally efficient composite PCA and concurrent orthogonalization algorithms for tensor CP decomposition with theoretical guarantees under mild incoherence conditions. The composite PCA applies the principal component or singular value decompositions twice, first to a matrix unfolding of the tensor data to obtain singular vectors and then to the matrix folding of the singular vectors obtained in the first step. It can be used as an initialization for any iterative optimization scheme for the tensor CP decomposition. The concurrent orthogonalization algorithm iteratively estimates the basis vector in each mode of the tensor by simultaneously applying projections to the orthogonal complements of the spaces generated by other CP components in other modes. It is designed to improve the alternating least squares estimator and other forms of the high order orthogonal iteration for tensors with low or moderately high CP ranks, and it is guaranteed to converge rapidly when the error of any given initial estimator is bounded by a small constant. Our theoretical investigation provides estimation accuracy and convergence rates for the two proposed algorithms. Both proposed algorithms are applicable to deterministic tensors, their noisy versions, and the order-$2K$ covariance tensor of order-$K$ tensor data in a factor model with uncorrelated factors. Our implementations on synthetic data demonstrate significant practical superiority of our approach over existing methods.
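A minimal sketch of composite PCA for an order-3 tensor with CP rank $r$, assuming the grouping of modes (1, 2) against mode 3 and reading the mode-3 vectors directly off the right singular vectors of the unfolding; this is a simplified reading of the abstract, not the authors' implementation, and the function name is hypothetical. The toy example uses orthonormal components, where recovery is exact up to sign and noise; the paper's guarantees cover non-orthogonal components under mild incoherence conditions.

```python
import numpy as np

def composite_pca(T, r):
    """Composite PCA sketch for an order-3 tensor T of shape (d1, d2, d3) and CP rank r."""
    d1, d2, d3 = T.shape
    # First SVD: unfold modes (1, 2) against mode 3.
    U, s, Vt = np.linalg.svd(T.reshape(d1 * d2, d3), full_matrices=False)
    A, B = np.empty((d1, r)), np.empty((d2, r))
    for i in range(r):
        # Second SVD: fold the i-th left singular vector into a d1 x d2 matrix
        # and keep its leading singular pair as the mode-1 and mode-2 vectors.
        u_mat, _, vt_mat = np.linalg.svd(U[:, i].reshape(d1, d2))
        A[:, i], B[:, i] = u_mat[:, 0], vt_mat[0]
    C = Vt[:r].T  # mode-3 vectors from the right singular vectors (simplification)
    return A, B, C, s[:r]

# Toy spiked tensor with orthonormal components plus small Gaussian noise.
rng = np.random.default_rng(0)
d, r = 10, 2
A0, B0, C0 = (np.linalg.qr(rng.standard_normal((d, r)))[0] for _ in range(3))
lam = np.array([5.0, 3.0])
T = np.einsum("r,ir,jr,kr->ijk", lam, A0, B0, C0) + 0.01 * rng.standard_normal((d, d, d))
A, B, C, s = composite_pca(T, r)
```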