CV · May 8, 2022
High-Resolution UAV Image Generation for Sorghum Panicle Detection
Enyu Cai, Zhankun Luo, Sriram Baireddy et al.
The number of panicles (or heads) of Sorghum plants is an important phenotypic trait for plant development and grain yield estimation. Unmanned Aerial Vehicles (UAVs) make it possible to collect and analyze Sorghum images on a large scale. Deep learning can provide methods for estimating phenotypic traits from UAV images but requires a large amount of labeled data. Because ground truthing UAV images is labor-intensive, the resulting lack of training data is a major bottleneck in developing methods for Sorghum panicle detection and counting. In this paper, we present an approach that uses synthetic training images from generative adversarial networks (GANs) for data augmentation to improve Sorghum panicle detection and counting. Using image-to-image translation GANs with only a limited ground-truth dataset of real UAV RGB images, our method generates synthetic high-resolution UAV RGB images with panicle labels. The results show that our data augmentation approach improves panicle detection and counting.
LG · May 14
Unified High-Probability Analysis of Stochastic Variance-Reduced Estimation
Zhankun Luo, Antesh Upadhyay, M. Berk Sahin et al.
Stochastic estimators are fundamental to large-scale optimization, where population quantities must be inferred from noisy oracle observations. Although influential methods such as momentum, SPIDER, STORM, and PAGE have been highly successful, their analyses are largely estimator-specific and expectation-based, which obscures the structural tradeoffs that determine reliability. In this paper, we develop a unified framework for stochastic variance-reduced estimation based on a recursion with three components: memory retention, reset probability, and a correction term for iterate movement. This framework recovers several classical estimators, motivates new second-order variants, and yields a bias-variance decomposition of the estimation error. Our main result is a unified high-probability bound, proved using a new dimension-free, vector-valued Freedman inequality for random sums of vector martingales in smooth normed spaces. The result applies in both Euclidean and non-Euclidean settings, including the analysis of mirror-descent-based methods in Banach spaces. As applications, we obtain high-probability oracle complexities for unconstrained optimization with mirror descent that depend only logarithmically on the confidence level. We also derive the first $\tilde{\mathcal{O}}(\varepsilon^{-3})$ oracle-complexity bounds for stochastic optimization with expectation constraints, improving upon the existing $\tilde{\mathcal{O}}(\varepsilon^{-4})$ complexity by leveraging variance-reduced estimation for the first time in this setting.
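To make a three-component recursion concrete, here is a minimal Python sketch of one plausible parametrization; it is an illustration under stated assumptions, not the paper's exact recursion. The hypothetical knobs are a memory weight `a`, a reset probability `p`, and a correction coefficient `beta`. Under this parametrization, `a=1, beta=0, p=0` is a plain stochastic gradient, `beta=0` with `a<1` is a momentum (moving-average) estimator, `beta=1-a` gives a STORM-style update, and `a=0, beta=1, p>0` gives a PAGE-style update. The toy oracle `g(x; xi) = x - sigma * xi` (the gradient of $F(x) = x^2/2$ with additive Gaussian noise) is also an assumption.

```python
import random


def make_oracle(sigma):
    # Hypothetical toy oracle: gradient of f(x; xi) = 0.5 * (x - sigma * xi) ** 2,
    # so the population gradient of F(x) = E[f(x; xi)] is exactly x.
    return lambda x, xi: x - sigma * xi


def track_gradient(xs, grad_sample, fresh, a, beta, p, rng):
    """Estimate grad F(x_t) along the iterate sequence xs (illustrative recursion).

    With probability p the estimate is reset to fresh(x_t); otherwise
        d_t = (1 - a) * d_{t-1}                       # memory retention
              + a * g(x_t; xi)                        # new stochastic gradient
              + beta * (g(x_t; xi) - g(x_{t-1}; xi))  # correction for iterate movement
    where both evaluations of g share the same sample xi.
    """
    d = fresh(xs[0])
    estimates = [d]
    for prev, cur in zip(xs, xs[1:]):
        if rng.random() < p:
            d = fresh(cur)  # reset, e.g. to a large-batch or exact gradient
        else:
            xi = rng.gauss(0.0, 1.0)
            g_cur, g_prev = grad_sample(cur, xi), grad_sample(prev, xi)
            d = (1 - a) * d + a * g_cur + beta * (g_cur - g_prev)
        estimates.append(d)
    return estimates


def exact(x):
    # Exact gradient of the toy objective, used for resets.
    return x


def mse(estimates, xs):
    # Mean squared error against the true gradient, which equals x in this toy.
    return sum((d - x) ** 2 for d, x in zip(estimates, xs)) / len(xs)


xs = [0.01 * t for t in range(2000)]  # a slowly moving iterate sequence
rng = random.Random(0)

noisy = make_oracle(sigma=1.0)
sgd = track_gradient(xs, noisy, exact, a=1.0, beta=0.0, p=0.0, rng=rng)
storm = track_gradient(xs, noisy, exact, a=0.1, beta=0.9, p=0.0, rng=rng)

noiseless = make_oracle(sigma=0.0)
storm_exact = track_gradient(xs, noiseless, exact, a=0.1, beta=0.9, p=0.0, rng=rng)
page_exact = track_gradient(xs, noiseless, exact, a=0.0, beta=1.0, p=0.1, rng=rng)

print(f"SGD MSE: {mse(sgd, xs):.3f}  STORM-style MSE: {mse(storm, xs):.3f}")
```

With `sigma=1`, the STORM-style estimate should have a much smaller mean squared error than the plain stochastic gradient on this slowly moving sequence, and with `sigma=0` both the STORM-style and PAGE-style settings reproduce the exact gradient up to rounding. In this additive-noise toy the correction term is itself noise-free, which flatters the correction-based estimators compared with realistic oracles.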
LG · Mar 6
First-Order Softmax Weighted Switching Gradient Method for Distributed Stochastic Minimax Optimization with Stochastic Constraints
Zhankun Luo, Antesh Upadhyay, Sang Bin Moon et al.
This paper addresses distributed stochastic minimax optimization subject to stochastic constraints. We propose a novel first-order Softmax-Weighted Switching Gradient method tailored for federated learning. Under full client participation, our algorithm achieves the standard $\mathcal{O}(\varepsilon^{-4})$ oracle complexity to attain a unified bound $\varepsilon$ on both the optimality gap and the feasibility tolerance. We extend our theoretical analysis to the practical partial-participation regime by quantifying client sampling noise through a stochastic superiority assumption. Furthermore, by relaxing standard boundedness assumptions on the objective functions, we establish a strictly tighter lower bound for the softmax hyperparameter. We provide a unified error decomposition and establish a sharp $\mathcal{O}(\log\frac{1}{\delta})$ high-probability convergence guarantee. Ultimately, our framework demonstrates that a single-loop, primal-only switching mechanism provides a stable alternative for optimizing worst-case client performance, avoiding the hyperparameter sensitivity and convergence oscillations often encountered in traditional primal-dual or penalty-based approaches. We verify the efficacy of our algorithm through experiments on Neyman-Pearson (NP) classification and fair classification tasks.
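The switching idea can be illustrated with a small deterministic sketch. It is not the paper's algorithm: there is no client sampling and there are no stochastic gradients, and the softmax weighting with a hypothetical inverse-temperature parameter `inv_temp`, the feasibility test against `tol`, and averaging the feasible iterates are all assumptions chosen for illustration. Each step checks the constraint: if it is (approximately) satisfied, the iterate moves along a softmax-weighted combination of client gradients, which emphasizes the worst-off clients; otherwise it moves along the constraint gradient.

```python
import math


def softmax_weights(losses, inv_temp):
    # Weights proportional to exp(inv_temp * loss); subtracting the max avoids overflow.
    m = max(losses)
    exps = [math.exp(inv_temp * (v - m)) for v in losses]
    total = sum(exps)
    return [e / total for e in exps]


def switching_gradient(losses, grads, constraint, constraint_grad,
                       x0, step, tol, inv_temp, iters):
    x = x0
    feasible = []  # iterates at which an objective step was taken
    for _ in range(iters):
        if constraint(x) <= tol:
            # Constraint satisfied: descend the softmax-weighted client objective.
            w = softmax_weights([f(x) for f in losses], inv_temp)
            direction = sum(wi * g(x) for wi, g in zip(w, grads))
            feasible.append(x)
        else:
            # Constraint violated: descend the constraint function instead.
            direction = constraint_grad(x)
        x -= step * direction
    # Output the average of the feasible iterates (assumed output rule).
    return sum(feasible) / len(feasible)


# Hypothetical toy problem: two "clients" with losses (x + 1)^2 and (x - 3)^2,
# subject to x - 0.5 <= 0. The constrained minimax solution is x = 0.5.
losses = [lambda x: (x + 1) ** 2, lambda x: (x - 3) ** 2]
grads = [lambda x: 2 * (x + 1), lambda x: 2 * (x - 3)]
x_hat = switching_gradient(losses, grads, lambda x: x - 0.5, lambda x: 1.0,
                           x0=0.0, step=0.01, tol=0.0, inv_temp=10.0, iters=2000)
print(f"x_hat = {x_hat:.3f}")
```

Because only one loop and one primal variable are involved, there is no dual step size to tune; the iterate simply oscillates around the constraint boundary while the feasible iterates approach the constrained minimax point.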
LG · Nov 7, 2025
Structural Properties, Cycloid Trajectories and Non-Asymptotic Guarantees of EM Algorithm for Mixed Linear Regression
Zhankun Luo, Abolfazl Hashemi
This work investigates the structural properties, cycloid trajectories, and non-asymptotic convergence guarantees of the Expectation-Maximization (EM) algorithm for two-component Mixed Linear Regression (2MLR) with unknown mixing weights and regression parameters. Recent studies have established global convergence for 2MLR with known balanced weights and super-linear convergence in noiseless and high signal-to-noise ratio (SNR) regimes. However, the theoretical behavior of EM in the fully unknown setting remains unclear, with its trajectory and convergence order not yet fully characterized. We derive explicit EM update expressions for 2MLR with unknown mixing weights and regression parameters across all SNR regimes and analyze their structural properties and cycloid trajectories. In the noiseless case, we prove that the trajectory of the regression parameters in EM iterations traces a cycloid by establishing a recurrence relation for the sub-optimality angle, while in high SNR regimes we quantify its discrepancy from the cycloid trajectory. The trajectory-based analysis reveals the order of convergence: linear when the EM estimate is nearly orthogonal to the ground truth, and quadratic when the angle between the estimate and ground truth is small at the population level. Our analysis establishes non-asymptotic guarantees by sharpening bounds on statistical errors between finite-sample and population EM updates, relating EM's statistical accuracy to the sub-optimality angle, and proving convergence with arbitrary initialization at the finite-sample level. This work provides a novel trajectory-based framework for analyzing EM in Mixed Linear Regression.
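As an illustration of the kind of update being analyzed, here is a minimal finite-sample EM sketch in pure Python. It assumes a symmetric 2MLR model, $y = \pm\langle x, \beta^*\rangle + \text{noise}$ with the $+$ sign drawn with probability $\pi^*$, standard Gaussian covariates, and a known noise level $\sigma$; these modeling choices, the helper names, and the small linear solver are assumptions for illustration, not the paper's derived update expressions. Under this model the E-step log-likelihood ratio simplifies to $\operatorname{logit}(\pi) + 2y\langle x, \beta\rangle/\sigma^2$, so each responsibility is a logistic function of it. The demo also reports the angle between the estimate and $\beta^*$ (up to sign).

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system A z = b.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def em_2mlr(X, y, beta, pi, sigma, iters):
    """EM for y = +<x, beta> (prob. pi) or -<x, beta> (prob. 1 - pi), plus N(0, sigma^2) noise."""
    d = len(beta)
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(d)] for i in range(d)]
    for _ in range(iters):
        # E-step: posterior probability that each sample came from the +beta component.
        logit_pi = math.log(pi / (1.0 - pi))
        w = [sigmoid(logit_pi + 2.0 * yk * dot(xk, beta) / sigma ** 2) for xk, yk in zip(X, y)]
        # M-step: mixing weight = mean responsibility (kept strictly inside (0, 1)) ...
        pi = min(max(sum(w) / len(w), 1e-6), 1.0 - 1e-6)
        # ... and beta solves (X^T X) beta = sum_k (2 w_k - 1) y_k x_k.
        rhs = [sum((2.0 * wk - 1.0) * yk * xk[i] for xk, yk, wk in zip(X, y, w)) for i in range(d)]
        beta = solve(XtX, rhs)
    return beta, pi


# Hypothetical high-SNR data: d = 2, n = 2000, standard Gaussian covariates.
rng = random.Random(1)
beta_star, pi_star, sigma = [1.0, 2.0], 0.7, 0.1
X = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2000)]
y = [(1.0 if rng.random() < pi_star else -1.0) * dot(x, beta_star) + sigma * rng.gauss(0.0, 1.0)
     for x in X]

beta_hat, pi_hat = em_2mlr(X, y, beta=[1.0, 0.0], pi=0.5, sigma=sigma, iters=30)
cos_angle = abs(dot(beta_hat, beta_star)) / (math.hypot(*beta_hat) * math.hypot(*beta_star))
print(f"beta_hat={beta_hat}, pi_hat={pi_hat:.3f}, angle={math.acos(min(1.0, cos_angle)):.2e} rad")
```

The model is only identifiable up to swapping $(\beta, \pi)$ with $(-\beta, 1 - \pi)$, so an estimate should be compared with $\beta^*$ up to sign, with the mixing weight read accordingly.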
LG · Aug 13, 2025
Characterizing Evolution in Expectation-Maximization Estimates for Overspecified Mixed Linear Regression
Zhankun Luo, Abolfazl Hashemi
Mixture models have attracted significant attention due to their practical effectiveness and comprehensive theoretical foundations. A persistent challenge is model misspecification, which occurs when the fitted model has more mixture components than the data distribution. In this paper, we develop a theoretical understanding of the Expectation-Maximization (EM) algorithm's behavior under this targeted misspecification for overspecified two-component Mixed Linear Regression (2MLR) with unknown $d$-dimensional regression parameters and mixing weights. At the population level (Theorem 5.1), with an unbalanced initial guess for the mixing weights, we establish linear convergence of the regression parameters in $O(\log(1/\varepsilon))$ steps; conversely, with a balanced initial guess, convergence is sublinear, requiring $O(\varepsilon^{-2})$ steps to reach $\varepsilon$-accuracy in Euclidean distance. At the finite-sample level (Theorem 6.1), given $n$ data samples, we demonstrate a statistical accuracy of $O((d/n)^{1/2})$ for mixtures with sufficiently unbalanced fixed mixing weights and $O((d/n)^{1/4})$ for those with sufficiently balanced fixed mixing weights. Furthermore, we connect the population-level and finite-sample-level results: by setting the desired final accuracy $\varepsilon$ in Theorem 5.1 to match the accuracy in Theorem 6.1, namely $\varepsilon = O((d/n)^{1/2})$ for sufficiently unbalanced and $\varepsilon = O((d/n)^{1/4})$ for sufficiently balanced fixed mixing weights, we intuitively derive finite-sample iteration complexity bounds of $O(\log(1/\varepsilon)) = O(\log(n/d))$ and $O(\varepsilon^{-2}) = O((n/d)^{1/2})$ for sufficiently unbalanced and balanced initial mixing weights, respectively. We further extend our analysis of the overspecified setting to the low SNR regime.
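The substitution that links the two theorems is plain arithmetic. The sketch below (hypothetical function name; every constant hidden in the $O(\cdot)$ terms is dropped, so the outputs are orders of magnitude only) plugs the two finite-sample accuracies into the population-level iteration counts.

```python
import math


def finite_sample_iteration_orders(n, d):
    """Orders of magnitude only: all constants hidden in O(.) are dropped."""
    eps_unbalanced = (d / n) ** 0.5   # statistical accuracy, sufficiently unbalanced weights
    eps_balanced = (d / n) ** 0.25    # statistical accuracy, sufficiently balanced weights
    return {
        # Linear convergence: O(log(1/eps)) = O(log(n/d)).
        "unbalanced": (eps_unbalanced, math.log(1.0 / eps_unbalanced)),
        # Sublinear convergence: O(eps^-2) = O((n/d)^(1/2)).
        "balanced": (eps_balanced, eps_balanced ** -2),
    }


for case, (eps, iters) in finite_sample_iteration_orders(n=100_000, d=10).items():
    print(f"{case}: target eps ~ {eps:.3g}, iterations ~ {iters:.3g}")
```

For example, with $n = 10^5$ samples in $d = 10$ dimensions, the unbalanced case targets $\varepsilon \approx 0.01$ and needs on the order of $\log(100) \approx 4.6$ iterations, while the balanced case targets $\varepsilon \approx 0.1$ and needs on the order of $100$; note that $\log(1/\varepsilon) = \tfrac{1}{2}\log(n/d)$ here, which is why the two forms of the first bound agree up to a constant.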