LGFeb 14, 2023
Spectral Clustering for Crowdsourcing with Inherently Distinct Task TypesSaptarshi Mandal, Seo Taek Kong, Dimitrios Katselis et al.
The Dawid-Skene model is the most widely assumed model in the analysis of crowdsourcing algorithms that estimate ground-truth labels from noisy worker responses. In this work, we are motivated by crowdsourcing applications where workers have distinct skill sets and their accuracy additionally depends on a task's type. While weighted majority vote (WMV) with a single weight vector for each worker achieves the optimal label estimation error in the Dawid-Skene model, we show that different weights for different types are necessary for a multi-type model. Focusing on the case where there are two types of tasks, we propose a spectral method to partition tasks into two groups that cluster tasks by type. Our analysis reveals that task types can be perfectly recovered if the number of workers $n$ scales logarithmically with the number of tasks $d$. Any algorithm designed for the Dawid-Skene model can then be applied independently to each type to infer the labels. Numerical experiments show how clustering tasks by type before estimating ground-truth labels enhances the performance of crowdsourcing algorithms in practical applications.
CVDec 26, 2022
Key Feature Replacement of In-Distribution Samples for Out-of-Distribution DetectionJaeyoung Kim, Seo Taek Kong, Dongbin Na et al.
Out-of-distribution (OOD) detection can be used in deep learning-based applications to reject outlier samples from being unreliably classified by deep neural networks. Learning to classify between OOD and in-distribution samples is difficult because data comprising the former is extremely diverse. It has been observed that an auxiliary OOD dataset is most effective in training a "rejection" network when its samples are semantically similar to in-distribution images. We first deduce that OOD images are perceived by a deep neural network to be semantically similar to in-distribution samples when they share a common background, as deep networks are observed to incorrectly classify such images with high confidence. We then propose a simple yet effective Key In-distribution feature Replacement BY inpainting (KIRBY) procedure that constructs a surrogate OOD dataset by replacing class-discriminative features of in-distribution samples with marginal background features. The procedure can be implemented using off-the-shelf vision algorithms, where each step within the algorithm is shown to make the surrogate data increasingly similar to in-distribution data. Design choices in each step are studied extensively, and an exhaustive comparison with state-of-the-art algorithms demonstrates KIRBY's competitiveness on various benchmarks.
LGMay 21
Noise Schedule Design for Diffusion Models: An Optimal Control PerspectiveSeo Taek Kong, Weina Wang, R. Srikant
We develop a principled framework for analyzing and designing noise schedules in diffusion models. We show that one can recast this design problem as an optimal control problem, whose state is the Fisher information of the diffusion process which evolves according to an ODE and the control input is the noise schedule. The objective of the optimal control problem is a functional involving the Fisher information, which is shown to be an upper bound on the Kullback-Leibler sampling error. By solving this optimal control problem, we obtain sufficient conditions on noise schedules under which state-of-the-art $\tilde{\mathcal{O}} (d/n)$ sampling error is achievable, where $d$ is the data dimension and $n$ is the number of discretization steps. While existing theoretical work also prove that $\tilde{\mathcal{O}}(d/n)$ sampling error bounds are achievable, these results hold for specific noise schedules, which do not include the schedules used in practice. Under a further parametric assumption on the data distribution, we show that one can obtain closed-form expressions for the noise schedules. These noise schedules generalize standard empirical schedules such as exponential and sigmoid schedules by allowing additional parameters that can be tuned. Systematically tuning the parameters of these schedules yields new schedules that achieve superior FID scores on image generation benchmarks.
CVMar 29, 2023
Self-accumulative Vision Transformer for Bone Age Assessment Using the Sauvegrain MethodHong-Jun Choi, Dongbin Na, Kyungjin Cho et al.
This study presents a novel approach to bone age assessment (BAA) using a multi-view, multi-task classification model based on the Sauvegrain method. A straightforward solution to automating the Sauvegrain method, which assesses a maturity score for each landmark in the elbow and predicts the bone age, is to train classifiers independently to score each region of interest (RoI), but this approach limits the accessible information to local morphologies and increases computational costs. As a result, this work proposes a self-accumulative vision transformer (SAT) that mitigates anisotropic behavior, which usually occurs in multi-view, multi-task problems and limits the effectiveness of a vision transformer, by applying token replay and regional attention bias. A number of experiments show that SAT successfully exploits the relationships between landmarks and learns global morphological features, resulting in a mean absolute error of BAA that is 0.11 lower than that of the previous work. Additionally, the proposed SAT has four times reduced parameters than an ensemble of individual classifiers of the previous work. Lastly, this work also provides informative implications for clinical practice, improving the accuracy and efficiency of BAA in diagnosing abnormal growth in adolescents.
LGFeb 14, 2025
Nonasymptotic CLT and Error Bounds for Two-Time-Scale Stochastic ApproximationSeo Taek Kong, Sihan Zeng, Thinh T. Doan et al.
We consider linear two-time-scale stochastic approximation algorithms driven by martingale noise. Recent applications in machine learning motivate the need to understand finite-time error rates, but conventional stochastic approximation analysis focus on either asymptotic convergence in distribution or finite-time bounds that are far from optimal. Prior work on asymptotic central limit theorems (CLTs) suggest that two-time-scale algorithms may be able to achieve $1/\sqrt{n}$ error in expectation, with a constant given by the expected norm of the limiting Gaussian vector. However, the best known finite-time rates are much slower. We derive the first non-asymptotic central limit theorem with respect to the Wasserstein-1 distance for two-time-scale stochastic approximation with Polyak-Ruppert averaging. As a corollary, we show that expected error achieved by Polyak-Ruppert averaging decays at rate $1/\sqrt{n}$, which significantly improves on the rates of convergence in prior works.
LGFeb 2
Finite-Sample Wasserstein Error Bounds and Concentration Inequalities for Nonlinear Stochastic ApproximationSeo Taek Kong, R. Srikant
This paper derives non-asymptotic error bounds for nonlinear stochastic approximation algorithms in the Wasserstein-$p$ distance. To obtain explicit finite-sample guarantees for the last iterate, we develop a coupling argument that compares the discrete-time process to a limiting Ornstein-Uhlenbeck process. Our analysis applies to algorithms driven by general noise conditions, including martingale differences and functions of ergodic Markov chains. Complementing this result, we handle the convergence rate of the Polyak-Ruppert average through a direct analysis that applies under the same general setting. Assuming the driving noise satisfies a non-asymptotic central limit theorem, we show that the normalized last iterates converge to a Gaussian distribution in the $p$-Wasserstein distance at a rate of order $γ_n^{1/6}$, where $γ_n$ is the step size. Similarly, the Polyak-Ruppert average is shown to converge in the Wasserstein distance at a rate of order $n^{-1/6}$. These distributional guarantees imply high-probability concentration inequalities that improve upon those derived from moment bounds and Markov's inequality. We demonstrate the utility of this approach by considering two applications: (1) linear stochastic approximation, where we explicitly quantify the transition from heavy-tailed to Gaussian behavior of the iterates, thereby bridging the gap between recent finite-sample analyses and asymptotic theory and (2) stochastic gradient descent, where we establish rate of convergence to the central limit theorem.
LGOct 7, 2025
Primal-Dual Direct Preference Optimization for Constrained LLM AlignmentYihan Du, Seo Taek Kong, R. Srikant
The widespread application of Large Language Models (LLMs) imposes increasing demands on safety, such as reducing harmful content and fake information, and avoiding certain forbidden tokens due to rules and laws. While there have been several recent works studying safe alignment of LLMs, these works either require the training of reward and cost models and incur high memory and computational costs, or need prior knowledge about the optimal solution. Motivated by this fact, we study the problem of constrained alignment in LLMs, i.e., maximizing the output reward while restricting the cost due to potentially unsafe content to stay below a threshold. For this problem, we propose a novel primal-dual DPO approach, which first trains a model using standard DPO on reward preference data to provide reward information, and then adopts a rearranged Lagrangian DPO objective utilizing the provided reward information to fine-tune LLMs on cost preference data. Our approach significantly reduces memory and computational costs, and does not require extra prior knowledge. Moreover, we establish rigorous theoretical guarantees on the suboptimality and constraint violation of the output policy. We also extend our approach to an online data setting by incorporating exploration bonuses, which enables our approach to explore uncovered prompt-response space, and then provide theoretical results that get rid of the dependence on preference data coverage. Experimental results on the widely-used preference dataset PKU-SafeRLHF demonstrate the effectiveness of our approach.
LGApr 8, 2021
A Neural Pre-Conditioning Active Learning Algorithm to Reduce Label ComplexitySeo Taek Kong, Soomin Jeon, Dongbin Na et al.
Deep learning (DL) algorithms rely on massive amounts of labeled data. Semi-supervised learning (SSL) and active learning (AL) aim to reduce this label complexity by leveraging unlabeled data or carefully acquiring labels, respectively. In this work, we primarily focus on designing an AL algorithm but first argue for a change in how AL algorithms should be evaluated. Although unlabeled data is readily available in pool-based AL, AL algorithms are usually evaluated by measuring the increase in supervised learning (SL) performance at consecutive acquisition steps. Because this measures performance gains from both newly acquired instances and newly acquired labels, we propose to instead evaluate the label efficiency of AL algorithms by measuring the increase in SSL performance at consecutive acquisition steps. After surveying tools that can be used to this end, we propose our neural pre-conditioning (NPC) algorithm inspired by a Neural Tangent Kernel (NTK) analysis. Our algorithm incorporates the classifier's uncertainty on unlabeled data and penalizes redundant samples within candidate batches to efficiently acquire a diverse set of informative labels. Furthermore, we prove that NPC improves downstream training in the large-width regime in a manner previously observed to correlate with generalization. Comparisons with other AL algorithms show that a state-of-the-art SSL algorithm coupled with NPC can achieve high performance using very few labeled data.
LGJan 25, 2019
Almost Boltzmann ExplorationHarsh Gupta, Seo Taek Kong, R. Srikant et al.
Boltzmann exploration is widely used in reinforcement learning to provide a trade-off between exploration and exploitation. Recently, in (Cesa-Bianchi et al., 2017) it has been shown that pure Boltzmann exploration does not perform well from a regret perspective, even in the simplest setting of stochastic multi-armed bandit (MAB) problems. In this paper, we show that a simple modification to Boltzmann exploration, motivated by a variation of the standard doubling trick, achieves $O(K\log^{1+α} T)$ regret for a stochastic MAB problem with $K$ arms, where $α>0$ is a parameter of the algorithm. This improves on the result in (Cesa-Bianchi et al., 2017), where an algorithm inspired by the Gumbel-softmax trick achieves $O(K\log^2 T)$ regret. We also show that our algorithm achieves $O(β(G) \log^{1+α} T)$ regret in stochastic MAB problems with graph-structured feedback, without knowledge of the graph structure, where $β(G)$ is the independence number of the feedback graph. Additionally, we present extensive experimental results on real datasets and applications for multi-armed bandits with both traditional bandit feedback and graph-structured feedback. In all cases, our algorithm performs as well or better than the state-of-the-art.