Jung-hun Kim

LG
h-index5
12papers
30citations
Novelty55%
AI Score51

12 Papers

43.0MLApr 2
Learning in Prophet Inequalities with Noisy Observations

Jung-hun Kim, Vianney Perchet

We study the prophet inequality, a fundamental problem in online decision-making and optimal stopping, in a practical setting where rewards are observed only through noisy realizations and reward distributions are unknown. At each stage, the decision-maker receives a noisy reward whose true value follows a linear model with an unknown latent parameter, and observes a feature vector drawn from a distribution. To address this challenge, we propose algorithms that integrate learning and decision-making via lower-confidence-bound (LCB) thresholding. In the i.i.d.\ setting, we establish that both an Explore-then-Decide strategy and an $\varepsilon$-Greedy variant achieve the sharp competitive ratio of $1 - 1/e$, under a mild condition on the optimal value. For non-identical distributions, we show that a competitive ratio of $1/2$ can be guaranteed against a relaxed benchmark. Moreover, with limited window access to past rewards, the tight ratio of $1/2$ against the optimal benchmark is achieved.

LGMay 30, 2022
Adversarial Bandits against Arbitrary Strategies

Jung-hun Kim, Se-Young Yun

We study the adversarial bandit problem against arbitrary strategies, where the difficulty is captured by an unknown parameter $S$, which is the number of switches in the best arm in hindsight. To handle this problem, we adopt the master-base framework using the online mirror descent method (OMD). We first provide a master-base algorithm with simple OMD, achieving $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$, in which $T^{2/3}$ comes from the variance of loss estimators. To mitigate the impact of the variance, we propose using adaptive learning rates for OMD and achieve $\tilde{O}(\min\{\sqrt{SKTρ},S\sqrt{KT}\})$, where $ρ$ is a variance term for loss estimators.

MLOct 14, 2024
Queueing Matching Bandits with Preference Feedback

Jung-hun Kim, Min-hwan Oh

In this study, we consider multi-class multi-server asymmetric queueing systems consisting of $N$ queues on one side and $K$ servers on the other side, where jobs randomly arrive in queues at each time. The service rate of each job-server assignment is unknown and modeled by a feature-based Multi-nomial Logit (MNL) function. At each time, a scheduler assigns jobs to servers, and each server stochastically serves at most one job based on its preferences over the assigned jobs. The primary goal of the algorithm is to stabilize the queues in the system while learning the service rates of servers. To achieve this goal, we propose algorithms based on UCB and Thompson Sampling, which achieve system stability with an average queue length bound of $O(\min\{N,K\}/ε)$ for a large time horizon $T$, where $ε$ is a traffic slackness of the system. Furthermore, the algorithms achieve sublinear regret bounds of $\tilde{O}(\min\{\sqrt{T} Q_{\max},T^{3/4}\})$, where $Q_{\max}$ represents the maximum queue length over agents and times. Lastly, we provide experimental results to demonstrate the performance of our algorithms.

LGApr 22, 2024
An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints

Jung-hun Kim, Milan Vojnovic, Se-Young Yun

In this study, we consider the infinitely many-armed bandit problems in a rested rotting setting, where the mean reward of an arm may decrease with each pull, while otherwise, it remains unchanged. We explore two scenarios regarding the rotting of rewards: one in which the cumulative amount of rotting is bounded by $V_T$, referred to as the slow-rotting case, and the other in which the cumulative number of rotting instances is bounded by $S_T$, referred to as the abrupt-rotting case. To address the challenge posed by rotting rewards, we introduce an algorithm that utilizes UCB with an adaptive sliding window, designed to manage the bias and variance trade-off arising due to rotting rewards. Our proposed algorithm achieves tight regret bounds for both slow and abrupt rotting scenarios. Lastly, we demonstrate the performance of our algorithm using numerical experiments.

LGJan 31, 2025
Tracking Most Significant Shifts in Infinite-Armed Bandits

Joe Suk, Jung-hun Kim

We study an infinite-armed bandit problem where actions' mean rewards are initially sampled from a reservoir distribution. Most prior works in this setting focused on stationary rewards (Berry et al., 1997; Wang et al., 2008; Bonald and Proutiere, 2013; Carpentier and Valko, 2015) with the more challenging adversarial/non-stationary variant only recently studied in the context of rotting/decreasing rewards (Kim et al., 2022; 2024). Furthermore, optimal regret upper bounds were only achieved using parameter knowledge of non-stationarity and only known for certain regimes of regularity of the reservoir. This work shows the first parameter-free optimal regret bounds for all regimes while also relaxing distributional assumptions on the reservoir. We first introduce a blackbox scheme to convert a finite-armed MAB algorithm designed for near-stationary environments into a parameter-free algorithm for the infinite-armed non-stationary problem with optimal regret guarantees. We next study a natural notion of significant shift for this problem inspired by recent developments in finite-armed MAB (Suk & Kpotufe, 2022). We show that tighter regret bounds in terms of significant shifts can be adaptively attained by employing a randomized variant of elimination within our blackbox scheme. Our enhanced rates only depend on the rotting non-stationarity and thus exhibit an interesting phenomenon for this problem where rising rewards do not factor into the difficulty of non-stationarity.

MLOct 24, 2025
Oracle-Efficient Combinatorial Semi-Bandits

Jung-hun Kim, Milan Vojnović, Min-hwan Oh

We study the combinatorial semi-bandit problem where an agent selects a subset of base arms and receives individual feedback. While this generalizes the classical multi-armed bandit and has broad applicability, its scalability is limited by the high cost of combinatorial optimization, requiring oracle queries at every round. To tackle this, we propose oracle-efficient frameworks that significantly reduce oracle calls while maintaining tight regret guarantees. For the worst-case linear reward setting, our algorithms achieve $\tilde{O}(\sqrt{T})$ regret using only $O(\log\log T)$ oracle queries. We also propose covariance-adaptive algorithms that leverage noise structure for improved regret, and extend our approach to general (non-linear) rewards. Overall, our methods reduce oracle usage from linear to (doubly) logarithmic in time, with strong theoretical guarantees.

MLSep 4, 2025
Batched Stochastic Matching Bandits

Jung-hun Kim, Min-hwan Oh

In this study, we introduce a novel bandit framework for stochastic matching based on the Multi-nomial Logit (MNL) choice model. In our setting, $N$ agents on one side are assigned to $K$ arms on the other side, where each arm stochastically selects an agent from its assigned pool according to an unknown preference and yields a corresponding reward. The objective is to minimize regret by maximizing the cumulative revenue from successful matches across all agents. This task requires solving a combinatorial optimization problem based on estimated preferences, which is NP-hard and leads a naive approach to incur a computational cost of $O(K^N)$ per round. To address this challenge, we propose batched algorithms that limit the frequency of matching updates, thereby reducing the amortized computational cost (i.e., the average cost per round) to $O(1)$ while still achieving a regret bound of $\tilde{O}(\sqrt{T})$.

MLApr 3, 2025
Dynamic Assortment Selection and Pricing with Censored Preference Feedback

Jung-hun Kim, Min-hwan Oh

In this study, we investigate the problem of dynamic multi-product selection and pricing by introducing a novel framework based on a \textit{censored multinomial logit} (C-MNL) choice model. In this model, sellers present a set of products with prices, and buyers filter out products priced above their valuation, purchasing at most one product from the remaining options based on their preferences. The goal is to maximize seller revenue by dynamically adjusting product offerings and prices, while learning both product valuations and buyer preferences through purchase feedback. To achieve this, we propose a Lower Confidence Bound (LCB) pricing strategy. By combining this pricing strategy with either an Upper Confidence Bound (UCB) or Thompson Sampling (TS) product selection approach, our algorithms achieve regret bounds of $\tilde{O}(d^{\frac{3}{2}}\sqrt{T/κ})$ and $\tilde{O}(d^{2}\sqrt{T/κ})$, respectively. Finally, we validate the performance of our methods through simulations, demonstrating their effectiveness.

LGJan 31, 2022
Rotting Infinitely Many-armed Bandits

Jung-hun Kim, Milan Vojnovic, Se-Young Yun

We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate $\varrho=o(1)$. We show that this learning problem has an $Ω(\max\{\varrho^{1/3}T,\sqrt{T}\})$ worst-case regret lower bound where $T$ is the horizon time. We show that a matching upper bound $\tilde{O}(\max\{\varrho^{1/3}T,\sqrt{T}\})$, up to a poly-logarithmic factor, can be achieved by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to continue pulling an arm or remove the arm from further consideration, when the algorithm knows the value of the maximum rotting rate $\varrho$. We also show that an $\tilde{O}(\max\{\varrho^{1/3}T,T^{3/4}\})$ regret upper bound can be achieved by an algorithm that does not know the value of $\varrho$, by using an adaptive UCB index along with an adaptive threshold value.

LGDec 13, 2021
Learning to Schedule in Parallel-Server Queues with Stochastic Bilinear Rewards

Jung-hun Kim, Milan Vojnovic

We consider the problem of scheduling in multi-class, parallel-server queuing systems with uncertain rewards from job-server assignments. In this scenario, jobs incur holding costs while awaiting completion, and job-server assignments yield observable stochastic rewards with unknown mean values. The mean rewards for job-server assignments are assumed to follow a bilinear model with respect to features that characterize jobs and servers. Our objective is to minimize regret by maximizing the cumulative reward of job-server assignments over a time horizon, while keeping the total job holding cost bounded to ensure the stability of the queueing system. This problem is motivated by applications requiring resource allocation in network systems. In this problem, it is essential to control the tradeoff between reward maximization and fair allocation for the stability of the underlying queuing system (i.e., maximizing network throughput). To address this problem, we propose a scheduling algorithm based on a weighted proportional fair criteria augmented with marginal costs for reward maximization, incorporating a bandit algorithm tailored for bilinear rewards. Our algorithm achieves a sub-linear regret bound and a sub-linear mean holding cost (and queue length bound) of $\tilde{O}(\sqrt{T})$, respectively, with respect to the time horizon $T$, thus guaranteeing queuing system stability. Additionally, we establish stability conditions for distributed iterative algorithms for computing allocations, which are relevant to large-scale system applications. Finally, we demonstrate the efficiency of our algorithm through numerical experiments.

IRAug 16, 2017
Hypotheses generation using link prediction in a bipartite graph

Jung-Hun Kim, Aviv Segev

The large volume of scientific publications is likely to have hidden knowledge that can be used for suggesting new research topics. We propose an automatic method that is helpful for generating research hypotheses in the field of physics using the massive number of physics journal publications. We convert the text data of titles and abstract sections in publications to a bipartite graph, extracting words of physical matter composed of chemical elements and extracting related keywords in the paper. The proposed method predicts the formation of new links between matter and keyword nodes based on collaborative filtering and matter popularity. The formation of links represents research hypotheses, as it suggests the new possible relationships between physical matter and keywords for physical properties or phenomena. The suggested method has better performance than existing methods for link prediction in the entire bipartite graph and the subgraph that contains only a specific keyword, such as `antiferromagnetism' or `superconductivity.'

AIMar 3, 2017
Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles

Jung-hun Kim, Se-Young Yun, Minchan Jeong et al.

We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries. To address the challenges posed by this noise, we analyze Bayesian oracles given the observed noisy features. Our Bayesian analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristics. These deviations are highly non-intuitive and do not occur in classical noiseless setups. This implies that classical approaches cannot guarantee a non-trivial regret bound. Therefore, we propose an algorithm that aims to approximate the Bayesian oracle based on the observed information under this model, achieving $\tilde{O}(d\sqrt{T})$ regret bound when there is a large number of arms. We demonstrate the proposed algorithm using synthetic and real-world datasets.