Vanessa Kosoy

LG
h-index3
6papers
13citations
Novelty55%
AI Score41

6 Papers

16.7FLMar 27
Stringological sequence prediction I: efficient algorithms for predicting highly repetitive sequences

Vanessa Kosoy

We propose novel algorithms for sequence prediction based on ideas from stringology. These algorithms are time and space efficient and satisfy mistake bounds related to particular stringological complexity measures of the sequence. In this work (the first in a series) we focus on two such measures: (i) the size of the smallest straight-line program that produces the sequence, and (ii) the number of states in the minimal automaton that can compute any symbol in the sequence when given its position in base k as input. These measures are interesting because multiple rich classes of sequences studied in combinatorics of words (automatic sequences, morphic sequences, Sturmian words) have low complexity and hence high predictability in this sense.

LGApr 9, 2025
Regret Bounds for Robust Online Decision Making

Alexander Appel, Vanessa Kosoy

We propose a framework which generalizes "decision making with structured observations" by allowing robust (i.e. multivalued) models. In this framework, each model associates each decision with a convex set of probability distributions over outcomes. Nature can choose distributions out of this set in an arbitrary (adversarial) manner, that can be nonoblivious and depend on past history. The resulting framework offers much greater generality than classical bandits and reinforcement learning, since the realizability assumption becomes much weaker and more realistic. We then derive a theory of regret bounds for this framework. Although our lower and upper bounds are not tight, they are sufficient to fully characterize power-law learnability. We demonstrate this theory in two special cases: robust linear bandits and tabular robust online reinforcement learning. In both cases, we derive regret bounds that improve state-of-the-art (except that we do not address computational efficiency).

LGJun 24, 2025
Ambiguous Online Learning

Vanessa Kosoy

We propose a new variant of online learning that we call "ambiguous online learning". In this setting, the learner is allowed to produce multiple predicted labels. Such an "ambiguous prediction" is considered correct when at least one of the labels is correct, and none of the labels are "predictably wrong". The definition of "predictably wrong" comes from a hypothesis class in which hypotheses are also multi-valued. Thus, a prediction is "predictably wrong" if it's not allowed by the (unknown) true hypothesis. In particular, this setting is natural in the context of multivalued dynamical systems, recommendation algorithms and lossless compression. It is also strongly related to so-called "apple tasting". We show that in this setting, there is a trichotomy of mistake bounds: up to logarithmic factors, any hypothesis class has an optimal mistake bound of either Theta(1), Theta(sqrt(N)) or N.

LGMay 9, 2024
Imprecise Multi-Armed Bandits

Vanessa Kosoy

We introduce a novel multi-armed bandit framework, where each arm is associated with a fixed unknown credal set over the space of outcomes (which can be richer than just the reward). The arm-to-credal-set correspondence comes from a known class of hypotheses. We then define a notion of regret corresponding to the lower prevision defined by these credal sets. Equivalently, the setting can be regarded as a two-player zero-sum game, where, on each round, the agent chooses an arm and the adversary chooses the distribution over outcomes from a set of options associated with this arm. The regret is defined with respect to the value of game. For certain natural hypothesis classes, loosely analgous to stochastic linear bandits (which are a special case of the resulting setting), we propose an algorithm and prove a corresponding upper bound on regret. We also prove lower bounds on regret for particular special cases.

LGJul 19, 2019
Delegative Reinforcement Learning: learning to avoid traps with a little help

Vanessa Kosoy

Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to occasionally delegate an action to an external advisor. We thus arrive at a setting of active one-shot model-based reinforcement learning that we call DRL (delegative reinforcement learning.) The algorithm we construct in order to demonstrate the regret bound is a variant of Posterior Sampling Reinforcement Learning supplemented by a subroutine that decides which actions should be delegated. The algorithm is not anytime, since the parameters must be adjusted according to the target time discount. Currently, our analysis is limited to Markov decision processes with finite numbers of hypotheses, states and actions.

LGMay 12, 2017
Forecasting using incomplete models

Vanessa Kosoy

We consider the task of forecasting an infinite sequence of future observations based on some number of past observations, where the probability measure generating the observations is "suspected" to satisfy one or more of a set of incomplete models, i.e. convex sets in the space of probability measures. This setting is in some sense intermediate between the realizable setting where the probability measure comes from some known set of probability measures (which can be addressed using e.g. Bayesian inference) and the unrealizable setting where the probability measure is completely arbitrary. We demonstrate a method of forecasting which guarantees that, whenever the true probability measure satisfies an incomplete model in a given countable set, the forecast converges to the same incomplete model in the (appropriately normalized) Kantorovich-Rubinstein metric. This is analogous to merging of opinions for Bayesian inference, except that convergence in the Kantorovich-Rubinstein metric is weaker than convergence in total variation.