Daniil Ryabko

h-index17

17papers

375citations

Novelty60%

AI Score28

Ranked #149,315 of 194,257 authors (top 77%)#32,862 in LG (top 82%)

17 Papers

1.2LGOct 1, 2020

Universal time-series forecasting with mixture predictors

Daniil Ryabko

This book is devoted to the problem of sequential probability forecasting, that is, predicting the probabilities of the next outcome of a growing sequence of observations given the past. This problem is considered in a very general setting that unifies commonly used probabilistic and non-probabilistic settings, trying to make as few as possible assumptions on the mechanism generating the observations. A common form that arises in various formulations of this problem is that of mixture predictors, which are formed as a combination of a finite or infinite set of other predictors attempting to combine their predictive powers. The main subject of this book are such mixture predictors, and the main results demonstrate the universality of this method in a very general probabilistic setting, but also show some of its limitations. While the problems considered are motivated by practical applications, involving, for example, financial, biological or behavioural data, this motivation is left implicit and all the results exposed are theoretical. The book targets graduate students and researchers interested in the problem of sequential prediction, and, more generally, in theoretical analysis of problems in machine learning and non-parametric statistics, as well as mathematical and philosophical foundations of these fields. The material in this volume is presented in a way that presumes familiarity with basic concepts of probability and statistics, up to and including probability distributions over spaces of infinite sequences. Familiarity with the literature on learning or stochastic processes is not required.

1.2MLJun 26, 2019

Clustering piecewise stationary processes

Azadeh Khaleghi, Daniil Ryabko

The problem of time-series clustering is considered in the case where each data-point is a sample generated by a piecewise stationary ergodic process. Stationary processes are perhaps the most general class of processes considered in non-parametric statistics and allow for arbitrary long-range dependence between variables. Piecewise stationary processes studied here for the first time in the context of clustering, relax the last remaining assumption in this model: stationarity. A natural formulation is proposed for this problem and a notion of consistency is introduced which requires the samples to be placed in the same cluster if and only if the piecewise stationary distributions that generate them have the same set of stationary distributions. Simple, computationally efficient algorithms are proposed and are shown to be consistent without any additional assumptions beyond piecewise stationarity.

4.3STMar 30, 2019

Asymptotic nonparametric statistical analysis of stationary time series

Daniil Ryabko

Stationarity is a very general, qualitative assumption, that can be assessed on the basis of application specifics. It is thus a rather attractive assumption to base statistical analysis on, especially for problems for which less general qualitative assumptions, such as independence or finite memory, clearly fail. However, it has long been considered too general to allow for statistical inference to be made. One of the reasons for this is that rates of convergence, even of frequencies to the mean, are not available under this assumption alone. Recently, it has been shown that, while some natural and simple problems such as homogeneity, are indeed provably impossible to solve if one only assumes that the data is stationary (or stationary ergodic), many others can be solved using rather simple and intuitive algorithms. The latter problems include clustering and change point estimation. In this volume I summarize these results. The emphasis is on asymptotic consistency, since this the strongest property one can obtain assuming stationarity alone. While for most of the problems for which a solution is found this solution is algorithmically realizable, the main objective in this area of research, the objective which is only partially attained, is to understand what is possible and what is not possible to do for stationary time series. The considered problems include homogeneity testing, clustering with respect to distribution, clustering with respect to independence, change-point estimation, identity testing, and the general question of composite hypotheses testing. For the latter problem, a topological criterion for the existence of a consistent test is presented. In addition, several open questions are discussed.

0.8LGDec 20, 2018

Finite-time optimality of Bayesian predictors

Daniil Ryabko

The problem of sequential probability forecasting is considered in the most general setting: a model set C is given, and it is required to predict as well as possible if any of the measures (environments) in C is chosen to generate the data. No assumptions whatsoever are made on the model class C, in particular, no independence or mixing assumptions; C may not be measurable; there may be no predictor whose loss is sublinear, etc. It is shown that the cumulative loss of any possible predictor can be matched by that of a Bayesian predictor whose prior is discrete and is concentrated on C, up to an additive term of order $\log n$, where $n$ is the time step. The bound holds for every $n$ and every measure in C. This is the first non-asymptotic result of this kind. In addition, a non-matching lower bound is established: it goes to infinity with $n$ but may do so arbitrarily slow.

3.2LGAug 10, 2017

Hypotheses testing on infinite random graphs

Daniil Ryabko

Drawing on some recent results that provide the formalism necessary to definite stationarity for infinite random graphs, this paper initiates the study of statistical and learning questions pertaining to these objects. Specifically, a criterion for the existence of a consistent test for complex hypotheses is presented, generalizing the corresponding results on time series. As an application, it is shown how one can test that a tree has the Markov property, or, more generally, to estimate its memory.

1.4LGMar 20, 2017

Independence clustering (without a matrix)

Daniil Ryabko

The independence clustering problem is considered in the following formulation: given a set $S$ of random variables, it is required to find the finest partitioning $\{U_1,\dots,U_k\}$ of $S$ into clusters such that the clusters $U_1,\dots,U_k$ are mutually independent. Since mutual independence is the target, pairwise similarity measurements are of no use, and thus traditional clustering algorithms are inapplicable. The distribution of the random variables in $S$ is, in general, unknown, but a sample is available. Thus, the problem is cast in terms of time series. Two forms of sampling are considered: i.i.d.\ and stationary time series, with the main emphasis being on the latter, more general, case. A consistent, computationally tractable algorithm for each of the settings is proposed, and a number of open directions for further research are outlined.

2.3STOct 26, 2016

Universality of Bayesian mixture predictors

Daniil Ryabko

The problem is that of sequential probability forecasting for finite-valued time series. The data is generated by an unknown probability distribution over the space of all one-way infinite sequences. It is known that this measure belongs to a given set C, but the latter is completely arbitrary (uncountably infinite, without any structure given). The performance is measured with asymptotic average log loss. In this work it is shown that the minimax asymptotic performance is always attainable, and it is attained by a convex combination of a countably many measures from the set C (a Bayesian mixture). This was previously only known for the case when the best achievable asymptotic error is 0. This also contrasts previous results that show that in the non-realizable case all Bayesian mixtures may be suboptimal, while there is a predictor that achieves the optimal performance.

2.6LGAug 9, 2014

Characterizing predictable classes of processes

Daniil Ryabko

The problem is sequence prediction in the following setting. A sequence x1,..., xn,... of discrete-valued observations is generated according to some unknown probabilistic law (measure) mu. After observing each outcome, it is required to give the conditional probabilities of the next observation. The measure mu belongs to an arbitrary class C of stochastic processes. We are interested in predictors ? whose conditional probabilities converge to the 'true' mu-conditional probabilities if any mu { C is chosen to generate the data. We show that if such a predictor exists, then a predictor can also be obtained as a convex combination of a countably many elements of C. In other words, it can be obtained as a Bayesian predictor whose prior is concentrated on a countable set. This result is established for two very different measures of performance of prediction, one of which is very strong, namely, total variation, and the other is very weak, namely, prediction in expected average Kullback-Leibler divergence.

12.2LGMay 12, 2014

Selecting Near-Optimal Approximate State Representations in Reinforcement Learning

Ronald Ortner, Odalric-Ambrym Maillard, Daniil Ryabko

We consider a reinforcement learning setting introduced in (Maillard et al., NIPS 2011) where the learner does not have explicit access to the states of the underlying Markov decision process (MDP). Instead, she has access to several models that map histories of past interactions to states. Here we improve over known regret bounds in this setting, and more importantly generalize to the case where the models given to the learner do not contain a true model resulting in an MDP representation but only approximations of it. We also give improved error bounds for state aggregation.

2.9LGApr 17, 2013

Unsupervised model-free representation learning

Daniil Ryabko

Numerous control and learning problems face the situation where sequences of high-dimensional highly dependent data are available but no or little feedback is provided to the learner, which makes any inference rather challenging. To address this challenge, we formulate the following problem. Given a series of observations $X_0,\dots,X_n$ coming from a large (high-dimensional) space $\mathcal X$, find a representation function $f$ mapping $\mathcal X$ to a finite space $\mathcal Y$ such that the series $f(X_0),\dots,f(X_n)$ preserves as much information as possible about the original time-series dependence in $X_0,\dots,X_n$. We show that, for stationary time series, the function $f$ can be selected as the one maximizing a certain information criterion that we call time-series information. Some properties of this functions are investigated, including its uniqueness and consistency of its empirical estimates. Implications for the problem of optimal control are presented.

1.4MLFeb 14, 2013

A consistent clustering-based approach to estimating the number of change-points in highly dependent time-series

Azaden Khaleghi, Daniil Ryabko

The problem of change-point estimation is considered under a general framework where the data are generated by unknown stationary ergodic process distributions. In this context, the consistent estimation of the number of change-points is provably impossible. However, it is shown that a consistent clustering method may be used to estimate the number of change points, under the additional constraint that the correct number of process distributions that generate the data is provided. This additional parameter has a natural interpretation in many real-world applications. An algorithm is proposed that estimates the number of change-points and locates the changes. The proposed algorithm is shown to be asymptotically consistent; its empirical evaluations are provided.

11.5LGFeb 11, 2013

Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning

Odalric-Ambrym Maillard, Phuong Nguyen, Ronald Ortner et al.

We consider an agent interacting with an environment in a single stream of actions, observations, and rewards, with no reset. This process is not assumed to be a Markov Decision Process (MDP). Rather, the agent has several representations (mapping histories of past interactions to a discrete state space) of the environment with unknown dynamics, only some of which result in an MDP. The goal is to minimize the average regret criterion against an agent who knows an MDP representation giving the highest optimal reward, and acts optimally in it. Recent regret bounds for this setting are of order $O(T^{2/3})$ with an additive term constant yet exponential in some characteristics of the optimal MDP. We propose an algorithm whose regret after $T$ time steps is $O(\sqrt{T})$, with all constants reasonably small. This is optimal in $T$ since $O(\sqrt{T})$ is the optimal regret in the setting of learning in a (single discrete) MDP.

17.4LGFeb 11, 2013

Selecting the State-Representation in Reinforcement Learning

Odalric-Ambrym Maillard, Rémi Munos, Daniil Ryabko

The problem of selecting the right state-representation in a reinforcement learning problem is considered. Several models (functions mapping past observations to a finite set) of the observations are given, and it is known that for at least one of these models the resulting state dynamics are indeed Markovian. Without knowing neither which of the models is the correct one, nor what are the probabilistic characteristics of the resulting MDP, it is required to obtain as much reward as the optimal policy for the correct model (or for the best of the correct models, if there are several). We propose an algorithm that achieves that, with a regret of order T^{2/3} where T is the horizon time.

23.1LGFeb 11, 2013

Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

Ronald Ortner, Daniil Ryabko

We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Beside the existence of an optimal policy which satisfies the Poisson equation, the only assumptions made are Holder continuity of rewards and transition probabilities.

2.7LGOct 22, 2012

Reducing statistical time-series problems to binary classification

Daniil Ryabko, Jérémie Mary

We show how binary classification methods developed to work on i.i.d. data can be used for solving statistical problems that are seemingly unrelated to classification and concern highly-dependent time series. Specifically, the problems of time-series clustering, homogeneity testing and the three-sample problem are addressed. The algorithms that we construct for solving these problems are based on a new metric between time-series distributions, which can be evaluated using binary classification methods. Universal consistency of the proposed algorithms is proven under most general assumptions. The theoretical results are illustrated with experiments on synthetic and real-world data.

23.3LGSep 12, 2012

Regret Bounds for Restless Markov Bandits

Ronald Ortner, Daniil Ryabko, Peter Auer et al.

We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that after $T$ steps achieves $\tilde{O}(\sqrt{T})$ regret with respect to the best policy that knows the distributions of all arms. No assumptions on the Markov chains are made except that they are irreducible. In addition, we show that index-based policies are necessarily suboptimal for the considered problem.

4.5MLMar 7, 2012

Multiple Change Point Estimation in Stationary Ergodic Time Series

Azadeh Khaleghi, Daniil Ryabko

Given a heterogeneous time-series sample, the objective is to find points in time (called change points) where the probability distribution generating the data has changed. The data are assumed to have been generated by arbitrary unknown stationary ergodic distributions. No modelling, independence or mixing assumptions are made. A novel, computationally efficient, nonparametric method is proposed, and is shown to be asymptotically consistent in this general framework. The theoretical results are complemented with experimental evaluations.