Steven de Rooij

LG
5papers
226citations
Novelty52%
AI Score25

5 Papers

MLOct 31, 2018
A tutorial on MDL hypothesis testing for graph analysis

Peter Bloem, Steven de Rooij

This document provides a tutorial description of the use of the MDL principle in complex graph analysis. We give a brief summary of the preliminary subjects, and describe the basic principle, using the example of analysing the size of the largest clique in a graph. We also provide a discussion of how to interpret the results of such an analysis, making note of several common pitfalls.

MLJun 9, 2017
An Expectation-Maximization Algorithm for the Fractal Inverse Problem

Peter Bloem, Steven de Rooij

We present an Expectation-Maximization algorithm for the fractal inverse problem: the problem of fitting a fractal model to data. In our setting the fractals are Iterated Function Systems (IFS), with similitudes as the family of transformations. The data is a point cloud in ${\mathbb R}^H$ with arbitrary dimension $H$. Each IFS defines a probability distribution on ${\mathbb R}^H$, so that the fractal inverse problem can be cast as a problem of parameter estimation. We show that the algorithm reconstructs well-known fractals from data, with the model converging to high precision parameters. We also show the utility of the model as an approximation for datasources outside the IFS model class.

LGJan 8, 2017
Large-scale network motif analysis using compression

Peter Bloem, Steven de Rooij

We introduce a new method for finding network motifs: interesting or informative subgraph patterns in a network. Subgraphs are motifs when their frequency in the data is high compared to the expected frequency under a null model. To compute this expectation, a full or approximate count of the occurrences of a motif is normally repeated on as many as 1000 random graphs sampled from the null model; a prohibitively expensive step. We use ideas from the Minimum Description Length (MDL) literature to define a new measure of motif relevance. With our method, samples from the null model are not required. Instead we compute the probability of the data under the null model and compare this to the probability under a specially designed alternative model. With this new relevance test, we can search for motifs by random sampling, rather than requiring an accurate count of all instances of a motif. This allows motif analysis to scale to networks with billions of links.

ITNov 26, 2013
Universal Codes from Switching Strategies

Wouter M. Koolen, Steven de Rooij

We discuss algorithms for combining sequential prediction strategies, a task which can be viewed as a natural generalisation of the concept of universal coding. We describe a graphical language based on Hidden Markov Models for defining prediction strategies, and we provide both existing and new models as examples. The models include efficient, parameterless models for switching between the input strategies over time, including a model for the case where switches tend to occur in clusters, and finally a new model for the scenario where the prediction strategies have a known relationship, and where jumps are typically between strongly related ones. This last model is relevant for coding time series data where parameter drift is expected. As theoretical ontributions we introduce an interpolation construction that is useful in the development and analysis of new algorithms, and we establish a new sophisticated lemma for analysing the individual sequence regret of parameterised models.

LGJan 3, 2013
Follow the Leader If You Can, Hedge If You Must

Steven de Rooij, Tim van Erven, Peter D. Grünwald et al.

Follow-the-Leader (FTL) is an intuitive sequential prediction strategy that guarantees constant regret in the stochastic setting, but has terrible performance for worst-case data. Other hedging strategies have better worst-case guarantees but may perform much worse than FTL if the data are not maximally adversarial. We introduce the FlipFlop algorithm, which is the first method that provably combines the best of both worlds. As part of our construction, we develop AdaHedge, which is a new way of dynamically tuning the learning rate in Hedge without using the doubling trick. AdaHedge refines a method by Cesa-Bianchi, Mansour and Stoltz (2007), yielding slightly improved worst-case guarantees. By interleaving AdaHedge and FTL, the FlipFlop algorithm achieves regret within a constant factor of the FTL regret, without sacrificing AdaHedge's worst-case guarantees. AdaHedge and FlipFlop do not need to know the range of the losses in advance; moreover, unlike earlier methods, both have the intuitive property that the issued weights are invariant under rescaling and translation of the losses. The losses are also allowed to be negative, in which case they may be interpreted as gains.