Mathijs M. de Weerdt

LG
h-index33
6papers
107citations
Novelty52%
AI Score40

6 Papers

LGSep 19, 2024
Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance

Jacobus G. M. van der Linden, Daniël Vos, Mathijs M. de Weerdt et al.

Recently there has been a surge of interest in optimal decision tree (ODT) methods that globally optimize accuracy directly, in contrast to traditional approaches that locally optimize an impurity or information metric. However, the value of optimal methods is not well understood yet, as the literature provides conflicting results, with some demonstrating superior out-of-sample performance of ODTs over greedy approaches, while others show the opposite. Through a novel extensive experimental study, we provide new insights into the design and behavior of learning decision trees. In particular, we identify and analyze two relatively unexplored aspects of ODTs: the objective function used in training trees, and tuning techniques. Thus, we address these three questions: what objective to optimize in ODTs; how to tune ODTs; and how do optimal and greedy methods compare? Our experimental evaluation examines 11 objective functions, six tuning methods, and six claims from the literature on optimal and greedy methods on 180 real and synthetic data sets. Through our analysis, both conceptually and experimentally, we show the effect of (non-)concave objectives in greedy and optimal approaches; we highlight the importance of proper tuning of ODTs; support and refute several claims from the literature; provide clear recommendations for researchers and practitioners on the usage of greedy and optimal methods; and code for future comparisons.

LGFeb 21
VariBASed: Variational Bayes-Adaptive Sequential Monte-Carlo Planning for Deep Reinforcement Learning

Joery A. de Vries, Jinke He, Yaniv Oren et al.

Optimally trading-off exploration and exploitation is the holy grail of reinforcement learning as it promises maximal data-efficiency for solving any task. Bayes-optimal agents achieve this, but obtaining the belief-state and performing planning are both typically intractable. Although deep learning methods can greatly help in scaling this computation, existing methods are still costly to train. To accelerate this, this paper proposes a variational framework for learning and planning in Bayes-adaptive Markov decision processes that coalesces variational belief learning, sequential Monte-Carlo planning, and meta-reinforcement learning. In a single-GPU setup, our new method VariBASeD exhibits favorable scaling to larger planning budgets, improving sample- and runtime-efficiency over prior methods.

LGMay 24, 2025
Bayesian Meta-Reinforcement Learning with Laplace Variational Recurrent Networks

Joery A. de Vries, Jinke He, Mathijs M. de Weerdt et al.

Meta-reinforcement learning trains a single reinforcement learning agent on a distribution of tasks to quickly generalize to new tasks outside of the training set at test time. From a Bayesian perspective, one can interpret this as performing amortized variational inference on the posterior distribution over training tasks. Among the various meta-reinforcement learning approaches, a common method is to represent this distribution with a point-estimate using a recurrent neural network. We show how one can augment this point estimate to give full distributions through the Laplace approximation, either at the start of, during, or after learning, without modifying the base model architecture. With our approximation, we are able to estimate distribution statistics (e.g., the entropy) of non-Bayesian agents and observe that point-estimate based methods produce overconfident estimators while not satisfying consistency. Furthermore, when comparing our approach to full-distribution based learning of the task posterior, our method performs on par with variational baselines while having much fewer parameters.

LGMay 31, 2023
Necessary and Sufficient Conditions for Optimal Decision Trees using Dynamic Programming

Jacobus G. M. van der Linden, Mathijs M. de Weerdt, Emir Demirović

Global optimization of decision trees has shown to be promising in terms of accuracy, size, and consequently human comprehensibility. However, many of the methods used rely on general-purpose solvers for which scalability remains an issue. Dynamic programming methods have been shown to scale much better because they exploit the tree structure by solving subtrees as independent subproblems. However, this only works when an objective can be optimized separately for subtrees. We explore this relationship in detail and show the necessary and sufficient conditions for such separability and generalize previous dynamic programming approaches into a framework that can optimize any combination of separable objectives and constraints. Experiments on five application domains show the general applicability of this framework, while outperforming the scalability of general-purpose solvers by a large margin.

AINov 29, 2015
Solving Transition-Independent Multi-agent MDPs with Sparse Interactions (Extended version)

Joris Scharpff, Diederik M. Roijers, Frans A. Oliehoek et al.

In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP setting (MMDP) such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can only affect their own state but their reward depends on joint transitions. We represent these dependencies compactly in conditional return graphs (CRGs). Using CRGs the value of a joint policy and the bounds on partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the available alternatives and finds solutions to problems previously unsolvable.

DSJan 18, 2014
Computing All-Pairs Shortest Paths by Leveraging Low Treewidth

Léon R. Planken, Mathijs M. de Weerdt, Roman P. J. van der Krogt

We present two new and efficient algorithms for computing all-pairs shortest paths. The algorithms operate on directed graphs with real (possibly negative) weights. They make use of directed path consistency along a vertex ordering d. Both algorithms run in O(n^2 w_d) time, where w_d is the graph width induced by this vertex ordering. For graphs of constant treewidth, this yields O(n^2) time, which is optimal. On chordal graphs, the algorithms run in O(nm) time. In addition, we present a variant that exploits graph separators to arrive at a run time of O(n w_d^2 + n^2 s_d) on general graphs, where s_d andlt= w_d is the size of the largest minimal separator induced by the vertex ordering d. We show empirically that on both constructed and realistic benchmarks, in many cases the algorithms outperform Floyd-Warshalls as well as Johnsons algorithm, which represent the current state of the art with a run time of O(n^3) and O(nm + n^2 log n), respectively. Our algorithms can be used for spatial and temporal reasoning, such as for the Simple Temporal Problem, which underlines their relevance to the planning and scheduling community.