Pieter Robberechts

LG
h-index: 29
8 papers
160 citations
Novelty: 44%
AI Score: 37

8 Papers

CV · Jun 25, 2025
What Makes a Dribble Successful? Insights From 3D Pose Tracking Data

Michiel Schepers, Pieter Robberechts, Jan Van Haaren et al.

Data analysis plays an increasingly important role in soccer, offering new ways to evaluate individual and team performance. One specific application is the evaluation of dribbles: one-on-one situations where an attacker attempts to bypass a defender with the ball. While previous research has primarily relied on 2D positional tracking data, this fails to capture aspects like balance, orientation, and ball control, limiting the depth of current insights. This study explores how pose tracking data (capturing players' posture and movement in three dimensions) can improve our understanding of dribbling skills. We extract novel pose-based features from 1,736 dribbles in the 2022/23 Champions League season and evaluate their impact on dribble success. Our results indicate that features capturing the attacker's balance and the alignment of the orientation between the attacker and defender are informative for predicting dribble success. Incorporating these pose-based features on top of features derived from traditional 2D positional data leads to a measurable improvement in model performance.

LG · Jun 18, 2025
Warping and Matching Subsequences Between Time Series

Simiao Lin, Wannes Meert, Pieter Robberechts et al.

Comparing time series is essential in various tasks such as clustering and classification. While elastic distance measures that allow warping provide a robust quantitative comparison, a qualitative comparison on top of them is missing. Traditional visualizations focus on point-to-point alignment and do not convey the broader structural relationships at the level of subsequences. This limitation makes it difficult to understand how and where one time series shifts, speeds up or slows down with respect to another. To address this, we propose a novel technique that simplifies the warping path to highlight, quantify and visualize key transformations (shift, compression, difference in amplitude). By offering a clearer representation of how subsequences match between time series, our method enhances interpretability in time series comparison.

LG · Jan 18, 2024
Biases in Expected Goals Models Confound Finishing Ability

Jesse Davis, Pieter Robberechts

Expected Goals (xG) has emerged as a popular tool for evaluating finishing skill in soccer analytics. It involves comparing a player's cumulative xG with their actual goal output, where consistent overperformance indicates strong finishing ability. However, the assessment of finishing skill in soccer using xG remains contentious because players struggle to consistently outperform their cumulative xG. In this paper, we aim to address the limitations and nuances surrounding the evaluation of finishing skill using xG statistics. Specifically, we explore three hypotheses: (1) the deviation between actual and expected goals is an inadequate metric due to the high variance of shot outcomes and limited sample sizes, (2) the inclusion of all shots in the cumulative xG calculation may be inappropriate, and (3) xG models contain biases arising from interdependencies in the data that affect skill measurement. We found that (1) sustained overperformance of cumulative xG requires both high shot volumes and exceptional finishing, (2) including all shot types can obscure the finishing ability of proficient strikers, and (3) a persistent bias makes actual and expected goals appear closer for excellent finishers than they really are. Overall, our analysis indicates that we need more nuanced quantitative approaches for investigating a player's finishing ability, which we achieved using a technique from AI fairness to learn an xG model that is calibrated for multiple subgroups of players. As a concrete use case, we show that (1) the standard biased xG model underestimates Messi's GAX by 17% and (2) Messi's GAX is 27% higher than that of the typical elite high-shot-volume attacker, indicating that Messi is an even more exceptional finisher than commonly believed.
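The first hypothesis above (high variance, small samples) can be illustrated with a small sketch. It assumes that GAX denotes actual goals minus cumulative xG and that shots are independent Bernoulli trials; the function name `gax_summary` and the shot data are hypothetical and not taken from the paper.

```python
import math

def gax_summary(shots):
    """Summarize finishing for a list of (xg, scored) pairs.

    xg is the model's scoring probability for the shot, scored is 0 or 1.
    Assumption: GAX = actual goals - cumulative xG.
    """
    goals = sum(scored for _, scored in shots)
    expected = sum(xg for xg, _ in shots)
    # Assumption: shots are independent Bernoulli trials, so the goal
    # total has variance sum(xg * (1 - xg)).
    std = math.sqrt(sum(xg * (1 - xg) for xg, _ in shots))
    return {"goals": goals, "xg": expected, "gax": goals - expected, "std": std}

# Hypothetical season: 40 shots worth 0.1 xG each, 6 of them scored.
shots = [(0.1, 1)] * 6 + [(0.1, 0)] * 34
summary = gax_summary(shots)
print(summary)
```

In this made-up season the surplus of roughly 2 goals is about as large as one standard deviation of the goal total (about 1.9), so it is hard to tell skill from luck on a sample of this size.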

LG · Jan 4, 2022
Elastic Product Quantization for Time Series

Pieter Robberechts, Wannes Meert, Jesse Davis

Analyzing numerous or long time series is difficult in practice due to the high storage costs and computational requirements. Therefore, techniques have been proposed to generate compact similarity-preserving representations of time series, enabling real-time similarity search on large in-memory data collections. However, the existing techniques are not ideally suited for assessing similarity when sequences are locally out of phase. In this paper, we propose the use of product quantization for efficient similarity-based comparison of time series under time warping. The idea is to first compress the data by partitioning the time series into equal-length subsequences, each of which is represented by a short code. The distance between two time series can then be efficiently approximated by pre-computed elastic distances between their codes. The partitioning into subsequences forces unwanted alignments, which we address with a pre-alignment step using the maximal overlap discrete wavelet transform (MODWT). To demonstrate the efficiency and accuracy of our method, we perform an extensive experimental evaluation on benchmark datasets in nearest-neighbor classification and clustering applications. Overall, the proposed solution emerges as a highly efficient (both in terms of memory usage and computation time) replacement for elastic measures in time series applications.
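The partition, encode, and lookup steps described above can be sketched as follows. This is a simplified illustration and not the authors' implementation: codewords are sampled from the training subsequences instead of being learned by clustering, the MODWT pre-alignment step is omitted, and summing squared per-partition distances is an assumption. All function names are hypothetical.

```python
import math
import random

def dtw(a, b):
    """Dynamic time warping distance (squared-error cost, no window)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] ** 0.5

def split(series, n_parts):
    """Partition a series into n_parts equal-length subsequences."""
    size = len(series) // n_parts
    return [series[k * size:(k + 1) * size] for k in range(n_parts)]

def build_codebooks(train, n_parts, n_codes, seed=0):
    """One codebook per partition.

    Simplification: codewords are sampled training subsequences,
    not cluster centroids.
    """
    rng = random.Random(seed)
    books = []
    for k in range(n_parts):
        subs = [split(s, n_parts)[k] for s in train]
        books.append(rng.sample(subs, n_codes))
    return books

def encode(series, books):
    """Replace each subsequence by the index of its DTW-nearest codeword."""
    parts = split(series, len(books))
    return [min(range(len(book)), key=lambda c: dtw(part, book[c]))
            for part, book in zip(parts, books)]

def build_tables(books):
    """Pre-compute squared DTW distances between all codeword pairs."""
    return [[[dtw(u, v) ** 2 for v in book] for u in book] for book in books]

def approx_distance(code_a, code_b, tables):
    """Approximate the distance between two series by table lookups only."""
    return sum(t[i][j] for t, i, j in zip(tables, code_a, code_b)) ** 0.5

# Hypothetical data: 20 phase-shifted sine waves of length 32.
rng = random.Random(1)
train = [[math.sin(0.3 * t + rng.uniform(0, 3)) for t in range(32)]
         for _ in range(20)]
books = build_codebooks(train, n_parts=4, n_codes=5)
tables = build_tables(books)
code_a, code_b = encode(train[0], books), encode(train[1], books)
print(code_a, code_b, approx_distance(code_a, code_b, tables))
```

Once the tables are built, comparing two encoded series costs only one lookup per partition, which is where the memory and time savings claimed in the abstract would come from.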

AI · Apr 7, 2021
Leaving Goals on the Pitch: Evaluating Decision Making in Soccer

Maaike Van Roy, Pieter Robberechts, Wen-Chi Yang et al.

Analysis of the popular expected goals (xG) metric in soccer has determined that a (slightly) smaller number of high-quality attempts will likely yield more goals than a slew of low-quality ones. This observation has driven a change in shooting behavior. Teams are passing up on shots from outside the penalty box, in the hopes of generating a better shot closer to goal later on. This paper evaluates whether this decrease in long-distance shots is warranted. To this end, we propose a novel generic framework to reason about decision-making in soccer by combining techniques from machine learning and artificial intelligence (AI). First, we model how a team has behaved offensively over the course of two seasons by learning a Markov Decision Process (MDP) from event stream data. Second, we apply reasoning techniques arising from the AI literature on verification to each team's MDP. This allows us to reason about the efficacy of certain potential decisions by posing counterfactual questions to the MDP. Our key conclusion is that teams would score more goals if they shot more often from outside the penalty box in a small number of team-specific locations. The proposed framework can easily be extended and applied to analyze other aspects of the game.
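The idea of posing a counterfactual question to an MDP can be sketched on a toy example. The states, transition probabilities, and policies below are invented for illustration, and plain iterative policy evaluation stands in for the verification techniques the paper refers to.

```python
# TRANSITIONS[state][action] = {next_state: probability}
# All numbers are hypothetical, not estimates from real event data.
TRANSITIONS = {
    "midfield": {"move": {"outside_box": 0.5, "loss": 0.5}},
    "outside_box": {
        "shoot": {"goal": 0.05, "loss": 0.95},
        "move": {"inside_box": 0.30, "loss": 0.70},
    },
    "inside_box": {"shoot": {"goal": 0.12, "loss": 0.88}},
}

def goal_probability(policy, start, n_iter=100):
    """Probability of eventually scoring when following `policy` from `start`.

    policy[state][action] is the probability of choosing that action.
    "goal" and "loss" are absorbing states.
    """
    value = {state: 0.0 for state in TRANSITIONS}
    value.update({"goal": 1.0, "loss": 0.0})
    for _ in range(n_iter):
        for state, actions in TRANSITIONS.items():
            value[state] = sum(
                policy[state][action]
                * sum(p * value[nxt] for nxt, p in outcomes.items())
                for action, outcomes in actions.items()
            )
    return value[start]

# Observed behavior: the team shoots from outside the box 20% of the time.
observed = {
    "midfield": {"move": 1.0},
    "outside_box": {"shoot": 0.2, "move": 0.8},
    "inside_box": {"shoot": 1.0},
}
# Counterfactual: what if it shot from there 40% of the time instead?
counterfactual = dict(observed, outside_box={"shoot": 0.4, "move": 0.6})

p_observed = goal_probability(observed, "midfield")
p_counterfactual = goal_probability(counterfactual, "midfield")
print(p_observed, p_counterfactual)
```

With these made-up numbers, moving the ball into the box loses possession so often that shooting earlier pays off. Whether that holds for a real team depends entirely on the transition probabilities learned from its data.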

LG · Oct 29, 2019
Predicting gait events from tibial acceleration in rearfoot running: a structured machine learning approach

Pieter Robberechts, Rud Derie, Pieter Van den Berghe et al.

Gait event detection of the initial contact and toe off is essential for running gait analysis, allowing the derivation of parameters such as stance time. Heuristic-based methods exist to estimate these key gait events from tibial accelerometry. However, these methods are tailored to very specific acceleration profiles, which may cause complications when dealing with larger data sets and inherent biological variability. Therefore, this paper investigates whether a structured machine learning approach can achieve a more accurate prediction of running gait event timings from tibial accelerometry. Force-based event detection acted as the criterion measure in order to assess the accuracy, repeatability and sensitivity of the predicted gait events. A heuristic method and two structured machine learning methods were employed to derive initial contact, toe off and stance time from tibial acceleration signals. Both a structured perceptron model (median absolute error of stance time estimation: 10.00 $\pm$ 8.73 ms) and a structured recurrent neural network model (median absolute error of stance time estimation: 6.50 $\pm$ 5.74 ms) significantly outperformed the existing heuristic approach (median absolute error of stance time estimation: 11.25 $\pm$ 9.52 ms) on data from 93 rearfoot runners. Thus, results indicate that a structured recurrent neural network machine learning model offers the most accurate and consistent estimation of the gait events and the derived stance time during level overground running. The machine learning methods seem less affected by intra- and inter-subject variation within the data, allowing for accurate and efficient automated data output during rearfoot overground running. Furthermore, they offer possibilities for real-time monitoring and biofeedback during prolonged measurements, even outside the laboratory.

LG · Jun 12, 2019
A Bayesian Approach to In-Game Win Probability in Soccer

Pieter Robberechts, Jan Van Haaren, Jesse Davis

In-game win probability models, which provide a sports team's likelihood of winning at each point in a game based on historical observations, are becoming increasingly popular. In baseball, basketball and American football, they have become important tools to enhance fan experience, to evaluate in-game decision-making, and to inform coaching decisions. While equally relevant in soccer, the adoption of these models is held back by technical challenges arising from the low-scoring nature of the sport. In this paper, we introduce an in-game win probability model for soccer that addresses the shortcomings of existing models. First, we demonstrate that in-game win probability models for other sports struggle to provide accurate estimates for soccer, especially towards the end of a game. Second, we introduce a novel Bayesian statistical framework that estimates running win, tie and loss probabilities by leveraging a set of contextual game state features. An empirical evaluation on eight seasons of data for the top-five soccer leagues demonstrates that our framework provides well-calibrated probabilities. Furthermore, two use cases show its ability to enhance fan experience and to evaluate performance in crucial game situations.

LG · Sep 10, 2018
Beyond the Selected Completely At Random Assumption for Learning from Positive and Unlabeled Data

Jessa Bekker, Pieter Robberechts, Jesse Davis

Most positive and unlabeled data is subject to selection biases. The labeled examples can, for example, be selected from the positive set because they are easier to obtain or more obviously positive. This paper investigates how learning can be enabled in this setting. We propose and theoretically analyze an empirical-risk-based method for incorporating the labeling mechanism. Additionally, we investigate under which assumptions learning is possible when the labeling mechanism is not fully understood and propose a practical method to enable this. Our empirical analysis supports the theoretical results and shows that taking into account the possibility of a selection bias, even when the labeling mechanism is unknown, improves the trained classifiers.
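One way an empirical risk can incorporate a known labeling mechanism is through propensity weighting, sketched below. The abstract does not spell out the estimator, so the weighting scheme is an assumption: a positive labeled with probability e counts as 1/e positives and as (1 - 1/e) negatives. The function names, predictions, and propensities are hypothetical.

```python
def squared_loss(pred, target):
    return (pred - target) ** 2

def pu_risk(preds, labeled, propensity, loss=squared_loss):
    """Propensity-weighted empirical risk on positive and unlabeled data.

    labeled[i] is 1 if example i carries a positive label, else 0.
    propensity[i] is the (assumed known) probability that example i
    would be labeled if it were positive.
    """
    total = 0.0
    for p, s, e in zip(preds, labeled, propensity):
        if s:
            # The labeled positive also stands in for the unlabeled
            # positives that share its propensity.
            total += (1 / e) * loss(p, 1) + (1 - 1 / e) * loss(p, 0)
        else:
            total += loss(p, 0)
    return total / len(preds)

def supervised_risk(preds, targets, loss=squared_loss):
    """Ordinary empirical risk when the true classes are known."""
    return sum(loss(p, y) for p, y in zip(preds, targets)) / len(preds)

# Hypothetical data: "obvious" positives (prediction 0.9) are labeled with
# propensity 0.5, harder positives (prediction 0.6) with propensity 0.25.
preds      = [0.9, 0.9, 0.6, 0.6, 0.6, 0.6, 0.2, 0.2]
truth      = [1,   1,   1,   1,   1,   1,   0,   0]
labeled    = [1,   0,   1,   0,   0,   0,   0,   0]
propensity = [0.5, 0.5, 0.25, 0.25, 0.25, 0.25, 0.5, 0.5]

full = supervised_risk(preds, truth)
weighted = pu_risk(preds, labeled, propensity)
naive = supervised_risk(preds, labeled)  # treats unlabeled as negative
print(full, weighted, naive)
```

In this constructed example the labeled fraction in each group matches its propensity exactly, so the weighted risk coincides with the fully supervised one, while the naive risk that treats every unlabeled example as negative does not. On real samples the match would hold only in expectation.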