Jan Van Haaren

LG
h-index16
7papers
341citations
Novelty39%
AI Score33

7 Papers

DBFeb 6, 2025
Common Data Format (CDF): A Standardized Format for Match-Data in Football (Soccer)

Gabriel Anzer, Kilian Arnsmeyer, Pascal Bauer et al.

During football matches, a variety of different parties (e.g., companies) each collect (possibly overlapping) data about the match ranging from basic information (e.g., starting players) to detailed positional data. This data is provided to clubs, federations, and other organizations who are increasingly interested in leveraging this data to inform their decision making. Unfortunately, analyzing such data pose significant barriers because each provider may (1) collect different data, (2) use different specifications even within the same category of data, (3) represent the data differently, and (4) delivers the data in a different manner (e.g., file format, protocol). Consequently, working with these data requires a significant investment of time and money. The goal of this work is to propose a uniform and standardized format for football data called the Common Data Format (CDF). The CDF specifies a minimal schema for five types of match data: match sheet data, video footage, event data, tracking data, and match meta data. It aims to ensure that the provided data is clear, sufficiently contextualized (e.g., its provenance is clear), and complete such that it enables common downstream analysis tasks. Concretely, this paper will detail the technical specifications of the CDF, the representational choices that were made to help ensure the clarity of the provided data, and a concrete approach for delivering data in the CDF. This represents Version 1.0.0 of the CDF.

CVJun 25, 2025
What Makes a Dribble Successful? Insights From 3D Pose Tracking Data

Michiel Schepers, Pieter Robberechts, Jan Van Haaren et al.

Data analysis plays an increasingly important role in soccer, offering new ways to evaluate individual and team performance. One specific application is the evaluation of dribbles: one-on-one situations where an attacker attempts to bypass a defender with the ball. While previous research has primarily relied on 2D positional tracking data, this fails to capture aspects like balance, orientation, and ball control, limiting the depth of current insights. This study explores how pose tracking data (capturing players' posture and movement in three dimensions) can improve our understanding of dribbling skills. We extract novel pose-based features from 1,736 dribbles in the 2022/23 Champions League season and evaluate their impact on dribble success. Our results indicate that features capturing the attacker's balance and the alignment of the orientation between the attacker and defender are informative for predicting dribble success. Incorporating these pose-based features on top of features derived from traditional 2D positional data leads to a measurable improvement in model performance.

LGMay 27, 2021
"Why Would I Trust Your Numbers?" On the Explainability of Expected Values in Soccer

Jan Van Haaren

In recent years, many different approaches have been proposed to quantify the performances of soccer players. Since player performances are challenging to quantify directly due to the low-scoring nature of soccer, most approaches estimate the expected impact of the players' on-the-ball actions on the scoreline. While effective, these approaches are yet to be widely embraced by soccer practitioners. The soccer analytics community has primarily focused on improving the accuracy of the models, while the explainability of the produced metrics is often much more important to practitioners. To help bridge the gap between scientists and practitioners, we introduce an explainable Generalized Additive Model that estimates the expected value for shots. Unlike existing models, our model leverages features corresponding to widespread soccer concepts. To this end, we represent the locations of shots by fuzzily assigning the shots to designated zones on the pitch that practitioners are familiar with. Our experimental evaluation shows that our model is as accurate as existing models, while being easier to explain to soccer practitioners.

LGJun 12, 2019
A Bayesian Approach to In-Game Win Probability in Soccer

Pieter Robberechts, Jan Van Haaren, Jesse Davis

In-game win probability models, which provide a sports team's likelihood of winning at each point in a game based on historical observations, are becoming increasingly popular. In baseball, basketball and American football, they have become important tools to enhance fan experience, to evaluate in-game decision-making, and to inform coaching decisions. While equally relevant in soccer, the adoption of these models is held back by technical challenges arising from the low-scoring nature of the sport. In this paper, we introduce an in-game win probability model for soccer that addresses the shortcomings of existing models. First, we demonstrate that in-game win probability models for other sports struggle to provide accurate estimates for soccer, especially towards the end of a game. Second, we introduce a novel Bayesian statistical framework that estimates running win, tie and loss probabilities by leveraging a set of contextual game state features. An empirical evaluation on eight seasons of data for the top-five soccer leagues demonstrates that our framework provides well-calibrated probabilities. Furthermore, two use cases show its ability to enhance fan experience and to evaluate performance in crucial game situations.

MLSep 13, 2018
Distinguishing Between Roles of Football Players in Play-by-play Match Event Data

Bart Aalbers, Jan Van Haaren

Over the last few decades, the player recruitment process in professional football has evolved into a multi-billion industry and has thus become of vital importance. To gain insights into the general level of their candidate reinforcements, many professional football clubs have access to extensive video footage and advanced statistics. However, the question whether a given player would fit the team's playing style often still remains unanswered. In this paper, we aim to bridge that gap by proposing a set of 21 player roles and introducing a method for automatically identifying the most applicable roles for each player from play-by-play event data collected during matches.

APSep 13, 2018
Measuring Football Players' On-the-ball Contributions From Passes During Games

Lotte Bransen, Jan Van Haaren

Several performance metrics for quantifying the in-game performances of individual football players have been proposed in recent years. Although the majority of the on-the-ball actions during games constitutes of passes, many of the currently available metrics focus on measuring the quality of shots only. To help bridge this gap, we propose a novel approach to measure players' on-the-ball contributions from passes during games. Our proposed approach measures the expected impact of each pass on the scoreline.

APFeb 18, 2018
Actions Speak Louder Than Goals: Valuing Player Actions in Soccer

Tom Decroos, Lotte Bransen, Jan Van Haaren et al.

Assessing the impact of the individual actions performed by soccer players during games is a crucial aspect of the player recruitment process. Unfortunately, most traditional metrics fall short in addressing this task as they either focus on rare actions like shots and goals alone or fail to account for the context in which the actions occurred. This paper introduces (1) a new language for describing individual player actions on the pitch and (2) a framework for valuing any type of player action based on its impact on the game outcome while accounting for the context in which the action happened. By aggregating soccer players' action values, their total offensive and defensive contributions to their team can be quantified. We show how our approach considers relevant contextual information that traditional player evaluation metrics ignore and present a number of use cases related to scouting and playing style characterization in the 2016/2017 and 2017/2018 seasons in Europe's top competitions.