DBAI · Feb 6, 2025

Common Data Format (CDF): A Standardized Format for Match-Data in Football (Soccer)

arXiv:2505.15820v4 · 6 citations · h-index: 16
Originality: Synthesis-oriented
AI Analysis

The work addresses data integration challenges for football clubs, federations, and other organizations, but it is incremental: it standardizes existing data types rather than introducing new analytical methods.

The paper tackles the problem of fragmented and inconsistent football match data from various providers by proposing the Common Data Format (CDF), a standardized format for five types of match data to reduce barriers in analysis.

During football matches, a variety of parties (e.g., companies) each collect (possibly overlapping) data about the match, ranging from basic information (e.g., starting players) to detailed positional data. These data are provided to clubs, federations, and other organizations that are increasingly interested in leveraging them to inform their decision-making. Unfortunately, analyzing such data poses significant barriers because each provider may (1) collect different data, (2) use different specifications even within the same category of data, (3) represent the data differently, and (4) deliver the data in a different manner (e.g., file format, protocol). Consequently, working with these data requires a significant investment of time and money. The goal of this work is to propose a uniform and standardized format for football data called the Common Data Format (CDF). The CDF specifies a minimal schema for five types of match data: match sheet data, video footage, event data, tracking data, and match metadata. It aims to ensure that the provided data are clear, sufficiently contextualized (e.g., their provenance is clear), and complete, such that they enable common downstream analysis tasks. Concretely, this paper details the technical specifications of the CDF, the representational choices made to help ensure the clarity of the provided data, and a concrete approach for delivering data in the CDF. This represents Version 1.0.0 of the CDF.
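To make barrier (3) concrete: two providers can encode the same physical location on the pitch in different ways, and a common format fixes a single convention so that each provider needs only one adapter. The sketch below is a minimal illustration under stated assumptions. Both provider conventions, the target frame (meters, origin at the centre spot, x along the pitch length), the 105 by 68 m pitch, and all names (`Position`, `from_provider_a`, `from_provider_b`) are hypothetical; this is not the CDF's actual coordinate specification, which is defined in the paper itself.

```python
from dataclasses import dataclass

# Hypothetical pitch dimensions in meters (real pitches vary per stadium).
PITCH_LENGTH = 105.0
PITCH_WIDTH = 68.0


@dataclass(frozen=True)
class Position:
    """A location in a hypothetical common frame: meters,
    origin at the centre spot, x along the pitch length, y across it."""
    x: float
    y: float


def from_provider_a(x_m: float, y_m: float) -> Position:
    """Hypothetical provider A: meters, origin at the bottom-left corner."""
    # Shift the origin from the corner to the centre spot.
    return Position(x_m - PITCH_LENGTH / 2, y_m - PITCH_WIDTH / 2)


def from_provider_b(u: float, v: float) -> Position:
    """Hypothetical provider B: normalized [0, 1] coordinates,
    origin at the top-left corner, with v increasing downward."""
    # Scale to meters, then re-centre along the length.
    x = u * PITCH_LENGTH - PITCH_LENGTH / 2
    # Flip v so that y increases upward, then scale and re-centre.
    y = (1.0 - v) * PITCH_WIDTH - PITCH_WIDTH / 2
    return Position(x, y)


if __name__ == "__main__":
    # The same bottom-left corner, reported by both providers, maps to one point.
    print(from_provider_a(0.0, 0.0))
    print(from_provider_b(0.0, 1.0))
```

Once every provider's data passes through such an adapter, downstream analysis code can be written once against the common frame rather than once per provider.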
