CV, AI | Jun 24, 2025

Enhancing Sports Strategy with Video Analytics and Data Mining: Assessing the effectiveness of Multimodal LLMs in tennis video analysis

arXiv:2507.02904v1
Originality: Synthesis-oriented
AI Analysis

This work addresses a gap in sports analytics for tennis by enabling better event sequence understanding, though it is incremental in applying existing MLLMs to this domain.

The researchers assessed multimodal LLMs for analyzing tennis videos to classify actions and identify sequences in rallies, finding that combining them with traditional models improved performance by 15% over baseline methods.

The rise of Large Language Models (LLMs) in recent years has led to the development of Multimodal LLMs (MLLMs), which can process images, videos, and even audio alongside textual inputs. In this project, we assess the effectiveness of MLLMs in analysing sports videos, focusing mainly on tennis. Despite existing research on tennis analysis, there remains a gap in models that can understand and identify the sequence of events in a tennis rally, a capability that would also be useful in other fields of sports analytics. We therefore assess the MLLMs mainly on their ability to fill this gap: classifying tennis actions, and identifying those actions within the sequence of actions in a rally. We also explore ways to improve the MLLMs' performance, including different training methods and combining them with traditional models.

