47.8CVMay 10
SoccerLens: Grounded Soccer Video Understanding Beyond AccuracyIsmael Elsharkawi, Ahmed Sait, Silvio Giancola et al.
Vision-language models (VLMs) have recently shown strong potential in soccer video understanding. However, given the high complexity of soccer videos due to large viewpoint variations, rapid shot transitions, and cluttered scenes, it remains unclear on whether VLMs rely on meaningful visual evidence or exploit spurious correlations and shortcut learning. Existing evaluation protocols focus primarily on classification accuracy and do not assess visual grounding. To address this limitation, we introduce SoccerLens, a benchmark for grounded soccer video understanding. The benchmark contains annotated video segments spanning $13$ common soccer events, with structured visual cues organized into three levels of semantic relevance. We further extend the attribution method of Chefer [arXiv:2103.15679] to jointly model spatial and temporal attention, and introduce evaluation metrics that measure whether model attention aligns with annotated cues or drifts toward spurious regions. Our evaluation of state-of-the-art soccer VLMs shows that, despite strong classification accuracy, current models fail to exceed $50\%$ grounding performance even under the loosest cue definitions and consistently underutilize temporal information. These results reveal a substantial gap between predictive performance and true visual grounding, highlighting the need for grounded evaluation in complex spatio-temporal domains such as soccer.
CVSep 23, 2025
ViG-LRGC: Vision Graph Neural Networks with Learnable Reparameterized Graph ConstructionIsmael Elsharkawi, Hossam Sharara, Ahmed Rafea
Image Representation Learning is an important problem in Computer Vision. Traditionally, images were processed as grids, using Convolutional Neural Networks or as a sequence of visual tokens, using Vision Transformers. Recently, Vision Graph Neural Networks (ViG) have proposed the treatment of images as a graph of nodes; which provides a more intuitive image representation. The challenge is to construct a graph of nodes in each layer that best represents the relations between nodes and does not need a hyper-parameter search. ViG models in the literature depend on non-parameterized and non-learnable statistical methods that operate on the latent features of nodes to create a graph. This might not select the best neighborhood for each node. Starting from k-NN graph construction to HyperGraph Construction and Similarity-Thresholded graph construction, these methods lack the ability to provide a learnable hyper-parameter-free graph construction method. To overcome those challenges, we present the Learnable Reparameterized Graph Construction (LRGC) for Vision Graph Neural Networks. LRGC applies key-query attention between every pair of nodes; then uses soft-threshold reparameterization for edge selection, which allows the use of a differentiable mathematical model for training. Using learnable parameters to select the neighborhood removes the bias that is induced by any clustering or thresholding methods previously introduced in the literature. In addition, LRGC allows tuning the threshold in each layer to the training data since the thresholds are learnable through training and are not provided as hyper-parameters to the model. We demonstrate that the proposed ViG-LRGC approach outperforms state-of-the-art ViG models of similar sizes on the ImageNet-1k benchmark dataset.
DCDec 22, 2021
FLoBC: A Decentralized Blockchain-Based Federated Learning FrameworkMohamed Ghanem, Fadi Dawoud, Habiba Gamal et al.
The rapid expansion of data worldwide invites the need for more distributed solutions in order to apply machine learning on a much wider scale. The resultant distributed learning systems can have various degrees of centralization. In this work, we demonstrate our solution FLoBC for building a generic decentralized federated learning system using blockchain technology, accommodating any machine learning model that is compatible with gradient descent optimization. We present our system design comprising the two decentralized actors: trainer and validator, alongside our methodology for ensuring reliable and efficient operation of said system. Finally, we utilize FLoBC as an experimental sandbox to compare and contrast the effects of trainer-to-validator ratio, reward-penalty policy, and model synchronization schemes on the overall system performance, ultimately showing by example that a decentralized federated learning system is indeed a feasible alternative to more centralized architectures.