ROApr 3, 2023
Motion Capture Benchmark of Real Industrial Tasks and Traditional Crafts for Human Movement AnalysisBrenda Elizabeth Olivas-Padilla, Alina Glushkova, Sotiris Manitsaris
Human movement analysis is a key area of research in robotics, biomechanics, and data science. It encompasses tracking, posture estimation, and movement synthesis. While numerous methodologies have evolved over time, a systematic and quantitative evaluation of these approaches using verifiable ground truth data of three-dimensional human movement is still required to define the current state of the art. This paper presents seven datasets recorded using inertial-based motion capture. The datasets contain professional gestures carried out by industrial operators and skilled craftsmen performed in real conditions in-situ. The datasets were created with the intention of being used for research in human motion modeling, analysis, and generation. The protocols for data collection are described in detail, and a preliminary analysis of the collected data is provided as a benchmark. The Gesture Operational Model, a hybrid stochastic-biomechanical approach based on kinematic descriptors, is utilized to model the dynamics of the experts' movements and create mathematical representations of their motion trajectories for analysis and quantifying their body dexterity. The models allowed accurate the generation of human professional poses and an intuitive description of how body joints cooperate and change over time through the performance of the task.
ROMar 21, 2022
Computational ergonomics for task delegation in Human-Robot Collaboration: spatiotemporal adaptation of the robot to the human through contactless gesture recognitionBrenda Elizabeth Olivas-Padilla, Dimitris Papanagiotou, Gavriela Senteri et al.
The high prevalence of work-related musculoskeletal disorders (WMSDs) could be addressed by optimizing Human-Robot Collaboration (HRC) frameworks for manufacturing applications. In this context, this paper proposes two hypotheses for ergonomically effective task delegation and HRC. The first hypothesis states that it is possible to quantify ergonomically professional tasks using motion data from a reduced set of sensors. Then, the most dangerous tasks can be delegated to a collaborative robot. The second hypothesis is that by including gesture recognition and spatial adaptation, the ergonomics of an HRC scenario can be improved by avoiding needless motions that could expose operators to ergonomic risks and by lowering the physical effort required of operators. An HRC scenario for a television manufacturing process is optimized to test both hypotheses. For the ergonomic evaluation, motion primitives with known ergonomic risks were modeled for their detection in professional tasks and to estimate a risk score based on the European Assembly Worksheet (EAWS). A Deep Learning gesture recognition module trained with egocentric television assembly data was used to complement the collaboration between the human operator and the robot. Additionally, a skeleton-tracking algorithm provided the robot with information about the operator's pose, allowing it to spatially adapt its motion to the operator's anthropometrics. Three experiments were conducted to determine the effect of gesture recognition and spatial adaptation on the operator's range of motion. The rate of spatial adaptation was used as a key performance indicator (KPI), and a new KPI for measuring the reduction in the operator's motion is presented in this paper.
CVApr 13, 2023
Deep state-space modeling for explainable representation, analysis, and generation of professional human posesBrenda Elizabeth Olivas-Padilla, Alina Glushkova, Sotiris Manitsaris
The analysis of human movements has been extensively studied due to its wide variety of practical applications, such as human-robot interaction, human learning applications, or clinical diagnosis. Nevertheless, the state-of-the-art still faces scientific challenges when modeling human movements. To begin, new models must account for the stochasticity of human movement and the physical structure of the human body in order to accurately predict the evolution of full-body motion descriptors over time. Second, while utilizing deep learning algorithms, their explainability in terms of body posture predictions needs to be improved as they lack comprehensible representations of human movement. This paper addresses these challenges by introducing three novel methods for creating explainable representations of human movement. In this study, human body movement is formulated as a state-space model adhering to the structure of the Gesture Operational Model (GOM), whose parameters are estimated through the application of deep learning and statistical algorithms. The trained models are used for the full-body dexterity analysis of expert professionals, in which dynamic associations between body joints are identified, and for generating artificially professional movements.
CVFeb 21, 2025
Game State and Spatio-temporal Action Detection in Soccer using Graph Neural Networks and 3D Convolutional NetworksJeremie Ochin, Guillaume Devineau, Bogdan Stanciulescu et al.
Soccer analytics rely on two data sources: the player positions on the pitch and the sequences of events they perform. With around 2000 ball events per game, their precise and exhaustive annotation based on a monocular video stream remains a tedious and costly manual task. While state-of-the-art spatio-temporal action detection methods show promise for automating this task, they lack contextual understanding of the game. Assuming professional players' behaviors are interdependent, we hypothesize that incorporating surrounding players' information such as positions, velocity and team membership can enhance purely visual predictions. We propose a spatio-temporal action detection approach that combines visual and game state information via Graph Neural Networks trained end-to-end with state-of-the-art 3D CNNs, demonstrating improved metrics through game state integration.
CVMay 14, 2025
Beyond Pixels: Leveraging the Language of Soccer to Improve Spatio-Temporal Action Detection in Broadcast VideosJeremie Ochin, Raphael Chekroun, Bogdan Stanciulescu et al.
State-of-the-art spatio-temporal action detection (STAD) methods show promising results for extracting soccer events from broadcast videos. However, when operated in the high-recall, low-precision regime required for exhaustive event coverage in soccer analytics, their lack of contextual understanding becomes apparent: many false positives could be resolved by considering a broader sequence of actions and game-state information. In this work, we address this limitation by reasoning at the game level and improving STAD through the addition of a denoising sequence transduction task. Sequences of noisy, context-free player-centric predictions are processed alongside clean game state information using a Transformer-based encoder-decoder model. By modeling extended temporal context and reasoning jointly over team-level dynamics, our method leverages the "language of soccer" - its tactical regularities and inter-player dependencies - to generate "denoised" sequences of actions. This approach improves both precision and recall in low-confidence regimes, enabling more reliable event extraction from broadcast video and complementing existing pixel-based methods.
AINov 20, 2025
FOOTPASS: A Multi-Modal Multi-Agent Tactical Context Dataset for Play-by-Play Action Spotting in Soccer Broadcast VideosJeremie Ochin, Raphael Chekroun, Bogdan Stanciulescu et al.
Soccer video understanding has motivated the creation of datasets for tasks such as temporal action localization, spatiotemporal action detection (STAD), or multiobject tracking (MOT). The annotation of structured sequences of events (who does what, when, and where) used for soccer analytics requires a holistic approach that integrates both STAD and MOT. However, current action recognition methods remain insufficient for constructing reliable play-by-play data and are typically used to assist rather than fully automate annotation. Parallel research has advanced tactical modeling, trajectory forecasting, and performance analysis, all grounded in game-state and play-by-play data. This motivates leveraging tactical knowledge as a prior to support computer-vision-based predictions, enabling more automated and reliable extraction of play-by-play data. We introduce Footovision Play-by-Play Action Spotting in Soccer Dataset (FOOTPASS), the first benchmark for play-by-play action spotting over entire soccer matches in a multi-modal, multi-agent tactical context. It enables the development of methods for player-centric action spotting that exploit both outputs from computer-vision tasks (e.g., tracking, identification) and prior knowledge of soccer, including its tactical regularities over long time horizons, to generate reliable play-by-play data streams. These streams form an essential input for data-driven sports analytics.
CVAug 5, 2019
3D Reconstruction of Deformable Revolving Object under Heavy Hand InteractionRaoul de Charette, Sotiris Manitsaris
We reconstruct 3D deformable object through time, in the context of a live pottery making process where the crafter molds the object. Because the object suffers from heavy hand interaction, and is being deformed, classical techniques cannot be applied. We use particle energy optimization to estimate the object profile and benefit of the object radial symmetry to increase the robustness of the reconstruction to both occlusion and noise. Our method works with an unconstrained scalable setup with one or more depth sensors. We evaluate on our database (released upon publication) on a per-frame and temporal basis and shows it significantly outperforms state-of-the-art achieving 7.60mm average object reconstruction error. Further ablation studies demonstrate the effectiveness of our method.