Sebastian Müksch

h-index: 42
2 papers

2 Papers

CV · Jan 6, 2025
WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation

Tianjian Jiang, Johsan Billingham, Sebastian Müksch et al.

We present WorldPose, a novel dataset for advancing research in multi-person global pose estimation in the wild, featuring footage from the 2022 FIFA World Cup. While previous datasets have primarily focused on local poses, often limited to a single person or to constrained indoor settings, the infrastructure deployed for this sporting event allows access to multiple fixed and moving cameras in different stadiums. We exploit the static multi-view setup of HD cameras to recover the 3D player poses and motions with unprecedented accuracy given capture areas of more than 1.75 acres. We then leverage the captured players' motions and field markings to calibrate a moving broadcasting camera. The resulting dataset comprises more than 80 sequences with approximately 2.5 million 3D poses and a total traveling distance of over 120 km. Subsequently, we conduct an in-depth analysis of the state-of-the-art methods for global pose estimation. Our experiments demonstrate that WorldPose challenges existing multi-person techniques, supporting the potential for new research in this area and others, such as sports analysis. All pose annotations (in SMPL format), broadcasting camera parameters and footage will be released for academic research purposes.
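The "total traveling distance" statistic can be illustrated with a minimal sketch. It assumes, hypothetically, that each player's global position is available as a per-frame 3D root position in meters; the function name and the sample trajectory below are illustrative and not taken from the dataset or its tooling.

```python
import math


def traveled_distance(root_positions):
    """Sum the Euclidean distances between consecutive root positions.

    `root_positions` is a list of (x, y, z) tuples in meters, one per frame.
    This is an assumed representation, not the dataset's actual format.
    """
    total = 0.0
    for previous, current in zip(root_positions, root_positions[1:]):
        total += math.dist(previous, current)
    return total


# Hypothetical trajectory: 3 m along x, then 4 m along y.
trajectory = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
print(traveled_distance(trajectory))  # 7.0
```

Summing this quantity over all players and sequences would give a dataset-level figure comparable in kind to the one quoted above, although the authors' exact procedure is not described here.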

CV · May 11, 2020
Quantitative Analysis of Image Classification Techniques for Memory-Constrained Devices

Sebastian Müksch, Theo Olausson, John Wilhelm et al.

Convolutional Neural Networks, or CNNs, are the state of the art for image classification, but typically come at the cost of a large memory footprint. This limits their usefulness in applications relying on embedded devices, where memory is often a scarce resource. Recently, there has been significant progress in the field of image classification on such memory-constrained devices, with novel contributions like the ProtoNN, Bonsai and FastGRNN algorithms. These have been shown to reach up to 98.2% accuracy on optical character recognition using MNIST-10, with a memory footprint as small as 6KB. However, their potential on more complex multi-class and multi-channel image classification has yet to be determined. In this paper, we compare CNNs with ProtoNN, Bonsai and FastGRNN when applied to 3-channel image classification using CIFAR-10. For our analysis, we use the existing Direct Convolution algorithm to implement the CNNs memory-optimally and propose new methods of adjusting the FastGRNN model to work with multi-channel images. We extend the evaluation of each algorithm to memory budgets of 8KB, 16KB, 32KB, 64KB and 128KB to show quantitatively that Direct Convolution CNNs perform best for all chosen budgets, with a top performance of 65.7% accuracy at a memory footprint of 58.23KB.
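To make the memory budgets concrete, the following sketch estimates the weight storage of a small CNN and finds the smallest budget it fits in. It rests on several assumptions not stated in the abstract: only weights and biases are counted (activation buffers are ignored), parameters are stored as 4-byte floats, and 1KB is taken as 1024 bytes. The layer sizes are hypothetical and are not the architecture evaluated in the paper.

```python
BUDGETS_KB = [8, 16, 32, 64, 128]  # budgets listed in the abstract


def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a square 2D convolution layer."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def dense_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def footprint_kb(num_params, bytes_per_param=4):
    """Weight storage in KB, assuming 1 KB = 1024 bytes (an assumption)."""
    return num_params * bytes_per_param / 1024


def smallest_budget(size_kb, budgets=BUDGETS_KB):
    """Return the smallest budget that holds `size_kb`, or None if none does."""
    for budget in sorted(budgets):
        if size_kb <= budget:
            return budget
    return None


# Hypothetical tiny CNN for 3-channel inputs with 10 output classes.
total_params = (
    conv_params(3, 8, 3)      # 224 parameters
    + conv_params(8, 16, 3)   # 1168 parameters
    + dense_params(16, 10)    # 170 parameters
)
size_kb = footprint_kb(total_params)
print(total_params, size_kb, smallest_budget(size_kb))
```

By the same arithmetic, the reported 58.23KB footprint falls within the 64KB budget, since it exceeds 32KB but not 64KB.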