CV · Nov 29, 2022 · Code
Challenging the Universal Representation of Deep Models for 3D Point Cloud Registration
David Bojanić, Kristijan Bartol, Josep Forest et al.
Learning universal representations across different application domains is an open research problem. In fact, finding a universal architecture within the same application but across different types of datasets is still an unsolved problem too, especially in applications involving the processing of 3D point clouds. In this work we experimentally test several state-of-the-art learning-based methods for 3D point cloud registration against the proposed non-learning baseline registration method. The proposed method either outperforms or achieves comparable results w.r.t. learning-based methods. In addition, we propose a dataset on which learning-based methods have a hard time generalizing. Our proposed method and dataset, along with the provided experiments, can be used in further research on effective solutions for universal representations. Our source code is available at: github.com/DavidBoja/greedy-grid-search.
CV · Jan 10, 2025 · Code
Pose-independent 3D Anthropometry from Sparse Data
David Bojanić, Stefanie Wuhrer, Tomislav Petković et al.
3D digital anthropometry is the study of estimating human body measurements from 3D scans. Precise body measurements are important health indicators in the medical industry, and guiding factors in the fashion, ergonomic and entertainment industries. The measuring protocol consists of scanning the whole subject in the static A-pose, which is maintained without breathing or movement during the scanning process. However, the A-pose is not easy to maintain during the whole scanning process, which can last even up to a couple of minutes. This constraint affects the final quality of the scan, which in turn affects the accuracy of the estimated body measurements obtained from methods that rely on dense geometric data. Additionally, this constraint makes it impossible to develop a digital anthropometry method for subjects unable to assume the A-pose, such as those with injuries or disabilities. We propose a method that can obtain body measurements from sparse landmarks acquired in any pose. We make use of the sparse landmarks of the posed subject to create pose-independent features, and train a network to predict the body measurements as taken from the standard A-pose. We show that our method achieves comparable results to competing methods that use dense geometry in the standard A-pose, but has the capability of estimating the body measurements from any pose using sparse landmarks only. Finally, we address the lack of open-source 3D anthropometry methods by making our method available to the research community at https://github.com/DavidBoja/pose-independent-anthropometry.
CV · Jul 20, 2020 · Code
On the Comparison of Classic and Deep Keypoint Detector and Descriptor Methods
Kristijan Bartol, David Bojanić, Tomislav Pribanić et al.
The purpose of this study is to give a performance comparison between several classic hand-crafted and deep keypoint detector and descriptor methods. In particular, we consider the following classical algorithms: SIFT, SURF, ORB, FAST, BRISK, MSER, HARRIS, KAZE, AKAZE, AGAST, GFTT, FREAK, BRIEF and RootSIFT, where a subset of all combinations is paired into detector-descriptor pipelines. Additionally, we analyze the performance of two recent and promising deep detector-descriptor models, LF-Net and SuperPoint. Our benchmark relies on the HPSequences dataset that provides real and diverse images under various geometric and illumination changes. We analyze the performance on three evaluation tasks: keypoint verification, image matching and keypoint retrieval. The results show that certain classic and deep approaches are still comparable, with some classic detector-descriptor combinations outperforming pretrained deep models. In terms of the execution times of the tested implementations, the SuperPoint model is the fastest, followed by ORB. The source code is published at https://github.com/kristijanbartol/keypoint-algorithms-benchmark.
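As a small illustration of one of the listed descriptors, RootSIFT is commonly defined as an L1 normalization of a SIFT descriptor followed by an element-wise square root. The sketch below assumes that definition and uses a made-up 4-element vector in place of a real 128-dimensional SIFT descriptor; the function name `root_sift` is hypothetical and is not taken from the benchmark's source code.

```python
import math


def root_sift(descriptor, eps=1e-12):
    """Return the RootSIFT form of a SIFT-style descriptor.

    Steps: L1-normalize, then take the element-wise square root.
    `eps` guards against division by zero for an all-zero descriptor.
    """
    total = sum(abs(v) for v in descriptor) + eps
    return [math.sqrt(abs(v) / total) for v in descriptor]


# Toy descriptor (illustrative values only, not real SIFT output).
desc = [4.0, 0.0, 12.0, 0.0]
rooted = root_sift(desc)

# After the transform the vector has (approximately) unit L2 norm.
l2_norm = math.sqrt(sum(v * v for v in rooted))
print(rooted, l2_norm)
```

Comparing RootSIFT vectors with Euclidean distance corresponds to comparing the original descriptors with the Hellinger kernel, which is the usual motivation for the transform.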
CV · Oct 1, 2021
Generalizable Human Pose Triangulation
Kristijan Bartol, David Bojanić, Tomislav Petković et al.
We address the problem of generalizability for multi-view 3D human pose estimation. The standard approach is to first detect 2D keypoints in images and then apply triangulation from multiple views. Even though the existing methods achieve remarkably accurate 3D pose estimation on public benchmarks, most of them are limited to a single spatial arrangement and number of cameras. Several methods address this limitation but demonstrate significantly degraded performance on novel views. We propose a stochastic framework for human pose triangulation and demonstrate superior generalization across different camera arrangements on two public datasets. In addition, we apply the same approach to the fundamental matrix estimation problem, showing that the proposed method can be successfully applied to other computer vision problems. The stochastic framework achieves more than 8.8% improvement on the 3D pose estimation task, compared to the state-of-the-art, and more than 30% improvement for fundamental matrix estimation, compared to a standard algorithm.
CV · Sep 24, 2021
Catadioptric Stereo on a Smartphone
Kristijan Bartol, David Bojanić, Tomislav Petković et al.
We present a 3D-printed adapter with planar mirrors for stereo reconstruction using the front and back smartphone cameras. The adapter presents a practical and low-cost solution for enabling any smartphone to be used as a stereo camera, which is currently only possible using high-end phones with expensive 3D sensors. Using the prototype version of the adapter, we experiment with parameters like the angles between cameras and mirrors and the distance to each camera (the stereo baseline). We find the most convenient configuration and calibrate the stereo pair. Based on the presented preliminary analysis, we identify possible improvements in the current design. To demonstrate the working prototype, we reconstruct a 3D human pose using 2D keypoint detections from the stereo pair and evaluate extracted body lengths. The result shows that the adapter can be used for anthropometric measurement of several body segments.
RO · Jan 14, 2021
Ensemble of LSTMs and feature selection for human action prediction
Tomislav Petković, Luka Petrović, Ivan Marković et al.
As robots are becoming more and more ubiquitous in human environments, it will be necessary for robotic systems to better understand and predict human actions. However, this is not an easy task, at times not even for us humans, but based on a relatively structured set of possible actions, appropriate cues, and the right model, this problem can be computationally tackled. In this paper, we propose to use an ensemble of long short-term memory (LSTM) networks for human action prediction. To train and evaluate models, we used the MoGaze dataset, currently the most comprehensive dataset capturing poses of human joints and the human gaze. We have thoroughly analyzed the MoGaze dataset and selected a reduced set of cues for this task. Our model can predict (i) which of the labeled objects the human is going to grasp, and (ii) which of the macro locations the human is going to visit (such as table or shelf). We have exhaustively evaluated the proposed method and compared it to individual cue baselines. The results suggest that our LSTM model slightly outperforms the gaze baseline in single object picking accuracy, but achieves better accuracy in macro object prediction. Furthermore, we have also analyzed the prediction accuracy when the gaze is not used, and in this case, the LSTM model considerably outperformed the best single cue baseline.
RO · May 22, 2020
Human Intention Recognition for Human Aware Planning in Integrated Warehouse Systems
Tomislav Petković, Jakub Hvězda, Tomáš Rybecký et al.
With the substantial growth of logistics businesses, the need for larger and more automated warehouses increases, thus giving rise to fully robotized shop-floors with mobile robots in charge of transporting and distributing goods. However, even in fully automated warehouse systems the need for human intervention frequently arises, whether because of maintenance or because of fulfilling specific orders, thus bringing mobile robots and humans ever closer in an integrated warehouse environment. In order to ensure smooth and efficient operation of such a warehouse, paths of both robots and humans need to be carefully planned; however, due to the possibility of humans deviating from the assigned path, this becomes an even more challenging task. Given that, the supervising system should be able to recognize human intentions and their alternative paths in real-time. In this paper, we propose a framework for human deviation detection and intention recognition which outputs the most probable paths of the human workers, and a planner that acts accordingly by replanning for robots to move out of the human's path. Experimental results demonstrate that the proposed framework increases the total number of deliveries, especially human deliveries, and reduces human-robot encounters.
RO · Nov 20, 2018
Human Intention Estimation based on Hidden Markov Model Motion Validation for Safe Flexible Robotized Warehouses
Tomislav Petković, David Puljiz, Ivan Marković et al.
With the substantial growth of logistics businesses, the need for larger warehouses and their automation arises, thus using robots as assistants to human workers is becoming a priority. In order to operate efficiently and safely, robot assistants or the supervising system should recognize human intentions in real-time. Theory of mind (ToM) is an intuitive human conception of other humans' mental state, i.e., beliefs and desires, and how they cause behavior. In this paper we propose a ToM-based human intention estimation algorithm for flexible robotized warehouses. We observe the human worker's motion and validate it with respect to the goal locations using generalized Voronoi diagram based path planning. These observations are then processed by the proposed hidden Markov model framework, which estimates worker intentions in an online manner and is capable of handling changing environments. To test the proposed intention estimation we ran experiments in a real-world laboratory warehouse with a worker wearing Microsoft HoloLens augmented reality glasses. Furthermore, in order to demonstrate the scalability of the approach to larger warehouses, we propose to use virtual reality digital warehouse twins in order to realistically simulate worker behavior. We conducted intention estimation experiments in the larger warehouse digital twin with up to 24 running robots. We demonstrate that the proposed framework estimates warehouse worker intentions precisely and in the end we discuss the experimental results.
RO · Apr 5, 2018
Human Intention Recognition in Flexible Robotized Warehouses based on Markov Decision Processes
Tomislav Petković, Ivan Marković, Ivan Petrović
The rapid growth of e-commerce increases the need for larger warehouses and their automation, thus using robots as assistants to human workers becomes a priority. In order to operate efficiently and safely, robot assistants or the supervising system should recognize human intentions. Theory of mind (ToM) is an intuitive conception of other agents' mental state, i.e., beliefs and desires, and how they cause behavior. In this paper we present a ToM-based algorithm for human intention recognition in flexible robotized warehouses. We have placed the warehouse worker in a simulated 2D environment with three potential goals. We observe the agent's actions and validate them with respect to the goal locations using a Markov decision process framework. Those observations are then processed by the proposed hidden Markov model framework, which estimates the agent's desires. We demonstrate that the proposed framework predicts the human warehouse worker's desires in an intuitive manner and in the end we discuss the simulation results.
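To make the idea of filtering a hidden goal from validated observations concrete, the following is a minimal sketch of a generic hidden Markov model forward (filtering) update over three candidate goals. It is not the authors' model: the transition matrix, the likelihood values, and the name `forward_step` are all illustrative assumptions.

```python
def forward_step(belief, transition, likelihood):
    """One HMM filtering step: predict with the transition model,
    weight by the observation likelihood, then renormalize."""
    n = len(belief)
    predicted = [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


# Hypothetical "sticky" transitions: the worker usually keeps the same goal.
transition = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]

# Hypothetical per-step likelihoods: each observed action is most
# consistent with heading toward goal 0.
observations = [[0.7, 0.2, 0.1]] * 3

belief = [1 / 3, 1 / 3, 1 / 3]  # uniform prior over the three goals
for likelihood in observations:
    belief = forward_step(belief, transition, likelihood)

print(belief)  # probability mass concentrates on goal 0
```

In a full system the likelihood vector would come from whatever validates an action against each goal (an MDP in this abstract); here it is simply hard-coded.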
CV · Oct 16, 2015
An Extension to Hough Transform Based on Gradient Orientation
Tomislav Petković, Sven Lončarić
The Hough transform is one of the most common methods for line detection. In this paper we propose a novel extension of the regular Hough transform. The proposed extension combines the extension of the accumulator space and the local gradient orientation resulting in clutter reduction and yielding more prominent peaks, thus enabling better line identification. We demonstrate benefits in applications such as visual quality inspection and rectangle detection.
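A well-known way to use gradient orientation in a line Hough transform is to let each edge pixel vote only for angles close to its gradient direction rather than for every angle. The sketch below shows that generic idea in plain Python; it is not the extension proposed in the paper, and `hough_votes`, `window`, and the synthetic edge points are assumptions made for illustration.

```python
import math
from collections import Counter


def hough_votes(edge_points, n_theta=180, window=None):
    """Accumulate votes in (theta_bin, rho) space.

    edge_points: iterable of (x, y, gradient_angle_in_radians).
    window=None: classic voting over all theta bins.
    window=k: vote only within +/- k bins of the gradient direction.
    """
    acc = Counter()
    for x, y, grad in edge_points:
        if window is None:
            bins = range(n_theta)
        else:
            center = round((grad % math.pi) / math.pi * n_theta)
            bins = [(center + d) % n_theta for d in range(-window, window + 1)]
        for b in bins:
            theta = b * math.pi / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(b, rho)] += 1
    return acc


# Synthetic vertical edge x = 5 whose gradient points along the x axis.
points = [(5, y, 0.0) for y in range(10)]

full = hough_votes(points)
restricted = hough_votes(points, window=2)

# Both accumulators contain the true line (theta bin 0, rho 5), but the
# restricted one casts far fewer votes in total, which reduces clutter.
print(full[(0, 5)], sum(full.values()))
print(restricted[(0, 5)], sum(restricted.values()))
```

With real images the gradient angle would typically come from a Sobel-style operator, and the window width trades robustness to noisy gradients against clutter reduction.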
CV · Oct 1, 2013
Flexible Visual Quality Inspection in Discrete Manufacturing
Tomislav Petković, Darko Jurić, Sven Lončarić
Most visual quality inspections in discrete manufacturing are composed of length, surface, angle or intensity measurements. Those are implemented as end-user configurable inspection tools that should not require an image processing expert to set up. Currently available software solutions providing such capability use a flowchart-based programming environment, but do not fully address the robustness of an inspection flowchart and can require a redefinition of the flowchart if a small variation is introduced. In this paper we propose an acquire-register-analyze image processing pattern designed for discrete manufacturing that aims to increase the robustness of the inspection flowchart by consistently addressing variations in product position, orientation and size. The proposed pattern is transparent to the end-user and simplifies the flowchart. We describe a developed software solution that is a practical implementation of the proposed pattern. We give an example of its real-life use in the industrial production of electric components.