LGAug 17, 2023
Uplift Modeling: from Causal Inference to PersonalizationFelipe Moraes, Hugo Manuel Proença, Anastasiia Kornilova et al.
Uplift modeling is a collection of machine learning techniques for estimating causal effects of a treatment at the individual or subgroup levels. Over the last years, causality and uplift modeling have become key trends in personalization at online e-commerce platforms, enabling the selection of the best treatment for each user in order to maximize the target business metric. Uplift modeling can be particularly useful for personalized promotional campaigns, where the potential benefit caused by a promotion needs to be weighed against the potential costs. In this tutorial we will cover basic concepts of causality and introduce the audience to state-of-the-art techniques in uplift modeling. We will discuss the advantages and the limitations of different approaches and dive into the unique setup of constrained uplift modeling. Finally, we will present real-life applications and discuss challenges in implementing these models in production.
CVApr 21, 2022
SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and SynthesisAnastasiia Kornilova, Marsel Faizullin, Konstantin Pakulev et al.
We present a dataset of 1000 video sequences of human portraits recorded in real and uncontrolled conditions by using a handheld smartphone accompanied by an external high-quality depth camera. The collected dataset contains 200 people captured in different poses and locations and its main purpose is to bridge the gap between raw measurements obtained from a smartphone and downstream applications, such as state estimation, 3D reconstruction, view synthesis, etc. The sensors employed in data collection are the smartphone's camera and Inertial Measurement Unit (IMU), and an external Azure Kinect DK depth camera software synchronized with sub-millisecond precision to the smartphone system. During the recording, the smartphone flash is used to provide a periodic secondary source of lightning. Accurate mask of the foremost person is provided as well as its impact on the camera alignment accuracy. For evaluation purposes, we compare multiple state-of-the-art camera alignment methods by using a Motion Capture system. We provide a smartphone visual-inertial benchmark for portrait capturing, where we report results for multiple methods and motivate further use of the provided trajectories, available in the dataset, in view synthesis and 3D reconstruction tasks.
CVApr 12, 2022
EVOPS Benchmark: Evaluation of Plane Segmentation from RGBD and LiDAR DataAnastasiia Kornilova, Dmitrii Iarosh, Denis Kukushkin et al.
This paper provides the EVOPS dataset for plane segmentation from 3D data, both from RGBD images and LiDAR point clouds. We have designed two annotation methodologies (RGBD and LiDAR) running on well-known and widely-used datasets for SLAM evaluation and we have provided a complete set of benchmarking tools including point, planes and segmentation metrics. The data includes a total number of 10k RGBD and 7K LiDAR frames over different selected scenes which consist of high quality segmented planes. The experiments report quality of SOTA methods for RGBD plane segmentation on our annotated data. We also have provided learnable baseline for plane segmentation in LiDAR point clouds. All labeled data and benchmark tools used have been made publicly available at https://evops.netlify.app/.
CVMar 9, 2023
EVOLIN Benchmark: Evaluation of Line Detection and AssociationKirill Ivanov, Gonzalo Ferrer, Anastasiia Kornilova
Lines are interesting geometrical features commonly seen in indoor and urban environments. There is missing a complete benchmark where one can evaluate lines from a sequential stream of images in all its stages: Line detection, Line Association and Pose error. To do so, we present a complete and exhaustive benchmark for visual lines in a SLAM front-end, both for RGB and RGBD, by providing a plethora of complementary metrics. We have also labelled data from well-known SLAM datasets in order to have all in one poses and accurately annotated lines. In particular, we have evaluated 17 line detection algorithms, 5 line associations methods and the resultant pose error for aligning a pair of frames with several combinations of detector-association. We have packaged all methods and evaluations metrics and made them publicly available on web-page https://prime-slam.github.io/evolin/.
CVMar 9, 2023
Dominating Set Database Selection for Visual Place RecognitionAnastasiia Kornilova, Ivan Moskalenko, Timofei Pushkin et al.
This paper presents an approach for creating a visual place recognition (VPR) database for localization in indoor environments from RGBD scanning sequences. The proposed approach is formulated as a minimization problem in terms of dominating set algorithm for graph, constructed from spatial information, and referred as DominatingSet. Our algorithm shows better scene coverage in comparison to other methodologies that are used for database creation. Also, we demonstrate that using DominatingSet, a database size could be up to 250-1400 times smaller than the original scanning sequence while maintaining a recall rate of more than 80% on testing sequences. We evaluated our algorithm on 7-scenes and BundleFusion datasets and an additionally recorded sequence in a highly repetitive office setting. In addition, the database selection can produce weakly-supervised labels for fine-tuning neural place recognition algorithms to particular settings, improving even more their accuracy. The paper also presents a fully automated pipeline for VPR database creation from RGBD scanning sequences, as well as a set of metrics for VPR database evaluation. The code and released data are available on our web-page~ -- https://prime-slam.github.io/place-recognition-db/
CVMar 24, 2024Code
AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D ScansCedric Perauer, Laurenz Adrian Heidrich, Haifan Zhang et al.
Recently, progress in acquisition equipment such as LiDAR sensors has enabled sensing increasingly spacious outdoor 3D environments. Making sense of such 3D acquisitions requires fine-grained scene understanding, such as constructing instance-based 3D scene segmentations. Commonly, a neural network is trained for this task; however, this requires access to a large, densely annotated dataset, which is widely known to be challenging to obtain. To address this issue, in this work we propose to predict instance segmentations for 3D scenes in an unsupervised way, without relying on ground-truth annotations. To this end, we construct a learning framework consisting of two components: (1) a pseudo-annotation scheme for generating initial unsupervised pseudo-labels; and (2) a self-training algorithm for instance segmentation to fit robust, accurate instances from initial noisy proposals. To enable generating 3D instance mask proposals, we construct a weighted proxy-graph by connecting 3D points with edges integrating multi-modal image- and point-based self-supervised features, and perform graph-cuts to isolate individual pseudo-instances. We then build on a state-of-the-art point-based architecture and train a 3D instance segmentation model, resulting in significant refinement of initial proposals. To scale to arbitrary complexity 3D scenes, we design our algorithm to operate on local 3D point chunks and construct a merging step to generate scene-level instance segmentations. Experiments on the challenging SemanticKITTI benchmark demonstrate the potential of our approach, where it attains 13.3% higher Average Precision and 9.1% higher F1 score compared to the best-performing baseline. The code will be made publicly available at https://github.com/artonson/autoinst.
CVFeb 9
Thegra: Graph-based SLAM for Thermal ImageryAnastasiia Kornilova, Ivan Moskalenko, Arabella Gromova et al.
Thermal imaging provides a practical sensing modality for visual SLAM in visually degraded environments such as low illumination, smoke, or adverse weather. However, thermal imagery often exhibits low texture, low contrast, and high noise, complicating feature-based SLAM. In this work, we propose a sparse monocular graph-based SLAM system for thermal imagery that leverages general-purpose learned features -- the SuperPoint detector and LightGlue matcher, trained on large-scale visible-spectrum data to improve cross-domain generalization. To adapt these components to thermal data, we introduce a preprocessing pipeline to enhance input suitability and modify core SLAM modules to handle sparse and outlier-prone feature matches. We further incorporate keypoint confidence scores from SuperPoint into a confidence-weighted factor graph to improve estimation robustness. Evaluations on public thermal datasets demonstrate that the proposed system achieves reliable performance without requiring dataset-specific training or fine-tuning a desired feature detector, given the scarcity of quality thermal data. Code will be made available upon publication.
CVJun 2, 2024Code
Visual place recognition for aerial imagery: A surveyIvan Moskalenko, Anastasiia Kornilova, Gonzalo Ferrer
Aerial imagery and its direct application to visual localization is an essential problem for many Robotics and Computer Vision tasks. While Global Navigation Satellite Systems (GNSS) are the standard default solution for solving the aerial localization problem, it is subject to a number of limitations, such as, signal instability or solution unreliability that make this option not so desirable. Consequently, visual geolocalization is emerging as a viable alternative. However, adapting Visual Place Recognition (VPR) task to aerial imagery presents significant challenges, including weather variations and repetitive patterns. Current VPR reviews largely neglect the specific context of aerial data. This paper introduces a methodology tailored for evaluating VPR techniques specifically in the domain of aerial imagery, providing a comprehensive assessment of various methods and their performance. However, we not only compare various VPR methods, but also demonstrate the importance of selecting appropriate zoom and overlap levels when constructing map tiles to achieve maximum efficiency of VPR algorithms in the case of aerial imagery. The code is available on our GitHub repository -- https://github.com/prime-slam/aero-vloc.
CVNov 5, 2021Code
SmartDepthSync: Open Source Synchronized Video Recording System of Smartphone RGB and Depth Camera Range Image Frames with Sub-millisecond PrecisionMarsel Faizullin, Anastasiia Kornilova, Azat Akhmetyanov et al.
Nowadays, smartphones can produce a synchronized (synced) stream of high-quality data, including RGB images, inertial measurements, and other data. Therefore, smartphones are becoming appealing sensor systems in the robotics community. Unfortunately, there is still the need for external supporting sensing hardware, such as a depth camera precisely synced with the smartphone sensors. In this paper, we propose a hardware-software recording system that presents a heterogeneous structure and contains a smartphone and an external depth camera for recording visual, depth, and inertial data that are mutually synchronized. The system is synced at the time and the frame levels: every RGB image frame from the smartphone camera is exposed at the same moment of time with a depth camera frame with sub-millisecond precision. We provide a method and a tool for sync performance evaluation that can be applied to any pair of depth and RGB cameras. Our system could be replicated, modified, or extended by employing our open-sourced materials.
ROJul 6, 2021Code
Open-Source LiDAR Time Synchronization System by Mimicking GNSS-clockMarsel Faizullin, Anastasiia Kornilova, Gonzalo Ferrer
Data fusion algorithms that employ LiDAR measurements, such as Visual-LiDAR, LiDAR-Inertial, or Multiple LiDAR Odometry and simultaneous localization and mapping (SLAM) rely on precise timestamping schemes that grant synchronicity to data from LiDAR and other sensors. Poor synchronization performance, due to incorrect timestamping procedure, may negatively affect the algorithms' state estimation results. To provide highly accurate and precise synchronization between the sensors, we introduce an open-source hardware-software LiDAR to other sensors time synchronization system that exploits a dedicated hardware LiDAR time synchronization interface by providing emulated GNSS-clock to this interface, no physical GNSS-receiver is needed. The emulator is based on a general-purpose microcontroller and, due to concise hardware and software architecture, can be easily modified or extended for synchronization of sets of different sensors such as cameras, inertial measurement units (IMUs), wheel encoders, other LiDARs, etc. In the paper, we provide an example of such a system with synchronized LiDAR and IMU sensors. We conducted an evaluation of the sensors synchronization accuracy and precision, and state 1 microsecond performance. We compared our results with timestamping provided by ROS software and by a LiDAR inner clocking scheme to underline clear advantages over these two baseline methods.
CVMar 26, 2024
DeepMIF: Deep Monotonic Implicit Fields for Large-Scale LiDAR 3D MappingKutay Yılmaz, Matthias Nießner, Anastasiia Kornilova et al.
Recently, significant progress has been achieved in sensing real large-scale outdoor 3D environments, particularly by using modern acquisition equipment such as LiDAR sensors. Unfortunately, they are fundamentally limited in their ability to produce dense, complete 3D scenes. To address this issue, recent learning-based methods integrate neural implicit representations and optimizable feature grids to approximate surfaces of 3D scenes. However, naively fitting samples along raw LiDAR rays leads to noisy 3D mapping results due to the nature of sparse, conflicting LiDAR measurements. Instead, in this work we depart from fitting LiDAR data exactly, instead letting the network optimize a non-metric monotonic implicit field defined in 3D space. To fit our field, we design a learning system integrating a monotonicity loss that enables optimizing neural monotonic fields and leverages recent progress in large-scale 3D mapping. Our algorithm achieves high-quality dense 3D mapping performance as captured by multiple quantitative and perceptual measures and visual results obtained for Mai City, Newer College, and KITTI benchmarks. The code of our approach will be made publicly available.
CVJul 2, 2021
Sub-millisecond Video Synchronization of Multiple Android SmartphonesAzat Akhmetyanov, Anastasiia Kornilova, Marsel Faizullin et al.
This paper addresses the problem of building an affordable easy-to-setup synchronized multi-view camera system, which is in demand for many Computer Vision and Robotics applications in high-dynamic environments. In our work, we propose a solution for this problem -- a publicly-available Android application for synchronized video recording on multiple smartphones with sub-millisecond accuracy. We present a generalized mathematical model of timestamping for Android smartphones and prove its applicability on 47 different physical devices. Also, we estimate the time drift parameter for those smartphones, which is less than 1.2 msec per minute for most of the considered devices, that makes smartphones' camera system a worthy analog for professional multi-view systems. Finally, we demonstrate Android-app performance on the camera system built from Android smartphones quantitatively on setup with lights and qualitatively -- on panorama stitching task.
ROJun 21, 2021
Be your own Benchmark: No-Reference Trajectory Metric on Registered Point CloudsAnastasiia Kornilova, Gonzalo Ferrer
This paper addresses the problem of assessing trajectory quality in conditions when no ground truth poses are available or when their accuracy is not enough for the specific task - for example, small-scale mapping in outdoor scenes. In our work, we propose a no-reference metric, Mutually Orthogonal Metric (MOM), that estimates the quality of the map from registered point clouds via the trajectory poses. MOM strongly correlates with full-reference trajectory metric Relative Pose Error, making it a trajectory benchmarking tool on setups where 3D sensing technologies are employed. We provide a mathematical foundation for such correlation and confirm it statistically in synthetic environments. Furthermore, since our metric uses a subset of points from mutually orthogonal surfaces, we provide an algorithm for the extraction of such subset and evaluate its performance in synthetic CARLA environment and on KITTI dataset. The code of the proposed metric is publicly available as pip-package.
IRJan 26, 2021
Mining the Stars: Learning Quality Ratings with User-facing Explanations for Vacation RentalsAnastasiia Kornilova, Lucas Bernardi
Online Travel Platforms are virtual two-sided marketplaces where guests search for accommodations and accommodation providers list their properties such as hotels and vacation rentals. The large majority of hotels are rated by official institutions with a number of stars indicating the quality of service they provide. It is a simple and effective mechanism that contributes to match supply with demand by helping guests to find options meeting their criteria and accommodation suppliers to market their product to the right segment directly impacting the number of transactions on the platform. Unfortunately, no similar rating system exists for the large majority of vacation rentals, making it difficult for guests to search and compare options and hard for vacation rentals suppliers to market their product effectively. In this work we describe a machine learned quality rating system for vacation rentals. The problem is challenging, mainly due to explainability requirements and the lack of ground truth. We present techniques to address these challenges and empirical evidence of their efficacy. Our system was successfully deployed and validated through Online Controlled Experiments performed in Booking. com, a large Online Travel Platform, and running for more than one year, impacting more than a million accommodations and millions of guests.