Dmytro Mishkin

CV
h-index17
23papers
4,839citations
Novelty43%
AI Score36

23 Papers

CVFeb 20, 2023Code
A Large Scale Homography Benchmark

Daniel Barath, Dmytro Mishkin, Michal Polic et al.

We present a large-scale dataset of Planes in 3D, Pi3D, of roughly 1000 planes observed in 10 000 images from the 1DSfM dataset, and HEB, a large-scale homography estimation benchmark leveraging Pi3D. The applications of the Pi3D dataset are diverse, e.g. training or evaluating monocular depth, surface normal estimation and image matching algorithms. The HEB dataset consists of 226 260 homographies and includes roughly 4M correspondences. The homographies link images that often undergo significant viewpoint and illumination changes. As applications of HEB, we perform a rigorous evaluation of a wide range of robust estimators and deep learning-based correspondence filtering methods, establishing the current state-of-the-art in robust homography estimation. We also evaluate the uncertainty of the SIFT orientations and scales w.r.t. the ground truth coming from the underlying homographies and provide codes for comparing uncertainty of custom detectors. The dataset is available at \url{https://github.com/danini/homography-benchmark}.

CVApr 19, 2022Code
OpenGlue: Open Source Graph Neural Net Based Pipeline for Image Matching

Ostap Viniavskyi, Mariia Dobko, Dmytro Mishkin et al.

We present OpenGlue: a free open-source framework for image matching, that uses a Graph Neural Network-based matcher inspired by SuperGlue \cite{sarlin20superglue}. We show that including additional geometrical information, such as local feature scale, orientation, and affine geometry, when available (e.g. for SIFT features), significantly improves the performance of the OpenGlue matcher. We study the influence of the various attention mechanisms on accuracy and speed. We also present a simple architectural improvement by combining local descriptors with context-aware descriptors. The code and pretrained OpenGlue models for the different local features are publicly available.

CVJul 28, 2023
AffineGlue: Joint Matching and Robust Estimation

Daniel Barath, Dmytro Mishkin, Luca Cavalli et al.

We propose AffineGlue, a method for joint two-view feature matching and robust estimation that reduces the combinatorial complexity of the problem by employing single-point minimal solvers. AffineGlue selects potential matches from one-to-many correspondences to estimate minimal models. Guided matching is then used to find matches consistent with the model, suffering less from the ambiguities of one-to-one matches. Moreover, we derive a new minimal solver for homography estimation, requiring only a single affine correspondence (AC) and a gravity prior. Furthermore, we train a neural network to reject ACs that are unlikely to lead to a good model. AffineGlue is superior to the SOTA on real-world datasets, even when assuming that the gravity direction points downwards. On PhotoTourism, the AUC@10° score is improved by 6.6 points compared to the SOTA. On ScanNet, AffineGlue makes SuperPoint and SuperGlue achieve similar accuracy as the detector-free LoFTR.

CVJul 29, 2022
Matching with AffNet based rectifications

Václav Vávra, Dmytro Mishkin, Jiří Matas

We consider the problem of two-view matching under significant viewpoint changes with view synthesis. We propose two novel methods, minimizing the view synthesis overhead. The first one, named DenseAffNet, uses dense affine shapes estimates from AffNet, which allows it to partition the image, rectifying each partition with just a single affine map. The second one, named DepthAffNet, combines information from depth maps and affine shapes estimates to produce different sets of rectifying affine maps for different image partitions. DenseAffNet is faster than the state-of-the-art and more accurate on generic scenes. DepthAffNet is on par with the state of the art on scenes containing large planes. The evaluation is performed on 3 public datasets - EVD Dataset, Strong ViewPoint Changes Dataset and IMC Phototourism Dataset.

CVMar 11, 2025Code
Explaining Human Preferences via Metrics for Structured 3D Reconstruction

Jack Langerman, Denys Rozumnyi, Yuzhong Huang et al.

"What cannot be measured cannot be improved" while likely never uttered by Lord Kelvin, summarizes effectively the driving force behind this work. This paper presents a detailed discussion of automated metrics for evaluating structured 3D reconstructions. Pitfalls of each metric are discussed, and an analysis through the lens of expert 3D modelers' preferences is presented. A set of systematic "unit tests" are proposed to empirically verify desirable properties, and context aware recommendations regarding which metric to use depending on application are provided. Finally, a learned metric distilled from human expert judgments is proposed and analyzed. The source code is available at https://github.com/s23dr/wireframe-metrics-iccv2025

CVMar 3, 2020Code
Image Matching across Wide Baselines: From Paper to Practice

Yuhe Jin, Dmytro Mishkin, Anastasiia Mishchuk et al.

We introduce a comprehensive benchmark for local features and robust estimation algorithms, focusing on the downstream task -- the accuracy of the reconstructed camera pose -- as our primary metric. Our pipeline's modular structure allows easy integration, configuration, and combination of different methods and heuristics. This is demonstrated by embedding dozens of popular algorithms and evaluating them, from seminal works to the cutting edge of machine learning research. We show that with proper settings, classical solutions may still outperform the perceived state of the art. Besides establishing the actual state of the art, the conducted experiments reveal unexpected properties of Structure from Motion (SfM) pipelines that can help improve their performance, for both algorithmic and learned methods. Data and code are online https://github.com/vcg-uvic/image-matching-benchmark, providing an easy-to-use and flexible framework for the benchmarking of local features and robust estimation methods, both alongside and against top-performing methods. This work provides a basis for the Image Matching Challenge https://vision.uvic.ca/image-matching-challenge.

CVOct 5, 2019Code
Kornia: an Open Source Differentiable Computer Vision Library for PyTorch

Edgar Riba, Dmytro Mishkin, Daniel Ponsa et al.

This work presents Kornia -- an open source computer vision library which consists of a set of differentiable routines and modules to solve generic computer vision problems. The package uses PyTorch as its main backend both for efficiency and to take advantage of the reverse-mode auto-differentiation to define and compute the gradient of complex functions. Inspired by OpenCV, Kornia is composed of a set of modules containing operators that can be inserted inside neural networks to train models to perform image transformations, camera calibration, epipolar geometry, and low level image processing techniques, such as filtering and edge detection that operate directly on high dimensional tensor representations. Examples of classical vision problems implemented using our framework are provided including a benchmark comparing to existing vision libraries.

CVNov 19, 2017Code
DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks

Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych et al.

We present DeblurGAN, an end-to-end learned method for motion deblurring. The learning is based on a conditional GAN and the content loss . DeblurGAN achieves state-of-the art performance both in the structural similarity measure and visual appearance. The quality of the deblurring model is also evaluated in a novel way on a real-world problem -- object detection on (de-)blurred images. The method is 5 times faster than the closest competitor -- DeepDeblur. We also introduce a novel method for generating synthetic motion blurred images from sharp ones, allowing realistic dataset augmentation. The model, code and the dataset are available at https://github.com/KupynOrest/DeblurGAN

CVNov 17, 2017Code
Repeatability Is Not Enough: Learning Affine Regions via Discriminability

Dmytro Mishkin, Filip Radenovic, Jiri Matas

A method for learning local affine-covariant regions is presented. We show that maximizing geometric repeatability does not lead to local regions, a.k.a features,that are reliably matched and this necessitates descriptor-based learning. We explore factors that influence such learning and registration: the loss function, descriptor type, geometric parametrization and the trade-off between matchability and geometric accuracy and propose a novel hard negative-constant loss function for learning of affine regions. The affine shape estimator -- AffNet -- trained with the hard negative-constant loss outperforms the state-of-the-art in bag-of-words image retrieval and wide baseline stereo. The proposed training process does not require precisely geometrically aligned patches.The source codes and trained weights are available at https://github.com/ducha-aiki/affnet

CVDec 22, 2021
Learning and Crafting for the Wide Multiple Baseline Stereo

Dmytro Mishkin

This thesis introduces the wide multiple baseline stereo (WxBS) problem. WxBS, a generalization of the standard wide baseline stereo problem, considers the matching of images that simultaneously differ in more than one image acquisition factor such as viewpoint, illumination, sensor type, or where object appearance changes significantly, e.g., over time. A new dataset with the ground truth, evaluation metric and baselines has been introduced. The thesis presents the following improvements of the WxBS pipeline. (i) A loss function, called HardNeg, for learning a local image descriptor that relies on hard negative mining within a mini-batch and on the maximization of the distance between the closest positive and the closest negative patches. (ii) The descriptor trained with the HardNeg loss, called HardNet, is compact and shows state-of-the-art performance in standard matching, patch verification and retrieval benchmarks. (iii) A method for learning the affine shape, orientation, and potentially other parameters related to geometric and appearance properties of local features. (iv) A tentative correspondences generation strategy which generalizes the standard first to second closest distance ratio is presented. The selection strategy, which shows performance superior to the standard method, is applicable to either hard-engineered descriptors like SIFT, LIOP, and MROGH or deeply learned like HardNet. (v) A feedback loop is introduced for the two-view matching problem, resulting in MODS -- matching with on-demand view synthesis -- algorithm. MODS is an algorithm that handles a viewing angle difference even larger than the previous state-of-the-art ASIFT algorithm, without a significant increase of computational cost over "standard" wide and narrow baseline approaches. Last, but not least, a comprehensive benchmark for local features and robust estimation algorithms is introduced.

CVSep 27, 2021
HarrisZ$^+$: Harris Corner Selection for Next-Gen Image Matching Pipelines

Fabio Bellavia, Dmytro Mishkin

Due to its role in many computer vision tasks, image matching has been subjected to an active investigation by researchers, which has lead to better and more discriminant feature descriptors and to more robust matching strategies, also thanks to the advent of the deep learning and the increased computational power of the modern hardware. Despite of these achievements, the keypoint extraction process at the base of the image matching pipeline has not seen equivalent progresses. This paper presents HarrisZ$^+$, an upgrade to the HarrisZ corner detector, optimized to synergically take advance of the recent improvements of the other steps of the image matching pipeline. HarrisZ$^+$ does not only consists of a tuning of the setup parameters, but introduces further refinements to the selection criteria delineated by HarrisZ, so providing more, yet discriminative, keypoints, which are better distributed on the image and with higher localization accuracy. The image matching pipeline including HarrisZ$^+$, together with the other modern components, obtained in different recent matching benchmarks state-of-the-art results among the classic image matching pipelines. These results are quite close to those obtained by the more recent fully deep end-to-end trainable approaches and show that there is still a proper margin of improvement that can be granted by the research in classic image matching methods.

CVNov 24, 2020
Efficient Initial Pose-graph Generation for Global SfM

Daniel Barath, Dmytro Mishkin, Ivan Eichhardt et al.

We propose ways to speed up the initial pose-graph generation for global Structure-from-Motion algorithms. To avoid forming tentative point correspondences by FLANN and geometric verification by RANSAC, which are the most time-consuming steps of the pose-graph creation, we propose two new methods - built on the fact that image pairs usually are matched consecutively. Thus, candidate relative poses can be recovered from paths in the partly-built pose-graph. We propose a heuristic for the A* traversal, considering global similarity of images and the quality of the pose-graph edges. Given a relative pose from a path, descriptor-based feature matching is made "light-weight" by exploiting the known epipolar geometry. To speed up PROSAC-based sampling when RANSAC is applied, we propose a third method to order the correspondences by their inlier probabilities from previous estimations. The algorithms are tested on 402130 image pairs from the 1DSfM dataset and they speed up the feature matching 17 times and pose estimation 5 times.

CVNov 19, 2020
Differentiable Data Augmentation with Kornia

Jian Shi, Edgar Riba, Dmytro Mishkin et al.

In this paper we present a review of the Kornia differentiable data augmentation (DDA) module for both for spatial (2D) and volumetric (3D) tensors. This module leverages differentiable computer vision solutions from Kornia, with an aim of integrating data augmentation (DA) pipelines and strategies to existing PyTorch components (e.g. autograd for differentiability, optim for optimization). In addition, we provide a benchmark comparing different DA frameworks and a short review for a number of approaches that make use of Kornia DDA.

DLOct 11, 2020
ArXiving Before Submission Helps Everyone

Dmytro Mishkin, Amy Tabb, Jiri Matas

We claim, and present evidence, that allowing arXiv publication before a conference or journal submission benefits researchers, especially early career, as well as the whole scientific community. Specifically, arXiving helps professional identity building, protects against independent re-discovery, idea theft and gate-keeping; it facilitates open research result distribution and reduces inequality. The advantages dwarf the drawbacks -- mainly the relative increase in acceptance rate of papers of well-known authors -- which studies show to be marginal. Analyzing the pros and cons of arXiving papers, we conclude that requiring preprints be anonymous is nearly as detrimental as not allowing them. We see no reasons why anyone but the authors should decide whether to arXiv or not.

CVJan 30, 2019
Benchmarking Classic and Learned Navigation in Complex 3D Environments

Dmytro Mishkin, Alexey Dosovitskiy, Vladlen Koltun

Navigation research is attracting renewed interest with the advent of learning-based methods. However, this new line of work is largely disconnected from well-established classic navigation approaches. In this paper, we take a step towards coordinating these two directions of research. We set up classic and learning-based navigation systems in common simulated environments and thoroughly evaluate them in indoor spaces of varying complexity, with access to different sensory modalities. Additionally, we measure human performance in the same environments. We find that a classic pipeline, when properly tuned, can perform very well in complex cluttered environments. On the other hand, learned systems can operate more robustly with a limited sensor suite. Overall, both approaches are still far from human-level performance.

CVJan 28, 2019
Leveraging Outdoor Webcams for Local Descriptor Learning

Milan Pultar, Dmytro Mishkin, Jiří Matas

We present AMOS Patches, a large set of image cut-outs, intended primarily for the robustification of trainable local feature descriptors to illumination and appearance changes. Images contributing to AMOS Patches originate from the AMOS dataset of recordings from a large set of outdoor webcams. The semiautomatic method used to generate AMOS Patches is described. It includes camera selection, viewpoint clustering and patch selection. For training, we provide both the registered full source images as well as the patches. A new descriptor, trained on the AMOS Patches and 6Brown datasets, is introduced. It achieves state-of-the-art in matching under illumination changes on standard benchmarks.

CVMay 30, 2017
Working hard to know your neighbor's margins: Local descriptor learning loss

Anastasiya Mishchuk, Dmytro Mishkin, Filip Radenovic et al.

We introduce a novel loss for learning local feature descriptors which is inspired by the Lowe's matching criterion for SIFT. We show that the proposed loss that maximizes the distance between the closest positive and closest negative patch in the batch is better than complex regularization methods; it works well for both shallow and deep convolution network architectures. Applying the novel loss to the L2Net CNN architecture results in a compact descriptor -- it has the same dimensionality as SIFT (128) that shows state-of-art performance in wide baseline stereo, patch verification and instance retrieval benchmarks. It is fast, computing a descriptor takes about 1 millisecond on a low-end GPU.

CVAug 24, 2016
In the Saddle: Chasing Fast and Repeatable Features

Javier Aldana-Iuit, Dmytro Mishkin, Ondrej Chum et al.

A novel similarity-covariant feature detector that extracts points whose neighbourhoods, when treated as a 3D intensity surface, have a saddle-like intensity profile. The saddle condition is verified efficiently by intensity comparisons on two concentric rings that must have exactly two dark-to-bright and two bright-to-dark transitions satisfying certain geometric constraints. Experiments show that the Saddle features are general, evenly spread and appearing in high density in a range of images. The Saddle detector is among the fastest proposed. In comparison with detector with similar speed, the Saddle features show superior matching performance on number of challenging datasets.

NEJun 7, 2016
Systematic evaluation of CNN advances on the ImageNet

Dmytro Mishkin, Nikolay Sergievskiy, Jiri Matas

The paper systematically studies the impact of a range of recent advances in CNN architectures and learning methods on the object categorization (ILSVRC) problem. The evalution tests the influence of the following choices of the architecture: non-linearity (ReLU, ELU, maxout, compatibility with batch normalization), pooling variants (stochastic, max, average, mixed), network width, classifier design (convolutional, fully-connected, SPP), image pre-processing, and of learning parameters: learning rate, batch size, cleanliness of the data, etc. The performance gains of the proposed modifications are first tested individually and then in combination. The sum of individual gains is bigger than the observed improvement when all modifications are introduced, but the "deficit" is small suggesting independence of their benefits. We show that the use of 128x128 pixel images is sufficient to make qualitative conclusions about optimal network structure that hold for the full size Caffe and VGG nets. The results are obtained an order of magnitude faster than with the standard 224 pixel images.

LGNov 19, 2015
All you need is a good init

Dmytro Mishkin, Jiri Matas

Layer-sequential unit-variance (LSUV) initialization - a simple method for weight initialization for deep net learning - is proposed. The method consists of the two steps. First, pre-initialize weights of each convolution or inner-product layer with orthonormal matrices. Second, proceed from the first to the final layer, normalizing the variance of the output of each layer to be equal to one. Experiment with different activation functions (maxout, ReLU-family, tanh) show that the proposed initialization leads to learning of very deep nets that (i) produces networks with test accuracy better or equal to standard methods and (ii) is at least as fast as the complex schemes proposed specifically for very deep nets such as FitNets (Romero et al. (2015)) and Highway (Srivastava et al. (2015)). Performance is evaluated on GoogLeNet, CaffeNet, FitNets and Residual nets and the state-of-the-art, or very close to it, is achieved on the MNIST, CIFAR-10/100 and ImageNet datasets.

CVApr 24, 2015
WxBS: Wide Baseline Stereo Generalizations

Dmytro Mishkin, Jiri Matas, Michal Perdoch et al.

We have presented a new problem -- the wide multiple baseline stereo (WxBS) -- which considers matching of images that simultaneously differ in more than one image acquisition factor such as viewpoint, illumination, sensor type or where object appearance changes significantly, e.g. over time. A new dataset with the ground truth for evaluation of matching algorithms has been introduced and will be made public. We have extensively tested a large set of popular and recent detectors and descriptors and show than the combination of RootSIFT and HalfRootSIFT as descriptors with MSER and Hessian-Affine detectors works best for many different nuisance factors. We show that simple adaptive thresholding improves Hessian-Affine, DoG, MSER (and possibly other) detectors and allows to use them on infrared and low contrast images. A novel matching algorithm for addressing the WxBS problem has been introduced. We have shown experimentally that the WxBS-M matcher dominantes the state-of-the-art methods both on both the new and existing datasets.

CVMar 9, 2015
MODS: Fast and Robust Method for Two-View Matching

Dmytro Mishkin, Jiri Matas, Michal Perdoch

A novel algorithm for wide-baseline matching called MODS - Matching On Demand with view Synthesis - is presented. The MODS algorithm is experimentally shown to solve a broader range of wide-baseline problems than the state of the art while being nearly as fast as standard matchers on simple problems. The apparent robustness vs. speed trade-off is finessed by the use of progressively more time-consuming feature detectors and by on-demand generation of synthesized images that is performed until a reliable estimate of geometry is obtained. We introduce an improved method for tentative correspondence selection, applicable both with and without view synthesis. A modification of the standard first to second nearest distance rule increases the number of correct matches by 5-20% at no additional computational cost. Performance of the MODS algorithm is evaluated on several standard publicly available datasets, and on a new set of geometrically challenging wide baseline problems that is made public together with the ground truth. Experiments show that the MODS outperforms the state-of-the-art in robustness and speed. Moreover, MODS performs well on other classes of difficult two-view problems like matching of images from different modalities, with wide temporal baseline or with significant lighting changes.

CVJun 17, 2013
Two-View Matching with View Synthesis Revisited

Dmytro Mishkin, Michal Perdoch, Jiri Matas

Wide-baseline matching focussing on problems with extreme viewpoint change is considered. We introduce the use of view synthesis with affine-covariant detectors to solve such problems and show that matching with the Hessian-Affine or MSER detectors outperforms the state-of-the-art ASIFT. To minimise the loss of speed caused by view synthesis, we propose the Matching On Demand with view Synthesis algorithm (MODS) that uses progressively more synthesized images and more (time-consuming) detectors until reliable estimation of geometry is possible. We show experimentally that the MODS algorithm solves problems beyond the state-of-the-art and yet is comparable in speed to standard wide-baseline matchers on simpler problems. Minor contributions include an improved method for tentative correspondence selection, applicable both with and without view synthesis and a view synthesis setup greatly improving MSER robustness to blur and scale change that increase its running time by 10% only.