B. Caputo

2 Papers

CV · Jun 7, 2021
Self-Supervision & Meta-Learning for One-Shot Unsupervised Cross-Domain Detection

F. Cappio Borlino, S. Polizzotto, B. Caputo et al.

Deep detection approaches are powerful in controlled conditions, but appear brittle and fail when source models are used off-the-shelf on unseen domains. Most existing works on domain adaptation simplify the setting by jointly accessing both a large source dataset and a sizable amount of target samples. However, this scenario is unrealistic in many practical cases, such as when monitoring image feeds from social media: only a pre-trained source model is available, and every target image uploaded by users belongs to a different domain not foreseen during training. We address this challenging setting by presenting an object detection algorithm able to exploit a pre-trained source model and perform unsupervised adaptation using only one target sample seen at test time. Our multi-task architecture includes a self-supervised branch that we exploit to meta-train the whole model with single-sample cross-domain episodes, preparing it for the test condition. At deployment time, the self-supervised task is iteratively solved on each incoming sample to adapt to it in one shot. We introduce a new dataset of social media image feeds and present a thorough benchmark against the most recent cross-domain detection methods, showing the advantages of our approach.
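The deployment-time procedure described in the abstract (start from the pre-trained model, take a few gradient steps on a self-supervised loss computed from one target sample, then predict on that sample) can be sketched roughly as follows. This is a toy illustration and not the authors' implementation: the function names, the single scalar parameter, and the "rescale the sample so its mean is 1" auxiliary objective are all hypothetical stand-ins for a real detector and a real self-supervised task.

```python
import copy


def adapt_and_predict(pretrained, sample, ss_grad, predict, steps=5, lr=0.1):
    """Fine-tune a copy of the pre-trained parameters on ONE sample, then predict.

    pretrained: list of floats (stand-in for source-model weights).
    ss_grad:    function(params, sample) -> gradients of a self-supervised loss.
    predict:    function(params, sample) -> prediction for the main task.
    """
    # Each incoming sample starts from the same source model, so the
    # adaptation to one sample never leaks into the next one.
    params = copy.deepcopy(pretrained)
    for _ in range(steps):
        grads = ss_grad(params, sample)
        params = [p - lr * g for p, g in zip(params, grads)]
    return predict(params, sample), params


# --- Hypothetical toy task (assumption, for illustration only) ---
def toy_ss_grad(params, sample):
    """Gradient of (w * mean(sample) - 1)^2 with respect to w."""
    w = params[0]
    m = sum(sample) / len(sample)
    return [2.0 * (w * m - 1.0) * m]


def toy_predict(params, sample):
    """Main-task output: the sample rescaled by the adapted weight."""
    return [params[0] * x for x in sample]


source_model = [1.0]
prediction, adapted = adapt_and_predict(
    source_model, [2.0, 2.0], toy_ss_grad, toy_predict, steps=50, lr=0.1
)
```

Copying the parameters before each adaptation is the design choice that matters here: it reflects the setting in which every target image may come from a different domain, so nothing learned on one image is assumed to carry over to the next.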

CV · Mar 31, 2017
(DE)^2 CO: Deep Depth Colorization

F. M. Carlucci, P. Russo, B. Caputo

The ability to classify objects is fundamental for robots. Besides knowledge about visual appearance, captured by the RGB channel, robots also rely heavily on depth information to make sense of the world. While the use of deep networks on RGB robot images has benefited from the plethora of results obtained on databases like ImageNet, using convnets on depth images requires mapping them into three-channel images. This transfer learning procedure makes them processable by pre-trained deep architectures. Current mappings are based on heuristic assumptions about preprocessing steps and about which depth properties should be preserved most, often resulting in cumbersome data visualizations and in sub-optimal performance in terms of generality and recognition results. Here we take an alternative route and instead attempt to learn an optimal colorization mapping for any given pre-trained architecture, using a reference RGB-D database as training data. We propose a deep network architecture, exploiting the residual paradigm, that learns how to map depth data to three-channel images. A qualitative analysis of the images obtained with this approach clearly indicates that learning the optimal mapping preserves the richness of depth information better than current hand-crafted approaches. Experiments on the Washington, JHUIT-50 and BigBIRD public benchmark databases, using CaffeNet, VGG16, GoogleNet, and ResNet50, clearly showcase the power of our approach, with gains in performance of up to 16% compared to state-of-the-art competitors on the depth channel only, leading to top performance when dealing with RGB-D data.
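For context, the kind of hand-crafted depth-to-color mapping that the abstract contrasts with its learned approach might look like the sketch below. It is not the paper's method (which learns the mapping with a residual network) and not any specific published baseline: the function name `colorize_depth` and the simple blue-to-red ramp are assumptions chosen only to show what "mapping a depth image to three channels" means in practice.

```python
def colorize_depth(depth, d_min=None, d_max=None):
    """Map a 2D list of depth values to (R, G, B) tuples in 0..255.

    Near values come out blue, mid-range values green, far values red.
    d_min / d_max optionally fix the normalization range; by default the
    range of the given image is used.
    """
    flat = [d for row in depth for d in row]
    lo = min(flat) if d_min is None else d_min
    hi = max(flat) if d_max is None else d_max
    span = (hi - lo) or 1.0  # avoid division by zero on constant images

    colored = []
    for row in depth:
        out_row = []
        for d in row:
            # Normalize to [0, 1] and clamp values outside the range.
            t = min(max((d - lo) / span, 0.0), 1.0)
            r = round(255 * t)
            g = round(255 * (1.0 - abs(2.0 * t - 1.0)))
            b = round(255 * (1.0 - t))
            out_row.append((r, g, b))
        colored.append(out_row)
    return colored


rgb = colorize_depth([[0.0, 0.5, 1.0]])
```

The resulting three-channel image can be fed to a network pre-trained on RGB data, but the ramp itself encodes a fixed, heuristic choice about which depth differences are emphasized, which is the limitation the learned colorization is meant to address.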