Tanaka Kanji

h-index 13

21 papers

52 citations

Novelty 50%

AI Score 29

Ranked #141,649 of 194,257 authors (top 73%); #46,570 in CV (top 79%)

21 Papers

3.9 CV Jun 28, 2023

Lifelong Change Detection: Continuous Domain Adaptation for Small Object Change Detection in Every Robot Navigation

Koji Takeda, Kanji Tanaka, Yoshimasa Nakamura

The recently emerging research area in robotics, ground-view change detection, suffers from ill-posedness because of visual uncertainty combined with complex nonlinear perspective projection. To regularize the ill-posedness, the commonly applied supervised learning methods (e.g., CSCD-Net) rely on manually annotated high-quality object-class-specific priors. In this work, we consider general application domains where no manual annotation is available and present a fully self-supervised approach. The present approach adopts the powerful and versatile idea that object changes detected during everyday robot navigation can be reused as additional priors to improve future change detection tasks. Furthermore, a robustified framework is implemented and verified experimentally in a new challenging practical application scenario: ground-view small object change detection.

4.6 LG Mar 13, 2024

Training Self-localization Models for Unseen Unfamiliar Places via Teacher-to-Student Data-Free Knowledge Transfer

Kenta Tsukahara, Kanji Tanaka, Daiki Iwata

A typical assumption in state-of-the-art self-localization models is that an annotated training dataset is available in the target workspace. However, this does not always hold when a robot travels in a general open-world. This study introduces a novel training scheme for open-world distributed robot systems. In our scheme, a robot ("student") can ask the other robots it meets at unfamiliar places ("teachers") for guidance. Specifically, a pseudo-training dataset is reconstructed from the teacher model and thereafter used for continual learning of the student model. Unlike typical knowledge transfer schemes, our scheme introduces only minimal assumptions on the teacher model, such that it can handle various types of open-set teachers, including uncooperative, untrainable (e.g., image retrieval engines), and blackbox teachers (i.e., data privacy). Rather than relying on the availability of private data of teachers as in existing methods, we propose to exploit an assumption that holds universally in self-localization tasks: "The teacher model is a self-localization system" and to reuse the self-localization system of a teacher as a sole accessible communication channel. We particularly focus on designing an excellent student/questioner whose interactions with teachers can yield effective question-and-answer sequences that can be used as pseudo-training datasets for the student self-localization model. When applied to a generic recursive knowledge distillation scenario, our approach exhibited stable and consistent performance improvement.

3.2 RO Mar 17, 2025

Dynamic-Dark SLAM: RGB-Thermal Cooperative Robot Vision Strategy for Multi-Person Tracking in Both Well-Lit and Low-Light Scenes

Tatsuro Sakai, Kanji Tanaka, Yuki Minase et al.

In robot vision, thermal cameras hold great potential for recognizing humans even in complete darkness. However, their application to multi-person tracking (MPT) has been limited due to data scarcity and the inherent difficulty of distinguishing individuals. In this study, we propose a cooperative MPT system that utilizes co-located RGB and thermal cameras, where pseudo-annotations (bounding boxes and person IDs) are used to train both RGB and thermal trackers. Evaluation experiments demonstrate that the thermal tracker performs robustly in both bright and dark environments. Moreover, the results suggest that a tracker-switching strategy, guided by a binary brightness classifier, is more effective for information integration than a tracker-fusion approach. As an application example, we present an image change pattern recognition (ICPR) method, the "human-as-landmark," which combines two key properties: the thermal recognizability of humans in dark environments and the rich landmark characteristics (appearance, geometry, and semantics) of static objects (occluders). Whereas conventional SLAM focuses on mapping static landmarks in well-lit environments, the present study takes a first step toward a new Human-Only SLAM paradigm, "Dynamic-Dark SLAM," which aims to map even dynamic landmarks in complete darkness. Additionally, this study demonstrates that knowledge transfer between thermal and depth modalities enables reliable person tracking using low-resolution 3D LiDAR data without RGB input, contributing an important advance toward cross-robot SLAM systems.
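The tracker-switching idea in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the binary brightness classifier is a mean-intensity threshold and that the RGB tracker is preferred in bright frames; the function names and the threshold value are hypothetical and not taken from the paper.

```python
def mean_brightness(gray_image):
    """Average pixel intensity of a grayscale image given as a list of rows."""
    pixels = [p for row in gray_image for p in row]
    return sum(pixels) / len(pixels)


def select_tracker(gray_image, threshold=60.0):
    """Pick one tracker per frame instead of fusing both outputs.

    Assumption: a simple threshold stands in for the learned binary
    brightness classifier described in the abstract.
    """
    return "rgb" if mean_brightness(gray_image) >= threshold else "thermal"


# Hypothetical 2x2 grayscale frames (0-255 intensities).
bright_frame = [[200, 180], [190, 210]]
dark_frame = [[5, 10], [0, 8]]
bright_choice = select_tracker(bright_frame)
dark_choice = select_tracker(dark_frame)
```

Switching keeps each frame's output tied to a single tracker, which avoids having to reconcile conflicting person IDs from two trackers.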

2.0 CV May 10, 2024

Zero-shot Degree of Ill-posedness Estimation for Active Small Object Change Detection

Koji Takeda, Kanji Tanaka, Yoshimasa Nakamura et al.

In everyday indoor navigation, robots often need to detect non-distinctive small-change objects (e.g., stationery, lost items, and junk) to maintain domain knowledge. This is most relevant to ground-view change detection (GVCD), a recently emerging research area in the field of computer vision. However, these existing techniques rely on high-quality class-specific object priors to regularize a change detector model that cannot be applied to semantically non-distinctive small objects. To address ill-posedness, in this study, we explore the concept of degree-of-ill-posedness (DoI) from the new perspective of GVCD, aiming to improve both passive and active vision. This novel DoI problem is highly domain-dependent, and manually collecting fine-grained annotated training data is expensive. To regularize this problem, we apply the concept of self-supervised learning to achieve an efficient DoI estimation scheme and investigate its generalization to diverse datasets. Specifically, we tackle the challenging issue of obtaining self-supervision cues for semantically non-distinctive unseen small objects and show that novel "oversegmentation cues" from open-vocabulary semantic segmentation can be effectively exploited. When applied to diverse real datasets, the proposed DoI model can boost state-of-the-art change detection models, and it shows stable and consistent improvements when evaluated on real-world datasets.

5.3 RO Sep 6, 2021

Deep SIMBAD: Active Landmark-based Self-localization Using Ranking-based Scene Descriptor

Tanaka Kanji

Landmark-based robot self-localization has recently garnered interest as a highly-compressive domain-invariant approach for performing visual place recognition (VPR) across domains (e.g., time of day, weather, and season). However, landmark-based self-localization can be an ill-posed problem for a passive observer (e.g., manual robot control), as many viewpoints may not provide an effective landmark view. In this study, we consider an active self-localization task by an active observer and present a novel reinforcement learning (RL)-based next-best-view (NBV) planner. Our contributions are as follows. (1) SIMBAD-based VPR: We formulate the problem of landmark-based compact scene description as SIMBAD (similarity-based pattern recognition) and further present its deep learning extension. (2) VPR-to-NBV knowledge transfer: We address the challenge of RL under uncertainty (i.e., active self-localization) by transferring the state recognition ability of VPR to the NBV. (3) NNQL-based NBV: We regard the available VPR as the experience database by adapting nearest-neighbor approximation of Q-learning (NNQL). The result shows an extremely compact data structure that compresses both the VPR and NBV into a single incremental inverted index. Experiments using the public NCLT dataset validated the effectiveness of the proposed approach.

1.4 CV Feb 23, 2021

Domain-invariant NBV Planner for Active Cross-domain Self-localization

Kanji Tanaka

Pole-like landmarks have received increasing attention as a domain-invariant visual cue for visual robot self-localization across domains (e.g., seasons, times of day, weather conditions). However, self-localization using pole-like landmarks can be ill-posed for a passive observer, as many viewpoints may not provide any pole-like landmark view. To alleviate this problem, we consider an active observer and explore a novel "domain-invariant" next-best-view (NBV) planner that attains consistent performance over different domains (i.e., maintenance-free), without requiring the expensive task of training data collection and retraining. In our approach, a novel multi-encoder deep convolutional neural network enables detection of domain-invariant pole-like landmarks, which are then used as the sole input to a model-free deep reinforcement learning-based domain-invariant NBV planner. Further, we develop a practical system for active self-localization using sparse invariant landmarks and dense discriminative landmarks. In experiments, we demonstrate that the proposed method is effective both in efficient landmark detection and in discriminative self-localization.

0.9 CV Sep 16, 2019

Fault-Diagnosing SLAM for Varying Scale Change Detection

Sugimoto Takuma, Yamaguchi Kousuke, Tanaka Kanji

In this paper, we present a new fault diagnosis (FD)-based approach for detection of imagery changes that can detect significant changes as inconsistencies between different sub-modules (e.g., self-localization) of visual SLAM. Unlike classical change detection approaches such as pairwise image comparison (PC) and anomaly detection (AD), neither the memorization of each map image nor the maintenance of up-to-date place-specific anomaly detectors is required in this FD approach. A significant challenge that is encountered when incorporating different SLAM sub-modules into FD involves dealing with the varying scales of objects that have changed (e.g., the appearance of small dangerous obstacles on the floor). To address this issue, we reconsider the bag-of-words (BoW) image representation, by exploiting its recent advances in terms of self-localization and change detection. As a key advantage, BoW image representation can be reorganized into any different scaling by simply cropping the original BoW image. Furthermore, we propose to combine different self-localization modules with strong and weak BoW features with different discriminativity, and to treat inconsistency between strong and weak self-localization as an indicator of change. The efficacy of the proposed approach for FD with/without AD and/or PC was experimentally validated.
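One way to read "inconsistency between strong and weak self-localization as an indicator of change" is sketched below. The specific score, namely how far down the weak localizer's ranking the strong localizer's top place appears, is an assumption made for illustration and is not the paper's definition; the place IDs are hypothetical.

```python
def change_indicator(strong_ranking, weak_ranking):
    """Return a value in [0, 1]; 0 means both localizers agree on the top place.

    Assumption: inconsistency is measured as the normalized position of the
    strong localizer's top-ranked place within the weak localizer's ranking.
    """
    top_place = strong_ranking[0]
    if top_place not in weak_ranking:
        return 1.0
    if len(weak_ranking) == 1:
        return 0.0
    return weak_ranking.index(top_place) / (len(weak_ranking) - 1)


# Hypothetical ranked place IDs from the two self-localization modules.
strong = ["p4", "p2", "p9"]
weak_consistent = ["p4", "p9", "p2"]
weak_inconsistent = ["p9", "p2", "p4"]
score_consistent = change_indicator(strong, weak_consistent)
score_inconsistent = change_indicator(strong, weak_inconsistent)
```

A high score flags the image region as a candidate change without storing the map image or a place-specific anomaly detector.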

0.9 CV Sep 15, 2019

Mining Minimal Map-Segments for Visual Place Classifiers

Tanaka Kanji

In visual place recognition (VPR), map segmentation (MS) is a preprocessing technique used to partition a given view-sequence map into place classes (i.e., map segments) so that each class has good place-specific training images for a visual place classifier (VPC). Existing approaches to MS implicitly/explicitly suppose that map segments have a certain size, or individual map segments are balanced in size. However, recent VPR systems showed that very small important map segments (minimal map segments) often suffice for VPC, and the remaining large unimportant portion of the map should be discarded to minimize map maintenance cost. Here, a new MS algorithm that can mine minimal map segments from a large view-sequence map is presented. To solve the inherently NP hard problem, MS is formulated as a video-segmentation problem and the efficient point-trajectory based paradigm of video segmentation is used. The proposed map representation was implemented with three types of VPC: deep convolutional neural network, bag-of-words, and object class detector, and each was integrated into a Monte Carlo localization algorithm (MCL) within a topometric VPR framework. Experiments using the publicly available NCLT dataset thoroughly investigate the efficacy of MS in terms of VPR performance.

0.9 CV Apr 7, 2019

Place-specific Background Modeling Using Recursive Autoencoders

Yamaguchi Kousuke, Tanaka Kanji, Sugimoto Takuma et al.

Image change detection (ICD) to detect changed objects in front of a vehicle with respect to a place-specific background model using an on-board monocular vision system is a fundamental problem in intelligent vehicle (IV). From the perspective of recent large-scale IV applications, it can be impractical in terms of space/time efficiency to train place-specific background models for every possible place. To address these issues, we introduce a new autoencoder (AE) based efficient ICD framework that combines the advantages of AE-based anomaly detection (AD) and AE-based image compression (IC). We propose a method that uses AE reconstruction errors as a single unified measure for training a minimal set of place-specific AEs and maintains detection accuracy. We introduce an efficient incremental recursive AE (rAE) training framework that recursively summarizes a large collection of background images into the AE set. The results of experiments on challenging cross-season ICD tasks validate the efficacy of the proposed approach.

0.9 CV Apr 7, 2019

Scalable Change Retrieval Using Deep 3D Neural Codes

Kojima Yusuke, Tanaka Kanji, Yang Naiming et al.

We present a novel scalable framework for image change detection (ICD) from an on-board 3D imagery system. We argue that existing ICD systems are constrained by the time required to align a given query image with individual reference image coordinates. We utilize an invariant coordinate system (ICS) to replace the time-consuming image alignment with an offline pre-processing procedure. Our key contribution is an extension of the traditional image comparison-based ICD tasks to setups of the image retrieval (IR) task. We replace each component of the 3D ICD system, i.e., (1) image modeling, (2) image alignment, and (3) image differencing, with significantly efficient variants from the bag-of-words (BoW) IR paradigm. Further, we train a deep 3D feature extractor in an unsupervised manner using an unsupervised Siamese network and automatically collected training data. We conducted experiments on a challenging cross-season ICD task using a publicly available dataset and thereby validate the efficacy of the proposed approach.

0.9 CV Apr 7, 2019

Long-Term Vehicle Localization by Recursive Knowledge Distillation

Hiroki Tomoe, Tanaka Kanji

Most of the current state-of-the-art frameworks for cross-season visual place recognition (CS-VPR) focus on domain adaptation (DA) to a single specific season. From the viewpoint of long-term CS-VPR, such frameworks do not scale well to sequential multiple domains (e.g., spring - summer - autumn - winter - ... ). The goal of this study is to develop a novel long-term ensemble learning (LEL) framework that allows for a constant cost retraining in long-term sequential-multi-domain CS-VPR (SMD-VPR), which only requires the memorization of a small constant number of deep convolutional neural networks (CNNs) and can retrain the CNN ensemble of every season at a small constant time/space cost. We frame our task as the multi-teacher multi-student knowledge distillation (MTMS-KD), which recursively compresses all the previous season's knowledge into a current CNN ensemble. We further address the issue of teacher-student-assignment (TSA) to achieve a good generalization/specialization tradeoff. Experimental results on SMD-VPR tasks validate the efficacy of the proposed approach.
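The multi-teacher distillation step mentioned in this abstract can be sketched generically. The example below uses the standard temperature-softened soft-target formulation of knowledge distillation and averages teacher outputs uniformly; the uniform averaging, the temperature, and all logit values are assumptions for illustration and do not reproduce the paper's teacher-student assignment.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits, temperature=2.0):
    """Average the softened class distributions of several teachers."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    return [sum(column) / len(probs) for column in zip(*probs)]


def distillation_loss(student_logits, soft_targets, temperature=2.0):
    """Cross-entropy between teacher soft targets and the softened student output."""
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_targets, student))


# Hypothetical place-class logits from two previous-season teachers.
teachers = [[2.0, 0.5, 0.1], [1.5, 0.7, 0.2]]
targets = ensemble_soft_targets(teachers)
loss_similar = distillation_loss([1.8, 0.6, 0.1], targets)
loss_dissimilar = distillation_loss([0.1, 0.6, 1.8], targets)
```

A student whose predictions resemble the teachers' incurs the lower loss, which is what lets earlier seasons' knowledge be compressed into the current ensemble.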

1.8 CV Jan 22, 2019

Use of First and Third Person Views for Deep Intersection Classification

Koji Takeda, Kanji Tanaka

We explore the problem of intersection classification using monocular on-board passive vision, with the goal of classifying traffic scenes with respect to road topology. We divide the existing approaches into two broad categories according to the type of input data: (a) first person vision (FPV) approaches, which use an egocentric view sequence as the intersection is passed; and (b) third person vision (TPV) approaches, which use a single view immediately before entering the intersection. The FPV and TPV approaches each have advantages and disadvantages. Therefore, we aim to combine them into a unified deep learning framework. Experimental results show that the proposed FPV-TPV scheme outperforms previous methods and only requires minimal FPV/TPV measurements.

6.3 RO Sep 14, 2018

Detection-by-Localization: Maintenance-Free Change Object Detector

Tanaka Kanji

Recent research demonstrates that self-localization performance is a very useful measure of likelihood-of-change (LoC) for change detection. In this paper, this "detection-by-localization" scheme is studied in a novel generalized task of object-level change detection. In our framework, a given query image is segmented into object-level subimages (termed "scene parts"), which are then converted to subimage-level pixel-wise LoC maps via the detection-by-localization scheme. Our approach models a self-localization system as a ranking function that outputs a ranked list of reference images without requiring relevance scores. Thanks to this new setting, we can generalize our approach to a broad class of self-localization systems. Our ranking-based self-localization model allows self-localization results from different modalities to be fused via an unsupervised rank fusion derived from the field of multi-modal information retrieval (MMR).
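Because the model above needs only ranked lists and no relevance scores, any score-free rank fusion can combine modalities. The sketch below uses reciprocal rank fusion, a common unsupervised method from information retrieval, as a stand-in; the abstract does not say which fusion rule the paper uses, and the modality names and image IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of reference-image IDs using ranks only.

    Each list contributes 1 / (k + rank) to an item's fused score.
    """
    fused = defaultdict(float)
    for ranking in ranked_lists:
        for rank, image_id in enumerate(ranking, start=1):
            fused[image_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(fused, key=lambda image_id: (-fused[image_id], image_id))


# Hypothetical rankings produced by three self-localization modalities.
rgb_ranking = ["ref_03", "ref_07", "ref_01"]
depth_ranking = ["ref_07", "ref_03", "ref_09"]
lidar_ranking = ["ref_07", "ref_09", "ref_03"]
fused_ranking = reciprocal_rank_fusion([rgb_ranking, depth_ranking, lidar_ranking])
```

Working on ranks rather than scores sidesteps the problem of calibrating scores across modalities with different ranges.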

1.7 CV Dec 24, 2017

Use of Generative Adversarial Network for Cross-Domain Change Detection

Yamaguchi Kousuke, Tanaka Kanji, Sugimoto Takuma

This paper addresses the problem of cross-domain change detection from a novel perspective of image-to-image translation. In general, change detection aims to identify interesting changes between a given query image and a reference image of the same scene taken at a different time. This problem becomes a challenging one when query and reference images involve different domains (e.g., time of day, weather, and season) due to variations in object appearance and a limited amount of training examples. In this study, we address the above issue by leveraging a generative adversarial network (GAN). Our key concept is to use a limited amount of training data to train a GAN-based image translator that maps a reference image to a virtual image that cannot be discriminated from query domain images. This enables us to treat the cross-domain change detection task as an in-domain image comparison, which in turn allows us to leverage the large body of literature on in-domain generic change detectors. In addition, we consider the use of visual place recognition as a method for mining more appropriate reference images over the space of virtual images. Experiments validate the efficacy of the proposed approach.

0.9 CV Sep 16, 2017

Long-Term Ensemble Learning of Visual Place Classifiers

Xiaoxiao Fei, Kanji Tanaka, Yichu Fang et al.

This paper addresses the problem of cross-season visual place classification (VPC) from a novel perspective of long-term map learning. Our goal is to enable transfer learning efficiently from one season to the next, at a small constant cost, and without wasting the robot's available long-term memory by memorizing very large amounts of training data. To realize a good tradeoff between generalization and specialization abilities, we employ an ensemble of deep convolutional neural network (DCN) classifiers and consider the task of scheduling (when and which classifiers to retrain), given a previous season's DCN classifiers as the sole prior knowledge. We present a unified framework for retraining scheduling and discuss practical implementation strategies. Furthermore, we address the task of partitioning a robot's workspace into places to define place classes in an unsupervised manner, rather than using uniform partitioning, so as to maximize VPC performance. Experiments using the publicly available NCLT dataset revealed that retraining scheduling of a DCN classifier ensemble is crucial and performance is significantly increased by using planned scheduling.

0.9 CV Sep 15, 2017

Zero-Shot Learning to Manage a Large Number of Place-Specific Compressive Change Classifiers

Tanaka Kanji

With recent progress in large-scale map maintenance and long-term map learning, the task of change detection on a large-scale map from a visual image captured by a mobile robot has become a problem of increasing criticality. Previous approaches for change detection are typically based on image differencing and require the memorization of a prohibitively large number of mapped images in the above context. In contrast, this study follows the recent, efficient paradigm of change-classifier-learning and specifically employs a collection of place-specific change classifiers. Our change-classifier-learning algorithm is based on zero-shot learning (ZSL) and represents a place-specific change classifier by its training examples mined from an external knowledge base (EKB). The proposed algorithm exhibits several advantages. First, we are required to memorize only training examples (rather than the classifier itself), which can be further compressed in the form of bag-of-words (BoW). Secondly, we can incorporate the most recent map into the classifiers by straightforwardly adding or deleting a few training examples that correspond to these classifiers. Thirdly, we can share the BoW vocabulary with other related task scenarios (e.g., BoW-based self-localization), wherein the vocabulary is generally designed as a rich, continuously growing, and domain-adaptive knowledge base. In our contribution, the proposed algorithm is applied and evaluated on a practical long-term cross-season change detection system that consists of a large number of place-specific object-level change classifiers.

1.7 CV Jun 7, 2017

Unsupervised Place Discovery for Place-Specific Change Classifier

Fei Xiaoxiao, Tanaka Kanji

In this study, we address the problem of supervised change detection for robotic map learning applications, in which the aim is to train a place-specific change classifier (e.g., support vector machine (SVM)) to predict changes from a robot's view image. An open question is the manner in which to partition a robot's workspace into places (e.g., SVMs) to maximize the overall performance of change classifiers. This is a chicken-or-egg problem: if we have a well-trained change classifier, partitioning the robot's workspace into places is rather easy. However, training a change classifier requires a set of place-specific training data. In this study, we address this novel problem, which we term unsupervised place discovery. In addition, we present a solution powered by convolutional-feature-based visual place recognition, and validate our approach by applying it to two place-specific change classifiers, namely, nuisance and anomaly predictors.

8.4 CV Mar 3, 2016

Self-localization from Images with Small Overlap

Tanaka Kanji

With the recent success of visual features from deep convolutional neural networks (DCNN) in visual robot self-localization, it has become important and practical to address more general self-localization scenarios. In this paper, we address the scenario of self-localization from images with small overlap. We explicitly introduce a localization difficulty index as a decreasing function of view overlap between query and relevant database images and investigate performance versus difficulty for challenging cross-view self-localization tasks. We then reformulate the self-localization as a scalable bag-of-visual-features (BoVF) scene retrieval and present an efficient solution called PCA-NBNN, aiming to facilitate fast and yet discriminative correspondence between partially overlapping images. The proposed approach adopts recent findings in discriminativity preserving encoding of DCNN features using principal component analysis (PCA) and cross-domain scene matching using naive Bayes nearest neighbor distance metric (NBNN). We experimentally demonstrate that the proposed PCA-NBNN framework frequently achieves comparable results to previous DCNN features and that the BoVF model is significantly more efficient. We further address an important alternative scenario of "self-localization from images with NO overlap" and report the result.
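The NBNN part of PCA-NBNN can be sketched as follows. This is a minimal illustration of the naive Bayes nearest neighbor image-to-class distance only: the PCA compression step is omitted, the descriptors are toy 2-D vectors, and the place names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nbnn_distance(query_features, reference_features):
    """Sum, over query descriptors, of the distance to the nearest reference descriptor."""
    return sum(
        min(squared_distance(q, r) for r in reference_features)
        for q in query_features
    )


def localize(query_features, database):
    """Return the place whose descriptor set is closest to the query under NBNN."""
    return min(database, key=lambda place: nbnn_distance(query_features, database[place]))


# Hypothetical database: each place holds a bag of local descriptors.
database = {
    "corridor": [[0.0, 0.0], [1.0, 0.0]],
    "lobby": [[5.0, 5.0], [6.0, 5.0]],
}
query = [[0.9, 0.1], [0.1, -0.1]]
best_place = localize(query, database)
```

Because each query descriptor is matched independently, images that share only part of their content can still accumulate a small distance, which suits the small-overlap setting.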

2.1 RO Mar 3, 2016

Local Map Descriptor for Compressive Change Retrieval

Tanaka Kanji

Change detection, i.e., anomaly detection from local maps built by a mobile robot at multiple different times, is a challenging problem to solve in practice. Most previous work either cannot be applied to scenarios where the size of the map collection is large, or simply assumed that the robot self-location is globally known. In this paper, we tackle the problem of simultaneous self-localization and change detection, by reformulating the problem as a map retrieval problem, and propose a local map descriptor with a compressed bag-of-words (BoW) structure as a scalable solution. We make the following contributions. (1) To enable a direct comparison of the spatial layout of visual features between different local maps, the origin of the local map coordinate (termed "viewpoint") is planned by scene parsing and determined by our "viewpoint planner" to be invariant against small variations in self-location and changes, aiming at providing similar viewpoints for similar scenes (i.e., the relevant map pair). (2) We extend the BoW model to enable the use of not only the appearance (e.g., polestar) but also the spatial layout (e.g., spatial pyramid) of visual features with respect to the planned viewpoint. The key observation is that the planned viewpoint (i.e., the origin of local map coordinate) acts as a pseudo viewpoint that is usually required by spatial BoW (e.g., SPM) and also by anomaly detection (e.g., NN-d, LOF). (3) Experimental results on a challenging "loop-closing" scenario show that the proposed method outperforms previous BoW methods in self-localization, and furthermore, that the use of both appearance and pose information in change detection produces better results than the use of either information alone.

2.5 CV Sep 25, 2015

Incremental Loop Closure Verification by Guided Sampling

Kanji Tanaka

Loop closure detection, the task of identifying locations revisited by a robot in a sequence of odometry and perceptual observations, is typically formulated as a combination of two subtasks: (1) bag-of-words image retrieval and (2) post-verification using RANSAC geometric verification. The main contribution of this study is the proposal of a novel post-verification framework that achieves a good precision-recall trade-off in loop closure detection. This study is motivated by the fact that not all loop closure hypotheses are equally plausible (e.g., owing to mutual consistency between loop closure constraints) and that if we have evidence that one hypothesis is more plausible than the others, then it should be verified more frequently. We demonstrate that the problem of loop closure detection can be viewed as an instance of a multi-model hypothesize-and-verify framework and build guided sampling strategies on the framework where loop closures proposed using image retrieval are verified in a planned order (rather than in a conventional uniform order) to operate in a constant time. Experimental results using a stereo SLAM system confirm that the proposed strategy, the use of loop closure constraints and robot trajectory hypotheses as a guide, achieves promising results despite the fact that there exists a significant number of false positive constraints and hypotheses.
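Verifying hypotheses "in a planned order" under a constant time budget can be sketched as a top-k selection by plausibility. This is a simplified reading of the abstract: the plausibility values, the budget, and the stand-in verification function are all hypothetical, and the paper's actual guided sampling strategy is not reproduced here.

```python
import heapq


def verify_in_planned_order(hypotheses, plausibility, verify, budget):
    """Verify at most `budget` hypotheses, most plausible first.

    Returns the hypotheses that pass verification, in the order checked.
    """
    planned = heapq.nlargest(budget, hypotheses, key=plausibility)
    return [h for h in planned if verify(h)]


# Hypothetical loop-closure hypotheses: (query frame, matched frame).
hypotheses = [(120, 4), (121, 57), (122, 5), (123, 90)]
# Hypothetical plausibility priors, e.g. from mutual consistency of constraints.
prior = {(120, 4): 0.9, (121, 57): 0.2, (122, 5): 0.8, (123, 90): 0.1}
# Stand-in for RANSAC geometric verification (assumption for the example).
accepted = verify_in_planned_order(hypotheses, prior.get, lambda h: h[1] < 10, budget=2)
```

Fixing the budget keeps the per-frame verification cost constant regardless of how many hypotheses image retrieval proposes.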

1.3 CV May 13, 2015

Leveraging Image based Prior for Visual Place Recognition

Tsukamoto Taisho, Tanaka Kanji

In this study, we propose a novel scene descriptor for visual place recognition. Unlike popular bag-of-words scene descriptors, which rely on a library of vector-quantized visual features, our proposed descriptor is based on a library of raw image data, such as publicly available photo collections from Google StreetView and Flickr. The library images need not be associated with spatial information regarding the viewpoint and orientation of the scene. As a result, these images are cheaper than the database images; in addition, they are readily available. Our proposed descriptor directly mines the image library to discover landmarks (i.e., image patches) that suitably match an input query/database image. The discovered landmarks are then compactly described by their pose and shape (i.e., library image ID, bounding boxes) and used as a compact discriminative scene descriptor for the input image. We evaluate the effectiveness of our scene description framework by comparing its performance to that of previous approaches.