CVOct 13, 2023Code
TIDE: Temporally Incremental Disparity Estimation via Pattern Flow in Structured Light SystemRukun Qiao, Hiroshi Kawasaki, Hongbin Zha
We introduced Temporally Incremental Disparity Estimation Network (TIDE-Net), a learning-based technique for disparity computation in mono-camera structured light systems. In our hardware setting, a static pattern is projected onto a dynamic scene and captured by a monocular camera. Different from most former disparity estimation methods that operate in a frame-wise manner, our network acquires disparity maps in a temporally incremental way. Specifically, We exploit the deformation of projected patterns (named pattern flow ) on captured image sequences, to model the temporal information. Notably, this newly proposed pattern flow formulation reflects the disparity changes along the epipolar line, which is a special form of optical flow. Tailored for pattern flow, the TIDE-Net, a recurrent architecture, is proposed and implemented. For each incoming frame, our model fuses correlation volumes (from current frame) and disparity (from former frame) warped by pattern flow. From fused features, the final stage of TIDE-Net estimates the residual disparity rather than the full disparity, as conducted by many previous methods. Interestingly, this design brings clear empirical advantages in terms of efficiency and generalization ability. Using only synthetic data for training, our extensitve evaluation results (w.r.t. both accuracy and efficienty metrics) show superior performance than several SOTA models on unseen real data. The code is available on https://github.com/CodePointer/TIDENet.
CVSep 7, 2023Code
Underwater Image Enhancement by Transformer-based Diffusion Model with Non-uniform Sampling for Skip StrategyYi Tang, Takafumi Iwaguchi, Hiroshi Kawasaki
In this paper, we present an approach to image enhancement with diffusion model in underwater scenes. Our method adapts conditional denoising diffusion probabilistic models to generate the corresponding enhanced images by using the underwater images and the Gaussian noise as the inputs. Additionally, in order to improve the efficiency of the reverse process in the diffusion model, we adopt two different ways. We firstly propose a lightweight transformer-based denoising network, which can effectively promote the time of network forward per iteration. On the other hand, we introduce a skip sampling strategy to reduce the number of iterations. Besides, based on the skip sampling strategy, we propose two different non-uniform sampling methods for the sequence of the time step, namely piecewise sampling and searching with the evolutionary algorithm. Both of them are effective and can further improve performance by using the same steps against the previous uniform sampling. In the end, we conduct a relative evaluation of the widely used underwater enhancement datasets between the recent state-of-the-art methods and the proposed approach. The experimental results prove that our approach can achieve both competitive performance and high efficiency. Our code is available at \href{mailto:https://github.com/piggy2009/DM_underwater}{\color{blue}{https://github.com/piggy2009/DM\_underwater}}.
CVOct 5, 2022
MOTSLAM: MOT-assisted monocular dynamic SLAM using single-view depth estimationHanwei Zhang, Hideaki Uchiyama, Shintaro Ono et al.
Visual SLAM systems targeting static scenes have been developed with satisfactory accuracy and robustness. Dynamic 3D object tracking has then become a significant capability in visual SLAM with the requirement of understanding dynamic surroundings in various scenarios including autonomous driving, augmented and virtual reality. However, performing dynamic SLAM solely with monocular images remains a challenging problem due to the difficulty of associating dynamic features and estimating their positions. In this paper, we present MOTSLAM, a dynamic visual SLAM system with the monocular configuration that tracks both poses and bounding boxes of dynamic objects. MOTSLAM first performs multiple object tracking (MOT) with associated both 2D and 3D bounding box detection to create initial 3D objects. Then, neural-network-based monocular depth estimation is applied to fetch the depth of dynamic features. Finally, camera poses, object poses, and both static, as well as dynamic map points, are jointly optimized using a novel bundle adjustment. Our experiments on the KITTI dataset demonstrate that our system has reached best performance on both camera ego-motion and object tracking on monocular dynamic SLAM.
CVSep 26, 2023
Generalization of pixel-wise phase estimation by CNN and improvement of phase-unwrapping by MRF optimization for one-shot 3D scanHiroto Harada, Michihiro Mikamo, Ryo Furukawa et al.
Active stereo technique using single pattern projection, a.k.a. one-shot 3D scan, have drawn a wide attention from industry, medical purposes, etc. One severe drawback of one-shot 3D scan is sparse reconstruction. In addition, since spatial pattern becomes complicated for the purpose of efficient embedding, it is easily affected by noise, which results in unstable decoding. To solve the problems, we propose a pixel-wise interpolation technique for one-shot scan, which is applicable to any types of static pattern if the pattern is regular and periodic. This is achieved by U-net which is pre-trained by CG with efficient data augmentation algorithm. In the paper, to further overcome the decoding instability, we propose a robust correspondence finding algorithm based on Markov random field (MRF) optimization. We also propose a shape refinement algorithm based on b-spline and Gaussian kernel interpolation using explicitly detected laser curves. Experiments are conducted to show the effectiveness of the proposed method using real data with strong noises and textures.
CVOct 13, 2023
Online Adaptive Disparity Estimation for Dynamic Scenes in Structured Light SystemsRukun Qiao, Hiroshi Kawasaki, Hongbin Zha
In recent years, deep neural networks have shown remarkable progress in dense disparity estimation from dynamic scenes in monocular structured light systems. However, their performance significantly drops when applied in unseen environments. To address this issue, self-supervised online adaptation has been proposed as a solution to bridge this performance gap. Unlike traditional fine-tuning processes, online adaptation performs test-time optimization to adapt networks to new domains. Therefore, achieving fast convergence during the adaptation process is critical for attaining satisfactory accuracy. In this paper, we propose an unsupervised loss function based on long sequential inputs. It ensures better gradient directions and faster convergence. Our loss function is designed using a multi-frame pattern flow, which comprises a set of sparse trajectories of the projected pattern along the sequence. We estimate the sparse pseudo ground truth with a confidence mask using a filter-based method, which guides the online adaptation process. Our proposed framework significantly improves the online adaptation speed and achieves superior performance on unseen data.
CVOct 20, 2024
ActiveNeuS: Neural Signed Distance Fields for Active StereoKazuto Ichimaru, Takaki Ikeda, Diego Thomas et al.
3D-shape reconstruction in extreme environments, such as low illumination or scattering condition, has been an open problem and intensively researched. Active stereo is one of potential solution for such environments for its robustness and high accuracy. However, active stereo systems usually consist of specialized system configurations with complicated algorithms, which narrow their application. In this paper, we propose Neural Signed Distance Field for active stereo systems to enable implicit correspondence search and triangulation in generalized Structured Light. With our technique, textureless or equivalent surfaces by low light condition are successfully reconstructed even with a small number of captured images. Experiments were conducted to confirm that the proposed method could achieve state-of-the-art reconstruction quality under such severe condition. We also demonstrated that the proposed method worked in an underwater scenario.
CVMay 20, 2024
Depth Reconstruction with Neural Signed Distance Fields in Structured Light SystemsRukun Qiao, Hiroshi Kawasaki, Hongbin Zha
We introduce a novel depth estimation technique for multi-frame structured light setups using neural implicit representations of 3D space. Our approach employs a neural signed distance field (SDF), trained through self-supervised differentiable rendering. Unlike passive vision, where joint estimation of radiance and geometry fields is necessary, we capitalize on known radiance fields from projected patterns in structured light systems. This enables isolated optimization of the geometry field, ensuring convergence and network efficacy with fixed device positioning. To enhance geometric fidelity, we incorporate an additional color loss based on object surfaces during training. Real-world experiments demonstrate our method's superiority in geometric performance for few-shot scenarios, while achieving comparable results with increased pattern availability.
CVOct 20, 2024
Neural Active Structure-from-Motion in Dark and Textureless EnvironmentKazuto Ichimaru, Diego Thomas, Takafumi Iwaguchi et al.
Active 3D measurement, especially structured light (SL) has been widely used in various fields for its robustness against textureless or equivalent surfaces by low light illumination. In addition, reconstruction of large scenes by moving the SL system has become popular, however, there have been few practical techniques to obtain the system's precise pose information only from images, since most conventional techniques are based on image features, which cannot be retrieved under textureless environments. In this paper, we propose a simultaneous shape reconstruction and pose estimation technique for SL systems from an image set where sparsely projected patterns onto the scene are observed (i.e. no scene texture information), which we call Active SfM. To achieve this, we propose a full optimization framework of the volumetric shape that employs neural signed distance fields (Neural-SDF) for SL with the goal of not only reconstructing the scene shape but also estimating the poses for each motion of the system. Experimental results show that the proposed method is able to achieve accurate shape reconstruction as well as pose estimation from images where only projected patterns are observed.
CVJun 2, 2025
Neural shape reconstruction from multiple views with static pattern projectionRyo Furukawa, Kota Nishihara, Hiroshi Kawasaki
Active-stereo-based 3D shape measurement is crucial for various purposes, such as industrial inspection, reverse engineering, and medical systems, due to its strong ability to accurately acquire the shape of textureless objects. Active stereo systems typically consist of a camera and a pattern projector, tightly fixed to each other, and precise calibration between a camera and a projector is required, which in turn decreases the usability of the system. If a camera and a projector can be freely moved during shape scanning process, it will drastically increase the convenience of the usability of the system. To realize it, we propose a technique to recover the shape of the target object by capturing multiple images while both the camera and the projector are in motion, and their relative poses are auto-calibrated by our neural signed-distance-field (NeuralSDF) using novel volumetric differential rendering technique. In the experiment, the proposed method is evaluated by performing 3D reconstruction using both synthetic and real images.
CVSep 22, 2021
A Method For Adding Motion-Blur on Arbitrary Objects By using Auto-Segmentation and Color Compensation TechniquesMichihiro Mikamo, Ryo Furukawa, Hiroshi Kawasaki
When dynamic objects are captured by a camera, motion blur inevitably occurs. Such a blur is sometimes considered as just a noise, however, it sometimes gives an important effect to add dynamism in the scene for photographs or videos. Unlike the similar effects, such as defocus blur, which is now easily controlled even by smartphones, motion blur is still uncontrollable and makes undesired effects on photographs. In this paper, an unified framework to add motion blur on per-object basis is proposed. In the method, multiple frames are captured without motion blur and they are accumulated to create motion blur on target objects. To capture images without motion blur, shutter speed must be short, however, it makes captured images dark, and thus, a sensor gain should be increased to compensate it. Since a sensor gain causes a severe noise on image, we propose a color compensation algorithm based on non-linear filtering technique for solution. Another contribution is that our technique can be used to make HDR images for fast moving objects by using multi-exposure images. In the experiments, effectiveness of the method is confirmed by ablation study using several data sets.
CVAug 6, 2021
High-frequency shape recovery from shading by CNN and domain adaptationKodai Tokieda, Takafumi Iwaguchi, Hiroshi Kawasaki
Importance of structured-light based one-shot scanning technique is increasing because of its simple system configuration and ability of capturing moving objects. One severe limitation of the technique is that it can capture only sparse shape, but not high frequency shapes, because certain area of projection pattern is required to encode spatial information. In this paper, we propose a technique to recover high-frequency shapes by using shading information, which is captured by one-shot RGB-D sensor based on structured light with single camera. Since color image comprises shading information of object surface, high-frequency shapes can be recovered by shape from shading techniques. Although multiple images with different lighting positions are required for shape from shading techniques, we propose a learning based approach to recover shape from a single image. In addition, to overcome the problem of preparing sufficient amount of data for training, we propose a new data augmentation method for high-frequency shapes using synthetic data and domain adaptation. Experimental results are shown to confirm the effectiveness of the proposed method.
CVJul 7, 2021
PoseRN: A 2D pose refinement network for bias-free multi-view 3D human pose estimationAkihiko Sayo, Diego Thomas, Hiroshi Kawasaki et al.
We propose a new 2D pose refinement network that learns to predict the human bias in the estimated 2D pose. There are biases in 2D pose estimations that are due to differences between annotations of 2D joint locations based on annotators' perception and those defined by motion capture (MoCap) systems. These biases are crafted into publicly available 2D pose datasets and cannot be removed with existing error reduction approaches. Our proposed pose refinement network allows us to efficiently remove the human bias in the estimated 2D poses and achieve highly accurate multi-view 3D human pose estimation.
IVOct 31, 2020
Dense Pixel-wise Micro-motion Estimation of Object Surface by using Low Dimensional Embedding of Laser Speckle PatternRyusuke Sagawa, Yusuke Higuchi, Hiroshi Kawasaki et al.
This paper proposes a method of estimating micro-motion of an object at each pixel that is too small to detect under a common setup of camera and illumination. The method introduces an active-lighting approach to make the motion visually detectable. The approach is based on speckle pattern, which is produced by the mutual interference of laser light on object's surface and continuously changes its appearance according to the out-of-plane motion of the surface. In addition, speckle pattern becomes uncorrelated with large motion. To compensate such micro- and large motion, the method estimates the motion parameters up to scale at each pixel by nonlinear embedding of the speckle pattern into low-dimensional space. The out-of-plane motion is calculated by making the motion parameters spatially consistent across the image. In the experiments, the proposed method is compared with other measuring devices to prove the effectiveness of the method.
CVSep 9, 2019
Unified Underwater Structure-from-MotionKazuto Ichimaru, Yuichi Taguchi, Hiroshi Kawasaki
This paper shows that accurate underwater 3D shape reconstruction is possible using a single camera, observing a target through a refractive interface. We provide unified reconstruction techniques for a variety of scenarios such as single static camera and moving refractive interface, single moving camera and static refractive interface, and single moving camera and moving refractive interface. In our basic setup, we assume that the refractive interface is planar, and simultaneously estimate the unknown transformations of the planar interface and the camera, and the unknown target shape using bundle adjustment. We also extend it to relax the planarity assumption, which enables us to use waves of the refractive interface for the reconstruction task. Experiments with real data show the superiority of our method to existing methods.
IVMay 23, 2019
Underwater Stereo using Refraction-free Image Synthesized from Light Field CameraKazuto Ichimaru, Hiroshi Kawasaki
There is a strong demand on capturing underwater scenes without distortions caused by refraction. Since a light field camera can capture several light rays at each point of an image plane from various directions, if geometrically correct rays are chosen, it is possible to synthesize a refraction-free image. In this paper, we propose a novel technique to efficiently select such rays to synthesize a refraction-free image from an underwater image captured by a light field camera. In addition, we propose a stereo technique to reconstruct 3D shapes using a pair of our refraction-free images, which are central projection. In the experiment, we captured several underwater scenes by two light field cameras, synthesized refraction free images and applied stereo technique to reconstruct 3D shapes. The results are compared with previous techniques which are based on approximation, showing the strength of our method.
CVNov 21, 2018
CNN based dense underwater 3D scene reconstruction by transfer learning using bubble databaseKazuto Ichimaru, Ryo Furukawa, Hiroshi Kawasaki
Dense 3D shape acquisition of swimming human or live fish is an important research topic for sports, biological science and so on. For this purpose, active stereo sensor is usually used in the air, however it cannot be applied to the underwater environment because of refraction, strong light attenuation and severe interference of bubbles. Passive stereo is a simple solution for capturing dynamic scenes at underwater environment, however the shape with textureless surfaces or irregular reflections cannot be recovered. Recently, the stereo camera pair with a pattern projector for adding artificial textures on the objects is proposed. However, to use the system for underwater environment, several problems should be compensated, i.e., disturbance by fluctuation and bubbles. Simple solution is to use convolutional neural network for stereo to cancel the effects of bubbles and/or water fluctuation. Since it is not easy to train CNN with small size of database with large variation, we develop a special bubble generation device to efficiently create real bubble database of multiple size and density. In addition, we propose a transfer learning technique for multi-scale CNN to effectively remove bubbles and projected-patterns on the object. Further, we develop a real system and actually captured live swimming human, which has not been done before. Experiments are conducted to show the effectiveness of our method compared with the state of the art techniques.
CVAug 25, 2018
Multi-scale CNN stereo and pattern removal technique for underwater active stereo systemKazuto Ichimaru, Ryo Furukawa, Hiroshi Kawasaki
Demands on capturing dynamic scenes of underwater environments are rapidly growing. Passive stereo is applicable to capture dynamic scenes, however the shape with textureless surfaces or irregular reflections cannot be recovered by the technique. In our system, we add a pattern projector to the stereo camera pair so that artificial textures are augmented on the objects. To use the system at underwater environments, several problems should be compensated, i.e., refraction, disturbance by fluctuation and bubbles. Further, since surface of the objects are interfered by the bubbles, projected patterns, etc., those noises and patterns should be removed from captured images to recover original texture. To solve these problems, we propose three approaches; a depth-dependent calibration, Convolutional Neural Network(CNN)-stereo method and CNN-based texture recovery method. A depth-dependent calibration is our analysis to find the acceptable depth range for approximation by center projection to find the certain target depth for calibration. In terms of CNN stereo, unlike common CNNbased stereo methods which do not consider strong disturbances like refraction or bubbles, we designed a novel CNN architecture for stereo matching using multi-scale information, which is intended to be robust against such disturbances. Finally, we propose a multi-scale method for bubble and a projected-pattern removal method using CNNs to recover original textures. Experimental results are shown to prove the effectiveness of our method compared with the state of the art techniques. Furthermore, reconstruction of a live swimming fish is demonstrated to confirm the feasibility of our techniques.
CVJul 7, 2018
Representing a Partially Observed Non-Rigid 3D Human Using Eigen-Texture and Eigen-DeformationRyosuke Kimura, Akihiko Sayo, Fabian Lorenzo Dayrit et al.
Reconstruction of the shape and motion of humans from RGB-D is a challenging problem, receiving much attention in recent years. Recent approaches for full-body reconstruction use a statistic shape model, which is built upon accurate full-body scans of people in skin-tight clothes, to complete invisible parts due to occlusion. Such a statistic model may still be fit to an RGB-D measurement with loose clothes but cannot describe its deformations, such as clothing wrinkles. Observed surfaces may be reconstructed precisely from actual measurements, while we have no cues for unobserved surfaces. For full-body reconstruction with loose clothes, we propose to use lower dimensional embeddings of texture and deformation referred to as eigen-texturing and eigen-deformation, to reproduce views of even unobserved surfaces. Provided a full-body reconstruction from a sequence of partial measurements as 3D meshes, the texture and deformation of each triangle are then embedded using eigen-decomposition. Combined with neural-network-based coefficient regression, our method synthesizes the texture and deformation from arbitrary viewpoints. We evaluate our method using simulated data and visually demonstrate how our method works on real data.
CVOct 2, 2017
Temporal shape super-resolution by intra-frame motion encoding using high-fps structured lightYuki Shiba, Satoshi Ono, Ryo Furukawa et al.
One of the solutions of depth imaging of moving scene is to project a static pattern on the object and use just a single image for reconstruction. However, if the motion of the object is too fast with respect to the exposure time of the image sensor, patterns on the captured image are blurred and reconstruction fails. In this paper, we impose multiple projection patterns into each single captured image to realize temporal super resolution of the depth image sequences. With our method, multiple patterns are projected onto the object with higher fps than possible with a camera. In this case, the observed pattern varies depending on the depth and motion of the object, so we can extract temporal information of the scene from each single image. The decoding process is realized using a learning-based approach where no geometric calibration is needed. Experiments confirm the effectiveness of our method where sequential shapes are reconstructed from a single image. Both quantitative evaluations and comparisons with recent techniques were also conducted.
CVOct 2, 2017
Depth estimation using structured light flow -- analysis of projected pattern flow on an object's surface --Ryo Furukawa, Ryusuke Sagawa, Hiroshi Kawasaki
Shape reconstruction techniques using structured light have been widely researched and developed due to their robustness, high precision, and density. Because the techniques are based on decoding a pattern to find correspondences, it implicitly requires that the projected patterns be clearly captured by an image sensor, i.e., to avoid defocus and motion blur of the projected pattern. Although intensive researches have been conducted for solving defocus blur, few researches for motion blur and only solution is to capture with extremely fast shutter speed. In this paper, unlike the previous approaches, we actively utilize motion blur, which we refer to as a light flow, to estimate depth. Analysis reveals that minimum two light flows, which are retrieved from two projected patterns on the object, are required for depth estimation. To retrieve two light flows at the same time, two sets of parallel line patterns are illuminated from two video projectors and the size of motion blur of each line is precisely measured. By analyzing the light flows, i.e. lengths of the blurs, scene depth information is estimated. In the experiments, 3D shapes of fast moving objects, which are inevitably captured with motion blur, are successfully reconstructed by our technique.
CVSep 10, 2016
Simultaneous independent image display technique on multiple 3D objectsTakuto Hirukawa, Marco Visentini-Scarzanella, Hiroshi Kawasaki et al.
We propose a new system to visualize depth-dependent patterns and images on solid objects with complex geometry using multiple projectors. The system, despite consisting of conventional passive LCD projectors, is able to project different images and patterns depending on the spatial location of the object. The technique is based on the simple principle that multiple patterns projected from multiple projectors interfere constructively with each other when their patterns are projected on the same object. Previous techniques based on the same principle can only achieve 1) low resolution volume colorization or 2) high resolution images but only on a limited number of flat planes. In this paper, we discretize a 3D object into a number of 3D points so that high resolution images can be projected onto the complex shapes. We also propose a dynamic ranges expansion technique as well as an efficient optimization procedure based on epipolar constraints. Such technique can be used to the extend projection mapping to have spatial dependency, which is desirable for practical applications. We also demonstrate the system potential as a visual instructor for object placement and assembling. Experiments prove the effectiveness of our method.