Eric Marchand

CV
h-index13
8papers
237citations
Novelty53%
AI Score33

8 Papers

CVMar 27, 2023
JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields

Xi Wang, Robin Courant, Jinglei Shi et al.

This paper presents JAWS, an optimization-driven approach that achieves the robust transfer of visual cinematic features from a reference in-the-wild video clip to a newly generated clip. To this end, we rely on an implicit-neural-representation (INR) in a way to compute a clip that shares the same cinematic features as the reference clip. We propose a general formulation of a camera optimization problem in an INR that computes extrinsic and intrinsic camera parameters as well as timing. By leveraging the differentiability of neural representations, we can back-propagate our designed cinematic losses measured on proxy estimators through a NeRF network to the proposed cinematic parameters directly. We also introduce specific enhancements such as guidance maps to improve the overall quality and efficiency. Results display the capacity of our system to replicate well known camera sequences from movies, adapting the framing, camera parameters and timing of the generated video clip to maximize the similarity with the reference clip.

CVSep 16, 2022
TwistSLAM++: Fusing multiple modalities for accurate dynamic semantic SLAM

Mathieu Gonzalez, Eric Marchand, Amine Kacete et al.

Most classical SLAM systems rely on the static scene assumption, which limits their applicability in real world scenarios. Recent SLAM frameworks have been proposed to simultaneously track the camera and moving objects. However they are often unable to estimate the canonical pose of the objects and exhibit a low object tracking accuracy. To solve this problem we propose TwistSLAM++, a semantic, dynamic, SLAM system that fuses stereo images and LiDAR information. Using semantic information, we track potentially moving objects and associate them to 3D object detections in LiDAR scans to obtain their pose and size. Then, we perform registration on consecutive object scans to refine object pose estimation. Finally, object scans are used to estimate the shape of the object and constrain map points to lie on the estimated surface within the BA. We show on classical benchmarks that this fusion approach based on multimodal information improves the accuracy of object tracking.

CVDec 20, 2024
Toward Robust Neural Reconstruction from Sparse Point Sets

Amine Ouasfi, Shubhendu Jena, Eric Marchand et al.

We consider the challenging problem of learning Signed Distance Functions (SDF) from sparse and noisy 3D point clouds. In contrast to recent methods that depend on smoothness priors, our method, rooted in a distributionally robust optimization (DRO) framework, incorporates a regularization term that leverages samples from the uncertainty regions of the model to improve the learned SDFs. Thanks to tractable dual formulations, we show that this framework enables a stable and efficient optimization of SDFs in the absence of ground truth supervision. Using a variety of synthetic and real data evaluations from different modalities, we show that our DRO based learning framework can improve SDF learning with respect to baselines and the state-of-the-art methods.

ROFeb 24, 2022
TwistSLAM: Constrained SLAM in Dynamic Environment

Mathieu Gonzalez, Eric Marchand, Amine Kacete et al.

Classical visual simultaneous localization and mapping (SLAM) algorithms usually assume the environment to be rigid. This assumption limits the applicability of those algorithms as they are unable to accurately estimate the camera poses and world structure in real life scenes containing moving objects (e.g. cars, bikes, pedestrians, etc.). To tackle this issue, we propose TwistSLAM: a semantic, dynamic and stereo SLAM system that can track dynamic objects in the environment. Our algorithm creates clusters of points according to their semantic class. Thanks to the definition of inter-cluster constraints modeled by mechanical joints (function of the semantic class), a novel constrained bundle adjustment is then able to jointly estimate both poses and velocities of moving objects along with the classical world structure and camera trajectory. We evaluate our approach on several sequences from the public KITTI dataset and demonstrate quantitatively that it improves camera and object tracking compared to state-of-the-art approaches.

ROSep 15, 2021
S3LAM: Structured Scene SLAM

Mathieu Gonzalez, Eric Marchand, Amine Kacete et al.

We propose a new SLAM system that uses the semantic segmentation of objects and structures in the scene. Semantic information is relevant as it contains high level information which may make SLAM more accurate and robust. Our contribution is twofold: i) A new SLAM system based on ORB-SLAM2 that creates a semantic map made of clusters of points corresponding to objects instances and structures in the scene. ii) A modification of the classical Bundle Adjustment formulation to constrain each cluster using geometrical priors, which improves both camera localization and reconstruction and enables a better understanding of the scene. We evaluate our approach on sequences from several public datasets and show that it improves camera pose estimation with respect to state of the art.

CVMar 24, 2021
Tracking Pedestrian Heads in Dense Crowd

Ramana Sundararaman, Cedric De Almeida Braga, Eric Marchand et al.

Tracking humans in crowded video sequences is an important constituent of visual scene understanding. Increasing crowd density challenges visibility of humans, limiting the scalability of existing pedestrian trackers to higher crowd densities. For that reason, we propose to revitalize head tracking with Crowd of Heads Dataset (CroHD), consisting of 9 sequences of 11,463 frames with over 2,276,838 heads and 5,230 tracks annotated in diverse scenes. For evaluation, we proposed a new metric, IDEucl, to measure an algorithm's efficacy in preserving a unique identity for the longest stretch in image coordinate space, thus building a correspondence between pedestrian crowd motion and the performance of a tracking algorithm. Moreover, we also propose a new head detector, HeadHunter, which is designed for small head detection in crowded scenes. We extend HeadHunter with a Particle Filter and a color histogram based re-identification module for head tracking. To establish this as a strong baseline, we compare our tracker with existing state-of-the-art pedestrian trackers on CroHD and demonstrate superiority, especially in identity preserving tracking metrics. With a light-weight head detector and a tracker which is efficient at identity preservation, we believe our contributions will serve useful in advancement of pedestrian tracking in dense crowds.

CVFeb 3, 2020
L6DNet: Light 6 DoF Network for Robust and Precise Object Pose Estimation with Small Datasets

Mathieu Gonzalez, Amine Kacete, Albert Murienne et al.

Estimating the 3D pose of an object is a challenging task that can be considered within augmented reality or robotic applications. In this paper, we propose a novel approach to perform 6 DoF object pose estimation from a single RGB-D image. We adopt a hybrid pipeline in two stages: data-driven and geometric respectively. The data-driven step consists of a classification CNN to estimate the object 2D location in the image from local patches, followed by a regression CNN trained to predict the 3D location of a set of keypoints in the camera coordinate system. To extract the pose information, the geometric step consists in aligning the 3D points in the camera coordinate system with the corresponding 3D points in world coordinate system by minimizing a registration error, thus computing the pose. Our experiments on the standard dataset LineMod show that our approach is more robust and accurate than state-of-the-art methods. The approach is also validated to achieve a 6 DoF positioning task by visual servoing.

ROMay 24, 2017
Visual Servoing from Deep Neural Networks

Quentin Bateux, Eric Marchand, Jürgen Leitner et al.

We present a deep neural network-based method to perform high-precision, robust and real-time 6 DOF visual servoing. The paper describes how to create a dataset simulating various perturbations (occlusions and lighting conditions) from a single real-world image of the scene. A convolutional neural network is fine-tuned using this dataset to estimate the relative pose between two images of the same scene. The output of the network is then employed in a visual servoing control scheme. The method converges robustly even in difficult real-world settings with strong lighting variations and occlusions.A positioning error of less than one millimeter is obtained in experiments with a 6 DOF robot.