Edmond Boyer

CV
h-index21
27papers
1,097citations
Novelty49%
AI Score48

27 Papers

CVAug 30, 2023Code
MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision

Jianning Li, Zongwei Zhou, Jiancheng Yang et al.

Prior to the deep learning era, shape was commonly used to describe the objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of ShapeNet (about 51,300 models) and Princeton ModelNet (127,915 models). For the medical domain, we present a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instrument, called MedShapeNet, created to facilitate the translation of data-driven vision algorithms to medical applications and to adapt SOTA vision algorithms to medical problems. As a unique feature, we directly model the majority of shapes on the imaging data of real patients. As of today, MedShapeNet includes 23 dataset with more than 100,000 shapes that are paired with annotations (ground truth). Our data is freely accessible via a web interface and a Python application programming interface (API) and can be used for discriminative, reconstructive, and variational benchmarks as well as various applications in virtual, augmented, or mixed reality, and 3D printing. Exemplary, we present use cases in the fields of classification of brain tumors, facial and skull reconstructions, multi-class anatomy completion, education, and 3D printing. In future, we will extend the data and improve the interfaces. The project pages are: https://medshapenet.ikim.nrw/ and https://github.com/Jianningli/medshapenet-feedback

CVAug 31, 2022
Multi-View Reconstruction using Signed Ray Distance Functions (SRDF)

Pierre Zins, Yuanlu Xu, Edmond Boyer et al. · meta-ai

In this paper, we investigate a new optimization framework for multi-view 3D shape reconstructions. Recent differentiable rendering approaches have provided breakthrough performances with implicit shape representations though they can still lack precision in the estimated geometries. On the other hand multi-view stereo methods can yield pixel wise geometric accuracy with local depth predictions along viewing rays. Our approach bridges the gap between the two strategies with a novel volumetric shape representation that is implicit but parameterized with pixel depths to better materialize the shape surface with consistent signed distances along viewing rays. The approach retains pixel-accuracy while benefiting from volumetric integration in the optimization. To this aim, depths are optimized by evaluating, at each 3D location within the volumetric discretization, the agreement between the depth prediction consistency and the photometric consistency for the corresponding pixels. The optimization is agnostic to the associated photo-consistency term which can vary from a median-based baseline to more elaborate criteria learned functions. Our experiments demonstrate the benefit of the volumetric integration with depth predictions. They also show that our approach outperforms existing approaches over standard 3D benchmarks with better geometry estimations.

CVJun 12, 2023
4DHumanOutfit: a multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements

Matthieu Armando, Laurence Boissieux, Edmond Boyer et al.

This work presents 4DHumanOutfit, a new dataset of densely sampled spatio-temporal 4D human motion data of different actors, outfits and motions. The dataset is designed to contain different actors wearing different outfits while performing different motions in each outfit. In this way, the dataset can be seen as a cube of data containing 4D motion sequences along 3 axes with identity, outfit and motion. This rich dataset has numerous potential applications for the processing and creation of digital humans, e.g. augmented reality, avatar creation and virtual try on. 4DHumanOutfit is released for research purposes at https://kinovis.inria.fr/4dhumanoutfit/. In addition to image data and 4D reconstructions, the dataset includes reference solutions for each axis. We present independent baselines along each axis that demonstrate the value of these reference solutions for evaluation tasks.

CVOct 12, 2022
Teeth3DS+: An Extended Benchmark for Intraoral 3D Scans Analysis

Achraf Ben-Hamadou, Nour Neifar, Ahmed Rekik et al.

Intraoral 3D scanning is now widely adopted in modern dentistry and plays a central role in supporting key tasks such as tooth segmentation, detection, labeling, and dental landmark identification. Accurate analysis of these scans is essential for orthodontic and restorative treatment planning, as it enables automated workflows and minimizes the need for manual intervention. However, the development of robust learning-based solutions remains challenging due to the limited availability of high-quality public datasets and standardized benchmarks. This article presents Teeth3DS+, an extended public benchmark dedicated to intraoral 3D scan analysis. Developed in the context of the MICCAI 3DTeethSeg and 3DTeethLand challenges, Teeth3DS+ supports multiple fundamental tasks, including tooth detection, segmentation, labeling, 3D modeling, and dental landmark identification. The dataset consists of rigorously curated intraoral scans acquired using state-of-the-art scanners and validated by experienced orthodontists and dental surgeons. In addition to the data, Teeth3DS+ provides standardized data splits and evaluation protocols to enable fair and reproducible comparison of methods, with the goal of fostering progress in learning-based analysis of 3D dental scans. Detailed instructions for accessing the dataset are available at https://crns-smartvision.github.io/teeth3ds

CVNov 27, 2023
Deformation-Guided Unsupervised Non-Rigid Shape Matching

Aymen Merrouche, Joao Regateiro, Stefanie Wuhrer et al.

We present an unsupervised data-driven approach for non-rigid shape matching. Shape matching identifies correspondences between two shapes and is a fundamental step in many computer vision and graphics applications. Our approach is designed to be particularly robust when matching shapes digitized using 3D scanners that contain fine geometric detail and suffer from different types of noise including topological noise caused by the coalescence of spatially close surface regions. We build on two strategies. First, using a hierarchical patch based shape representation we match shapes consistently in a coarse to fine manner, allowing for robustness to noise. This multi-scale representation drastically reduces the dimensionality of the problem when matching at the coarsest scale, rendering unsupervised learning feasible. Second, we constrain this hierarchical matching to be reflected in 3D by fitting a patch-wise near-rigid deformation model. Using this constraint, we leverage spatial continuity at different scales to capture global shape properties, resulting in matchings that generalize well to data with different deformations and noise characteristics. Experiments demonstrate that our approach obtains significantly better results on raw 3D scans than state-of-the-art methods, while performing on-par on standard test scenarios.

CVSep 17, 2024
SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction

Marko Mihajlovic, Sergey Prokudin, Siyu Tang et al.

Digitizing 3D static scenes and 4D dynamic events from multi-view images has long been a challenge in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) has emerged as a practical and scalable reconstruction method, gaining popularity due to its impressive reconstruction quality, real-time rendering capabilities, and compatibility with widely used visualization tools. However, the method requires a substantial number of input views to achieve high-quality scene reconstruction, introducing a significant practical bottleneck. This challenge is especially severe in capturing dynamic scenes, where deploying an extensive camera array can be prohibitively costly. In this work, we identify the lack of spatial autocorrelation of splat features as one of the factors contributing to the suboptimal performance of the 3DGS technique in sparse reconstruction settings. To address the issue, we propose an optimization strategy that effectively regularizes splat features by modeling them as the outputs of a corresponding implicit neural field. This results in a consistent enhancement of reconstruction quality across various scenarios. Our approach effectively handles static and dynamic cases, as demonstrated by extensive testing across different setups and scene complexities.

63.9CVApr 24
PAGaS: Pixel-Aligned 1DoF Gaussian Splatting for Depth Refinement

David Recasens, Robert Maier, Aljaz Bozic et al.

Gaussian Splatting (GS) has emerged as an efficient approach for high-quality novel view synthesis. While early GS variants struggled to accurately model the scene's geometry, recent advancements constraining the Gaussians' spread and shapes, such as 2D Gaussian Splatting, have significantly improved geometric fidelity. In this paper, we present Pixel-Aligned 1DoF Gaussian Splatting (PAGaS) that adapts the GS representation from novel view synthesis to the multi-view stereo depth task. Our key contribution is modeling a pixel's depth using one-degree-of-freedom (1DoF) Gaussians that remain tightly constrained during optimization. Unlike existing approaches, our Gaussians' positions and sizes are restricted by the back-projected pixel volumes, leaving depth as the sole degree of freedom to optimize. PAGaS produces highly detailed depths, as illustrated in Figure 1. We quantitatively validate these improvements on top of reference geometric and learning-based multi-view stereo baselines on challenging 3D reconstruction benchmarks. Code: davidrecasens.github.io/pagas

CVJul 29, 2024
VortSDF: 3D Modeling with Centroidal Voronoi Tesselation on Signed Distance Field

Diego Thomas, Briac Toussaint, Jean-Sebastien Franco et al.

Volumetric shape representations have become ubiquitous in multi-view reconstruction tasks. They often build on regular voxel grids as discrete representations of 3D shape functions, such as SDF or radiance fields, either as the full shape model or as sampled instantiations of continuous representations, as with neural networks. Despite their proven efficiency, voxel representations come with the precision versus complexity trade-off. This inherent limitation can significantly impact performance when moving away from simple and uncluttered scenes. In this paper we investigate an alternative discretization strategy with the Centroidal Voronoi Tesselation (CVT). CVTs allow to better partition the observation space with respect to shape occupancy and to focus the discretization around shape surfaces. To leverage this discretization strategy for multi-view reconstruction, we introduce a volumetric optimization framework that combines explicit SDF fields with a shallow color network, in order to estimate 3D shape properties over tetrahedral grids. Experimental results with Chamfer statistics validate this approach with unprecedented reconstruction quality on various scenarios such as objects, open scenes or human.

CVMay 29, 2023Code
3DTeethSeg'22: 3D Teeth Scan Segmentation and Labeling Challenge

Achraf Ben-Hamadou, Oussama Smaoui, Ahmed Rekik et al.

Teeth localization, segmentation, and labeling from intra-oral 3D scans are essential tasks in modern dentistry to enhance dental diagnostics, treatment planning, and population-based studies on oral health. However, developing automated algorithms for teeth analysis presents significant challenges due to variations in dental anatomy, imaging protocols, and limited availability of publicly accessible data. To address these challenges, the 3DTeethSeg'22 challenge was organized in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2022, with a call for algorithms tackling teeth localization, segmentation, and labeling from intraoral 3D scans. A dataset comprising a total of 1800 scans from 900 patients was prepared, and each tooth was individually annotated by a human-machine hybrid algorithm. A total of 6 algorithms were evaluated on this dataset. In this study, we present the evaluation results of the 3DTeethSeg'22 challenge. The 3DTeethSeg'22 challenge code can be accessed at: https://github.com/abenhamadou/3DTeethSeg22_challenge

CVMar 15, 2024
ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D image

Marco Pesavento, Yuanlu Xu, Nikolaos Sarafianos et al.

Recent progress in human shape learning, shows that neural implicit models are effective in generating 3D human surfaces from limited number of views, and even from a single RGB image. However, existing monocular approaches still struggle to recover fine geometric details such as face, hands or cloth wrinkles. They are also easily prone to depth ambiguities that result in distorted geometries along the camera optical axis. In this paper, we explore the benefits of incorporating depth observations in the reconstruction process by introducing ANIM, a novel method that reconstructs arbitrary 3D human shapes from single-view RGB-D images with an unprecedented level of accuracy. Our model learns geometric details from both multi-resolution pixel-aligned and voxel-aligned features to leverage depth information and enable spatial relationships, mitigating depth ambiguities. We further enhance the quality of the reconstructed shape by introducing a depth-supervision strategy, which improves the accuracy of the signed distance field estimation of points that lie on the reconstructed surface. Experiments demonstrate that ANIM outperforms state-of-the-art works that use RGB, surface normals, point cloud or RGB-D data as input. In addition, we introduce ANIM-Real, a new multi-modal dataset comprising high-quality scans paired with consumer-grade RGB-D camera, and our protocol to fine-tune ANIM, enabling high-quality reconstruction from real-world human capture.

CVDec 28, 2023
HISR: Hybrid Implicit Surface Representation for Photorealistic 3D Human Reconstruction

Angtian Wang, Yuanlu Xu, Nikolaos Sarafianos et al. · meta-ai

Neural reconstruction and rendering strategies have demonstrated state-of-the-art performances due, in part, to their ability to preserve high level shape details. Existing approaches, however, either represent objects as implicit surface functions or neural volumes and still struggle to recover shapes with heterogeneous materials, in particular human skin, hair or clothes. To this aim, we present a new hybrid implicit surface representation to model human shapes. This representation is composed of two surface layers that represent opaque and translucent regions on the clothed human body. We segment different regions automatically using visual cues and learn to reconstruct two signed distance functions (SDFs). We perform surface-based rendering on opaque regions (e.g., body, face, clothes) to preserve high-fidelity surface normals and volume rendering on translucent regions (e.g., hair). Experiments demonstrate that our approach obtains state-of-the-art results on 3D human reconstructions, and also shows competitive performances on other objects.

CVSep 8, 2025
Matching Shapes Under Different Topologies: A Topology-Adaptive Deformation Guided Approach

Aymen Merrouche, Stefanie Wuhrer, Edmond Boyer

Non-rigid 3D mesh matching is a critical step in computer vision and computer graphics pipelines. We tackle matching meshes that contain topological artefacts which can break the assumption made by current approaches. While Functional Maps assume the deformation induced by the ground truth correspondences to be near-isometric, ARAP-like deformation-guided approaches assume the latter to be ARAP. Neither assumption holds in certain topological configurations of the input shapes. We are motivated by real-world scenarios such as per-frame multi-view reconstructions, often suffering from topological artefacts. To this end, we propose a topology-adaptive deformation model allowing changes in shape topology to align shape pairs under ARAP and bijective association constraints. Using this model, we jointly optimise for a template mesh with adequate topology and for its alignment with the shapes to be matched to extract correspondences. We show that, while not relying on any data-driven prior, our approach applies to highly non-isometric shapes and shapes with topological artefacts, including noisy per-frame multi-view reconstructions, even outperforming methods trained on large datasets in 3D alignment quality.

CVDec 11, 2024
NF3DM: Combining Neural Fields and Deformation Models for 3D Non-Rigid Motion Reconstruction

Aymen Merrouche, Stefanie Wuhrer, Edmond Boyer

We introduce a novel, data-driven approach for reconstructing temporally coherent 3D motion from unstructured and potentially partial observations of non-rigidly deforming shapes. Our goal is to achieve high-fidelity motion reconstructions for shapes that undergo near-isometric deformations, such as humans wearing loose clothing. The key novelty of our work lies in its ability to combine implicit shape representations with explicit mesh-based deformation models, enabling detailed and temporally coherent motion reconstructions without relying on parametric shape models or decoupling shape and motion. Each frame is represented as a neural field decoded from a feature space where observations over time are fused, hence preserving geometric details present in the input data. Temporal coherence is enforced with a near-isometric deformation constraint between adjacent frames that applies to the underlying surface in the neural field. Our method outperforms state-of-the-art approaches, as demonstrated by its application to human and animal motion sequences reconstructed from monocular depth videos.

IVOct 23, 2021
Vertebrae localization, segmentation and identification using a graph optimization and an anatomic consistency cycle

Di Meng, Edmond Boyer, Sergi Pujades

Vertebrae localization, segmentation and identification in CT images is key to numerous clinical applications. While deep learning strategies have brought to this field significant improvements over recent years, transitional and pathological vertebrae are still plaguing most existing approaches as a consequence of their poor representation in training datasets. Alternatively, proposed non-learning based methods take benefit of prior knowledge to handle such particular cases. In this work we propose to combine both strategies. To this purpose we introduce an iterative cycle in which individual vertebrae are recursively localized, segmented and identified using deep-networks, while anatomic consistency is enforced using statistical priors. In this strategy, the transitional vertebrae identification is handled by encoding their configurations in a graphical model that aggregates local deep-network predictions into an anatomically consistent final result. Our approach achieves state-of-the-art results on the VerSe20 challenge benchmark, and outperforms all methods on transitional vertebrae as well as the generalization to the VerSe19 challenge benchmark. Furthermore, our method can detect and report inconsistent spine regions that do not satisfy the anatomic consistency priors. Our code and model are openly available for research purposes.

CVSep 3, 2021
Neural Human Deformation Transfer

Jean Basset, Adnane Boukhayma, Stefanie Wuhrer et al.

We consider the problem of human deformation transfer, where the goal is to retarget poses between different characters. Traditional methods that tackle this problem require a clear definition of the pose, and use this definition to transfer poses between characters. In this work, we take a different approach and transform the identity of a character into a new identity without modifying the character's pose. This offers the advantage of not having to define equivalences between 3D human poses, which is not straightforward as poses tend to change depending on the identity of the character performing them, and as their meaning is highly contextual. To achieve the deformation transfer, we propose a neural encoder-decoder architecture where only identity information is encoded and where the decoder is conditioned on the pose. We use pose independent representations, such as isometry-invariant shape characteristics, to represent identity features. Our model uses these features to supervise the prediction of offsets from the deformed pose to the result of the transfer. We show experimentally that our method outperforms state-of-the-art methods both quantitatively and qualitatively, and generalises better to poses not seen during training. We also introduce a fine-tuning step that allows to obtain competitive results for extreme identities, and allows to transfer simple clothing.

CVSep 3, 2021
3D Human Shape Style Transfer

Joao Regateiro, Edmond Boyer

We consider the problem of modifying/replacing the shape style of a real moving character with those of an arbitrary static real source character. Traditional solutions follow a pose transfer strategy, from the moving character to the source character shape, that relies on skeletal pose parametrization. In this paper, we explore an alternative approach that transfers the source shape style onto the moving character. The expected benefit is to avoid the inherently difficult pose to shape conversion required with skeletal parametrization applied on real characters. To this purpose, we consider image style transfer techniques and investigate how to adapt them to 3D human shapes. Adaptive Instance Normalisation (AdaIN) and SPADE architectures have been demonstrated to efficiently and accurately transfer the style of an image onto another while preserving the original image structure. Where AdaIN contributes with a module to perform style transfer through the statistics of the subjects and SPADE contribute with a residual block architecture to refine the quality of the style transfer. We demonstrate that these approaches are extendable to the 3D shape domain by proposing a convolutional neural network that applies the same principle of preserving the shape structure (shape pose) while transferring the style of a new subject shape. The generated results are supervised through a discriminator module to evaluate the realism of the shape, whilst enforcing the decoder to synthesise plausible shapes and improve the style transfer for unseen subjects. Our experiments demonstrate an average of $\approx 56\%$ qualitative and quantitative improvements over the baseline in shape transfer through optimization-based and learning-based methods.

CVApr 16, 2021
Data-Driven 3D Reconstruction of Dressed Humans From Sparse Views

Pierre Zins, Yuanlu Xu, Edmond Boyer et al.

Recently, data-driven single-view reconstruction methods have shown great progress in modeling 3D dressed humans. However, such methods suffer heavily from depth ambiguities and occlusions inherent to single view inputs. In this paper, we tackle this problem by considering a small set of input views and investigate the best strategy to suitably exploit information from these views. We propose a data-driven end-to-end approach that reconstructs an implicit 3D representation of dressed humans from sparse camera views. Specifically, we introduce three key components: first a spatially consistent reconstruction that allows for arbitrary placement of the person in the input views using a perspective camera model; second an attention-based fusion layer that learns to aggregate visual information from several viewpoints; and third a mechanism that encodes local 3D patterns under the multi-view context. In the experiments, we show the proposed approach outperforms the state of the art on standard data both quantitatively and qualitatively. To demonstrate the spatially consistent reconstruction, we apply our approach to dynamic scenes. Additionally, we apply our method on real data acquired with a multi-camera platform and demonstrate our approach can obtain results comparable to multi-view stereo with dramatically less views.

CVMar 23, 2021
Dual Mesh Convolutional Networks for Human Shape Correspondence

Nitika Verma, Adnane Boukhayma, Jakob Verbeek et al.

Convolutional networks have been extremely successful for regular data structures such as 2D images and 3D voxel grids. The transposition to meshes is, however, not straight-forward due to their irregular structure. We explore how the dual, face-based representation of triangular meshes can be leveraged as a data structure for graph convolutional networks. In the dual mesh, each node (face) has a fixed number of neighbors, which makes the networks less susceptible to overfitting on the mesh topology, and also al-lows the use of input features that are naturally defined over faces, such as surface normals and face areas. We evaluate the dual approach on the shape correspondence task on theFaust human shape dataset and variants of it with differ-ent mesh topologies. Our experiments show that results of graph convolutional networks improve when defined over the dual rather than primal mesh. Moreover, our models that explicitly leverage the neighborhood regularity of dual meshes allow improving results further while being more robust to changes in the mesh topology.

CVDec 14, 2020
Articulated Shape Matching Using Laplacian Eigenfunctions and Unsupervised Point Registration

Diana Mateus, Radu Horaud, David Knossow et al.

Matching articulated shapes represented by voxel-sets reduces to maximal sub-graph isomorphism when each set is described by a weighted graph. Spectral graph theory can be used to map these graphs onto lower dimensional spaces and match shapes by aligning their embeddings in virtue of their invariance to change of pose. Classical graph isomorphism schemes relying on the ordering of the eigenvalues to align the eigenspaces fail when handling large data-sets or noisy data. We derive a new formulation that finds the best alignment between two congruent $K$-dimensional sets of points by selecting the best subset of eigenfunctions of the Laplacian matrix. The selection is done by matching eigenfunction signatures built with histograms, and the retained set provides a smart initialization for the alignment problem with a considerable impact on the overall performance. Dense shape matching casted into graph matching reduces then, to point registration of embeddings under orthogonal transformations; the registration is solved using the framework of unsupervised clustering and the EM algorithm. Maximal subset matching of non identical shapes is handled by defining an appropriate outlier class. Experimental results on challenging examples show how the algorithm naturally treats changes of topology, shape variations and different sampling densities.

CVDec 10, 2020
Topology-Adaptive Mesh Deformation for Surface Evolution, Morphing, and Multi-View Reconstruction

Andrei Zaharescu, Edmond Boyer, Radu Horaud

Triangulated meshes have become ubiquitous discrete-surface representations. In this paper we address the problem of how to maintain the manifold properties of a surface while it undergoes strong deformations that may cause topological changes. We introduce a new self-intersection removal algorithm, TransforMesh, and we propose a mesh evolution framework based on this algorithm. Numerous shape modelling applications use surface evolution in order to improve shape properties, such as appearance or accuracy. Both explicit and implicit representations can be considered for that purpose. However, explicit mesh representations, while allowing for accurate surface modelling, suffer from the inherent difficulty of reliably dealing with self-intersections and topological changes such as merges and splits. As a consequence, a majority of methods rely on implicit representations of surfaces, e.g. level-sets, that naturally overcome these issues. Nevertheless, these methods are based on volumetric discretizations, which introduce an unwanted precision-complexity trade-off. The method that we propose handles topological changes in a robust manner and removes self intersections, thus overcoming the traditional limitations of mesh-based approaches. To illustrate the effectiveness of TransforMesh, we describe two challenging applications, namely surface morphing and 3-D reconstruction.

CVDec 8, 2020
Human Motion Tracking by Registering an Articulated Surface to 3-D Points and Normals

Radu Horaud, Matti Niskanen, Guillaume Dewaele et al.

We address the problem of human motion tracking by registering a surface to 3-D data. We propose a method that iteratively computes two things: Maximum likelihood estimates for both the kinematic and free-motion parameters of a kinematic human-body representation, as well as probabilities that the data are assigned either to a body part, or to an outlier cluster. We introduce a new metric between observed points and normals on one side, and a parameterized surface on the other side, the latter being defined as a blending over a set of ellipsoids. We claim that this metric is well suited when one deals with either visual-hull or visual-shape observations. We illustrate the method by tracking human motions using sparse visual-shape data (3-D surface points and normals) gathered from imperfect silhouettes.

CVJul 20, 2020
Discrete Point Flow Networks for Efficient Point Cloud Generation

Roman Klokov, Edmond Boyer, Jakob Verbeek

Generative models have proven effective at modeling 3D shapes and their statistical variations. In this paper we investigate their application to point clouds, a 3D shape representation widely used in computer vision for which, however, only few generative models have yet been proposed. We introduce a latent variable model that builds on normalizing flows with affine coupling layers to generate 3D point clouds of an arbitrary size given a latent shape representation. To evaluate its benefits for shape modeling we apply this model for generation, autoencoding, and single-view shape reconstruction tasks. We improve over recent GAN-based models in terms of most metrics that assess generation and autoencoding. Compared to recent work based on continuous flows, our model offers a significant speedup in both training and inference times for similar or better performance. For single-view shape reconstruction we also obtain results on par with state-of-the-art voxel, point cloud, and mesh-based methods.

CVMar 21, 2020
Cross-modal Deep Face Normals with Deactivable Skip Connections

Victoria Fernandez Abrevaya, Adnane Boukhayma, Philip H. S. Torr et al.

We present an approach for estimating surface normals from in-the-wild color images of faces. While data-driven strategies have been proposed for single face images, limited available ground truth data makes this problem difficult. To alleviate this issue, we propose a method that can leverage all available image and normal data, whether paired or not, thanks to a novel cross-modal learning architecture. In particular, we enable additional training with single modality data, either color or normal, by using two encoder-decoder networks with a shared latent space. The proposed architecture also enables face details to be transferred between the image and normal domains, given paired data, through skip connections between the image encoder and normal decoder. Core to our approach is a novel module that we call deactivable skip connections, which allows integrating both the auto-encoded and image-to-normal branches within the same architecture that can be trained end-to-end. This allows learning of a rich latent space that can accurately capture the normal information. We compare against state-of-the-art methods and show that our approach can achieve significant improvements, both quantitative and qualitative, with natural face images.

CVAug 20, 2019
Probabilistic Reconstruction Networks for 3D Shape Inference from a Single Image

Roman Klokov, Jakob Verbeek, Edmond Boyer

We study end-to-end learning strategies for 3D shape inference from images, in particular from a single image. Several approaches in this direction have been investigated that explore different shape representations and suitable learning architectures. We focus instead on the underlying probabilistic mechanisms involved and contribute a more principled probabilistic inference-based reconstruction framework, which we coin Probabilistic Reconstruction Networks. This framework expresses image conditioned 3D shape inference through a family of latent variable models, and naturally decouples the choice of shape representations from the inference itself. Moreover, it suggests different options for the image conditioning and allows training in two regimes, using either Monte Carlo or variational approximation of the marginal likelihood. Using our Probabilistic Reconstruction Networks we obtain single image 3D reconstruction results that set a new state of the art on the ShapeNet dataset in terms of the intersection over union and earth mover's distance evaluation metrics. Interestingly, we obtain these results using a basic voxel grid representation, improving over recent work based on finer point cloud or mesh based representations.

CVFeb 10, 2019
A Decoupled 3D Facial Shape Model by Adversarial Training

Victoria Fernandez Abrevaya, Adnane Boukhayma, Stefanie Wuhrer et al.

Data-driven generative 3D face models are used to compactly encode facial shape data into meaningful parametric representations. A desirable property of these models is their ability to effectively decouple natural sources of variation, in particular identity and expression. While factorized representations have been proposed for that purpose, they are still limited in the variability they can capture and may present modeling artifacts when applied to tasks such as expression transfer. In this work, we explore a new direction with Generative Adversarial Networks and show that they contribute to better face modeling performances, especially in decoupling natural factors, while also achieving more diverse samples. To train the model we introduce a novel architecture that combines a 3D generator with a 2D discriminator that leverages conventional CNNs, where the two components are bridged by a geometry mapping layer. We further present a training scheme, based on auxiliary classifiers, to explicitly disentangle identity and expression attributes. Through quantitative and qualitative results on standard face datasets, we illustrate the benefits of our model and demonstrate that it outperforms competing state of the art methods in terms of decoupling and diversity.

CVJun 16, 2017
FeaStNet: Feature-Steered Graph Convolutions for 3D Shape Analysis

Nitika Verma, Edmond Boyer, Jakob Verbeek

Convolutional neural networks (CNNs) have massively impacted visual recognition in 2D images, and are now ubiquitous in state-of-the-art approaches. CNNs do not easily extend, however, to data that are not represented by regular grids, such as 3D shape meshes or other graph-structured data, to which traditional local convolution operators do not directly apply. To address this problem, we propose a novel graph-convolution operator to establish correspondences between filter weights and graph neighborhoods with arbitrary connectivity. The key novelty of our approach is that these correspondences are dynamically computed from features learned by the network, rather than relying on predefined static coordinates over the graph as in previous work. We obtain excellent experimental results that significantly improve over previous state-of-the-art shape correspondence results. This shows that our approach can learn effective shape representations from raw input coordinates, without relying on shape descriptors.

GRJan 6, 2016
Shape Animation with Combined Captured and Simulated Dynamics

Benjamin Allain, Li Wang, Jean-Sebastien Franco et al.

We present a novel volumetric animation generation framework to create new types of animations from raw 3D surface or point cloud sequence of captured real performances. The framework considers as input time incoherent 3D observations of a moving shape, and is thus particularly suitable for the output of performance capture platforms. In our system, a suitable virtual representation of the actor is built from real captures that allows seamless combination and simulation with virtual external forces and objects, in which the original captured actor can be reshaped, disassembled or reassembled from user-specified virtual physics. Instead of using the dominant surface-based geometric representation of the capture, which is less suitable for volumetric effects, our pipeline exploits Centroidal Voronoi tessellation decompositions as unified volumetric representation of the real captured actor, which we show can be used seamlessly as a building block for all processing stages, from capture and tracking to virtual physic simulation. The representation makes no human specific assumption and can be used to capture and re-simulate the actor with props or other moving scenery elements. We demonstrate the potential of this pipeline for virtual reanimation of a real captured event with various unprecedented volumetric visual effects, such as volumetric distortion, erosion, morphing, gravity pull, or collisions.