Lixiang Lin

CV
9 papers
139 citations
Novelty 53%
AI Score 44

9 Papers

CV · May 25, 2022 · Code
Multiview Textured Mesh Recovery by Differentiable Rendering

Lixiang Lin, Jianke Zhu, Yisu Zhang

Although multi-layer perceptron-based methods have achieved promising results on shape and color recovery through self-supervision, they usually suffer from heavy computational cost in learning the deep implicit surface representation. Since rendering each pixel requires a forward network inference, it is very computationally intensive to synthesize a whole image. To tackle these challenges, we propose an effective coarse-to-fine approach to recover the textured mesh from multiple views. Specifically, a differentiable Poisson solver is employed to represent the object's shape, which is able to produce topology-agnostic and watertight surfaces. To account for depth information, we optimize the shape geometry by minimizing the differences between the rendered mesh and the predicted depth from multi-view stereo. In contrast to the implicit neural representation of shape and color, we introduce a physically based inverse rendering scheme to jointly estimate the environment lighting and the object's reflectance, which is able to render high-resolution images in real time. The texture of the reconstructed mesh is interpolated from a learnable dense texture grid. We have conducted extensive experiments on several multi-view stereo datasets, and the promising results demonstrate the efficacy of our proposed approach. The code is available at https://github.com/l1346792580123/diff.

CV · Nov 26, 2022
FastHuman: Reconstructing High-Quality Clothed Human in Minutes

Lixiang Lin, Songyou Peng, Qijun Gan et al.

We propose an approach for optimizing high-quality clothed human body shapes in minutes, using multi-view posed images. While traditional neural rendering methods struggle to disentangle geometry and appearance using only a rendering loss, and are computationally intensive, our method uses a mesh-based patch warping technique to ensure multi-view photometric consistency, and spherical harmonics (SH) illumination to refine geometric details efficiently. We employ an oriented point cloud shape representation and SH shading, which significantly reduces optimization and rendering times compared to implicit methods. Our approach has demonstrated promising results on both synthetic and real-world datasets, making it an effective solution for rapidly generating high-quality human body shapes. Project page: https://l1346792580123.github.io/nccsfs/

CV · Apr 26, 2023
Multi-View Stereo Representation Revisit: Region-Aware MVSNet

Yisu Zhang, Jianke Zhu, Lixiang Lin

Deep learning-based multi-view stereo has emerged as a powerful paradigm for reconstructing complete, geometrically detailed objects from multiple views. Most existing approaches only estimate the pixel-wise depth value by minimizing the gap between the predicted point and the intersection of ray and surface, which usually ignores the surface topology. Surface topology is essential for textureless regions and surface boundaries, which otherwise cannot be properly reconstructed. To address this issue, we suggest taking advantage of the point-to-surface distance so that the model is able to perceive a wider range of surfaces. To this end, we predict the distance volume from the cost volume to estimate the signed distance of points around the surface. Our proposed RA-MVSNet is patch-aware, since the perception range is enhanced by associating hypothetical planes with a patch of surface. Therefore, it can improve the completeness of textureless regions and reduce the outliers at the boundary. Moreover, mesh topologies with fine details can be generated from the introduced distance volume. Compared to conventional deep learning-based multi-view stereo methods, our proposed RA-MVSNet approach obtains more complete reconstruction results by taking advantage of signed distance supervision. The experiments on both the DTU and Tanks & Temples datasets demonstrate that our proposed approach achieves state-of-the-art results.

78.9 · CV · Mar 10 · Code
OmniEdit: A Training-free framework for Lip Synchronization and Audio-Visual Editing

Lixiang Lin, Siyuan Jin, Jinshan Zhang

Lip synchronization and audio-visual editing have emerged as fundamental challenges in multimodal learning, underpinning a wide range of applications, including film production, virtual avatars, and telepresence. Despite recent progress, most existing methods for lip synchronization and audio-visual editing depend on supervised fine-tuning of pre-trained models, leading to considerable computational overhead and data requirements. In this paper, we present OmniEdit, a training-free framework designed for both lip synchronization and audio-visual editing. Our approach reformulates the editing paradigm by substituting the edit sequence in FlowEdit with the target sequence, yielding an unbiased estimation of the desired output. Moreover, by removing stochastic elements from the generation process, we establish a smooth and stable editing trajectory. Extensive experimental results validate the effectiveness and robustness of the proposed framework. Code is available at https://github.com/l1346792580123/OmniEdit.

CV · Nov 20, 2023
Semantic-Preserved Point-based Human Avatar

Lixiang Lin, Jianke Zhu

To enable realistic experiences in AR/VR and digital entertainment, we present the first point-based human avatar model that embodies the entire expressive range of digital humans. We employ two MLPs to model pose-dependent deformation and linear blend skinning (LBS) weights. The representation of appearance relies on a decoder and the features attached to each point. In contrast to alternative implicit approaches, the oriented point representation not only provides a more intuitive way to model human avatar animation but also significantly reduces both training and inference time. Moreover, we propose a novel method to transfer semantic information from the SMPL-X model to the points, which enables a better understanding of human body movements. By leveraging the semantic information of points, we can facilitate virtual try-on and human avatar composition by exchanging the points of the same category across different subjects. Experimental results demonstrate the efficacy of our presented method.
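
The abstract above mentions LBS weights without stating the formula. As a rough illustration only, the standard linear blend skinning rule moves a point to the weighted sum of its per-bone transformed positions. The sketch below assumes 3x4 rigid transforms and hand-written weights; the function names are hypothetical, and in the paper the weights are predicted by an MLP, which is not reproduced here.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat34 = Sequence[Sequence[float]]  # 3x4 transform [R | t]; an assumed layout


def transform_point(T: Mat34, p: Vec3) -> List[float]:
    """Apply a 3x4 affine transform [R | t] to a 3D point."""
    return [T[r][0] * p[0] + T[r][1] * p[1] + T[r][2] * p[2] + T[r][3]
            for r in range(3)]


def linear_blend_skinning(point: Vec3,
                          weights: Sequence[float],
                          bone_transforms: Sequence[Mat34]) -> List[float]:
    """Standard LBS: weighted sum of the point moved by each bone transform."""
    if len(weights) != len(bone_transforms):
        raise ValueError("need exactly one weight per bone transform")
    total = sum(weights)
    if total <= 0:
        raise ValueError("skinning weights must have a positive sum")
    posed = [0.0, 0.0, 0.0]
    for w, T in zip(weights, bone_transforms):
        moved = transform_point(T, point)
        for i in range(3):
            # Normalize so the weights act as a convex combination.
            posed[i] += (w / total) * moved[i]
    return posed


# Two hypothetical bones: one stays put, one translates by 2 along x.
IDENTITY = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
SHIFT_X = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
posed_point = linear_blend_skinning([1.0, 1.0, 1.0], [0.5, 0.5], [IDENTITY, SHIFT_X])
```

With equal weights, the point ends up halfway between the two bone-transformed positions, which is the blending behavior that per-point skinning weights control.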

CV · Jan 6, 2021 · Code
Weakly-Supervised Multi-Face 3D Reconstruction

Jialiang Zhang, Lixiang Lin, Jianke Zhu et al.

3D face reconstruction plays a very important role in many real-world multimedia applications, including digital entertainment, social media, affection analysis, and person identification. The de facto pipeline for estimating the parametric face model from an image requires first detecting the facial regions with landmarks, and then cropping each face to feed a deep learning-based regressor. In contrast to conventional methods that perform forward inference for each detected instance independently, we suggest an effective end-to-end framework for multi-face 3D reconstruction, which is able to predict the model parameters of multiple instances simultaneously using a single network inference. Our proposed approach not only greatly reduces the computational redundancy in feature extraction but also makes the deployment procedure much easier by using a single network model. More importantly, we employ the same global camera model for the reconstructed faces in each image, which makes it possible to recover the relative head positions and orientations in the 3D scene. We have conducted extensive experiments to evaluate our proposed approach on the sparse and dense face alignment tasks. The experimental results indicate that our proposed approach is very promising on face alignment tasks without full supervision or pre-processing such as detection and cropping. Our implementation is publicly available at https://github.com/kalyo-zjl/WM3DR.

CV · May 29, 2023
FastMESH: Fast Surface Reconstruction by Hexagonal Mesh-based Neural Rendering

Yisu Zhang, Jianke Zhu, Lixiang Lin

Despite the promising results of multi-view reconstruction, recent neural rendering-based methods, such as implicit surface rendering (IDR) and volume rendering (NeuS), not only incur a heavy computational burden in training but also have difficulty disentangling geometry and appearance. Although the explicit voxel-based method achieves faster training than implicit representation and hash coding, it obtains inferior results on surface recovery. To address these challenges, we propose an effective mesh-based neural rendering approach, named FastMESH, which only samples at the intersection of ray and mesh. A coarse-to-fine scheme is introduced to efficiently extract the initial mesh by space carving. More importantly, we suggest a hexagonal mesh model to preserve surface regularity by constraining the second-order derivatives of vertices, where only a low level of positional encoding is used for neural rendering. The experiments demonstrate that our approach achieves state-of-the-art results on both reconstruction and novel view synthesis. In addition, we obtain a 10-fold training speedup compared to the implicit representation-based methods.

CV · Jun 11, 2021
Topology-Preserved Human Reconstruction with Details

Lixiang Lin, Jianke Zhu

It is challenging to directly estimate human geometry from a single image due to the high diversity and complexity of body shapes with various clothing styles. Most model-based approaches are limited to predicting the shape and pose of a minimally clothed body with an over-smoothed surface. While capturing fine geometric details, model-free methods lack a fixed mesh topology. To address these issues, we propose a novel topology-preserved human reconstruction approach by bridging the gap between model-based and model-free human reconstruction. We present an end-to-end neural network that simultaneously predicts the pixel-aligned implicit surface and an explicit mesh model built by a graph convolutional neural network. Experiments on DeepHuman and our collected dataset show that our approach is effective. The code will be made publicly available.

CV · Oct 21, 2019
Attribute-aware Pedestrian Detection in a Crowd

Jialiang Zhang, Lixiang Lin, Yang Li et al.

Pedestrian detection is an initial step in outdoor scene analysis, which plays an essential role in many real-world applications. Although it has benefited from the deep learning frameworks of generic object detectors, pedestrian detection is still a very challenging task due to heavy occlusion and highly crowded groups. Generally, conventional detectors are unable to differentiate individuals from each other effectively in such a dense environment. To tackle this critical problem, we propose an attribute-aware pedestrian detector to explicitly model people's semantic attributes in a high-level feature detection fashion. Besides the typical semantic features (center position, target scale, and offset), we introduce a pedestrian-oriented attribute feature to encode the high-level semantic differences among the crowd. Moreover, a novel attribute-feature-based Non-Maximum Suppression (NMS) is proposed to distinguish a person from a highly overlapped group by adaptively rejecting false-positive results in very crowded settings. Furthermore, a novel ground truth target is designed to alleviate the difficulties caused by the attribute configuration and extreme class imbalance issues during training. Finally, we evaluate our proposed attribute-aware pedestrian detector on two benchmark datasets, CityPersons and CrowdHuman. The experimental results show that our approach outperforms state-of-the-art methods by a large margin on pedestrian detection.
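
The abstract does not describe how the attribute-feature-based NMS works, so the sketch below shows only conventional greedy NMS with a fixed IoU threshold, the kind of baseline that such a method would modify. The function names, box format (x1, y1, x2, y2), and threshold value are assumptions made for illustration, not details from the paper.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2), an assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box],
               scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Keep the highest-scoring boxes, dropping any box that overlaps
    an already kept box by more than the fixed IoU threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping detections and one isolated detection.
demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
demo_scores = [0.9, 0.8, 0.7]
kept = greedy_nms(demo_boxes, demo_scores)
```

In this toy case the second box is suppressed because its IoU with the first is about 0.68. If those two boxes were different people standing close together, a fixed threshold would discard a true detection, which is the crowded-scene failure mode the abstract targets.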