CVMar 16, 2023
DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human AvatarsDavid Svitov, Dmitrii Gudkov, Renat Bashirov et al.
We present DINAR, an approach for creating realistic rigged fullbody avatars from single RGB images. Similarly to previous works, our method uses neural textures combined with the SMPL-X body model to achieve photo-realistic quality of avatars while keeping them easy to animate and fast to infer. To restore the texture, we use a latent diffusion model and show how such model can be trained in the neural texture space. The use of the diffusion model allows us to realistically reconstruct large unseen regions such as the back of a person given the frontal view. The models in our pipeline are trained using 2D images and videos only. In the experiments, our approach achieves state-of-the-art rendering quality and good generalization to new poses and viewpoints. In particular, the approach improves state-of-the-art on the SnapshotPeople public benchmark.
CVMar 17, 2023
MoRF: Mobile Realistic Fullbody Avatars from a Monocular VideoRenat Bashirov, Alexey Larionov, Evgeniya Ustinova et al.
We present a system to create Mobile Realistic Fullbody (MoRF) avatars. MoRF avatars are rendered in real-time on mobile devices, learned from monocular videos, and have high realism. We use SMPL-X as a proxy geometry and render it with DNR (neural texture and image-2-image network). We improve on prior work, by overfitting per-frame warping fields in the neural texture space, allowing to better align the training signal between different frames. We also refine SMPL-X mesh fitting procedure to improve the overall avatar quality. In the comparisons to other monocular video-based avatar systems, MoRF avatars achieve higher image sharpness and temporal consistency. Participants of our user study also preferred avatars generated by MoRF.
CVDec 3, 2025
CloseUpAvatar: High-Fidelity Animatable Full-Body Avatars with Mixture of Multi-Scale TexturesDavid Svitov, Pietro Morerio, Lourdes Agapito et al.
We present a CloseUpAvatar - a novel approach for articulated human avatar representation dealing with more general camera motions, while preserving rendering quality for close-up views. CloseUpAvatar represents an avatar as a set of textured planes with two sets of learnable textures for low and high-frequency detail. The method automatically switches to high-frequency textures only for cameras positioned close to the avatar's surface and gradually reduces their impact as the camera moves farther away. Such parametrization of the avatar enables CloseUpAvatar to adjust rendering quality based on camera distance ensuring realistic rendering across a wider range of camera orientations than previous approaches. We provide experiments using the ActorsHQ dataset with high-resolution input images. CloseUpAvatar demonstrates both qualitative and quantitative improvements over existing methods in rendering from novel wide range camera positions, while maintaining high FPS by limiting the number of required primitives.
29.3CVMar 12
NBAvatar: Neural Billboards Avatars with Realistic Hand-Face InteractionDavid Svitov, Mahtab Dahaghin
We present NBAvatar - a method for realistic rendering of head avatars handling non-rigid deformations caused by hand-face interaction. We introduce a novel representation for animated avatars by combining the training of oriented planar primitives with neural rendering. Such a combination of explicit and implicit representations enables NBAvatar to handle temporally and pose-consistent geometry, along with fine-grained appearance details provided by the neural rendering technique. In our experiments, we demonstrate that NBAvatar implicitly learns color transformations caused by face-hand interactions and surpasses existing approaches in terms of novel-view and novel-pose rendering quality. Specifically, NBAvatar achieves up to 30% LPIPS reduction under high-resolution megapixel rendering compared to Gaussian-based avatar methods, while also improving PSNR and SSIM, and achieves higher structural similarity compared to the state-of-the-art hand-face interaction method InteractAvatar.
CVNov 15, 2020Code
AmphibianDetector: adaptive computation for moving objects detectionDavid Svitov, Sergey Alyamkin
Convolutional neural networks (CNN) allow achieving the highest accuracy for the task of object detection in images. Major challenges in further development of object detectors are false-positive detections and high demand of processing power. In this paper, we propose an approach to object detection which makes it possible to reduce the number of false-positive detections by processing only moving objects and reduce the required processing power for algorithm inference. The proposed approach is a modification of CNN already trained for object detection task. This method can be used to improve the accuracy of an existing system by applying minor changes to the algorithm. The efficiency of the proposed approach was demonstrated on the open dataset "CDNet2014 pedestrian". The implementation of the method proposed in the article is available on the GitHub: https://github.com/david-svitov/AmphibianDetector
CVApr 1, 2024
HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh PriorDavid Svitov, Pietro Morerio, Lourdes Agapito et al.
We present HAHA - a novel approach for animatable human avatar generation from monocular input videos. The proposed method relies on learning the trade-off between the use of Gaussian splatting and a textured mesh for efficient and high fidelity rendering. We demonstrate its efficiency to animate and render full-body human avatars controlled via the SMPL-X parametric model. Our model learns to apply Gaussian splatting only in areas of the SMPL-X mesh where it is necessary, like hair and out-of-mesh clothing. This results in a minimal number of Gaussians being used to represent the full avatar, and reduced rendering artifacts. This allows us to handle the animation of small body parts such as fingers that are traditionally disregarded. We demonstrate the effectiveness of our approach on two open datasets: SnapshotPeople and X-Humans. Our method demonstrates on par reconstruction quality to the state-of-the-art on SnapshotPeople, while using less than a third of Gaussians. HAHA outperforms previous state-of-the-art on novel poses from X-Humans both quantitatively and qualitatively.
CVNov 13, 2024
BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View SynthesisDavid Svitov, Pietro Morerio, Lourdes Agapito et al.
We present billboard Splatting (BBSplat) - a novel approach for novel view synthesis based on textured geometric primitives. BBSplat represents the scene as a set of optimizable textured planar primitives with learnable RGB textures and alpha-maps to control their shape. BBSplat primitives can be used in any Gaussian Splatting pipeline as drop-in replacements for Gaussians. The proposed primitives close the rendering quality gap between 2D and 3D Gaussian Splatting (GS), enabling the accurate extraction of 3D mesh as in the 2DGS framework. Additionally, the explicit nature of planar primitives enables the use of the ray-tracing effects in rasterization. Our novel regularization term encourages textures to have a sparser structure, enabling an efficient compression that leads to a reduction in the storage space of the model up to x17 times compared to 3DGS. Our experiments show the efficiency of BBSplat on standard datasets of real indoor and outdoor scenes such as Tanks&Temples, DTU, and Mip-NeRF-360. Namely, we achieve a state-of-the-art PSNR of 29.72 for DTU at Full HD resolution.
CVMar 5, 2020
MarginDistillation: distillation for margin-based softmaxDavid Svitov, Sergey Alyamkin
The usage of convolutional neural networks (CNNs) in conjunction with a margin-based softmax approach demonstrates a state-of-the-art performance for the face recognition problem. Recently, lightweight neural network models trained with the margin-based softmax have been introduced for the face identification task for edge devices. In this paper, we propose a novel distillation method for lightweight neural network architectures that outperforms other known methods for the face recognition task on LFW, AgeDB-30 and Megaface datasets. The idea of the proposed method is to use class centers from the teacher network for the student network. Then the student network is trained to get the same angles between the class centers and the face embeddings, predicted by the teacher network.
CVApr 15, 2019
Low-Power Computer Vision: Status, Challenges, OpportunitiesSergei Alyamkin, Matthew Ardi, Alexander C. Berg et al.
Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions and some of these systems have limited energy (such as unmanned aerial vehicles also called drones and mobile robots). These systems rely on batteries and energy efficiency is critical. This article serves two main purposes: (1) Examine the state-of-the-art for low-power solutions to detect objects in images. Since 2015, the IEEE Annual International Low-Power Image Recognition Challenge (LPIRC) has been held to identify the most energy-efficient computer vision solutions. This article summarizes 2018 winners' solutions. (2) Suggest directions for research as well as opportunities for low-power computer vision.
CVOct 3, 2018
2018 Low-Power Image Recognition ChallengeSergei Alyamkin, Matthew Ardi, Achille Brighton et al.
The Low-Power Image Recognition Challenge (LPIRC, https://rebootingcomputing.ieee.org/lpirc) is an annual competition started in 2015. The competition identifies the best technologies that can classify and detect objects in images efficiently (short execution time and low energy consumption) and accurately (high precision). Over the four years, the winners' scores have improved more than 24 times. As computer vision is widely used in many battery-powered systems (such as drones and mobile phones), the need for low-power computer vision will become increasingly important. This paper summarizes LPIRC 2018 by describing the three different tracks and the winners' solutions.