Alexander Sautin

h-index3
2papers

2 Papers

CVApr 26, 2023
EasyPortrait -- Face Parsing and Portrait Segmentation Dataset

Karina Kvanchiani, Elizaveta Petrova, Karen Efremyan et al.

Recently, video conferencing apps have become functional by accomplishing such computer vision-based features as real-time background removal and face beautification. Limited variability in existing portrait segmentation and face parsing datasets, including head poses, ethnicity, scenes, and occlusions specific to video conferencing, motivated us to create a new dataset, EasyPortrait, for these tasks simultaneously. It contains 40,000 primarily indoor photos repeating video meeting scenarios with 13,705 unique users and fine-grained segmentation masks separated into 9 classes. Inappropriate annotation masks from other datasets caused a revision of annotator guidelines, resulting in EasyPortrait's ability to process cases, such as teeth whitening and skin smoothing. The pipeline for data mining and high-quality mask annotation via crowdsourcing is also proposed in this paper. In the ablation study experiments, we proved the importance of data quantity and diversity in head poses in our dataset for the effective learning of the model. The cross-dataset evaluation experiments confirmed the best domain generalization ability among portrait segmentation datasets. Moreover, we demonstrate the simplicity of training segmentation models on EasyPortrait without extra training tricks. The proposed dataset and trained models are publicly available.

CVDec 2, 2024
HaGRIDv2: 1M Images for Static and Dynamic Hand Gesture Recognition

Anton Nuzhdin, Alexander Nagaev, Alexander Sautin et al.

This paper proposes the second version of the widespread Hand Gesture Recognition dataset HaGRID -- HaGRIDv2. We cover 15 new gestures with conversation and control functions, including two-handed ones. Building on the foundational concepts proposed by HaGRID's authors, we implemented the dynamic gesture recognition algorithm and further enhanced it by adding three new groups of manipulation gestures. The ``no gesture" class was diversified by adding samples of natural hand movements, which allowed us to minimize false positives by 6 times. Combining extra samples with HaGRID, the received version outperforms the original in pre-training models for gesture-related tasks. Besides, we achieved the best generalization ability among gesture and hand detection datasets. In addition, the second version enhances the quality of the gestures generated by the diffusion model. HaGRIDv2, pre-trained models, and a dynamic gesture recognition algorithm are publicly available.