IVMar 7, 2022
Stepwise Feature Fusion: Local Guides GlobalJinfeng Wang, Qiming Huang, Feilong Tang et al.
Colonoscopy, currently the most efficient and recognized colon polyp detection technology, is necessary for early screening and prevention of colorectal cancer. However, due to the varying size and complex morphological features of colonic polyps as well as the indistinct boundary between polyps and mucosa, accurate segmentation of polyps is still challenging. Deep learning has become popular for accurate polyp segmentation tasks with excellent results. However, due to the structure of polyps image and the varying shapes of polyps, it easy for existing deep learning models to overfitting the current dataset. As a result, the model may not process unseen colonoscopy data. To address this, we propose a new State-Of-The-Art model for medical image segmentation, the SSFormer, which uses a pyramid Transformer encoder to improve the generalization ability of models. Specifically, our proposed Progressive Locality Decoder can be adapted to the pyramid Transformer backbone to emphasize local features and restrict attention dispersion. The SSFormer achieves statet-of-the-art performance in both learning and generalization assessment.
CVMar 6, 2022
A Robust Framework of Chromosome Straightening with ViT-Patch GANSifan Song, Jinfeng Wang, Fengrui Cheng et al.
Chromosomes carry the genetic information of humans. They exhibit non-rigid and non-articulated nature with varying degrees of curvature. Chromosome straightening is an important step for subsequent karyotype construction, pathological diagnosis and cytogenetic map development. However, robust chromosome straightening remains challenging, due to the unavailability of training images, distorted chromosome details and shapes after straightening, as well as poor generalization capability. In this paper, we propose a novel architecture, ViT-Patch GAN, consisting of a self-learned motion transformation generator and a Vision Transformer-based patch (ViT-Patch) discriminator. The generator learns the motion representation of chromosomes for straightening. With the help of the ViT-Patch discriminator, the straightened chromosomes retain more shape and banding pattern details. The experimental results show that the proposed method achieves better performance on Fréchet Inception Distance (FID), Learned Perceptual Image Patch Similarity (LPIPS) and downstream chromosome classification accuracy, and shows excellent generalization capability on a large dataset.
CVMar 4, 2021
A Novel Application of Image-to-Image Translation: Chromosome Straightening Framework by Learning from a Single ImageSifan Song, Daiyun Huang, Yalun Hu et al.
In medical imaging, chromosome straightening plays a significant role in the pathological study of chromosomes and in the development of cytogenetic maps. Whereas different approaches exist for the straightening task, typically geometric algorithms are used whose outputs are characterized by jagged edges or fragments with discontinued banding patterns. To address the flaws in the geometric algorithms, we propose a novel framework based on image-to-image translation to learn a pertinent mapping dependence for synthesizing straightened chromosomes with uninterrupted banding patterns and preserved details. In addition, to avoid the pitfall of deficient input chromosomes, we construct an augmented dataset using only one single curved chromosome image for training models. Based on this framework, we apply two popular image-to-image translation architectures, U-shape networks and conditional generative adversarial networks, to assess its efficacy. Experiments on a dataset comprised of 642 real-world chromosomes demonstrate the superiority of our framework, as compared to the geometric method in straightening performance, by rendering realistic and continued chromosome details. Furthermore, our straightened results improve the chromosome classification by 0.98%-1.39% mean accuracy.
CVFeb 1, 2021
Aurora Guard: Reliable Face Anti-Spoofing via Mobile Lighting SystemJian Zhang, Ying Tai, Taiping Yao et al.
Face authentication on mobile end has been widely applied in various scenarios. Despite the increasing reliability of cutting-edge face authentication/verification systems to variations like blinking eye and subtle facial expression, anti-spoofing against high-resolution rendering replay of paper photos or digital videos retains as an open problem. In this paper, we propose a simple yet effective face anti-spoofing system, termed Aurora Guard (AG). Our system firstly extracts the normal cues via light reflection analysis, and then adopts an end-to-end trainable multi-task Convolutional Neural Network (CNN) to accurately recover subjects' intrinsic depth and material map to assist liveness classification, along with the light CAPTCHA checking mechanism in the regression branch to further improve the system reliability. Experiments on public Replay-Attack and CASIA datasets demonstrate the merits of our proposed method over the state-of-the-arts. We also conduct extensive experiments on a large-scale dataset containing 12,000 live and diverse spoofing samples, which further validates the generalization ability of our method in the wild.