Mingzhen Shao

CV
5 papers
26 citations
Novelty 51%
AI Score 37

5 Papers

CV · Apr 19, 2023
Analyzing the Domain Shift Immunity of Deep Homography Estimation

Mingzhen Shao, Tolga Tasdizen, Sarang Joshi

Homography estimation is a fundamental technique for image alignment in a wide array of applications. The advent of convolutional neural networks has introduced learning-based methods that have proven highly effective for this task. Yet the generalizability of these approaches across distinct domains remains underexplored. Unlike models for many other vision tasks, CNN-based homography estimation models show a distinctive immunity to domain shift, so a model trained on one dataset can be deployed on another without transfer learning. This study explores the resilience of a variety of deep homography estimation models to domain shift and finds that the network architecture itself does not account for this adaptability. By examining the models' focal regions and subjecting input images to a variety of modifications, we confirm that the models rely heavily on local textures such as edges and corner points for homography estimation. Our analysis further shows that the domain-shift immunity itself is closely tied to the use of these local textures.
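As background, many deep homography networks regress the displacements of four image corners and convert them into a 3x3 matrix H by solving a small linear system. This is an assumption: the abstract does not state which parameterization the studied models use. The plain-Python sketch below shows that conversion with h33 fixed to 1; all function names are hypothetical.

```python
# Hypothetical sketch: recover a 3x3 homography H from four point
# correspondences via the standard 8x8 linear system (h33 fixed to 1).
# Assumes a 4-point (corner-offset) parameterization; not code from the paper.

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def homography_from_points(src, dst):
    """Return H (3x3 nested list) mapping each src (x, y) to dst (u, v)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def homography_from_corner_offsets(corners, offsets):
    """4-point parameterization: a network predicts (dx, dy) for each corner."""
    dst = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(corners, offsets)]
    return homography_from_points(corners, dst)


def apply_homography(h, x, y):
    """Map a pixel through H using homogeneous coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

For example, unit-square corners displaced by (1, -1), (2, -1), (2, 1), (1, 1) yield H = [[2, 0, 1], [0, 3, -1], [0, 0, 1]] up to floating-point error, i.e. u = 2x + 1 and v = 3y - 1.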

CV · Jul 9, 2023
Random Position Adversarial Patch for Vision Transformers

Mingzhen Shao

Previous studies have shown that vision transformers are vulnerable to adversarial patches, but these studies all rely on a critical assumption: the attack patch must be perfectly aligned with the patches used for linear projection in vision transformers. This stringent requirement makes physical-world deployment of adversarial patches against vision transformers impractical, whereas such patches are effective against CNNs. This paper proposes a novel method for generating an adversarial patch (G-Patch) that overcomes the alignment constraint, allowing the patch to launch a targeted attack at any position within the field of view. Specifically, instead of directly optimizing the patch using gradients, we employ a GAN-like structure to generate the adversarial patch. Our experiments show that the patch achieves universal attacks on vision transformers in both digital and physical-world scenarios. Further analysis reveals that the generated patch is robust to brightness restriction, color transfer, and random noise. Real-world attack experiments confirm that G-Patch can launch robust attacks even under very challenging conditions.
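The alignment constraint can be made concrete. A vision transformer splits the input into non-overlapping tokens of fixed size (16x16 pixels is assumed here as a common choice). A 16x16 patch placed on the token grid overlaps exactly one token, while the same patch shifted by 8 pixels in each direction overlaps four. The hypothetical helpers below count covered tokens and sample an unaligned placement; they illustrate the setting only and are not the G-Patch generator.

```python
import random

# Hypothetical illustration of patch/token alignment in a ViT.
# TOKEN is an assumed token size, not a value taken from the paper.
TOKEN = 16


def tokens_covered(top, left, height, width, token=TOKEN):
    """Return the set of (row, col) token indices that a patch overlaps."""
    rows = range(top // token, (top + height - 1) // token + 1)
    cols = range(left // token, (left + width - 1) // token + 1)
    return {(r, c) for r in rows for c in cols}


def random_position(image_h, image_w, patch_h, patch_w, rng=random):
    """Pick a top-left corner anywhere the patch fits, ignoring the token grid."""
    top = rng.randint(0, image_h - patch_h)  # randint bounds are inclusive
    left = rng.randint(0, image_w - patch_w)
    return top, left


def paste_patch(image, patch, top, left):
    """Return a copy of a 2-D image (list of rows) with the patch pasted in."""
    out = [row[:] for row in image]
    for i, patch_row in enumerate(patch):
        out[top + i][left:left + len(patch_row)] = patch_row
    return out
```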

CV · Jul 1, 2023
Brightness-Restricted Adversarial Attack Patch

Mingzhen Shao

Adversarial attack patches have gained increasing attention due to their practical applicability in physical-world scenarios. However, the bright colors used in attack patches are a significant drawback, as they make the patches easy for human observers to identify. Moreover, even though these attacks have been highly successful in deceiving target networks, the specific features of an attack patch that contribute to its success remain unknown. Our paper introduces a brightness-restricted patch (BrPatch) that uses optical characteristics to effectively reduce conspicuousness while preserving image independence. We also analyze the impact of various image features (such as color, texture, noise, and size) on the effectiveness of an attack patch in physical-world deployment. Our experiments show that attack patches exhibit strong redundancy with respect to brightness and are resistant to color transfer and noise. Based on these findings, we propose additional methods to further reduce the conspicuousness of BrPatch. Our findings also explain the robustness of attack patches observed in physical-world scenarios.
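One simple way to picture a brightness restriction is as a cap on each pixel's luminance. The sketch below is hypothetical: it uses ITU-R BT.601 luma weights and scales over-bright pixels down while keeping their channel ratios. The abstract says BrPatch relies on optical characteristics, which this sketch does not model.

```python
# Hypothetical brightness cap for an RGB patch (channels in [0, 255]).
# BT.601 luma weights; the scale-down rule is an illustrative assumption,
# not the BrPatch method.

def luma(rgb):
    """BT.601 luma of an (R, G, B) tuple."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def restrict_brightness(patch, max_luma):
    """Scale each pixel so its luma is <= max_luma; channel ratios are kept."""
    out = []
    for row in patch:
        new_row = []
        for px in row:
            y = luma(px)
            # Only pixels brighter than the cap are scaled; black stays black.
            scale = min(1.0, max_luma / y) if y > 0 else 1.0
            new_row.append(tuple(c * scale for c in px))
        out.append(new_row)
    return out
```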

CV · Dec 29, 2025
Domain-Shift Immunity in Deep Deformable Registration via Local Feature Representations

Mingzhen Shao, Sarang Joshi

Deep learning has advanced deformable image registration, surpassing traditional optimization-based methods in both accuracy and efficiency. However, learning-based models are widely believed to be sensitive to domain shift, and robustness is typically pursued through large and diverse training datasets without an explanation of the underlying mechanisms. In this work, we show that domain-shift immunity is an inherent property of deep deformable registration models, arising from their reliance on local feature representations rather than global appearance for deformation estimation. To isolate and validate this mechanism, we introduce UniReg, a universal registration framework that decouples feature extraction from deformation estimation using fixed, pre-trained feature extractors and a UNet-based deformation network. Despite being trained on a single dataset, UniReg exhibits robust cross-domain and multi-modal performance comparable to optimization-based methods. Our analysis further reveals that failures of conventional CNN-based models under modality shift originate from dataset-induced biases in early convolutional layers. These findings identify local feature consistency as the key driver of robustness in learning-based deformable registration and motivate backbone designs that preserve domain-invariant local features.
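The deformation a registration network predicts is typically a dense displacement field that is applied to the moving image by resampling. As an assumption-labeled sketch (not UniReg's code, and 2-D rather than 3-D for brevity), the following applies such a field with bilinear interpolation and border clamping; the convention displacement[i][j] = (dy, dx) is chosen here for illustration.

```python
import math

# Hypothetical sketch: warp a 2-D moving image with a dense displacement
# field using bilinear resampling. Not taken from the paper.


def bilinear(img, y, x):
    """Sample img at a real-valued (y, x), clamping to the image border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def warp(moving, displacement):
    """Output pixel (i, j) samples moving at (i + dy, j + dx)."""
    h, w = len(moving), len(moving[0])
    return [[bilinear(moving, i + displacement[i][j][0], j + displacement[i][j][1])
             for j in range(w)] for i in range(h)]
```

With a zero field the output equals the input; a constant field of (0, 0.5) shifts intensities by half a pixel, so [[0, 1, 2], [3, 4, 5]] becomes [[0.5, 1.5, 2], [3.5, 4.5, 5]] after clamping at the right border.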

CV · May 14, 2019
Improving Head Pose Estimation with a Combined Loss and Bounding Box Margin Adjustment

Mingzhen Shao, Zhun Sun, Mete Ozay et al.

We address the problem of estimating the pose of a person's head from an RGB image. The use of CNNs for this problem has contributed to significant improvements in accuracy in recent work. However, we show that the following two methods, despite their simplicity, can attain further improvement: (i) proper adjustment of the margin of the bounding box of a detected face, and (ii) the choice of loss functions. We show that integrating these two methods achieves a new state of the art on standard benchmark datasets for in-the-wild head pose estimation.
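Bounding-box margin adjustment can be sketched as enlarging a detected face box by a fraction of its width and height on every side, then clipping to the image. The function below is hypothetical, and the margin of 0.25 in the usage example is only a placeholder; the abstract does not give the value the authors use.

```python
# Hypothetical sketch of face bounding-box margin adjustment.
# margin is a fraction of the box width/height added on each side.

def expand_box(box, margin, image_w, image_h):
    """box = (x1, y1, x2, y2) in pixels; returns the enlarged, clipped box."""
    x1, y1, x2, y2 = box
    dw = margin * (x2 - x1)
    dh = margin * (y2 - y1)
    return (max(0.0, x1 - dw), max(0.0, y1 - dh),
            min(float(image_w), x2 + dw), min(float(image_h), y2 + dh))
```

For example, expand_box((10, 10, 30, 50), 0.25, 100, 100) returns (5.0, 0.0, 35.0, 60.0): the 20x40 box grows by 5 pixels horizontally and 10 vertically per side, and the top edge is clipped at 0.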