Hardware Resilience Properties of Text-Guided Image Classifiers
The paper provides a practical way to improve the robustness of image classification models against hardware failures, though the contribution is incremental because it builds on existing components such as CLIP and GPT-3.
The paper tackles transient hardware errors in deployed image classification models by initializing the classification layer with enriched text embeddings obtained from GPT-3 and the CLIP text encoder, achieving a 5.5x average increase in hardware reliability with only a 0.3% average accuracy drop.
This paper presents a novel method to enhance the reliability of image classification models during deployment in the face of transient hardware errors. We use enriched text embeddings, derived from GPT-3 with question prompts per class and a pretrained CLIP text encoder, and investigate their impact as an initialization for the classification layer. Our approach achieves a $5.5\times$ average increase in hardware reliability (and up to $14\times$) across various architectures in the most critical layer, with a minimal accuracy drop ($0.3\%$ on average) compared to baseline PyTorch models. Furthermore, our method integrates with any image classification backbone, is demonstrated across various network architectures, decreases parameter and FLOPs overhead, and follows a consistent training recipe. This research offers a practical and efficient solution to bolster the robustness of image classification models against hardware failures, with potential implications for future studies in this domain. Our code and models are released at https://github.com/TalalWasim/TextGuidedResilience.
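To make the two ideas in the abstract concrete, the following is a minimal sketch and not the released implementation. It assumes a single-bit-flip fault model on float32 weights, and it assumes that per-class prompt embeddings are averaged and L2-normalized to form the classification-layer weights; the abstract specifies neither detail. The helper names (`flip_bit`, `class_weights_from_text`, `logits`) and the toy two-dimensional embeddings are hypothetical stand-ins for real CLIP text-encoder outputs.

```python
import math
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of the float32 encoding of `value`.

    Bit 0 is the mantissa LSB, bits 23-30 are the exponent, bit 31 is the sign.
    This models a transient single-bit hardware error (an assumed fault model).
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return flipped


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def class_weights_from_text(prompt_embeddings):
    """Build one weight row per class from that class's prompt embeddings.

    Averaging followed by L2 normalization is an assumption made for this
    sketch, not a detail stated in the abstract.
    """
    weights = []
    for per_class in prompt_embeddings:
        dim = len(per_class[0])
        mean = [sum(e[i] for e in per_class) / len(per_class) for i in range(dim)]
        weights.append(l2_normalize(mean))
    return weights


def logits(weights, feature):
    """Classification layer without bias: one dot product per class."""
    return [sum(w * f for w, f in zip(row, feature)) for row in weights]


# Hypothetical toy embeddings: class 0 has two prompts, class 1 has one.
toy_prompts = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 2.0]],
]
W = class_weights_from_text(toy_prompts)
clean = logits(W, [0.5, 0.25])

# Inject a fault into the most significant exponent bit of one stored weight.
faulty_weight = flip_bit(0.5, 30)
```

Flipping a high exponent bit turns a small weight into a value many orders of magnitude larger, which is the kind of corruption that can change a prediction; sign and low mantissa bits are far less damaging.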