Mixed-Type Wafer Classification For Low Memory Devices Using Knowledge Distillation
This work addresses the challenge of limited memory and scarce labeled data in wafer manufacturing defect analysis, enabling real-time deployment in fabrication labs. The contribution is incremental, however, as it builds on existing knowledge distillation techniques.
The paper tackles the problem of deploying complex deep learning models for mixed-type wafer defect pattern recognition on low-memory embedded devices. It proposes an unsupervised knowledge distillation method that produces a model up to 10 times smaller than the teacher without accuracy loss, and that outperforms state-of-the-art models.
Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial for determining the root cause of production defects, which may in turn provide insight for yield improvement in wafer foundries. During manufacturing, defects may appear alone on a wafer or in various combinations. Identifying multiple defects on a wafer is generally harder than identifying a single defect. Recently, deep learning methods have gained significant traction in mixed-type DPR. However, the complexity of the defects calls for large, complex models, which are very difficult to run on the low-memory embedded devices typically used in fabrication labs. Another common issue is the unavailability of labeled data to train complex networks. In this work, we propose an unsupervised training routine that distills the knowledge of complex pre-trained models into lightweight, deployment-ready models. We empirically show that this type of training yields a compressed model that does not sacrifice accuracy despite being up to 10 times smaller than the teacher model. The compressed model also outperforms contemporary state-of-the-art models.
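The abstract does not spell out the distillation objective, so the following is only a minimal sketch of how label-free distillation is commonly set up, not the authors' method. It assumes the standard soft-target formulation: teacher and student logits are softened with a temperature, and the student is trained to minimize the KL divergence to the teacher's distribution. No ground-truth labels enter the loss, which is what makes the routine usable on unlabeled wafer maps. The function names and the temperature value are hypothetical choices for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened outputs.

    Assumption: the standard soft-target loss, scaled by T^2 so that
    gradient magnitudes stay comparable across temperatures. Only the
    teacher's outputs act as targets; no labels are needed.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (fixed)
    log_q = log_softmax(student_logits, temperature)  # student (trained)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return (temperature ** 2) * kl


# Hypothetical logits for one unlabeled wafer map over three defect classes.
teacher = [4.0, 1.0, -2.0]
student = [2.5, 1.5, -1.0]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge. In a real pipeline the same quantity would be averaged over a batch and backpropagated through the student only, with the teacher's weights frozen.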