Aligning Logits Generatively for Principled Black-Box Knowledge Distillation
This paper addresses model compression for edge computing under limited data exchange, but it appears incremental because it builds on existing B2KD frameworks.
The paper tackles the problem of Black-Box Knowledge Distillation (B2KD) for cloud-to-edge model compression with invisible data and models, proposing a new method called Mapping-Emulation KD (MEKD) that outperforms previous state-of-the-art approaches on various benchmarks.
Black-Box Knowledge Distillation (B2KD) is a problem formulated for cloud-to-edge model compression in which both the data and the models hosted on the server are invisible. B2KD faces challenges such as limited Internet exchange and an edge-cloud disparity of data distributions. In this paper, we formalize a two-step workflow consisting of deprivatization and distillation, and theoretically derive a new optimization direction, from logits to cell boundaries, that differs from direct logit alignment. With its guidance, we propose a new method, Mapping-Emulation KD (MEKD), that distills a cumbersome black-box model into a lightweight one. Our method treats soft and hard responses in the same way, and consists of: 1) deprivatization: emulating the inverse mapping of the teacher function with a generator, and 2) distillation: aligning the low-dimensional logits of the teacher and student models by reducing the distance between their high-dimensional image points. For different teacher-student pairs, our method yields encouraging distillation performance on various benchmarks and outperforms the previous state-of-the-art approaches.
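The distillation step described in the abstract can be illustrated with a minimal sketch. It assumes a frozen generator that maps a response vector to an image, and it compares teacher and student responses in image space rather than in logit space. Everything named here is hypothetical: `ToyGenerator` is a random, untrained stand-in for the generator that deprivatization would produce, and the mean absolute difference is an assumed distance, since the abstract does not say which distance the method uses.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def to_one_hot(probs):
    """Hard response: keep only the top-1 label as a one-hot vector."""
    return np.eye(probs.shape[-1])[probs.argmax(axis=-1)]


class ToyGenerator:
    """Hypothetical frozen generator: response vector -> flattened image.

    A random linear map followed by tanh. In the described workflow the
    generator would instead be trained to emulate the inverse mapping of
    the teacher; that training is not shown here.
    """

    def __init__(self, num_classes, image_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(num_classes, image_dim))

    def __call__(self, response):
        return np.tanh(response @ self.w)


def image_space_loss(generator, teacher_response, student_response):
    """Distance between generated images (assumed: mean absolute difference)."""
    diff = generator(teacher_response) - generator(student_response)
    return float(np.mean(np.abs(diff)))


# Toy batch: 4 samples, 10 classes, 3x32x32 flattened images.
rng = np.random.default_rng(1)
teacher_soft = softmax(rng.normal(size=(4, 10)))
student_probs = softmax(rng.normal(size=(4, 10)))
generator = ToyGenerator(num_classes=10, image_dim=3 * 32 * 32)

# The same loss accepts soft (probability) or hard (one-hot) teacher responses.
loss_soft = image_space_loss(generator, teacher_soft, student_probs)
loss_hard = image_space_loss(generator, to_one_hot(teacher_soft), student_probs)
```

In this sketch the loss is zero when the student reproduces the teacher response exactly, and the only difference between the soft and hard settings is how the teacher response is encoded before it enters the generator.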