Which Style Makes Me Attractive? Interpretable Control Discovery and Counterfactual Explanation on StyleGAN
This work addresses the need for better interpretability in both GAN-based image manipulation and CNN classifiers. It offers incremental improvements by leveraging existing models to obtain more concrete semantic disentanglement.
The paper tackles the problem of discovering interpretable controls in StyleGAN2's latent space for face generation, using face analysis models to define semantic criteria. It then applies these controls to generate counterfactual explanations for CNN classifiers, assessing whether they learn the intended semantics; experiments across various criteria show the approach is effective.
Semantically disentangled latent subspaces of a GAN provide rich, interpretable controls for image generation. This paper makes two contributions to semantic latent subspace analysis for face generation with StyleGAN2. First, we propose a novel approach to disentangling latent subspace semantics by exploiting existing face analysis models, e.g., face parsers and face landmark detectors. These models provide the flexibility to construct various criteria with concrete and interpretable semantic meanings (e.g., change face shape or change skin color) that constrain latent subspace disentanglement. Rich, previously unknown latent space controls can be discovered using the constructed criteria. Second, we propose a new perspective for explaining the behavior of a CNN classifier by generating counterfactuals in the interpretable latent subspaces we discovered. This explanation helps reveal whether the classifier learns semantics as intended. Experiments on various disentanglement criteria demonstrate the effectiveness of our approach. We believe this approach contributes to both image manipulation and counterfactual explainability of CNNs. The code is available at \url{https://github.com/prclibo/ice}.
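As a rough illustration of the two ideas in the abstract, the pure-Python sketch below (1) finds a latent direction that changes a target criterion strongly while leaving a preserved criterion nearly untouched, and (2) walks along that direction until a toy classifier flips its decision. It is not the authors' implementation (the linked repository holds that). The linear toy Jacobians, the penalty weight `lam`, the power-iteration solver, and the helper names `top_direction` and `counterfactual` are all assumptions made for illustration; in a real setting the Jacobians would come from differentiating face-analysis criteria through the generator.

```python
import math


def gram(J):
    """Return J^T J for a Jacobian given as a list of rows."""
    n = len(J[0])
    return [[sum(r[i] * r[j] for r in J) for j in range(n)] for i in range(n)]


def top_direction(J_target, J_preserve, lam=10.0, iters=500):
    """Unit vector d maximizing |J_target d|^2 - lam * |J_preserve d|^2.

    Solved by power iteration on a shifted matrix (an assumed, simple solver).
    """
    A, B = gram(J_target), gram(J_preserve)
    n = len(A)
    M = [[A[i][j] - lam * B[i][j] for j in range(n)] for i in range(n)]
    # Shifting by a multiple of the identity makes every eigenvalue positive
    # without changing the eigenvectors, so power iteration finds the top one.
    shift = sum(abs(v) for row in M for v in row) + 1.0
    d = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        d = [sum(M[i][j] * d[j] for j in range(n)) + shift * d[i] for i in range(n)]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
    return d


def counterfactual(w, d, score, step=0.1, max_steps=200):
    """Walk from w along +d, then -d, until the sign of score() flips.

    Returns the first latent code with a flipped decision, or None.
    """
    start_positive = score(w) > 0
    for sign in (1.0, -1.0):
        for k in range(1, max_steps + 1):
            cand = [wi + sign * k * step * di for wi, di in zip(w, d)]
            if (score(cand) > 0) != start_positive:
                return cand
    return None


# Toy setup (hypothetical): a 3-D latent space where the target criterion
# (say, face shape) responds to axes 0 and 1, and the preserved criterion
# (say, skin color) responds to axis 0 only.
J_target = [[1.0, 1.0, 0.0]]
J_preserve = [[1.0, 0.0, 0.0]]
d = top_direction(J_target, J_preserve)

# Toy classifier (hypothetical): positive when latent axis 1 exceeds 1.
score = lambda w: w[1] - 1.0
cf = counterfactual([0.0, 0.0, 0.0], d, score)
```

In this toy case the discovered direction lies almost entirely along axis 1, the only axis that moves the target criterion without disturbing the preserved one. If the classifier's decision flips while walking along such a direction, that suggests the classifier is sensitive to the semantic attribute the direction controls.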