How to Not Measure Disentanglement
This work addresses a methodological gap for researchers in representation learning, but the contribution is incremental: it improves evaluation rather than introducing a new paradigm.
The paper tackles the evaluation of disentangled representations: it analyzes existing metrics and finds that they lack theoretical guarantees and do not correlate consistently with qualitative studies. It then proposes a new metric that is proven to satisfy two basic desirable properties for accurate scoring.
Several metrics have been proposed to evaluate disentangled representations. However, conventional metrics of disentanglement lack theoretical guarantees and do not correlate consistently with the outcomes of qualitative studies. In this paper, we analyze metrics of disentanglement and their properties. We conclude that existing metrics were created to reflect different characteristics of disentanglement and do not satisfy two basic desirable properties: (1) assigning a high score to representations that are disentangled according to the definition; and (2) assigning a low score to representations that are entangled according to the definition. In addition, we propose a new metric of disentanglement and prove that it satisfies both properties.
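The two properties can be read as a sanity check that any candidate metric should pass. The sketch below is illustrative only and makes several assumptions that are not from the paper: `toy_score` is a hypothetical stand-in metric (not the proposed one), "disentangled" is taken to mean that each latent dimension tracks exactly one factor, "entangled" is taken to mean a 45-degree mixing of two factors, and the thresholds `high` and `low` are arbitrary.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def toy_score(factors, codes):
    """Hypothetical stand-in metric, NOT the paper's metric.

    For each latent dimension, take the gap between its largest and
    second-largest absolute correlation with the ground-truth factors,
    then average the gaps over latent dimensions.
    """
    n_factors, n_codes = len(factors[0]), len(codes[0])
    gaps = []
    for j in range(n_codes):
        column = [c[j] for c in codes]
        corrs = sorted(
            (abs(pearson(column, [f[k] for f in factors])) for k in range(n_factors)),
            reverse=True,
        )
        gaps.append(corrs[0] - corrs[1])
    return mean(gaps)


def satisfies_both_properties(metric, factors, disentangled, entangled,
                              high=0.9, low=0.1):
    """Property (1): high score on the disentangled codes.
    Property (2): low score on the entangled codes.
    The thresholds are assumptions chosen for this toy example."""
    return (metric(factors, disentangled) >= high
            and metric(factors, entangled) <= low)


# Two independent factors on a full 10 x 10 grid (exactly uncorrelated).
factors = [(a, b) for a in range(10) for b in range(10)]
# Assumed disentangled case: each latent copies one factor.
disentangled = [(a, b) for a, b in factors]
# Assumed entangled case: each latent mixes both factors equally.
entangled = [(a + b, a - b) for a, b in factors]

print(satisfies_both_properties(toy_score, factors, disentangled, entangled))
```

Passing this check on one pair of examples does not establish either property in general; the paper's claim is a proof over all representations covered by its definition, which a spot check like this cannot replace.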