Towards Learning Fine-Grained Disentangled Representations from Speech
This work addresses a gap in disentangled representation learning for speech. However, it appears incremental: it reviews existing efforts and introduces a concept without demonstrating its impact.
The paper tackles the problem of learning fine-grained disentangled representations from speech and proposes the concept of fine-grained disentangled speech representation learning, but it reports no concrete results or numbers.
Learning disentangled representations of high-dimensional data is currently an active research area. However, less work has been done on this topic in speech processing than in computer vision. In this paper, we review two representative efforts on this topic and propose the novel concept of fine-grained disentangled speech representation learning.