ConcateNet: Dialogue Separation Using Local And Global Feature Concatenation
This work addresses the problem of isolating dialogue signals for broadcast applications and offers incremental improvements in generalization to out-of-domain scenarios.
The paper tackles dialogue separation from mixtures such as movies or TV programs. It proposes ConcateNet, which processes local and global features with the aim of better generalization. ConcateNet achieves competitive performance on in-domain datasets and generalizes better to an out-of-domain broadcast dataset than the considered state-of-the-art noise-reduction methods.
Dialogue separation involves isolating a dialogue signal from a mixture, such as a movie or a TV program. This can be a necessary step to enable dialogue enhancement for broadcast-related applications. In this paper, ConcateNet for dialogue separation is proposed, which is based on a novel approach for processing local and global features aimed at better generalization for out-of-domain signals. ConcateNet is trained on a publicly available, noise-reduction-focused dataset and evaluated on three datasets: two noise-reduction-focused datasets (in-domain), on which ConcateNet shows competitive performance, and a broadcast-focused dataset (out-of-domain), which verifies that the proposed architecture generalizes better than the considered state-of-the-art noise-reduction methods.
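The abstract does not specify how ConcateNet combines local and global features, so the following is only a generic illustration of the idea of feature concatenation, not the authors' architecture. It assumes that "local" features are per-frame vectors and that a "global" feature is the mean over all frames; the function name `concat_local_global` and both of those choices are hypothetical.

```python
from typing import List

Frame = List[float]


def concat_local_global(local_feats: List[Frame]) -> List[Frame]:
    """Append one global summary vector to every per-frame feature vector.

    Assumption (not from the paper): the global feature is the per-channel
    mean over all frames.
    """
    if not local_feats:
        return []
    n_frames = len(local_feats)
    n_channels = len(local_feats[0])
    # Global summary: average each channel across the whole signal.
    global_feat = [
        sum(frame[c] for frame in local_feats) / n_frames
        for c in range(n_channels)
    ]
    # Each output frame holds its own local features followed by the
    # shared global features, doubling the channel count.
    return [frame + global_feat for frame in local_feats]


# Three frames with two channels each (toy values).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
combined = concat_local_global(frames)
print(combined[0])  # [1.0, 2.0, 3.0, 4.0]
```

In this sketch every frame sees the same signal-wide context alongside its own local detail. A learned model would typically replace the mean with a trained global branch.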