TSCAN: Dialog Structure Discovery Using SCAN
This work addresses the need for unsupervised dialog structure discovery; the contribution is incremental, as it applies existing methods to a new domain.
The paper tackles the problem of discovering dialog structure by clustering utterances without predefined labels or ontologies, achieving interpretable clusters with self-generated labels using an adaptation of SCAN and BERT.
Can we discover dialog structure by dividing utterances into labelled clusters, and can these labels be generated from the data itself? Dialog structure is typically discovered with the help of a predefined ontology; by using unsupervised classification and self-labelling, we are able to infer this structure without any labels or ontology. In this paper we apply SCAN (Semantic Clustering by Adopting Nearest neighbors) to dialog data. We use BERT for the pretext task and an adaptation of SCAN for clustering and self-labelling. The resulting clusters are used to estimate transition probabilities and build the dialog structure. The self-labelling method used in SCAN makes these structures interpretable, as every cluster has a label. Because the approach is unsupervised, choosing evaluation metrics is a challenge; we use statistical measures as proxies for structure quality.
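To make the step from clusters to dialog structure concrete, the following is a minimal sketch of how transition probabilities could be estimated once every utterance carries a cluster label. It assumes a first-order (bigram) model over consecutive utterances, which the abstract does not confirm as the exact method; the function name `transition_probabilities` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(dialogs):
    """Estimate P(next cluster | current cluster) from labelled dialogs.

    `dialogs` is a list of dialogs; each dialog is the ordered sequence of
    cluster labels assigned to its utterances. Assumption: a first-order
    model, i.e. each label depends only on the label just before it.
    """
    # counts[a][b] = number of times label b directly follows label a
    counts = defaultdict(Counter)
    for labels in dialogs:
        for current, following in zip(labels, labels[1:]):
            counts[current][following] += 1

    # Normalise each row of counts into a probability distribution
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs


# Toy, made-up cluster labels (not taken from the paper)
dialogs = [
    ["greet", "request", "inform", "bye"],
    ["greet", "request", "request", "inform", "bye"],
    ["greet", "inform", "bye"],
]
probs = transition_probabilities(dialogs)
```

Each entry `probs[a][b]` can be read as the weight of an edge from cluster `a` to cluster `b` in the dialog graph. Labels that only ever end a dialog, such as `"bye"` in the toy data, have no outgoing edges and so do not appear as keys.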