Measuring Conversational Fluidity in Automated Dialogue Agents
This work addresses the need for better evaluation metrics for dialogue systems, but it is incremental because it builds on existing tools and methods.
The authors tackle the problem of evaluating conversational fluidity in dialogue agents by developing an automated method that combines NLP tools and human ratings into a classifier, which they report improves on existing fluidity metrics.
We present an automated evaluation method for measuring fluidity in conversational dialogue systems. The method combines several state-of-the-art natural language tools into a classifier and uses human ratings of dialogues to train an automated judgment model. Our experiments show that the method improves on existing metrics for measuring fluidity.
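As a rough illustration of the general idea in the abstract (per-dialogue features fed to a classifier trained on human ratings), the following minimal sketch may help. It is not the authors' implementation: the two features (`turn_overlap`, `repetition_rate`) are hypothetical stand-ins for the outputs of real natural language tools, the logistic-regression trainer is an assumed choice of classifier, and the dialogues and ratings are toy data invented for the example.

```python
import math

def turn_overlap(dialogue):
    """Mean Jaccard word overlap between consecutive turns (hypothetical feature)."""
    scores = []
    for prev, cur in zip(dialogue, dialogue[1:]):
        a, b = set(prev.lower().split()), set(cur.lower().split())
        scores.append(len(a & b) / len(a | b) if a | b else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

def repetition_rate(dialogue):
    """Fraction of turns that exactly repeat an earlier turn (hypothetical feature)."""
    seen, repeats = set(), 0
    for turn in dialogue:
        key = turn.lower().strip()
        if key in seen:
            repeats += 1
        seen.add(key)
    return repeats / len(dialogue) if dialogue else 0.0

def extract_features(dialogue):
    # A constant 1.0 comes first so the first weight acts as a bias term.
    return [1.0, turn_overlap(dialogue), repetition_rate(dialogue)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(dialogues, ratings, epochs=500, lr=0.5):
    """Fit logistic-regression weights by stochastic gradient ascent on the log-likelihood."""
    X = [extract_features(d) for d in dialogues]
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, y in zip(X, ratings):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

def predict_fluidity(w, dialogue):
    """Return a score in (0, 1); higher means the model judges the dialogue more fluid."""
    x = extract_features(dialogue)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# Toy data (invented): 1 = rated fluid by a human, 0 = rated not fluid.
fluid_1 = ["Where is the station?", "The station is two blocks north.",
           "Thanks, is it open now?", "Yes, it is open until midnight."]
stilted_1 = ["Where is the station?", "I do not understand.",
             "Where is the station?", "I do not understand."]
fluid_2 = ["Can you book a table?", "Sure, a table for how many?",
           "A table for four, please.", "Done, table for four booked."]
stilted_2 = ["Can you book a table?", "Hello.",
             "Can you book a table?", "Hello."]

weights = train([fluid_1, stilted_1, fluid_2, stilted_2], [1, 0, 1, 0])
```

Calling `predict_fluidity(weights, some_dialogue)` then scores a new dialogue, where `some_dialogue` is a list of turn strings. In a realistic setting the hand-written features would be replaced by scores from actual language tools, and the model would be validated against held-out human ratings.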