Detecting Response Generation Not Requiring Factual Judgment
This work addresses the challenge of ensuring factuality in LLM outputs for dialogue systems, though its contribution is incremental because it focuses on a single, specific classification task.
The study tackled the problem of balancing attractiveness and factuality in dialogue responses by predicting sentences that do not require factual judgment, such as agreements or personal opinions, and achieved about 88% classification accuracy on a newly created dataset.
With the remarkable development of large language models (LLMs), ensuring the factuality of their output has become a challenge. However, grounding every part of a response in given knowledge or facts is not necessarily desirable in dialogue. This study aimed to achieve both attractiveness and factuality in dialogue responses by setting a task of predicting sentences that do not require factual correctness judgment, such as agreement or personal opinions and feelings. We created a dataset for this task via crowdsourcing, the dialogue dataset annotated with fact-check-needed label (DDFC), and evaluated several models on the classification task using it. The best-performing model achieved a classification accuracy of about 88%.
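To make the task framing concrete, the following is a minimal sketch of sentence-level binary classification and the accuracy metric, not the models evaluated in the study. The cue-phrase baseline, the names `LabeledSentence`, `NO_CHECK_CUES`, and `predict_needs_fact_check`, and the toy sentences are all assumptions made for illustration; none of them come from DDFC.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LabeledSentence:
    text: str
    needs_fact_check: bool  # True if the sentence makes a claim worth verifying


# Hypothetical cue phrases for agreement, opinions, and feelings (illustration only).
NO_CHECK_CUES = ("i think", "i feel", "i agree", "i love", "me too")


def predict_needs_fact_check(text: str) -> bool:
    """Naive baseline: a sentence with an opinion/agreement cue needs no fact check."""
    lowered = text.lower()
    return not any(cue in lowered for cue in NO_CHECK_CUES)


def accuracy(examples: Sequence[LabeledSentence],
             predict: Callable[[str], bool]) -> float:
    """Fraction of sentences whose predicted label matches the annotated label."""
    if not examples:
        return 0.0
    correct = sum(predict(ex.text) == ex.needs_fact_check for ex in examples)
    return correct / len(examples)


# Made-up sentences, not taken from the DDFC dataset.
TOY_EXAMPLES = [
    LabeledSentence("I agree, that sounds fun!", False),
    LabeledSentence("I think autumn is the best season.", False),
    LabeledSentence("Mount Fuji is 3,776 meters tall.", True),
    LabeledSentence("Tokyo is the capital of Japan.", True),
]

if __name__ == "__main__":
    print(f"toy accuracy: {accuracy(TOY_EXAMPLES, predict_needs_fact_check):.2f}")
```

In practice the keyword baseline would be replaced by a trained classifier; the `accuracy` function is the same metric regardless of which predictor is plugged in.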