CL, AI | Aug 29, 2022

Personal Attribute Prediction from Conversations

arXiv:2209.09619v1 | 0.85 citations | h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses the need for automated personal attribute prediction in applications like chatbots and recommendations, though it is incremental as it builds on existing pre-trained language models with new supervision strategies.

The paper tackles the problem of predicting personal attributes from conversations to enrich personal knowledge bases, proposing a framework that achieves the best performance compared to twelve baselines on two real-world datasets in terms of nDCG and MRR.
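Both metrics named above score a ranked list of candidate attribute values. As a rough illustration only, they can be sketched as follows. The function names and the toy profession labels are assumptions made for this example and do not come from the paper, whose exact evaluation protocol (cutoffs, relevance grades) is not given here.

```python
import math

def reciprocal_rank(ranked, relevant):
    """Return 1/rank of the first relevant item in `ranked`, or 0.0 if none appears."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(rankings, relevants):
    """Average the reciprocal rank over several users (one ranking per user)."""
    scores = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)

def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg(ranked, relevance, k=None):
    """DCG of the ranking divided by the DCG of the ideal ordering.

    `relevance` maps an item to its graded relevance; missing items count as 0.
    """
    k = len(ranked) if k is None else k
    gains = [relevance.get(item, 0.0) for item in ranked[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical ranking of profession values for one user whose true value is "nurse".
ranking = ["teacher", "nurse", "chef"]
print(reciprocal_rank(ranking, {"nurse"}))
print(ndcg(ranking, {"nurse": 1.0}))
```

With the true value in second place, the reciprocal rank is 1/2 and the nDCG is 1/log2(3). Both metrics reach 1 only when the true value is ranked first.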

Personal knowledge bases (PKBs) are critical to many applications, such as Web-based chatbots and personalized recommendation. Conversations that contain rich personal knowledge are a main source for populating a PKB. Given a user, a user attribute, and the user's utterances from a conversational system, we aim to predict the value of that attribute for the user, which helps enrich PKBs. However, previous studies have three issues: (1) manually labeled utterances are required for model training; (2) personal attribute knowledge embedded in both utterances and external resources is underutilized; (3) performance on some difficult personal attributes is unsatisfactory. In this paper, we propose DSCGN, a framework based on a pre-trained language model with a noise-robust loss function, to predict personal attributes from conversations without requiring any labeled utterances. To fine-tune the language model, we derive two categories of supervision by mining the personal attribute knowledge embedded in both unlabeled utterances and external resources: document-level supervision via a distant supervision strategy, and contextualized word-level supervision via a label guessing method. Extensive experiments on two real-world datasets (a profession dataset and a hobby dataset) show that our framework outperforms all twelve baselines in terms of nDCG and MRR.
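The abstract does not spell out how the distant supervision or the noise-robust loss is implemented, so the following is only a minimal sketch under stated assumptions. Weak document-level labels are produced here by simple keyword matching against a hypothetical keyword list per attribute value. The loss shown is the generalized cross-entropy loss, a well-known noise-robust loss that is swapped in for illustration and is not confirmed to be the one DSCGN uses. All names (`distant_label`, `gce_loss`, the keyword lists) are invented for this example.

```python
import math
import re

def distant_label(utterances, value_keywords):
    """Assign a weak label: the attribute value whose keywords occur most often.

    `value_keywords` maps each candidate value to lowercase single-word keywords
    (a hypothetical stand-in for knowledge mined from external resources).
    Returns None when no keyword matches, i.e. the user stays unlabeled.
    """
    tokens = re.findall(r"[a-z']+", " ".join(utterances).lower())
    counts = {
        value: sum(tokens.count(keyword) for keyword in keywords)
        for value, keywords in value_keywords.items()
    }
    if not counts:
        return None
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gce_loss(logits, target_index, q=0.7):
    """Generalized cross-entropy: (1 - p_target ** q) / q, with 0 < q <= 1.

    q = 1 gives 1 - p_target (bounded, tolerant of wrong labels);
    as q approaches 0 the loss approaches ordinary cross-entropy.
    """
    p_target = softmax(logits)[target_index]
    return (1.0 - p_target ** q) / q

# Hypothetical keyword lists for two profession values.
keywords = {
    "teacher": ["students", "classroom", "homework"],
    "chef": ["kitchen", "recipe"],
}
utterances = ["I grade homework every night.", "My students love the classroom."]
values = list(keywords)
weak = distant_label(utterances, keywords)
print(weak, gce_loss([2.0, 0.5], values.index(weak)))
```

Because the weak labels come from heuristics rather than annotators, some are wrong. A bounded loss such as this one limits how much any single mislabeled example can contribute to training.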
