Xianling Mao

CL
h-index14
16papers
545citations
Novelty49%
AI Score43

16 Papers

CLAug 23, 2022
Improving Personality Consistency in Conversation by Persona Extending

Yifan Liu, Wei Wei, Jiayi Liu et al. · microsoft-research

Endowing chatbots with a consistent personality plays a vital role for agents to deliver human-like interactions. However, existing personalized approaches commonly generate responses in light of static predefined personas depicted with textual description, which may severely restrict the interactivity of human and the chatbot, especially when the agent needs to answer the query excluded in the predefined personas, which is so-called out-of-predefined persona problem (named OOP for simplicity). To alleviate the problem, in this paper we propose a novel retrieval-to-prediction paradigm consisting of two subcomponents, namely, (1) Persona Retrieval Model (PRM), it retrieves a persona from a global collection based on a Natural Language Inference (NLI) model, the inferred persona is consistent with the predefined personas; and (2) Posterior-scored Transformer (PS-Transformer), it adopts a persona posterior distribution that further considers the actual personas used in the ground response, maximally mitigating the gap between training and inferring. Furthermore, we present a dataset called IT-ConvAI2 that first highlights the OOP problem in personalized dialogue. Extensive experiments on both IT-ConvAI2 and ConvAI2 demonstrate that our proposed model yields considerable improvements in both automatic metrics and human evaluations.

CLOct 22, 2023Code
MIRACLE: Towards Personalized Dialogue Generation with Latent-Space Multiple Personal Attribute Control

Zhenyi Lu, Wei Wei, Xiaoye Qu et al.

Personalized dialogue systems aim to endow the chatbot agent with more anthropomorphic traits for human-like interactions. Previous approaches have explored explicitly user profile modeling using text descriptions, implicit derivation of user embeddings, or utilizing handicraft prompts for ChatGPT-like models. However, textual personas are limited in describing multi-faceted attributes (\emph{e.g.}, \emph{language style, inner character nuances}), implicit embedding suffers from personality sparsity, and handicraft prompts lack fine-grained and stable controllability. Hence, these approaches may struggle with complex personalized dialogue generation tasks that require generating controllable responses with multiple personal attributes. To this end, we propose \textbf{\textsc{Miracle}}, a novel personalized dialogue generation method through \textbf{M}ult\textbf{I}ple Pe\textbf{R}sonal \textbf{A}ttributes \textbf{C}ontrol within \textbf{L}atent-Space \textbf{E}nergy-based Models. ttributes \textbf{C}ontrol within \textbf{L}atent-Space \textbf{E}nergy-based Models. Specifically, our approach first disentangles complex personality into multi-faceted attributes. Subsequently, we employ a conditional variational auto-encoder to align with the dense personalized responses within a latent joint attribute space. We have also tailored a dedicated energy function and customized the ordinary differential equations sampling method to offer flexible attribute composition and precise attribute control. Extensive experiments demonstrate that \textsc{Miracle} outperforms several strong baselines in terms of personality controllability and response generation quality. Our dataset and code are available at \url{https://github.com/LZY-the-boys/MIRACLE}

CLJan 4, 2024
Joint Multi-Facts Reasoning Network For Complex Temporal Question Answering Over Knowledge Graph

Rikui Huang, Wei Wei, Xiaoye Qu et al.

Temporal Knowledge Graph (TKG) is an extension of regular knowledge graph by attaching the time scope. Existing temporal knowledge graph question answering (TKGQA) models solely approach simple questions, owing to the prior assumption that each question only contains a single temporal fact with explicit/implicit temporal constraints. Hence, they perform poorly on questions which own multiple temporal facts. In this paper, we propose \textbf{\underline{J}}oint \textbf{\underline{M}}ulti \textbf{\underline{F}}acts \textbf{\underline{R}}easoning \textbf{\underline{N}}etwork (JMFRN), to jointly reasoning multiple temporal facts for accurately answering \emph{complex} temporal questions. Specifically, JMFRN first retrieves question-related temporal facts from TKG for each entity of the given complex question. For joint reasoning, we design two different attention (\ie entity-aware and time-aware) modules, which are suitable for universal settings, to aggregate entities and timestamps information of retrieved facts. Moreover, to filter incorrect type answers, we introduce an additional answer type discrimination task. Extensive experiments demonstrate our proposed method significantly outperforms the state-of-art on the well-known complex temporal question benchmark TimeQuestions.

34.5CVApr 26
Do Protective Perturbations Really Protect Portrait Privacy under Real-world Image Transformations?

Ruiqing Sun, Xingshan Yao, Zhijing Wu et al.

Proactive defense methods protect portrait images from unauthorized editing or talking face generation (TFG) by introducing pixel-level protective perturbations, and have already attracted increasing attention for privacy protection. In real-world scenarios, images inevitably undergo various transformations during cross-device display and dissemination--such as scale transformations and color compression--that directly alter pixel values. However, it remains unclear whether such pixel-level modifications affect the effectiveness of existing proactive defense methods that rely on pixel-level perturbations. To solve this problem, we conduct a systematic evaluation of representative proactive defenses under image transformation. The evaluated methods are selected to span different generation architectures such as diffusion and GAN-based models, as well as defense scopes covering both portrait and natural images, and are assessed using both qualitative and quantitative metrics for subjective and objective comparison. Experimental results indicate that defense methods based on pixel-level perturbations struggle to withstand common image transformations, posing a risk of defense failure in real-world applications. To further highlight this risk, we propose a simple yet effective purification framework by leveraging the vulnerabilities induced by real-world image transformations. Experimental results demonstrate that the proposed method can efficiently remove protective perturbations with low computational cost, highlighting previously overlooked risks to the research community.

CLJun 4, 2024
Personalized Topic Selection Model for Topic-Grounded Dialogue

Shixuan Fan, Wei Wei, Xiaofei Wen et al.

Recently, the topic-grounded dialogue (TGD) system has become increasingly popular as its powerful capability to actively guide users to accomplish specific tasks through topic-guided conversations. Most existing works utilize side information (\eg topics or personas) in isolation to enhance the topic selection ability. However, due to disregarding the noise within these auxiliary information sources and their mutual influence, current models tend to predict user-uninteresting and contextually irrelevant topics. To build user-engaging and coherent dialogue agent, we propose a \textbf{P}ersonalized topic s\textbf{E}lection model for \textbf{T}opic-grounded \textbf{D}ialogue, named \textbf{PETD}, which takes account of the interaction of side information to selectively aggregate such information for more accurately predicting subsequent topics. Specifically, we evaluate the correlation between global topics and personas and selectively incorporate the global topics aligned with user personas. Furthermore, we propose a contrastive learning based persona selector to filter out irrelevant personas under the constraint of lacking pertinent persona annotations. Throughout the selection and generation, diverse relevant side information is considered. Extensive experiments demonstrate that our proposed method can generate engaging and diverse responses, outperforming state-of-the-art baselines across various evaluation metrics.

IRFeb 23, 2022
Multi-view Intent Disentangle Graph Networks for Bundle Recommendation

Sen Zhao, Wei Wei, Ding Zou et al.

Bundle recommendation aims to recommend the user a bundle of items as a whole. Nevertheless, they usually neglect the diversity of the user's intents on adopting items and fail to disentangle the user's intents in representations. In the real scenario of bundle recommendation, a user's intent may be naturally distributed in the different bundles of that user (Global view), while a bundle may contain multiple intents of a user (Local view). Each view has its advantages for intent disentangling: 1) From the global view, more items are involved to present each intent, which can demonstrate the user's preference under each intent more clearly. 2) From the local view, it can reveal the association among items under each intent since items within the same bundle are highly correlated to each other. To this end, we propose a novel model named Multi-view Intent Disentangle Graph Networks (MIDGN), which is capable of precisely and comprehensively capturing the diversity of the user's intent and items' associations at the finer granularity. Specifically, MIDGN disentangles the user's intents from two different perspectives, respectively: 1) In the global level, MIDGN disentangles the user's intent coupled with inter-bundle items; 2) In the Local level, MIDGN disentangles the user's intent coupled with items within each bundle. Meanwhile, we compare the user's intents disentangled from different views under the contrast learning framework to improve the learned intents. Extensive experiments conducted on two benchmark datasets demonstrate that MIDGN outperforms the state-of-the-art methods by over 10.7% and 26.8%, respectively.

CLJun 6, 2021
Emotion-aware Chat Machine: Automatic Emotional Response Generation for Human-like Emotional Interaction

Wei Wei, Jiayi Liu, Xianling Mao et al.

The consistency of a response to a given post at semantic-level and emotional-level is essential for a dialogue system to deliver human-like interactions. However, this challenge is not well addressed in the literature, since most of the approaches neglect the emotional information conveyed by a post while generating responses. This article addresses this problem by proposing a unifed end-to-end neural architecture, which is capable of simultaneously encoding the semantics and the emotions in a post for generating more intelligent responses with appropriately expressed emotions. Extensive experiments on real-world data demonstrate that the proposed method outperforms the state-of-the-art methods in terms of both content coherence and emotion appropriateness.

CLJun 2, 2021
Exploiting Global Contextual Information for Document-level Named Entity Recognition

Zanbo Wang, Wei Wei, Xianling Mao et al.

Most existing named entity recognition (NER) approaches are based on sequence labeling models, which focus on capturing the local context dependencies. However, the way of taking one sentence as input prevents the modeling of non-sequential global context, which is useful especially when local context information is limited or ambiguous. To this end, we propose a model called Global Context enhanced Document-level NER (GCDoc) to leverage global contextual information from two levels, i.e., both word and sentence. At word-level, a document graph is constructed to model a wider range of dependencies between words, then obtain an enriched contextual representation for each word via graph neural networks (GNN). To avoid the interference of noise information, we further propose two strategies. First we apply the epistemic uncertainty theory to find out tokens whose representations are less reliable, thereby helping prune the document graph. Then a selective auxiliary classifier is proposed to effectively learn the weight of edges in document graph and reduce the importance of noisy neighbour nodes. At sentence-level, for appropriately modeling wider context beyond single sentence, we employ a cross-sentence module which encodes adjacent sentences and fuses it with the current sentence representation via attention and gating mechanisms. Extensive experiments on two benchmark NER datasets (CoNLL 2003 and Ontonotes 5.0 English dataset) demonstrate the effectiveness of our proposed model. Our model reaches F1 score of 92.22 (93.40 with BERT) on CoNLL 2003 dataset and 88.32 (90.49 with BERT) on Ontonotes 5.0 dataset, achieving new state-of-the-art performance.

CLNov 15, 2020
Target Guided Emotion Aware Chat Machine

Wei Wei, Jiayi Liu, Xianling Mao et al.

The consistency of a response to a given post at semantic-level and emotional-level is essential for a dialogue system to deliver human-like interactions. However, this challenge is not well addressed in the literature, since most of the approaches neglect the emotional information conveyed by a post while generating responses. This article addresses this problem by proposing a unifed end-to-end neural architecture, which is capable of simultaneously encoding the semantics and the emotions in a post and leverage target information for generating more intelligent responses with appropriately expressed emotions. Extensive experiments on real-world data demonstrate that the proposed method outperforms the state-of-the-art methods in terms of both content coherence and emotion appropriateness.

CLNov 13, 2020
A Survey on Recent Advances in Sequence Labeling from Deep Learning Models

Zhiyong He, Zanbo Wang, Wei Wei et al.

Sequence labeling (SL) is a fundamental research problem encompassing a variety of tasks, e.g., part-of-speech (POS) tagging, named entity recognition (NER), text chunking, etc. Though prevalent and effective in many downstream applications (e.g., information retrieval, question answering, and knowledge graph embedding), conventional sequence labeling approaches heavily rely on hand-crafted or language-specific features. Recently, deep learning has been employed for sequence labeling tasks due to its powerful capability in automatically learning complex features of instances and effectively yielding the stat-of-the-art performances. In this paper, we aim to present a comprehensive review of existing deep learning-based sequence labeling models, which consists of three related tasks, e.g., part-of-speech tagging, named entity recognition, and text chunking. Then, we systematically present the existing approaches base on a scientific taxonomy, as well as the widely-used experimental datasets and popularly-adopted evaluation metrics in the SL domain. Furthermore, we also present an in-depth analysis of different SL models on the factors that may affect the performance and future directions in the SL domain.

CLOct 21, 2020
STN4DST: A Scalable Dialogue State Tracking based on Slot Tagging Navigation

Puhai Yang, Heyan Huang, Xianling Mao

Scalability for handling unknown slot values is a important problem in dialogue state tracking (DST). As far as we know, previous scalable DST approaches generally rely on either the candidate generation from slot tagging output or the span extraction in dialogue context. However, the candidate generation based DST often suffers from error propagation due to its pipelined two-stage process; meanwhile span extraction based DST has the risk of generating invalid spans in the lack of semantic constraints between start and end position pointers. To tackle the above drawbacks, in this paper, we propose a novel scalable dialogue state tracking method based on slot tagging navigation, which implements an end-to-end single-step pointer to locate and extract slot value quickly and accurately by the joint learning of slot tagging and slot value position prediction in the dialogue context, especially for unknown slot values. Extensive experiments over several benchmark datasets show that the proposed model performs better than state-of-the-art baselines greatly.

CLApr 21, 2020
Learning Relation Ties with a Force-Directed Graph in Distant Supervised Relation Extraction

Yuming Shang, Heyan Huang, Xin Sun et al.

Relation ties, defined as the correlation and mutual exclusion between different relations, are critical for distant supervised relation extraction. Existing approaches model this property by greedily learning local dependencies. However, they are essentially limited by failing to capture the global topology structure of relation ties. As a result, they may easily fall into a locally optimal solution. To solve this problem, in this paper, we propose a novel force-directed graph based relation extraction model to comprehensively learn relation ties. Specifically, we first build a graph according to the global co-occurrence of relations. Then, we borrow the idea of Coulomb's Law from physics and introduce the concept of attractive force and repulsive force to this graph to learn correlation and mutual exclusion between relations. Finally, the obtained relation representations are applied as an inter-dependent relation classifier. Experimental results on a large scale benchmark dataset demonstrate that our model is capable of modeling global relation ties and significantly outperforms other baselines. Furthermore, the proposed force-directed graph can be used as a module to augment existing relation extraction systems and improve their performance.

CLDec 20, 2019
When to Talk: Chatbot Controls the Timing of Talking during Multi-turn Open-domain Dialogue Generation

Tian Lan, Xianling Mao, Heyan Huang et al.

Despite the multi-turn open-domain dialogue systems have attracted more and more attention and made great progress, the existing dialogue systems are still very boring. Nearly all the existing dialogue models only provide a response when the user's utterance is accepted. But during daily conversations, humans always decide whether to continue to utter an utterance based on the context. Intuitively, a dialogue model that can control the timing of talking autonomously based on the conversation context can chat with humans more naturally. In this paper, we explore the dialogue system that automatically controls the timing of talking during the conversation. Specifically, we adopt the decision module for the existing dialogue models. Furthermore, modeling conversation context effectively is very important for controlling the timing of talking. So we also adopt the graph neural networks to process the context with the natural graph structure. Extensive experiments on two benchmarks show that controlling the timing of talking can effectively improve the quality of dialogue generation, and the proposed methods significantly improve the accuracy of the timing of talking. In addition, we have publicly released the codes of our proposed model.

CLSep 17, 2019
Generative Dialog Policy for Task-oriented Dialog Systems

Tian Lan, Xianling Mao, Heyan Huang

There is an increasing demand for task-oriented dialogue systems which can assist users in various activities such as booking tickets and restaurant reservations. In order to complete dialogues effectively, dialogue policy plays a key role in task-oriented dialogue systems. As far as we know, the existing task-oriented dialogue systems obtain the dialogue policy through classification, which can assign either a dialogue act and its corresponding parameters or multiple dialogue acts without their corresponding parameters for a dialogue action. In fact, a good dialogue policy should construct multiple dialogue acts and their corresponding parameters at the same time. However, it's hard for existing classification-based methods to achieve this goal. Thus, to address the issue above, we propose a novel generative dialogue policy learning method. Specifically, the proposed method uses attention mechanism to find relevant segments of given dialogue context and input utterance and then constructs the dialogue policy by a seq2seq way for task-oriented dialogue systems. Extensive experiments on two benchmark datasets show that the proposed model significantly outperforms the state-of-the-art baselines. In addition, we have publicly released our codes.

CVSep 5, 2019
Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation

Wei Wei, Ling Cheng, Xianling Mao et al.

Recently, automatic image caption generation has been an important focus of the work on multimodal translation task. Existing approaches can be roughly categorized into two classes, i.e., top-down and bottom-up, the former transfers the image information (called as visual-level feature) directly into a caption, and the later uses the extracted words (called as semanticlevel attribute) to generate a description. However, previous methods either are typically based one-stage decoder or partially utilize part of visual-level or semantic-level information for image caption generation. In this paper, we address the problem and propose an innovative multi-stage architecture (called as Stack-VS) for rich fine-gained image caption generation, via combining bottom-up and top-down attention models to effectively handle both visual-level and semantic-level information of an input image. Specifically, we also propose a novel well-designed stack decoder model, which is constituted by a sequence of decoder cells, each of which contains two LSTM-layers work interactively to re-optimize attention weights on both visual-level feature vectors and semantic-level attribute embeddings for generating a fine-gained image caption. Extensive experiments on the popular benchmark dataset MSCOCO show the significant improvements on different evaluation metrics, i.e., the improvements on BLEU-4/CIDEr/SPICE scores are 0.372, 1.226 and 0.216, respectively, as compared to the state-of-the-arts.

CLAug 24, 2019
Position-Aware Self-Attention based Neural Sequence Labeling

Wei Wei, Zanbo Wang, Xianling Mao et al.

Sequence labeling is a fundamental task in natural language processing and has been widely studied. Recently, RNN-based sequence labeling models have increasingly gained attentions. Despite superior performance achieved by learning the long short-term (i.e., successive) dependencies, the way of sequentially processing inputs might limit the ability to capture the non-continuous relations over tokens within a sentence. To tackle the problem, we focus on how to effectively model successive and discrete dependencies of each token for enhancing the sequence labeling performance. Specifically, we propose an innovative attention-based model (called position-aware selfattention, i.e., PSA) as well as a well-designed self-attentional context fusion layer within a neural network architecture, to explore the positional information of an input sequence for capturing the latent relations among tokens. Extensive experiments on three classical tasks in sequence labeling domain, i.e., partof-speech (POS) tagging, named entity recognition (NER) and phrase chunking, demonstrate our proposed model outperforms the state-of-the-arts without any external knowledge, in terms of various metrics.