28.6CVMay 21
Cambrian-P: Pose-Grounded Video UnderstandingJihan Yang, Zifan Zhao, Xichen Pan et al.
Camera pose matters. The position and orientation of each viewpoint define a shared spatial coordinate frame that relates observations across video frames. Yet this signal is largely absent from multimodal LLMs (MLLMs) for video understanding, which process frames as isolated 2D snapshots, instead of the persistent scene humans perceive. We revisit pose as a lightweight supervisory signal and introduce Cambrian-P, a video MLLM augmented with per-frame learnable camera tokens and a pose regression head. With a carefully designed sampling scheme, the model achieves substantial gains of 4.5-6.5% on spatial reasoning benchmarks such as VSI-Bench, generalizes across eight additional spatial and general video QA benchmarks, and, as a byproduct, achieves state of the art streaming pose estimation on ScanNet. Surprisingly, training on pseudo-annotated poses from in-the-wild video further improves general video QA benchmarks, showing pose helps beyond spatial reasoning. Together, these results position camera pose as a fundamental signal for video models that reason about the physical world.
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text UnderstandingHu Xu, Gargi Ghosh, Po-Yao Huang et al.
We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream tasks. VideoCLIP trains a transformer for video and text by contrasting temporally overlapping positive video-text pairs with hard negatives from nearest neighbor retrieval. Our experiments on a diverse series of downstream tasks, including sequence-level text-video retrieval, VideoQA, token-level action localization, and action segmentation reveal state-of-the-art performance, surpassing prior work, and in some cases even outperforming supervised approaches. Code is made available at https://github.com/pytorch/fairseq/tree/main/examples/MMPT.
VLM: Task-agnostic Video-Language Model Pre-training for Video UnderstandingHu Xu, Gargi Ghosh, Po-Yao Huang et al.
We present a simplified, task-agnostic multi-modal pre-training approach that can accept either video or text input, or both for a variety of end tasks. Existing pre-training are task-specific by adopting either a single cross-modal encoder that requires both modalities, limiting their use for retrieval-style end tasks or more complex multitask learning with two unimodal encoders, limiting early cross-modal fusion. We instead introduce new pretraining masking schemes that better mix across modalities (e.g. by forcing masks for text to predict the closest video embeddings) while also maintaining separability (e.g. unimodal predictions are sometimes required, without using all the input). Experimental results show strong performance across a wider range of tasks than any previous methods, often outperforming task-specific pre-training. Code is made available at https://github.com/pytorch/fairseq/tree/main/examples/MMPT.
Understanding Pre-trained BERT for Aspect-based Sentiment AnalysisHu Xu, Lei Shu, Philip S. Yu et al.
This paper analyzes the pre-trained hidden representations learned from reviews on BERT for tasks in aspect-based sentiment analysis (ABSA). Our work is motivated by the recent progress in BERT-based language models for ABSA. However, it is not clear how the general proxy task of (masked) language model trained on unlabeled corpus without annotations of aspects or opinions can provide important features for downstream tasks in ABSA. By leveraging the annotated datasets in ABSA, we investigate both the attentions and the learned representations of BERT pre-trained on reviews. We found that BERT uses very few self-attention heads to encode context words (such as prepositions or pronouns that indicating an aspect) and opinion words for an aspect. Most features in the representation of an aspect are dedicated to the fine-grained semantics of the domain (or product category) and the aspect itself, instead of carrying summarized opinions from its context. We hope this investigation can help future research in improving self-supervised learning, unsupervised learning and fine-tuning for ABSA. The pre-trained model and code can be found at https://github.com/howardhsu/BERT-for-RRC-ABSA.
NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated DistributionsZhiyu Chen, Honglei Liu, Hu Xu et al.
Existing conversational systems are mostly agent-centric, which assumes the user utterances would closely follow the system ontology (for NLU or dialogue state tracking). However, in real-world scenarios, it is highly desirable that the users can speak freely in their own way. It is extremely hard, if not impossible, for the users to adapt to the unknown system ontology. In this work, we attempt to build a user-centric dialogue system. As there is no clean mapping for a user's free form utterance to an ontology, we first model the user preferences as estimated distributions over the system ontology and map the users' utterances to such distributions. Learning such a mapping poses new challenges on reasoning over existing knowledge, ranging from factoid knowledge, commonsense knowledge to the users' own situations. To this end, we build a new dataset named NUANCED that focuses on such realistic settings for conversational recommendation. Collected via dialogue simulation and paraphrasing, NUANCED contains 5.1k dialogues, 26k turns of high-quality user responses. We conduct experiments, showing both the usefulness and challenges of our problem setting. We believe NUANCED can serve as a valuable resource to push existing research from the agent-centric system to the user-centric system. The code and data is publicly available at \url{https://github.com/facebookresearch/nuanced}.
A Failure of Aspect Sentiment Classifiers and an Adaptive Re-weighting SolutionHu Xu, Bing Liu, Lei Shu et al.
Aspect-based sentiment classification (ASC) is an important task in fine-grained sentiment analysis.~Deep supervised ASC approaches typically model this task as a pair-wise classification task that takes an aspect and a sentence containing the aspect and outputs the polarity of the aspect in that sentence. However, we discovered that many existing approaches fail to learn an effective ASC classifier but more like a sentence-level sentiment classifier because they have difficulty to handle sentences with different polarities for different aspects.~This paper first demonstrates this problem using several state-of-the-art ASC models. It then proposes a novel and general adaptive re-weighting (ARW) scheme to adjust the training to dramatically improve ASC for such complex sentences. Experimental results show that the proposed framework is effective \footnote{The dataset and code are available at \url{https://github.com/howardhsu/ASC_failure}.}.
30.1CLAug 30, 2019Code
Modeling Multi-Action Policy for Task-Oriented DialoguesLei Shu, Hu Xu, Bing Liu et al.
Dialogue management (DM) plays a key role in the quality of the interaction with the user in a task-oriented dialogue system. In most existing approaches, the agent predicts only one DM policy action per turn. This significantly limits the expressive power of the conversational agent and introduces unwanted turns of interactions that may challenge users' patience. Longer conversations also lead to more errors and the system needs to be more robust to handle them. In this paper, we compare the performance of several models on the task of predicting multiple acts for each turn. A novel policy model is proposed based on a recurrent cell called gated Continue-Act-Slots (gCAS) that overcomes the limitations of the existing models. Experimental results show that gCAS outperforms other approaches. The code is available at https://leishu02.github.io/
Flexibly-Structured Model for Task-Oriented DialoguesLei Shu, Piero Molino, Mahdi Namazifar et al.
This paper proposes a novel end-to-end architecture for task-oriented dialogue systems. It is based on a simple and practical yet very effective sequence-to-sequence approach, where language understanding and state tracking tasks are modeled jointly with a structured copy-augmented sequential decoder and a multi-label decoder for each slot. The policy engine and language generation tasks are modeled jointly following that. The copy-augmented sequential decoder deals with new or unknown values in the conversation, while the multi-label decoder combined with the sequential decoder ensures the explicit assignment of values to slots. On the generation part, slot binary classifiers are used to improve performance. This architecture is scalable to real-world scenarios and is shown through an empirical evaluation to achieve state-of-the-art performance on both the Cambridge Restaurant dataset and the Stanford in-car assistant dataset\footnote{The code is available at \url{https://github.com/uber-research/FSDM}}
Review Conversational Reading ComprehensionHu Xu, Bing Liu, Lei Shu et al.
Inspired by conversational reading comprehension (CRC), this paper studies a novel task of leveraging reviews as a source to build an agent that can answer multi-turn questions from potential consumers of online businesses. We first build a review CRC dataset and then propose a novel task-aware pre-tuning step running between language model (e.g., BERT) pre-training and domain-specific fine-tuning. The proposed pre-tuning requires no data annotation, but can greatly enhance the performance on our end task. Experimental results show that the proposed approach is highly effective and has competitive performance as the supervised approach. The dataset is available at \url{https://github.com/howardhsu/RCRC}
24.7CLMar 12, 2024
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLMSainbayar Sukhbaatar, Olga Golovneva, Vasu Sharma et al. · meta-ai, mit
We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge. Our method, named Branch-Train-MiX (BTX), starts from a seed model, which is branched to train experts in embarrassingly parallel fashion with high throughput and reduced communication cost. After individual experts are asynchronously trained, BTX brings together their feedforward parameters as experts in Mixture-of-Expert (MoE) layers and averages the remaining parameters, followed by an MoE-finetuning stage to learn token-level routing. BTX generalizes two special cases, the Branch-Train-Merge method, which does not have the MoE finetuning stage to learn routing, and sparse upcycling, which omits the stage of training experts asynchronously. Compared to alternative approaches, BTX achieves the best accuracy-efficiency tradeoff.
Achieving Forgetting Prevention and Knowledge Transfer in Continual LearningZixuan Ke, Bing Liu, Nianzu Ma et al.
Continual learning (CL) learns a sequence of tasks incrementally with the goal of achieving two main objectives: overcoming catastrophic forgetting (CF) and encouraging knowledge transfer (KT) across tasks. However, most existing techniques focus only on overcoming CF and have no mechanism to encourage KT, and thus do not do well in KT. Although several papers have tried to deal with both CF and KT, our experiments show that they suffer from serious CF when the tasks do not have much shared knowledge. Another observation is that most current CL methods do not use pre-trained models, but it has been shown that such models can significantly improve the end task performance. For example, in natural language processing, fine-tuning a BERT-like pre-trained language model is one of the most effective approaches. However, for CL, this approach suffers from serious CF. An interesting question is how to make the best use of pre-trained models for CL. This paper proposes a novel model called CTR to solve these problems. Our experimental results demonstrate the effectiveness of CTR
9.3CLJul 14, 2021
HTLM: Hyper-Text Pre-Training and Prompting of Language ModelsArmen Aghajanyan, Dmytro Okhonko, Mike Lewis et al.
We introduce HTLM, a hyper-text language model trained on a large-scale web crawl. Modeling hyper-text has a number of advantages: (1) it is easily gathered at scale, (2) it provides rich document-level and end-task-adjacent supervision (e.g. class and id attributes often encode document category information), and (3) it allows for new structured prompting that follows the established semantics of HTML (e.g. to do zero-shot summarization by infilling title tags for a webpage that contains the input text). We show that pretraining with a BART-style denoising loss directly on simplified HTML provides highly effective transfer for a wide range of end tasks and supervision levels. HTLM matches or exceeds the performance of comparably sized text-only LMs for zero-shot prompting and fine-tuning for classification benchmarks, while also setting new state-of-the-art performance levels for zero-shot summarization. We also find that hyper-text prompts provide more value to HTLM, in terms of data efficiency, than plain text prompts do for existing LMs, and that HTLM is highly effective at auto-prompting itself, by simply generating the most likely hyper-text formatting for any available training data. We will release all code and models to support future HTLM research.
5.5LGMar 27, 2021
Co-Imitation Learning without Expert DemonstrationKun-Peng Ning, Hu Xu, Kun Zhu et al.
Imitation learning is a primary approach to improve the efficiency of reinforcement learning by exploiting the expert demonstrations. However, in many real scenarios, obtaining expert demonstrations could be extremely expensive or even impossible. To overcome this challenge, in this paper, we propose a novel learning framework called Co-Imitation Learning (CoIL) to exploit the past good experiences of the agents themselves without expert demonstration. Specifically, we train two different agents via letting each of them alternately explore the environment and exploit the peer agent's experience. While the experiences could be valuable or misleading, we propose to estimate the potential utility of each piece of experience with the expected gain of the value function. Thus the agents can selectively imitate from each other by emphasizing the more useful experiences while filtering out noisy ones. Experimental results on various tasks show significant superiority of the proposed Co-Imitation Learning framework, validating that the agents can benefit from each other without external supervision.
31.1CLSep 25, 2020
Controllable Text Generation with Focused VariationLei Shu, Alexandros Papangelis, Yi-Chia Wang et al.
This work introduces Focused-Variation Network (FVN), a novel model to control language generation. The main problems in previous controlled language generation models range from the difficulty of generating text according to the given attributes, to the lack of diversity of the generated texts. FVN addresses these issues by learning disjoint discrete latent spaces for each attribute inside codebooks, which allows for both controllability and diversity, while at the same time generating fluent text. We evaluate FVN on two text generation datasets with annotated content and style, and show state-of-the-art performance as assessed by automatic and human evaluations.
0.3CLSep 14, 2020
EasyASR: A Distributed Machine Learning Platform for End-to-end Automatic Speech RecognitionChengyu Wang, Mengli Cheng, Xu Hu et al.
We present EasyASR, a distributed machine learning platform for training and serving large-scale Automatic Speech Recognition (ASR) models, as well as collecting and processing audio data at scale. Our platform is built upon the Machine Learning Platform for AI of Alibaba Cloud. Its main functionality is to support efficient learning and inference for end-to-end ASR models on distributed GPU clusters. It allows users to learn ASR models with either pre-defined or user-customized network architectures via simple user interface. On EasyASR, we have produced state-of-the-art results over several public datasets for Mandarin speech recognition.
3.3ASAug 4, 2020
Weakly Supervised Construction of ASR Systems with Massive Video DataMengli Cheng, Chengyu Wang, Xu Hu et al.
Building Automatic Speech Recognition (ASR) systems from scratch is significantly challenging, mostly due to the time-consuming and financially-expensive process of annotating a large amount of audio data with transcripts. Although several unsupervised pre-training models have been proposed, applying such models directly might still be sub-optimal if more labeled, training data could be obtained without a large cost. In this paper, we present a weakly supervised framework for constructing ASR systems with massive video data. As videos often contain human-speech audios aligned with subtitles, we consider videos as an important knowledge source, and propose an effective approach to extract high-quality audios aligned with transcripts from videos based on Optical Character Recognition (OCR). The underlying ASR model can be fine-tuned to fit any domain-specific target training datasets after weakly supervised pre-training. Extensive experiments show that our framework can easily produce state-of-the-art results on six public datasets for Mandarin speech recognition.
31.1CLMay 30, 2020
User Memory Reasoning for Conversational RecommendationHu Xu, Seungwhan Moon, Honglei Liu et al.
We study a conversational recommendation model which dynamically manages users' past (offline) preferences and current (online) requests through a structured and cumulative user memory knowledge graph, to allow for natural interactions and accurate recommendations. For this study, we create a new Memory Graph (MG) <--> Conversational Recommendation parallel corpus called MGConvRex with 7K+ human-to-human role-playing dialogs, grounded on a large-scale user memory bootstrapped from real-world user scenarios. MGConvRex captures human-level reasoning over user memory and has disjoint training/testing sets of users for zero-shot (cold-start) reasoning for recommendation. We propose a simple yet expandable formulation for constructing and updating the MG, and a reasoning model that predicts optimal dialog policies and recommendation items in unconstrained graph space. The prediction of our proposed model inherits the graph structure, providing a natural way to explain the model's recommendation. Experiments are conducted for both offline metrics and online simulation, showing competitive results.
31.1CLApr 28, 2020
DomBERT: Domain-oriented Language Model for Aspect-based Sentiment AnalysisHu Xu, Bing Liu, Lei Shu et al.
This paper focuses on learning domain-oriented language models driven by end tasks, which aims to combine the worlds of both general-purpose language models (such as ELMo and BERT) and domain-specific language understanding. We propose DomBERT, an extension of BERT to learn from both in-domain corpus and relevant domain corpora. This helps in learning domain language models with low-resources. Experiments are conducted on an assortment of tasks in aspect-based sentiment analysis, demonstrating promising results.
0.6CLMay 15, 2019
Controlled CNN-based Sequence Labeling for Aspect ExtractionLei Shu, Hu Xu, Bing Liu
One key task of fine-grained sentiment analysis on reviews is to extract aspects or features that users have expressed opinions on. This paper focuses on supervised aspect extraction using a modified CNN called controlled CNN (Ctrl). The modified CNN has two types of control modules. Through asynchronous parameter updating, it prevents over-fitting and boosts CNN's performance significantly. This model achieves state-of-the-art results on standard aspect extraction datasets. To the best of our knowledge, this is the first paper to apply control modules to aspect extraction.
32.9CLApr 3, 2019
BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment AnalysisHu Xu, Bing Liu, Lei Shu et al.
Question-answering plays an important role in e-commerce as it allows potential customers to actively seek crucial information about products or services to help their purchase decision making. Inspired by the recent success of machine reading comprehension (MRC) on formal documents, this paper explores the potential of turning customer reviews into a large source of knowledge that can be exploited to answer user questions.~We call this problem Review Reading Comprehension (RRC). To the best of our knowledge, no existing work has been done on RRC. In this work, we first build an RRC dataset called ReviewRC based on a popular benchmark for aspect-based sentiment analysis. Since ReviewRC has limited training examples for RRC (and also for aspect-based sentiment analysis), we then explore a novel post-training approach on the popular language model BERT to enhance the performance of fine-tuning of BERT for RRC. To show the generality of the approach, the proposed post-training is also applied to some other review-based tasks such as aspect extraction and aspect sentiment classification in aspect-based sentiment analysis. Experimental results demonstrate that the proposed post-training is highly effective. The datasets and code are available at https://www.cs.uic.edu/~hxu/.
4.7CLMay 25, 2018
Lifelong Domain Word Embedding via Meta-LearningHu Xu, Bing Liu, Lei Shu et al.
Learning high-quality domain word embeddings is important for achieving good performance in many NLP tasks. General-purpose embeddings trained on large-scale corpora are often sub-optimal for domain-specific applications. However, domain-specific tasks often do not have large in-domain corpora for training high-quality domain embeddings. In this paper, we propose a novel lifelong learning setting for domain embedding. That is, when performing the new domain embedding, the system has seen many past domains, and it tries to expand the new in-domain corpus by exploiting the corpora from the past domains via meta-learning. The proposed meta-learner characterizes the similarities of the contexts of the same word in many domain corpora, which helps retrieve relevant data from the past domains to expand the new domain corpus. Experimental results show that domain embeddings produced from such a process improve the performance of the downstream tasks.
32.7CLMay 11, 2018
Double Embeddings and CNN-based Sequence Labeling for Aspect ExtractionHu Xu, Bing Liu, Lei Shu et al.
One key task of fine-grained sentiment analysis of product reviews is to extract product aspects or features that users have expressed opinions on. This paper focuses on supervised aspect extraction using deep learning. Unlike other highly sophisticated supervised deep learning models, this paper proposes a novel and yet simple CNN model employing two types of pre-trained embeddings for aspect extraction: general-purpose embeddings and domain-specific embeddings. Without using any additional supervision, this model achieves surprisingly good results, outperforming state-of-the-art sophisticated existing methods. To our knowledge, this paper is the first to report such double embeddings based CNN model for aspect extraction and achieve very good results.
16.3LGJan 17, 2018
Unseen Class Discovery in Open-world ClassificationLei Shu, Hu Xu, Bing Liu
This paper concerns open-world classification, where the classifier not only needs to classify test examples into seen classes that have appeared in training but also reject examples from unseen or novel classes that have not appeared in training. Specifically, this paper focuses on discovering the hidden unseen classes of the rejected examples. Clearly, without prior knowledge this is difficult. However, we do have the data from the seen training classes, which can tell us what kind of similarity/difference is expected for examples from the same class or from different classes. It is reasonable to assume that this knowledge can be transferred to the rejected examples and used to discover the hidden unseen classes in them. This paper aims to solve this problem. It first proposes a joint open classification model with a sub-model for classifying whether a pair of examples belongs to the same or different classes. This sub-model can serve as a distance function for clustering to discover the hidden classes of the rejected examples. Experimental results show that the proposed model is highly promising.
0.3CLDec 6, 2017
Product Function Need Recognition via Semi-supervised Attention NetworkHu Xu, Sihong Xie, Lei Shu et al.
Functionality is of utmost importance to customers when they purchase products. However, it is unclear to customers whether a product can really satisfy their needs on functions. Further, missing functions may be intentionally hidden by the manufacturers or the sellers. As a result, a customer needs to spend a fair amount of time before purchasing or just purchase the product on his/her own risk. In this paper, we first identify a novel QA corpus that is dense on product functionality information \footnote{The annotated corpus can be found at \url{https://www.cs.uic.edu/~hxu/}.}. We then design a neural network called Semi-supervised Attention Network (SAN) to discover product functions from questions. This model leverages unlabeled data as contextual information to perform semi-supervised sequence labeling. We conduct experiments to show that the extracted function have both high coverage and accuracy, compared with a wide spectrum of baselines.
3.2CLDec 6, 2017
Dual Attention Network for Product Compatibility and Function Satisfiability AnalysisHu Xu, Sihong Xie, Lei Shu et al.
Product compatibility and their functionality are of utmost importance to customers when they purchase products, and to sellers and manufacturers when they sell products. Due to the huge number of products available online, it is infeasible to enumerate and test the compatibility and functionality of every product. In this paper, we address two closely related problems: product compatibility analysis and function satisfiability analysis, where the second problem is a generalization of the first problem (e.g., whether a product works with another product can be considered as a special function). We first identify a novel question and answering corpus that is up-to-date regarding product compatibility and functionality information. To allow automatic discovery product compatibility and functionality, we then propose a deep learning model called Dual Attention Network (DAN). Given a QA pair for a to-be-purchased product, DAN learns to 1) discover complementary products (or functions), and 2) accurately predict the actual compatibility (or satisfiability) of the discovered products (or functions). The challenges addressed by the model include the briefness of QAs, linguistic patterns indicating compatibility, and the appropriate fusion of questions and answers. We conduct experiments to quantitatively and qualitatively show that the identified products and functions have both high coverage and accuracy, compared with a wide spectrum of baselines.
40.6CLSep 25, 2017
DOC: Deep Open Classification of Text DocumentsLei Shu, Hu Xu, Bing Liu
Traditional supervised learning makes the closed-world assumption that the classes appeared in the test data must have appeared in training. This also applies to text learning or text classification. As learning is used increasingly in dynamic open environments where some new/test documents may not belong to any of the training classes, identifying these novel documents during classification presents an important problem. This problem is called open-world classification or open classification. This paper proposes a novel deep learning based approach. It outperforms existing state-of-the-art techniques dramatically.
1.0CLMay 29, 2017
Supervised Complementary Entity Recognition with Augmented Key-value Pairs of KnowledgeHu Xu, Lei Shu, Philip S. Yu
Extracting opinion targets is an important task in sentiment analysis on product reviews and complementary entities (products) are one important type of opinion targets that may work together with the reviewed product. In this paper, we address the problem of Complementary Entity Recognition (CER) as a supervised sequence labeling with the capability of expanding domain knowledge as key-value pairs from unlabeled reviews, by automatically learning and enhancing knowledge-based features. We use Conditional Random Field (CRF) as the base learner and augment CRF with knowledge-based features (called the Knowledge-based CRF or KCRF for short). We conduct experiments to show that KCRF effectively improves the performance of supervised CER task.
9.5CLApr 29, 2017
Lifelong Learning CRF for Supervised Aspect ExtractionLei Shu, Hu Xu, Bing Liu
This paper makes a focused contribution to supervised aspect extraction. It shows that if the system has performed aspect extraction from many past domains and retained their results as knowledge, Conditional Random Fields (CRF) can leverage this knowledge in a lifelong learning manner to extract in a new domain markedly better than the traditional CRF without using this prior knowledge. The key innovation is that even after CRF training, the model can still improve its extraction with experiences in its applications.
2.3CLDec 23, 2016
Supervised Opinion Aspect Extraction by Exploiting Past Extraction ResultsLei Shu, Bing Liu, Hu Xu et al.
One of the key tasks of sentiment analysis of product reviews is to extract product aspects or features that users have expressed opinions on. In this work, we focus on using supervised sequence labeling as the base approach to performing the task. Although several extraction methods using sequence labeling methods such as Conditional Random Fields (CRF) and Hidden Markov Models (HMM) have been proposed, we show that this supervised approach can be significantly improved by exploiting the idea of concept sharing across multiple domains. For example, "screen" is an aspect in iPhone, but not only iPhone has a screen, many electronic devices have screens too. When "screen" appears in a review of a new domain (or product), it is likely to be an aspect too. Knowing this information enables us to do much better extraction in the new domain. This paper proposes a novel extraction method exploiting this idea in the context of supervised sequence labeling. Experimental results show that it produces markedly better results than without using the past information.
3.6CLDec 14, 2016
Mining Compatible/Incompatible Entities from Question and Answering via Yes/No Answer Classification using Distant Label ExpansionHu Xu, Lei Shu, Jingyuan Zhang et al.
Product Community Question Answering (PCQA) provides useful information about products and their features (aspects) that may not be well addressed by product descriptions and reviews. We observe that a product's compatibility issues with other products are frequently discussed in PCQA and such issues are more frequently addressed in accessories, i.e., via a yes/no question "Does this mouse work with windows 10?". In this paper, we address the problem of extracting compatible and incompatible products from yes/no questions in PCQA. This problem can naturally have a two-stage framework: first, we perform Complementary Entity (product) Recognition (CER) on yes/no questions; second, we identify the polarities of yes/no answers to assign the complementary entities a compatibility label (compatible, incompatible or unknown). We leverage an existing unsupervised method for the first stage and a 3-class classifier by combining a distant PU-learning method (learning from positive and unlabeled examples) together with a binary classifier for the second stage. The benefit of using distant PU-learning is that it can help to expand more implicit yes/no answers without using any human annotated data. We conduct experiments on 4 products to show that the proposed method is effective.
1.6CLDec 4, 2016
CER: Complementary Entity Recognition via Knowledge Expansion on Large Unlabeled Product ReviewsHu Xu, Sihong Xie, Lei Shu et al.
Product reviews contain a lot of useful information about product features and customer opinions. One important product feature is the complementary entity (products) that may potentially work together with the reviewed product. Knowing complementary entities of the reviewed product is very important because customers want to buy compatible products and avoid incompatible ones. In this paper, we address the problem of Complementary Entity Recognition (CER). Since no existing method can solve this problem, we first propose a novel unsupervised method to utilize syntactic dependency paths to recognize complementary entities. Then we expand category-level domain knowledge about complementary entities using only a few general seed verbs on a large amount of unlabeled reviews. The domain knowledge helps the unsupervised method to adapt to different products and greatly improves the precision of the CER task. The advantage of the proposed method is that it does not require any labeled data for training. We conducted experiments on 7 popular products with about 1200 reviews in total to demonstrate that the proposed approach is effective.