Changchang Zeng

CL
h-index: 7
3 papers
123 citations
Novelty: 17%
AI Score: 16

3 Papers

0.2 | CL | Sep 29, 2021
Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets

Changchang Zeng, Shaobo Li

Machine reading comprehension (MRC) is a challenging natural language processing (NLP) task. Recently, the emergence of pre-trained models (PTMs) has brought this research field into a new era, in which the training objective plays a key role. The masked language model (MLM) is a self-supervised training objective that is widely used in various PTMs. As training objectives have developed, many variants of MLM have been proposed, such as whole word masking, entity masking, phrase masking, and span masking. Different MLM variants mask token spans of different lengths. Similarly, answer lengths differ across machine reading comprehension tasks, where the answer is often a word, phrase, or sentence. Thus, for MRC tasks with different answer lengths, whether the masking length of MLM is related to performance is a question worth studying. If this hypothesis is true, it can guide us in pre-training the MLM model with a mask length distribution suited to the MRC task. In this paper, we try to uncover how much of MLM's success in machine reading comprehension tasks comes from the correlation between the masking length distribution and the answer length in the MRC dataset. To address this issue, (1) we propose four MRC tasks with different answer length distributions, namely a short span extraction task, a long span extraction task, a short multiple-choice cloze task, and a long multiple-choice cloze task; (2) we create four Chinese MRC datasets for these tasks; (3) we pre-train four masked language models according to the answer length distributions of these datasets; and (4) we conduct ablation experiments on the datasets to verify our hypothesis. The experimental results demonstrate that our hypothesis is true.
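
The idea of matching the masking length distribution to the answer length distribution can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `length_distribution` and `mask_spans`, the `[MASK]` placeholder, and the 15% masking budget are all assumptions made for illustration. The sketch estimates an empirical answer length distribution from a list of answer lengths, then masks contiguous token spans whose lengths are sampled from that distribution.

```python
import random
from collections import Counter


def length_distribution(answer_lengths):
    """Empirical probability of each answer length (in tokens)."""
    counts = Counter(answer_lengths)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


def mask_spans(tokens, dist, mask_ratio=0.15, mask_token="[MASK]", rng=None):
    """Mask non-overlapping spans whose lengths are sampled from `dist`.

    `mask_ratio` (assumed, not from the paper) caps the share of masked tokens.
    Returns the masked token list and the sorted masked positions.
    """
    rng = rng or random.Random()
    lengths = list(dist.keys())
    weights = list(dist.values())
    budget = max(1, round(len(tokens) * mask_ratio))
    masked = list(tokens)
    covered = set()
    attempts = 0
    # Bound the number of attempts so the loop always terminates.
    while len(covered) < budget and attempts < 10 * len(tokens):
        attempts += 1
        span = rng.choices(lengths, weights=weights, k=1)[0]
        # Truncate the span so it fits the remaining budget and the sequence.
        span = min(span, budget - len(covered), len(tokens))
        start = rng.randrange(0, len(tokens) - span + 1)
        positions = range(start, start + span)
        if any(p in covered for p in positions):
            continue  # skip spans that overlap an already masked span
        covered.update(positions)
        for p in positions:
            masked[p] = mask_token
    return masked, sorted(covered)


if __name__ == "__main__":
    # Hypothetical answer lengths for a short-answer dataset.
    dist = length_distribution([1, 1, 2, 2, 2, 3, 4])
    tokens = [f"t{i}" for i in range(40)]
    masked, positions = mask_spans(tokens, dist, rng=random.Random(0))
    print(masked)
    print(positions)
```

Under this sketch, a dataset with long answers would yield a distribution weighted toward long spans, so the same `mask_spans` call would mask fewer but longer spans within the same budget.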

3.7 | HC | Sep 6, 2021
Big Data driven Product Design: A Survey

Huafeng Quan, Shaobo Li, Changchang Zeng et al.

With the improvement of living standards, user requirements for modern products are becoming increasingly diversified and personalized. Traditional product design methods can no longer satisfy market needs due to their strong subjectivity, small survey scope, poor real-time data, and lack of visual display, which calls for the development of big data driven product design methodology. Big data in the product lifecycle contains valuable information for guiding product design, such as customer preferences, market demands, product evaluation, and visual display: online product reviews reflect customer evaluations and requirements; product images contain information on shape, color, and texture, which can inspire designers to reach initial design schemes more quickly or even directly generate new product images. How to efficiently collect product design related data and exploit them effectively during the whole product design process is thus critical to modern product design. This paper aims to conduct a comprehensive survey on big data driven product design. It will help researchers and practitioners comprehend the latest developments in relevant studies and applications centered on how big data can be processed, analyzed, and exploited in aiding product design. We first introduce several representative traditional product design methods and highlight their limitations. Then we discuss current and potential applications of textual data, image data, audio data, and video data in product design cycles. Finally, we summarize major deficiencies of existing data driven product design studies and future research directions. We believe that this study can draw increasing attention to modern data driven product design.

4.6 | CL | Jun 21, 2020
A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets

Changchang Zeng, Shaobo Li, Qin Li et al.

Machine Reading Comprehension (MRC) is a challenging Natural Language Processing (NLP) research field with wide real-world applications. The great progress of this field in recent years is mainly due to the emergence of large-scale datasets and deep learning. At present, many MRC models have already surpassed human performance on various benchmark datasets, despite the obvious, large gap between existing MRC models and genuine human-level reading comprehension. This shows the need to improve existing datasets, evaluation metrics, and models to move current MRC models toward "real" understanding. To address the current lack of a comprehensive survey of existing MRC tasks, evaluation metrics, and datasets, (1) we analyze 57 MRC tasks and datasets and propose a more precise classification method for MRC tasks with 4 different attributes; (2) we summarize 9 evaluation metrics of MRC tasks, and 7 attributes and 10 characteristics of MRC datasets; (3) we also discuss key open issues in MRC research and highlight future research directions. In addition, we have collected, organized, and published our data on the companion website (https://mrc-datasets.github.io/), where MRC researchers can directly access each MRC dataset, its papers, baseline projects, and the leaderboard.