Zibo Yi

h-index6

3papers

71citations

Novelty53%

AI Score25

Ranked #165,037 of 194,257 authors (top 85%)#28,064 in CL (top 91%)

3 Papers

0.3CLJul 11, 2022

SummScore: A Comprehensive Evaluation Metric for Summary Quality Based on Cross-Encoder

Wuhang Lin, Shasha Li, Chen Zhang et al.

Text summarization models are often trained to produce summaries that meet human quality requirements. However, the existing evaluation metrics for summary text are only rough proxies for summary quality, suffering from low correlation with human scoring and inhibition of summary diversity. To solve these problems, we propose SummScore, a comprehensive metric for summary quality evaluation based on CrossEncoder. Firstly, by adopting the original-summary measurement mode and comparing the semantics of the original text, SummScore gets rid of the inhibition of summary diversity. With the help of the text-matching pre-training Cross-Encoder, SummScore can effectively capture the subtle differences between the semantics of summaries. Secondly, to improve the comprehensiveness and interpretability, SummScore consists of four fine-grained submodels, which measure Coherence, Consistency, Fluency, and Relevance separately. We use semi-supervised multi-rounds of training to improve the performance of our model on extremely limited annotated data. Extensive experiments show that SummScore significantly outperforms existing evaluation metrics in the above four dimensions in correlation with human scoring. We also provide the quality evaluation results of SummScore on 16 mainstream summarization models for later research.

2.0IRJul 11, 2022

Topic-Grained Text Representation-based Model for Document Retrieval

Mengxue Du, Shasha Li, Jie Yu et al.

Document retrieval enables users to find their required documents accurately and quickly. To satisfy the requirement of retrieval efficiency, prevalent deep neural methods adopt a representation-based matching paradigm, which saves online matching time by pre-storing document representations offline. However, the above paradigm consumes vast local storage space, especially when storing the document as word-grained representations. To tackle this, we present TGTR, a Topic-Grained Text Representation-based Model for document retrieval. Following the representation-based matching paradigm, TGTR stores the document representations offline to ensure retrieval efficiency, whereas it significantly reduces the storage requirements by using novel topicgrained representations rather than traditional word-grained. Experimental results demonstrate that compared to word-grained baselines, TGTR is consistently competitive with them on TREC CAR and MS MARCO in terms of retrieval accuracy, but it requires less than 1/10 of the storage space required by them. Moreover, TGTR overwhelmingly surpasses global-grained baselines in terms of retrieval accuracy.

1.8CLMay 9, 2017

Drug-drug Interaction Extraction via Recurrent Neural Network with Multiple Attention Layers

Zibo Yi, Shasha Li, Jie Yu et al.

Drug-drug interaction (DDI) is a vital information when physicians and pharmacists intend to co-administer two or more drugs. Thus, several DDI databases are constructed to avoid mistakenly combined use. In recent years, automatically extracting DDIs from biomedical text has drawn researchers' attention. However, the existing work utilize either complex feature engineering or NLP tools, both of which are insufficient for sentence comprehension. Inspired by the deep learning approaches in natural language processing, we propose a recur- rent neural network model with multiple attention layers for DDI classification. We evaluate our model on 2013 SemEval DDIExtraction dataset. The experiments show that our model classifies most of the drug pairs into correct DDI categories, which outperforms the existing NLP or deep learning methods.