Samuel Yang

1.4CLJul 30, 2022

Masked Autoencoders As The Unified Learners For Pre-Trained Sentence Representation

Alexander Liu, Samuel Yang

Despite the progresses on pre-trained language models, there is a lack of unified frameworks for pre-trained sentence representation. As such, it calls for different pre-training methods for specific scenarios, and the pre-trained models are likely to be limited by their universality and representation quality. In this work, we extend the recently proposed MAE style pre-training strategy, RetroMAE, such that it may effectively support a wide variety of sentence representation tasks. The extended framework consists of two stages, with RetroMAE conducted throughout the process. The first stage performs RetroMAE over generic corpora, like Wikipedia, BookCorpus, etc., from which the base model is learned. The second stage takes place on domain-specific data, e.g., MS MARCO and NLI, where the base model is continuingly trained based on RetroMAE and contrastive learning. The pre-training outputs at the two stages may serve different applications, whose effectiveness are verified with comprehensive experiments. Concretely, the base model are proved to be effective for zero-shot retrieval, with remarkable performances achieved on BEIR benchmark. The continuingly pre-trained models further benefit more downstream tasks, including the domain-specific dense retrieval on MS MARCO, Natural Questions, and the sentence embeddings' quality for standard STS and transfer tasks in SentEval. The empirical insights of this work may inspire the future design of sentence representation pre-training. Our pre-trained models and source code will be released to the public communities.

9.2CYJul 30, 2020

Artificial Intelligence in the Battle against Coronavirus (COVID-19): A Survey and Future Research Directions

Thanh Thi Nguyen, Quoc Viet Hung Nguyen, Dung Tien Nguyen et al.

Artificial intelligence (AI) has been applied widely in our daily lives in a variety of ways with numerous success stories. AI has also contributed to dealing with the coronavirus disease (COVID-19) pandemic, which has been happening around the globe. This paper presents a survey of AI methods being used in various applications in the fight against the COVID-19 outbreak and outlines the crucial role of AI research in this unprecedented battle. We touch on areas where AI plays as an essential component, from medical image processing, data analytics, text mining and natural language processing, the Internet of Things, to computational biology and medicine. A summary of COVID-19 related data sources that are available for research purposes is also presented. Research directions on exploring the potential of AI and enhancing its capability and power in the pandemic battle are thoroughly discussed. We identify 13 groups of problems related to the COVID-19 pandemic and highlight promising AI methods and tools that can be used to address these problems. It is envisaged that this study will provide AI researchers and the wider community with an overview of the current status of AI applications, and motivate researchers to harness AI's potential in the fight against COVID-19.

Samuel Yang

2 Papers