CLApr 26, 2023
HeySQuAD: A Spoken Question Answering DatasetYijing Wu, SaiKrishna Rallabandi, Ravisutha Srinivasamurthy et al.
Spoken question answering (SQA) systems are critical for digital assistants and other real-world use cases, but evaluating their performance is a challenge due to the importance of human-spoken questions. This study presents a new large-scale community-shared SQA dataset called HeySQuAD, which includes 76k human-spoken questions, 97k machine-generated questions, and their corresponding textual answers from the SQuAD QA dataset. Our goal is to measure the ability of machines to accurately understand noisy spoken questions and provide reliable answers. Through extensive testing, we demonstrate that training with transcribed human-spoken and original SQuAD questions leads to a significant improvement (12.51%) in answering human-spoken questions compared to training with only the original SQuAD textual questions. Moreover, evaluating with a higher-quality transcription can lead to a further improvement of 2.03%. This research has significant implications for the development of SQA systems and their ability to meet the needs of users in real-world scenarios.
CLSep 15, 2023
Self-training Strategies for Sentiment Analysis: An Empirical StudyHaochen Liu, Sai Krishna Rallabandi, Yijing Wu et al.
Sentiment analysis is a crucial task in natural language processing that involves identifying and extracting subjective sentiment from text. Self-training has recently emerged as an economical and efficient technique for developing sentiment analysis models by leveraging a small amount of labeled data and a large amount of unlabeled data. However, given a set of training data, how to utilize them to conduct self-training makes a significant difference in the final performance of the model. We refer to this methodology as the self-training strategy. In this paper, we present an empirical study of various self-training strategies for sentiment analysis. First, we investigate the influence of the self-training strategy and hyper-parameters on the performance of traditional small language models (SLMs) in various few-shot settings. Second, we also explore the feasibility of leveraging large language models (LLMs) to help self-training. We propose and empirically compare several self-training strategies with the intervention of LLMs. Extensive experiments are conducted on three real-world sentiment analysis datasets.