An Empirical Study of Benchmarking Chinese Aspect Sentiment Quad Prediction
This work addresses data scarcity for researchers in aspect-level sentiment analysis, though it is incremental as it focuses on dataset expansion and benchmarking.
The authors address the limited data available for aspect sentiment quad prediction (ASQP) by constructing two large Chinese datasets, each with more than 10,000 samples, that have higher quadruple density and more words per sentence than existing datasets. They are also the first to evaluate GPT models on ASQP, and the evaluation reveals performance issues.
Aspect sentiment quad prediction (ASQP) is a critical subtask of aspect-level sentiment analysis. Current ASQP datasets are small and have low quadruple density, which hinders technical development. To address this limitation, we construct two large Chinese ASQP datasets crawled from multiple online platforms. Compared with existing ASQP datasets, they are larger (each with 10,000+ samples), cover rich aspect categories, contain more words per sentence, and have higher quadruple density. Moreover, we are the first to evaluate the performance of Generative Pre-trained Transformer (GPT) series models on ASQP, and we expose potential issues. Experiments with state-of-the-art ASQP baselines underscore the need for additional techniques to address ASQP and for further investigation into methods that improve the performance of GPTs.
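The dataset properties named in the abstract (size, quadruple density, words per sentence) can be illustrated with a small sketch. It assumes that each sample is a pre-segmented sentence paired with its annotated quadruples, and that "density" means the average number of quadruples per sample; the abstract does not define it, so this is an assumption. The names `Quad`, `Sample`, and `dataset_stats` are hypothetical, and the toy English sentences stand in for the Chinese data purely for readability.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Quad:
    """One ASQP annotation: the four elements of a sentiment quadruple."""
    aspect_term: str
    aspect_category: str
    opinion_term: str
    sentiment: str  # e.g. "positive", "negative", "neutral"


@dataclass
class Sample:
    """A sentence (already split into words) with its quadruples."""
    tokens: List[str]  # assumption: word segmentation was done upstream
    quads: List[Quad]


def dataset_stats(samples: List[Sample]) -> Dict[str, float]:
    """Return size, average quads per sample, and average words per sample."""
    n = len(samples)
    if n == 0:
        raise ValueError("dataset is empty")
    total_quads = sum(len(s.quads) for s in samples)
    total_words = sum(len(s.tokens) for s in samples)
    return {
        "size": n,
        "quads_per_sample": total_quads / n,  # assumed meaning of "density"
        "words_per_sample": total_words / n,
    }


# Toy data, invented for illustration only (not taken from the datasets).
toy = [
    Sample(
        tokens=["the", "screen", "is", "sharp", "but", "battery", "drains", "fast"],
        quads=[
            Quad("screen", "display", "sharp", "positive"),
            Quad("battery", "battery life", "drains fast", "negative"),
        ],
    ),
    Sample(
        tokens=["great", "service"],
        quads=[Quad("service", "service", "great", "positive")],
    ),
]

stats = dataset_stats(toy)
print(stats)
```

Under these assumptions, a dataset with higher density and longer sentences yields larger `quads_per_sample` and `words_per_sample` values, so a model must extract more quadruples from each input.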