CL AI · May 25

A Controlled Synthetic Benchmark for Educational Aspect-Based Sentiment Analysis

arXiv:2605.2550273.2


AI Analysis

For researchers in educational NLP, this provides a reproducible synthetic benchmark to develop ABSA methods where public labeled data is scarce.

This study introduces a controlled synthetic benchmark for educational aspect-based sentiment analysis (ABSA), built from 10,000 synthetic course reviews labeled under a 20-aspect schema. The best model, BERT with a lower-rate schedule, reaches a held-out detection micro-F1 of 0.2930; zero-shot gpt-5.2 inference reaches 0.2519; and an external evaluation on real student feedback yields a micro-F1 of 0.4593 for BERT on a 9-aspect overlap, indicating partial synthetic-to-real transfer.

Educational aspect-based sentiment analysis (ABSA) can support course improvement, but public aspect-labeled student feedback remains scarce because educational reviews are private, institution-specific, and expensive to annotate. This study introduces a controlled synthetic benchmark for educational ABSA built from 10,000 synthetic course reviews with explicit train-validation-test splits and a 20-aspect pedagogical schema spanning instructional quality, assessment and course management, learning demand, learning environment, and engagement. The corpus is generated with sampled target labels, sampled nuance attributes, and a realism-tuned prompt refined through a three-cycle judge-editor procedure. On the resulting benchmark, local baselines with TF-IDF, two-step transformers, and joint encoders show that the task is nontrivial; the strongest untuned model, BERT, reaches a held-out detection micro-F1 of 0.2760, while a modest lower-rate BERT schedule improves this to 0.2930. Full-test GPT-based inference with gpt-5.2 reaches 0.2519 micro-F1 in zero-shot mode and 0.2501 with retrieval-based few-shot prompting, placing batch inference above the classical baseline and close to the compact joint encoders. A conservative external evaluation on 2,829 mapped student-feedback reviews from Herath et al. yields a micro-F1 of 0.4593 for BERT on a 9-aspect overlap, indicating partial synthetic-to-real transfer. Realism and faithfulness analyses are reported as generator diagnostics that clarify how the benchmark was stabilized and where label noise remains. The study therefore contributes a synthetic educational ABSA corpus, a documented generation procedure, and a reproducible benchmark setting for a domain in which public labeled data remain difficult to obtain.
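The detection scores quoted above are micro-averaged F1 values over per-review aspect sets. As a minimal sketch of how such a score can be computed, and of how an evaluation might be restricted to an overlapping subset of aspects (as in the 9-aspect external evaluation), consider the following. The function name `micro_f1`, its `aspects` parameter, and the aspect names are assumptions made for illustration; they are not taken from the paper or its code.

```python
from typing import Iterable, Optional, Sequence, Set


def micro_f1(
    gold: Sequence[Set[str]],
    pred: Sequence[Set[str]],
    aspects: Optional[Iterable[str]] = None,
) -> float:
    """Micro-averaged F1 over per-review aspect sets.

    If `aspects` is given, gold and predicted sets are first restricted to it,
    which mimics scoring on an overlapping subset of the schema.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of reviews")
    keep = set(aspects) if aspects is not None else None

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if keep is not None:
            g, p = g & keep, p & keep
        tp += len(g & p)  # aspects correctly detected
        fp += len(p - g)  # aspects predicted but not in gold
        fn += len(g - p)  # gold aspects that were missed

    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical aspect names, for illustration only.
gold = [{"pacing", "workload"}, {"feedback"}, {"clarity"}]
pred = [{"pacing"}, {"feedback", "workload"}, set()]

full = micro_f1(gold, pred)
overlap = micro_f1(gold, pred, aspects={"pacing", "feedback"})
```

Because micro-averaging pools true positives, false positives, and false negatives across all reviews and aspects before computing F1, frequent aspects dominate the score. Restricting the aspect set, as in the `overlap` call, can change the value substantially, which is one reason scores on a 9-aspect overlap are not directly comparable to scores on the full 20-aspect schema.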
