CLNov 24, 2022
Tapping the Potential of Coherence and Syntactic Features in Neural Models for Automatic Essay ScoringXinying Qiu, Shuxuan Liao, Jiajun Xie et al.
In the prompt-specific holistic score prediction task for Automatic Essay Scoring, the general approaches include pre-trained neural model, coherence model, and hybrid model that incorporate syntactic features with neural model. In this paper, we propose a novel approach to extract and represent essay coherence features with prompt-learning NSP that shows to match the state-of-the-art AES coherence model, and achieves the best performance for long essays. We apply syntactic feature dense embedding to augment BERT-based model and achieve the best performance for hybrid methodology for AES. In addition, we explore various ideas to combine coherence, syntactic information and semantic embeddings, which no previous study has done before. Our combined model also performs better than the SOTA available for combined model, even though it does not outperform our syntactic enhanced neural model. We further offer analyses that can be useful for future study.
CLApr 6, 2022
Yunshan Cup 2020: Overview of the Part-of-Speech Tagging Task for Low-resourced LanguagesYingwen Fu, Jinyi Chen, Nankai Lin et al.
The Yunshan Cup 2020 track focused on creating a framework for evaluating different methods of part-of-speech (POS). There were two tasks for this track: (1) POS tagging for the Indonesian language, and (2) POS tagging for the Lao tagging. The Indonesian dataset is comprised of 10000 sentences from Indonesian news within 29 tags. And the Lao dataset consists of 8000 sentences within 27 tags. 25 teams registered for the task. The methods of participants ranged from feature-based to neural networks using either classical machine learning techniques or ensemble methods. The best performing results achieve an accuracy of 95.82% for Indonesian and 93.03%, showing that neural sequence labeling models significantly outperform classic feature-based methods and rule-based methods.
CLNov 24, 2022
InDEX: Indonesian Idiom and Expression Dataset for Cloze TestXinying Qiu, Guofeng Shi
We propose InDEX, an Indonesian Idiom and Expression dataset for cloze test. The dataset contains 10438 unique sentences for 289 idioms and expressions for which we generate 15 different types of distractors, resulting in a large cloze-style corpus. Many baseline models of cloze test reading comprehension apply BERT with random initialization to learn embedding representation. But idioms and fixed expressions are different such that the literal meaning of the phrases may or may not be consistent with their contextual meaning. Therefore, we explore different ways to combine static and contextual representations for a stronger baseline model. Experimentations show that combining definition and random initialization will better support cloze test model performance for idioms whether independently or mixed with fixed expressions. While for fixed expressions with no special meaning, static embedding with random initialization is sufficient for cloze test model.
CLJul 6, 2024
Cross-Lingual Word Alignment for ASEAN Languages with Contrastive LearningJingshen Zhang, Xinying Qiu, Teng Shen et al.
Cross-lingual word alignment plays a crucial role in various natural language processing tasks, particularly for low-resource languages. Recent study proposes a BiLSTM-based encoder-decoder model that outperforms pre-trained language models in low-resource settings. However, their model only considers the similarity of word embedding spaces and does not explicitly model the differences between word embeddings. To address this limitation, we propose incorporating contrastive learning into the BiLSTM-based encoder-decoder framework. Our approach introduces a multi-view negative sampling strategy to learn the differences between word pairs in the shared cross-lingual embedding space. We evaluate our model on five bilingual aligned datasets spanning four ASEAN languages: Lao, Vietnamese, Thai, and Indonesian. Experimental results demonstrate that integrating contrastive learning consistently improves word alignment accuracy across all datasets, confirming the effectiveness of the proposed method in low-resource scenarios. We will release our data set and code to support future research on ASEAN or more low-resource word alignment.
CLJul 3, 2024
Comparing Feature-based and Context-aware Approaches to PII Generalization Level PredictionKailin Zhang, Xinying Qiu
Protecting Personal Identifiable Information (PII) in text data is crucial for privacy, but current PII generalization methods face challenges such as uneven data distributions and limited context awareness. To address these issues, we propose two approaches: a feature-based method using machine learning to improve performance on structured inputs, and a novel context-aware framework that considers the broader context and semantic relationships between the original text and generalized candidates. The context-aware approach employs Multilingual-BERT for text representation, functional transformations, and mean squared error scoring to evaluate candidates. Experiments on the WikiReplace dataset demonstrate the effectiveness of both methods, with the context-aware approach outperforming the feature-based one across different scales. This work contributes to advancing PII generalization techniques by highlighting the importance of feature selection, ensemble learning, and incorporating contextual information for better privacy protection in text anonymization.
CLJul 11, 2024
System Report for CCL24-Eval Task 7: Multi-Error Modeling and Fluency-Targeted Pre-training for Chinese Essay EvaluationJingshen Zhang, Xiangyu Yang, Xinkai Su et al.
This system report presents our approaches and results for the Chinese Essay Fluency Evaluation (CEFE) task at CCL-2024. For Track 1, we optimized predictions for challenging fine-grained error types using binary classification models and trained coarse-grained models on the Chinese Learner 4W corpus. In Track 2, we enhanced performance by constructing a pseudo-dataset with multiple error types per sentence. For Track 3, where we achieved first place, we generated fluency-rated pseudo-data via back-translation for pre-training and used an NSP-based strategy with Symmetric Cross Entropy loss to capture context and mitigate long dependencies. Our methods effectively address key challenges in Chinese Essay Fluency Evaluation.
CLMar 3, 2024
Controlling Cloze-test Question Item Difficulty with PLM-based Surrogate Models for IRT AssessmentJingshen Zhang, Jiajun Xie, Xinying Qiu
Item difficulty plays a crucial role in adaptive testing. However, few works have focused on generating questions of varying difficulty levels, especially for multiple-choice (MC) cloze tests. We propose training pre-trained language models (PLMs) as surrogate models to enable item response theory (IRT) assessment, avoiding the need for human test subjects. We also propose two strategies to control the difficulty levels of both the gaps and the distractors using ranking rules to reduce invalid distractors. Experimentation on a benchmark dataset demonstrates that our proposed framework and methods can effectively control and evaluate the difficulty levels of MC cloze tests.
CLJul 16, 2025
DualReward: A Dynamic Reinforcement Learning Framework for Cloze Tests Distractor GenerationTianyou Huang, Xinglu Chen, Jingshen Zhang et al.
This paper introduces DualReward, a novel reinforcement learning framework for automatic distractor generation in cloze tests. Unlike conventional approaches that rely primarily on supervised learning or static generative models, our method employs a dual reward structure with adaptive scaling that differentiates between human-created gold standard distractors and model-generated candidates. The framework dynamically adjusts reward signal intensity based on model performance and confidence. We evaluate our approach on both passage-level (CLOTH-F) and sentence-level (MCQ) cloze test datasets, demonstrating consistent improvements over state-of-the-art baselines. Experimental results show that our adaptive reward scaling mechanism provides modest but consistent benefits on homogeneous datasets (CLOTH-F) and more substantial improvements (3.48-3.86% in P@1) on diverse, cross-domain data (MCQ), suggesting its particular effectiveness for handling varied question types and domains. Our work offers a flexible framework that effectively balances learning from reliable human examples while exploring novel, high-quality distractors for automated test generation.
CLJul 16, 2025
COLA-GEC: A Bidirectional Framework for Enhancing Grammatical Acceptability and Error CorrectionXiangyu Yang, Xinying Qiu
Grammatical Error Correction (GEC) and grammatical acceptability judgment (COLA) are core tasks in natural language processing, sharing foundational grammatical knowledge yet typically evolving independently. This paper introduces COLA-GEC, a novel bidirectional framework that enhances both tasks through mutual knowledge transfer. First, we augment grammatical acceptability models using GEC datasets, significantly improving their performance across multiple languages. Second, we integrate grammatical acceptability signals into GEC model training via a dynamic loss function, effectively guiding corrections toward grammatically acceptable outputs. Our approach achieves state-of-the-art results on several multilingual benchmarks. Comprehensive error analysis highlights remaining challenges, particularly in punctuation error correction, providing insights for future improvements in grammatical modeling.
CLJul 16, 2025
Cross-Domain Transfer and Few-Shot Learning for Personal Identifiable Information RecognitionJunhong Ye, Xu Yuan, Xinying Qiu
Accurate recognition of personally identifiable information (PII) is central to automated text anonymization. This paper investigates the effectiveness of cross-domain model transfer, multi-domain data fusion, and sample-efficient learning for PII recognition. Using annotated corpora from healthcare (I2B2), legal (TAB), and biography (Wikipedia), we evaluate models across four dimensions: in-domain performance, cross-domain transferability, fusion, and few-shot learning. Results show legal-domain data transfers well to biographical texts, while medical domains resist incoming transfer. Fusion benefits are domain-specific, and high-quality recognition is achievable with only 10% of training data in low-specialization domains.
CLJun 5, 2024
Readability-guided Idiom-aware Sentence Simplification (RISS) for ChineseJingshen Zhang, Xinglu Chen, Xinying Qiu et al.
Chinese sentence simplification faces challenges due to the lack of large-scale labeled parallel corpora and the prevalence of idioms. To address these challenges, we propose Readability-guided Idiom-aware Sentence Simplification (RISS), a novel framework that combines data augmentation techniques with lexcial simplification. RISS introduces two key components: (1) Readability-guided Paraphrase Selection (RPS), a method for mining high-quality sentence pairs, and (2) Idiom-aware Simplification (IAS), a model that enhances the comprehension and simplification of idiomatic expressions. By integrating RPS and IAS using multi-stage and multi-task learning strategies, RISS outperforms previous state-of-the-art methods on two Chinese sentence simplification datasets. Furthermore, RISS achieves additional improvements when fine-tuned on a small labeled dataset. Our approach demonstrates the potential for more effective and accessible Chinese text simplification.
CLJul 9, 2021
Learning Syntactic Dense Embedding with Correlation Graph for Automatic Readability AssessmentXinying Qiu, Yuan Chen, Hanwu Chen et al.
Deep learning models for automatic readability assessment generally discard linguistic features traditionally used in machine learning models for the task. We propose to incorporate linguistic features into neural network models by learning syntactic dense embeddings based on linguistic features. To cope with the relationships between the features, we form a correlation graph among features and use it to learn their embeddings so that similar features will be represented by similar embeddings. Experiments with six data sets of two proficiency levels demonstrate that our proposed methodology can complement BERT-only model to achieve significantly better performances for automatic readability assessment.