LG, CL | Nov 4, 2024

Leveraging Label Semantics and Meta-Label Refinement for Multi-Label Question Classification

Shi Dong, Xiaobei Niu, Rui Zhong, Zhifeng Wang, Mingzhang Zuo

arXiv:2411.01841v3 | 9.28 citations | h-index: 12 | Has Code | Knowledge-Based Systems

Originality: Incremental advance

AI Analysis

The paper addresses label ambiguity and label imbalance in question classification for personalized learning in online education. It is an incremental improvement over existing multi-label methods.

This paper tackles fine-grained multi-label question classification in online education, where overlapping labels and imbalanced label distributions hinder accuracy. It introduces RR2QC, a retrieval reranking method that leverages label semantics and meta-label refinement, and reports better Precision@K and F1 scores than existing methods on multiple datasets.

Accurate annotation of educational resources is crucial for effective personalized learning and resource recommendation in online education. However, fine-grained knowledge labels often overlap or share similarities, making it difficult for existing multi-label classification methods to differentiate them. The label distribution imbalance caused by the sparsity of human annotations further intensifies these challenges. To address these issues, this paper introduces RR2QC, a novel Retrieval Reranking method for multi-label Question Classification that leverages label semantics and meta-label refinement. First, RR2QC improves the pre-training strategy by utilizing semantic relationships within and across label groups. Second, it introduces a class center learning task to align questions with label semantics during downstream training. Finally, it decomposes labels into meta-labels and uses a meta-label classifier to rerank the retrieved label sequences. In doing so, RR2QC improves the understanding and prediction of long-tail labels by learning from meta-labels that frequently appear in other labels. Additionally, a mathematical LLM is used to generate solutions for questions, extracting latent information that further refines the model's predictions. Experimental results show that RR2QC outperforms existing methods in Precision@K and F1 scores across multiple educational datasets, demonstrating its effectiveness for online education applications. The code and datasets are available at https://github.com/78Erii/RR2QC.
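
The final reranking step described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation (the linked repository holds that). The function names, the toy labels, the averaging of meta-label scores, and the mixing weight `alpha` are all assumptions made for illustration. The idea shown is only that a rare label can be promoted when the meta-labels it is built from score highly.

```python
from typing import Dict, List, Set, Tuple


def rerank_with_meta_labels(
    retrieved: List[Tuple[str, float]],
    label_to_meta: Dict[str, List[str]],
    meta_scores: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rerank retrieved labels by mixing retrieval and meta-label scores.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    combined = alpha * retrieval_score + (1 - alpha) * mean(meta-label scores).
    """
    reranked = []
    for label, retrieval_score in retrieved:
        metas = label_to_meta.get(label, [])
        # Average the classifier scores of the meta-labels that make up this label.
        if metas:
            meta_score = sum(meta_scores.get(m, 0.0) for m in metas) / len(metas)
        else:
            meta_score = 0.0
        combined = alpha * retrieval_score + (1 - alpha) * meta_score
        reranked.append((label, combined))
    # Highest combined score first.
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked


def precision_at_k(ranked_labels: List[str], gold: Set[str], k: int) -> float:
    """Fraction of the top-k predicted labels that are in the gold label set."""
    top_k = ranked_labels[:k]
    return sum(1 for label in top_k if label in gold) / k


# Toy data, invented for illustration only.
retrieved = [
    ("linear equation word problem", 0.60),
    ("quadratic equation roots", 0.58),
    ("linear inequality", 0.40),
]
label_to_meta = {
    "linear equation word problem": ["linear", "equation", "word problem"],
    "quadratic equation roots": ["quadratic", "equation", "roots"],
    "linear inequality": ["linear", "inequality"],
}
meta_scores = {
    "quadratic": 0.9, "equation": 0.8, "roots": 0.7,
    "linear": 0.1, "word problem": 0.2, "inequality": 0.1,
}

reranked = rerank_with_meta_labels(retrieved, label_to_meta, meta_scores)
gold = {"quadratic equation roots"}
before = precision_at_k([label for label, _ in retrieved], gold, k=1)
after = precision_at_k([label for label, _ in reranked], gold, k=1)
print(f"P@1 before reranking: {before}, after reranking: {after}")
```

In this toy case the gold label starts in second place and moves to first because its meta-labels score highly, so Precision@1 rises from 0.0 to 1.0. A real system would take both score sets from trained models and would tune the combination rule on validation data.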
