Leveraging Label Semantics and Meta-Label Refinement for Multi-Label Question Classification
It addresses label ambiguity and imbalance for personalized learning in online education, representing an incremental improvement with specific gains.
This paper tackles the problem of fine-grained multi-label question classification in online education, where overlapping labels and imbalanced distributions hinder accuracy, and introduces RR2QC, a retrieval reranking method that leverages label semantics and meta-label refinement to outperform existing methods in Precision@K and F1 scores on multiple datasets.
Accurate annotation of educational resources is crucial for effective personalized learning and resource recommendation in online education. However, fine-grained knowledge labels often overlap or share similarities, making it difficult for existing multi-label classification methods to differentiate them. The label distribution imbalance due to sparsity of human annotations further intensifies these challenges. To address these issues, this paper introduces RR2QC, a novel Retrieval Reranking method to multi-label Question Classification by leveraging label semantics and meta-label refinement. First, RR2QC improves the pre-training strategy by utilizing semantic relationships within and across label groups. Second, it introduces a class center learning task to align questions with label semantics during downstream training. Finally, this method decomposes labels into meta-labels and uses a meta-label classifier to rerank the retrieved label sequences. In doing so, RR2QC enhances the understanding and prediction capability of long-tail labels by learning from meta-labels that frequently appear in other labels. Additionally, a mathematical LLM is used to generate solutions for questions, extracting latent information to further refine the model's insights. Experimental results show that RR2QC outperforms existing methods in Precision@K and F1 scores across multiple educational datasets, demonstrating its effectiveness for online education applications. The code and datasets are available at https://github.com/78Erii/RR2QC.