LG CL | Apr 19, 2023

ESimCSE Unsupervised Contrastive Learning Jointly with UDA Semi-Supervised Learning for Large Label System Text Classification Mode

Ruan Lu, Zhou HangCheng, Ran Meng, Zhao Jin, Qin JiaoYu, Wei Feng, Wang ChenZi

arXiv:2304.13140v12.01 citations | h-index: 9

Originality Synthesis-oriented

AI Analysis

This work addresses text classification challenges for NLP applications, such as multiple tags and noisy data, but it appears incremental because it combines existing techniques.

The paper tackled text classification with large tag systems by combining ESimCSE unsupervised contrastive learning and UDA semi-supervised learning, achieving accuracy improvements of 8% on a public dataset and 10-15% on an operational dataset.

Text classification with large tag systems faces several challenges in natural language processing tasks, including multiple tag systems, uneven data distribution, and high noise. To address these problems, the ESimCSE unsupervised contrastive learning model and the UDA semi-supervised learning model are combined through joint training. ESimCSE efficiently learns text vector representations from unlabeled data to achieve better classification results, while UDA uses unlabeled data through semi-supervised learning to improve the prediction performance, stability, and generalization ability of the model. In addition, the adversarial training techniques FGM and PGD are applied during training to improve the robustness and reliability of the model. Experimental results show accuracy improvements over the baseline of 8% on the public Reuters dataset and 10% on the operational dataset, and a 15% improvement in manual validation accuracy on the operational dataset, indicating that the method is effective.
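
The abstract does not say how the individual objectives are combined, so the following is only a minimal sketch of what a joint objective of this kind might look like, not the authors' implementation. It assumes a weighted sum of a supervised cross-entropy term, a UDA-style consistency term (KL divergence between predictions on an unlabeled example and its augmented version), and a SimCSE-style InfoNCE contrastive term, plus an FGM-style perturbation computed from a gradient. The function names, the weights `lambda_uda` and `lambda_cl`, and the `temperature` and `epsilon` defaults are all hypothetical. ESimCSE's word-repetition augmentation and momentum contrast are omitted.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # Supervised loss on one labeled example.
    return -math.log(softmax(logits)[label])

def uda_consistency(logits_orig, logits_aug):
    # UDA-style consistency: KL(p_orig || p_aug) on an unlabeled example,
    # where p_orig would be treated as a fixed target during training.
    p, q = softmax(logits_orig), softmax(logits_aug)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchor, positive, negatives, temperature=0.05):
    # Contrastive loss: the positive pair sits at index 0 of the similarity list.
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    return cross_entropy([s / temperature for s in sims], 0)

def fgm_perturbation(grad, epsilon=1.0):
    # FGM: step of length epsilon along the L2-normalized gradient direction.
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]

def joint_loss(supervised, consistency, contrastive, lambda_uda=1.0, lambda_cl=1.0):
    # Assumed combination: a weighted sum of the three terms (weights are hypothetical).
    return supervised + lambda_uda * consistency + lambda_cl * contrastive
```

In a real training loop these terms would be computed on batches with an autodiff framework, and the FGM perturbation would be added to the embedding weights before a second forward and backward pass. PGD repeats that step several times with projection back onto an epsilon-ball.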
