Leveraging Large Language Models for Concept Graph Recovery and Question Answering in NLP Education
This work addresses the underexplored use of LLMs for domain-specific educational queries by introducing a new benchmark and pipeline. The contribution is incremental, since it builds on existing LLM capabilities.
The study applies large language models (LLMs) to educational scenarios, specifically concept graph recovery and question answering in NLP. It reports an average 3% F1 improvement over supervised methods in zero-shot concept graph recovery and up to a 26% F1 improvement on the tasks of the new benchmark.
In Natural Language Processing (NLP), Large Language Models (LLMs) have demonstrated promise in text-generation tasks. However, their educational applications, particularly for domain-specific queries, remain underexplored. This study investigates LLMs' capabilities in educational scenarios, focusing on concept graph recovery and question answering (QA). We assess LLMs' zero-shot performance in creating domain-specific concept graphs and introduce TutorQA, a new expert-verified, NLP-focused benchmark for scientific graph reasoning and QA. TutorQA consists of five tasks with 500 QA pairs. To tackle TutorQA queries, we present CGLLM, a pipeline that integrates concept graphs with LLMs to answer diverse questions. Our results indicate that LLMs' zero-shot concept graph recovery is competitive with supervised methods, with an average 3% F1 score improvement. On TutorQA tasks, LLMs achieve up to a 26% F1 score improvement. Moreover, human evaluation and analysis show that CGLLM generates answers with more fine-grained concepts.
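To make the F1 figures above concrete, the following is a minimal sketch of how an F1 score for concept graph recovery could be computed. It assumes that a concept graph is a set of directed (prerequisite, dependent) edges and that predictions are scored by exact edge match. That edge-level formulation, the `edge_f1` helper, and the toy edges are illustrative assumptions, not the evaluation protocol or data of this work.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: a directed edge (prerequisite concept, dependent concept).
Edge = Tuple[str, str]


def edge_f1(predicted: Iterable[Edge], gold: Iterable[Edge]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over directed edges, using exact match."""
    pred_set: Set[Edge] = set(predicted)
    gold_set: Set[Edge] = set(gold)

    # An edge counts as correct only if both endpoints and the direction match.
    true_positives = len(pred_set & gold_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy, made-up edges for illustration only (not data from the paper).
gold_edges = [
    ("tokenization", "language modeling"),
    ("language modeling", "machine translation"),
    ("word embeddings", "language modeling"),
]
predicted_edges = [
    ("tokenization", "language modeling"),
    ("language modeling", "machine translation"),
    ("machine translation", "tokenization"),  # wrong edge: not in the gold graph
]

precision, recall, f1 = edge_f1(predicted_edges, gold_edges)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case two of the three predicted edges are correct and two of the three gold edges are recovered, so precision, recall, and F1 all equal 2/3. Whether a reported gain such as "3%" denotes an absolute difference in F1 points or a relative change is not specified by this sketch and should be read from the full evaluation.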