CL · Jun 21, 2024

A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick

Nishant Balepur, Matthew Shu, Alexander Hoyle, Alison Robey, Shi Feng, Seraphina Goldfarb-Tarrant, Jordan Boyd-Graber

arXiv:2406.15352v2 · 15.729 citations · h-index: 42 · Has Code

Originality: Incremental advance

AI Analysis

This work uses student feedback to improve mnemonic generation, with the aim of building more personalized and effective educational tools. The advance is incremental, since it builds on prior mnemonic generation methods.

The paper tackles the problem of generating effective keyword mnemonics for students. It trains a model called SMART on real student feedback, covering both expressed and observed preferences, to align it with what actually aids learning. In expert assessments, SMART matches GPT-4 at lower cost, demonstrating the utility of diverse feedback for LLM alignment in education.

Keyword mnemonics are memorable explanations that link new terms to simpler keywords. Prior work generates mnemonics for students, but it does not train models on the mnemonics that students prefer and that aid learning. We build SMART, a mnemonic generator trained on feedback from real students learning new terms. To train SMART, we first fine-tune LLaMA-2 on a curated set of user-written mnemonics. We then use LLM alignment to enhance SMART: we deploy mnemonics generated by SMART in a flashcard app to collect preferences on which mnemonics students favor. We gather 2684 preferences from 45 students across two types: expressed (inferred from ratings) and observed (inferred from student learning), yielding three key findings. First, expressed and observed preferences disagree; what students think is helpful does not always capture what is truly helpful. Second, Bayesian models can synthesize complementary data from multiple preference types into a single effectiveness signal. SMART is tuned via Direct Preference Optimization on this signal, which resolves ties and missing labels in the typical method of pairwise comparisons, augmenting data for LLM output quality gains. Third, mnemonic experts assess SMART as matching GPT-4 at much lower deployment costs, showing the utility of capturing diverse student feedback to align LLMs in education.
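To make the Direct Preference Optimization step concrete, here is a minimal sketch of the standard per-pair DPO loss, together with a helper that turns two scalar effectiveness scores into a (chosen, rejected) pair. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `beta` default, and the idea of ordering a pair by comparing two scores directly are all hypothetical, and the Bayesian model that produces the scores is not shown.

```python
import math


def to_preference_pair(mnemonic_a, score_a, mnemonic_b, score_b):
    """Order two mnemonics by a scalar effectiveness score.

    Hypothetical helper: it assumes each mnemonic already has a numeric
    effectiveness estimate, so a pair can be ordered even when no student
    compared the two mnemonics directly.
    """
    if score_a >= score_b:
        return mnemonic_a, mnemonic_b
    return mnemonic_b, mnemonic_a


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under a frozen reference model. `beta` scales
    the implicit reward; 0.1 is an assumed default, not a value from the paper.
    """
    # How much more the policy favors the chosen response over the rejected
    # one, measured relative to the reference model.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    z = beta * margin
    # -log(sigmoid(z)) computed as a numerically stable softplus(-z).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Placeholder inputs for illustration only (not data from the paper).
chosen, rejected = to_preference_pair("mnemonic A", 0.7, "mnemonic B", 0.4)
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
```

When the policy and the reference model agree on both responses, the margin is zero and the loss equals log 2; the loss falls as the policy shifts probability toward the chosen mnemonic.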
