LMdiff: A Visual Diff Tool to Compare Language Models
This tool helps NLP researchers and practitioners compare language models. It is an incremental contribution that builds on existing visualization and analysis techniques.
The authors address the difficulty of comparing language model outputs with LMdiff, a visual tool that contrasts the probability distributions of two models and supports hypothesis generation about model behavior through token-by-token analysis and by identifying interesting phrases in corpora.
Although language models are ubiquitous in NLP, it is hard to contrast the outputs of two models and identify the contexts in which one performs better than the other. To address this problem, we introduce LMdiff, a tool that visually compares the probability distributions of two models that differ, e.g., through finetuning, distillation, or simply training with different parameter sizes. LMdiff supports generating hypotheses about model behavior by investigating text instances token by token, and it helps select interesting text instances by identifying the most interesting phrases in large corpora. We showcase the applicability of LMdiff for hypothesis generation across multiple case studies. A demo is available at http://lmdiff.net .
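The core idea of comparing two models token by token can be illustrated with a small sketch. This is not LMdiff's implementation; it assumes that both models share a tokenizer and that each model's next-token distribution at every position has already been computed. Hypothetical toy dictionaries stand in for real model outputs. For each observed token, the sketch reports its probability and rank under each model and the KL divergence between the two distributions, then ranks positions by absolute probability difference as one simple way to surface places where the models disagree. The function names and the choice of measures are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

# A distribution maps each vocabulary token to its predicted probability.
Distribution = Dict[str, float]


def token_rank(dist: Distribution, token: str) -> int:
    """1-based rank of `token` when the vocabulary is sorted by descending probability."""
    p = dist.get(token, 0.0)
    return 1 + sum(1 for q in dist.values() if q > p)


def kl_divergence(p: Distribution, q: Distribution, eps: float = 1e-12) -> float:
    """KL(p || q) over the union of both vocabularies; zeros in q are floored at eps."""
    vocab = set(p) | set(q)
    return sum(
        p.get(t, 0.0) * math.log(p.get(t, 0.0) / max(q.get(t, 0.0), eps))
        for t in vocab
        if p.get(t, 0.0) > 0.0
    )


def compare_models(
    tokens: Sequence[str],
    dists_a: Sequence[Distribution],
    dists_b: Sequence[Distribution],
) -> List[dict]:
    """Hypothetical per-token comparison of model A and model B on the same text."""
    rows = []
    for i, (tok, pa, pb) in enumerate(zip(tokens, dists_a, dists_b)):
        prob_a = pa.get(tok, 0.0)
        prob_b = pb.get(tok, 0.0)
        rows.append({
            "position": i,
            "token": tok,
            "prob_a": prob_a,
            "prob_b": prob_b,
            "prob_diff": prob_b - prob_a,  # positive: model B prefers the observed token
            "rank_a": token_rank(pa, tok),
            "rank_b": token_rank(pb, tok),
            "kl_ab": kl_divergence(pa, pb),
        })
    return rows


def most_different_positions(rows: List[dict], k: int = 2) -> List[dict]:
    """Return the k positions where the two models disagree most on the observed token."""
    return sorted(rows, key=lambda r: abs(r["prob_diff"]), reverse=True)[:k]


if __name__ == "__main__":
    # Toy, made-up distributions standing in for two models' outputs.
    tokens = ["the", "cat", "sat"]
    dists_a = [{"the": 0.6, "a": 0.4}, {"cat": 0.2, "dog": 0.8}, {"sat": 0.5, "ran": 0.5}]
    dists_b = [{"the": 0.5, "a": 0.5}, {"cat": 0.9, "dog": 0.1}, {"sat": 0.4, "ran": 0.6}]
    for row in compare_models(tokens, dists_a, dists_b):
        print(row["token"], round(row["prob_diff"], 3), row["rank_a"], row["rank_b"])
```

In a corpus-level setting, the same per-token scores could be aggregated over phrases and the top-scoring spans shown to the user first; how LMdiff actually scores and selects phrases is described in the paper, not here.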