Filling the Gap: Is Commonsense Knowledge Generation useful for Natural Language Inference?
This work addresses the limited coverage of commonsense resources in NLI, a problem of interest to AI researchers, but its results are incremental: the improvements are partial rather than universal.
The study investigated whether commonsense knowledge generated by Large Language Models improves Natural Language Inference. It found that the generated knowledge did not consistently boost overall accuracy, but it helped distinguish entailing instances and moderately improved the handling of contradictory and neutral cases.
Natural Language Inference (NLI) is the task of determining the semantic relationship (entailment, contradiction, or neutrality) between a premise and a given hypothesis. The task aims to develop systems that emulate natural human inferential processes, in which commonsense knowledge plays a major role. However, existing commonsense resources lack sufficient coverage for many premise-hypothesis pairs. This study explores the potential of Large Language Models (LLMs) as commonsense knowledge generators for NLI along two key dimensions: their reliability in generating such knowledge and the impact of that knowledge on prediction accuracy. We adapt and modify existing metrics to assess the factuality and consistency of LLM-generated knowledge in this context. While explicitly incorporating commonsense knowledge does not consistently improve overall results, it effectively helps distinguish entailing instances and moderately helps distinguish contradictory and neutral inferences.
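The abstract does not say how the generated knowledge is given to the model or how per-class effects are measured. The following is a minimal sketch under the assumption of a prompt-based setup: the hypothetical helper `build_prompt` optionally prepends a piece of generated commonsense knowledge to a premise-hypothesis pair, and the hypothetical helper `per_label_recall` computes recall separately for each label. Comparing the per-label numbers between runs with and without knowledge is one way to see effects like "helps distinguish entailing instances" that an overall accuracy figure can hide. The prompt template and label names are illustrative assumptions, not the study's actual setup.

```python
from collections import Counter
from typing import Dict, Optional, Sequence

# Assumed three-way NLI label set.
LABELS = ("entailment", "neutral", "contradiction")


def build_prompt(premise: str, hypothesis: str, knowledge: Optional[str] = None) -> str:
    """Hypothetical NLI prompt template.

    If `knowledge` is given (e.g. a commonsense statement generated by an LLM),
    it is placed before the premise as explicit extra context.
    """
    lines = []
    if knowledge:
        lines.append(f"Commonsense knowledge: {knowledge}")
    lines.append(f"Premise: {premise}")
    lines.append(f"Hypothesis: {hypothesis}")
    lines.append("Answer with one of: " + ", ".join(LABELS) + ".")
    return "\n".join(lines)


def per_label_recall(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Recall for each label that occurs in `gold`.

    Recall for a label = correctly predicted instances of that label
    divided by all gold instances of that label.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, pred) if g == p)
    return {label: hits[label] / totals[label] for label in LABELS if totals[label]}


if __name__ == "__main__":
    print(build_prompt(
        "A man is playing a guitar on stage.",
        "A man is performing music.",
        knowledge="Playing a guitar on stage is a form of performing music.",
    ))
    # Toy labels for illustration only, not results from the study.
    gold = ["entailment", "neutral", "contradiction", "entailment"]
    pred = ["entailment", "contradiction", "contradiction", "neutral"]
    print(per_label_recall(gold, pred))
```

On the toy labels above, `per_label_recall` returns `{'entailment': 0.5, 'neutral': 0.0, 'contradiction': 1.0}`. Reporting recall per label rather than a single accuracy keeps class-specific gains visible even when they cancel out overall.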