CL · Apr 14, 2024

DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

arXiv:2404.09206v1 · 26 citations · h-index: 8 · SemEval
Originality: Synthesis-oriented
AI Analysis

This work addresses robustness in biomedical natural language inference for clinical trials, but it is incremental: it builds on existing methods such as DeBERTa and adds task-specific data augmentations.

The paper tackled the problem of biases in natural language inference for clinical trial reports by developing a data augmentation technique using semantic perturbations and domain-specific vocabulary replacement, achieving rankings of 12th in faithfulness and 8th in consistency out of 32 participants on the NLI4CT 2024 benchmark.

Safe and reliable natural language inference is critical for extracting insights from clinical trial reports but poses challenges due to biases in large pre-trained language models. This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials. By generating synthetic examples through semantic perturbations and domain-specific vocabulary replacement and adding a new task for numerical and quantitative reasoning, we introduce greater diversity and reduce shortcut learning. Our approach, combined with multi-task learning and the DeBERTa architecture, achieved significant performance gains on the NLI4CT 2024 benchmark compared to the original language models. Ablation studies validate the contribution of each augmentation method in improving robustness. Our best-performing model ranked 12th in terms of faithfulness and 8th in terms of consistency, respectively, out of the 32 participants.
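To make the vocabulary-replacement idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation. The synonym table, the function names (`replace_vocabulary`, `augment`), and the example fields (`premise`, `statement`, `label`) are all assumptions for illustration. The sketch assumes that swapping a biomedical term for an equivalent one leaves the entailment label unchanged.

```python
import random
import re

# Hypothetical synonym table; the paper's actual vocabulary source is not given here.
BIOMEDICAL_SYNONYMS = {
    "myocardial infarction": ["heart attack"],
    "hypertension": ["high blood pressure"],
    "neoplasm": ["tumor"],
}


def replace_vocabulary(statement, synonyms, rng):
    """Return a copy of `statement` with each known term swapped for a synonym."""
    augmented = statement
    for term, alternatives in synonyms.items():
        # Match whole terms only, ignoring case.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags=re.IGNORECASE)
        augmented = pattern.sub(lambda _match: rng.choice(alternatives), augmented)
    return augmented


def augment(example, synonyms, seed=0):
    """Build synthetic examples that keep the premise and label but reword the statement.

    Returns an empty list when no term in the statement can be replaced.
    """
    rng = random.Random(seed)
    new_statement = replace_vocabulary(example["statement"], synonyms, rng)
    if new_statement == example["statement"]:
        return []
    # Assumption: a synonym swap preserves meaning, so the label is copied as-is.
    return [{**example, "statement": new_statement}]


example = {
    "premise": "Cohort 1 reported 3 cases of myocardial infarction.",
    "statement": "Patients with hypertension had a higher rate of myocardial infarction.",
    "label": "Contradiction",
}
synthetic = augment(example, BIOMEDICAL_SYNONYMS)
```

A model that relies on surface wording rather than meaning would tend to change its prediction between `example` and the entry in `synthetic`. Training on both is one way such augmentation could discourage shortcut learning.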
