CL | AI | Nov 16, 2023

LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks

arXiv:2311.09564v1 | 3 citations | h-index: 30 | Has Code
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for systematic evaluation of LLMs on long medical texts such as electronic health records, which is crucial for improving their clinical applicability. The contribution is incremental, since it builds on prior general-domain findings.

The authors introduce LongBoX, a collection of seven medical datasets for evaluating large language models on long-sequence clinical tasks. They find that both medical and general-domain LLMs struggle on this benchmark, and that long-sequence handling techniques yield mixed results.

Many large language models (LLMs) for medicine have largely been evaluated on short texts, and their ability to handle longer sequences such as a complete electronic health record (EHR) has not been systematically explored. Assessing these models on long sequences is crucial since prior work in the general domain has demonstrated performance degradation of LLMs on longer texts. Motivated by this, we introduce LongBoX, a collection of seven medical datasets in text-to-text format, designed to investigate model performance on long sequences. Preliminary experiments reveal that both medical LLMs (e.g., BioGPT) and strong general-domain LLMs (e.g., FLAN-T5) struggle on this benchmark. We further evaluate two techniques designed for long-sequence handling: (i) local-global attention, and (ii) Fusion-in-Decoder (FiD). These experiments show mixed results: while scores on some datasets increase, there is substantial room for improvement. We hope that LongBoX facilitates the development of more effective long-sequence techniques for the medical domain. Data and source code are available at https://github.com/Mihir3009/LongBoX.
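
The abstract names two long-sequence techniques without describing them. As a rough illustration only, the sketch below shows one common form of each idea: a local-global attention mask (a sliding window plus a few globally visible positions) and the chunking step that a Fusion-in-Decoder setup could apply to a long document before encoding each chunk independently. The function names, the window and chunk sizes, and the choice of global positions are all hypothetical and are not taken from the LongBoX code.

```python
def local_global_mask(seq_len, window, global_positions):
    """Build a seq_len x seq_len boolean matrix (hypothetical helper).

    mask[i][j] is True when token i may attend to token j, which holds if
    the two tokens are within `window` positions of each other (local),
    or if either one is a designated global position (global).
    """
    global_set = set(global_positions)
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            is_local = abs(i - j) <= window
            is_global = i in global_set or j in global_set
            mask[i][j] = is_local or is_global
    return mask


def chunk_for_fid(tokens, chunk_size):
    """Split a long token list into consecutive chunks (hypothetical helper).

    In a FiD-style setup, each chunk would be encoded on its own and the
    encoder outputs concatenated, so that the decoder attends over all
    chunks jointly. Only the chunking step is shown here.
    """
    return [tokens[k:k + chunk_size] for k in range(0, len(tokens), chunk_size)]


if __name__ == "__main__":
    # Small example: 6 tokens, window of 1, position 0 treated as global.
    for row in local_global_mask(6, 1, [0]):
        print("".join("x" if allowed else "." for allowed in row))
    print(chunk_for_fid(list(range(5)), 2))
```

Both helpers avoid the quadratic cost of full attention over the whole record in different ways: the mask limits which token pairs interact, while chunking bounds the length any single encoder pass has to process.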

Code Implementations: 1 repo
