CL, IR | Mar 3, 2025

Cancer Type, Stage and Prognosis Assessment from Pathology Reports using LLMs

Rachit Saluja, Jacob Rosenthal, Yoav Artzi, David J. Pisapia, Benjamin L. Liechty, Mert R. Sabuncu

arXiv:2503.01194v1 | 4.91 citations | h-index: 66 | Has Code | Sci Rep

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for automated analysis of pathology text, which could assist healthcare professionals. Its contribution is incremental: it applies existing LLMs to a new domain without a major methodological breakthrough.

The researchers tackled the problem of extracting cancer type, stage, and prognosis from unstructured pathology reports using large language models (LLMs). They developed two instruction-tuned models that outperformed the other evaluated models on these tasks in a zero-shot setting.

Large Language Models (LLMs) have shown significant promise across various natural language processing tasks. However, their application in the field of pathology, particularly for extracting meaningful insights from unstructured medical texts such as pathology reports, remains underexplored and not well quantified. In this project, we leverage state-of-the-art language models, including the GPT family, Mistral models, and the open-source Llama models, to evaluate their performance in comprehensively analyzing pathology reports. Specifically, we assess their performance in cancer type identification, AJCC stage determination, and prognosis assessment, encompassing both information extraction and higher-order reasoning tasks. Based on a detailed analysis of their performance metrics in a zero-shot setting, we developed two instruction-tuned models: Path-llama3.1-8B and Path-GPT-4o-mini-FT. These models demonstrated superior performance in zero-shot cancer type identification, staging, and prognosis assessment compared to the other models evaluated.
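To make the zero-shot evaluation setting concrete, the following is a minimal sketch of how a stage-determination task might be posed to a model and scored. It is not the authors' pipeline: the prompt wording, the coarse four-stage label set, and the names `build_prompt`, `parse_stage`, and `zero_shot_accuracy` are all assumptions made for illustration, and `model` stands in for any function that maps a prompt string to a response string.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

# Hypothetical prompt template; the prompts used in the paper are not shown here.
PROMPT_TEMPLATE = (
    "You are assisting with pathology report analysis.\n"
    "Read the report and answer with only the AJCC stage "
    "(one of: {labels}).\n\n"
    "Report:\n{report}\n\nAnswer:"
)

# Assumed coarse label set, chosen for illustration only.
STAGE_LABELS: Tuple[str, ...] = ("Stage I", "Stage II", "Stage III", "Stage IV")


def build_prompt(report: str, labels: Iterable[str] = STAGE_LABELS) -> str:
    """Fill the zero-shot template with the report text and allowed labels."""
    return PROMPT_TEMPLATE.format(labels=", ".join(labels), report=report.strip())


def parse_stage(response: str, labels: Sequence[str] = STAGE_LABELS) -> Optional[str]:
    """Map a free-text response to a label, or None if no label appears.

    Labels are tried longest first so that "Stage III" is not
    mistaken for "Stage I" by substring matching.
    """
    text = response.lower()
    for label in sorted(labels, key=len, reverse=True):
        if label.lower() in text:
            return label
    return None


def zero_shot_accuracy(
    model: Callable[[str], str],
    examples: Iterable[Tuple[str, str]],
) -> float:
    """Exact-match accuracy over (report, gold_label) pairs, with no in-context examples."""
    correct = 0
    total = 0
    for report, gold in examples:
        prediction = parse_stage(model(build_prompt(report)))
        correct += int(prediction == gold)
        total += 1
    return correct / total if total else 0.0
```

In practice `model` would wrap a call to a hosted or local LLM. The same structure could be reused for cancer type identification or prognosis assessment by swapping the template and label set.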
