Do LLMs Work on Charts? Designing Few-Shot Prompts for Chart Question Answering and Summarization
This work aims to make chart analysis more accessible and generalizable for users by leveraging LLMs without expensive fine-tuning. The contribution is incremental, since it builds on existing prompting techniques.
The authors apply large language models (LLMs) to chart-related tasks such as question answering and summarization, which require handling both the underlying data and the chart's visual features. They propose PromptChart, a multimodal few-shot prompting framework that achieves state-of-the-art results on the benchmarks.
A number of tasks, such as chart QA and summarization, have been proposed recently to facilitate easy access to charts. The dominant paradigm for solving these tasks has been to fine-tune a pretrained model on the task data. However, this approach is not only expensive but also does not generalize to unseen tasks. In contrast, large language models (LLMs) have shown impressive generalization to unseen tasks with zero- or few-shot prompting. Applying them to chart-related tasks is not trivial, because these tasks typically involve not only the underlying data but also the visual features in the chart image. We propose PromptChart, a multimodal few-shot prompting framework with LLMs for chart-related applications. By analyzing the tasks carefully, we have come up with a set of prompting guidelines for each task to elicit the best few-shot performance from LLMs. We further propose a strategy to inject visual information into the prompts. Our experiments on three different chart-related information consumption tasks show that, with properly designed prompts, LLMs can excel on the benchmarks, achieving state-of-the-art results.
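To make the idea of a few-shot chart prompt with injected visual information concrete, the following is a minimal Python sketch. It is not PromptChart's actual prompt format, which the text above does not specify. The names `ChartExample`, `linearize_table`, `render_example`, and `build_few_shot_prompt`, the field layout, and the choice to describe visual features as plain-text attribute lines are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChartExample:
    """One chart instance. All field names are hypothetical, not from the paper."""
    title: str
    table: List[List[str]]                 # header row followed by data rows
    visual: Dict[str, str] = field(default_factory=dict)  # e.g. series -> color/mark
    question: str = ""
    answer: str = ""                       # left empty for the query chart


def linearize_table(table: List[List[str]]) -> str:
    """Flatten a table into one 'a | b | c' line per row."""
    return "\n".join(" | ".join(str(cell) for cell in row) for row in table)


def render_example(ex: ChartExample, include_answer: bool = True) -> str:
    """Render a chart as text: title, data table, visual attributes, question."""
    lines = [f"Title: {ex.title}", "Data:", linearize_table(ex.table)]
    if ex.visual:
        # Assumed injection strategy: state visual features as plain-text lines.
        lines.append("Visual attributes:")
        lines.extend(f"- {name}: {value}" for name, value in ex.visual.items())
    lines.append(f"Question: {ex.question}")
    lines.append(f"Answer: {ex.answer}" if include_answer else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(instruction: str,
                          demos: List[ChartExample],
                          query: ChartExample) -> str:
    """Concatenate the instruction, solved demonstrations, and the unsolved query."""
    blocks = [instruction]
    blocks += [render_example(d) for d in demos]
    blocks.append(render_example(query, include_answer=False))
    return "\n\n".join(blocks)
```

In this sketch, each demonstration carries its answer while the query ends with an open "Answer:" line for the model to complete. Keeping the visual attributes as a separate text section is one simple way to let a text-only LLM answer questions that refer to colors or marks; other encodings are equally plausible.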