SODBench: A Large Language Model Approach to Documenting Spreadsheet Operations
This work addresses the lack of systematic documentation for spreadsheets, which hinders automation and collaboration among knowledge workers. The contribution is incremental, since it builds on existing LLM methods for code generation.
The paper tackles the problem of documenting spreadsheet operations by introducing SOD, a task of generating human-readable explanations from spreadsheet code. It evaluates five LLMs on a benchmark of 111 snippets and finds that they can produce accurate documentation.
Many knowledge workers use spreadsheets in business, accounting, and finance. However, the lack of systematic documentation methods for spreadsheets hinders automation, collaboration, and knowledge transfer, and risks the loss of crucial institutional knowledge. This paper introduces Spreadsheet Operations Documentation (SOD), an AI task that involves generating human-readable explanations from spreadsheet operations. Many previous studies have used Large Language Models (LLMs) to generate spreadsheet manipulation code; translating that code into natural language for SOD, however, is a less-explored area. To address this, we present a benchmark of 111 spreadsheet manipulation code snippets, each paired with a corresponding natural language summary. We evaluate five LLMs (GPT-4o, GPT-4o-mini, LLaMA-3.3-70B, Mixtral-8x7B, and Gemma2-9B) using the BLEU, GLEU, ROUGE-L, and METEOR metrics. Our findings suggest that LLMs can generate accurate spreadsheet documentation, making SOD a feasible prerequisite step toward enhancing reproducibility, maintainability, and collaborative workflows in spreadsheets, although challenges remain to be addressed.
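To make the evaluation setup concrete, the following is a minimal sketch of how one of the listed metrics, ROUGE-L, could score a generated summary against a reference summary. It is an illustration under stated assumptions, not the paper's evaluation code: the whitespace tokenization, the `beta` weighting, the helper names `lcs_length` and `rouge_l`, and the example snippet and summaries are all hypothetical and are not taken from the benchmark.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """ROUGE-L F-score from LCS-based precision and recall.

    Assumption: lowercase whitespace tokenization; published toolkits
    may tokenize, stem, or weight precision and recall differently.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


# Hypothetical benchmark-style item: a code snippet paired with a summary.
snippet = 'ws["C1"] = "=SUM(A1:A10)"'
reference = "Write the sum of cells A1 to A10 into cell C1"
candidate = "Store the sum of cells A1 to A10 in cell C1"

score = rouge_l(reference, candidate)
print(f"ROUGE-L for {snippet!r}: {score:.3f}")
```

Because ROUGE-L rewards only in-order token overlap, a paraphrase that is semantically correct but worded differently can score low, which is one reason overlap metrics are usually reported together rather than alone.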