PLAISE · Jan 31, 2023

FLAME: A small language model for spreadsheet formulas

Stanford
arXiv:2301.13779v2 · 22 citations · h-index: 65
AI Analysis

This work addresses the challenge of bringing AI formula assistance to spreadsheet end-users by offering a smaller, more efficient, domain-specific model. Its contribution is incremental, however, because it targets a niche application.

The authors address the cost and size of the large language models used for spreadsheet formula assistance by introducing FLAME, a small (60M-parameter) transformer trained on Excel formulas. FLAME outperforms much larger models such as Codex and CodeT5 in 10 of 14 evaluation settings for formula repair and completion.

Spreadsheets are a vital tool for end-user data management. Using large language models for formula authoring assistance in these environments can be difficult, as these models are expensive to train and challenging to deploy due to their size (up to billions of parameters). We present FLAME, a transformer-based model trained exclusively on Excel formulas that leverages domain insights to achieve competitive performance while being substantially smaller (60M parameters) and trained on two orders of magnitude less data. We curate a training dataset using sketch deduplication, introduce an Excel-specific formula tokenizer, and use domain-specific versions of masked span prediction and noisy auto-encoding as pre-training objectives. We evaluate FLAME on formula repair, formula completion, and similarity-based formula retrieval. FLAME can outperform much larger models, such as CodeT5 (220M) and the Davinci (175B) and Cushman (12B) variants of Codex, in 10 of 14 evaluation settings for the repair and completion tasks. For formula retrieval, FLAME outperforms CodeT5, CodeBERT, and GraphCodeBERT.
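The abstract mentions two data-side ideas: an Excel-specific tokenizer and sketch deduplication. The following is a minimal illustrative sketch, not FLAME's implementation. It assumes that a formula's "sketch" keeps function names and operators while replacing cell references and literals with placeholders, and that deduplication keeps one formula per distinct sketch. The regex, the token kinds, the placeholder strings, and the functions `tokenize`, `sketch`, and `sketch_dedup` are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, simplified token patterns (not FLAME's real tokenizer).
# Order matters: FUNC is tried before CELL so "LOG10(" is a function, not a cell.
TOKEN_RE = re.compile(
    r"""
    (?P<STRING>"(?:[^"]|"")*")                               # string literal, doubled quote escapes
  | (?P<FUNC>[A-Z][A-Z0-9.]*(?=\())                          # name immediately followed by (
  | (?P<BOOL>TRUE|FALSE)
  | (?P<CELL>\$?[A-Z]{1,3}\$?\d+(?::\$?[A-Z]{1,3}\$?\d+)?)  # A1 or A1:B2 reference
  | (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<OP><>|<=|>=|[-+*/^&=<>(),:;%])
  | (?P<WS>\s+)
    """,
    re.VERBOSE,
)

# Hypothetical placeholders: which token kinds get abstracted away in a sketch.
PLACEHOLDERS = {"CELL": "<CELL>", "NUMBER": "<NUM>", "STRING": "<STR>", "BOOL": "<BOOL>"}


def tokenize(formula: str) -> List[Tuple[str, str]]:
    """Split an Excel formula into (kind, text) tokens, dropping whitespace."""
    body = formula[1:] if formula.startswith("=") else formula
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(body):
        m = TOKEN_RE.match(body, pos)
        if m is None:
            raise ValueError(f"unrecognized input at {pos}: {body[pos:]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def sketch(formula: str) -> str:
    """Replace references and literals with placeholders; keep functions and operators."""
    return " ".join(PLACEHOLDERS.get(kind, text) for kind, text in tokenize(formula))


def sketch_dedup(formulas: List[str]) -> List[str]:
    """Keep the first formula seen for each distinct sketch (dicts preserve insertion order)."""
    seen: Dict[str, str] = {}
    for f in formulas:
        seen.setdefault(sketch(f), f)
    return list(seen.values())


if __name__ == "__main__":
    corpus = [
        "=SUM(A1:A10)",
        "=SUM(B2:B20)",             # same sketch as the first formula
        '=IF(C3>0, "yes", "no")',
        '=IF(D7>5, "hi", "lo")',    # same sketch as the third formula
        "=ROUND(AVERAGE(A1:A5), 2)",
    ]
    print(sketch("=SUM(A1:A10)"))  # SUM ( <CELL> )
    print(sketch_dedup(corpus))
```

Under these assumptions, deduplicating by sketch rather than by exact string collapses formulas that differ only in which cells or constants they use. This keeps the training set from being dominated by near-copies of the same formula.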
