OR-Toolformer: Modeling and Solving Operations Research Problems with Tool Augmented Large Language Models
This work addresses the privacy and cost issues of applying LLMs to operations research problems and offers an approach that may generalize to other domain-specific applications. Its contribution is incremental, since it builds on existing tool-augmented fine-tuning methods.
The paper tackles the challenge of using large language models for operations research tasks without the privacy and cost drawbacks of closed-source APIs or training from scratch. It introduces OR-Toolformer, which fine-tunes an open-source model with tool augmentation. The model achieves up to 80.1% execution accuracy on standard benchmarks and a 21 percentage-point improvement over the strongest baseline in zero-shot evaluation on unseen problem types.
Large language models (LLMs) demonstrate strong mathematical reasoning, but relying on closed-source APIs for operations research (OR) tasks raises privacy concerns, and training open-source models from scratch incurs high compute costs. We introduce OR-Toolformer, which fine-tunes Llama-3.1-8B-Instruct on data from a semi-automatic synthesis pipeline that generates diverse OR problem-answer pairs, and augments the model with external solvers that it invokes through generated API calls. On three of four standard benchmarks, OR-Toolformer achieves up to 80.1% execution accuracy, exceeding size-matched baselines by over 4.3%. In zero-shot evaluation on two unseen OR problem types, it attains 54% average accuracy, a 21 percentage-point improvement over the strongest baseline. These findings validate the efficacy of tool-augmented fine-tuning of LLMs for accurate and generalizable OR problem modeling and solving.
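The tool-augmented workflow described above can be illustrated with a minimal sketch: the model emits an API call, an external solver executes it, and the result is compared with a reference answer. Everything named here is an assumption made for illustration and is not taken from the paper: the JSON call format, the `TOOLS` registry, the brute-force `solve_assignment` stand-in for a real solver, and the reading of "execution accuracy" as the fraction of problems whose executed call returns the reference objective value.

```python
import json
from itertools import permutations


def solve_assignment(cost):
    """Brute-force minimum-cost assignment (toy stand-in for an external solver)."""
    n = len(cost)
    # Try every worker-to-task permutation and keep the cheapest one.
    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[i][p[i]] for i in range(n)),
    )
    objective = sum(cost[i][best[i]] for i in range(n))
    return {"objective": objective, "assignment": list(best)}


# Hypothetical tool registry mapping API names to solver functions.
TOOLS = {"solve_assignment": solve_assignment}


def execute_call(model_output):
    """Parse a JSON API call emitted by the model and run it.

    Returns None when the call is malformed or names an unknown tool.
    """
    try:
        call = json.loads(model_output)
        return TOOLS[call["api"]](**call["args"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return None


def execution_accuracy(outputs, references, tol=1e-6):
    """Fraction of outputs whose executed objective matches the reference."""
    hits = 0
    for out, ref in zip(outputs, references):
        result = execute_call(out)
        if result is not None and abs(result["objective"] - ref) <= tol:
            hits += 1
    return hits / len(references)


# Assumed model outputs: one well-formed call and one malformed string.
good_call = json.dumps(
    {"api": "solve_assignment", "args": {"cost": [[4, 1, 3], [2, 0, 5], [3, 2, 2]]}}
)
bad_call = "solve_assignment(cost=...)"  # not valid JSON, so it counts as a miss

accuracy = execution_accuracy([good_call, bad_call], [5, 5])
```

In this sketch a malformed call counts as a failure rather than raising an error, so one bad generation lowers the score without aborting the evaluation. A real system would replace the brute-force function with a dedicated solver, since enumerating permutations is feasible only for very small instances.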