ProcessTBench: An LLM Plan Generation Dataset for Process Mining
ProcessTBench gives researchers a new benchmark for studying LLMs in process mining scenarios, though the contribution is incremental because it builds on the existing TaskBench dataset.
The authors tackled the lack of complex datasets for evaluating LLMs in plan generation by introducing ProcessTBench, a synthetic dataset that extends TaskBench to support features like paraphrased queries, multiple languages, and parallel actions.
Large Language Models (LLMs) have shown significant promise in plan generation. Yet existing datasets often lack the complexity needed for advanced tool-use scenarios, such as handling paraphrased query statements, supporting multiple languages, and managing actions that can be done in parallel. These scenarios are crucial for evaluating the evolving capabilities of LLMs in real-world applications. Moreover, current datasets do not enable the study of LLMs from a process perspective, particularly where it matters to understand typical behaviors and challenges in executing the same process under different conditions or formulations. To address these gaps, we present ProcessTBench, a synthetic dataset that extends TaskBench and is specifically designed to evaluate LLMs within a process mining framework.