SEAI | Jun 2, 2025

Flow2Code: Evaluating Large Language Models for Flowchart-based Code Generation Capability

arXiv:2506.02073v1 | 9 citations | h-index: 3 | Has Code | ACL
Originality: Synthesis-oriented
AI Analysis

The paper addresses a gap in benchmarking for flowchart-based code generation. The contribution is incremental, since it extends existing code generation evaluation methods.

The paper tackles the problem of evaluating large language models for generating code from flowcharts, presenting Flow2Code as a new benchmark spanning 15 programming languages with 5,622 code segments and 16,866 flowcharts. Results show current LLMs cannot generate code perfectly from flowcharts, but supervised fine-tuning significantly improves performance.

While large language models (LLMs) show promise in code generation, existing benchmarks neglect flowchart-based code generation. To promote further research in this area, this work presents Flow2Code, a novel benchmark for evaluating flowchart-based code generation. The evaluation dataset spans 15 programming languages and includes 5,622 code segments paired with 16,866 flowcharts of three types: code, UML, and pseudocode. Extensive experiments with 13 multimodal LLMs reveal that current LLMs cannot generate code from flowcharts perfectly. In addition, experimental results show that supervised fine-tuning contributes greatly to the models' performance. We publicly release our code and datasets at https://github.com/hml-github/Flow2Code.

