FPMoE: A Sparse Mixture-of-Experts Approach to Functional Code Generation

Loc Pham, Lang Hong Nguyet Anh, Thanh Le-Cong

arXiv:2605.27849 · 62.6 · h-index: 12 · Has Code

Predicted impact: top 10% in PL · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the underperformance of LLMs on functional programming languages by providing a lightweight, open-source model that outperforms fine-tuned baselines, benefiting developers and researchers working with Haskell, OCaml, and Scala.

FPMoE introduces a sparse Mixture-of-Experts model for functional code generation that uses language-specific experts and a shared expert to avoid cross-language interference while capturing shared functional abstractions. It achieves performance comparable to models 2-10x larger, matching DeepSeek-Coder-6.7B, Qwen2.5-Coder-14B-Instruct, and Qwen3-Coder-30B-A3B with only 3B active parameters.

Despite rapid progress in LLM-based code generation, existing models are predominantly trained on imperative languages, leaving functional programming languages (FPLs) such as Haskell, OCaml, and Scala chronically underexplored, with even frontier models performing substantially worse on FPLs. Fine-tuning is a natural remedy, but our experiments show that per-language fine-tuning fails to capture shared functional abstractions, while merged multi-language fine-tuning introduces cross-language interference. To address this, we introduce FPMoE, a lightweight, open-source code generation model built on a sparse Mixture-of-Experts (MoE) architecture with three language-specific routed experts (one each for Haskell, OCaml, and Scala) and a shared expert that captures cross-language functional patterns such as monadic reasoning and type-directed programming. This design resolves both failure modes simultaneously: dedicated experts eliminate interference, while the shared expert preserves abstractions that per-language models miss. On FPEval, FPMoE substantially outperforms fine-tuned baselines and, with only 3B active parameters, matches the performance of much larger models including DeepSeek-Coder-6.7B, Qwen2.5-Coder-14B-Instruct, and Qwen3-Coder-30B-A3B.
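The expert layout described in the abstract (one routed expert per language plus a shared expert that is always active) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how tokens are routed or how expert outputs are combined, so the sketch assumes hard routing by an explicit language tag and simple summation of the routed and shared outputs. The names `SharedPlusRoutedMoE` and `make_linear_expert` are hypothetical, and each "expert" is a toy elementwise function standing in for a feed-forward block.

```python
from typing import Callable, Dict, List

Vector = List[float]
Expert = Callable[[Vector], Vector]


def make_linear_expert(scale: float, bias: float) -> Expert:
    """Toy stand-in for an expert block: y_i = scale * x_i + bias."""
    def expert(x: Vector) -> Vector:
        return [scale * v + bias for v in x]
    return expert


class SharedPlusRoutedMoE:
    """Hypothetical layer: one routed expert per language plus a shared expert.

    Assumption (not from the paper): the routed expert is chosen by an
    explicit language tag, and its output is summed with the shared output.
    """

    def __init__(self, routed: Dict[str, Expert], shared: Expert) -> None:
        self.routed = routed
        self.shared = shared

    def forward(self, x: Vector, language: str) -> Vector:
        if language not in self.routed:
            raise KeyError(f"no routed expert for language: {language}")
        # Only the expert for this language runs, so the other languages'
        # experts cannot interfere with the result.
        routed_out = self.routed[language](x)
        # The shared expert runs for every input, regardless of language.
        shared_out = self.shared(x)
        return [r + s for r, s in zip(routed_out, shared_out)]


layer = SharedPlusRoutedMoE(
    routed={
        "haskell": make_linear_expert(2.0, 0.0),
        "ocaml": make_linear_expert(3.0, 0.0),
        "scala": make_linear_expert(4.0, 0.0),
    },
    shared=make_linear_expert(1.0, 0.5),
)

haskell_out = layer.forward([1.0, 2.0], "haskell")
ocaml_out = layer.forward([1.0, 2.0], "ocaml")
```

In this sketch the same input produces different outputs for Haskell and OCaml because a different routed expert is selected, while the shared expert contributes identically to both. A learned gate over the routed experts would be a natural alternative to the tag-based lookup assumed here.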
