SE, AI, CL | Jul 4, 2024

Narrow Transformer: StarCoder-Based Java-LM For Desktop

arXiv:2407.03941v2 | 1 citation | h-index: 2 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for small, efficient code models that can run on desktops without specialized hardware, with a focus on Java. The contribution is incremental, since it builds on existing models and benchmarks.

The paper tackles the problem of deploying code language models on developer desktops by developing NT-Java-1.1B, a specialized Java model based on StarCoderBase-1.1B, which achieves state-of-the-art performance on the MultiPL-E Java benchmark, surpassing its base model and most similar-sized models.

This paper presents NT-Java-1.1B, an open-source specialized code language model built on StarCoderBase-1.1B and designed for coding tasks in Java. NT-Java-1.1B achieves state-of-the-art performance, surpassing its base model and the majority of other models of similar size on the MultiPL-E Java code benchmark. While there have been studies on extending large, generic pre-trained models to improve proficiency in specific programming languages such as Python, similar investigations of small code models for other programming languages are lacking. Large code models require specialized hardware such as GPUs for inference, which highlights the need for research into small code models that can be deployed on developer desktops. This paper addresses that gap by developing a small Java code model, NT-Java-1.1B, and its quantized versions, which perform comparably to open models of around 1.1B parameters on MultiPL-E Java code benchmarks, making them well suited to desktop deployment. The paper also establishes the foundation for a family of NT Models: specialized models across languages and sizes.
