Building an Internal Coding Agent at Zup: Lessons and Open Questions
For enterprise teams building coding agents, the paper highlights that the engineering decisions surrounding the model are more critical to production readiness than model quality.
The paper presents CodeGen, an internal coding agent at Zup, and finds that tool design and safety guardrails improved reliability more than prompt engineering, with progressive human oversight driving adoption.
Enterprise teams building internal coding agents face a gap between prototype performance and production readiness. The root cause is that technical model quality alone is insufficient -- tool design, safety enforcement, state management, and human trust calibration are equally decisive, yet underreported in the literature. We present CodeGen, an internal coding agent at Zup, and show that targeted tool design (e.g., string-replacement edits over full-file rewrites) and layered safety guardrails improved agent reliability more than prompt engineering, while progressive human oversight modes drove organic adoption without mandating trust. These findings suggest that the engineering decisions surrounding the model -- not the model itself -- determine whether a coding agent delivers real value in practice.
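To make the tool-design point concrete, the following is a minimal sketch of what a string-replacement edit tool could look like, as opposed to asking the model to rewrite an entire file. The abstract does not describe CodeGen's actual tool interface, so the names `apply_string_replacement` and `EditError`, and the rule that the target text must match exactly once, are assumptions made for illustration only.

```python
class EditError(ValueError):
    """Raised when a requested edit cannot be applied unambiguously."""


def apply_string_replacement(source: str, old: str, new: str) -> str:
    """Return `source` with the single occurrence of `old` replaced by `new`.

    Hypothetical rule (an assumption, not taken from the paper): the edit is
    rejected unless `old` occurs exactly once, so the agent cannot silently
    change the wrong location.
    """
    if not old:
        raise EditError("the text to replace must not be empty")
    matches = source.count(old)  # counts non-overlapping occurrences
    if matches == 0:
        raise EditError("the text to replace was not found")
    if matches > 1:
        raise EditError(f"the text to replace is ambiguous ({matches} matches)")
    return source.replace(old, new, 1)


if __name__ == "__main__":
    original = "def total(xs):\n    return sum(xs)\n"
    # Only the targeted fragment changes; the rest of the file is untouched.
    print(apply_string_replacement(original, "sum(xs)", "sum(xs, 0.0)"))
```

One plausible reason such a tool helps reliability is that a failed or ambiguous match surfaces as an explicit error the agent can react to, whereas a full-file rewrite can drop or alter unrelated lines without any signal.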