Alex Potanin

2papers

2 Papers

39.1SEApr 7
A Survey of Algorithm Debt in Machine and Deep Learning Systems: Definition, Smells, and Future Work

Emmanuel Iko-Ojo Simon, Chirath Hettiarachchi, Fatemeh Fard et al.

The adoption of Machine and Deep Learning (ML/DL) technologies introduces maintenance challenges, leading to Technical Debt (TD). Algorithm Debt (AD) is a TD type that impacts the performance and scalability of ML/DL systems. A review of 42 primary studies expanded AD's definition, uncovered its implicit presence, identified its smells, and highlighted future directions. These findings will guide an AD-focused study, enhancing the reliability of ML/DL systems.

29.6SEApr 19
React-ing to Grace Hopper 200: Five Open-Weights Coding Models, One React Native App, One GH200, One Weekend

Alex Potanin

We evaluate five state-of-the-art open-weights coding language models -- Kimi-K2.5 (at Q3 and Q4 quantizations), GLM-5.1, Qwen3-Coder-480B, and DeepSeek-V3.2 -- on a single multi-file React Native application generation task on NVIDIA GH200 576 GB hardware. The task specifies authentication, per-user per-day counting, and web compatibility, and is evaluated on whether the generated project runs out-of-the-box and on feature-level correctness. We find that SWE-Bench rankings do not predict task performance: Kimi-K2.5 at aggressive 3-bit quantization (UD-Q3_K_XL, 480 GB) produces the most complete and specification-compliant output, outranking models with substantially higher SWE-Bench Pro scores. We document three novel deployment findings: (1) default temperature=0 in coding tools causes sampling hangs with reasoning-model architectures, (2) reasoning-model thinking traces can leak through integration tools' file-path parsers, and (3) web-platform adaptation of native-mobile APIs is a universal training-data gap across every model tested. We also map the hardware-tier structure of April 2026 open-weights coding models, identifying two architectural schools and showing that the efficiency school (10-15 B active parameters) delivers equivalent SWE-Bench results at roughly 1/7th the hardware cost of the scale school (32-40 B active parameters).