76.2COApr 27Code
spectroxide: A code package for computing cosmic microwave background spectral distortionsEthan Baker, Hongwan Liu, Siddharth Mishra-Sharma
We present spectroxide, a code package for computing cosmic microwave background spectral distortions in which all ${\sim}14{,}500$ lines of Rust code, Python interface, and ${\sim}400$ automated tests were written by an AI assistant (Claude Code) under human physicist supervision. The solver evolves the photon Boltzmann equation under Compton scattering, double Compton emission, and Bremsstrahlung from $z \sim 5 \times 10^6$ to the present, computing spectral distortions from arbitrary heat and photon injection within this redshift range. No fully open-source code of this kind is publicly available; we validate against analytic limits, published spectra, and publicly available precomputed Green's function tables. We document the development as a case study in AI-assisted scientific computing, highlighting how domain expertise caught physics bugs (incorrect dimensional prefactors, near-cancellation errors) that evaded the full automated test suite, and provide recommendations for best practices in human--AI collaborative development of scientific software. We make spectroxide publicly available on GitHub.
CRApr 30, 2024
Constrained Decoding for Secure Code GenerationYanjun Fu, Ethan Baker, Yu Ding et al.
Code Large Language Models (Code LLMs) have been increasingly used by developers to boost productivity, but they often generate vulnerable code. Thus, there is an urgent need to ensure that code generated by Code LLMs is correct and secure. Previous research has primarily focused on generating secure code, overlooking the fact that secure code also needs to be correct. This oversight can lead to a false sense of security. Currently, the community lacks a method to measure actual progress in this area, and we need solutions that address both security and correctness of code generation. This paper introduces a new benchmark, CodeGuard+, along with two new metrics, to measure Code LLMs' ability to generate both secure and correct code. Using our new evaluation methods, we show that the state-of-the-art defense technique, prefix tuning, may not be as strong as previously believed, since it generates secure code but sacrifices functional correctness. We also demonstrate that different decoding methods significantly affect the security of Code LLMs. Furthermore, we explore a new defense direction: constrained decoding for secure code generation. We propose new constrained decoding techniques to generate secure code. Our results reveal that constrained decoding is more effective than prefix tuning to improve the security of Code LLMs, without requiring a specialized training dataset. Moreover, our evaluations over eight state-of-the-art Code LLMs show that constrained decoding has strong performance to improve the security of Code LLMs, and our technique outperforms GPT-4.