Certification from Examples is Hard for Circuits and Transformers under Minimal Overparametrization
For researchers and practitioners who need exact guarantees on neural network behavior, this work establishes fundamental hardness barriers: even under minimal overparametrization, certifying circuits and Transformers from examples can require certificates of size exponential in the input dimension.
The paper proves that minimal overparametrization (e.g., adding one extra gate to threshold circuits or constant architectural overhead to Transformers) makes exact certification exponentially hard, requiring certificate sizes exponential in input dimension. It also shows that approximate certification with polynomially many mistakes still needs exponential certificates, while constant relative error can hide exponentially many mistakes.
As state-of-the-art neural networks are deployed on reasoning and algorithmic tasks, exactness guarantees become increasingly important. However, high average-case accuracy can still mask inconsistent behaviors. This motivates exact certification, which asks for the smallest set of labeled examples needed to certify that a learned hypothesis equals the target. We show that while some hypotheses are easy to certify, even minimal overparametrization can make certification exponentially hard across several hypothesis classes. For threshold circuits of depth $\ge 2$, adding a single extra gate can force certificate sizes exponential in the input dimension. We show an analogous hardness result for log-precision Transformers with only constant architectural overhead. We also characterize approximate certification, showing that allowing only polynomially many mistakes still requires exponentially large certificates, whereas constant relative-error guarantees can hide exponentially many mistakes. Empirically, we study certification for constructed circuits and for Transformers trained to recognize binary addition. While the constructed circuits instantiate the exponential barrier for certification, our analysis of the trained Transformers shows that imperfect models can evade detection even by large, uniformly sampled certificate candidates.
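To make the notion of a certificate concrete, the following is a minimal sketch on a toy hypothesis class, not the paper's circuit or Transformer constructions. It assumes a finite class over $\{0,1\}^n$ consisting of the all-zero function plus one indicator function per input point; this class, and the names `min_certificate_size`, `make_point`, and `miss_probability`, are illustrative assumptions. A set of inputs is treated as a certificate for a target if every other hypothesis in the class disagrees with the target on at least one input in the set. The last function gives the chance that uniform sampling with replacement misses a fixed set of mistakes, which is one simple way to see why sampled candidates can fail to expose a nearly correct model.

```python
from itertools import combinations, product


def min_certificate_size(target, hypotheses, domain):
    """Brute-force the smallest set of inputs that separates target from all rivals."""
    # Rivals are the hypotheses that differ from the target somewhere on the domain.
    rivals = [h for h in hypotheses if any(h(x) != target(x) for x in domain)]
    for k in range(len(domain) + 1):
        for subset in combinations(domain, k):
            # A certificate must expose a disagreement with every rival.
            if all(any(h(x) != target(x) for x in subset) for h in rivals):
                return k
    return len(domain)


def make_point(p):
    """Indicator function of the single input p (hypothetical toy hypothesis)."""
    return lambda x: int(x == p)


def miss_probability(num_mistakes, domain_size, num_samples):
    """Chance that uniform samples (with replacement) hit none of the mistakes."""
    return (1 - num_mistakes / domain_size) ** num_samples


n = 3
domain = list(product((0, 1), repeat=n))
zero = lambda x: 0
points = [make_point(p) for p in domain]
hypotheses = [zero] + points

# The all-zero target needs every input: each point function differs from it at one place only.
print(min_certificate_size(zero, hypotheses, domain))       # 8, i.e. 2**n
# A point function is certified by the single input where it outputs 1.
print(min_certificate_size(points[0], hypotheses, domain))  # 1
# One mistake among 2**20 inputs is very likely to be missed by 1000 uniform samples.
print(miss_probability(1, 2**20, 1000))
```

In this toy class the same hypothesis space contains targets that are trivial to certify and a target whose smallest certificate is the entire domain, which mirrors in miniature the contrast the abstract draws between easy and exponentially hard cases.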