Hierarchical Attention Generates Better Proofs
This addresses the challenge of improving theorem proving accuracy and efficiency for AI systems, representing an incremental advance in method design.
The paper tackles the problem of LLMs failing to capture hierarchical structures in mathematical proofs by introducing Hierarchical Attention, a regularization method that improves proof success rates by 2.05% on miniF2F and 1.69% on ProofNet while reducing complexity by 23.81% and 16.50% respectively.
Large language models (LLMs) have shown promise in formal theorem proving, but their token-level processing often fails to capture the inherent hierarchical nature of mathematical proofs. We introduce \textbf{Hierarchical Attention}, a regularization method that aligns LLMs' attention mechanisms with mathematical reasoning structures. Our approach establishes a five-level hierarchy from foundational elements to high-level concepts, ensuring structured information flow in proof generation. Experiments demonstrate that our method improves proof success rates by 2.05\% on miniF2F and 1.69\% on ProofNet while reducing proof complexity by 23.81\% and 16.50\% respectively. The code is available at https://github.com/Car-pe/HAGBP.