Neural Architecture Codesign for Fast Physics Applications
This work aims to make ML model design more accessible to physics researchers. The contribution is incremental: it combines existing techniques, namely neural architecture search and network compression.
The authors reduce the ML expertise needed to design models for physics applications by developing a neural architecture codesign pipeline. In two case studies, Bragg peak finding and jet classification, the resulting models achieve improved accuracy, smaller latencies, or reduced resource utilization relative to the baseline models.
We develop a pipeline that streamlines neural architecture codesign for physics applications, reducing the ML expertise needed to design models for novel tasks. Our method combines neural architecture search and network compression in a two-stage approach to discover hardware-efficient models. A global search stage explores a wide range of architectures under hardware constraints, and a local search stage then fine-tunes and compresses the most promising candidates. The resulting models exceed baseline performance on various tasks, and model compression techniques such as quantization-aware training and neural network pruning provide further speedup. We convert the best models to high-level synthesis code for FPGA deployment with the hls4ml library. Additionally, our hierarchical search space provides greater flexibility in optimization and can be readily extended to other tasks and domains. We demonstrate the pipeline with two case studies, Bragg peak finding in materials science and jet classification in high energy physics, achieving models with improved accuracy, smaller latencies, or reduced resource utilization relative to the baseline models.
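The two-stage idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' pipeline: the search space, the `proxy_accuracy` formula, the parameter-count budget used as a hardware constraint, and all function names are hypothetical. In particular, simple post-training quantization stands in here for quantization-aware training, and no real model is trained.

```python
import random

# Hypothetical search space; the real pipeline's hierarchical space is richer.
SEARCH_SPACE = {"n_layers": [1, 2, 3, 4], "width": [8, 16, 32, 64]}


def param_count(arch, n_inputs=16, n_outputs=5):
    """Count weights and biases of a dense network (a crude hardware-cost proxy)."""
    sizes = [n_inputs] + [arch["width"]] * arch["n_layers"] + [n_outputs]
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


def proxy_accuracy(arch):
    """Toy stand-in for validation accuracy; a real search would train the model."""
    capacity = arch["n_layers"] * arch["width"]
    return capacity / (capacity + 32.0)


def global_search(n_trials, max_params, seed=0):
    """Stage 1: sample architectures at random, keep those within the budget."""
    rng = random.Random(seed)
    feasible = []
    for _ in range(n_trials):
        arch = {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}
        if param_count(arch) <= max_params:
            feasible.append(arch)
    # Best proxy accuracy first.
    feasible.sort(key=proxy_accuracy, reverse=True)
    return feasible


def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize(weights, bits):
    """Uniform symmetric quantization (post-training, not quantization-aware)."""
    max_abs = max(abs(w) for w in weights) or 1.0
    levels = 2 ** (bits - 1) - 1
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def local_search(weights, sparsities=(0.0, 0.5), bit_widths=(8, 4)):
    """Stage 2: enumerate pruned and quantized variants of one candidate."""
    return [
        (s, b, quantize(magnitude_prune(weights, s), b))
        for s in sparsities
        for b in bit_widths
    ]


if __name__ == "__main__":
    candidates = global_search(n_trials=50, max_params=2000)
    best = candidates[0]
    print("best architecture:", best, "params:", param_count(best))
    rng = random.Random(1)
    # Random numbers stand in for the trained weights of the best candidate.
    weights = [rng.uniform(-1, 1) for _ in range(8)]
    for sparsity, bits, compressed in local_search(weights):
        print(sparsity, bits, [round(w, 3) for w in compressed])
```

In a real codesign flow, the proxy functions would be replaced by actual training and by latency or resource estimates from the target hardware toolchain, and the selected model would then be passed to a tool such as hls4ml for FPGA synthesis.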