Comparison of Image Processing Models in Quark Gluon Jet Classification
This work addresses quark and gluon jet classification in high-energy physics, offering incremental improvements in model efficiency and accuracy for this domain-specific application.
The paper tackles the problem of distinguishing quark and gluon jets in simulated jet images by comparing convolutional and transformer-based models. It finds that fine-tuning only the final two blocks of the Swin-Tiny model achieves 81.4% accuracy and an AUC of 88.9%.
We present a comprehensive comparison of convolutional and transformer-based models for distinguishing quark and gluon jets using simulated jet images from Pythia 8. By encoding jet substructure into a three-channel representation of particle kinematics, we evaluate the performance of convolutional neural networks (CNNs), Vision Transformers (ViTs), and Swin Transformers (Swin-Tiny) under both supervised and self-supervised learning setups. Our results show that fine-tuning only the final two transformer blocks of the Swin-Tiny model achieves the best trade-off between efficiency and accuracy, reaching 81.4% accuracy and an AUC (area under the ROC curve) of 88.9%. Self-supervised pretraining with Momentum Contrast (MoCo) further enhances feature robustness and reduces the number of trainable parameters. These findings highlight the potential of hierarchical attention-based models for jet substructure studies and for domain transfer to real collision data.
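As an illustration of what a three-channel jet image could look like, the sketch below bins jet constituents onto an (eta, phi) grid centred on the jet axis. The abstract does not specify the channels, so the choice here (charged pT, neutral pT, charged-particle count), the 33x33 grid, the half-width of 0.4, and the function name `jet_image` are all assumptions made for this example, not the authors' encoding.

```python
import math


def jet_image(particles, n_pix=33, half_width=0.4):
    """Bin jet constituents into a 3-channel (eta, phi) image.

    particles: iterable of (pt, eta, phi, charge) tuples, with eta and phi
    measured relative to the jet axis.

    Assumed channel layout (hypothetical, not taken from the paper):
      0 = summed pT of charged particles
      1 = summed pT of neutral particles
      2 = number of charged particles
    """
    # image[channel][eta_bin][phi_bin], initialised to zero
    image = [[[0.0] * n_pix for _ in range(n_pix)] for _ in range(3)]
    pix = 2.0 * half_width / n_pix  # pixel size in eta and phi

    for pt, eta, phi, charge in particles:
        # Wrap the azimuthal offset into [-pi, pi)
        phi = (phi + math.pi) % (2.0 * math.pi) - math.pi

        # Skip constituents that fall outside the image window
        if abs(eta) >= half_width or abs(phi) >= half_width:
            continue

        # Map coordinates to pixel indices, clamped against rounding
        i = min(int((eta + half_width) / pix), n_pix - 1)
        j = min(int((phi + half_width) / pix), n_pix - 1)

        if charge != 0:
            image[0][i][j] += pt
            image[2][i][j] += 1.0
        else:
            image[1][i][j] += pt

    return image


# Toy jet: one charged and one neutral constituent inside the window,
# plus one charged constituent far outside it.
toy = [(10.0, 0.0, 0.0, 1), (5.0, 0.1, -0.1, 0), (1.0, 2.0, 0.0, 1)]
img = jet_image(toy)
```

An image of this form can be fed to a CNN, ViT, or Swin model as an ordinary three-channel input; in practice it would usually be normalised (for example by the total jet pT) and resized to the resolution the model expects.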