Structure-Preserving Graph Contrastive Learning for Mathematical Information Retrieval
This work provides an incremental improvement for researchers and practitioners working on mathematical information retrieval by offering a more effective graph augmentation strategy.
This paper addresses the challenge of applying graph contrastive learning (GCL) to mathematical formula retrieval by introducing Variable Substitution, a domain-specific graph augmentation technique. This method preserves the algebraic relationships and structure of mathematical formulas, leading to significant improvements in retrieval performance compared to generic augmentation strategies.
This paper introduces Variable Substitution, a domain-specific graph augmentation technique for graph contrastive learning (GCL) applied to mathematical formula retrieval. Standard GCL augmentation techniques often distort the semantic meaning of mathematical formulas, particularly because formula graphs are small and highly structured. Variable Substitution, in contrast, preserves the core algebraic relationships and formula structure. To demonstrate the effectiveness of our technique, we apply it to a classic GCL-based retrieval model. Experiments show that this straightforward approach significantly improves retrieval performance compared to generic augmentation strategies. We release the code on GitHub: https://github.com/lazywulf/formula_ret_aug
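To make the idea concrete, the following is a minimal sketch of what a variable-substitution augmentation could look like. It is not taken from the released code. The graph representation (a dictionary of typed nodes plus an edge list), the function name `substitute_variables`, and the symbol pool are all assumptions made for illustration. The sketch renames each distinct variable consistently and leaves operators, numbers, and edges untouched, so the augmented view keeps the same algebraic structure as the original.

```python
import random

# Hypothetical formula graph: node id -> (kind, label), plus directed edges.
# This representation is an assumption for illustration, not the paper's format.
def substitute_variables(nodes, edges, pool, rng):
    """Return an augmented view in which every distinct variable is renamed consistently."""
    variables = sorted({label for kind, label in nodes.values() if kind == "var"})
    # Sample without replacement so that distinct variables stay distinct.
    replacements = rng.sample(pool, len(variables))
    mapping = dict(zip(variables, replacements))
    new_nodes = {
        nid: (kind, mapping[label] if kind == "var" else label)
        for nid, (kind, label) in nodes.items()
    }
    # Edges are copied unchanged, so the operator structure is preserved.
    return new_nodes, list(edges), mapping

# Operator tree for x^2 + x*y (illustrative example).
nodes = {
    0: ("op", "+"),
    1: ("op", "^"),
    2: ("var", "x"),
    3: ("num", "2"),
    4: ("op", "*"),
    5: ("var", "x"),
    6: ("var", "y"),
}
edges = [(0, 1), (1, 2), (1, 3), (0, 4), (4, 5), (4, 6)]

view_nodes, view_edges, mapping = substitute_variables(
    nodes, edges, pool=["a", "b", "c", "u", "v"], rng=random.Random(0)
)
```

In a contrastive setup, the original graph and `view_nodes`/`view_edges` would form a positive pair. Unlike generic edge dropping or node masking, this view cannot turn the formula into a different expression, since both occurrences of `x` receive the same new symbol and `y` receives a different one.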