CL · AI · LO · Oct 8, 2022

Semantic Representations of Mathematical Expressions in a Continuous Vector Space

arXiv:2211.08142v3 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the challenge of semantic representation for mathematical notation in STEM literature. It is an incremental advance, building on sequence-to-sequence methods and applying them to a specific domain.

The paper tackles the problem of representing mathematical expressions in a continuous vector space, since methods designed for natural text are inadequate given the precision and sensitivity of mathematical notation. It proposes a sequence-to-sequence encoder trained on equivalent expressions and shows that it outperforms a structural approach at capturing semantics.
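Once an encoder maps each expression to a fixed-length vector, "capturing semantics" is usually checked by asking whether equivalent expressions land close together, for example under cosine similarity with nearest-neighbour lookup. The sketch below shows only that comparison step. It assumes the embeddings already exist; the vectors are made-up placeholders, not outputs of the paper's encoder, and `nearest_neighbour` is a hypothetical helper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbour(query_key, index):
    """Return the other expression whose embedding is most similar to index[query_key]."""
    query = index[query_key]
    candidates = (k for k in index if k != query_key)
    return max(candidates, key=lambda k: cosine(query, index[k]))

# Hypothetical embeddings: placeholder numbers chosen for illustration only.
index = {
    r"\sin^2 x + \cos^2 x": [0.9, 0.1, 0.0],
    "x^2 - 1": [0.1, 0.8, 0.3],
    "(x - 1)(x + 1)": [0.12, 0.79, 0.31],
}

print(nearest_neighbour("x^2 - 1", index))  # → (x - 1)(x + 1)
```

With a good encoder, the nearest neighbour of an expression should be a mathematically equivalent one even when the two look different on the page, which is the property the paper compares against a layout-based (structural) embedding.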

Mathematical notation makes up a large portion of STEM literature, yet finding semantic representations for formulae remains a challenging problem. Because mathematical notation is precise, and its meaning changes significantly with small character shifts, the methods that work for natural text do not necessarily work well for mathematical expressions. This work describes an approach for representing mathematical expressions in a continuous vector space. We use the encoder of a sequence-to-sequence architecture, trained on visually different but mathematically equivalent expressions, to generate vector representations (or embeddings). We compare this approach with a structural approach that considers visual layout to embed an expression and show that our proposed approach is better at capturing mathematical semantics. Finally, to expedite future research, we publish a corpus of equivalent transcendental and algebraic expression pairs.
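To make "visually different but mathematically equivalent" concrete, the sketch below spot-checks a candidate pair by evaluating both sides at random points. This is a heuristic (agreement at sampled points suggests, but does not prove, equivalence) and is not described as the method used to build the published corpus. It assumes single-variable expressions written in Python syntax rather than LaTeX; `probably_equivalent` is a hypothetical helper.

```python
import math
import random

def probably_equivalent(expr_a, expr_b, trials=50, tol=1e-9, seed=0):
    """Heuristically test whether two expressions in x agree numerically.

    Both expressions are evaluated at random points; any disagreement beyond
    `tol` means they are not equivalent. Note: eval is unsafe on untrusted
    input even with builtins removed, so use this only on trusted strings.
    """
    rng = random.Random(seed)
    # Expose a few math functions by their bare names.
    funcs = {name: getattr(math, name) for name in ("sin", "cos", "tan", "exp", "log", "sqrt")}
    for _ in range(trials):
        x = rng.uniform(0.1, 2.0)  # positive range keeps log and sqrt defined
        env = dict(funcs, x=x)
        a = eval(expr_a, {"__builtins__": {}}, env)
        b = eval(expr_b, {"__builtins__": {}}, env)
        if not math.isclose(a, b, rel_tol=tol, abs_tol=tol):
            return False
    return True

print(probably_equivalent("sin(x)**2 + cos(x)**2", "1"))  # → True
print(probably_equivalent("(x + 1)**2", "x**2 + 1"))      # → False
```

A transcendental pair such as `sin(x)**2 + cos(x)**2` and `1`, or an algebraic pair such as `x**2 - 1` and `(x - 1)*(x + 1)`, passes this check, while a near-miss like `(x + 1)**2` versus `x**2 + 1` fails; that sensitivity to small changes is exactly why notation is hard to embed.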

Code Implementations: 1 repo
