CL, Jul 1, 2024

How to Leverage Digit Embeddings to Represent Numbers?

Jasivan Alex Sivakumar, Nafise Sadat Moosavi

arXiv:2407.00894v2, 12.921 citations, h-index: 16, Has Code

Originality: Incremental advance

AI Analysis

The paper addresses a specific problem in numerical reasoning for language models, but the advance is incremental because it builds on existing digit embedding techniques.

The paper tackles the challenge of representing numbers in language models by using mathematical priors to compute aggregated digit embeddings and incorporating them into transformer models, showing compatibility with any pretrained model and ease of implementation.

Within numerical reasoning, understanding numbers themselves is still a challenge for existing language models. Simple generalisations, such as solving 100+200 instead of 1+2, can substantially affect model performance (Sivakumar and Moosavi, 2023). Among various techniques, character-level embeddings of numbers have emerged as a promising approach to improve number representation. However, this method has limitations as it leaves the task of aggregating digit representations to the model, which lacks direct supervision for this process. In this paper, we explore the use of mathematical priors to compute aggregated digit embeddings and explicitly incorporate these aggregates into transformer models. This can be achieved either by adding a special token to the input embeddings or by introducing an additional loss function to enhance correct predictions. We evaluate the effectiveness of incorporating this explicit aggregation, analysing its strengths and shortcomings, and discuss future directions to better benefit from this approach. Our methods, while simple, are compatible with any pretrained model, easy to implement, and have been made publicly available.
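The aggregation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which mathematical prior is used, so the position-based weighting below (weight 10^-i for the i-th digit from the left, then normalised) is an assumption, as are the toy one-hot embedding table and the hypothetical helper names `aggregate_digit_embeddings` and `embed_with_aggregate`. The sketch computes one aggregate vector per run of digit tokens and inserts it in front of that run, which is one possible reading of "adding a special token to the input embeddings".

```python
from typing import Dict, List

Vector = List[float]
DIGITS = set("0123456789")

# Toy embedding table (assumption): one-hot vectors for the digits 0-9 and "+".
VOCAB = list("0123456789") + ["+"]
TABLE: Dict[str, Vector] = {
    tok: [1.0 if k == i else 0.0 for k in range(len(VOCAB))]
    for i, tok in enumerate(VOCAB)
}


def aggregate_digit_embeddings(digits: str, table: Dict[str, Vector]) -> Vector:
    """Weighted sum of per-digit embeddings for one number.

    Assumed prior: the i-th digit from the left gets raw weight 10**-i,
    so more significant digits contribute more; weights are normalised to sum to 1.
    """
    raw = [10.0 ** -i for i in range(len(digits))]
    total = sum(raw)
    weights = [w / total for w in raw]
    dim = len(table[digits[0]])
    out = [0.0] * dim
    for w, d in zip(weights, digits):
        for k, v in enumerate(table[d]):
            out[k] += w * v
    return out


def embed_with_aggregate(tokens: List[str], table: Dict[str, Vector]) -> List[Vector]:
    """Embed a token sequence, inserting one aggregate vector before each digit run."""
    out: List[Vector] = []
    i = 0
    while i < len(tokens):
        if tokens[i] in DIGITS:
            # Find the end of the maximal run of single-digit tokens.
            j = i
            while j < len(tokens) and tokens[j] in DIGITS:
                j += 1
            number = "".join(tokens[i:j])
            out.append(aggregate_digit_embeddings(number, table))  # extra "special token"
            out.extend(table[t] for t in tokens[i:j])              # original digit embeddings
            i = j
        else:
            out.append(table[tokens[i]])
            i += 1
    return out


# Usage: "100+2" tokenised at character level gains two aggregate vectors.
sequence = embed_with_aggregate(list("100+2"), TABLE)
```

In a real model the aggregate would be built from the pretrained model's own digit embeddings, not one-hot vectors, and the same aggregate could alternatively serve as a target for the additional loss that the abstract mentions.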

