Representing Numbers in NLP: a Survey and a Vision
This work addresses numeracy in NLP for researchers and practitioners and offers a structured approach to improving number representation. It is incremental, however, since it synthesizes existing methods rather than introducing new ones.
The paper addresses the underrepresentation of numbers in NLP systems by proposing a taxonomy of numeracy tasks and analyzing existing number encoders and decoders, resulting in a framework for improved number handling and evaluation.
NLP systems rarely give special consideration to numbers found in text. This starkly contrasts with the consensus in neuroscience that, in the brain, numbers are represented differently from words. We arrange recent NLP work on numeracy into a comprehensive taxonomy of tasks and methods. We break down the subjective notion of numeracy into 7 subtasks, arranged along two dimensions: granularity (exact vs. approximate) and units (abstract vs. grounded). We analyze the myriad representational choices made by 18 previously published number encoders and decoders. We synthesize best practices for representing numbers in text and articulate a vision for holistic numeracy in NLP, comprising design trade-offs and a unified evaluation.
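The abstract does not list the surveyed encoders. Purely as a hedged illustration of what a "representational choice" for numbers can look like, the sketch below contrasts two generic options for presenting a number such as 1234.5 to a model: as a sequence of digit characters, or as a mantissa/exponent pair in scientific notation. The function names `digit_tokens` and `scientific` are hypothetical and are not taken from any of the 18 systems the paper analyzes.

```python
import math


def digit_tokens(x: str) -> list[str]:
    """Hypothetical digit-level view: split a number string into single characters."""
    return list(x)


def scientific(x: float) -> tuple[float, int]:
    """Hypothetical real-valued view: return (mantissa, exponent) such that
    x == mantissa * 10**exponent and 1 <= |mantissa| < 10 for non-zero x."""
    if x == 0:
        return 0.0, 0
    exponent = math.floor(math.log10(abs(x)))
    mantissa = x / 10 ** exponent
    # Guard against floating-point rounding pushing the mantissa out of [1, 10).
    if abs(mantissa) >= 10:
        mantissa /= 10
        exponent += 1
    elif abs(mantissa) < 1:
        mantissa *= 10
        exponent -= 1
    return mantissa, exponent


print(digit_tokens("1234.5"))
print(scientific(1234.5))
```

The digit view keeps the exact value recoverable from the token sequence, whereas the exponent alone conveys only a coarse order of magnitude. This loosely echoes the exact vs. approximate granularity dimension described above, though the paper's own treatment of these trade-offs may differ.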