Detecting multiword phrases in mathematical text corpora
This work addresses the specific problem of terminology detection in mathematical texts for indexing applications, but it appears incremental as it applies an existing dictionary-based approach to a new domain.
The authors detect multiword phrases in mathematical text corpora with a dictionary-based method built on the Lingo tool. Phrases are identified algorithmically, with potential benefits for indexing and information retrieval.
We present an approach for detecting multiword phrases in mathematical text corpora. The method is based on characteristic features of mathematical terminology. It makes use of a software tool named Lingo, which identifies words by means of previously defined dictionaries for specific word classes such as adjectives, personal names, or nouns. The detection of multiword groups is done algorithmically. Possible advantages of the method for indexing and information retrieval are discussed, along with conclusions for applying dictionary-based methods of automatic indexing instead of stemming procedures.
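To make the idea concrete, the following is a minimal sketch of dictionary-based multiword detection. It is not Lingo and does not reproduce its interface or the authors' algorithm. The word lists, the function names `word_class` and `detect_phrases`, and the phrase pattern (any number of adjectives or personal names followed by one or more nouns) are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy dictionaries, one per word class (an assumption for this
# sketch; a real system would load large curated word lists).
DICTIONARIES = {
    "adjective": {"abelian", "finite", "compact", "linear"},
    "name": {"hilbert", "banach", "riemann"},
    "noun": {"group", "space", "operator", "surface", "theory"},
}


def word_class(token):
    """Return the word class of a token by dictionary lookup, or None."""
    for cls, words in DICTIONARIES.items():
        if token in words:
            return cls
    return None


def detect_phrases(text):
    """Collect runs of (adjective | name)* noun+ spanning at least two words."""
    # Punctuation becomes its own token, so it breaks a run like any unknown word.
    tokens = re.findall(r"[a-z]+|[^a-z\s]", text.lower())
    phrases, run = [], []

    def flush():
        # Drop trailing modifiers so that a phrase always ends in a noun.
        while run and word_class(run[-1]) != "noun":
            run.pop()
        if len(run) >= 2:
            phrases.append(" ".join(run))
        run.clear()

    for token in tokens:
        cls = word_class(token)
        if cls is None:
            flush()  # unknown word or punctuation ends the current run
            continue
        # A modifier that follows a noun starts a new candidate phrase.
        if cls != "noun" and run and word_class(run[-1]) == "noun":
            flush()
        run.append(token)
    flush()
    return phrases


if __name__ == "__main__":
    sentence = "Every finite abelian group acts on a compact Riemann surface."
    print(detect_phrases(sentence))
```

Because lookup is by exact dictionary entry rather than by stem, the sketch only recognizes word forms that are listed. Inflected forms such as plurals would need their own entries or a mapping to a base form.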