Evaluation of semantic relations impact in query expansion-based retrieval systems
This work addresses the challenge of correctly interpreting user needs in intelligent systems, particularly in natural language processing applications; however, it appears incremental, as it builds on existing semantic expansion techniques.
This paper tackles the problem of improving query expansion in retrieval systems by generating semantic resources from taxonomy labels and evaluating the impact of different semantic relations (such as synonymy and antonymy) on classification accuracy. The result is a quantification of each relation's effect and an assessment of the best tradeoff between improvement and noise when the relations are combined.
With the increasing demand for intelligent systems capable of operating in different contexts (e.g., users on the move), the correct interpretation of the user need by such systems has become crucial to giving consistent answers to user questions. The most effective applications addressing this task are in the fields of natural language processing and semantic expansion of terms. These techniques aim to estimate the goal of an input query by reformulating it as an intent, commonly relying on textual resources built by exploiting different semantic relations such as \emph{synonymy}, \emph{antonymy}, and many others. The aim of this paper is to generate such resources using the labels of a given taxonomy as the source of information. The obtained resources are integrated into a plain classifier that reformulates a set of input queries as intents, and the effect of each relation is tracked in order to quantify its impact on the classification. As an extension, the best tradeoff between improvement and noise introduction when combining these relations is evaluated. The assessment is carried out by generating the resources and their combinations and using them to tune the classifier that reformulates the user questions as labels. The evaluation uses a wide and varied taxonomy as a use case, exploiting its labels as the basis for the semantic expansion and producing several corpora aimed at enhancing pseudo-query estimation.
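The abstract does not specify the classifier, the resource format, or the taxonomy, so the following is only a minimal sketch of the general idea under stated assumptions. It uses a hypothetical three-label taxonomy (`LABELS`), hand-written toy resources per relation (`RESOURCES`), and a simple word-overlap classifier that maps a query to the label whose expanded vocabulary it shares the most words with. It then scores every subset of relations, including the empty baseline, to expose the improvement-versus-noise tradeoff. All names and data here are illustrative assumptions, not the paper's resources or method.

```python
from itertools import combinations

# Hypothetical taxonomy labels (assumption): short phrases used as intents.
LABELS = ["book hotel", "cancel reservation", "find restaurant"]

# Hypothetical toy resources, one per semantic relation (assumption, not the
# paper's data). Each maps a label word to words reached through that relation.
RESOURCES = {
    "synonymy": {"book": ["reserve"], "hotel": ["inn", "lodging"],
                 "cancel": ["annul"], "reservation": ["booking"],
                 "find": ["search", "locate"], "restaurant": ["eatery", "diner"]},
    "hypernymy": {"hotel": ["accommodation"], "restaurant": ["food"]},
    "antonymy": {"cancel": ["confirm"], "find": ["lose"]},
}

# Hypothetical labelled pseudo-queries: (query text, gold label).
QUERIES = [
    ("reserve an inn for tonight", "book hotel"),
    ("annul my booking", "cancel reservation"),
    ("locate a cheap eatery", "find restaurant"),
    ("confirm hotel booking", "book hotel"),
]


def expand_label(label, relations):
    """Return the label's own words plus words added by the selected relations."""
    words = set(label.split())
    for rel in relations:
        for word in label.split():
            words.update(RESOURCES[rel].get(word, []))
    return words


def classify(query, relations):
    """Reformulate a query as the label with the largest word overlap.

    Ties keep the earlier label in LABELS; a query with no overlap maps to None.
    """
    tokens = set(query.lower().split())
    best_label, best_score = None, 0
    for label in LABELS:
        score = len(tokens & expand_label(label, relations))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(queries, relations):
    """Fraction of queries whose predicted label equals the gold label."""
    hits = sum(classify(q, relations) == gold for q, gold in queries)
    return hits / len(queries)


def evaluate_combinations(queries):
    """Accuracy for every subset of relations, including the empty baseline."""
    names = sorted(RESOURCES)
    results = {}
    for k in range(len(names) + 1):
        for combo in combinations(names, k):
            results[combo] = accuracy(queries, combo)
    return results


def best_combination(results):
    """Highest accuracy; among equal scores, prefer fewer relations."""
    return max(results, key=lambda combo: (results[combo], -len(combo)))


if __name__ == "__main__":
    results = evaluate_combinations(QUERIES)
    for combo, acc in results.items():
        print(combo or ("baseline",), acc)
    print("best:", best_combination(results))
```

On this toy data, synonymy alone classifies all four queries correctly, while adding antonymy on top of it pulls "confirm hotel booking" toward "cancel reservation" and lowers accuracy. This is only a small illustration of why combined relations can trade improvement for noise; the numbers describe the toy example and say nothing about the paper's results. Preferring the smallest subset among equally accurate ones is one possible tie-breaking policy, chosen here for simplicity.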