Matching of Descriptive Labels to Glossary Descriptions
This work addresses a specific challenge for software engineers, namely clarifying the semantics of labels in IT systems, but its contribution is incremental because it builds on existing semantic text similarity methods.
The paper tackles the problem of matching short or generic descriptive labels to glossary descriptions in software engineering. It proposes a framework that enhances semantic text similarity with semantic label enrichment and set-based collective contextualization. In experiments on two publicly available datasets, these methods improved the accuracy of matching labels to descriptions.
Semantic text similarity plays an important role in software engineering tasks in which engineers are asked to clarify the semantics of descriptive labels (e.g., business terms, table column names) that appear in their IT systems and often consist of words that are too short or too generic. We formulate this type of problem as a task of matching descriptive labels to glossary descriptions. We then propose a framework that leverages an existing semantic text similarity (STS) measure and augments it with semantic label enrichment and set-based collective contextualization. The former retrieves sentences relevant to a given label; the latter computes the similarity between two contexts, each derived from a set of texts (e.g., column names in the same table). We performed an experiment on two datasets derived from publicly available data sources. The results indicated that the proposed methods helped the underlying STS correctly match more descriptive labels with their descriptions.
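The two augmentations described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the STS measure, the retrieval method, or how the scores are combined. The sketch assumes a bag-of-words cosine as a stand-in STS, token-overlap retrieval for label enrichment, and a weighted sum controlled by a hypothetical parameter `alpha`. The function names, the toy glossary, and the idea of attaching related terms to each glossary entry as its context are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit,
    # so "customer_id" becomes ["customer", "id"].
    return re.findall(r"[a-z0-9]+", text.lower())


def bow(texts):
    # Bag-of-words vector for a set of texts (assumed stand-in representation).
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cosine(a, b):
    # Stand-in STS: cosine similarity between two bag-of-words vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def enrich(label, corpus, top_k=1):
    # Label enrichment (assumed form): retrieve the corpus sentences
    # most similar to the label, dropping sentences with no overlap.
    label_vec = bow([label])
    scored = [(cosine(label_vec, bow([s])), s) for s in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:top_k] if score > 0]


def match(label, siblings, glossary, corpus, alpha=0.5):
    # Combine direct similarity (enriched label vs. description) with
    # collective similarity (sibling labels vs. the entry's context).
    # alpha is a hypothetical mixing weight, not taken from the paper.
    label_vec = bow([label] + enrich(label, corpus))
    context_vec = bow(siblings)
    scores = {}
    for term, (description, context) in glossary.items():
        direct = cosine(label_vec, bow([description]))
        collective = cosine(context_vec, bow(context))
        scores[term] = alpha * direct + (1 - alpha) * collective
    return max(scores, key=scores.get), scores


# Toy data, invented for illustration only.
GLOSSARY = {
    "Customer Name": ("The full legal name of a customer",
                      ["customer id", "customer address"]),
    "Product Name": ("The marketing name of a product",
                     ["product id", "product price"]),
}
CORPUS = [
    "A name identifies a person or thing.",
    "An address locates a building.",
]
SIBLINGS = ["customer_id", "customer_address"]  # other columns in the same table

best_with_context, _ = match("name", SIBLINGS, GLOSSARY, CORPUS, alpha=0.5)
best_without_context, _ = match("name", SIBLINGS, GLOSSARY, CORPUS, alpha=1.0)
print(best_with_context, "|", best_without_context)
```

In this toy setup the label "name" alone is too generic to separate the two glossary entries in a meaningful way. Once the sibling column names are taken into account, the match shifts toward the entry whose context they share. A real system would presumably replace the bag-of-words cosine with a learned STS model.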