Medical Documents Classification Based on the Domain Ontology MeSH
This work offers a domain-specific improvement for biomedical document classification, though the contribution is incremental: it applies existing methods to a new representation.
The paper tackles medical document classification by using the MeSH domain ontology to generate concept-based representations, which improve performance by 30% over the traditional stem-based representation when tested with the C4.5 and KNN algorithms on the Ohsumed dataset.
This paper addresses the problem of classifying web documents using a domain ontology. Our goal is to provide a method for improving the classification of medical documents by exploiting the MeSH (Medical Subject Headings) thesaurus, which allows us to generate a new representation based on concepts. We tested this approach with two well-known data mining algorithms, C4.5 and KNN, and compared it with the usual stem-based representation. Enriching the document vectors with concepts and their hypernyms drawn from the domain ontology significantly strengthens the representation, which is essential for good classification. Our experiments on the benchmark biomedical collection Ohsumed confirm the value of the approach: ontology-based classification outperforms the classical stem-based representation by 30%.
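The abstract does not say how terms are mapped to MeSH concepts, how hypernyms are weighted, or which similarity KNN uses, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes greedy longest-match lookup of entry terms, adds each concept's ancestors with a geometrically decaying weight (the `hypernym_weight` parameter is hypothetical), and classifies with cosine-similarity KNN. The tiny hierarchy is a hypothetical, simplified fragment loosely modeled on MeSH (intermediate levels omitted), not real MeSH data, and C4.5 is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical, simplified MeSH-like fragment: concept -> parent (hypernym).
HYPERNYM = {
    "myocardial infarction": "heart diseases",
    "angina pectoris": "heart diseases",
    "heart diseases": "cardiovascular diseases",
    "asthma": "respiratory tract diseases",
    "pneumonia": "respiratory tract diseases",
}

# Hypothetical entry terms (surface forms) mapped to their preferred concept.
ENTRY_TERMS = {
    "heart attack": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "angina": "angina pectoris",
    "asthma": "asthma",
    "pneumonia": "pneumonia",
}

def concepts_in(text, max_ngram=3):
    """Greedy longest-match lookup of entry terms in lower-cased text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in ENTRY_TERMS:
                found.append(ENTRY_TERMS[phrase])
                i += n
                break
        else:
            i += 1  # no concept starts at this token; skip it
    return found

def concept_vector(text, hypernym_weight=0.5):
    """Concept counts enriched with ancestors at decaying weights."""
    vec = Counter()
    for concept in concepts_in(text):
        vec[concept] += 1.0
        parent, weight = HYPERNYM.get(concept), hypernym_weight
        while parent is not None:
            vec[parent] += weight
            weight *= hypernym_weight
            parent = HYPERNYM.get(parent)
    return vec

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train, query_text, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    q = concept_vector(query_text)
    ranked = sorted(train, key=lambda item: cosine(q, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train_docs = [
    ("Patient admitted after a heart attack", "cardio"),
    ("Chronic angina managed with nitrates", "cardio"),
    ("Severe asthma exacerbation in children", "respiratory"),
    ("Community acquired pneumonia outcomes", "respiratory"),
]
train = [(concept_vector(t), y) for t, y in train_docs]
print(knn_predict(train, "Outcomes of myocardial infarction in elderly patients"))
```

In this toy run, a stem-based vector would let the query share the word "outcomes" with the pneumonia document, whereas the concept vector ignores it. The hypernym "heart diseases" also gives the heart-attack and angina documents a nonzero similarity even though they share no surface term.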