svcR: An R Package for Support Vector Clustering improved with Geometric Hashing applied to Lexical Pattern Discovery
This work provides a tool for researchers in computational biology and text mining to improve lexical pattern discovery, though it is incremental as it builds on existing support vector clustering methods.
The authors developed an R package, svcR, that implements support vector clustering with a 2D-grid labeling approach to speed up cluster extraction. They applied it to classifying biological terms into ontological classes and to generating regular expression rules for information extraction, reporting adequate results in a case study on developmental and molecular biology.
We present a new R package that takes a numerical matrix as input and computes clusters using support vector clustering (SVC). We have implemented an original 2D-grid labeling approach to speed up cluster extraction; with it, SVC becomes an efficient cluster extraction method when the clusters are separable in a 2-D map. We also show that this SVC approach, using a Jaccard-radial basis kernel, classifies a set of terms into ontological classes reasonably well and helps define regular expression rules for information extraction from documents. Our case study concerns a set of terms and documents about developmental and molecular biology.
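To make the idea of grid-based cluster labeling concrete, the following is a minimal Python sketch, not the svcR implementation. It assumes the 2-D map has already been discretized into cells and that each cell carries a boolean flag saying whether it lies inside a cluster contour (in SVC, a point is inside when its feature-space distance to the sphere center does not exceed the sphere radius). The function names `label_grid` and `assign_points`, the 4-connectivity rule, and the toy grid are all illustrative assumptions.

```python
from collections import deque


def label_grid(inside):
    """Label 4-connected components of True cells in a 2D boolean grid.

    Returns a grid of integers: 0 for cells outside every contour,
    1, 2, ... for cells belonging to the first, second, ... component.
    """
    rows, cols = len(inside), len(inside[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if not inside[r][c] or labels[r][c]:
                continue  # outside a contour, or already labeled
            current += 1
            labels[r][c] = current
            queue = deque([(r, c)])
            # Breadth-first flood fill over neighboring inside cells.
            while queue:
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and inside[ni][nj] and not labels[ni][nj]):
                        labels[ni][nj] = current
                        queue.append((ni, nj))
    return labels


def assign_points(points, labels, x_min, y_min, cell):
    """Give each (x, y) point the label of the grid cell it falls in.

    Points outside the grid, or in an unlabeled cell, get label 0.
    """
    result = []
    for x, y in points:
        i = int((y - y_min) // cell)  # row index
        j = int((x - x_min) // cell)  # column index
        if 0 <= i < len(labels) and 0 <= j < len(labels[0]):
            result.append(labels[i][j])
        else:
            result.append(0)
    return result


# Toy 3x4 grid with two separate regions (hypothetical data).
inside = [
    [True,  True,  False, False],
    [True,  False, False, True],
    [False, False, True,  True],
]
grid_labels = label_grid(inside)
point_labels = assign_points(
    [(0.5, 0.5), (3.5, 1.2), (2.5, 0.5), (2.2, 2.9)],
    grid_labels, x_min=0.0, y_min=0.0, cell=1.0,
)
```

In this sketch the labeling cost depends on the number of grid cells rather than on pairwise comparisons between data points, which is one plausible reason a grid can speed up cluster extraction when the clusters are separable in the 2-D map.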