On the String Kernel Pre-Image Problem with Applications in Drug Discovery
This work addresses a computational bottleneck in drug discovery and computational biology, but it is incremental as it builds on existing methods for string kernels.
The paper tackled the string kernel pre-image problem, which is crucial for structured output predictors, by developing a low-complexity upper bound and using it in a branch-and-bound algorithm, achieving successful applications in discovering druggable peptides.
The pre-image problem has to be solved during inference by most structured output predictors. For string kernels, this problem corresponds to finding the string associated to a given input. An algorithm capable of solving or finding good approximations to this problem would have many applications in computational biology and other fields. This work uses a recent result on combinatorial optimization of linear predictors based on string kernels to develop, for the pre-image, a low complexity upper bound valid for many string kernels. This upper bound is used with success in a branch and bound searching algorithm. Applications and results in the discovery of druggable peptides are presented and discussed.