Knowledge transfer across cell lines using Hybrid Gaussian Process models with entity embedding vectors
This work addresses the problem of reducing experimental effort in biochemical process development by leveraging existing data for novel processes, which is relevant for researchers and engineers in bioprocess engineering.
This paper explores the use of embedding vectors to represent product identity in Gaussian Process regression models, enabling knowledge transfer across different biochemical processes. By learning these embedding vectors from process data, the method aims to reduce the number of experiments needed for novel processes.
To date, a large number of experiments are performed to develop a biochemical process. The generated data is used only once, to make decisions during development. If we could exploit data from already developed processes to make predictions for a novel process, we could significantly reduce the number of experiments needed. Processes for different products differ in behaviour, and typically only a subset behave similarly. Effective learning from process data spanning multiple products therefore requires a sensible representation of product identity. We propose to represent product identity (a categorical feature) by embedding vectors that serve as input to a Gaussian Process regression model. We demonstrate how the embedding vectors can be learned from process data and show that they capture an interpretable notion of product similarity. We quantify the improvement in performance over traditional one-hot encoding on a simulated cross-product learning task. Overall, the proposed method could enable significant reductions in wet-lab experiments.
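The core idea, a Gaussian Process whose input is the process variable concatenated with a learned per-product embedding vector, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. All names (`rbf_kernel`, `augment`, `learn_embeddings`, `predict`), the toy data, the fixed kernel hyperparameters, and the finite-difference gradient descent with step halving are assumptions chosen for brevity. A practical implementation would more likely use automatic differentiation and optimise the kernel hyperparameters jointly with the embeddings.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def augment(x, product_ids, embeddings):
    """Concatenate each process input with the embedding of its product."""
    return np.hstack([x, embeddings[product_ids]])

def neg_log_marginal_likelihood(embeddings, x, product_ids, y, noise=1e-2):
    """GP negative log marginal likelihood, viewed as a function of the embeddings."""
    Z = augment(x, product_ids, embeddings)
    K = rbf_kernel(Z, Z) + noise * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(y) * np.log(2.0 * np.pi)

def learn_embeddings(x, product_ids, y, n_products, dim=2, steps=100, lr=0.1, eps=1e-5, seed=0):
    """Fit embeddings by gradient descent on the NLML (central finite differences).

    A step is accepted only if it lowers the objective; otherwise the step size is halved.
    """
    rng = np.random.default_rng(seed)
    E = 0.1 * rng.standard_normal((n_products, dim))  # near zero: products start out almost alike
    objective = lambda emb: neg_log_marginal_likelihood(emb, x, product_ids, y)
    current = objective(E)
    losses = [current]
    for _ in range(steps):
        grad = np.zeros_like(E)
        for idx in np.ndindex(*E.shape):
            delta = np.zeros_like(E)
            delta[idx] = eps
            grad[idx] = (objective(E + delta) - objective(E - delta)) / (2.0 * eps)
        candidate = E - lr * grad
        value = objective(candidate)
        if value < current:
            E, current = candidate, value
        else:
            lr *= 0.5
        losses.append(current)
    return E, losses

def predict(embeddings, x_train, ids_train, y_train, x_new, ids_new, noise=1e-2):
    """GP posterior mean at new (input, product) pairs."""
    Z = augment(x_train, ids_train, embeddings)
    Zs = augment(x_new, ids_new, embeddings)
    K = rbf_kernel(Z, Z) + noise * np.eye(len(y_train))
    return rbf_kernel(Zs, Z) @ np.linalg.solve(K, y_train)

# Hypothetical toy data: products 0 and 1 share a response curve, product 2 has the opposite sign.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 3.0, size=(30, 1))
ids = np.repeat(np.arange(3), 10)
sign = np.where(ids == 2, -1.0, 1.0)
y = sign * np.sin(x[:, 0]) + 0.05 * rng.standard_normal(30)

E, losses = learn_embeddings(x, ids, y, n_products=3)
pred = predict(E, x, ids, y, np.array([[1.5]]), np.array([2]))
```

In this setup one would hope the learned embeddings of products 0 and 1 end up closer to each other than to that of product 2, which is the kind of interpretable similarity the abstract describes. The sketch does not guarantee this. A one-hot encoding, by contrast, fixes all products at equal distance from one another, so no such similarity structure can be expressed through the inputs.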