The Duck's Brain: Training and Inference of Neural Networks in Modern Database Engines
This work targets data scientists who want to run machine learning directly inside database systems. Its contribution is incremental, since it builds on existing relational algebra and array data types.
The paper tackles the challenge of expressing machine learning in SQL by showing how to transform data into a relational representation so that neural networks can be trained in SQL. The evaluation shows that modern database systems are suitable for matrix algebra, though specialized array data types perform better in both runtime and memory consumption.
Although database systems perform well in data access and manipulation, their relational model hinders data scientists from formulating machine learning algorithms in SQL. Nevertheless, we argue that modern database systems perform well for machine learning algorithms expressed in relational algebra. To overcome the barrier of the relational model, this paper shows how to transform data into a relational representation for training neural networks in SQL: We first describe building blocks for data transformation, model training and inference in SQL-92 and their counterparts using an extended array data type. Then, we compare the implementation for model training and inference using array data types to the one using a relational representation in SQL-92 only. The evaluation in terms of runtime and memory consumption proves the suitability of modern database systems for matrix algebra, although specialised array data types perform better than matrices in relational representation.
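To make the idea of a matrix in relational representation concrete, the following is a minimal sketch and not the paper's own implementation. It assumes each matrix is stored as one tuple (i, j, v) per entry, so that a matrix product becomes a join on the shared index followed by a grouped sum. The table names `a` and `b` and the helper `to_relation` are hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for the database engine only so that the example runs without extra dependencies.

```python
import sqlite3


def to_relation(matrix):
    """Flatten a dense matrix (list of rows) into (i, j, v) tuples.

    Hypothetical helper: one tuple per matrix entry, zero-based indices.
    """
    return [(i, j, v) for i, row in enumerate(matrix) for j, v in enumerate(row)]


conn = sqlite3.connect(":memory:")

# One table per matrix; the schema (i, j, v) is an assumption of this sketch.
conn.execute("CREATE TABLE a (i INTEGER, j INTEGER, v REAL)")
conn.execute("CREATE TABLE b (i INTEGER, j INTEGER, v REAL)")
conn.executemany("INSERT INTO a VALUES (?, ?, ?)",
                 to_relation([[1.0, 2.0], [3.0, 4.0]]))
conn.executemany("INSERT INTO b VALUES (?, ?, ?)",
                 to_relation([[5.0, 6.0], [7.0, 8.0]]))

# C = A * B: join A's column index with B's row index, then sum the
# element-wise products per output cell. Only SQL-92 constructs are used.
MATMUL = """
SELECT a.i AS i, b.j AS j, SUM(a.v * b.v) AS v
FROM a INNER JOIN b ON a.j = b.i
GROUP BY a.i, b.j
ORDER BY a.i, b.j
"""

product = conn.execute(MATMUL).fetchall()
print(product)
# → [(0, 0, 19.0), (0, 1, 22.0), (1, 0, 43.0), (1, 1, 50.0)]
```

The same join-and-aggregate pattern would cover the weight multiplications in a forward pass, while an engine with an array data type could store each matrix as a single value instead of one row per entry.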