DB LG | Dec 7, 2017

Columnar Database Techniques for Creating AI Features

Brad Carlile, Akiko Marti, Guy Delamarter

arXiv:1712.02882v1

Originality: Incremental advance

AI Analysis

This work addresses scalable AI feature engineering for data analysts and engineers working with large databases. It appears incremental, since it builds on existing columnar database techniques.

The paper tackles inefficient feature creation for AI analytics by adding Augmented Dictionary Values (ADVs) to columnar database dictionaries. This minimizes data movement and duplication, which makes featurization tasks such as feature selection, feature extraction, and feature creation more efficient.

Recent advances with in-memory columnar database techniques have increased the performance of analytical queries on very large databases and data warehouses. At the same time, advances in artificial intelligence (AI) algorithms have increased the ability to analyze data. We use the term AI to encompass both Deep Learning (DL or neural network) and Machine Learning (ML aka Big Data analytics). Our exploration of the AI full stack has led us to a cross-stack columnar database innovation that efficiently creates features for AI analytics. The innovation is to create Augmented Dictionary Values (ADVs) to add to existing columnar database dictionaries in order to increase the efficiency of featurization by minimizing data movement and data duplication. We show how various forms of featurization (feature selection, feature extraction, and feature creation) can be efficiently calculated in a columnar database. The full stack AI investigation has also led us to propose an integrated columnar database and AI architecture. This architecture has information flows and feedback loops to improve the whole analytics cycle during multiple iterations of extracting data from the data sources, featurization, and analysis.
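The abstract does not spell out how ADVs are stored, so the following is only a minimal sketch of the general idea under stated assumptions: a dictionary-encoded column keeps each distinct value once, and a derived feature is computed once per dictionary entry rather than once per row. The `DictColumn` class, its `augment` and `feature` methods, and the `log10` feature are hypothetical names chosen for illustration; they are not taken from the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DictColumn:
    """Dictionary-encoded column: distinct values stored once, rows hold integer codes."""
    dictionary: list
    codes: list
    # Hypothetical ADV store: feature name -> list parallel to `dictionary`.
    augmented: dict = field(default_factory=dict)

    @classmethod
    def encode(cls, values):
        # Build a sorted dictionary of distinct values and map each row to its code.
        dictionary = sorted(set(values))
        index = {v: i for i, v in enumerate(dictionary)}
        return cls(dictionary, [index[v] for v in values])

    def augment(self, name, fn):
        # Compute the feature once per distinct value, not once per row.
        self.augmented[name] = [fn(v) for v in self.dictionary]

    def feature(self, name):
        # Materialize the row-level feature by looking codes up in the augmented list.
        table = self.augmented[name]
        return [table[c] for c in self.codes]


prices = [100, 10, 100, 1000, 10, 100]
col = DictColumn.encode(prices)
col.augment("log10", math.log10)   # 3 evaluations for 6 rows
log_feature = col.feature("log10")
```

In this sketch the raw column is never copied or rewritten: the feature lives next to the dictionary, and the saving grows with the ratio of rows to distinct values.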
