VDMS: Efficient Big-Visual-Data Access for Machine Learning Workloads
VDMS addresses the bottleneck of slow visual data access in machine learning and analytics pipelines, particularly in domains such as medical imaging, though it is an incremental improvement over existing data management systems.
The paper tackles inefficient access to large visual datasets for machine learning workloads by introducing VDMS, a visual data management system that uses metadata graphs and machine-friendly storage formats, and reports a 2x performance improvement on complex queries over a comparable set-up.
We introduce the Visual Data Management System (VDMS), which enables faster access to big-visual-data and adds support for visual analytics. This is achieved by searching for relevant visual data via metadata stored as a graph, and by enabling faster access to visual data through new machine-friendly storage formats. VDMS differs from existing large-scale photo serving, video streaming, and textual big-data management systems in its primary focus on supporting machine learning and data analytics pipelines that use visual data (images, videos, and feature vectors), treating these as first-class entities. We describe how to use VDMS via its user-friendly interface and how it enables rich and efficient vision analytics through a machine learning pipeline for processing medical images. We show a 2x performance improvement on complex queries over a comparable set-up.
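
To make the idea of locating visual data through graph-structured metadata concrete, the following is a minimal sketch in Python. It is a toy in-memory illustration and not the VDMS API or implementation: the `MetadataGraph` class, the `find_scans` helper, the `has_scan` edge label, the property names, and the file paths are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


class MetadataGraph:
    """Toy in-memory property graph (hypothetical; not the VDMS implementation)."""

    def __init__(self):
        self.nodes = {}                  # node id -> (label, properties dict)
        self.edges = defaultdict(list)   # source id -> [(edge label, target id)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, src, edge_label, dst):
        self.edges[src].append((edge_label, dst))

    def find(self, label, **constraints):
        """Return ids of nodes with this label whose properties match all constraints."""
        return [
            node_id
            for node_id, (node_label, props) in self.nodes.items()
            if node_label == label
            and all(props.get(key) == value for key, value in constraints.items())
        ]

    def neighbors(self, src, edge_label):
        """Return target ids reachable from src over edges with this label."""
        return [dst for label, dst in self.edges[src] if label == edge_label]


def find_scans(graph, **patient_constraints):
    """Resolve patient metadata to image paths by following 'has_scan' edges."""
    paths = []
    for patient in graph.find("patient", **patient_constraints):
        for image in graph.neighbors(patient, "has_scan"):
            paths.append(graph.nodes[image][1]["path"])
    return paths


# Hypothetical medical-imaging metadata: patients linked to their scans.
graph = MetadataGraph()
graph.add_node("p1", "patient", age_group="60+", diagnosis="glioma")
graph.add_node("p2", "patient", age_group="40-59", diagnosis="glioma")
graph.add_node("i1", "image", path="scans/p1_a.png")
graph.add_node("i2", "image", path="scans/p1_b.png")
graph.add_node("i3", "image", path="scans/p2_a.png")
graph.add_edge("p1", "has_scan", "i1")
graph.add_edge("p1", "has_scan", "i2")
graph.add_edge("p2", "has_scan", "i3")

# Only the images attached to matching patients are selected for loading.
selected = find_scans(graph, diagnosis="glioma", age_group="60+")
print(selected)
```

In this sketch the metadata query narrows the set of images before any pixel data is read, which is the access pattern the abstract describes; how VDMS itself stores the graph and the image data is not shown here.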