LG · AI · ML · Mar 12, 2020

PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning

arXiv:2003.05602v1 · 53 citations
AI Analysis

This system reduces the human effort that outlier detection demands of users across various domains. The contribution is incremental, however, because it automates existing techniques rather than introducing new ones.

The authors address the manual effort that outlier detection requires by developing PyODDS, an automated end-to-end system that optimizes detection pipelines for new data sources. They demonstrate it on real-world datasets, with quantitative analysis and visualizations.

Outlier detection is an important task in many data mining applications. Current outlier detection techniques are often designed manually for specific domains, which requires substantial human effort for database setup, algorithm selection, and hyper-parameter tuning. To fill this gap, we present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support, which automatically optimizes an outlier detection pipeline for a new data source. Specifically, we define the search space of the outlier detection pipeline and design a search strategy over that space. PyODDS enables end-to-end execution on an Apache Spark backend server and a lightweight database. It also provides unified interfaces and visualizations for users with or without a data science or machine learning background. In particular, we demonstrate PyODDS on several real-world datasets, with quantitative analysis and visualization results.
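The abstract describes the core idea, a search space of pipeline choices plus a search strategy over it, only at a high level. The sketch below illustrates that idea in pure Python under explicit assumptions. The search space contains two toy detectors (`zscore_detector` and `knn_detector`, both hypothetical), configurations are scored with F1 against labeled data, and the strategy is plain random search. None of these names or choices come from PyODDS, whose actual detectors, scoring, and search strategy may differ.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical search space: detector name -> hyperparameter -> candidate values.
SEARCH_SPACE: Dict[str, Dict[str, list]] = {
    "zscore": {"threshold": [2.0, 2.5, 3.0]},
    "knn": {"k": [1, 2, 3], "threshold": [1.0, 2.0, 5.0]},
}


def zscore_detector(data: Sequence[float], threshold: float) -> List[bool]:
    """Flag points whose absolute z-score exceeds `threshold` (toy detector)."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data) or 1.0  # avoid division by zero on constant data
    return [abs(x - mean) / std > threshold for x in data]


def knn_detector(data: Sequence[float], k: int, threshold: float) -> List[bool]:
    """Flag points whose distance to their k-th nearest neighbour exceeds `threshold`."""
    flags = []
    for i, x in enumerate(data):
        dists = sorted(abs(x - y) for j, y in enumerate(data) if j != i)
        flags.append(dists[int(k) - 1] > threshold)
    return flags


DETECTORS: Dict[str, Callable[..., List[bool]]] = {
    "zscore": zscore_detector,
    "knn": knn_detector,
}


def f1_score(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    """F1 of predicted outlier flags against ground-truth labels."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_search(
    data: Sequence[float], labels: Sequence[bool], n_trials: int = 20, seed: int = 0
) -> Tuple[str, Dict[str, float], float]:
    """Sample (detector, hyperparameters) pairs at random; keep the best-scoring one."""
    rng = random.Random(seed)
    best: Tuple[str, Dict[str, float], float] = ("", {}, -1.0)
    for _ in range(n_trials):
        name = rng.choice(sorted(SEARCH_SPACE))
        params = {p: rng.choice(vals) for p, vals in SEARCH_SPACE[name].items()}
        score = f1_score(DETECTORS[name](data, **params), labels)
        if score > best[2]:
            best = (name, params, score)
    return best


if __name__ == "__main__":
    data = [1.0, 1.2, 0.9, 1.1, 1.0, 8.0, 1.05, 0.95]
    labels = [False] * 5 + [True] + [False] * 2
    print(random_search(data, labels))
```

Random search is used here only because it is the simplest strategy to show; the same loop structure accepts any strategy that proposes the next (detector, hyperparameters) pair, and a more sample-efficient one could replace `rng.choice` without changing the search space or the scoring function.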
