AI
Jan 6, 2015

Constraint-based sequence mining using constraint programming

arXiv:1501.01178v3
56 citations
Originality: Incremental advance
AI Analysis

This work addresses constraint-based sequence mining, a specific data mining problem of interest to researchers and practitioners, but its contribution is incremental: it builds on existing constraint categories and techniques.

The paper tackles the lack of a general framework for constraint-based sequence mining by proposing two constraint programming formulations, one of which introduces a new global constraint. Experiments demonstrate the approach's flexibility and compare it to existing methods.

The goal of constraint-based sequence mining is to find sequences of symbols that are included in a large number of input sequences and that satisfy some constraints specified by the user. Many constraints have been proposed in the literature, but a general framework is still missing. We investigate the use of constraint programming as a general framework for this task. We first identify four categories of constraints that are applicable to sequence mining. We then propose two constraint programming formulations. The first formulation introduces a new global constraint called exists-embedding. This formulation is the most efficient but does not support one type of constraint. To support such constraints, we develop a second formulation that is more general but incurs more overhead. Both formulations can use the projected database technique used in specialised algorithms. Experiments demonstrate the flexibility towards constraint-based settings and compare the approach to existing methods.
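To make the task in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's constraint programming formulation and does not implement the exists-embedding global constraint. It only illustrates, under assumptions, what "a pattern is included in an input sequence" can mean (an embedding with gaps allowed), how a minimum-frequency threshold and a maximum-length constraint restrict the result, and how prefix projection shrinks the database during search. The function names `has_embedding`, `support` and `frequent_patterns`, and the choice of a maximum-length constraint, are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def has_embedding(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `sequence` in order, gaps allowed."""
    remaining = iter(sequence)
    # `symbol in remaining` consumes the iterator up to the first match,
    # so later symbols can only be matched further to the right.
    return all(symbol in remaining for symbol in pattern)


def support(pattern: Sequence[str], database: Sequence[Sequence[str]]) -> int:
    """Count the input sequences that contain an embedding of `pattern`."""
    return sum(has_embedding(pattern, seq) for seq in database)


def frequent_patterns(
    database: Sequence[Sequence[str]], min_support: int, max_length: int
) -> List[Tuple[Tuple[str, ...], int]]:
    """Depth-first pattern growth with prefix projection (illustrative only).

    Assumed constraints: a minimum support and a maximum pattern length.
    """
    alphabet = sorted({symbol for seq in database for symbol in seq})
    results: List[Tuple[Tuple[str, ...], int]] = []

    def grow(prefix: Tuple[str, ...], projected: List[Tuple[str, ...]]) -> None:
        # `projected` holds, for each sequence still matching `prefix`,
        # the suffix that follows the earliest embedding of `prefix`.
        for symbol in alphabet:
            next_projected = [
                suffix[suffix.index(symbol) + 1:]
                for suffix in projected
                if symbol in suffix
            ]
            if len(next_projected) >= min_support:
                pattern = prefix + (symbol,)
                results.append((pattern, len(next_projected)))
                if len(pattern) < max_length:
                    grow(pattern, next_projected)

    grow((), [tuple(seq) for seq in database])
    return results


if __name__ == "__main__":
    toy_database = ["abcb", "acb", "bca"]  # made-up example data
    for pattern, count in frequent_patterns(toy_database, min_support=2, max_length=2):
        print("".join(pattern), count)
```

Each recursive call only scans the projected suffixes, so sequences that no longer match the current prefix are dropped early. Constraints that cannot be checked on a prefix alone would need additional handling and are outside the scope of this sketch.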

