DB | AI | LG | Jan 8, 2024

Rastro-DM: data mining with a trail

arXiv:2401.03925v1 | h-index: 12
Originality: Synthesis-oriented
AI Analysis

The paper addresses documentation gaps in data mining for institutional use. Its contribution is incremental, since it complements existing methodologies such as CRISP-DM rather than replacing them.

The paper tackles the problem of documenting data mining projects by proposing Rastro-DM, a methodology focused on tracking processes rather than models, and illustrates its application in a project for classifying PDF documents in Brazilian public treasury investigations.

This paper proposes a methodology for documenting data mining (DM) projects, Rastro-DM (Trail Data Mining), with a focus not on the model that is generated, but on the processes behind its construction, in order to leave a trail (Rastro in Portuguese) of planned actions, training completed, results obtained, and lessons learned. The proposed practices are complementary to structuring methodologies of DM, such as CRISP-DM, which establish a methodological and paradigmatic framework for the DM process. The application of best practices and their benefits is illustrated in a project called 'Cladop' that was created for the classification of PDF documents associated with the investigative process of damages to the Brazilian Federal Public Treasury. Building the Rastro-DM kit in the context of a project is a small step that can lead to an institutional leap to be achieved by sharing and using the trail across the enterprise.
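To make the idea of a trail concrete, the four kinds of information named above (planned actions, training completed, results obtained, lessons learned) can be sketched as an append-only log. This is only an illustrative sketch: the `TrailEntry` class, its field names, the helper functions, and the JSON serialization are assumptions made here, not the format defined by Rastro-DM, and all values are placeholders rather than results from the Cladop project.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class TrailEntry:
    """One hypothetical trail record; field names are assumptions."""
    day: str
    planned_actions: List[str] = field(default_factory=list)
    training_completed: List[str] = field(default_factory=list)
    results_obtained: Dict[str, float] = field(default_factory=dict)
    lessons_learned: List[str] = field(default_factory=list)


def append_entry(trail: List[dict], entry: TrailEntry) -> List[dict]:
    """Return a new trail with the entry appended; earlier records are not edited."""
    return trail + [asdict(entry)]


def lessons(trail: List[dict]) -> List[str]:
    """Collect every lesson learned across the trail, oldest first."""
    return [lesson for record in trail for lesson in record["lessons_learned"]]


trail: List[dict] = []
trail = append_entry(trail, TrailEntry(
    day="2024-01-08",
    planned_actions=["Train a baseline PDF classifier"],
    training_completed=["baseline-v0"],
    results_obtained={"accuracy": 0.0},  # placeholder, not a reported result
    lessons_learned=["Placeholder lesson text"],
))

# Serialize so the trail can be shared outside the project team.
serialized = json.dumps(trail, indent=2)
```

Keeping the log append-only and in a plain, shareable format reflects the abstract's emphasis on using the trail across the enterprise. A real project would choose its own storage and fields.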
