LG | Jun 3, 2022

Finding Rule-Interpretable Non-Negative Data Representation

arXiv:2206.01483v2 | 1 citation | h-index: 20
Originality: Incremental advance
AI Analysis

This work addresses the difficulty of interpreting NMF factors, a concern for researchers in fields such as biology and medicine. It offers a hybrid approach that combines rule-based interpretability with a parts-based representation. The advance is incremental, since it builds on existing NMF and rule-based methods.

The paper tackles the difficulty of interpreting latent factors in Non-negative Matrix Factorization (NMF) by integrating rule-based descriptions into the factorization. The result is a lower-dimensional, non-negative representation in which each factor is described by a subset of the input rules, making the relevant attributes, their interactions, and their value ranges easier to interpret.

Non-negative Matrix Factorization (NMF) is a widely used technique for obtaining a parts-based, lower-dimensional, non-negative representation of data. Researchers in biology, medicine, pharmacy and other fields often prefer NMF over other dimensionality reduction approaches (such as PCA) because its non-negativity naturally fits the characteristics of the domain problem and its results are easier to analyze and understand. Despite these advantages, obtaining an exact characterization and interpretation of NMF's latent factors can still be difficult because of their numerical nature. Rule-based approaches, such as rule mining, conceptual clustering, subgroup discovery and redescription mining, are often considered more interpretable but do not provide a lower-dimensional representation of the data. We present a version of NMF that merges rule-based descriptions with the advantages of the parts-based representation offered by NMF. Given numerical input data with non-negative entries and a set of rules with high entity coverage, the approach creates a lower-dimensional, non-negative representation of the input data in such a way that its factors are described by an appropriate subset of the input rules. In addition to revealing the attributes that are important for the latent factors, along with their interactions and value ranges, this approach allows focused embedding, potentially using multiple overlapping target labels.
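To make the idea in the abstract concrete, the following is a minimal sketch of an NMF whose factors are tied to rules. It is not the authors' algorithm: it assumes a simple objective, ||X - WH||^2 + lam * ||W - P||^2, where P is a 0/1 matrix recording which entities each rule covers, and it minimizes that objective with standard multiplicative updates. The function name `rule_guided_nmf`, the weight `lam`, the toy data, and the two threshold rules are all hypothetical and chosen only for illustration.

```python
import numpy as np


def objective(X, W, H, P, lam):
    """Reconstruction error plus a penalty pulling W toward rule coverage P."""
    return np.linalg.norm(X - W @ H) ** 2 + lam * np.linalg.norm(W - P) ** 2


def rule_guided_nmf(X, P, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """Illustrative sketch (an assumption, not the paper's method).

    X : (n_entities, n_attributes) non-negative data
    P : (n_entities, k) 0/1 coverage matrix, one column per rule
    Returns W (n_entities, k), H (k, n_attributes), and the objective history.
    """
    rng = np.random.default_rng(seed)
    k, m = P.shape[1], X.shape[1]
    # Start W near the rule coverage; the small positive noise keeps entries
    # from being stuck at zero under multiplicative updates.
    W = P + 0.1 * rng.random(P.shape)
    H = rng.random((k, m)) + eps
    history = [objective(X, W, H, P, lam)]
    for _ in range(n_iter):
        # Standard multiplicative update for H.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # Update for W with the extra lam * (W - P) penalty term split into
        # its positive (denominator) and negative (numerator) parts.
        W *= (X @ H.T + lam * P) / (W @ (H @ H.T) + lam * W + eps)
        history.append(objective(X, W, H, P, lam))
    return W, H, history


# Toy data: 6 entities, 4 non-negative attributes a0..a3 (made up).
X = np.array([
    [9, 8, 0, 1],
    [8, 9, 1, 0],
    [9, 9, 0, 0],
    [0, 1, 7, 8],
    [1, 0, 8, 7],
    [0, 0, 9, 9],
], dtype=float)

# Two hypothetical rules, each evaluated to the set of entities it covers.
rules = {
    "a0 >= 5": X[:, 0] >= 5,
    "a2 >= 5": X[:, 2] >= 5,
}
P = np.column_stack(list(rules.values())).astype(float)

W, H, history = rule_guided_nmf(X, P, lam=1.0)

# Factor j is described by rule j, because column j of W is tied to column j of P.
factor_rules = {j: name for j, name in enumerate(rules)}
print(factor_rules)
print(np.round(W, 2))
```

In this sketch the link between a factor and its rule is built in by construction: column j of W is penalized for deviating from the coverage of rule j, so the rule's attribute and threshold serve as a readable description of that factor. Selecting a suitable subset of rules from a larger candidate set, as the abstract describes, is outside the scope of this example.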

