LG · Feb 20, 2024

Learning Exceptional Subgroups by End-to-End Maximizing KL-divergence

arXiv:2402.12930v1 · 8 citations · h-index: 6 · ICML
Originality: Incremental advance
AI Analysis

This paper addresses the need for scalable, interpretable subgroup discovery in scientific domains, though the contribution appears incremental because it builds on existing normalizing flow techniques.

The paper tackles the problem of identifying exceptional subgroups in data, which matters for applications such as demographic analysis and materials discovery. It proposes Syflow, an end-to-end method that reliably finds highly exceptional subgroups with interpretable descriptions on synthetic and real-world datasets.

Finding and describing sub-populations that are exceptional regarding a target property has important applications in many scientific disciplines, from identifying disadvantaged demographic groups in census data to finding conductive molecules within gold nanoparticles. Current approaches to finding such subgroups require pre-discretized predictive variables, do not permit non-trivial target distributions, do not scale to large datasets, and struggle to find diverse results. To address these limitations, we propose Syflow, an end-to-end optimizable approach in which we leverage normalizing flows to model arbitrary target distributions, and introduce a novel neural layer that results in easily interpretable subgroup descriptions. We demonstrate on synthetic and real-world data, including a case study, that Syflow reliably finds highly exceptional subgroups accompanied by insightful descriptions.
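To make "exceptional regarding a target property" concrete, the sketch below scores a candidate subgroup by the KL divergence between its target distribution and the target distribution of the whole dataset. It is an illustration only, not Syflow: it swaps the normalizing flows for a plain histogram density estimate, uses a fixed threshold rule instead of a learned description, and runs on synthetic data. The function name `kl_divergence_hist` and the features `x` and `z` are hypothetical.

```python
import numpy as np

def kl_divergence_hist(subgroup_y, all_y, bins=20, eps=1e-9):
    """Histogram estimate of KL(P_subgroup || P_all) on a shared binning.

    Assumption: a histogram stands in for the density model here;
    the paper uses normalizing flows for this role.
    """
    edges = np.histogram_bin_edges(all_y, bins=bins)
    p, _ = np.histogram(subgroup_y, bins=edges)
    q, _ = np.histogram(all_y, bins=edges)
    # Smooth and normalize so both are valid probability vectors.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic data: the target y is shifted only where x > 0.8;
# z is a feature unrelated to y.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0, 1, size=n)
z = rng.uniform(0, 1, size=n)
y = np.where(x > 0.8, rng.normal(3, 1, size=n), rng.normal(0, 1, size=n))

# Score two candidate subgroup descriptions.
exceptional = kl_divergence_hist(y[x > 0.8], y)  # rule on the relevant feature
ordinary = kl_divergence_hist(y[z > 0.8], y)     # rule on the unrelated feature

print(f"KL for rule x > 0.8: {exceptional:.3f}")
print(f"KL for rule z > 0.8: {ordinary:.3f}")
```

The rule on `x` should score far higher than the rule on `z`, because only the former selects rows whose target distribution differs from the overall one. An end-to-end method in the spirit of the abstract would learn the rule boundaries by gradient ascent on such an objective rather than evaluating fixed thresholds.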

