Deep Copula Classifier: Theory, Consistency, and Empirical Evaluation
DCC offers machine learning researchers and practitioners a practical, theoretically grounded alternative to independence-based classifiers, although the contribution is incremental because it builds on existing copula and generative modeling ideas.
The paper addresses the problem of building interpretable, consistent classifiers. It introduces the Deep Copula Classifier (DCC), which separates marginal estimation from dependence modeling using neural copula densities. In a controlled study, DCC reaches accuracy up to 0.971 and ROC-AUC up to 0.998, and on the Pima Indians Diabetes dataset it outperforms Logistic Regression, SVM (RBF), and Naive Bayes.
We present the Deep Copula Classifier (DCC), a class-conditional generative model that separates marginal estimation from dependence modeling using neural copula densities. DCC is interpretable and Bayes-consistent, and it attains an excess-risk rate of $O(n^{-r/(2r+d)})$ for $r$-smooth copulas. In a controlled two-class study with strong dependence ($|\rho|=0.995$), DCC learns Bayes-aligned decision regions. With oracle or pooled marginals, it nearly reaches the best achievable performance (accuracy $\approx 0.971$; ROC-AUC $\approx 0.998$). As expected, per-class KDE marginals perform worse (accuracy $0.873$; ROC-AUC $0.957$; PR-AUC $0.966$). On the Pima Indians Diabetes dataset, calibrated DCC ($\tau=1$) achieves accuracy $0.879$, ROC-AUC $0.936$, and PR-AUC $0.870$, outperforming Logistic Regression, SVM (RBF), and Naive Bayes and tying Logistic Regression for the lowest Expected Calibration Error (ECE). Random Forest is also competitive (accuracy $0.892$; ROC-AUC $0.933$; PR-AUC $0.880$). Directly modeling feature dependence yields strong, well-calibrated performance with a clear probabilistic interpretation, making DCC a practical, theoretically grounded alternative to independence-based classifiers.
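To make the "marginals first, dependence second" idea concrete, the following is a minimal sketch of a copula-based classifier. It is not the paper's implementation: a closed-form bivariate Gaussian copula stands in for the neural copula density, and the synthetic data (two classes with standard normal marginals and correlations $+0.995$ and $-0.995$) is an assumed setup chosen only to resemble the strong-dependence study. All function names are hypothetical. Because the marginals are pooled across classes here, the marginal density terms are identical for both classes and cancel in the posterior, so classification depends only on the class priors and the per-class copula densities.

```python
import bisect
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for its inverse CDF


def sample_class(rho, n, rng):
    """Draw n points from a bivariate normal with N(0,1) marginals and correlation rho."""
    pts = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        pts.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return pts


def make_ecdf(values):
    """Return a smoothed empirical CDF whose output stays strictly inside (0, 1)."""
    s = sorted(values)
    n = len(s)
    return lambda x: (bisect.bisect_right(s, x) + 0.5) / (n + 1)


def to_scores(x, ecdfs):
    """Step 1 (marginals): map a point to normal scores via the pooled marginal CDFs."""
    return tuple(STD_NORMAL.inv_cdf(F(v)) for F, v in zip(ecdfs, x))


def correlation(pairs):
    """Pearson correlation, clipped away from +/-1 to keep the log density finite."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    sab = sum((a - ma) * (b - mb) for a, b in pairs)
    saa = sum((a - ma) ** 2 for a, _ in pairs)
    sbb = sum((b - mb) ** 2 for _, b in pairs)
    return max(-0.999, min(0.999, sab / math.sqrt(saa * sbb)))


def log_gauss_copula(a, b, rho):
    """Log density of the bivariate Gaussian copula at normal scores (a, b)."""
    q = 1 - rho ** 2
    return -0.5 * math.log(q) - (rho ** 2 * (a * a + b * b) - 2 * rho * a * b) / (2 * q)


def fit(X, y):
    """Step 2 (dependence): estimate one copula parameter and one prior per class."""
    ecdfs = [make_ecdf([x[j] for x in X]) for j in range(2)]
    model = {"ecdfs": ecdfs, "rho": {}, "log_prior": {}}
    for k in sorted(set(y)):
        scores = [to_scores(x, ecdfs) for x, label in zip(X, y) if label == k]
        model["rho"][k] = correlation(scores)
        model["log_prior"][k] = math.log(len(scores) / len(y))
    return model


def predict(model, x):
    """Pick the class maximizing log prior + log copula density (marginals cancel)."""
    a, b = to_scores(x, model["ecdfs"])
    return max(
        model["rho"],
        key=lambda k: model["log_prior"][k] + log_gauss_copula(a, b, model["rho"][k]),
    )


rng = random.Random(0)


def make_data(n):
    """Assumed toy data: class 1 has rho = +0.995, class 0 has rho = -0.995."""
    X = sample_class(0.995, n, rng) + sample_class(-0.995, n, rng)
    return X, [1] * n + [0] * n


X_train, y_train = make_data(500)
X_test, y_test = make_data(500)
model = fit(X_train, y_train)
accuracy = sum(predict(model, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.3f}")
```

In this toy setup the two classes differ only in their dependence structure, so a classifier that assumes feature independence has nothing to separate them with, while the copula term alone carries the signal. Replacing `log_gauss_copula` with a learned density on the unit square would be the natural place for a neural copula model; with per-class rather than pooled marginals, the per-class marginal log densities would also need to be added to each class score.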