A Picture's Worth a Thousand Words: Visualizing n-dimensional Overlap in Logistic Regression Models with Empirical Likelihood
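This work addresses a practical problem for statisticians and data scientists who fit logistic regression models: deciding whether the two response groups overlap, which determines whether the maximum likelihood estimate exists and is unique. It offers incremental improvements in the visualization and computational assessment of overlap.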
The paper tackles the problem of assessing overlap conditions in logistic regression models by translating Silvapulle's condition into an empirical likelihood maximization framework, making it computationally tractable with existing R code. It applies this to analyze minimal overlapping structures in dimensions up to three and provides rules for higher dimensions.
In this note, conditions for the existence and uniqueness of the maximum likelihood estimate in binary response models with multidimensional predictors are introduced from a sensitivity testing point of view. The well-known condition of Silvapulle is translated into an empirical likelihood maximization which, with existing R code, mechanizes the process of assessing overlap status. The translation shifts the meaning of overlap, defined by geometrical properties of the two predictor groups, from the requirement that the intersection of their convex cones be non-empty to the more understandable requirement that the convex hull of their differences contain zero. The code is applied to reveal the character of overlap by examining minimal overlapping structures and cataloging them in dimensions fewer than four. Rules for generating minimal higher-dimensional structures that account for overlap are provided. Supplementary materials are available online.
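The geometric requirement stated above, that the convex hull of the between-group differences contains zero, can be illustrated with a small sketch. This is not the empirical likelihood method or the R code the note refers to. It is a brute-force stand-in based on Caratheodory's theorem: zero lies in the convex hull of points in n dimensions exactly when it is a convex combination of at most n + 1 of them. The function names `solve_unique` and `hull_of_differences_contains_zero` are hypothetical, the test is plain membership (the note's exact treatment of boundary cases is not given here), and the enumeration is only practical for small data sets.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_unique(A, b):
    """Solve A x = b exactly; return x only if a unique solution exists."""
    rows, cols = len(A), len(A[0])
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    r = 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            return None  # no pivot in this column: solution is not unique
        M[r], M[piv] = M[piv], M[r]
        M[r] = [v / M[r][c] for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [vi - f * vr for vi, vr in zip(M[i], M[r])]
        r += 1
    # Leftover rows have zero coefficients, so a nonzero right side is inconsistent.
    if any(M[i][cols] != 0 for i in range(r, rows)):
        return None
    return [M[i][cols] for i in range(cols)]


def hull_of_differences_contains_zero(group0, group1):
    """Illustrative check (assumption: plain membership, not strict interior)."""
    # All pairwise differences between a point of group1 and a point of group0.
    diffs = [tuple(a - b for a, b in zip(x, y)) for x, y in product(group1, group0)]
    n = len(diffs[0])
    # Caratheodory: it suffices to try subsets of at most n + 1 differences.
    for k in range(1, n + 2):
        for subset in combinations(diffs, k):
            # n coordinate equations (weighted sum is zero) plus weights summing to one.
            A = [[d[j] for d in subset] for j in range(n)] + [[1] * k]
            b = [0] * n + [1]
            w = solve_unique(A, b)
            if w is not None and all(wi >= 0 for wi in w):
                return True
    return False
```

For one-dimensional predictors, the groups {1, 3} and {2, 4} interleave and the check returns True, while {1, 2} against {3, 4} is completely separated and the check returns False. Exact rational arithmetic is used so that boundary cases are not decided by rounding error.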