Efficient comparison of independence structures of log-linear models
This work fills a methodological gap for researchers in fields such as epidemiology and sociology who use log-linear models to analyze relationships between variables. The contribution is incremental, since it builds on existing structure learning methods.
The paper addresses the comparison of independence structures of log-linear models, which previously relied on indirect measures requiring full density estimation. It introduces the first direct and efficient measure for this purpose, together with a proof that the measure is a metric and a method for computing it that is efficient in the number of variables.
Log-linear models are a family of probability distributions that capture relationships between variables. They have proven useful in a wide variety of fields, such as epidemiology, economics and sociology. Their appeal is that they can capture context-specific independencies, relationships that give the model a richer structure. Many approaches exist for automatically learning the independence structure of log-linear models from data. The methods for evaluating these approaches, however, are limited, and are mostly based on indirect measures of the complete density of the probability distribution. Computing such measures requires additionally learning the numerical parameters of the distribution, which introduces distortions when they are used to compare structures. This work addresses this issue by presenting the first measure for the direct and efficient comparison of independence structures of log-linear models. Our method relies only on the independence structure of the models, which is useful when the interest lies in obtaining knowledge from that structure, or when comparing the performance of structure learning algorithms, among other possible uses. We present a proof that the measure is a metric, and a method for its computation that is efficient in the number of variables of the domain.
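To make the notion of a context-specific independence concrete, the following minimal sketch builds a small joint distribution over three binary variables in which X and Y are independent in the context Z = 0 but dependent in the context Z = 1. The distribution, the probability values, and the function names `joint_with_csi` and `independent_in_context` are hypothetical and chosen only for illustration. The sketch does not implement the measure proposed in this work.

```python
from itertools import product

def joint_with_csi():
    """Hypothetical joint P(X, Y, Z) over binary variables (illustrative values).

    In context Z=0 the pair (X, Y) factorizes; in context Z=1 it does not.
    """
    p_z = {0: 0.5, 1: 0.5}
    p_x = {0: 0.3, 1: 0.7}          # P(X | Z=0)
    p_y = {0: 0.6, 1: 0.4}          # P(Y | Z=0)
    # P(X, Y | Z=1): X and Y tend to agree, so they are coupled here.
    p_xy_z1 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
    joint = {}
    for x, y in product((0, 1), repeat=2):
        joint[(x, y, 0)] = p_z[0] * p_x[x] * p_y[y]
        joint[(x, y, 1)] = p_z[1] * p_xy_z1[(x, y)]
    return joint

def independent_in_context(joint, z, tol=1e-9):
    """Check whether P(x, y | z) == P(x | z) * P(y | z) for every x, y."""
    p_z = sum(p for (_, _, zz), p in joint.items() if zz == z)
    cond = {(x, y): joint[(x, y, z)] / p_z
            for x, y in product((0, 1), repeat=2)}
    p_x = {x: cond[(x, 0)] + cond[(x, 1)] for x in (0, 1)}
    p_y = {y: cond[(0, y)] + cond[(1, y)] for y in (0, 1)}
    return all(abs(cond[(x, y)] - p_x[x] * p_y[y]) <= tol
               for x, y in product((0, 1), repeat=2))

if __name__ == "__main__":
    joint = joint_with_csi()
    for z in (0, 1):
        print(f"X independent of Y given Z={z}:",
              independent_in_context(joint, z))
```

An independence that holds for only some values of the conditioning variable, as here, is the kind of structure a plain conditional independence statement "X is independent of Y given Z" cannot express, because that statement would have to hold for every value of Z.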