VC-Dimension Based Generalization Bounds for Relational Learning
This work addresses generalization guarantees for relational models, which matter for applications such as social network analysis. The contribution appears incremental, however, since it adapts existing VC-dimension concepts to a specific relational setting.
The paper tackles the problem of bounding the error of sufficient statistics in relational learning when the data form the complete substructure induced by domain elements sampled uniformly without replacement. It proves a generalization bound using a variant of the VC-dimension tailored to relational data.
In many applications of relational learning, the available data can be seen as a sample from a larger relational structure (e.g. we may be given a small fragment from some social network). In this paper we are particularly concerned with scenarios in which we can assume that (i) the domain elements appearing in the given sample have been uniformly sampled without replacement from the (unknown) full domain and (ii) the sample is complete for these domain elements (i.e. it is the full substructure induced by these elements). Within this setting, we study bounds on the error of sufficient statistics of relational models that are estimated on the available data. As our main result, we prove a bound based on a variant of the Vapnik-Chervonenkis dimension which is suitable for relational data.
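The sampling setting described above can be illustrated with a minimal sketch. It assumes a single binary relation stored as a set of ordered pairs, and it uses the fraction of related ordered pairs as a stand-in statistic. The function names, the random toy structure, and the choice of statistic are all illustrative assumptions, not the paper's definitions, and the sketch does not compute the bound itself.

```python
import random
from itertools import permutations


def induced_substructure(edges, sample):
    """Keep only the pairs whose two elements both lie in the sample (assumption (ii))."""
    kept = set(sample)
    return {(u, v) for (u, v) in edges if u in kept and v in kept}


def edge_statistic(domain, edges):
    """Fraction of ordered pairs of distinct domain elements that are related."""
    n = len(domain)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1))


def sample_and_estimate(domain, edges, k, rng):
    """Draw k elements uniformly without replacement (assumption (i)) and estimate the statistic."""
    sample = rng.sample(sorted(domain), k)
    sub_edges = induced_substructure(edges, sample)
    return sample, edge_statistic(sample, sub_edges)


# Hypothetical "full" structure: 200 elements, each ordered pair related with probability 0.1.
rng = random.Random(0)
domain = list(range(200))
edges = {(u, v) for u, v in permutations(domain, 2) if rng.random() < 0.1}

true_value = edge_statistic(domain, edges)
sample, estimate = sample_and_estimate(domain, edges, 40, rng)
error = abs(estimate - true_value)  # the kind of quantity a generalization bound controls
print(f"full: {true_value:.4f}  sample: {estimate:.4f}  error: {error:.4f}")
```

Repeating the last three lines for several values of `k` gives an informal view of how the estimation error shrinks as the sampled substructure grows, which is the behaviour a bound of this kind is meant to quantify.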