Inferring Latent Structure From Mixed Real and Categorical Relational Data
This work addresses the modeling of complex relational data, a problem of interest to researchers in statistics and machine learning. It appears incremental, however, since it builds on existing latent feature models and adapts them to mixed data types.
The paper analyzes mixed real and categorical relational data by inferring latent binary feature vectors for the rows and columns. The features are modeled with a multivariate Gaussian distribution with low-rank covariance and a probit link, and inference uncovers latent low-dimensional binary features together with correlation structure among rows and among columns.
We consider the analysis of relational data (a matrix) in which the rows correspond to subjects (e.g., people) and the columns correspond to attributes. The elements of the matrix may be a mix of real and categorical values. Each subject and each attribute is characterized by a latent binary feature vector, and an inferred matrix maps each row-column pair of binary feature vectors to an observed matrix element. The latent binary features of the rows are modeled via a multivariate Gaussian distribution with a low-rank covariance matrix, and the Gaussian random variables are mapped to latent binary features via a probit link. The same type of construction is applied jointly to the columns. The model infers latent, low-dimensional binary features associated with each row and each column, as well as the correlation structure between all rows and between all columns.
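To make the construction concrete, the following is a minimal generative sketch in Python with NumPy, not the paper's inference procedure. It makes several assumptions that the abstract does not specify: the low-rank Gaussian is written as a product of two factor matrices, the probit link is implemented as thresholding a unit-variance latent Gaussian at zero, categorical columns are restricted to binary values, and the function names, dimensions, and the noise scale of 0.1 are hypothetical.

```python
import numpy as np

def sample_binary_features(n_items, n_features, rank, rng):
    """Draw binary features from a low-rank Gaussian through a probit link (sketch)."""
    loadings = rng.normal(size=(n_items, rank))    # low-rank factor shared across items
    factors = rng.normal(size=(rank, n_features))  # one factor column per latent feature
    # With `factors` marginalized out, each column of `latent` has covariance
    # loadings @ loadings.T + I: a low-rank term that correlates the items,
    # plus the unit-variance noise that the probit link requires.
    latent = loadings @ factors + rng.normal(size=(n_items, n_features))
    return (latent > 0).astype(int)  # probit link expressed as a threshold at zero

def sample_matrix(n_rows, n_cols, k_row, k_col, rank, categorical_cols, seed=0):
    """Generate a mixed real/binary matrix from row and column binary features."""
    rng = np.random.default_rng(seed)
    Z_rows = sample_binary_features(n_rows, k_row, rank, rng)  # subject features
    Z_cols = sample_binary_features(n_cols, k_col, rank, rng)  # attribute features
    W = rng.normal(size=(k_row, k_col))  # maps each pair of feature vectors to a mean
    mean = Z_rows @ W @ Z_cols.T
    X = mean + 0.1 * rng.normal(size=mean.shape)  # real-valued entries (assumed noise scale)
    # Assumed treatment of categorical columns: binary values obtained by thresholding.
    X[:, categorical_cols] = (X[:, categorical_cols] > 0).astype(float)
    return X, Z_rows, Z_cols, W

X, Z_rows, Z_cols, W = sample_matrix(
    n_rows=6, n_cols=4, k_row=3, k_col=2, rank=2, categorical_cols=[1, 3]
)
```

Here `X[i, j]` depends on the data only through the bilinear form `Z_rows[i] @ W @ Z_cols[j]`, which is the sense in which an inferred matrix maps each row-column pair of binary feature vectors to a matrix element. Posterior inference would recover `Z_rows`, `Z_cols`, `W`, and the low-rank factors from an observed `X`; the sketch only simulates the forward direction.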