Causal Clustering for 1-Factor Measurement Models on Data with Various Types
This work incrementally extends causal clustering methods to handle more realistic data types, benefiting researchers in fields like social sciences or medicine where mixed data is common.
The paper proves that the tetrad constraint, used in causal discovery algorithms like FOFC to detect latent variables, extends to cases with mixed data types (e.g., measured variables of mixed types or discrete measured variables with continuous latent causes), enabling such algorithms to work in these scenarios. Simulation studies show FOFC's performance on mixed data, comparing it to similar algorithms.
The tetrad constraint is a condition whose satisfaction signals a rank reduction of a covariance submatrix. It is used to design causal discovery algorithms, such as FOFC, that detect the existence of latent (unmeasured) variables. Initially, such algorithms worked only when the measured and latent variables are all Gaussian and linearly related (Gaussian-Gaussian case). It has been shown that a unidimensional latent variable model also implies tetrad constraints when the measured and latent variables are all binary (Binary-Binary case). This paper proves that the tetrad constraint is also entailed when the measured variables are of mixed data types and when the measured variables are discrete and the latent common causes are continuous, which implies that any clustering algorithm relying on this constraint can work in those cases. Each case is illustrated with an example and a proof. The performance of FOFC on mixed data is evaluated in simulation studies and compared with that of algorithms with similar functions.
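To make the constraint concrete, the following is a minimal sketch of the three tetrad differences for four measured variables, evaluated on population covariance matrices. It covers only the linear Gaussian-Gaussian setting; it is not the paper's mixed-data construction and not the FOFC algorithm. The function names, loadings, and noise variances are illustrative assumptions, not values from the paper.

```python
import numpy as np

def model_covariance(loadings, latent_cov, noise_var):
    """Population covariance of X = loadings @ L + noise (hypothetical helper).

    loadings:   (p, q) matrix of factor loadings
    latent_cov: (q, q) covariance matrix of the latent variables
    noise_var:  length-p vector of independent noise variances
    """
    loadings = np.asarray(loadings, dtype=float)
    latent_cov = np.asarray(latent_cov, dtype=float)
    return loadings @ latent_cov @ loadings.T + np.diag(noise_var)

def tetrad_differences(cov, i, j, k, l):
    """Return the three tetrad differences for the variables i, j, k, l."""
    return (
        cov[i, j] * cov[k, l] - cov[i, k] * cov[j, l],
        cov[i, j] * cov[k, l] - cov[i, l] * cov[j, k],
        cov[i, k] * cov[j, l] - cov[i, l] * cov[j, k],
    )

noise_var = [0.4, 0.6, 0.8, 1.0]  # assumed values, for illustration only

# Case 1: all four variables share a single latent cause (1-factor model).
one_factor = model_covariance([[0.9], [0.7], [0.5], [0.3]], [[1.0]], noise_var)
one_factor_diffs = tetrad_differences(one_factor, 0, 1, 2, 3)

# Case 2: X0, X1 load on one latent and X2, X3 on another; the two latents
# are correlated, so the four variables do not form a pure 1-factor cluster.
two_factor = model_covariance(
    [[0.9, 0.0], [0.7, 0.0], [0.0, 0.5], [0.0, 0.3]],
    [[1.0, 0.5], [0.5, 1.0]],
    noise_var,
)
two_factor_diffs = tetrad_differences(two_factor, 0, 1, 2, 3)

print("1-factor tetrad differences:", one_factor_diffs)
print("2-factor tetrad differences:", two_factor_diffs)
```

In the 1-factor case all three differences vanish up to floating-point error, whereas in the 2-factor case at least one is clearly nonzero. This contrast is what a tetrad-based clustering procedure tests for; in practice the test is a statistical one on sample covariances rather than an exact comparison.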