Yiran Dong

ML
h-index: 2
4 papers
Novelty: 50%
AI Score: 29

4 Papers

ML · Jul 1, 2023
Causal Structure Learning by Using Intersection of Markov Blankets

Yiran Dong, Chuanhou Gao

In this paper, we introduce a novel causal structure learning algorithm called Endogenous and Exogenous Markov Blankets Intersection (EEMBI), which combines the properties of Bayesian networks and Structural Causal Models (SCM). Furthermore, we propose an extended version of EEMBI, namely EEMBI-PC, which integrates the last step of the PC algorithm into EEMBI.
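
As background for the abstract above, the following minimal sketch shows the standard definition of a Markov blanket in a known DAG: a node's parents, its children, and the other parents of its children. It is not the EEMBI algorithm, which learns structure from data and is not specified in the abstract. The function name `markov_blanket` and the toy graph are hypothetical and used only for illustration.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps every node to the set of its parents.
    The blanket is: parents + children + other parents of the children.
    """
    children = {v for v, ps in parents.items() if node in ps}
    spouses = {p for c in children for p in parents[c]}
    return (parents[node] | children | spouses) - {node}


# Hypothetical toy DAG: A -> C <- B, C -> D
toy_parents = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}

mb_a = markov_blanket(toy_parents, "A")  # child C and co-parent B
mb_c = markov_blanket(toy_parents, "C")  # parents A, B and child D
```

Intersecting two such blankets is a plain set operation (`mb_a & mb_c`); how EEMBI chooses which blankets to intersect is left to the paper.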

ML · Apr 20, 2022
Gaussian mixture modeling of nodes in Bayesian network according to maximal parental cliques

Yiran Dong, Chuanhou Gao

This paper uses Gaussian mixture models instead of linear Gaussian models to fit the distribution of every node in a Bayesian network. We explain why and how we use Gaussian mixture models in Bayesian networks. We also propose a new method, called the double iteration algorithm, to optimize the mixture model. The double iteration algorithm combines the expectation maximization algorithm and the gradient descent algorithm, and it performs perfectly on Bayesian networks with mixture models. In experiments, we test the Gaussian mixture model and the optimization algorithm on different graphs, which are generated by different structure learning algorithms on real data sets, and give the details of every experiment.

ML · Jul 4, 2025
LILI clustering algorithm: Limit Inferior Leaf Interval Integrated into Causal Forest for Causal Interference

Yiran Dong, Di Fan, Chuanhou Gao

Causal forest methods are powerful tools in causal inference. Similar to traditional random forest in machine learning, causal forest considers each causal tree independently. However, this independence increases the likelihood that classification errors in one tree are repeated in others, potentially leading to significant bias in causal effect estimation. In this paper, we propose a novel approach that establishes connections between causal trees through the Limit Inferior Leaf Interval (LILI) clustering algorithm. LILIs are constructed based on the leaves of all causal trees, emphasizing the similarity of dataset confounders. When two instances with different treatments are grouped into the same leaf across a sufficient number of causal trees, they are treated as counterfactual outcomes of each other. Through this clustering mechanism, LILI clustering reduces bias present in traditional causal tree methods and enhances the prediction accuracy for the average treatment effect (ATE). By integrating LILIs into a causal forest, we develop an efficient causal inference method. Moreover, we explore several key properties of LILI by relating it to the concepts of limit inferior and limit superior in set theory. Theoretical analysis rigorously proves the convergence of the estimated ATE using LILI clustering. Empirically, extensive comparative experiments demonstrate the superior performance of LILI clustering.
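
The matching idea in this abstract (two instances with different treatments that share a leaf in enough trees are treated as counterfactuals of each other) can be illustrated with a simplified co-occurrence count. This sketch is an assumption-laden illustration, not the LILI construction itself, whose definition via limit inferior is not given in the abstract. The function `co_leaf_matches`, the threshold `min_trees`, and the toy data are all hypothetical.

```python
from itertools import combinations


def co_leaf_matches(leaf_ids, treatment, min_trees):
    """Find pairs with different treatments that share a leaf often enough.

    leaf_ids[t][i] is the leaf that tree t assigns to instance i.
    Returns (i, j, shared) for pairs sharing a leaf in >= min_trees trees.
    """
    matches = []
    for i, j in combinations(range(len(treatment)), 2):
        if treatment[i] == treatment[j]:
            continue  # only pairs with different treatments can be matched
        shared = sum(1 for tree in leaf_ids if tree[i] == tree[j])
        if shared >= min_trees:
            matches.append((i, j, shared))
    return matches


# Hypothetical toy data: 3 trees, 4 instances
leaf_ids = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]
treatment = [1, 0, 1, 0]

pairs = co_leaf_matches(leaf_ids, treatment, min_trees=2)
```

With these toy inputs, instance 1 (control) is paired with both treated instances 0 and 2, each in two of the three trees. How the paper turns such groupings into an ATE estimate is left to the paper.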

ML · Nov 15, 2021
ELBD: Efficient score algorithm for feature selection on latent variables of VAE

Yiran Dong, Chuanhou Gao

In this paper, we develop the notion of evidence lower bound difference (ELBD), based on which an efficient score algorithm is presented to implement feature selection on latent variables of VAE and its variants. Further, we propose weak convergence approximation algorithms to optimize VAE-related models by weighting the "more important" latent variables selected and accordingly increasing the evidence lower bound. We discuss two kinds of Gaussian posteriors for latent variables, mean-field and full-covariance, and provide corresponding theoretical analyses to support the effectiveness of the algorithms. Extensive comparative experiments are carried out between our algorithms and 9 other feature selection methods on 7 public datasets for generative tasks. The results provide experimental evidence of the effectiveness of our algorithms. Finally, we extend ELBD to a generalized version and apply it to classification tasks on 5 new public datasets, with satisfactory experimental results.