Variable selection for clustering with Gaussian mixture models: state of the art
This is an incremental review article that surveys state-of-the-art variable selection techniques for model-based clustering, targeting researchers and practitioners dealing with high-dimensional data.
The paper addresses the problem of variable selection in Gaussian mixture models for clustering, which is essential for handling large modern databases, and reviews existing methods while suggesting opportunities for improvement.
Mixture models have become widely used in clustering because of the probabilistic framework on which they are based. However, for modern databases, which are characterized by their large size, these models behave disappointingly when it comes to specifying the model, which makes the selection of relevant variables essential for this type of clustering. After recalling the basics of model-based clustering, this article examines variable selection methods for model-based clustering and presents opportunities for improving these methods.