A moment-distance hybrid method for estimating a mixture of two symmetric densities
Volume 5, Issue 1 (2018), pp. 1–36
Pub. online: 18 January 2018
Type: Research Article
Open Access
Received
4 October 2017
Revised
13 December 2017
Accepted
18 December 2017
Published
18 January 2018
Abstract
When clustering high-dimensional data, variable selection is commonly applied to obtain an accurate grouping of the samples. For two-class problems, this selection may be carried out by fitting a mixture distribution to each variable. We propose a hybrid method for estimating a parametric mixture of two symmetric densities. The estimator combines the method of moments with the minimum distance approach. An evaluation study, including both extensive simulations and gene expression data from acute leukemia patients, shows that the hybrid method outperforms a maximum-likelihood estimator in model-based clustering. The hybrid estimator is flexible and also performs well under imprecise model assumptions, suggesting that it is robust and suited to real problems.