Modern Stochastics: Theory and Applications

A moment-distance hybrid method for estimating a mixture of two symmetric densities
Volume 5, Issue 1 (2018), pp. 1–36
David Källberg, Yuri Belyaev, Patrik Rydén

https://doi.org/10.15559/17-VMSTA93
Pub. online: 18 January 2018
Type: Research Article
Open Access

Received: 4 October 2017
Revised: 13 December 2017
Accepted: 18 December 2017
Published: 18 January 2018

Abstract

In clustering of high-dimensional data, variable selection is commonly applied to obtain an accurate grouping of the samples. For two-class problems, this selection may be carried out by fitting a mixture distribution to each variable. We propose a hybrid method for estimating a parametric mixture of two symmetric densities. The estimator combines the method of moments with the minimum distance approach. An evaluation study, including both extensive simulations and gene expression data from acute leukemia patients, shows that the hybrid method outperforms a maximum-likelihood estimator in model-based clustering. The hybrid estimator is flexible and also performs well under imprecise model assumptions, suggesting that it is robust and suited to real problems.
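
The general idea of combining moments with a minimum distance criterion can be illustrated with a small sketch. The code below is not the authors' estimator: it assumes a two-component normal mixture with equal variances, uses the sample mean and variance to eliminate two parameters, and then picks the remaining two (the mixing proportion and the separation of the means) by a plain grid search that minimizes a Cramér–von Mises type distance between the model CDF and the empirical CDF. The function names `normal_cdf` and `hybrid_fit`, the grid size, and the choice of distance are all assumptions made for illustration.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def hybrid_fit(data, grid=30):
    """Illustrative moment-distance fit of p*N(mu1, s^2) + (1-p)*N(mu2, s^2).

    Assumptions (not taken from the paper): equal component variances,
    a plain grid search, and a Cramer-von Mises type distance between CDFs.
    """
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var <= 0.0:
        raise ValueError("data must not be constant")

    best = None
    for i in range(1, grid):
        p = i / grid  # weight of the lower component
        d_max = math.sqrt(var / (p * (1.0 - p)))
        for j in range(1, grid):
            d = d_max * j / grid  # separation mu2 - mu1, kept below d_max
            # Moment step: match the sample mean and variance exactly.
            mu1 = mean - (1.0 - p) * d
            mu2 = mean + p * d
            sigma = math.sqrt(var - p * (1.0 - p) * d * d)
            # Distance step: compare the model CDF with the empirical CDF.
            dist = 0.0
            for k, x in enumerate(xs):
                model = (p * normal_cdf(x, mu1, sigma)
                         + (1.0 - p) * normal_cdf(x, mu2, sigma))
                dist += (model - (k + 0.5) / n) ** 2
            if best is None or dist < best[0]:
                best = (dist, p, mu1, mu2, sigma)

    _, p, mu1, mu2, sigma = best
    return {"p": p, "mu1": mu1, "mu2": mu2, "sigma": sigma}


# Synthetic example: about 30% of points around -2, the rest around 2.
rng = random.Random(0)
sample = [rng.gauss(-2.0, 1.0) if rng.random() < 0.3 else rng.gauss(2.0, 1.0)
          for _ in range(300)]
fit = hybrid_fit(sample)
print(fit)
```

Because the moment step fixes the means and the common variance for every candidate pair, the search runs over two parameters instead of four. A practical implementation would likely replace the grid with a numerical optimizer and might use a different distance.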


Copyright
© 2018 The Author(s). Published by VTeX
Open access article under the CC BY license.

Keywords
Inference for mixtures, method of moments, minimum distance, model-based clustering

MSC2010
62F07 62F10 62F35 62P10 92D10

Funding
This work was supported by grants from the Swedish Research Council (P.R.), Dnr 340-2013-5185 (P.R.), the Kempe Foundations (D.K., P.R.), Dnr JCK-1315, and the Faculty of Science and Technology, Umeå University (P.R.).

MSTA

  • Online ISSN: 2351-6054
  • Print ISSN: 2351-6046