1 Introduction
Finite mixture models (FMMs) arise naturally in statistical analysis of biological and sociological data [11, 13]. The model of a mixture with varying concentrations (MVC) is a modification of the FMM in which the mixing probabilities may differ between observations. Namely, we consider a sample of subjects $O_{1},\dots ,O_{N}$, where each subject belongs to one of the subpopulations (mixture components) $\mathcal{P}_{1},\dots ,\mathcal{P}_{M}$. The true subpopulation to which the subject $O_{j}$ belongs is unknown, but we know the probabilities ${p_{j;N}^{m}}=\operatorname{\mathsf{P}}[O_{j}\in \mathcal{P}_{m}]$ (the mixing probabilities, or concentrations of $\mathcal{P}_{m}$ in the mixture at the jth observation, $j=1,\dots ,N$, $m=1,\dots ,M$). For each subject O, a variable $\xi (O)$ is observed, which is considered as a random element in a measurable space $\mathfrak{X}$ equipped with a σ-algebra $\mathfrak{F}$. Let
\[F_{m}(A)=\operatorname{\mathsf{P}}\big[\xi (O)\in A\hspace{2.5pt}|\hspace{2.5pt}O\in \mathcal{P}_{m}\big],\hspace{1em}A\in \mathfrak{F},\]
be the distribution of $\xi (O)$ for subjects O that belong to the mth component. The observations $\xi _{j;N}=\xi (O_{j})$, $j=1,\dots ,N$, are assumed to be independent. Then the unconditional distribution of $\xi _{j;N}$ is
(1)
\[\operatorname{\mathsf{P}}[\xi _{j;N}\in A]=\sum \limits_{m=1}^{M}{p_{j;N}^{m}}F_{m}(A),\hspace{1em}A\in \mathfrak{F}.\]We consider the nonparametric MVC model where the concentrations ${p_{j;N}^{m}}$ are known but the component distributions $F_{m}$ are completely unknown. Such models were applied to analyze gene expression level data [8] and data on sensitive questions in sociology [12]. An example of sociological data analysis based on MVC is presented in [9]. In this paper, we consider adherents of different political parties in Ukraine as subpopulations $\mathcal{P}_{i}$. Their concentrations are deduced from the 2006 parliamentary election results in different regions of Ukraine. Individual voters are considered as subjects; their observed characteristics are taken from the Four-Wave Values Survey held in Ukraine in 2006. (Note that the political choices of the surveyed individuals were unknown. So, each subject must be considered as selected from a mixture of different $\mathcal{P}_{i}$.) For example, one of the observed characteristics is satisfaction with personal income (on a scale from 1 to 10).
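To make the sampling scheme (1) concrete, here is a minimal simulation sketch in Python (the function and variable names are ours, not from the paper): each subject j first draws its latent, unobserved component according to its own concentration vector ${p_{j;N}^{\cdot }}$ and then draws its observation from that component's distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mvc(p, component_samplers, rng):
    """Draw one observation per subject from a mixture with varying
    concentrations: row j of p holds the mixing probabilities of subject j,
    and component_samplers[m] draws a single value from F_m."""
    N, M = p.shape
    xi = np.empty(N)
    for j in range(N):
        m = rng.choice(M, p=p[j])          # latent (unobserved) component of O_j
        xi[j] = component_samplers[m](rng)
    return xi

# toy example with M = 2 Gaussian components; the concentrations are obtained
# by normalizing independent uniforms (the scheme used in Section 5.1)
N = 500
z = rng.uniform(size=(N, 2))
p = z / z.sum(axis=1, keepdims=True)       # each row sums to 1
samplers = [lambda r: r.normal(0.0, 1.0), lambda r: r.normal(3.0, 2.0)]
xi = sample_mvc(p, samplers, rng)
```

The latent component indices are discarded, matching the model: only $\xi _{j;N}$ and the concentration matrix p are available to the statistician.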
A natural question in the analysis of such data is homogeneity testing for different components. For example, if $\mathfrak{X}=\mathbb{R}$, then we may ask if the means or variances (or both) of the distributions $F_{i}$ and $F_{k}$ are the same for some fixed i and k or if the variances of all the components are the same.
In [8], a test is proposed for the hypothesis of two-means homogeneity. In this paper, we generalize the approach from [8] to a much richer class of hypotheses, including different statements on means, variances, and other generalized functional moments of component distributions.
Hypotheses of equality of MVC component distributions, that is, $F_{i}\equiv F_{k}$, were considered in [6] (a Kolmogorov–Smirnov-type test is proposed) and [1] (tests based on wavelet density estimation). The technique of our paper also allows testing such hypotheses using the “grouped ${\chi }^{2}$”-approach.
The rest of the paper is organized as follows. We describe the considered hypotheses formally and discuss the test construction in Section 2. Section 3 contains auxiliary information on the functional moments estimation in MVC models. In Section 4, the test is described formally. Section 5 contains results of the test performance analysis by a simulation study and an example of real-life data analysis. Technical proofs are given in Appendix A.
2 Problem setting
In the rest of the paper, we use the following notation.
The zero vector from ${\mathbb{R}}^{k}$ is denoted by $\mathbb{O}_{k}$. The unit $k\times k$-matrix is denoted by $\mathbb{I}_{k\times k}$, and the $k\times m$-zero matrix by $\mathbb{O}_{k\times m}$. Convergences in probability and in distribution are denoted $\stackrel{P}{\longrightarrow }$ and $\stackrel{d}{\longrightarrow }$, respectively.
We consider the set of concentrations $p=({p_{j;N}^{m}},j=1,\dots ,N;\hspace{2.5pt}m=1,\dots ,M;N=1,\dots \hspace{0.1667em})$ as an infinite array, ${p_{\operatorname{\mathbf{\cdot }};N}^{\operatorname{\mathbf{\cdot }}}}=({p_{j;N}^{m}},j=1,\dots ,N;\hspace{2.5pt}m=1,\dots ,M)$ as an $(N\times M)$-matrix, and ${p_{\operatorname{\mathbf{\cdot }};N}^{m}}=({p_{j;N}^{m}},j=1,\dots ,N)\in {\mathbb{R}}^{N}$ and ${p_{j;N}^{\operatorname{\mathbf{\cdot }}}}=({p_{j;N}^{m}},m=1,\dots ,M)$ as column vectors. The same notation is used for arrays of similar structure, such as the array a introduced below.
Angle brackets with subscript N denote averaging of an array over all the observations, for example,
\[\big\langle {a_{\operatorname{\mathbf{\cdot }};N}^{m}}\big\rangle _{N}=\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}.\]
Multiplication, summation, and other similar operations inside the angle brackets are applied to the arrays componentwise, so that
\[\big\langle {a_{\operatorname{\mathbf{\cdot }};N}^{m}}{p_{\operatorname{\mathbf{\cdot }};N}^{k}}\big\rangle _{N}=\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}{p_{j;N}^{k}},\hspace{2em}\big\langle {\big({a_{\operatorname{\mathbf{\cdot }};N}^{m}}\big)}^{2}\big\rangle _{N}=\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{m}}\big)}^{2},\]
and so on. Angle brackets without a subscript denote the limit of the corresponding averages as $N\to \infty $ (assuming that this limit exists):
\[\big\langle {p}^{m}{a}^{k}\big\rangle =\underset{N\to \infty }{\lim }\big\langle {p_{\operatorname{\mathbf{\cdot }};N}^{m}}{a_{\operatorname{\mathbf{\cdot }};N}^{k}}\big\rangle _{N}.\]
We introduce formally random elements $\eta _{m}\in \mathfrak{X}$ with distributions $F_{m}$, $m=1,\dots ,M$. Consider a set of $K\le M$ measurable functions $g_{k}:\mathfrak{X}\to {\mathbb{R}}^{d_{k}}$, $k=1,\dots ,K$. Let ${\bar{g}_{k}^{m}}$ be the (vector-valued) functional moment of the mth component with moment function $g_{k}$, that is,
(2)
\[{\bar{g}_{k}^{m}}=\operatorname{\mathsf{E}}\big[g_{k}(\eta _{m})\big]=\int _{\mathfrak{X}}g_{k}(x)F_{m}(dx).\]
Fix a measurable function $T:{\mathbb{R}}^{d_{1}}\times {\mathbb{R}}^{d_{2}}\times \cdots \times {\mathbb{R}}^{d_{K}}\to {\mathbb{R}}^{L}$. For data described by the MVC model (1), we consider testing a null hypothesis of the form
(3)
\[H_{0}:\hspace{2.5pt}T\big({\bar{g}_{1}^{1}},\dots ,{\bar{g}_{K}^{K}}\big)=\mathbb{O}_{L}\]
against the general alternative $T({\bar{g}_{1}^{1}},\dots ,{\bar{g}_{K}^{K}})\ne \mathbb{O}_{L}$.
Example 1.
Consider a three-component mixture ($M=3$) with $\mathfrak{X}=\mathbb{R}$. We would like to test the hypothesis ${H_{0}^{\sigma }}:\operatorname{Var}\eta _{1}=\operatorname{Var}\eta _{2}$ (i.e., the variances of the first and second components are the same). This hypothesis can be reformulated in the form (3) by letting $g_{1}(x)=g_{2}(x)={(x,{x}^{2})}^{T}$ and $T({(y_{11},y_{12})}^{T},{(y_{21},y_{22})}^{T})=(y_{12}-{(y_{11})}^{2})-(y_{22}-{(y_{21})}^{2})$, so that T is the difference of the two component variances and $L=1$.
Example 2.
Let $\mathfrak{X}=\mathbb{R}$. Consider the hypothesis of mean homogeneity ${H_{0}^{\mu }}:\operatorname{\mathsf{E}}\eta _{1}=\cdots =\operatorname{\mathsf{E}}\eta _{M}$. Then the choice of $g_{m}(x)=x$, $T(y_{1},\dots ,y_{M})={(y_{1}-y_{2},y_{2}-y_{3},\dots ,y_{M-1}-y_{M})}^{T}$ reduces ${H_{0}^{\mu }}$ to the form (3).
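For this example, the map T and its constant Jacobian can be written down explicitly. A small sketch (the function names are ours):

```python
import numpy as np

def T_mean_hom(y):
    """T(y) = (y_1 - y_2, ..., y_{M-1} - y_M); zero iff all entries agree."""
    y = np.asarray(y, dtype=float)
    return y[:-1] - y[1:]

def T_prime_mean_hom(M):
    """Constant (M-1) x M Jacobian of T: row i is e_i - e_{i+1}."""
    return np.eye(M - 1, M) - np.eye(M - 1, M, k=1)
```

For $M=3$, `T_prime_mean_hom(3)` is the matrix [[1, -1, 0], [0, 1, -1]], which has full row rank; this is the full-rank property used later when Example 2 is revisited in Section 4.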
Example 3.
Let $\mathfrak{X}$ be a finite discrete space: $\mathfrak{X}=\{x_{1},\dots ,x_{r}\}$. Consider the distribution homogeneity hypothesis ${H_{0}^{\equiv }}:F_{1}\equiv F_{2}$. To present it in the form (3), we can use $g_{i}(x)={(\mathbb{1}\{x=x_{k}\},k=1,\dots ,r-1)}^{T}$ and $T(y_{1},y_{2})=y_{1}-y_{2}$ ($y_{i}\in {\mathbb{R}}^{r-1}$ for $i=1,2$). In the case of continuous distributions, ${H_{0}^{\equiv }}$ can be discretized by data grouping.
To test $H_{0}$ defined by (3), we adopt the following approach. Suppose we have consistent estimators ${\hat{g}_{k;N}^{m}}$ for ${\bar{g}_{k}^{m}}$. Assume that T is continuous. Consider the statistic $\hat{T}_{N}=T({\hat{g}_{1;N}^{1}},\dots ,{\hat{g}_{K;N}^{K}})$. Then, under $H_{0}$, $\hat{T}_{N}\approx \mathbb{O}_{L}$, and a far departure of $\hat{T}_{N}$ from zero will be evidence in favor of the alternative.
To measure this departure, we use a Mahalanobis-type distance. If $\sqrt{N}\hat{T}_{N}$ is asymptotically normal with a nonsingular asymptotic covariance matrix D, then, under $H_{0}$, $N{\hat{T}_{N}^{T}}{D}^{-1}\hat{T}_{N}$ is asymptotically ${\chi }^{2}$-distributed. In fact, D depends on unknown component distributions $F_{i}$, so we replace it by its consistent estimator $\hat{D}_{N}$. The resulting statistic $\hat{s}_{N}=N{\hat{T}_{N}^{T}}{\hat{D}_{N}^{-1}}\hat{T}_{N}$ is a test statistic. The test rejects $H_{0}$ if $\hat{s}_{N}>{Q}^{{\chi _{L}^{2}}}(1-\alpha )$, where α is the significance level, and ${Q}^{G}(\alpha )$ denotes the quantile of level α for distribution G.
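As a sketch, this decision rule can be coded as follows, assuming $\hat{T}_{N}$ and $\hat{D}_{N}$ have already been computed (the names are ours; scipy supplies the ${\chi }^{2}$ quantile and tail probability):

```python
import numpy as np
from scipy.stats import chi2

def mvc_chi2_test(T_hat, D_hat, N, alpha=0.05):
    """Mahalanobis-type statistic s = N * T^T D^{-1} T and the chi^2_L test
    at significance level alpha; L = len(T_hat)."""
    T_hat = np.asarray(T_hat, dtype=float)
    L = T_hat.shape[0]
    s = N * T_hat @ np.linalg.solve(D_hat, T_hat)   # avoids forming D^{-1}
    p_value = chi2.sf(s, df=L)                      # attained significance level
    reject = s > chi2.ppf(1 - alpha, df=L)
    return s, p_value, reject
```

With $\hat{T}_{N}$ exactly zero the statistic is 0 and $H_{0}$ is never rejected; a far departure of $\hat{T}_{N}$ from zero yields a small p-value.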
Possible candidates for the role of estimators ${\hat{g}_{k;N}^{m}}$ and $\hat{D}_{N}$ are considered in the next section.
3 Estimation of functional moments
Let us start with the nonparametric estimation of $F_{m}$ by a weighted empirical distribution of the form
\[\hat{F}_{m;N}(A)=\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}\mathbb{1}\{\xi _{j;N}\in A\},\hspace{1em}A\in \mathfrak{F},\]
where ${a_{j;N}^{m}}$ are some nonrandom weights to be selected “in the best way.” Denote $e_{m}={(\mathbb{1}\{k=m\},k=1,\dots ,M)}^{T}$ and
\[\varGamma _{N}=\frac{1}{N}{\big({p_{\operatorname{\mathbf{\cdot }};N}^{\operatorname{\mathbf{\cdot }}}}\big)}^{T}{p_{\operatorname{\mathbf{\cdot }};N}^{\operatorname{\mathbf{\cdot }}}}={\big(\big\langle {p_{\operatorname{\mathbf{\cdot }};N}^{m}}{p_{\operatorname{\mathbf{\cdot }};N}^{i}}\big\rangle _{N}\big)_{m,i=1}^{M}}.\]
Assume that $\varGamma _{N}$ is nonsingular. It is shown in [8] that, in this case, the weight array
\[{a_{\operatorname{\mathbf{\cdot }};N}^{m}}={p_{\operatorname{\mathbf{\cdot }};N}^{\operatorname{\mathbf{\cdot }}}}{\varGamma _{N}^{-1}}e_{m}\]
yields the unbiased estimator with minimal assured quadratic risk. The simple estimator ${\hat{g}_{i;N}^{m}}$ for ${\bar{g}_{i}^{m}}$ is defined as
\[{\hat{g}_{i;N}^{m}}=\int _{\mathfrak{X}}g_{i}(x)\hat{F}_{m;N}(dx)=\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}g_{i}(\xi _{j;N}).\]
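A sketch of this computation (our own naming, under the assumption that the weights are the minimax ones $a={p}\varGamma _{N}^{-1}$ displayed above). A useful sanity check is the identity $\frac{1}{N}{a}^{T}p={\varGamma _{N}^{-1}}\varGamma _{N}=\mathbb{I}_{M\times M}$, i.e. $\langle {a^{m}}{p^{i}}\rangle _{N}=\mathbb{1}\{m=i\}$, which is exactly what makes the weighted estimators unbiased.

```python
import numpy as np

def minimax_weights(p):
    """Weight matrix a = p Gamma_N^{-1}, Gamma_N = p^T p / N.
    Column m is the weight vector a^m for estimating component m."""
    N = p.shape[0]
    Gamma = p.T @ p / N
    return p @ np.linalg.inv(Gamma)

def simple_moment(xi, a, m, g):
    """Simple estimator hat g^m = (1/N) sum_j a_j^m g(xi_j), g scalar-valued."""
    return np.mean(a[:, m] * g(xi))
```

Note that some entries of a are typically negative; this is harmless for consistency but is the source of the finite-sample difficulties discussed below.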
We denote $\varGamma =\lim _{N\to \infty }\varGamma _{N}={(\langle {p}^{i}{p}^{m}\rangle )_{i,m=1}^{M}}$. Let $h:\mathfrak{X}\to {\mathbb{R}}^{d}$ be any measurable function, and let ${\hat{h}_{N}^{m}}=\frac{1}{N}{\sum _{j=1}^{N}}{a_{j;N}^{m}}h(\xi _{j;N})$ be the corresponding simple estimator of $\operatorname{\mathsf{E}}[h(\eta _{m})]$.
Theorem 1 ([9], Lemma 1).
Assume that:
Then ${\hat{h}_{N}^{m}}\stackrel{P}{\longrightarrow }\operatorname{\mathsf{E}}[h(\eta _{m})]$ as $N\to \infty $ for all $m=1,\dots ,M$.
To formulate the asymptotic normality result for the simple moment estimators, we need some additional notation.
We consider the set of all moments ${\bar{g}_{k}^{k}}$, $k=1,\dots ,K$, as one long vector belonging to ${\mathbb{R}}^{d}$, $d:=d_{1}+\cdots +d_{K}$:
(4a)
\[\bar{g}:={\big({\big({\bar{g}_{1}^{1}}\big)}^{T},\dots ,{\big({\bar{g}_{K}^{K}}\big)}^{T}\big)}^{T}\in {\mathbb{R}}^{d}.\]
The corresponding estimators also form a long vector
(4b)
\[\hat{g}_{N}:={\big({\big({\hat{g}_{1;N}^{1}}\big)}^{T},\dots ,{\big({\hat{g}_{K;N}^{K}}\big)}^{T}\big)}^{T}\in {\mathbb{R}}^{d}.\]
We denote the matrices of mixed second moments of $g_{k}(x)$, $k=1,\dots ,K$, and the corresponding estimators as
(5a)
\[{\bar{g}_{k,l}^{m}}:=\operatorname{\mathsf{E}}\big[g_{k}(\eta _{m})g_{l}{(\eta _{m})}^{T}\big]\in {\mathbb{R}}^{d_{k}\times d_{l}},\]
(5b)
\[{\hat{g}_{k,l;N}^{m}}:=\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}g_{k}(\xi _{j;N})g_{l}{(\xi _{j;N})}^{T}.\]
We consider the function T as a function of a d-dimensional argument, that is, $T(y):=T({y}^{1},\dots ,{y}^{K})$. Then $\hat{T}_{N}:=T(\hat{g}_{N})=T({\hat{g}_{1;N}^{1}},\dots ,{\hat{g}_{K;N}^{K}})$.
Let us define the following matrices (assuming that the limits exist):
(6a)
\[\alpha _{r,s;N}:=\big({\alpha _{r,s;N}^{k,l}}\big)_{k,l=1,\dots ,K}:=\big(\big\langle {a_{\cdot ;N}^{k}}{a_{\cdot ;N}^{l}}{p_{\cdot ;N}^{r}}{p_{\cdot ;N}^{s}}\big\rangle _{N}\big)_{k,l=1,\dots ,K}\in {\mathbb{R}}^{K\times K};\]
(6b)
\[\beta _{m;N}:=\big({\beta _{m;N}^{k,l}}\big)_{k,l=1,\dots ,K}:=\big(\big\langle {a_{\cdot ;N}^{k}}{a_{\cdot ;N}^{l}}{p_{\cdot ;N}^{m}}\big\rangle _{N}\big)_{k,l=1,\dots ,K}\in {\mathbb{R}}^{K\times K};\]
$\alpha _{r,s}:=\lim _{N\to \infty }\alpha _{r,s;N}$ and $\beta _{m}:=\lim _{N\to \infty }\beta _{m;N}$. Then the asymptotic covariance matrix of the normalized estimator $\sqrt{N}(\hat{g}_{N}-\bar{g})$ is the matrix Σ consisting of the blocks
\[{\varSigma }^{(k,l)}:=\sum \limits_{m=1}^{M}{\beta _{m}^{k,l}}{\bar{g}_{k,l}^{m}}-\sum \limits_{r,s=1}^{M}{\alpha _{r,s}^{k,l}}{\bar{g}_{k}^{r}}{\big({\bar{g}_{l}^{s}}\big)}^{T}\in {\mathbb{R}}^{d_{k}\times d_{l}}.\]
Theorem 2.
Assume that:
- (i) The functional moments ${\bar{g}_{k}^{m}}$, ${\bar{g}_{k,l}^{m}}$ exist and are finite for $k,l=1,\dots ,K$, $m=1,\dots ,M$.
- (ii) There exists $\delta >0$ such that $\operatorname{\mathsf{E}}[|g_{k}(\eta _{m}){|}^{2+\delta }]<\infty $, $k=1,\dots ,K$, $m=1,\dots ,M$.
- (iii) There exist finite matrices Γ, ${\varGamma }^{-1}$, $\alpha _{r,s}$, and $\beta _{m}$ for $r,s,m=1,\dots ,M$.
Then $\sqrt{N}(\hat{g}_{N}-\bar{g})\stackrel{d}{\longrightarrow }\zeta \simeq \mathcal{N}(\mathbb{O}_{d},\varSigma )$, $N\to \infty $.
Thus, to construct a test for $H_{0}$, we need a consistent estimator for Σ. The matrices $\alpha _{r,s;N}$ and $\beta _{m;N}$ are natural estimators for $\alpha _{r,s}$ and $\beta _{m}$. It is also natural to estimate ${\bar{g}_{k,l}^{m}}$ by ${\hat{g}_{k,l;N}^{m}}$ defined in (5b). In view of Theorem 1, these estimators are consistent under the assumptions of Theorem 2. But they can possess undesirable properties for moderate sample sizes. Indeed, note that $\hat{F}_{m;N}$ is not itself a probability distribution, since the weights ${a_{j;N}^{m}}$ are negative for some j. Therefore, for example, the simple estimator of the second moment of a component can be negative, the estimator (5b) of the positive semidefinite matrix ${\bar{g}_{k,k}^{m}}$ may fail to be positive semidefinite, and so on. Due to the asymptotic normality result, this is not too troublesome for the estimation of $\bar{g}$. But it causes serious difficulties when an estimator of the asymptotic covariance matrix D based on ${\hat{g}_{k,l;N}^{m}}$ is used to calculate $\hat{s}_{N}$.
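A tiny numerical illustration of the effect (the numbers are invented for illustration): with weights of mixed sign that still average to 1, the simple estimator of a second moment can come out negative.

```python
import numpy as np

# hypothetical minimax-type weights for one component: they average to 1,
# but one of them is negative
a_m = np.array([2.0, 2.0, -1.0])
xi = np.array([0.1, 0.1, 3.0])          # the negatively weighted subject is extreme

second_moment = np.mean(a_m * xi**2)    # simple estimate of E[xi^2] for the component
# here second_moment < 0, although a true second moment never is
```

Plugging such an estimate into a covariance matrix can produce a $\hat{D}_{N}$ that is not positive definite, which is exactly the failure mode counted in the simulation study of Section 5.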
In [10], a technique is developed for improving $\hat{F}_{m;N}$ and ${\hat{h}_{N}^{m}}$ that allows one to derive estimators with more adequate finite-sample properties when $\mathfrak{X}=\mathbb{R}$.
So, assume that $\xi (O)\in \mathbb{R}$ and consider the weighted empirical CDF
\[\hat{F}_{m;N}(x)=\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}\mathbb{1}\{\xi _{j;N}<x\}.\]
It is not a nondecreasing function, and it can attain values outside $[0,1]$ since some ${a_{j;N}^{m}}$ are negative. The transform
\[{\tilde{F}_{m;N}^{+}}(x)=\underset{y\le x}{\sup }\hat{F}_{m;N}(y)\]
yields a monotone function ${\tilde{F}_{m;N}^{+}}(x)$, but it still can be greater than 1 at some x. So, define
\[{\hat{F}_{m;N}^{+}}(x)=\min \big\{1,{\tilde{F}_{m;N}^{+}}(x)\big\}\]
as the improved estimator for $F_{m}(x)$. Note that this is an “improvement upward,” since ${\tilde{F}_{m;N}^{+}}(x)\ge \hat{F}_{m;N}(x)$. Similarly, a downward improved estimator can be defined as
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {\tilde{F}_{m;N}^{-}}(x)& \displaystyle =\underset{y\ge x}{\inf }\hat{F}_{m;N}(y),\\{} \displaystyle {\hat{F}_{m;N}^{-}}(x)& \displaystyle =\max \big\{0,{\tilde{F}_{m;N}^{-}}(x)\big\}.\end{array}\]
Any CDF that lies between ${\hat{F}_{m;N}^{-}}(x)$ and ${\hat{F}_{m;N}^{+}}(x)$ can be considered as an improved version of $\hat{F}_{m;N}(x)$. We will use only one such improvement, which combines ${\hat{F}_{m;N}^{-}}(x)$ and ${\hat{F}_{m;N}^{+}}(x)$:
(9)
\[{\hat{F}_{m;N}^{\pm }}(x)=\left\{\begin{array}{l@{\hskip10.0pt}l}{\hat{F}_{m;N}^{+}}(x)\hspace{1em}& \text{if}\hspace{2.5pt}{\hat{F}_{m;N}^{+}}(x)\le 1/2\text{,}\\{} {\hat{F}_{m;N}^{-}}(x)\hspace{1em}& \text{if}\hspace{2.5pt}{\hat{F}_{m;N}^{-}}(x)\ge 1/2\text{,}\\{} 1/2\hspace{1em}& \text{otherwise.}\end{array}\right.\]Note that all three considered estimators ${\hat{F}_{m;N}^{\ast }}$ (∗ stands for any of the symbols +, −, or ±) are piecewise constant on the intervals between successive order statistics of the data. Thus, they can be represented as
\[{\hat{F}_{m;N}^{\ast }}(x)=\frac{1}{N}\sum \limits_{j=1}^{N}{b_{j;N}^{m\ast }}\mathbb{1}\{\xi _{j;N}<x\},\]
where ${b_{j;N}^{m\ast }}$ are some random weights that depend on the data. The corresponding improved estimator for ${\bar{g}_{i}^{m}}$ is
\[{\hat{g}_{i;N}^{m\ast }}={\int _{-\infty }^{+\infty }}g_{i}(x){\hat{F}_{m;N}^{\ast }}(dx)=\frac{1}{N}\sum \limits_{j=1}^{N}{b_{j;N}^{m\ast }}g_{i}(\xi _{j;N}).\]
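The improvement transforms are straightforward to implement on the sorted sample. A sketch (our own function, evaluating the three estimators just to the right of each order statistic; running maxima and minima realize the sup and inf):

```python
import numpy as np

def improved_cdf_steps(xi, a_m):
    """Values of hat F^+, hat F^- and hat F^± immediately after each jump of
    the weighted eCDF built with the (possibly negative) weights a_m."""
    order = np.argsort(xi)
    F = np.cumsum(a_m[order]) / len(xi)                  # raw weighted eCDF
    F_plus = np.minimum(1.0, np.maximum.accumulate(F))   # sup over y <= x, capped at 1
    F_minus = np.maximum(0.0,
                         np.minimum.accumulate(F[::-1])[::-1])  # inf over y >= x, floored at 0
    F_pm = np.where(F_plus <= 0.5, F_plus,
                    np.where(F_minus >= 0.5, F_minus, 0.5))     # rule (9)
    return F_plus, F_minus, F_pm
```

Each returned sequence is nondecreasing, and the combined estimator follows the three-case rule (9) above.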
Let $h:\mathbb{R}\to \mathbb{R}$ be a measurable function.
4 Construction of the test
We first state an asymptotic normality result for $\hat{T}_{N}$. Denote by
\[{T^{\prime }}(y):={\bigg(\frac{\partial T_{i}(y)}{\partial {y}^{j}}\bigg)_{i=1,\dots ,L;\hspace{2.5pt}j=1,\dots ,d}}\in {\mathbb{R}}^{L\times d}\]
the Jacobian matrix of T at y.
Theorem 4.
Assume that:
- (i) ${T^{\prime }}(\bar{g})$ exists.
- (ii) The assumptions of Theorem 2 hold.
- (iii) The matrix $D={T^{\prime }}(\bar{g})\varSigma {({T^{\prime }}(\bar{g}))}^{T}$ is nonsingular.
Then, under $H_{0}$, $\sqrt{N}\hat{T}_{N}\stackrel{d}{\longrightarrow }\mathcal{N}(\mathbb{O}_{L},D)$.
For the proof, see Appendix A. Note that (iii) implies the nonsingularity of Σ.
Now, to estimate D, we can use
\[\hat{D}_{N}={T^{\prime }}(\tilde{g}_{N})\tilde{\varSigma }_{N}{\big({T^{\prime }}(\tilde{g}_{N})\big)}^{T},\]
where $\tilde{g}_{N}$ is any consistent estimator for $\bar{g}$, and $\tilde{\varSigma }_{N}$ consists of the blocks
(10a)
\[{\tilde{\varSigma }_{N}^{(k,l)}}:=\sum \limits_{m=1}^{M}{\beta _{m;N}^{k,l}}{\tilde{g}_{k,l;N}^{m}}-\sum \limits_{r,s=1}^{M}{\alpha _{r,s;N}^{k,l}}{\tilde{g}_{k;N}^{r}}{\big({\tilde{g}_{l;N}^{s}}\big)}^{T}\in {\mathbb{R}}^{d_{k}\times d_{l}},\]
where ${\tilde{g}_{k,l;N}^{m}}$ is any consistent estimator for ${\bar{g}_{k,l}^{m}}$. For example, we can use
(10b)
\[{\tilde{g}_{k,l;N}^{m}}={\hat{g}_{k,l;N}^{m\pm }}=\frac{1}{N}\sum \limits_{j=1}^{N}{b_{j;N}^{m\pm }}g_{k}(\xi _{j;N})g_{l}{(\xi _{j;N})}^{T}\]
if $\mathfrak{X}=\mathbb{R}$ and the assumptions of Theorem 3 hold for all $h(x)={g_{l}^{i}}(x){g_{k}^{n}}(x)$, $k,l=1,\dots ,K$, $i=1,\dots ,d_{l}$, $n=1,\dots ,d_{k}$, where $g_{l}(x)={({g_{l}^{1}}(x),\dots ,{g_{l}^{d_{l}}}(x))}^{T}$. Now let the test statistic be $\hat{s}_{N}=N{(\hat{T}_{N})}^{T}{\hat{D}_{N}^{-1}}\hat{T}_{N}$. For a given significance level α, the test $\pi _{N,\alpha }$ accepts $H_{0}$ if $\hat{s}_{N}\le {Q}^{{\chi _{L}^{2}}}(1-\alpha )$ and rejects $H_{0}$ otherwise.
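For scalar moment functions ($d_{k}=1$ for all k), one block of $\tilde{\varSigma }_{N}$ can be assembled directly from the weight and concentration arrays. A sketch with our own naming, where `gk_bar`, `gl_bar` (length-M vectors over components) and `gkl_bar` are the plugged-in consistent estimates:

```python
import numpy as np

def sigma_block(a, p, k, l, gk_bar, gl_bar, gkl_bar):
    """tilde Sigma^{(k,l)} = sum_m beta^{kl}_m g^m_{kl}
                             - sum_{r,s} alpha^{kl}_{rs} g^r_k g^s_l,
    with beta^{kl}_m = <a^k a^l p^m>_N, alpha^{kl}_{rs} = <a^k a^l p^r p^s>_N."""
    N = p.shape[0]
    w = a[:, k] * a[:, l]                  # products a^k_j a^l_j over subjects
    beta = (w[:, None] * p).mean(axis=0)   # length-M vector of beta^{kl}_m
    alpha = (w[:, None] * p).T @ p / N     # M x M matrix of alpha^{kl}_{rs}
    return beta @ gkl_bar - gk_bar @ alpha @ gl_bar
```

As a sanity check, for a degenerate "mixture" with a single component and unit weights ($p\equiv 1$, $a\equiv 1$) the block reduces to the usual variance formula, second moment minus squared mean.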
The p-value of the test (i.e., the attained significance level) can be calculated as $p=1-G(\hat{s}_{N})$, where G is the CDF of the ${\chi _{L}^{2}}$-distribution.
Theorem 5.
Let the assumptions of Theorem 4 hold. Moreover, assume the following:
Then $\lim _{N\to \infty }\operatorname{\mathsf{P}}_{H_{0}}\{\pi _{N,\alpha }\hspace{2.5pt}\text{rejects}\hspace{2.5pt}H_{0}\}=\alpha $.
Example 2 (Continued).
Consider testing ${H_{0}^{\mu }}$ by the test $\pi _{N,\alpha }$ with $g_{i}(x)=x$ and $T(y_{1},\dots ,y_{M})={(y_{1}-y_{2},y_{2}-y_{3},\dots ,y_{M-1}-y_{M})}^{T}$. It is obvious that ${T^{\prime }}(y)$ is a constant matrix of full rank. Assume that $\operatorname{Var}[\eta _{m}]>0$ for all $m=1,\dots ,M$ and $\det \varGamma \ne 0$. Then Σ is nonsingular, and so is D. Thus, in this case, assumptions (i) and (iii) of Theorem 2, (i) and (iii) of Theorem 4, and (ii) of Theorem 5 hold.
To ensure assumption (ii) of Theorem 2, we need $\operatorname{\mathsf{E}}[|\eta _{m}{|}^{2+\delta }]<\infty $ for some $\delta >0$ and all $m=1,\dots ,M$. In view of Theorem 1, this assumption also implies the consistency of $\hat{g}_{N}$ and ${\hat{g}_{k,l;N}^{m}}$. If one uses ${\hat{g}_{N}^{\pm }}$ and ${\hat{g}_{k,l;N}^{m\pm }}$ as the estimators $\tilde{g}_{N}$ and ${\tilde{g}_{k,l;N}^{m}}$ in $\hat{D}_{N}$, then the more restrictive assumption $\operatorname{\mathsf{E}}[|\eta _{m}{|}^{4+\delta }]<\infty $ is needed to ensure their consistency by Theorem 3.
5 Numerical results
5.1 Simulation study
To assess the performance of the proposed test on samples of moderate size, we conducted a small simulation study. Three-component mixtures were analyzed ($M=3$) with Gaussian components $F_{m}\sim N(\mu _{m},{\sigma _{m}^{2}})$. The concentrations were generated as ${p_{j;N}^{m}}={\zeta _{j;N}^{m}}/s_{j;N}$, where ${\zeta _{j;N}^{m}}$ are independent random variables uniformly distributed on $[0,1]$, and $s_{j;N}={\sum _{m=1}^{M}}{\zeta _{j;N}^{m}}$. In all the experiments, 1000 samples were generated for each sample size $N=$ 50, 100, 250, 500, 750, 1000, 2000, and 5000. Three modifications of the $\pi _{N,\alpha }$ test were applied to each sample. In the first modification, (ss), simple estimators were used to calculate both $\hat{T}_{N}$ and $\hat{D}_{N}$. In the second modification, (si), simple estimators were used in $\hat{T}_{N}$, and the improved ones were used in $\hat{D}_{N}$. In the last modification, (ii), improved estimators were used in both $\hat{T}_{N}$ and $\hat{D}_{N}$. Note that the modification (ii) has no theoretical justification since, as far as we know, there are no results on the limit distribution of $\sqrt{N}({\hat{g}_{N}^{\pm }}-\bar{g})$.
All tests were used with the nominal significance level $\alpha =0.05$.
In the figures, the frequencies of test errors are presented. In the plots, □ corresponds to the (ss) modification, △ to (si), and ∘ to (ii).
Experiment A1.
In this experiment, we consider testing the mean homogeneity hypothesis ${H_{0}^{\mu }}$. The means were taken to be $\mu _{m}=0$, $m=1,2,3$, so ${H_{0}^{\mu }}$ holds. To mask the equality of means, different component variances were taken, namely ${\sigma _{1}^{2}}=1$, ${\sigma _{2}^{2}}=4$, and ${\sigma _{3}^{2}}=9$. The resulting first-type error frequencies are presented in the left panel of Fig. 1. For the (ss) test with small N, the covariance matrix estimate $\hat{D}_{N}$ was incorrect (not positive definite) in 1.4% of cases; incorrect estimates were absent for large N.
Experiment A2.
Here we also tested ${H_{0}^{\mu }}$ for components with the same variances as in A1, but with $\mu _{1}=2$ and $\mu _{2}=\mu _{3}=0$, so ${H_{0}^{\mu }}$ does not hold. The frequencies of the second-type error are presented in the right panel of Fig. 1. The percentage of incorrect estimates $\hat{D}_{N}$ was 1.6% for (ss) with small N.
Experiment B1.
In this and the next experiment, we tested ${H_{0}^{\sigma }}$: ${\sigma _{1}^{2}}={\sigma _{2}^{2}}$. The data were generated with $\mu _{1}=0$, $\mu _{2}=3$, $\mu _{3}=-2$, ${\sigma _{1}^{2}}={\sigma _{2}^{2}}=1$, and ${\sigma _{3}^{2}}=4$, so ${H_{0}^{\sigma }}$ holds. The frequencies of the first-type error are presented in the left panel of Fig. 2. The percentage of incorrect $\hat{D}_{N}$ in (ss) varied from 19.4% for small N to 0% for large N.
Experiment B2.
Now $\mu _{m}$ and ${\sigma _{3}^{2}}$ are the same as in B1, but ${\sigma _{1}^{2}}=1$ and ${\sigma _{2}^{2}}=4$, so ${H_{0}^{\sigma }}$ does not hold. The frequencies of the second-type error are presented in the right panel of Fig. 2. The percentage of incorrect $\hat{D}_{N}$ in (ss) was 15.5% for small N and decreased to 0% for large N.
5.2 Example of a sociological data analysis
Consider the data discussed in [9]. They consist of two parts. The first part is the data from the Four-Wave World Values Survey (FWWVS) held in Ukraine by the European Values Study Foundation (www.europeanvalues.nl) and the World Values Survey Association (www.worldvaluessurvey.org) in 2006. They contain the answers of $N=4006$ Ukrainian respondents to various questions about their social status and attitudes toward different human values. We consider here the level of satisfaction with personal income (subjective income) as our variable of interest ξ, so $\xi _{j;N}$ is the subjective income of the jth respondent.
Our aim is to analyze differences in the distribution of ξ over the populations of adherents of different political parties. Namely, we use data on the results of the Ukrainian Parliament elections held in 2006, in which 46 parties took part. The voters could also vote against all or abstain from voting. We divided the whole population of Ukrainian voters into three large groups (political subpopulations): $\mathcal{P}_{1}$, which contains adherents of the Party of Regions (PR, 32.14% of votes); $\mathcal{P}_{2}$ of Orange Coalition supporters (OC, which consisted of the “BJUT” and “NU” parties, 36.24%); and $\mathcal{P}_{3}$ of all others, including persons who voted against all or did not take part in the poll (Other).
Political preferences of respondents are not available in the FWWVS data, so we used the official results of the elections in the 27 regions of Ukraine (see the site of the Ukrainian Central Elections Commission, www.cvk.gov.ua) to estimate the concentrations ${p_{j;N}^{m}}$ of the considered political subpopulations in the region where the jth respondent voted.
The means and variances of ξ over the different subpopulations were estimated from the data (see Table 1). Different tests were performed to test their differences; the results are presented in Table 2. Here $\mu _{m}$ denotes the expectation and ${\sigma _{m}^{2}}$ the variance of ξ over the mth subpopulation. The degrees of freedom for the limiting ${\chi }^{2}$ distribution are given in the “df” column.
Table 1.
Means (μ) and variances (${\sigma }^{2}$) for the subjective income distribution on different political populations (the rows marked with + contain the estimates obtained with the improved estimators)
| PR | OC | Other |
μ | 2.31733 | 2.65091 | 4.44504 |
${\mu }^{+}$ | 2.45799 | 2.64187 | 4.44504 |
${\sigma }^{2}$ | 0.772514 | 4.85172 | 4.93788 |
${\sigma }^{2+}$ | 2.09235 | 4.7639 | 4.93788 |
These results show that the hypothesis of homogeneity of all variances must be definitely rejected. The variances of ξ for PR and OC adherents are different, but the tests failed to detect significant differences in the pairs of variances PR–Other and OC–Other. For the means, all the tests agree that PR and OC have the same mean of ξ, whereas the mean of Other differs from the common mean of PR and OC.
Table 2.
Test statistics and p-values for hypotheses on subjective income distribution
Hypotheses | ss | si | ii | df |
$\mu _{1}=\mu _{2}=\mu _{3}$ | 11.776 | 10.8658 | 8.83978 | 2 |
p-value | 0.00277252 | 0.0043704 | 0.0120356 | |
$\mu _{1}=\mu _{2}$ | 2.15176 | 2.04539 | 0.621483 | 1 |
p-value | 0.142407 | 0.152668 | 0.430497 | |
$\mu _{1}=\mu _{3}$ | 10.7076 | 10.0351 | 8.75216 | 1 |
p-value | 0.00106696 | 0.00153585 | 0.00309236 | |
$\mu _{2}=\mu _{3}$ | 7.40835 | 7.10653 | 7.17837 | 1 |
p-value | 0.00649218 | 0.00768036 | 0.00737877 | |
${\sigma _{1}^{2}}={\sigma _{2}^{2}}={\sigma _{3}^{2}}$ | 15.8317 | 14.786 | 6.40963 | 2 |
p-value | 0.000364914 | 0.000615547 | 0.0405664 | |
${\sigma _{1}^{2}}={\sigma _{2}^{2}}$ | 14.7209 | 13.8844 | 5.95528 | 1 |
p-value | 0.000124657 | 0.000194405 | 0.0146733 | |
${\sigma _{1}^{2}}={\sigma _{3}^{2}}$ | 1.92166 | 1.77162 | 0.826778 | 1 |
p-value | 0.165674 | 0.183182 | 0.363206 | |
${\sigma _{2}^{2}}={\sigma _{3}^{2}}$ | 0.000741088 | 0.00072198 | 0.00294353 | 1 |
p-value | 0.978282 | 0.978564 | 0.956733 |
6 Concluding remarks
We developed a technique that allows one to construct testing procedures for various hypotheses on functional moments of mixtures with varying concentrations. This technique can be applied to test the homogeneity of means or variances (or both) of some components of the mixture. The performance of different modifications of the test procedure was compared in a small simulation study. The (ss) modification showed the worst first-type error and the highest power; the (ii) test had the best first-type error and the worst power. It seems that the (si) modification can be recommended as a golden mean.