1 Introduction
Finite Mixture Models (FMM) are widely used in the analysis of biological, economic and sociological data. For a comprehensive survey of different statistical techniques based on FMMs, see [9]. Mixtures with Varying Concentrations (MVC) is a subclass of these models in which the mixing probabilities are not constant, but vary for different observations (see [4, 5]).
In this paper we consider application of the jackknife technique to the estimation of asymptotic covariance matrix (the covariance matrix for asymptotically normal estimator, ACM) in the case when the data are described by the MVC model. The jackknife is a well-known resampling technique usually applied to i.i.d. samples (see Section 5.5.2 in [11], Chapter 4 in [15], Chapter 2 in [12]). On the jackknife estimates of variance for censored and dependent data, see [14]. Its modification to the case of the MVC model in which the observations are still independent but not identically distributed needs some efforts.
We obtained a general theorem on consistency of the jackknife estimators for ACM for moment estimators in the MVC models and apply this result to construct confidence sets for regression coefficients in linear errors-in-variables models for MVC data. On general errors-in-variables models, see [2, 3, 8]. The model and the estimators for the regression coefficients considered in this paper was proposed in [6], where the asymptotic normality of these estimates is shown.
The rest of the paper is organized as follows. In Section 2 we introduce the MVC model and describe the estimation technique for these models based on weighted moments. In Section 3 the jackknife estimates for the ACM are introduced and conditions of their consistency formulated. Section 4 is devoted to the algorithm of fast computation of the jackknife estimates. In Section 5 we apply the previous results to construct confidence sets for linear regression coefficients in errors-in-variables models with MVC. In Section 6 results of simulations are presented. In Section 7 we present results of application of the proposed technique to analyze sociological data. Proofs are placed in Section 8. Section 9 contains concluding remarks.
2 Mixtures with varying concentrations
We consider a dataset in which each observed subject O belongs to one of M subpopulations (mixture components). The number $\kappa (O)$ of the subpopulation to which O belongs is unknown. We observe d numeric characteristics of O which form the vector $\xi (O)={({\xi ^{1}}(O),\dots ,{\xi ^{d}}(O))^{T}}\in {\mathbb{R}^{d}}$ of observable variables. The distribution of $\xi (O)$ may depend on the component $\kappa (O)$:
\[ {F_{\xi }^{(m)}}(A)=\operatorname{\mathsf{P}}\{\xi (O)\in A\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=m\},\hspace{1em}m=1,\dots ,M,\]
where A is any Borel subset of ${\mathbb{R}^{d}}$. We observe the variables of n independent subjects ${\xi _{j}}=\xi ({O_{j}})$. The probability to obtain the j-th subject from the m-th component,
\[ {p_{j}^{(m)}}=\operatorname{\mathsf{P}}\{\kappa ({O_{j}})=m\},\]
can be considered as the concentration of the m-th component in the mixture when the j-th observation was made. The concentrations are known and can vary for different observations.
So, the distribution of ${\xi _{j}}$ is described by the model of mixture with varying concentrations:
(1)
\[ \operatorname{\mathsf{P}}\{{\xi _{j}}\in A\}={\sum \limits_{m=1}^{M}}{p_{j}^{(m)}}{F_{\xi }^{(m)}}(A).\]
We will denote by
\[ {\mu ^{(m)}}={\operatorname{\mathsf{E}}^{(m)}}[\xi ]=\operatorname{\mathsf{E}}[\xi (O)\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=m]={\int _{{\mathbb{R}^{d}}}}x{F_{\xi }^{(m)}}(dx)\]
the vector of theoretical first moments of the m-th component distribution. In what follows, ${\operatorname{Cov}^{(m)}}[\xi ]$ means the covariance matrix of $\xi (O)$ for the m-th component, ${\operatorname{Var}^{(m)}}[{\xi ^{l}}]$ means the variance of ${\xi ^{l}}(O)$ for this component, and so on.
To estimate ${\mu ^{(k)}}$ by the observations ${\xi _{1}}$, …, ${\xi _{n}}$ one can use the weighted sample mean
(2)
\[ {\bar{\xi }_{;n}^{(k)}}={\bar{\xi }^{(k)}}={\sum \limits_{j=1}^{n}}{a_{j}^{(k)}}{\xi _{j}},\]
where ${a_{j}^{(k)}}={a_{j;n}^{(k)}}$ are some weights dependent on the components’ concentrations, but not on the observed ${\xi _{j}}={\xi _{j;n}}$. (In what follows we denote by the subscript $;n$ that the corresponding quantity is considered for the sample size n. In most cases this subscript is dropped to simplify notation.)
To obtain unbiased estimates in (2) one needs to select the weights satisfying the assumption
(3)
\[ {\sum \limits_{j=1}^{n}}{a_{j}^{(k)}}{p_{j}^{(m)}}=\left\{\begin{array}{l@{\hskip10.0pt}l}1\hspace{1em}& \hspace{2.5pt}\text{if}\hspace{2.5pt}k=m,\\ {} 0\hspace{1em}& \hspace{2.5pt}\text{if}\hspace{2.5pt}k\ne m.\end{array}\right.\]
Let us denote
\[\begin{array}{l}\displaystyle \boldsymbol{\Xi }={({\xi _{1}},\dots ,{\xi _{n}})^{T}}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{\xi _{1}^{1}}& \dots & {\xi _{1}^{d}}\\ {} \vdots & \ddots & \vdots \\ {} {\xi _{n}^{1}}& \dots & {\xi _{n}^{d}}\end{array}\right),\\ {} \displaystyle \mathbf{a}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{a_{1}^{(1)}}& \dots & {a_{1}^{(M)}}\\ {} \vdots & \ddots & \vdots \\ {} {a_{n}^{(1)}}& \dots & {a_{n}^{(M)}}\end{array}\right)\hspace{1em}\hspace{2.5pt}\text{and}\hspace{2.5pt}\hspace{1em}\mathbf{p}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{p_{1}^{(1)}}& \dots & {p_{1}^{(M)}}\\ {} \vdots & \ddots & \vdots \\ {} {p_{n}^{(1)}}& \dots & {p_{n}^{(M)}}\end{array}\right).\end{array}\]
Then ${\mathbf{p}_{\centerdot }^{(m)}}={({p_{1}^{(m)}},\dots ,{p_{n}^{(m)}})^{T}}$, ${\mathbf{p}_{j}^{\centerdot }}={({p_{j}^{(1)}},\dots ,{p_{j}^{(M)}})^{T}}$, and the same notation is used for the matrix a. In this notation the unbiasedness condition (3) reads
(4)
\[ {\mathbf{a}^{T}}\mathbf{p}=\mathbb{E},\]
where $\mathbb{E}$ means the $M\times M$ unit matrix.
There can be many choices of a satisfying (4). In [4, 5] the minimax weights are considered, defined by
(5)
\[ \mathbf{a}=\mathbf{p}{\boldsymbol{\Gamma }^{-1}},\]
where
\[ \boldsymbol{\Gamma }={\boldsymbol{\Gamma }_{;n}}={\mathbf{p}^{T}}\mathbf{p}\]
is the Gram matrix of the set of concentration vectors ${\mathbf{p}_{\centerdot }^{(1)}}$, …, ${\mathbf{p}_{\centerdot }^{(M)}}$. In what follows, we assume that these vectors are linearly independent, so $\det \boldsymbol{\Gamma }>0$ and ${\boldsymbol{\Gamma }^{-1}}$ exists. See [5] on the optimal properties of the estimates for concentration distributions based on the minimax weights (5).
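To make the construction concrete, the following minimal sketch (Python with NumPy) generates a small MVC sample and computes the minimax weights (5) and the weighted means (2). The concentration design, the component distributions and the sample size here are illustrative choices, not taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n, M, d = 500, 2, 3
# Illustrative concentrations p_j^(m): they vary with j and sum to 1 in each row.
p = np.column_stack([np.linspace(0.2, 0.8, n), 1.0 - np.linspace(0.2, 0.8, n)])

# Generate an MVC sample: kappa_j is drawn with probabilities p_j, then xi_j from that component.
means = np.array([[0.0, 0.0, 0.0], [2.0, -1.0, 0.5]])      # mu^(1), mu^(2)
kappa = np.array([rng.choice(M, p=p[j]) for j in range(n)])
xi = means[kappa] + rng.normal(size=(n, d))                 # Xi is the n x d data matrix

# Minimax weights (5): Gamma = p^T p,  a = p Gamma^{-1}.
Gamma = p.T @ p
a = p @ np.linalg.inv(Gamma)

# Unbiasedness condition (4): a^T p equals the M x M unit matrix.
assert np.allclose(a.T @ p, np.eye(M))

# Weighted sample means (2): xi_bar^(k) = sum_j a_j^(k) xi_j, one row per component.
xi_bar = a.T @ xi
print(xi_bar)   # rows approximate mu^(1) and mu^(2)
```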
To describe the asymptotic behavior of ${\bar{\xi }_{;n}^{(k)}}$ as $n\to \infty $, we will calculate its covariance matrix.
Notice that
\[ \operatorname{Cov}[{\xi _{j}}]=\operatorname{\mathsf{E}}[{\xi _{j}}{\xi _{j}^{T}}]-\operatorname{\mathsf{E}}[{\xi _{j}}]\operatorname{\mathsf{E}}{[{\xi _{j}}]^{T}}={\sum \limits_{m=1}^{M}}{p_{j}^{(m)}}{\boldsymbol{\Sigma }^{(m)}}-{\sum \limits_{m,l=1}^{M}}{p_{j}^{(m)}}{p_{j}^{(l)}}{\mu ^{(m)}}{({\mu ^{(l)}})^{T}},\]
where ${\boldsymbol{\Sigma }^{(m)}}={\operatorname{Cov}^{(m)}}[\xi ]=\operatorname{Cov}[\xi (O)\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=m]$. So,
\[ n\operatorname{Cov}[{\bar{\xi }^{(k)}}]={\sum \limits_{m=1}^{M}}{\langle {({\mathbf{a}^{(k)}})^{2}}{\mathbf{p}^{(m)}}\rangle _{;n}}{\boldsymbol{\Sigma }^{(m)}}-{\sum \limits_{m,l=1}^{M}}{\langle {({\mathbf{a}^{(k)}})^{2}}{\mathbf{p}^{(m)}}{\mathbf{p}^{(l)}}\rangle _{;n}}{\mu ^{(m)}}{({\mu ^{(l)}})^{T}},\]
where
\[ {\langle {({\mathbf{a}^{(k)}})^{2}}{\mathbf{p}^{(m)}}\rangle _{;n}}=n{\sum \limits_{j=1}^{n}}{({a_{j}^{(k)}})^{2}}{p_{j}^{(m)}},\hspace{2em}{\langle {({\mathbf{a}^{(k)}})^{2}}{\mathbf{p}^{(m)}}{\mathbf{p}^{(l)}}\rangle _{;n}}=n{\sum \limits_{j=1}^{n}}{({a_{j}^{(k)}})^{2}}{p_{j}^{(m)}}{p_{j}^{(l)}}.\]
Assume that the limits
(6)
\[ {\langle {({\mathbf{a}^{(k)}})^{2}}{\mathbf{p}^{(m)}}{\mathbf{p}^{(l)}}\rangle _{\infty }}\stackrel{\text{def}}{=}\underset{n\to \infty }{\lim }{\langle {({\mathbf{a}^{(k)}})^{2}}{\mathbf{p}^{(m)}}{\mathbf{p}^{(l)}}\rangle _{;n}}\]
exist. Then the limits
\[ {\langle {({\mathbf{a}^{(k)}})^{2}}{\mathbf{p}^{(m)}}\rangle _{\infty }}\stackrel{\text{def}}{=}\underset{n\to \infty }{\lim }{\langle {({\mathbf{a}^{(k)}})^{2}}{\mathbf{p}^{(m)}}\rangle _{;n}}\]
exist also, since ${\textstyle\sum _{l=1}^{M}}{p_{j}^{(l)}}=1$ for all j (take $A={\mathbb{R}^{d}}$ in (1)). So, under this assumption,
(7)
\[ n\operatorname{Cov}[{\bar{\xi }^{(k)}}]\to {\boldsymbol{\Sigma }_{\infty }}\hspace{1em}\hspace{2.5pt}\text{as}\hspace{2.5pt}n\to \infty ,\]
where
\[ {\boldsymbol{\Sigma }_{\infty }}={\boldsymbol{\Sigma }_{\infty }^{(k)}}\stackrel{\text{def}}{=}{\sum \limits_{m=1}^{M}}{\langle {({\mathbf{a}^{(k)}})^{2}}{\mathbf{p}^{(m)}}\rangle _{\infty }}{\boldsymbol{\Sigma }^{(m)}}-{\sum \limits_{m,l=1}^{M}}{\langle {({\mathbf{a}^{(k)}})^{2}}{\mathbf{p}^{(m)}}{\mathbf{p}^{(l)}}\rangle _{\infty }}{\mu ^{(m)}}{({\mu ^{(l)}})^{T}}.\]
Theorem 1.
Assume that:
1. $\frac{1}{n}{\boldsymbol{\Gamma }_{;n}}\to {\boldsymbol{\Gamma }_{\infty }}$ as $n\to \infty $ and $\det {\boldsymbol{\Gamma }_{\infty }}>0$.
2. Assumption (6) holds.
3. ${\operatorname{\mathsf{E}}^{(m)}}[\| \xi {\| ^{2}}]<\infty $ for all $m=1,\dots ,M$.
Then
\[ \sqrt{n}({\bar{\xi }_{;n}^{(k)}}-{\mu ^{(k)}})\stackrel{\text{W}}{\longrightarrow }N(0,{\boldsymbol{\Sigma }_{\infty }^{(k)}})\hspace{1em}\hspace{2.5pt}\text{as}\hspace{2.5pt}n\to \infty .\]
This theorem is a simple corollary of Theorem 4.3 in [5].
3 Jackknife estimation of ACM of moment estimators
In what follows, we will consider unknown parameters of the component distribution ${F_{\xi }^{(k)}}$ which can be represented in the form
(8)
\[ {\vartheta ^{(k)}}=H({\mu ^{(k)}}),\]
where $H:{\mathbb{R}^{d}}\to {\mathbb{R}^{q}}$ is some known function. A natural estimator of such a parameter by the sample ${\xi _{1}}$, …, ${\xi _{n}}$ is
(9)
\[ {\hat{\vartheta }^{(k)}}={\hat{\vartheta }_{;n}^{(k)}}=H({\bar{\xi }_{;n}^{(k)}}).\]
The asymptotic behavior of this estimator is described by the following theorem.
Theorem 2.
Under the assumptions of Theorem 1, if H is continuously differentiable in some neighborhood of ${\mu ^{(k)}}$, then
\[ \sqrt{n}({\hat{\vartheta }_{;n}^{(k)}}-{\vartheta ^{(k)}})\stackrel{\text{W}}{\longrightarrow }N(0,{\mathbf{V}_{\infty }}),\]
where
(10)
\[\begin{array}{l}\displaystyle {\mathbf{V}_{\infty }}={\mathbf{V}_{\infty }^{(k)}}={\mathbf{H}^{\prime }}({\mu ^{(k)}}){\boldsymbol{\Sigma }_{\infty }^{(k)}}{({\mathbf{H}^{\prime }}({\mu ^{(k)}}))^{T}},\\ {} \displaystyle {\mathbf{H}^{\prime }}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}\frac{\partial {H^{1}}}{\partial {\mu ^{1}}}& \dots & \frac{\partial {H^{1}}}{\partial {\mu ^{d}}}\\ {} \vdots & \ddots & \vdots \\ {} \frac{\partial {H^{q}}}{\partial {\mu ^{1}}}& \dots & \frac{\partial {H^{q}}}{\partial {\mu ^{d}}}\end{array}\right).\end{array}\]
This theorem is a simple implication of our Theorem 1 and Theorem 3 in Section 5, Chapter 1 of [1].
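For orientation, formula (10) can be evaluated numerically once estimates of Σ and the function H are at hand. The sketch below is only meant to show how the matrices combine; the finite-difference Jacobian, the function H and the numeric values are illustrative assumptions, not part of this paper.

```python
import numpy as np

def jacobian(H, mu, eps=1e-6):
    """Finite-difference approximation of H'(mu), a q x d matrix."""
    mu = np.asarray(mu, dtype=float)
    d = mu.size
    q = np.atleast_1d(H(mu)).size
    J = np.zeros((q, d))
    for i in range(d):
        step = np.zeros(d); step[i] = eps
        J[:, i] = (np.atleast_1d(H(mu + step)) - np.atleast_1d(H(mu - step))) / (2 * eps)
    return J

def acm_delta(H, mu_hat, Sigma_hat):
    """Plug-in version of (10): V = H'(mu) Sigma H'(mu)^T."""
    J = jacobian(H, mu_hat)
    return J @ Sigma_hat @ J.T

# Illustrative example: H maps the first moments of (X, X^2) to (mean, variance).
H = lambda m: np.array([m[0], m[1] - m[0] ** 2])
V = acm_delta(H, mu_hat=np.array([1.0, 2.0]), Sigma_hat=np.array([[1.0, 2.0], [2.0, 6.0]]))
print(V)
```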
So, ${\mathbf{V}_{\infty }}$ defined by (10) is the ACM of the estimator ${\hat{\vartheta }^{(k)}}$ (the covariance matrix of the limiting normal distribution of the normalized difference between the estimator and the estimated parameter). If it were known, one could use it to construct tests for hypotheses on ${\vartheta ^{(k)}}$ or to derive a confidence set for ${\vartheta ^{(k)}}$. In fact, for most estimators the ACM is unknown. Usually some estimate of ${\mathbf{V}_{\infty }}$ replaces its true value in statistical algorithms.
The jackknife is one of the most general techniques of ACM estimation. Let $\hat{\vartheta }$ be any estimator of ϑ by the data ${\xi _{1}}$, …, ${\xi _{n}}$:
\[ \hat{\vartheta }={\hat{\vartheta }_{;n}}=\hat{\vartheta }({\xi _{1}},\dots ,{\xi _{n}}).\]
Consider estimates of the same form calculated by all observations without one:
\[ {\hat{\vartheta }_{i-}}=\hat{\vartheta }({\xi _{1}},\dots ,{\xi _{i-1}},{\xi _{i+1}},\dots ,{\xi _{n}}).\]
Then the jackknife estimator for ${\mathbf{V}_{\infty }}$ is defined by
(11)
\[ {\hat{\mathbf{V}}_{;n}}={\hat{\mathbf{V}}_{;n}^{(k)}}=n{\sum \limits_{i=1}^{n}}({\hat{\vartheta }_{i-}}-\hat{\vartheta }){({\hat{\vartheta }_{i-}}-\hat{\vartheta })^{T}}.\]
In our case $\hat{\vartheta }=H({\bar{\xi }^{(k)}})$, so
(12)
\[ {\hat{\vartheta }_{i-}}=H({\bar{\xi }_{i-}^{(k)}}),\]
where
(13)
\[ {\bar{\xi }_{i-}^{(k)}}=\sum \limits_{j\ne i}{a_{ji-}^{(k)}}{\xi _{j}}.\]
Here ${\mathbf{a}_{i-}}=({a_{ji-}^{(m)}},j=1,\dots ,n,\hspace{2.5pt}m=1,\dots ,M)$ is the minimax weights matrix calculated by the matrix ${\mathbf{p}_{i-}}$ of concentrations of all observations except the i-th one. That is, ${\mathbf{p}_{i-}}={({\mathbf{p}_{1}^{\centerdot }},\dots ,{\mathbf{p}_{i-1}^{\centerdot }},0,{\mathbf{p}_{i+1}^{\centerdot }},\dots ,{\mathbf{p}_{n}^{\centerdot }})^{T}}$,
(14)
\[ {\mathbf{a}_{i-}}={\mathbf{p}_{i-}}{({\boldsymbol{\Gamma }_{i-}})^{-1}},\]
where
(15)
\[ {\boldsymbol{\Gamma }_{i-}}={({\mathbf{p}_{i-}})^{T}}{\mathbf{p}_{i-}}.\]
Notice that 0 is placed at the i-th row of ${\mathbf{p}_{i-}}$ as a placeholder only, to preserve the numbering of the rows in ${\mathbf{p}_{i-}}$ and ${\mathbf{a}_{i-}}$, which corresponds to the numbering of subjects in the sample.
Theorem 3.
Let ϑ be defined by (8), $\hat{\vartheta }$ by (9), ${\mathbf{V}_{\infty }}$ by (10) and ${\hat{\mathbf{V}}_{;n}^{(k)}}$ by (11)–(15). Assume that:
1. H is twice continuously differentiable in some neighborhood of ${\mu ^{(k)}}$.
2. There exists some $\alpha >4$ such that $\operatorname{\mathsf{E}}[\| \xi (O){\| ^{\alpha }}\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=m]<\infty $ for all $m=1,\dots ,M$.
3. $\frac{1}{n}{\boldsymbol{\Gamma }_{;n}}\to {\boldsymbol{\Gamma }_{\infty }}$ as $n\to \infty $ and $\det {\boldsymbol{\Gamma }_{\infty }}>0$.
4. Assumption (6) holds.
Then ${\hat{\mathbf{V}}_{;n}^{(k)}}\to {\mathbf{V}_{\infty }^{(k)}}$ in probability as $n\to \infty $.
For proof see Section 8.
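Before turning to the fast algorithm of the next section, here is a direct sketch of (11)–(15) in Python with NumPy. The function H, the data matrix Xi and the concentration matrix p are assumed to be given (e.g., as in the sketch of Section 2); the component index k is fixed. This version recomputes the minimax weights for each deleted observation, so its cost grows like n².

```python
import numpy as np

def jackknife_acm_naive(H, Xi, p, k):
    """Jackknife ACM estimate (11): recompute the minimax weights without each observation."""
    n = Xi.shape[0]
    a = p @ np.linalg.inv(p.T @ p)              # full-sample minimax weights, cf. (5)
    theta_hat = np.atleast_1d(H(a[:, k] @ Xi))  # H(xi_bar^(k)), cf. (9)
    V = np.zeros((theta_hat.size, theta_hat.size))
    for i in range(n):
        p_i = np.delete(p, i, axis=0)           # drop the i-th row of concentrations
        Xi_i = np.delete(Xi, i, axis=0)
        a_i = p_i @ np.linalg.inv(p_i.T @ p_i)  # leave-one-out weights, cf. (14)-(15)
        diff = np.atleast_1d(H(a_i[:, k] @ Xi_i)) - theta_hat
        V += np.outer(diff, diff)               # accumulate the terms of (11)
    return n * V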
4 Fast calculation algorithm for jackknife estimator
Direct calculation of ${\hat{\mathbf{V}}_{;n}}$ by (11)–(15) needs $\sim C{n^{2}}$ elementary operations. Here we consider an algorithm which reduces the computational complexity to $\sim Cn$ operations (linear complexity).
Notice that ${\boldsymbol{\Gamma }_{i-}}=\boldsymbol{\Gamma }-{\mathbf{p}_{i}^{\centerdot }}{({\mathbf{p}_{i}^{\centerdot }})^{T}}$. So
(16)
\[ {({\boldsymbol{\Gamma }_{i-}})^{-1}}={\boldsymbol{\Gamma }^{-1}}+\frac{1}{1-{h_{i}}}{\boldsymbol{\Gamma }^{-1}}{\mathbf{p}_{i}^{\centerdot }}{({\mathbf{p}_{i}^{\centerdot }})^{T}}{\boldsymbol{\Gamma }^{-1}},\]
where
(17)
\[ {h_{i}}={({\mathbf{p}_{i}^{\centerdot }})^{T}}{\boldsymbol{\Gamma }^{-1}}{\mathbf{p}_{i}^{\centerdot }}.\]
(Formula (16) can be demonstrated directly by checking ${\boldsymbol{\Gamma }_{i-}^{-1}}{\boldsymbol{\Gamma }_{i-}}=\mathbb{E}$. It is also a corollary of the Sherman–Morrison–Woodbury formula, see A.9.4 in [13].)
Let us denote
\[ {\bar{\boldsymbol{\xi }}_{i-}}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{\bar{\xi }_{i-}^{1(1)}}& \dots & {\bar{\xi }_{i-}^{d(1)}}\\ {} \vdots & \ddots & \vdots \\ {} {\bar{\xi }_{i-}^{1(M)}}& \dots & {\bar{\xi }_{i-}^{d(M)}}\end{array}\right)={({({\bar{\xi }_{i-}^{(1)}})^{T}},\dots ,{({\bar{\xi }_{i-}^{(M)}})^{T}})^{T}}\]
and
(18)
\[ \bar{\boldsymbol{\xi }}={({({\bar{\xi }^{(1)}})^{T}},\dots ,{({\bar{\xi }^{(M)}})^{T}})^{T}}.\]
Then ${\bar{\boldsymbol{\xi }}_{i-}}={({\boldsymbol{\Gamma }_{i-}})^{-1}}{({\mathbf{p}_{i-}})^{T}}{\boldsymbol{\Xi }_{i-}}$, where ${\boldsymbol{\Xi }_{i-}}={({\xi _{1}},\dots ,{\xi _{i-1}},0,{\xi _{i+1}},\dots ,{\xi _{n}})^{T}}$. (Zero at the i-th row is a placeholder, as in the matrix ${\mathbf{p}_{i-}}$.) Applying (16) one obtains
\[ {\bar{\boldsymbol{\xi }}_{i-}}={\boldsymbol{\Gamma }^{-1}}{({\mathbf{p}_{i-}})^{T}}{\boldsymbol{\Xi }_{i-}}+\frac{1}{1-{h_{i}}}{\boldsymbol{\Gamma }^{-1}}{\mathbf{p}_{i}^{\centerdot }}{({\mathbf{p}_{i}^{\centerdot }})^{T}}{\boldsymbol{\Gamma }^{-1}}{({\mathbf{p}_{i-}})^{T}}{\boldsymbol{\Xi }_{i-}}.\]
This together with ${({\mathbf{p}_{i-}})^{T}}{\boldsymbol{\Xi }_{i-}}={\mathbf{p}^{T}}\boldsymbol{\Xi }-{\mathbf{p}_{i}^{\centerdot }}{\xi _{i}^{T}}$ implies
(19)
\[ {\bar{\boldsymbol{\xi }}_{i-}}=\bar{\boldsymbol{\xi }}+\frac{1}{1-{h_{i}}}{\mathbf{a}_{i}^{\centerdot }}({({\mathbf{p}_{i}^{\centerdot }})^{T}}\bar{\boldsymbol{\xi }}-{\xi _{i}^{T}}).\]
Formula (19) allows one to calculate all the ${\bar{\boldsymbol{\xi }}_{i-}}$, and hence ${\hat{\mathbf{V}}^{(m)}}$ for all $m=1,\dots ,M$ at once, by $\sim Cn$ operations: it suffices to precompute ${\boldsymbol{\Gamma }^{-1}}$, a and $\bar{\boldsymbol{\xi }}$ from the full sample and then, for each i, to obtain ${\bar{\boldsymbol{\xi }}_{i-}}$ from (19) and accumulate the corresponding term of (11).
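A sketch of the resulting linear-time computation in Python with NumPy might look as follows; the function H and the component index k are as in the naive sketch above, and the two versions should agree up to rounding error.

```python
import numpy as np

def jackknife_acm_fast(H, Xi, p, k):
    """Jackknife ACM estimate (11) computed via the update formula (19) in ~C*n operations."""
    n = Xi.shape[0]
    Gamma_inv = np.linalg.inv(p.T @ p)
    a = p @ Gamma_inv                                # minimax weights, a_i^. = Gamma^{-1} p_i^.
    xi_bar = a.T @ Xi                                # M x d matrix of weighted means, cf. (18)
    theta_hat = np.atleast_1d(H(xi_bar[k]))
    h = np.einsum('ij,jk,ik->i', p, Gamma_inv, p)    # h_i = p_i^T Gamma^{-1} p_i, cf. (17)
    V = np.zeros((theta_hat.size, theta_hat.size))
    for i in range(n):
        # (19): xi_bar_{i-} = xi_bar + a_i^. (p_i^T xi_bar - xi_i^T) / (1 - h_i)
        xi_bar_i = xi_bar + np.outer(a[i], p[i] @ xi_bar - Xi[i]) / (1.0 - h[i])
        diff = np.atleast_1d(H(xi_bar_i[k])) - theta_hat
        V += np.outer(diff, diff)
    return n * V
```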
5 Regression with errors in variables
In this section we consider a mixture of simple linear regressions with errors in variables. A modification of the orthogonal regression technique for estimating the regression coefficients in this model was proposed in [6]. We will show how the jackknife ACM estimators from Section 3 can be applied in this case to construct confidence sets for the regression coefficients.
Let us recall the errors-in-variables regression model in the context of mixtures with varying concentrations.
We consider the case when each subject O has two variables of interest: $x(O)$ and $y(O)$. These variables are related by a strict linear dependence with coefficients depending on the component that O belongs to:
(20)
\[ y(O)={b_{0}^{(m)}}+{b_{1}^{(m)}}x(O)\hspace{1em}\hspace{2.5pt}\text{if}\hspace{2.5pt}\kappa (O)=m,\]
where ${b_{0}^{(m)}}$, ${b_{1}^{(m)}}$ are the regression coefficients of the m-th component.
The true values of $x(O)$ and $y(O)$ are unobservable. These variables are observed with measurement errors:
(21)
\[ X(O)=x(O)+{\varepsilon _{X}}(O),\hspace{2em}Y(O)=y(O)+{\varepsilon _{Y}}(O).\]
Here we assume that the errors ${\varepsilon _{X}}(O)$ and ${\varepsilon _{Y}}(O)$ are conditionally independent given $\kappa (O)=m$,
(22)
\[ {\operatorname{\mathsf{E}}^{(m)}}{\varepsilon _{X}}={\operatorname{\mathsf{E}}^{(m)}}{\varepsilon _{Y}}=0\hspace{1em}\hspace{2.5pt}\text{and}\hspace{2.5pt}\hspace{1em}{\operatorname{Var}^{(m)}}{\varepsilon _{X}}={\operatorname{Var}^{(m)}}{\varepsilon _{Y}}={\sigma _{(m)}^{2}}\]
for all $m=1,\dots ,M$. So the distributions of ${\varepsilon _{X}}(O)$ and ${\varepsilon _{Y}}(O)$ can be different, but their variances are the same for a given subject. We assume that the variances ${\sigma _{(m)}^{2}}>0$, $m=1,\dots ,M$, are unknown.
As in Section 2, we observe a sample ${(X({O_{j}}),Y({O_{j}}))^{T}}={({X_{j}},{Y_{j}})^{T}}$, $j=1,\dots ,n$, from the mixture with known concentrations ${p_{j}^{(m)}}=\operatorname{\mathsf{P}}\{\kappa ({O_{j}})=m\}$.
In the case of a homogeneous sample, when there is no mixture, the classical way to estimate ${b_{0}}$ and ${b_{1}}$ is orthogonal regression. That is, the estimator is taken as the minimizer of the total least squares functional, which is the sum of squares of the minimal Euclidean distances from the observation points to the regression line. The modification of this technique for mixtures with varying concentrations proposed in [6] leads to the following estimators for ${b_{0}^{(k)}}$ and ${b_{1}^{(k)}}$:
(23)
\[ \begin{aligned}{}{\hat{b}_{1}^{(k)}}& =\frac{{\hat{S}_{YY}^{(k)}}-{\hat{S}_{XX}^{(k)}}+\sqrt{{({\hat{S}_{XX}^{(k)}}-{\hat{S}_{YY}^{(k)}})^{2}}+4{({\hat{S}_{XY}^{(k)}})^{2}}}}{2{\hat{S}_{XY}^{(k)}}},\\ {} {\hat{b}_{0}^{(k)}}& ={\bar{Y}^{(k)}}-{\hat{b}_{1}^{(k)}}{\bar{X}^{(k)}},\end{aligned}\]
where
\[\begin{array}{l}\displaystyle {\bar{X}^{(k)}}={\sum \limits_{j=1}^{n}}{a_{j}^{(k)}}{X_{j}},\hspace{2em}{\bar{Y}^{(k)}}={\sum \limits_{j=1}^{n}}{a_{j}^{(k)}}{Y_{j}},\\ {} \displaystyle {\hat{S}_{XX}^{(k)}}={\sum \limits_{j=1}^{n}}{a_{j}^{(k)}}{({X_{j}}-{\bar{X}^{(k)}})^{2}},\hspace{2em}{\hat{S}_{YY}^{(k)}}={\sum \limits_{j=1}^{n}}{a_{j}^{(k)}}{({Y_{j}}-{\bar{Y}^{(k)}})^{2}},\\ {} \displaystyle {\hat{S}_{XY}^{(k)}}={\sum \limits_{j=1}^{n}}{a_{j}^{(k)}}({X_{j}}-{\bar{X}^{(k)}})({Y_{j}}-{\bar{Y}^{(k)}}).\end{array}\]
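In code, the estimator (23) reduces to a function of the weighted moments of X and Y. A possible sketch (Python with NumPy; the weight vector a_k for the k-th component is assumed to be computed from the concentrations as in Section 2) is given below.

```python
import numpy as np

def eiv_coefficients(X, Y, a_k):
    """Orthogonal-regression estimates (23) for the k-th component from weighted moments."""
    Xbar, Ybar = a_k @ X, a_k @ Y                      # weighted means
    Sxx = a_k @ (X - Xbar) ** 2                        # weighted second moments
    Syy = a_k @ (Y - Ybar) ** 2
    Sxy = a_k @ ((X - Xbar) * (Y - Ybar))
    b1 = (Syy - Sxx + np.sqrt((Sxx - Syy) ** 2 + 4 * Sxy ** 2)) / (2 * Sxy)
    b0 = Ybar - b1 * Xbar
    return b0, b1
```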
Conditions for consistency and asymptotic normality of these estimators are given in Theorems 5.1 and 5.2 of [6]. In particular, under the assumptions of Theorem 4 below we obtain
\[ \sqrt{n}({\hat{\vartheta }_{;n}^{(k)}}-{\vartheta ^{(k)}})\stackrel{\text{W}}{\longrightarrow }N(0,{\mathbf{V}_{\infty }^{(k)}}),\]
where ${\vartheta ^{(k)}}={({b_{0}^{(k)}},{b_{1}^{(k)}})^{T}}$. The ACM ${\mathbf{V}_{\infty }^{(k)}}$ of the estimator is given by formula (21) in [6]. This formula is rather complicated and involves theoretical moments of the unobservable variables $x(O)$, ${\varepsilon _{X}}(O)$ and ${\varepsilon _{Y}}(O)$. So it is natural to estimate ${\mathbf{V}_{\infty }^{(k)}}$ by the jackknife technique, which does not require knowledge or estimation of these moments.
Notice that the estimator ${({\hat{b}_{0}^{(k)}},{\hat{b}_{1}^{(k)}})^{T}}$ can be represented in the terms of Section 3 if we expand the space of observable variables by including quadratic terms. That is, we consider the sample
\[ {\xi _{j}}={({X_{j}},{Y_{j}},{({X_{j}})^{2}},{({Y_{j}})^{2}},{X_{j}}{Y_{j}})^{T}},\hspace{1em}j=1,\dots ,n.\]
Then the estimator ${\hat{\vartheta }_{;n}}={({\hat{\vartheta }_{;n}^{1}},{\hat{\vartheta }_{;n}^{2}})^{T}}={({\hat{b}_{0}^{(k)}},{\hat{b}_{1}^{(k)}})^{T}}$ defined by (23) can be represented in the form (9) with a twice continuously differentiable function H if ${\operatorname{Var}^{(k)}}[x]\ne 0$ and ${b_{1}^{(k)}}\ne 0$. So we can apply the technique developed in Sections 3–4. Let us define the estimator ${\hat{\mathbf{V}}_{;n}^{(k)}}$ of ${\mathbf{V}_{\infty }^{(k)}}$ by (11).
Theorem 4.
Assume that the following conditions hold.
1. $\frac{1}{n}{\boldsymbol{\Gamma }_{;n}}\to {\boldsymbol{\Gamma }_{\infty }}$ as $n\to \infty $ and $\det {\boldsymbol{\Gamma }_{\infty }}>0$.
2. Assumption (6) holds.
3. ${\operatorname{\mathsf{E}}^{(m)}}[{x^{12}}]<\infty $, ${\operatorname{\mathsf{E}}^{(m)}}[{({\varepsilon _{X}})^{12}}]<\infty $, ${\operatorname{\mathsf{E}}^{(m)}}[{({\varepsilon _{Y}})^{12}}]<\infty $ for all $m=1,\dots ,M$.
4. ${\operatorname{Var}^{(k)}}[x(O)]\ne 0$ and ${b_{1}^{(k)}}\ne 0$.
Then ${\hat{\mathbf{V}}_{;n}^{(k)}}\to {\mathbf{V}_{\infty }^{(k)}}$ in probability as $n\to \infty $.
In what follows we assume that ${\mathbf{V}_{\infty }^{(k)}}$ is nonsingular. This assumption holds, e.g. if for all m the distributions of $x(O)$, ${\varepsilon _{X}}(O)$ and ${\varepsilon _{Y}}(O)$ given $\kappa (O)=m$ are absolutely continuous with continuous PDFs. (The proof of this fact is rather technical, so we do not present it here.)
We can construct a confidence set (an ellipsoid) for the unknown parameter ${\vartheta ^{(k)}}$ by applying Theorem 4 in the usual way. Namely, for any $\mathbf{t}\in {\mathbb{R}^{2}}$ let
\[ {T_{;n}}(\mathbf{t})=n{(\mathbf{t}-{\hat{\vartheta }_{;n}^{(k)}})^{T}}{({\hat{\mathbf{V}}_{;n}^{(k)}})^{-1}}(\mathbf{t}-{\hat{\vartheta }_{;n}^{(k)}}).\]
Then, under the assumptions of Theorem 4, if $\det {\mathbf{V}_{\infty }^{(k)}}\ne 0$,
(24)
\[ {T_{;n}}({\vartheta ^{(k)}})\stackrel{\text{W}}{\longrightarrow }\eta \hspace{1em}\hspace{2.5pt}\text{as}\hspace{2.5pt}n\to \infty ,\]
where η is a random variable (r.v.) with the chi-square distribution with 2 degrees of freedom. Consider ${B_{\alpha ;n}}=\{\mathbf{t}\in {\mathbb{R}^{2}}:{T_{;n}}(\mathbf{t})\le {Q^{\eta }}(1-\alpha )\}$, where ${Q^{\eta }}(\alpha )$ means the quantile of level α of the r.v. η. By (24),
(25)
\[ \operatorname{\mathsf{P}}\{{\vartheta ^{(k)}}\in {B_{\alpha ;n}}\}\to 1-\alpha \hspace{1em}\hspace{2.5pt}\text{as}\hspace{2.5pt}n\to \infty ,\]
so ${B_{\alpha ;n}}$ is an asymptotic confidence set for ${\vartheta ^{(k)}}$ of level α.
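The membership check for the ellipsoid ${B_{\alpha ;n}}$ is easy to code; a minimal sketch (Python with NumPy and SciPy) is given below, assuming that theta_hat and V_hat have been obtained from the estimators sketched in the previous sections.

```python
import numpy as np
from scipy.stats import chi2

def in_confidence_set(t, theta_hat, V_hat, n, alpha=0.05):
    """Check whether the point t lies in the asymptotic confidence ellipsoid B_{alpha;n}."""
    diff = np.asarray(t) - np.asarray(theta_hat)
    T = n * diff @ np.linalg.inv(V_hat) @ diff        # the statistic T_{;n}(t)
    return T <= chi2.ppf(1 - alpha, df=len(diff))     # compare with the chi-square quantile
```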
6 Results of simulation
To assess the performance of the proposed technique we carried out a small simulation study. In the following three experiments we calculated the covering frequencies of the confidence sets for the regression coefficients in the model (20)–(22) constructed by (25), and of the corresponding one-dimensional confidence intervals.
In all experiments, for sample sizes $n=100$ through 5000, we generated $B=1000$ samples and calculated the estimates of the parameters and the corresponding confidence sets. The one-dimensional confidence intervals for ${b_{i}^{(k)}}$ were calculated by the standard formula
\[ \left[{\hat{b}_{i;n}^{(k)}}-{\lambda _{\alpha /2}}\sqrt{\frac{{\hat{v}_{ii;n}^{(k)}}}{n}},{\hat{b}_{i;n}^{(k)}}+{\lambda _{\alpha /2}}\sqrt{\frac{{\hat{v}_{ii;n}^{(k)}}}{n}}\hspace{2.5pt}\right],\]
where ${\hat{v}_{ii;n}^{(k)}}$ is the i-th diagonal entry of the matrix ${\hat{\mathbf{V}}_{;n}^{(k)}}$ and ${\lambda _{\alpha /2}}$ is the quantile of level $1-\alpha /2$ of the standard normal distribution. The confidence level for the sets and intervals was taken $\alpha =0.05$. Then the numbers of cases in which the confidence set covers the true value of the estimated parameter were calculated and divided by B. These are the covering frequencies reported in the tables below.
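The covering-frequency computation itself is straightforward. A sketch (Python with NumPy and SciPy) is shown below; the arrays b_hat and v_hat of estimates and jackknife variances over the B simulated samples are assumed to be collected with the estimators sketched in the previous sections.

```python
import numpy as np
from scipy.stats import norm

def covering_frequency(b_hat, v_hat, b_true, n, alpha=0.05):
    """Fraction of the B simulated samples whose confidence interval covers b_true."""
    lam = norm.ppf(1 - alpha / 2)                  # quantile lambda_{alpha/2}
    half = lam * np.sqrt(np.asarray(v_hat) / n)    # half-width of each interval
    b_hat = np.asarray(b_hat)
    covered = (b_hat - half <= b_true) & (b_true <= b_hat + half)
    return covered.mean()
```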
In all the experiments we considered a two-component mixture ($M=2$) in which the concentrations ${p_{j}^{(m)}}$ of the components varied over the observations.
The regression coefficients were taken as
\[ {b_{0}^{(1)}}=1/2,\hspace{2em}{b_{1}^{(1)}}=2,\hspace{2em}{b_{0}^{(2)}}=-1/2,\hspace{2em}{b_{1}^{(2)}}=-1/3,\]
and the distribution of the true (unobservable) regressor $x(O)$ was $N(0,2)$ for $\kappa (O)=1$ and $N(1,2)$ for $\kappa (O)=2$.
Experiment 1.
In this experiment we let ${\varepsilon _{X}}$ and ${\varepsilon _{Y}}\sim N(0,0.25)$. The variance of the errors is so small that the regression coefficients can be estimated without difficulty even for small sample sizes.
The covering frequencies for confidence sets are presented in Table 1. It seems that they approach the nominal covering probability 0.95 with satisfactory accuracy for sample sizes $n\ge 1000$.
Experiment 2.
In this experiment the errors were generated with a larger dispersion than in Experiment 1. Covering frequencies are presented in Table 2. It seems that the increase of the errors' dispersion does not deteriorate the covering accuracy of the confidence sets.
Experiment 3.
Here we consider the case when the error distributions are heavy-tailed. We generated the data with ${\varepsilon _{X}}$ and ${\varepsilon _{Y}}$ having Student's t distribution with $\mathrm{df}=14$ degrees of freedom. (This is the smallest df for which the assumptions of Theorem 4 hold.) Covering frequencies are presented in Table 3. It seems that the accuracy of covering slightly decreased, but the decrease is insignificant for practical purposes.
Table 1.
Covering frequencies for confidence sets in Experiment 1
n | ${b_{0}^{(1)}}$ | ${b_{1}^{(1)}}$ | $({b_{0}^{(1)}},{b_{1}^{(1)}})$ | ${b_{0}^{(2)}}$ | ${b_{1}^{(2)}}$ | $({b_{0}^{(2)}},{b_{1}^{(2)}})$ |
100 | 0.935 | 0.961 | 0.948 | 0.936 | 0.987 | 0.957 |
250 | 0.953 | 0.960 | 0.950 | 0.964 | 0.980 | 0.950 |
500 | 0.940 | 0.954 | 0.939 | 0.958 | 0.973 | 0.962 |
1000 | 0.946 | 0.949 | 0.943 | 0.954 | 0.971 | 0.935 |
2500 | 0.961 | 0.949 | 0.948 | 0.937 | 0.953 | 0.947 |
5000 | 0.947 | 0.949 | 0.948 | 0.954 | 0.956 | 0.958 |
Table 2.
Covering frequencies for confidence sets in Experiment 2
n | ${b_{0}^{(1)}}$ | ${b_{1}^{(1)}}$ | $({b_{0}^{(1)}},{b_{1}^{(1)}})$ | ${b_{0}^{(2)}}$ | ${b_{1}^{(2)}}$ | $({b_{0}^{(2)}},{b_{1}^{(2)}})$ |
100 | 0.969 | 0.942 | 0.918 | 0.950 | 0.974 | 0.958 |
250 | 0.958 | 0.956 | 0.945 | 0.946 | 0.962 | 0.959 |
500 | 0.949 | 0.945 | 0.936 | 0.953 | 0.966 | 0.960 |
1000 | 0.959 | 0.946 | 0.954 | 0.947 | 0.958 | 0.942 |
2500 | 0.956 | 0.949 | 0.950 | 0.947 | 0.961 | 0.958 |
5000 | 0.953 | 0.941 | 0.952 | 0.955 | 0.955 | 0.968 |
Table 3.
Covering frequencies for confidence sets in Experiment 3
n | ${b_{0}^{(1)}}$ | ${b_{1}^{(1)}}$ | $({b_{0}^{(1)}},{b_{1}^{(1)}})$ | ${b_{0}^{(2)}}$ | ${b_{1}^{(2)}}$ | $({b_{0}^{(2)}},{b_{1}^{(2)}})$ |
100 | 0.935 | 0.961 | 0.948 | 0.936 | 0.987 | 0.957 |
250 | 0.953 | 0.960 | 0.950 | 0.964 | 0.980 | 0.950 |
500 | 0.940 | 0.954 | 0.939 | 0.958 | 0.973 | 0.962 |
1000 | 0.946 | 0.949 | 0.943 | 0.954 | 0.971 | 0.935 |
2500 | 0.961 | 0.949 | 0.948 | 0.937 | 0.953 | 0.947 |
5000 | 0.947 | 0.949 | 0.948 | 0.954 | 0.956 | 0.958 |
7 Sociological analysis of EIT data
We would like to demonstrate the advantages of the proposed technique by applying it to the analysis of External Independent Testing (EIT) data (see [7]). EIT is a set of exams for high school graduates in Ukraine which must be passed for admission to universities. We use the data on EIT-2016 from the official site of the Ukrainian Center for Educational Quality Assessment.
In this paper we consider only the data on the scores in two subjects: Ukrainian language and literature (Ukr) and Mathematics (Math). The scores range from 100 to 200 points. (We have excluded the data on persons who failed one of these exams or did not take them at all.) EIT-2016 contains such data on 246 thousand examinees. The information on the region (Oblast) of Ukraine in which each examinee attended high school is also available in EIT-2016.
Our aim is to investigate how the dependence between the Ukr and Math scores differs for examinees who grew up in different environments. There can be, e.g., an environment of adherents of Ukrainian culture and the Ukrainian state, or an environment of persons critical toward Ukrainian independence. EIT-2016 does not contain information on such issues, so we use the data on the Ukrainian Parliament (Verhovna Rada) election results to deduce approximate proportions of adherents of different political choices in different regions of Ukraine.
We divided the adherents of the 29 parties and blocs that took part in the elections into three large groups, which are the components of our mixture:
(1) Pro-Ukrainian persons, voting for the parties that then created the ruling coalition (BPP, Batkivschyna, Narodny Front, Radicals and Samopomich).
(2) Contra-Ukrainian persons, who voted for the Opposition block, voted against all, or voted for small parties which were under the 5% threshold at these elections.
(3) Neutral persons, who did not take part in the voting.
Combining these data with EIT-2016, we obtain the sample $({X_{j}},{Y_{j}})$, $j=1,\dots ,n$, where ${X_{j}}$ is the Math score of the j-th examinee and ${Y_{j}}$ is his/her Ukr score. The concentrations of the components $({p_{j}^{(1)}},{p_{j}^{(2)}},{p_{j}^{(3)}})$ are taken as the frequencies of adherents of the corresponding political choice in the region where the j-th examinee attended high school.
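For illustration, the assignment of concentrations could be coded as follows. This is a minimal sketch with hypothetical data structures: the region names, the per-region frequencies and the examinees' regions below are invented for the example; the actual layout of EIT-2016 and of the election data is, of course, different.

```python
import numpy as np

# Hypothetical per-region frequencies of the three political choices (Pro, Contra, Neutral);
# in the real analysis they are computed from the Verhovna Rada election results.
region_freq = {
    "RegionA": (0.62, 0.10, 0.28),
    "RegionB": (0.28, 0.33, 0.39),
    "RegionC": (0.48, 0.17, 0.35),
}

# Hypothetical region of the high school of each examinee (taken from EIT-2016 in practice).
examinee_region = ["RegionA", "RegionC", "RegionB", "RegionA"]

# Concentration matrix p: row j holds (p_j^(1), p_j^(2), p_j^(3)) for the j-th examinee.
p = np.array([region_freq[r] for r in examinee_region])
assert np.allclose(p.sum(axis=1), 1.0)
```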
In [7] the authors propose to use the classical linear regression model (in which the error appears in the response only) to describe the dependence between ${X_{j}}$ and ${Y_{j}}$ in these data. But the errors-in-variables model can be more adequate, since the causes which deteriorate the ideal functional dependence $\mathrm{Ukr}={b_{0}}+{b_{1}}\hspace{0.1667em}\mathrm{Math}$ can affect both the Math and the Ukr scores, causing random deviations of, possibly, the same dispersion for each variable.
So we assume that the data are described by the model (20)–(22), where $\kappa (O)=1,2,3$ indicates the component (the environment in which the person O grew up) corresponding to one of the three political choices listed above.
Table 4.
Confidence sets for coefficients of regression between Math and Ukr
Pro | Contra | Neutral | ||||
low | upp | low | upp | low | upp | |
${b_{0}^{(k)}}$ | 40.12 | 40.22 | 236.3 | 240.1 | 84.21 | 87.19 |
${b_{1}^{(k)}}$ | 0.8562 | -0.366 | -0.345 | -2.80 | 0.335 | 0.359 |
In this model we calculated the confidence intervals at the level $\alpha =0.05/3\approx 0.0167$ in order to attain the overall level $\alpha =0.05$ when comparing the three intervals derived for the three different components. The results are presented in Table 4; the lower and upper endpoints of the confidence intervals are given in the columns named low and upp, respectively. We observe that the obtained intervals are rather narrow and do not intersect for different components. So, the regression coefficients for different components are significantly different. (Of course, this is so only if our theoretical model of the data distribution is adequate.)
The orthogonal regression lines corresponding to the different components are presented in Fig. 1. The solid line corresponds to the Pro-component, the dashed line to the Contra-component, and the dotted line to the Neutral one.
These results have a simple and plausible explanation. Say, in the Pro-component success in Ukr positively correlates with general school success, and hence with Math scores, too. This is natural for persons who are interested in Ukrainian culture and literature. In the Contra-component the correlation is negative. Why? The persons with high Math grades in this component do not feel the need to learn Ukrainian, while the persons with less success in Math try to improve their average score (by which the admission to universities is made) by increasing their Ukr score. The Neutral component shows a positive correlation between Math and Ukr, but it is smaller than the correlation in the Pro-component.
Surely, these explanations are too simple to be absolutely correct. We consider them only as examples of hypotheses which can be deduced from the data by the proposed technique.
8 Proofs
To prove Theorem 3 we need three lemmas. Below the symbols C and c denote finite positive constants, possibly different in different formulas.
Lemma 1.
In the assumptions of Theorem 3, for all n large enough,
\[ \underset{j,k}{\sup }|{a_{j;n}^{(k)}}|\le C{n^{-1}}\hspace{1em}\hspace{2.5pt}\textit{and}\hspace{2.5pt}\hspace{1em}\underset{i,j,k}{\sup }|{a_{j}^{(k)}}-{a_{ji-}^{(k)}}|\le C{n^{-2}}.\]
Proof.
By Assumption 3 of Theorem 3, $\frac{1}{n}{\boldsymbol{\Gamma }_{;n}}\to {\boldsymbol{\Gamma }_{\infty }}$, so there exists $c>0$ such that $\det (\frac{1}{n}{\boldsymbol{\Gamma }_{;n}})>c$ for all n large enough. This together with $|{p_{j;n}^{(m)}}|\le 1$ implies
(26)
\[ \| {\boldsymbol{\Gamma }_{;n}^{-1}}\| \le C{n^{-1}}.\]
(Here $\| \cdot \| $ means the operator norm.) Taking into account that ${\mathbf{a}_{;n}}={\mathbf{p}_{;n}}{\boldsymbol{\Gamma }_{;n}^{-1}}$, we obtain the first statement of the lemma.
Then by (16)–(17),
\[ {\mathbf{a}_{j}^{\centerdot }}-{\mathbf{a}_{ji-}^{\centerdot }}={\boldsymbol{\Gamma }^{-1}}{\mathbf{p}_{j}^{\centerdot }}-{\boldsymbol{\Gamma }_{i-}^{-1}}{\mathbf{p}_{j}^{\centerdot }}=-\frac{1}{1-{h_{i}}}{\boldsymbol{\Gamma }^{-1}}{\mathbf{p}_{i}^{\centerdot }}{({\mathbf{p}_{i}^{\centerdot }})^{T}}{\boldsymbol{\Gamma }^{-1}}{\mathbf{p}_{j}^{\centerdot }}.\]
This together with (26) yields the second statement. □
Lemma 2.
In the assumptions of Theorem 3,
\[ \| {\bar{\xi }_{;n}^{(k)}}-{\mu ^{(k)}}\| ={O_{P}}\left(\sqrt{\frac{\log \log n}{n}}\right)\hspace{1em}\hspace{2.5pt}\textit{as}\hspace{2.5pt}n\to \infty .\]
Proof.
Let ${\eta _{1}}$, …, ${\eta _{n}}$ be independent random variables with $\operatorname{\mathsf{E}}{\eta _{i}}=0$ and let ${B_{n}}={\textstyle\sum _{j=1}^{n}}\operatorname{\mathsf{E}}{({\eta _{j}})^{2}}$. Then the last formula in the proof of Theorem 7.2 and Theorem 7.3 in [10] imply the statement of the lemma. □
Lemma 3.
Let ${\xi _{1}},{\xi _{2}},\dots $ be independent random variables such that ${\sup _{j}}\operatorname{\mathsf{E}}|{\xi _{j}}{|^{\alpha }}<\infty $ for some $\alpha >0$. Then ${\sup _{j=1,\dots ,n}}|{\xi _{j}}|={O_{P}}({n^{\beta }})$ for any β such that $\alpha \beta >1$.
Proof.
By the Chebyshev inequality we obtain that for some $0<R<\infty $,
\[ \operatorname{\mathsf{P}}\{|{\xi _{j}}|>t\}\le R{t^{-\alpha }}\hspace{1em}\hspace{2.5pt}\text{for all}\hspace{2.5pt}t>0\hspace{2.5pt}\text{and all}\hspace{2.5pt}j.\]
Then, for $\alpha \beta >1$,
\[\begin{array}{l}\displaystyle \operatorname{\mathsf{P}}\{\underset{j=1,\dots ,n}{\sup }|{\xi _{j}}|>C{n^{\beta }}\}=1-\operatorname{\mathsf{P}}\{\underset{j=1,\dots ,n}{\sup }|{\xi _{j}}|\le C{n^{\beta }}\}\\ {} \displaystyle =1-{\prod \limits_{j=1}^{n}}\operatorname{\mathsf{P}}\{|{\xi _{j}}|\le C{n^{\beta }}\}=1-{\prod \limits_{j=1}^{n}}(1-\operatorname{\mathsf{P}}\{|{\xi _{j}}|>C{n^{\beta }}\})\\ {} \displaystyle \le 1-{\left(1-\frac{R}{{C^{\alpha }}{n^{\alpha \beta }}}\right)^{n}}=1-\exp \left(n\log \left(1-\frac{R}{{C^{\alpha }}{n^{\alpha \beta }}}\right)\right)\\ {} \displaystyle \sim 1-\exp \left(\frac{-nR}{{C^{\alpha }}{n^{\alpha \beta }}}\right)\to 0\hspace{1em}\hspace{2.5pt}\text{as}\hspace{2.5pt}n\to \infty .\end{array}\]
The lemma is proved. □
Proof of Theorem 3.
Let ${\xi ^{\prime }_{j}}={\xi _{j}}-\operatorname{\mathsf{E}}{\xi _{j}}$. Then
\[ {\bar{\xi }^{(k)}}={\sum \limits_{j=1}^{n}}{a_{j}^{(k)}}{\xi _{j}}={\sum \limits_{j=1}^{n}}{a_{j}^{(k)}}{\xi ^{\prime }_{j}}+{\sum \limits_{j=1}^{n}}{\sum \limits_{m=1}^{M}}{a_{j}^{(k)}}{p_{j}^{(m)}}{\mu ^{(m)}}={\sum \limits_{j=1}^{n}}{a_{j}^{(k)}}{\xi ^{\prime }_{j}}+{\mu ^{(k)}},\]
due to (3), and similarly
\[ {\bar{\xi }_{i-}^{(k)}}=\sum \limits_{j\ne i}{a_{ji-}^{(k)}}{\xi ^{\prime }_{j}}+{\mu ^{(k)}}.\]
Let us denote ${\mathbf{U}_{i}}={({U_{i}^{1}},\dots ,{U_{i}^{d}})^{T}}={\bar{\xi }^{(k)}}-{\bar{\xi }_{i-}^{(k)}}$. Then
(27)
\[ {\mathbf{U}_{i}}={\sum \limits_{j=1}^{n}}{a_{j}^{(k)}}{\xi ^{\prime }_{j}}-\sum \limits_{j\ne i}{a_{ji-}^{(k)}}{\xi ^{\prime }_{j}}={a_{i}^{(k)}}{\xi ^{\prime }_{i}}+\sum \limits_{j\ne i}({a_{j}^{(k)}}-{a_{ji-}^{(k)}}){\xi ^{\prime }_{j}}\]
and
\[ \hat{\vartheta }-{\hat{\vartheta }_{i-}}=H({\bar{\xi }^{(k)}})-H({\bar{\xi }_{i-}^{(k)}})={\mathbf{H}^{\prime }}({\zeta _{i}}){\mathbf{U}_{i}},\]
where ${\zeta _{i}}$ is some intermediate point between ${\bar{\xi }^{(k)}}$ and ${\bar{\xi }_{i-}^{(k)}}$. So,
(28)
\[ {\hat{\mathbf{V}}_{;n}}=n{\sum \limits_{i=1}^{n}}{\mathbf{H}^{\prime }}({\zeta _{i}}){\mathbf{U}_{i}}{\mathbf{U}_{i}^{T}}{({\mathbf{H}^{\prime }}({\zeta _{i}}))^{T}}.\]
Let us denote
(29)
\[ {\tilde{\mathbf{V}}_{;n}}=n{\sum \limits_{i=1}^{n}}{\mathbf{H}^{\prime }}({\mu ^{(k)}}){\mathbf{U}_{i}}{\mathbf{U}_{i}^{T}}{({\mathbf{H}^{\prime }}({\mu ^{(k)}}))^{T}}.\]
We will show that
(30)
\[ {\tilde{\mathbf{V}}_{;n}}\to {\mathbf{V}_{\infty }}\hspace{1em}\hspace{2.5pt}\text{in probability as}\hspace{2.5pt}n\to \infty \]
and
(31)
\[ {\hat{\mathbf{V}}_{;n}}-{\tilde{\mathbf{V}}_{;n}}\to 0\hspace{1em}\hspace{2.5pt}\text{in probability as}\hspace{2.5pt}n\to \infty .\]
These two relations imply the statement of the theorem.
We start from (30). Let us calculate $\operatorname{\mathsf{E}}{\tilde{\mathbf{V}}_{;n}}$. Notice that
\[ \operatorname{\mathsf{E}}{\mathbf{U}_{i}}{\mathbf{U}_{i}^{T}}={({a_{i}^{(k)}})^{2}}\operatorname{\mathsf{E}}{\xi ^{\prime }_{i}}{({\xi ^{\prime }_{i}})^{T}}+\sum \limits_{j\ne i}{({a_{j}^{(k)}}-{a_{ji-}^{(k)}})^{2}}\operatorname{\mathsf{E}}{\xi ^{\prime }_{j}}{({\xi ^{\prime }_{j}})^{T}}.\]
By Assumption 2 of the theorem, ${\sup _{i}}\| \operatorname{\mathsf{E}}{\xi ^{\prime }_{i}}{({\xi ^{\prime }_{i}})^{T}}\| <C$, and by Lemma 1,
\[ \sum \limits_{j\ne i}{({a_{j}^{(k)}}-{a_{ji-}^{(k)}})^{2}}\le C{n^{-3}}.\]
So,
\[ \operatorname{\mathsf{E}}{\tilde{\mathbf{V}}_{;n}}=n{\mathbf{H}^{\prime }}({\mu ^{(k)}}){\sum \limits_{i=1}^{n}}{({a_{i}^{(k)}})^{2}}\operatorname{\mathsf{E}}{\xi ^{\prime }_{i}}{({\xi ^{\prime }_{i}})^{T}}{({\mathbf{H}^{\prime }}({\mu ^{(k)}}))^{T}}+O({n^{-1}}).\]
In the same way as in the derivation of (7), we obtain
(32)
\[ \operatorname{\mathsf{E}}{\tilde{\mathbf{V}}_{;n}}\to {\mathbf{H}^{\prime }}({\mu ^{(k)}}){\boldsymbol{\Sigma }_{\infty }^{(k)}}{({\mathbf{H}^{\prime }}({\mu ^{(k)}}))^{T}}={\mathbf{V}_{\infty }}.\]
Now, let us estimate $\operatorname{\mathsf{E}}\| {\tilde{\mathbf{V}}_{;n}}-\operatorname{\mathsf{E}}{\tilde{\mathbf{V}}_{;n}}{\| ^{2}}$:
(33)
\[\begin{array}{l}\displaystyle \operatorname{\mathsf{E}}\| {\tilde{\mathbf{V}}_{;n}}-\operatorname{\mathsf{E}}{\tilde{\mathbf{V}}_{;n}}{\| ^{2}}\le C{n^{2}}{\sum \limits_{{l_{1}},{l_{2}}=1}^{d}}{\sum \limits_{i=1}^{n}}\operatorname{\mathsf{E}}{({U_{i}^{{l_{1}}}}{U_{i}^{{l_{2}}}}-\operatorname{\mathsf{E}}{U_{i}^{{l_{1}}}}{U_{i}^{{l_{2}}}})^{2}}\\ {} \displaystyle \le C{n^{2}}{\sum \limits_{l=1}^{d}}{\sum \limits_{i=1}^{n}}\operatorname{\mathsf{E}}{({U_{i}^{l}})^{4}}.\end{array}\]
Notice that
\[\begin{array}{l}\displaystyle \operatorname{\mathsf{E}}{({U_{i}^{l}})^{4}}=\operatorname{\mathsf{E}}{\left({a_{i}^{(k)}}{{\xi _{i}^{l}}^{\prime }}+\sum \limits_{j\ne i}({a_{j}^{(k)}}-{a_{ji-}^{(k)}}){{\xi _{j}^{l}}^{\prime }}\right)^{4}}\\ {} \displaystyle ={({a_{i}^{(k)}})^{4}}\operatorname{\mathsf{E}}{({{\xi _{i}^{l}}^{\prime }})^{4}}+6{({a_{i}^{(k)}})^{2}}\operatorname{\mathsf{E}}{({{\xi _{i}^{l}}^{\prime }})^{2}}\operatorname{\mathsf{E}}{\left(\sum \limits_{j\ne i}({a_{j}^{(k)}}-{a_{ji-}^{(k)}}){{\xi _{j}^{l}}^{\prime }}\right)^{2}}\\ {} \displaystyle +\operatorname{\mathsf{E}}{\left(\sum \limits_{j\ne i}({a_{j}^{(k)}}-{a_{ji-}^{(k)}}){{\xi _{j}^{l}}^{\prime }}\right)^{4}}=O({n^{-4}})\end{array}\]
due to Lemma 1 and Assumption 2 of the theorem. So by (33) we obtain
\[ \operatorname{\mathsf{E}}\| {\tilde{\mathbf{V}}_{;n}}-\operatorname{\mathsf{E}}{\tilde{\mathbf{V}}_{;n}}{\| ^{2}}=O({n^{-1}}).\]
This and (32) imply (30).
Let us show (31). Notice that
(34)
\[\begin{array}{l}\displaystyle {\hat{\mathbf{V}}_{;n}}-{\tilde{\mathbf{V}}_{;n}}=n{\sum \limits_{i=1}^{n}}({\mathbf{H}^{\prime }}({\zeta _{i}})-{\mathbf{H}^{\prime }}({\mu ^{(k)}})){\mathbf{U}_{i}}{({\mathbf{U}_{i}})^{T}}{({\mathbf{H}^{\prime }}({\zeta _{i}}))^{T}}\\ {} \displaystyle +n{\sum \limits_{i=1}^{n}}{\mathbf{H}^{\prime }}({\mu ^{(k)}}){\mathbf{U}_{i}}{({\mathbf{U}_{i}})^{T}}{({\mathbf{H}^{\prime }}({\zeta _{i}})-{\mathbf{H}^{\prime }}({\mu ^{(k)}}))^{T}}.\end{array}\]
By Lemma 1, ${\sup _{i}}\| {\mathbf{U}_{i}}{({\mathbf{U}_{i}})^{T}}\| \le C{n^{-2}}{\sup _{i}}\| {\xi ^{\prime }_{i}}{\| ^{2}}$. By Lemma 3 and Assumption 2 of the theorem,
\[ \underset{i}{\sup }\| {\xi ^{\prime }_{i}}{\| ^{2}}={O_{P}}({n^{\beta }})\hspace{1em}\hspace{2.5pt}\text{for any}\hspace{2.5pt}\beta >\frac{2}{\alpha }.\]
Since $\alpha >4$ we may take here $\beta <1/2$. Let us estimate ${\sup _{i}}\| {\mathbf{H}^{\prime }}({\zeta _{i}})-{\mathbf{H}^{\prime }}({\mu ^{(k)}})\| $. Notice that ${\zeta _{i}}$ is an intermediate point between ${\bar{\xi }^{(k)}}$ and ${\bar{\xi }_{i-}^{(k)}}$. By Lemma 2,
\[ \| {\bar{\xi }^{(k)}}-{\mu ^{(k)}}\| ={O_{P}}\left(\sqrt{\frac{\log \log n}{n}}\right).\]
Then, by Lemma 1,
\[ \underset{i}{\sup }\| {\bar{\xi }^{(k)}}-{\bar{\xi }_{i-}^{(k)}}\| \le C{n^{-1}}\underset{i}{\sup }\| {\xi ^{\prime }_{i}}\| ={O_{P}}({n^{-1+\beta }})\]
and
\[ \underset{i}{\sup }\| {\zeta _{i}}-{\mu ^{(k)}}\| ={O_{P}}\left(\sqrt{\frac{\log \log n}{n}}\right).\]
Due to Assumption 1 of the theorem this implies
\[ \underset{i}{\sup }\| {\mathbf{H}^{\prime }}({\zeta _{i}})-{\mathbf{H}^{\prime }}({\mu ^{(k)}})\| ={O_{P}}\left(\sqrt{\frac{\log \log n}{n}}\right)\]
and ${\sup _{i}}\| {\mathbf{H}^{\prime }}({\zeta _{i}})\| ={O_{P}}(1)$. Substituting these bounds into (34), we obtain
\[ \| {\hat{\mathbf{V}}_{;n}}-{\tilde{\mathbf{V}}_{;n}}\| ={O_{P}}\left({n^{\beta }}\sqrt{\frac{\log \log n}{n}}\right)\to 0,\]
since $\beta <1/2$. This proves (31), and the theorem follows. □
9 Conclusions
We introduced a modification of the jackknife technique for ACM estimation for moment estimators based on observations from mixtures with varying concentrations. A fast algorithm implementing this technique is proposed. Consistency of the derived estimator is demonstrated. The results of simulations demonstrate its practical applicability for sample sizes $n>1000$.