1 Introduction
This paper continues the study of applications of the jackknife (JK) technique to statistical inference in the model of mixture with varying concentrations (MVC). JK is a powerful tool for estimating the asymptotic covariance of asymptotically normal statistics, introduced by Quenouille (1949) and Tukey (1958). For its applications to homogeneous samples, see [11] and [1]. The JK technique was applied to heteroscedastic nonlinear regression models in [9]. Applications to errors-in-variables models are considered in [12, 13].
In MVC models one deals with a nonhomogeneous sample which consists of subjects belonging to M different subpopulations (mixture components). One knows the probabilities with which a subject belongs to the mixture components, and these probabilities are different for different subjects. So the observations are independent but not identically distributed. Modifying JK for the analysis of such data is a challenging problem.
On parametric inference in regression MVC models, see [2]. Estimation in nonparametric MVC models is discussed in [3]. In [5] a jackknife application to MVC of linear regression models with errors in variables is considered. It is shown that the JK estimators are consistent and allow one to construct asymptotic confidence intervals for regression coefficients based on orthogonal regression estimators. In [7] a general result on asymptotic normality of generalized estimating equation (GEE) estimators for MVC is obtained and applied to derive asymptotic normality of a modification of least squares (LS) estimators for MVC of nonlinear regression models. A JK estimator for the asymptotic covariance was also introduced in [7], but its properties were not investigated analytically.
In this paper we consider JK estimation of the asymptotic covariance of GEE estimators in MVC models and show its consistency. The MVC model and the GEE estimator are discussed in Section 2. A version of JK for MVC is described in Section 3. The main results on consistency and asymptotic normality of the GEE estimator and on consistency of the JK estimator of the asymptotic covariance are presented in Section 4. There we also consider an application to a nonlinear regression model. In Section 5 the developed statistical techniques are applied to real-life sociological data. Concluding remarks are placed in Section 6. Section 7 contains technical proofs.
2 MVC model and GEE estimation
In the MVC model we assume that each observed subject O belongs to one of M different mixture components (subpopulations) ${\mathcal{P}_{k}}$, $k=1,\dots ,M$. The sample contains n subjects ${O_{1}},\dots ,{O_{n}}$. Let ${\kappa _{j}}=k$ iff ${O_{j}}\in {\mathcal{P}_{k}}$. The true ${\kappa _{j}}$ are unknown, but one knows the mixing probabilities
\[ {p_{j;n}^{k}}=\operatorname{\mathsf{P}}\{{\kappa _{j}}=k\}=\operatorname{\mathsf{P}}\{{O_{j}}\in {\mathcal{P}_{k}}\},\hspace{2.5pt}k=1,\dots ,M,\hspace{2.5pt}j=1,\dots ,n.\]
These probabilities are also called the concentrations of the k-th component at j-th observation.
The D-dimensional vector of observed variables of O will be denoted by $\boldsymbol{\xi }(O)={({\xi ^{1}}(O),\dots ,{\xi ^{D}}(O))^{T}}\in {\mathbb{R}^{D}}$, ${\boldsymbol{\xi }_{j}}={\boldsymbol{\xi }_{j;n}}=\boldsymbol{\xi }({O_{j}})$.
Let ${F^{(k)}}$ be the distribution of $\boldsymbol{\xi }(O)$ for $O\in {\mathcal{P}_{k}}$, i.e.
\[ {F^{(k)}}(A)=\operatorname{\mathsf{P}}\{\boldsymbol{\xi }(O)\in A\hspace{2.5pt}|\hspace{2.5pt}O\in {\mathcal{P}_{k}}\}\]
for all Borel sets $A\subseteq {\mathbb{R}^{D}}$. Then
(1)
\[ \operatorname{\mathsf{P}}\{{\boldsymbol{\xi }_{j}}\in A\}={\sum \limits_{k=1}^{M}}{p_{j;n}^{k}}{F^{(k)}}(A).\]
So, in the MVC model one observes independent ${\boldsymbol{\xi }_{j}}$, $j=1,\dots ,n$, with the distribution defined by (1). In this paper we adopt a semiparametric model of components’ distributions
(2)
\[ {F^{(k)}}(A)=F(A,{\boldsymbol{\vartheta }^{(k)}},{\boldsymbol{\nu }^{(k)}}),\hspace{2.5pt}k=1,\dots ,M,\]
where F is some known function of its arguments, ${\boldsymbol{\vartheta }^{(k)}}\in \Theta \subseteq {\mathbb{R}^{d}}$ are unknown Euclidean parameters of interest, and ${\boldsymbol{\nu }^{(k)}}$ are some nonparametric nuisance parameters.
In what follows we will denote by ${\boldsymbol{\xi }_{(k)}}$ a random vector with distribution ${F^{(k)}}$, which can be considered as the value of $\boldsymbol{\xi }(O)$ for a subject O selected at random from the component ${\mathcal{P}_{k}}$.
Example.
Consider the model of mixture of regressions from [7]. In this model the observations are ${\boldsymbol{\xi }_{j}}={({Y_{j}},{X_{j}^{1}},\dots ,{X_{j}^{m}})^{T}}$, where ${Y_{j}}$ is the response and ${\mathbf{X}_{j}}={({X_{j}^{1}},\dots ,{X_{j}^{m}})^{T}}$ is the vector of regressors in the regression model
(3)
\[ {Y_{j}}=g({\mathbf{X}_{j}};{\boldsymbol{\vartheta }^{({\kappa _{j}})}})+{\varepsilon _{j}},\]
where g is a known regression function, ${\boldsymbol{\vartheta }^{(k)}}$ is a vector of unknown regression coefficients in the k-th mixture component, and ${\varepsilon _{j}}$ are regression error terms. (In [7] a somewhat more general model is considered, in which the regression functions and parameter spaces can be different for different components. We restrict ourselves to the present setting to simplify notation. The main result on JK consistency can be extended to the general case considered in [7].)
We assume that ${\varepsilon _{j}}$ are independent for different j and, for each j, ${\varepsilon _{j}}$ and ${\mathbf{X}_{j}}$ are conditionally independent given ${\kappa _{j}}$. Let ${F_{X}^{(k)}}$ and ${F_{\varepsilon }^{(k)}}$ be the conditional distributions of ${\mathbf{X}_{j}}$ and ${\varepsilon _{j}}$ given ${O_{j}}\in {\mathcal{P}_{k}}$. We assume that $\operatorname{\mathsf{E}}[{\varepsilon _{j}}\hspace{2.5pt}|\hspace{2.5pt}{\kappa _{j}}=k]=\textstyle\int x{F_{\varepsilon }^{(k)}}(dx)=0$ for all $k=1,\dots ,M$.
Model (3) is a special case of model (1)–(2) in which the nuisance parameters are the distributions of regressors and errors for all components, i.e., ${\boldsymbol{\nu }^{(k)}}=({F_{X}^{(k)}},{F_{\varepsilon }^{(k)}})$.
To estimate ${\boldsymbol{\vartheta }^{(k)}}$ in (1)–(2) we apply the technique of generalized estimating equations (GEE) considered in [7]. (On the GEE estimation technique and its relations to least squares, maximum likelihood and M-estimators in the context of i.i.d. observations, see Section 5.4 in [10].) Let us choose an elementary estimating function $\mathbf{s}:{\mathbb{R}^{D}}\times \Theta \to {\mathbb{R}^{d}}$ such that
(4)
\[ \operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma })=0\hspace{2.5pt}\text{iff}\hspace{2.5pt}\boldsymbol{\gamma }={\boldsymbol{\vartheta }^{(k)}}.\]
So, considering (4) as an equation in $\boldsymbol{\gamma }={({\gamma ^{1}},\dots ,{\gamma ^{d}})^{T}}\in \Theta \subseteq {\mathbb{R}^{d}}$, we observe that its unique solution is ${\boldsymbol{\vartheta }^{(k)}}$. To obtain an estimator for ${\boldsymbol{\vartheta }^{(k)}}$ we replace $\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma })$ by its estimator
(5)
\[ {\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })={\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}\mathbf{s}({\boldsymbol{\xi }_{j;n}};\boldsymbol{\gamma }),\]
where ${a_{j;n}^{k}}$ are nonrandom weights satisfying the assumption
(6)
\[ {\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{p_{j;n}^{m}}=\left\{\begin{array}{l@{\hskip10.0pt}l}1\hspace{1em}& \hspace{2.5pt}\text{if}\hspace{2.5pt}k=m,\\ {} 0\hspace{1em}& \hspace{2.5pt}\text{if}\hspace{2.5pt}k\ne m,\end{array}\right.\hspace{2.5pt}\hspace{2.5pt}\hspace{2.5pt}\text{for all}\hspace{2.5pt}m=1,\dots ,M.\]
Observe that ${\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })$ is an unbiased estimator for $\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma })$ under (6), i.e., $\operatorname{\mathsf{E}}{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })=\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma })$. The GEE estimator for ${\boldsymbol{\vartheta }^{(k)}}$ is any statistic ${\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}$ such that
(7)
\[ {\mathbf{S}_{n}^{(k)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(k)}})=0,\hspace{2.5pt}\text{a.s.}\]
E.g., in the model (3) differentiation of the least squares functional yields the elementary estimating function
(8)
\[ \mathbf{s}(\boldsymbol{\gamma },Y,\mathbf{X})=(Y-g(\mathbf{X},\boldsymbol{\gamma }))\dot{\mathbf{g}}(\mathbf{X},\boldsymbol{\gamma }),\]
where
\[ \dot{\mathbf{g}}(\mathbf{X},\boldsymbol{\gamma })={\left(\frac{\partial g(\mathbf{X},\boldsymbol{\gamma })}{\partial {\gamma ^{1}}},\dots ,\frac{\partial g(\mathbf{X},\boldsymbol{\gamma })}{\partial {\gamma ^{d}}}\right)^{T}}.\]
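As an illustration only (not from the paper), the following Python sketch shows how the elementary estimating function (8) and the weighted estimating function (5) could be evaluated, and how a GEE estimate solving (7) could be obtained with a generic root finder; the logistic regression function g mirrors the example of Section 4, and the data arrays, starting point and helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import fsolve

def g(x, gamma):
    # logistic regression function (cf. (17)); gamma = (b0, b1) is a hypothetical parametrization
    b0, b1 = gamma
    return 1.0 / (1.0 + np.exp(-b0 - b1 * x))

def g_dot(x, gamma):
    # gradient of g with respect to gamma, i.e. the vector \dot{g}(X, gamma)
    p = g(x, gamma)
    return np.stack([p * (1.0 - p), p * (1.0 - p) * x], axis=-1)

def s(y, x, gamma):
    # elementary estimating function (8): (Y - g(X, gamma)) * \dot{g}(X, gamma), one row per subject
    return (y - g(x, gamma))[:, None] * g_dot(x, gamma)

def S_nk(gamma, y, x, a_k):
    # weighted estimating function (5): sum_j a_{j;n}^k s(xi_j; gamma)
    return a_k @ s(y, x, gamma)

def gee_estimate(y, x, a_k, gamma0):
    # GEE estimator (7): a root of S_n^(k)(gamma) = 0, found numerically
    return fsolve(lambda gamma: S_nk(gamma, y, x, a_k), gamma0)
```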
In this paper we consider only the minimax weights ${a_{j;n}^{k}}$, which can be defined as follows. Let ${\mathbf{p}_{;n}}$ be the matrix of all concentrations for all components of the mixture:
\[ {\mathbf{p}_{;n}}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{p_{1;n}^{1}}& \dots & {p_{1;n}^{M}}\\ {} \vdots & \ddots & \vdots \\ {} {p_{n;n}^{1}}& \dots & {p_{n;n}^{M}}\end{array}\right).\]
Then the matrix of all weights ${\mathbf{a}_{;n}}={({a_{j;n}^{m}})_{j=1,\dots ,n,m=1,\dots ,M}}$ is defined as
\[ {\mathbf{a}_{;n}}={\mathbf{p}_{;n}}{\boldsymbol{\Gamma }_{;n}^{-1}},\]
where ${\boldsymbol{\Gamma }_{;n}}={\mathbf{p}_{;n}^{T}}{\mathbf{p}_{;n}}$. (We assume that $\det {\boldsymbol{\Gamma }_{;n}}\ne 0$.) With this choice the weights satisfy (6). Minimax properties of ${\mathbf{a}_{;n}}$ were discussed in [3]. For an alternative approach to weighting in GEE for MVC, see [4].
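A minimal numerical check (a sketch under the assumption that the minimax weights are ${\mathbf{a}_{;n}}={\mathbf{p}_{;n}}{\boldsymbol{\Gamma }_{;n}^{-1}}$ as above; the concentration matrix is randomly generated and purely hypothetical) confirms that these weights satisfy condition (6):

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical concentration matrix p_{;n}: n = 1000 subjects, M = 3 components, rows sum to 1
p = rng.dirichlet(alpha=np.ones(3), size=1000)

Gamma = p.T @ p                        # Gamma_{;n} = p^T p
a = p @ np.linalg.inv(Gamma)           # minimax weights a_{;n} = p Gamma_{;n}^{-1}

# condition (6): sum_j a_{j;n}^k p_{j;n}^m equals 1 if k = m and 0 otherwise
print(np.allclose(a.T @ p, np.eye(3)))   # True
```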
3 Jackknife for MVC
Consider the set of parameters ${\boldsymbol{\vartheta }^{(k)}}$, $k=1,\dots ,M$, for the different components as one long vector parameter $\boldsymbol{\vartheta }={({({\boldsymbol{\vartheta }^{(1)}})^{T}},\dots ,{({\boldsymbol{\vartheta }^{(M)}})^{T}})^{T}}$ and similarly for the set of estimators ${\hat{\boldsymbol{\vartheta }}_{n}}={({({\hat{\boldsymbol{\vartheta }}_{n}^{(1)}})^{T}},\dots ,{({\hat{\boldsymbol{\vartheta }}_{n}^{(M)}})^{T}})^{T}}$. (Recall that ${\boldsymbol{\vartheta }^{(k)}},{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}\in {\mathbb{R}^{d}}$.) It was shown in [7] that under suitable assumptions (see Theorem 3 below) the estimator ${\hat{\boldsymbol{\vartheta }}_{n}}$ is asymptotically normal, i.e.
\[ \sqrt{n}({\hat{\boldsymbol{\vartheta }}_{n}}-\boldsymbol{\vartheta })\stackrel{\text{W}}{\longrightarrow }N(0,\mathbf{V}).\]
To apply asymptotic normality to hypotheses testing one needs an estimator for the dispersion matrix (asymptotic covariance) V given by (16). Jackknife (JK) is a powerful tool for constructing such estimators. We now consider its modification for the MVC models (cf. [5]). Let
\[ {\mathbf{p}_{;-i,n}}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{p_{1;n}^{1}}& \dots & {p_{1;n}^{M}}\\ {} \vdots & \ddots & \vdots \\ {} {p_{i-1;n}^{1}}& \dots & {p_{i-1;n}^{M}}\\ {} 0& \dots & 0\\ {} {p_{i+1;n}^{1}}& \dots & {p_{i+1;n}^{M}}\\ {} \vdots & \ddots & \vdots \\ {} {p_{n;n}^{1}}& \dots & {p_{n;n}^{M}}\end{array}\right),\]
i.e. ${\mathbf{p}_{;-i,n}}$ is the matrix ${\mathbf{p}_{;n}}$ with the i-th row replaced by the zero row. Then ${\boldsymbol{\Gamma }_{;-i,n}}={\mathbf{p}_{;-i,n}^{T}}{\mathbf{p}_{;-i,n}}$ and
\[ {\mathbf{a}_{;-i,n}}={\mathbf{p}_{;-i,n}}{\boldsymbol{\Gamma }_{;-i,n}^{-1}}.\]
Let
(11)
\[ {\mathbf{S}_{-in}^{(k)}}(\boldsymbol{\gamma })=\sum \limits_{j\ne i}{a_{j;-i,n}^{k}}\mathbf{s}({\boldsymbol{\xi }_{j;n}};\boldsymbol{\gamma })\]
and define ${\hat{\boldsymbol{\vartheta }}_{-in}^{(k)}}$ as statistics which satisfy
(12)
\[ {\mathbf{S}_{-in}^{(k)}}({\hat{\boldsymbol{\vartheta }}_{-in}^{(k)}})=0,\hspace{2.5pt}\text{a.s.},\]
${\hat{\boldsymbol{\vartheta }}_{-in}}={({({\hat{\boldsymbol{\vartheta }}_{-in}^{(1)}})^{T}},\dots ,{({\hat{\boldsymbol{\vartheta }}_{-in}^{(M)}})^{T}})^{T}}$. In fact, ${\hat{\boldsymbol{\vartheta }}_{-in}}$ is the GEE estimator for $\boldsymbol{\vartheta }$ calculated from the sample which contains all the observed subjects ${O_{j}}$ except the i-th one. Then the JK estimator for V is
(13)
\[ {\hat{\mathbf{V}}_{n}}=n{\sum \limits_{i=1}^{n}}({\hat{\boldsymbol{\vartheta }}_{-in}}-{\hat{\boldsymbol{\vartheta }}_{n}}){({\hat{\boldsymbol{\vartheta }}_{-in}}-{\hat{\boldsymbol{\vartheta }}_{n}})^{T}}.\]
(On some efficient algorithms for calculation of ${\mathbf{a}_{;-i,n}}$ and ${\hat{\mathbf{V}}_{n}}$, see [5].)
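The following Python sketch implements (13) by direct recomputation of the leave-one-out estimators (so it makes n refits and does not use the efficient algorithms of [5]); gee_estimate is the hypothetical helper from the sketch in Section 2, and the dimension d = 2 of each component parameter is an assumption used for the slicing.

```python
import numpy as np

def jackknife_cov(y, x, p, gee_estimate, d=2):
    """JK estimator (13): n * sum_i (theta_{-i} - theta_n)(theta_{-i} - theta_n)^T."""
    n, M = p.shape
    a = p @ np.linalg.inv(p.T @ p)                       # full-sample weights a_{;n}
    theta = np.concatenate([gee_estimate(y, x, a[:, k], np.zeros(d)) for k in range(M)])
    deltas = np.zeros((n, M * d))
    for i in range(n):
        p_i = p.copy()
        p_i[i, :] = 0.0                                  # p_{;-i,n}: zero out the i-th row
        a_i = p_i @ np.linalg.inv(p_i.T @ p_i)           # leave-one-out weights a_{;-i,n}
        theta_i = np.concatenate(
            [gee_estimate(y, x, a_i[:, k], theta[k*d:(k+1)*d]) for k in range(M)]
        )
        deltas[i] = theta_i - theta
    return n * deltas.T @ deltas
```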
4 Main theorems
In this section we consider the asymptotic behavior of ${\hat{\boldsymbol{\vartheta }}_{n}}$ and ${\hat{\mathbf{V}}_{n}}$ as $n\to \infty $. Note that we do not assume any relationship between the samples $\{{\boldsymbol{\xi }_{j;n}},\hspace{2.5pt}j=1,\dots ,n\}$ for different n. They can be independent or dependent, or a smaller sample can be a part of a larger one. The concentration arrays ${\mathbf{p}_{;n}}$ are also unrelated for different n.
To formulate the theorems we need some notation and assumptions. In what follows we assume that the limit
(14)
\[ {\boldsymbol{\Gamma }_{\infty }}=\underset{n\to \infty }{\lim }\frac{1}{n}{\mathbf{p}_{;n}^{T}}{\mathbf{p}_{;n}}\]
exists and $\det {\boldsymbol{\Gamma }_{\infty }}\ne 0$.
For a vector x, the symbol $|\mathbf{x}|$ means the Euclidean norm. For a matrix A, $|\mathbf{A}|$ is the operator norm. Let $\boldsymbol{\psi }(\mathbf{x},\boldsymbol{\gamma })$ be any function of $\mathbf{x}\in {\mathbb{R}^{D}}$, $\boldsymbol{\gamma }\in \Theta $, maybe vector- or matrix-valued, $h:{\mathbb{R}^{D}}\to \mathbb{R}$, $\rho :\Theta \times \Theta \to \mathbb{R}$. We say that $\boldsymbol{\psi }$ satisfies Condition ${\boldsymbol{\Psi }_{h,\rho }}$ iff, for all ${\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}}\in \Theta $ and all $\mathbf{x}\in {\mathbb{R}^{D}}$,
\[ |\boldsymbol{\psi }(\mathbf{x},{\boldsymbol{\gamma }_{1}})-\boldsymbol{\psi }(\mathbf{x},{\boldsymbol{\gamma }_{2}})|\le h(\mathbf{x})\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}}).\]
A set of functions $\{{\boldsymbol{\psi }_{i}},i\in I\}$ satisfies Condition ${\boldsymbol{\Psi }_{h,\rho }}$ if ${\boldsymbol{\psi }_{i}}$ satisfies ${\boldsymbol{\Psi }_{h,\rho }}$ for each $i\in I$. The following theorem states conditions for consistency of ${\hat{\boldsymbol{\vartheta }}_{n}}$ and for consistency of ${\hat{\boldsymbol{\vartheta }}_{-in}}$ uniformly in i.
Theorem 1 (Consistency).
Let the following assumptions hold.
- (C1) Θ is a compact set in ${\mathbb{R}^{d}}$.
- (C2) Condition ${\boldsymbol{\Psi }_{h,\rho }}$ holds for the elementary estimating function s with some functions ρ and h.
- (C3) ρ is a continuous function on $\Theta \times \Theta $ with $\rho (\boldsymbol{\gamma },\boldsymbol{\gamma })=0$ for all $\boldsymbol{\gamma }\in \Theta $.
- (C4) For all $l=1,\dots ,M$, $\operatorname{\mathsf{E}}|\mathbf{s}({\boldsymbol{\xi }_{(l)}},{\boldsymbol{\vartheta }^{(l)}}){|^{2}}<\infty $ and $\operatorname{\mathsf{E}}{(h({\boldsymbol{\xi }_{(l)}}))^{2}}<\infty $.
- (C5) $\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma })=0$ if and only if $\boldsymbol{\gamma }={\boldsymbol{\vartheta }^{(k)}}$.
- (C6) $\det {\boldsymbol{\Gamma }_{\infty }}>0$.
- (C7) $\operatorname{\mathsf{P}}\{\exists \boldsymbol{\gamma }\in \Theta \textit{, such that}\hspace{2.5pt}{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })=0\}\to 1$ as $n\to \infty $.
- (C7’) $\operatorname{\mathsf{P}}\{\forall i=1,\dots ,n,\exists {\boldsymbol{\gamma }_{i}}\in \Theta \textit{, such that}\hspace{2.5pt}{\mathbf{S}_{-in}^{(k)}}({\boldsymbol{\gamma }_{i}})=0\}\to 1$ as $n\to \infty $.
Then, under the assumptions (C1)–(C7), ${\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}\stackrel{\text{P}}{\longrightarrow }{\boldsymbol{\vartheta }^{(k)}}$ as $n\to \infty $, and under the assumptions (C1)–(C6), (C7’),
(15)
\[ \underset{i=1,\dots ,n}{\sup }|{\hat{\boldsymbol{\vartheta }}_{-in}^{(k)}}-{\boldsymbol{\vartheta }^{(k)}}|\stackrel{\text{P}}{\longrightarrow }0\]
as $n\to \infty $.
Assumptions (C7) and (C7’) claim the existence of GEE solutions with probability tending to 1 as $n\to \infty $. They are rather implicit. The following theorem provides conditions under which they hold.
Let $\dot{\mathbf{S}}(\boldsymbol{\gamma })$ denote the Jacobian matrix of a vector-valued function S, and let ${\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })=\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma })$.
Theorem 2 (Existence).
To formulate the asymptotic normality result we need some additional notation. Let $\mathbf{s}(\mathbf{x},\boldsymbol{\gamma })={({s^{1}}(\mathbf{x},\boldsymbol{\gamma }),\dots ,{s^{d}}(\mathbf{x},\boldsymbol{\gamma }))^{T}}$,
\[ {\mathbf{M}^{(k)}}(\boldsymbol{\gamma })=\operatorname{\mathsf{E}}\dot{\mathbf{s}}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma }),\hspace{2.5pt}{\mathbf{M}^{(k)}}={\mathbf{M}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})=\operatorname{\mathsf{E}}\dot{\mathbf{s}}({\boldsymbol{\xi }_{(k)}},{\boldsymbol{\vartheta }^{(k)}}),\]
\[ \langle {\mathbf{a}^{k}}{\mathbf{a}^{m}}{\mathbf{p}^{l}}{\mathbf{p}^{i}}\rangle =\underset{n\to \infty }{\lim }n{\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{a_{j;n}^{m}}{p_{j;n}^{l}}{p_{j;n}^{i}},\]
\[ \langle {\mathbf{a}^{k}}{\mathbf{a}^{m}}{\mathbf{p}^{l}}\rangle =\underset{n\to \infty }{\lim }n{\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{a_{j;n}^{m}}{p_{j;n}^{l}}\]
(the existence of these limits is a condition of Theorem 3 below). Now
\[\begin{aligned}{}{\mathbf{Z}^{(m,l)}}& ={\sum \limits_{i=1}^{M}}\langle {\mathbf{a}^{m}}{\mathbf{a}^{l}}{\mathbf{p}^{i}}\rangle \operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(i)}},{\boldsymbol{\vartheta }^{(m)}})\mathbf{s}{({\boldsymbol{\xi }_{(i)}},{\boldsymbol{\vartheta }^{(l)}})^{T}}\\ {} & \hspace{1em}-{\sum \limits_{{i_{1}},{i_{2}}=1}^{M}}\langle {\mathbf{a}^{m}}{\mathbf{a}^{l}}{\mathbf{p}^{{i_{1}}}}{\mathbf{p}^{{i_{2}}}}\rangle \operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{({i_{1}})}},{\boldsymbol{\vartheta }^{(m)}})\operatorname{\mathsf{E}}\mathbf{s}{({\boldsymbol{\xi }_{({i_{2}})}},{\boldsymbol{\vartheta }^{(l)}})^{T}},\end{aligned}\]
\[ {\mathbf{V}^{(m,l)}}={({\mathbf{M}^{(m)}})^{-1}}{\mathbf{Z}^{(m,l)}}{({\mathbf{M}^{(l)}})^{-T}}\]
(here and below ${\mathbf{M}^{-T}}={({\mathbf{M}^{-1}})^{T}}$). Let us pack all the matrices ${\mathbf{V}^{(m,l)}}$ into one $(Md)\times (Md)$ matrix
(16)
\[ \mathbf{V}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{\mathbf{V}^{(1,1)}}& \dots & {\mathbf{V}^{(1,M)}}\\ {} \vdots & \ddots & \vdots \\ {} {\mathbf{V}^{(M,1)}}& \dots & {\mathbf{V}^{(M,M)}}\end{array}\right).\]
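As an illustrative sketch only, packing the blocks of (16) into a single $(Md)\times (Md)$ matrix is straightforward; here M_hat and Z_hat are assumed to be already available estimates of ${\mathbf{M}^{(k)}}$ and ${\mathbf{Z}^{(m,l)}}$ (e.g. sample analogues), and the sandwich form ${({\mathbf{M}^{(m)}})^{-1}}{\mathbf{Z}^{(m,l)}}{({\mathbf{M}^{(l)}})^{-T}}$ is used.

```python
import numpy as np

def assemble_V(M_hat, Z_hat):
    """Pack V^{(m,l)} = (M^{(m)})^{-1} Z^{(m,l)} (M^{(l)})^{-T} into one (M d) x (M d) matrix."""
    n_comp = len(M_hat)                      # number of mixture components M
    d = M_hat[0].shape[0]
    M_inv = [np.linalg.inv(Mk) for Mk in M_hat]
    V = np.zeros((n_comp * d, n_comp * d))
    for m in range(n_comp):
        for l in range(n_comp):
            V[m*d:(m+1)*d, l*d:(l+1)*d] = M_inv[m] @ Z_hat[m][l] @ M_inv[l].T
    return V
```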
Theorem 3 (Asymptotic normality).
Let the following assumptions hold.
- (AN1) $\boldsymbol{\vartheta }$ is an inner point of ${\Theta ^{M}}=\Theta \times \cdots \times \Theta $.
- (AN2) There exists an open ball B centered at $\boldsymbol{\vartheta }$, such that the derivatives\[ \frac{{\partial ^{2}}{s^{l}}(\mathbf{x},\boldsymbol{\gamma })}{\partial {\gamma ^{i}}\partial {\gamma ^{j}}}\]exist for all $\boldsymbol{\gamma }={({\gamma ^{1}},\dots ,{\gamma ^{d}})^{T}}\in B$, all $l,i,j=1,\dots ,d$, and almost all x (w.r.t. all ${F^{(k)}}$, $k=1,\dots ,M$).
- (AN3) There exists a function $h:{\mathbb{R}^{D}}\to \mathbb{R}$ such that\[ \underset{l,i,j}{\max }\underset{\boldsymbol{\gamma }\in B}{\sup }\left|\frac{{\partial ^{2}}{s^{l}}(\mathbf{x},\boldsymbol{\gamma })}{\partial {\gamma ^{i}}\partial {\gamma ^{j}}}\right|\le h(\mathbf{x})\]and $\operatorname{\mathsf{E}}{(h({\boldsymbol{\xi }_{(k)}}))^{\alpha }}<\infty $ for some $\alpha >1$ and all $k=1,\dots ,M$.
- (AN4) $\operatorname{\mathsf{E}}|\mathbf{s}({\boldsymbol{\xi }_{(k)}},{\boldsymbol{\vartheta }^{(k)}}){|^{2}}<\infty $ for all $k=1,\dots ,M$.
- (AN5) ${\mathbf{M}^{(k)}}$ are finite and nonsingular for all $k=1,\dots ,M$.
- (AN6) The limits $\langle {\mathbf{a}^{k}}{\mathbf{a}^{m}}{\mathbf{p}^{i}}{\mathbf{p}^{l}}\rangle $ exist for all $k,m,i,l=1,\dots ,M$.
- (AN7) The matrix ${\boldsymbol{\Gamma }_{\infty }}$ exists and is nonsingular.
- (AN8) ${\hat{\boldsymbol{\vartheta }}_{n}}$ exists and is a consistent estimator for $\boldsymbol{\vartheta }$.
Then
\[ \sqrt{n}({\hat{\boldsymbol{\vartheta }}_{n}}-\boldsymbol{\vartheta })\stackrel{\text{W}}{\longrightarrow }N(0,\mathbf{V})\]
as $n\to \infty $.
Now we are ready to formulate the theorem on consistency of the JK estimator of V.
Theorem 4.
Assume that assumptions (AN1), (AN5), (AN6), (AN7) of Theorem 3 hold and, moreover:
- (JK1) There exists a function $h:{\mathbb{R}^{D}}\to \mathbb{R}$ such that\[ \underset{\boldsymbol{\gamma }\in \Theta }{\sup }|\mathbf{s}(\mathbf{x},\boldsymbol{\gamma })|\le h(\mathbf{x}),\hspace{2.5pt}\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|\dot{\mathbf{s}}(\mathbf{x},\boldsymbol{\gamma })|\le h(\mathbf{x}),\]\[ \underset{l,i,j}{\max }\underset{\boldsymbol{\gamma }\in B}{\sup }\left|\frac{{\partial ^{2}}{s^{l}}(\mathbf{x},\boldsymbol{\gamma })}{\partial {\gamma ^{i}}\partial {\gamma ^{j}}}\right|\le h(\mathbf{x})\]and, for some $\alpha >4$, $\operatorname{\mathsf{E}}{(h({\boldsymbol{\xi }_{(k)}}))^{\alpha }}<\infty $ for all $k=1,\dots ,M$.
- (JK2) ${\hat{\boldsymbol{\vartheta }}_{n}}$ is a $\sqrt{n}$-consistent estimator of $\boldsymbol{\vartheta }$.
- (JK3) ${\sup _{i=1,\dots ,n}}|{\hat{\boldsymbol{\vartheta }}_{-in}}-\boldsymbol{\vartheta }|\stackrel{\text{P}}{\longrightarrow }0$ as $n\to \infty $.
Then ${\hat{\mathbf{V}}_{n}}\stackrel{\text{P}}{\longrightarrow }\mathbf{V}$ as $n\to \infty $.
Example.
Let the observed data be ${\boldsymbol{\xi }_{j}}={({Y_{j}},{X_{j}})^{T}}$, $j=1,\dots ,n$, where the dependence between ${X_{j}}$ and ${Y_{j}}$ is described by the regression model (3) with
(17)
\[ g({X_{j}},{\boldsymbol{\vartheta }^{(k)}})=\frac{1}{1+\exp (-{\boldsymbol{\vartheta }_{0}^{(k)}}-{\boldsymbol{\vartheta }_{1}^{(k)}}{X_{j}})},\]
where ${\boldsymbol{\vartheta }^{(k)}}={({\boldsymbol{\vartheta }_{0}^{(k)}},{\boldsymbol{\vartheta }_{1}^{(k)}})^{T}}$ are the vectors of regression coefficients for the k-th mixture component.
Assume that $\boldsymbol{\gamma }\in \Theta $, where Θ is a compact set in ${\mathbb{R}^{2}}$. Then for the elementary estimating function s defined by (8) we obtain
\[ |\mathbf{s}({\boldsymbol{\xi }_{j}},\boldsymbol{\gamma })|\le C(1+|{X_{j}}|)(1+|{\varepsilon _{j}}|)\hspace{2.5pt}\text{and}\hspace{2.5pt}\left|\frac{{\partial ^{2}}}{\partial {\gamma ^{i}}\partial {\gamma ^{j}}}{s^{l}}({\boldsymbol{\xi }_{j}},\boldsymbol{\gamma })\right|\le C(1+|{X_{j}}{|^{3}})(1+|{\varepsilon _{j}}|),\]
where $C<\infty $ is some constant. So Assumption (JK1) holds if $\operatorname{\mathsf{E}}[{({\varepsilon _{j}})^{4}}\hspace{2.5pt}|\hspace{2.5pt}{\kappa _{j}}=k]<\infty $ and $\operatorname{\mathsf{E}}[|{X_{j}}{|^{12}}\hspace{2.5pt}|\hspace{2.5pt}{\kappa _{j}}=k]<\infty $ for all $k=1,\dots ,M$. Assumption (AN5) holds if
Assumption (C5) also holds under (18), see Theorem 2 in [8]. So, under rather mild assumptions Theorems 1–4 hold for the generalized least squares estimator in this model. In [7] confidence sets for ${\boldsymbol{\vartheta }^{(k)}}$ are constructed based on the asymptotic normality of ${\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}$ and consistency of ${\hat{\mathbf{V}}_{n}^{(k,k)}}$. Namely, the confidence ellipsoid for ${\boldsymbol{\vartheta }^{(k)}}$ is defined as
\[ {B_{\alpha ,n}}=\{\boldsymbol{\gamma }\in {\mathbb{R}^{d}}:\hspace{2.5pt}n{(\boldsymbol{\gamma }-{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}})^{T}}{({\hat{\mathbf{V}}_{n}^{(k,k)}})^{-1}}(\boldsymbol{\gamma }-{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}})\le {Q^{\eta }}(1-\alpha )\},\]
where ${Q^{\eta }}(1-\alpha )$ is the quantile of level $1-\alpha $ of the ${\chi ^{2}}$-distribution with d degrees of freedom. Then under the assumptions of Theorems 1–4, if $\det {\mathbf{Z}^{(k,k)}}\ne 0$,
\[ \underset{n\to \infty }{\lim }\operatorname{\mathsf{P}}\{{\boldsymbol{\vartheta }^{(k)}}\in {B_{\alpha ,n}}\}=1-\alpha .\]
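For illustration (a hypothetical sketch, not the authors' code), membership in the confidence ellipsoid ${B_{\alpha ,n}}$ reduces to comparing the quadratic form with the corresponding ${\chi ^{2}}$ quantile:

```python
import numpy as np
from scipy.stats import chi2

def in_ellipsoid(gamma, theta_hat_k, V_hat_kk, n, alpha):
    """True if gamma lies in the confidence ellipsoid B_{alpha,n} for the k-th component."""
    diff = np.asarray(gamma) - np.asarray(theta_hat_k)
    stat = n * diff @ np.linalg.inv(V_hat_kk) @ diff
    return stat <= chi2.ppf(1.0 - alpha, df=len(diff))
```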
In [7] results of simulations are presented which show that these ellipsoids can be used for large enough samples.
5 Application to sociological data
In this section we show how the considered technique can be applied to the statistical analysis of real-life data. In many sociological problems one deals with two sets of data from two different sources. The first (I) set consists of individual records with values of variables which present personal information on the investigated persons. The second (A) set contains some information on a variable of the investigated persons which is not present in the I-set, averaged over large groups. The problem is how to merge information from the A- and I-sets to make inference on a model involving variables from both sets.
As the I-set we consider data on the results of the Ukrainian External Independent Testing (EIT) in 2016 from the official site of the Ukrainian Center for Educational Quality Assessment. EIT exams are to be passed by high school graduates for admission to universities. Information on scores in Ukrainian language and literature (Ukr) and in Mathematics (Math) for nearly 246 000 examinees of EIT-2016 is available. In [5] a linear regression dependence between Ukr and Math is assumed. In this paper we consider the model
(19)
\[ \text{Ukr}=\frac{1}{1+\exp (-{\boldsymbol{\vartheta }_{0}^{(k)}}-{\boldsymbol{\vartheta }_{1}^{(k)}}\text{Math})}+\varepsilon ,\]
in which the coefficients ${\boldsymbol{\vartheta }_{0}^{(k)}}$ and ${\boldsymbol{\vartheta }_{1}^{(k)}}$ depend on the political attitudes of the adult environment in which the student was brought up. This can be a family of adherents of Ukrainian independence or an environment critical of the existence of the Ukrainian state and culture.
Fig. 1. Estimated logistic regression lines by the EIT-2016 data. Solid line for the 1st component, dashed line for the 2nd, dotted line for the 3rd one.
The EIT-2016 data do not contain information on political issues. But for each examinee the region of Ukraine where he/she graduated is recorded. So we used data on the results of the 2014 Ukrainian Parliament (Verhovna Rada) elections to get approximate proportions of adherents of different political choices in the regions of Ukraine (A-set). All possible electoral choices at these elections (voting for one of the parties, voting against all, or not taking part in the voting) were divided into three groups (components): (1) pro-Ukrainian, (2) contra-Ukrainian and (3) neutral (see [5] for details). The concentrations of the components are taken as the frequencies of adherents of the corresponding electoral choice in the region where the j-th examinee attended high school. The fitted regression lines are presented in Fig. 1. The dependence between Math and Ukr in this figure seems significantly different in the three components. Say, in the pro component it is increasing and seemingly nonlinear, in the contra component it is decreasing, and in the neutral one it is increasing and quite close to a linear dependence.
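Schematically, merging the I-set and the A-set amounts to attaching the regional electoral frequencies to every examinee as his or her concentration vector; the sketch below is purely illustrative and the file and column names are hypothetical.

```python
import pandas as pd

# hypothetical inputs: eit2016.csv with columns region, Ukr, Math (I-set);
# elections2014.csv with columns region, pro, contra, neutral (A-set, frequencies summing to 1)
eit = pd.read_csv("eit2016.csv")
elections = pd.read_csv("elections2014.csv")

merged = eit.merge(elections, on="region", how="left")
p = merged[["pro", "contra", "neutral"]].to_numpy()   # concentration matrix p_{;n}
y = merged["Ukr"].to_numpy()                          # response
x = merged["Math"].to_numpy()                         # regressor
```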
To verify the significance of these differences we constructed the confidence ellipsoids for the parameters as described in Section 4. By the Bonferroni rule, to infer with the significance level ${\alpha _{0}}=0.05$, we took the levels of the ellipsoids $\alpha ={\alpha _{0}}/3\approx 0.0167$. The obtained ellipsoids are presented in Fig. 2. Since they do not intersect, we conclude that the differences between the parameters are significant for all the components.
6 Conclusion
We have obtained conditions under which the JK estimator for the asymptotic covariance matrix is consistent for nonlinear GEE estimators in MVC models. The presented example of sociological data analysis demonstrates the possibilities of practical application of this estimator.
7 Proofs
We start from some auxiliary lemmas.
Lemma 1.
Assume that ${\boldsymbol{\Gamma }_{\infty }}$ exists and is nonsingular. Then for some constant ${C_{a}}<\infty $,
\[ |{a_{j;n}^{k}}|\le \frac{{C_{a}}}{n},\hspace{2.5pt}|{a_{j;n}^{k}}-{a_{j;-in}^{k}}|\le \frac{{C_{a}}}{{n^{2}}}\]
for all $k=1,\dots ,M$, $1\le i\ne j\le n$, $n=1,2,\dots \hspace{0.1667em}$.
For the proof, see [5], Lemma 1.
Let ${\psi _{j;n}}$, $j=1,\dots ,n$, $n=1,2,\dots \hspace{0.1667em}$, be any set of functions with domain ${\mathbb{R}^{D}}$ and values in a set of scalars, or vectors, or matrices. We use the following notation
\[ {\Psi _{n}^{(k)}}(\boldsymbol{\gamma })={\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{\psi _{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma }),\]
\[ {\Psi _{-in}^{(k)}}(\boldsymbol{\gamma })=\sum \limits_{j\ne i}{a_{j;-in}^{k}}{\psi _{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma }).\]
In this notation ψ can be replaced by any other symbol, e.g., $\tilde{\psi }$, s or $\dot{\mathbf{s}}$.
Lemma 2.
Assume that, for some function $H:{\mathbb{R}^{D}}\to \mathbb{R}$ and some $\alpha \ge 1$: (i) ${\sup _{\boldsymbol{\gamma }\in \Theta }}|{\psi _{j;n}}(\mathbf{x},\boldsymbol{\gamma })|\le H(\mathbf{x})$ for all $j=1,\dots ,n$, $n=1,2,\dots \hspace{0.1667em}$ and all $\mathbf{x}\in {\mathbb{R}^{D}}$; (ii) $\operatorname{\mathsf{E}}{(H({\boldsymbol{\xi }_{(k)}}))^{\alpha }}<\infty $ for all $k=1,\dots ,M$. Then
\[ \underset{j=1,\dots ,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\psi _{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })|={O_{P}}({n^{1/\alpha }}).\]
Proof.
Let
\[ {\tilde{\psi }_{j;n}}=\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\psi _{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })|.\]
Then
(20)
\[ {\mathbb{P}_{n}}(C)=\operatorname{\mathsf{P}}\{\underset{j=1,\dots ,n}{\max }{\tilde{\psi }_{j;n}}>C\}=1-{\prod \limits_{j=1}^{n}}(1-\operatorname{\mathsf{P}}\{{\tilde{\psi }_{j;n}}>C\}).\]
Put ${C_{n}}={C_{0}}{n^{1/\alpha }}$ for some ${C_{0}}>0$. Then
(21)
\[ {\tilde{p}_{n}}=\underset{j=1,\dots ,n}{\sup }\operatorname{\mathsf{P}}\{{\tilde{\psi }_{j;n}}>{C_{n}}\}=o(1/n).\]
Indeed,
\[ \operatorname{\mathsf{P}}\{{\tilde{\psi }_{j;n}}>{C_{n}}\}\le \operatorname{\mathsf{P}}\{H({\boldsymbol{\xi }_{j;n}})>{C_{n}}\}\]
\[ \le {\sum \limits_{k=1}^{M}}{p_{j;n}^{k}}\operatorname{\mathsf{P}}\{H({\boldsymbol{\xi }_{(k)}})>{C_{n}}\}\le {\sum \limits_{k=1}^{M}}\operatorname{\mathsf{P}}\{H({\boldsymbol{\xi }_{(k)}})>{C_{n}}\}.\]
So, to show (21) one needs only to observe that
(22)
\[ n\operatorname{\mathsf{P}}\{H({\boldsymbol{\xi }_{(k)}})>{C_{n}}\}=o(1)\hspace{2.5pt}\text{for all}\hspace{2.5pt}k=1,\dots ,M.\]
But
\[ n\operatorname{\mathsf{P}}\{H({\boldsymbol{\xi }_{(k)}})>{C_{n}}\}=n\operatorname{\mathsf{E}}\mathbf{1}\{H({\boldsymbol{\xi }_{(k)}})>{C_{n}}\}\]
\[ \le n\operatorname{\mathsf{E}}\mathbf{1}\{H({\boldsymbol{\xi }_{(k)}})>{C_{n}}\}\frac{H{({\boldsymbol{\xi }_{(k)}})^{\alpha }}}{{C_{n}^{\alpha }}}\le \frac{1}{{C_{0}^{\alpha }}}\operatorname{\mathsf{E}}H{({\boldsymbol{\xi }_{(k)}})^{\alpha }}\mathbf{1}\{H({\boldsymbol{\xi }_{(k)}})>{C_{n}}\}=o(1)\]
due to the assumption (ii) of the lemma. So (21) holds. From (20) and (21) we obtain
\[ {\mathbb{P}_{n}}({C_{0}}{n^{1/\alpha }})\le 1-{(1-{\tilde{p}_{n}})^{n}}\to 0\]
for any ${C_{0}}>0$. □
Lemma 3.
Let the following assumptions hold.
- (i) Condition ${\boldsymbol{\Psi }_{h,\rho }}$ holds for $\{{\psi _{j;n}},\hspace{2.5pt}j=1,\dots ,n,\hspace{2.5pt}n=1,2,\dots \hspace{0.1667em}\}$.
- (ii) For all $k=1,\dots ,M$, $\operatorname{\mathsf{E}}{(h({\boldsymbol{\xi }_{(k)}}))^{2}}<\infty $.
- (iii) ${\boldsymbol{\Gamma }_{\infty }}$ exists and $\det {\boldsymbol{\Gamma }_{\infty }}\ne 0$.
Then there exists a sequence of random variables ${\zeta _{n}}={O_{p}}(1)$, such that for all $n=1,2,\dots \hspace{0.1667em}$, all ${\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}}\in \Theta $ and all $i=1,\dots ,n$ the following inequalities hold:
(23)
\[ |{\Psi _{n}^{(k)}}({\boldsymbol{\gamma }_{1}})-{\Psi _{n}^{(k)}}({\boldsymbol{\gamma }_{2}})|\le {\zeta _{n}}\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}}),\]
(24)
\[ |{\Psi _{-in}^{(k)}}({\boldsymbol{\gamma }_{1}})-{\Psi _{-in}^{(k)}}({\boldsymbol{\gamma }_{2}})|\le {\zeta _{n}}\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}}).\]
Proof.
Let us start with (23). Observe that by Condition ${\boldsymbol{\Psi }_{h,\rho }}$ and Lemma 1,
\[ |{\Psi _{n}^{(k)}}({\boldsymbol{\gamma }_{1}})-{\Psi _{n}^{(k)}}({\boldsymbol{\gamma }_{2}})|\le {\sum \limits_{j=1}^{n}}|{a_{j;n}^{k}}|h({\boldsymbol{\xi }_{j;n}})\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}})\le \frac{{C_{a}}\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}})}{n}{\sum \limits_{j=1}^{n}}h({\boldsymbol{\xi }_{j;n}}).\]
By assumption (ii) of the lemma
\[ {A_{1}}=\underset{j,n}{\max }\operatorname{\mathsf{E}}h({\boldsymbol{\xi }_{j;n}})\le \underset{k=1,\dots ,M}{\max }\operatorname{\mathsf{E}}h({\boldsymbol{\xi }_{(k)}})<\infty \]
and ${A_{2}}=\operatorname{\mathsf{E}}{(h({\boldsymbol{\xi }_{j;n}}))^{2}}<\infty $. So, for any $\lambda >{A_{1}}$,
\[ \operatorname{\mathsf{P}}\left\{\frac{1}{n}{\sum \limits_{j=1}^{n}}h({\xi _{j;n}})>\lambda \right\}\le \frac{\operatorname{\mathsf{Var}}\frac{1}{n}{\textstyle\textstyle\sum _{j=1}^{n}}h({\xi _{j;n}})}{{\left(\lambda -\operatorname{\mathsf{E}}\frac{1}{n}{\textstyle\textstyle\sum _{j=1}^{n}}h({\xi _{j;n}})\right)^{2}}}\le \frac{{A_{2}}}{n{(\lambda -{A_{1}})^{2}}}\to 0\]
as $n\to \infty $. So (23) holds with ${\zeta _{n}}=\frac{{C_{a}}}{n}{\textstyle\sum _{j=1}^{n}}h({\boldsymbol{\xi }_{j;n}})$.
To show (24) observe that
\[ |{\Psi _{-in}^{(k)}}({\boldsymbol{\gamma }_{1}})-{\Psi _{-in}^{(k)}}({\boldsymbol{\gamma }_{2}})|\le \sum \limits_{j\ne i}|{a_{j;-in}^{k}}|h({\boldsymbol{\xi }_{j;n}})\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}})\le \frac{{C^{\prime }_{a}}\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}})}{n}{\sum \limits_{j=1}^{n}}h({\boldsymbol{\xi }_{j;n}})\]
and
\[ |{a_{j;-in}^{k}}|\le |{a_{j;n}^{k}}|+|{a_{j;n}^{k}}-{a_{j;-in}^{k}}|\le \frac{{C^{\prime }_{a}}}{n}\]
for some ${C^{\prime }_{a}}<\infty $ due to Lemma 1. The rest of the proof is the same as for (23). □
Lemma 4.
Let the following assumptions hold.
- (i) Θ is a compact set in ${\mathbb{R}^{d}}$.
- (ii) Condition ${\boldsymbol{\Psi }_{h,\rho }}$ holds for $\{{\psi _{j;n}},\hspace{2.5pt}j=1,\dots ,n,\hspace{2.5pt}n=1,2,\dots \hspace{0.1667em}\}$.
- (iii) ρ is a continuous function on $\Theta \times \Theta $ and $\rho (\boldsymbol{\gamma },\boldsymbol{\gamma })=0$ for all $\boldsymbol{\gamma }\in \Theta $.
- (iv) For all $k=1,\dots ,M$, $\operatorname{\mathsf{E}}{(h({\boldsymbol{\xi }_{(k)}}))^{2}}<\infty $, ${\max _{j,n}}\operatorname{\mathsf{E}}|{\psi _{j;n}}({\boldsymbol{\xi }_{(k)}},{\boldsymbol{\vartheta }^{(k)}}){|^{2}}<\infty $.
- (v) $\det {\boldsymbol{\Gamma }_{\infty }}\ne 0$.
Then
(25)
\[ \underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\Psi _{n}^{(k)}}(\boldsymbol{\gamma })-\operatorname{\mathsf{E}}{\Psi _{n}^{(k)}}(\boldsymbol{\gamma })|\stackrel{\text{P}}{\longrightarrow }0,\]
(26)
\[ \underset{i=1,\dots ,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\Psi _{-in}^{(k)}}(\boldsymbol{\gamma })-\operatorname{\mathsf{E}}{\Psi _{-in}^{(k)}}(\boldsymbol{\gamma })|\stackrel{\text{P}}{\longrightarrow }0\]
as $n\to \infty $.
Proof.
Let ${\bar{\psi }_{j;n}}(\boldsymbol{\gamma })=\operatorname{\mathsf{E}}{\psi _{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })$,
\[ {\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })={\psi _{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })-{\bar{\psi }_{j;n}}(\boldsymbol{\gamma }).\]
To prove the lemma it is sufficient to show that
(29)
\[ \underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })|\stackrel{\text{P}}{\longrightarrow }0\]
and
(30)
\[ \underset{i=1,\dots ,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\Psi }_{-in}^{(k)}}(\boldsymbol{\gamma })|\stackrel{\text{P}}{\longrightarrow }0\]
as $n\to \infty $.
Then ${\tilde{\psi }_{j;n}}$ satisfy Condition ${\boldsymbol{\Psi }_{\tilde{h},\rho }}$ with $\tilde{h}(\mathbf{x})=h(\mathbf{x})+{C_{\Psi }}$, where ${C_{\Psi }}$ is some constant, e.g.,
\[ {C_{\Psi }}=\underset{k=1,\dots ,M}{\max }\operatorname{\mathsf{E}}h({\boldsymbol{\xi }_{(k)}})<\infty .\]
By the assumption (iv),
(27)
\[ \operatorname{\mathsf{E}}{(\tilde{h}({\boldsymbol{\xi }_{(k)}}))^{2}}<\infty ,\hspace{2.5pt}\text{for all}\hspace{2.5pt}k=1,\dots ,M,\]
and
(28)
\[ \underset{j,n}{\max }\operatorname{\mathsf{E}}|{\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{(k)}},{\boldsymbol{\vartheta }^{(k)}}){|^{2}}<\infty ,\hspace{2.5pt}\text{for all}\hspace{2.5pt}k=1,\dots ,M.\]
Let us show (29). Consider the case when ${\psi _{j;n}}$ are scalar-valued. Then
(31)
\[ \operatorname{\mathsf{E}}{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })={\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}\operatorname{\mathsf{E}}{\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })=0\]
and
(32)
\[ \operatorname{\mathsf{Var}}{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })={\sum \limits_{j=1}^{n}}{({a_{j;n}^{k}})^{2}}\operatorname{\mathsf{Var}}{\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })\le \frac{{C_{a}^{2}}}{n}\underset{j,n}{\max }\operatorname{\mathsf{Var}}{\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma }).\]
Assumptions (iii) and (iv) with inequalities (27) and (28) imply
\[ \underset{j,n}{\max }\operatorname{\mathsf{Var}}{\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })<\infty \]
for any $\boldsymbol{\gamma }\in \Theta $. So, by (31) and (32), we get
(33)
\[ {\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })\stackrel{\text{P}}{\longrightarrow }0\hspace{2.5pt}\text{as}\hspace{2.5pt}n\to \infty .\]
It is obvious that if ${\psi _{j;n}}$ are vector- or matrix-valued, then (33) holds coordinatewise.
Applying Lemma 3 to ${\tilde{\psi }_{j;n}}$ one obtains
(34)
\[ |{\tilde{\Psi }_{n}^{(k)}}({\boldsymbol{\gamma }_{1}})-{\tilde{\Psi }_{n}^{(k)}}({\boldsymbol{\gamma }_{2}})|\le {\zeta _{n}}\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}})\hspace{2.5pt}\text{for all}\hspace{2.5pt}{\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}}\in \Theta \]
with some ${\zeta _{n}}={O_{P}}(1)$. To prove (29) we have to show that for any $\delta >0$ and $\varepsilon >0$ there exists such ${n_{0}}$ that for all $n>{n_{0}}$
(35)
\[ \operatorname{\mathsf{P}}\{\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })|>\delta \}<\varepsilon .\]
Fix ε and δ. Choose a nonrandom ${C_{\zeta }}<\infty $ such that $\operatorname{\mathsf{P}}\{{\zeta _{n}}>{C_{\zeta }}\}\le \varepsilon /2$ for all n.
Since Θ is compact, by assumption (iii) of the lemma there exists a finite set $T=\{{\mathbf{t}_{1}},\dots ,{\mathbf{t}_{L}}\}\subset \Theta $ such that for each $\boldsymbol{\gamma }\in \Theta $ one can choose $l(\boldsymbol{\gamma })\in \{1,\dots ,L\}$ with
(36)
\[ \rho (\boldsymbol{\gamma },{\mathbf{t}_{l(\boldsymbol{\gamma })}})<\frac{\delta }{2{C_{\zeta }}}.\]
Note that
\[ |{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })|\le |{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })-{\tilde{\Psi }_{n}^{(k)}}({\mathbf{t}_{l(\boldsymbol{\gamma })}})|+|{\tilde{\Psi }_{n}^{(k)}}({\mathbf{t}_{l(\boldsymbol{\gamma })}})|,\]
so
\[ \underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })|\le \underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })-{\tilde{\Psi }_{n}^{(k)}}({\mathbf{t}_{l(\boldsymbol{\gamma })}})|+\underset{l=1,\dots ,L}{\max }|{\tilde{\Psi }_{n}^{(k)}}({\mathbf{t}_{l}})|\]
\[ \le {\zeta _{n}}\underset{\boldsymbol{\gamma }\in \Theta }{\sup }\rho (\boldsymbol{\gamma },{\mathbf{t}_{l(\boldsymbol{\gamma })}})+\underset{l=1,\dots ,L}{\max }|{\tilde{\Psi }_{n}^{(k)}}({\mathbf{t}_{l}})|.\]
Therefore
(37)
\[ \operatorname{\mathsf{P}}\{\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })|>\delta \}\le \operatorname{\mathsf{P}}\{{\zeta _{n}}\underset{\boldsymbol{\gamma }\in \Theta }{\sup }\rho (\boldsymbol{\gamma },{\mathbf{t}_{l(\boldsymbol{\gamma })}})>\delta /2\}+\operatorname{\mathsf{P}}\{\underset{l=1,\dots ,L}{\max }|{\tilde{\Psi }_{n}^{(k)}}({\mathbf{t}_{l}})|>\delta /2\}.\]
The second term in the RHS of (37) tends to 0 as $n\to \infty $ due to (33). So it is less than $\varepsilon /2$ for n large enough. By (36) the first term is not greater than $\operatorname{\mathsf{P}}\{{\zeta _{n}}>{C_{\zeta }}\}\le \varepsilon /2$.
Let us show (26). Observe that
\[ {\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })-{\tilde{\Psi }_{-in}^{(k)}}(\boldsymbol{\gamma })={a_{i;n}^{k}}{\tilde{\psi }_{i}}({\boldsymbol{\xi }_{i,n}},\boldsymbol{\gamma })+\sum \limits_{j\ne i}({a_{j;n}^{k}}-{a_{j;-in}^{k}}){\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma }).\]
To estimate ${\max _{j,n}}{\sup _{\boldsymbol{\gamma }\in \Theta }}|{\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })|$ we apply Lemma 2 with $H(x)=|x|$. Assumption (iv) of Lemma 4 implies that the assertion of Lemma 2 holds with $\alpha =2$. So
\[ \underset{j,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })|={O_{P}}({n^{1/2}}).\]
From Lemma 1 we get
\[ |{a_{i;n}^{k}}|+\sum \limits_{j\ne i}|{a_{j;n}^{k}}-{a_{j;-in}^{k}}|\le \frac{{C_{a}}}{n}+\frac{{C_{a}}(n-1)}{{n^{2}}}\le \frac{2{C_{a}}}{n},\]
so
\[ \underset{i=1,\dots ,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })-{\tilde{\Psi }_{-in}^{(k)}}(\boldsymbol{\gamma })|={O_{P}}({n^{-1/2}}).\]
This with (25) implies (26). □
Proof of Theorem 1.
1. We will show that ${\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}\stackrel{\text{P}}{\longrightarrow }{\boldsymbol{\vartheta }^{(k)}}$ as $n\to \infty $.
Let ${\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })=\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma })$. Assumptions (C3) and (C4) imply that ${\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })$ is continuous in $\boldsymbol{\gamma }$ on Θ. By (C5), $|{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })|>0$ for all $\boldsymbol{\gamma }\ne {\boldsymbol{\vartheta }^{(k)}}$.
Fix any $\varepsilon >0$ and consider ${\mathcal{N}_{\varepsilon }}=\{\boldsymbol{\gamma }\in \Theta :\hspace{2.5pt}|\boldsymbol{\gamma }-{\boldsymbol{\vartheta }^{(k)}}|\ge \varepsilon \}$. Then
\[ {s_{\text{min}}}=\underset{\boldsymbol{\gamma }\in {\mathcal{N}_{\varepsilon }}}{\inf }|{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })|>0.\]
For $\boldsymbol{\gamma }\in {\mathcal{N}_{\varepsilon }}$,
\[ |{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })|\ge |{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })|-|{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })-{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })|\]
\[ \ge {s_{\text{min}}}-\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })-{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })|.\]
So
\[ \operatorname{\mathsf{P}}\left\{\underset{\boldsymbol{\gamma }\in {\mathcal{N}_{\varepsilon }}}{\inf }|{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })|\le \frac{{s_{\text{min}}}}{2}\right\}\le \operatorname{\mathsf{P}}\left\{\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })-{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })|\ge \frac{{s_{\text{min}}}}{2}\right\}.\]
Applying Lemma 4 with ${\psi _{j;n}}=\mathbf{s}$ one obtains that
(38)
\[ \underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })-{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })|\stackrel{\text{P}}{\longrightarrow }0\]
as $n\to \infty $. Therefore
\[ \operatorname{\mathsf{P}}\left\{\underset{\boldsymbol{\gamma }\in {\mathcal{N}_{\varepsilon }}}{\inf }|{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })|\le \frac{{s_{\text{min}}}}{2}\right\}\to 0.\]
Since ${\mathbf{S}_{n}^{(k)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(k)}})=0$, this implies $\operatorname{\mathsf{P}}\{{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}\in {\mathcal{N}_{\varepsilon }}\}\to 0$ as $n\to \infty $, i.e. ${\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}\stackrel{\text{P}}{\longrightarrow }{\boldsymbol{\vartheta }^{(k)}}$.
2. Let us show (15).
By Lemma 4,
(39)
\[ \underset{i=1,\dots ,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\mathbf{S}_{-in}^{(k)}}(\boldsymbol{\gamma })-{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })|\stackrel{\text{P}}{\longrightarrow }0.\]
Then
\[ \operatorname{\mathsf{P}}\left\{\underset{i=1,\dots ,n}{\min }\underset{\boldsymbol{\gamma }\in {\mathcal{N}_{\varepsilon }}}{\inf }|{\mathbf{S}_{-in}^{(k)}}(\boldsymbol{\gamma })|\le \frac{{s_{\text{min}}}}{2}\right\}\]
\[ \le \operatorname{\mathsf{P}}\left\{\underset{i=1,\dots ,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })-{\mathbf{S}_{-in}^{(k)}}(\boldsymbol{\gamma })|\ge \frac{{s_{\text{min}}}}{2}\right\}\to 0.\]
From this we obtain (15) in the same way as in the first part of the proof. □
Proof of Theorem 4.
Let
\[ {\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })=\mathbf{s}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })-\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma }).\]
Then, due to (6),
\[ {\sum \limits_{j=1}^{n}}{a_{j;n}^{l}}{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })={\sum \limits_{j=1}^{n}}{a_{j;n}^{l}}\mathbf{s}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })-{\sum \limits_{m=1}^{M}}{\sum \limits_{j=1}^{n}}{a_{j;n}^{l}}{p_{j;n}^{m}}\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(m)}},\boldsymbol{\gamma })\]
\[ ={\sum \limits_{j=1}^{n}}{a_{j;n}^{l}}\mathbf{s}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })={\mathbf{S}_{n}^{(l)}}(\boldsymbol{\gamma }).\]
So
(40)
\[ {\mathbf{S}_{n}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})={\sum \limits_{j=1}^{n}}{a_{j;n}^{l}}{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})=0.\]
Similarly,
(41)
\[ {\mathbf{S}_{-in}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}})=\sum \limits_{j\ne i}{a_{j;-in}^{l}}{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}})=0.\]
By the Mean Value theorem, there exists ${t_{ni}^{l}}\in [0,1]$ such that
(42)
\[ -{\mathbf{S}_{-in}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})={\mathbf{S}_{-in}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}})-{\mathbf{S}_{-in}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})={\dot{\mathbf{S}}_{-in}^{(l)}}({\boldsymbol{\zeta }_{-in}^{(l)}})({\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}}-{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}),\]
where
\[ {\boldsymbol{\zeta }_{-in}^{(l)}}=(1-{t_{ni}^{l}}){\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}}+{t_{ni}^{l}}{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}.\]
Observe that, by Assumption (JK1),
\[ |{\dot{\mathbf{s}}_{j;n}}(\mathbf{x},{\boldsymbol{\gamma }_{1}})-{\dot{\mathbf{s}}_{j;n}}(\mathbf{x},{\boldsymbol{\gamma }_{2}})|\le 2h(\mathbf{x})|{\boldsymbol{\gamma }_{1}}-{\boldsymbol{\gamma }_{2}}|.\]
Applying Lemma 4 with ${\psi _{j;n}}={\dot{\mathbf{s}}_{j;n}^{(l)}}$, $\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}})=|{\boldsymbol{\gamma }_{1}}-{\boldsymbol{\gamma }_{2}}|$, we obtain
(43)
\[ \underset{i=1,\dots ,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\dot{\mathbf{S}}_{-in}^{(l)}}(\boldsymbol{\gamma })-{\mathbf{M}^{(l)}}(\boldsymbol{\gamma })|\stackrel{\text{P}}{\longrightarrow }0\]
as $n\to \infty $.
By Assumption (JK1), applying the Lebesgue Dominated Convergence theorem, we obtain ${\mathbf{M}^{(l)}}(\boldsymbol{\gamma })\to {\mathbf{M}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})$ as $\boldsymbol{\gamma }\to {\boldsymbol{\vartheta }^{(l)}}$. So, by Assumptions (JK2) and (JK3), we obtain
\[ \underset{i=1,\dots ,n}{\max }|{\dot{\mathbf{S}}_{-in}^{(l)}}({\boldsymbol{\zeta }_{-in}^{(l)}})-{\mathbf{M}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})|\stackrel{\text{P}}{\longrightarrow }0.\]
By (AN5) ${\mathbf{M}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})={\mathbf{M}^{(l)}}$ is nonsingular, so
\[ \operatorname{\mathsf{P}}\{\det {\dot{\mathbf{S}}_{-in}^{(l)}}({\boldsymbol{\zeta }_{-in}^{(l)}})\ne 0,\forall i=1,\dots ,n\}\to 1\]
and
(44)
\[ {\Lambda _{n}^{l}}=\underset{i=1,\dots ,n}{\max }|{({\dot{\mathbf{S}}_{-in}^{(l)}}({\boldsymbol{\zeta }_{-in}^{(l)}}))^{-1}}|={O_{p}}(1).\]
So, with probability which tends to 1 as $n\to \infty $,
(45)
\[ \begin{aligned}{}|{\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}}-{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}|& =|{({\dot{\mathbf{S}}_{-in}^{(l)}}({\boldsymbol{\zeta }_{-in}^{(l)}}))^{-1}}(-{\mathbf{S}_{-in}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}))|\le {\Lambda _{n}^{l}}|{\mathbf{S}_{-in}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})|\\ {} & ={\Lambda _{n}^{l}}|{\mathbf{S}_{-in}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})+{\dot{\mathbf{S}}_{-in}^{(l)}}({\tilde{\boldsymbol{\zeta }}_{-in}^{(l)}})({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}-{\boldsymbol{\vartheta }^{(l)}})|,\end{aligned}\]
where ${\tilde{\boldsymbol{\zeta }}_{-in}^{(l)}}$ are some intermediate points between ${\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}$ and ${\boldsymbol{\vartheta }^{(l)}}$. From (43) and Assumption (JK2) we obtain
(46)
\[ {\dot{\mathbf{S}}_{-in}^{(l)}}({\tilde{\boldsymbol{\zeta }}_{-in}^{(l)}})({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}-{\boldsymbol{\vartheta }^{(l)}})={O_{P}}({n^{-1/2}}).\]
Then
\[ {\mathbf{S}_{-in}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})={\mathbf{S}_{n}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})-{a_{i;n}^{l}}{\mathbf{s}_{i;n}}({\boldsymbol{\xi }_{i;n}},{\boldsymbol{\vartheta }^{(l)}})-\sum \limits_{j\ne i}({a_{j;n}^{l}}-{a_{j;-in}^{l}}){\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(l)}}).\]
Observe that $\operatorname{\mathsf{E}}{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(l)}})=0$ and
\[ \operatorname{\mathsf{E}}|{\mathbf{S}_{n}^{(l)}}({\boldsymbol{\vartheta }^{(l)}}){|^{2}}={\sum \limits_{j=1}^{n}}{({a_{j;n}^{l}})^{2}}\operatorname{\mathsf{E}}|{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(l)}}){|^{2}}=O({n^{-1}})\]
due to Lemma 1. So ${\mathbf{S}_{n}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})={O_{P}}({n^{-1/2}})$.
By Lemmas 1, 2 and Assumption (JK1),
(47)
\[ \underset{i=1,\dots ,n}{\max }|{a_{i;n}^{l}}{\mathbf{s}_{i;n}}({\boldsymbol{\xi }_{i;n}},{\boldsymbol{\vartheta }^{(l)}})+\sum \limits_{j\ne i}({a_{j;n}^{l}}-{a_{j;-in}^{l}}){\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(l)}})|={O_{P}}({n^{\beta -1}})\]
for any $\beta \ge 1/\alpha $. With $\beta =1/2$ we get
\[ \underset{i=1,\dots ,n}{\max }|{\mathbf{S}_{-in}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})|={O_{P}}({n^{-1/2}}).\]
This with (44)–(46) yields
(48)
\[ \underset{i=1,\dots ,n}{\max }|{\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}}-{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}|={O_{P}}({n^{-1/2}}).\]
Let ${\mathbf{M}_{-in}^{(l)}}={\dot{\mathbf{S}}_{-in}^{(l)}}({\boldsymbol{\zeta }_{-in}^{(l)}})$. By (40)–(42), we obtain
(49)
\[ \begin{aligned}{}{\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}}-{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}& ={({\mathbf{M}_{-in}^{(l)}})^{-1}}({\mathbf{S}_{n}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})-{\mathbf{S}_{-in}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}))\\ {} & ={({\mathbf{M}_{-in}^{(l)}})^{-1}}\left({a_{i;n}^{l}}{\mathbf{s}_{i;n}}({\boldsymbol{\xi }_{i;n}},{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})+\sum \limits_{j\ne i}({a_{j;n}^{l}}-{a_{j;-in}^{l}}){\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})\right).\end{aligned}\]
Put
(50)
\[ {\mathbf{U}_{i}^{(l)}}={a_{i;n}^{l}}{\mathbf{s}_{i;n}}({\boldsymbol{\xi }_{i;n}},{\boldsymbol{\vartheta }^{(l)}})+\sum \limits_{j\ne i}({a_{j;n}^{l}}-{a_{j;-in}^{l}}){\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(l)}}),\]
(51)
\[ \begin{aligned}{}{\Delta _{i}^{(l)U}}=& {a_{i;n}^{l}}({\mathbf{s}_{i;n}}({\boldsymbol{\xi }_{i;n}},{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})-{\mathbf{s}_{i;n}}({\boldsymbol{\xi }_{i;n}},{\boldsymbol{\vartheta }^{(l)}}))\\ {} & +\sum \limits_{j\ne i}({a_{j;n}^{l}}-{a_{j;-in}^{l}})({\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})-{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(l)}})),\end{aligned}\]
(52)
\[ {\Delta _{i}^{(l)M}}={({\mathbf{M}_{-in}^{(l)}})^{-1}}-{({\mathbf{M}^{(l)}})^{-1}}.\]
Then, by (49),
\[ {\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}}-{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}=({({\mathbf{M}^{(l)}})^{-1}}+{\Delta _{i}^{(l)M}})({\mathbf{U}_{i}^{(l)}}+{\Delta _{i}^{(l)U}}).\]
So
(53)
\[ {\hat{\mathbf{V}}_{n}^{(k,m)}}=n{\sum \limits_{i=1}^{n}}({\hat{\boldsymbol{\vartheta }}_{-in}^{(k)}}-{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}){({\hat{\boldsymbol{\vartheta }}_{-in}^{(m)}}-{\hat{\boldsymbol{\vartheta }}_{n}^{(m)}})^{T}}=n{\sum \limits_{i=1}^{n}}{\mathbf{v}_{i}},\]
where
(54)
\[ {\mathbf{v}_{i}}=({({\mathbf{M}^{(k)}})^{-1}}+{\Delta _{i}^{(k)M}})({\mathbf{U}_{i}^{(k)}}+{\Delta _{i}^{(k)U}}){({\mathbf{U}_{i}^{(m)}}+{\Delta _{i}^{(m)U}})^{T}}{({({\mathbf{M}^{(m)}})^{-1}}+{\Delta _{i}^{(m)M}})^{T}}.\]
Consider
(55)
\[ {\tilde{\mathbf{V}}_{n}^{(k,m)}}=n{\sum \limits_{i=1}^{n}}{\tilde{\mathbf{v}}_{i}},\hspace{2.5pt}{\tilde{\mathbf{v}}_{i}}={({\mathbf{M}^{(k)}})^{-1}}{\mathbf{U}_{i}^{(k)}}{({\mathbf{U}_{i}^{(m)}})^{T}}{({\mathbf{M}^{(m)}})^{-T}}.\]
We will show that
(56)
\[ |{\hat{\mathbf{V}}_{n}^{(k,m)}}-{\tilde{\mathbf{V}}_{n}^{(k,m)}}|\stackrel{\text{P}}{\longrightarrow }0,\hspace{2.5pt}\text{as}\hspace{2.5pt}n\to \infty ,\]
and
(57)
\[ {\tilde{\mathbf{V}}_{n}^{(k,m)}}\stackrel{\text{P}}{\longrightarrow }{\mathbf{V}^{(k,m)}},\hspace{2.5pt}\text{as}\hspace{2.5pt}n\to \infty .\]
Convergences (56) and (57) for all $k,m=1,\dots ,M$ imply the statement of the theorem.
To show (56) consider the following expansion
\[ {\mathbf{v}_{i}}-{\tilde{\mathbf{v}}_{i}}={\mathbf{v}_{i}^{1}}+{\mathbf{v}_{i}^{2}}+{\mathbf{v}_{i}^{3}}+{\mathbf{v}_{i}^{4}},\]
where
(58)
\[ \begin{aligned}{}{\mathbf{v}_{i}^{1}}& ={\Delta _{i}^{(k)M}}({\mathbf{U}_{i}^{(k)}}+{\Delta _{i}^{(k)U}}){({\mathbf{U}_{i}^{(m)}}+{\Delta _{i}^{(m)U}})^{T}}{({({\mathbf{M}^{(m)}})^{-1}}+{\Delta _{i}^{(m)M}})^{T}},\\ {} {\mathbf{v}_{i}^{2}}& ={({\mathbf{M}^{(k)}})^{-1}}{\Delta _{i}^{(k)U}}{({\mathbf{U}_{i}^{(m)}}+{\Delta _{i}^{(m)U}})^{T}}{({({\mathbf{M}^{(m)}})^{-1}}+{\Delta _{i}^{(m)M}})^{T}},\\ {} {\mathbf{v}_{i}^{3}}& ={({\mathbf{M}^{(k)}})^{-1}}{\mathbf{U}_{i}^{(k)}}{({\Delta _{i}^{(m)U}})^{T}}{({({\mathbf{M}^{(m)}})^{-1}}+{\Delta _{i}^{(m)M}})^{T}},\\ {} {\mathbf{v}_{i}^{4}}& ={({\mathbf{M}^{(k)}})^{-1}}{\mathbf{U}_{i}^{(k)}}{({\mathbf{U}_{i}^{(m)}})^{T}}{({\Delta _{i}^{(m)M}})^{T}}.\end{aligned}\]
Let us estimate each ${\mathbf{v}_{i}^{l}}$ separately. At first we bound ${\Delta _{i}^{(k)M}}$.
Applying Lemma 4 to ${\psi _{j;n}}(\mathbf{x},\boldsymbol{\gamma })=\frac{\partial }{\partial {\gamma ^{l}}}{\dot{\mathbf{s}}_{j;n}}(\mathbf{x},\boldsymbol{\gamma })$, $l=1,\dots ,d$, in the same way as for ${\mathbf{S}_{-in}^{(k)}}$ one obtains
(59)
\[ \underset{i=1,\dots ,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }\left|\frac{\partial }{\partial {\gamma ^{l}}}{\dot{\mathbf{S}}_{-in}^{(k)}}(\boldsymbol{\gamma })\right|={O_{p}}(1).\]
Then, by the Mean Value theorem and Assumption (JK2), we get
(60)
\[ \underset{i=1,\dots ,n}{\max }|{\dot{\mathbf{S}}_{-in}^{(k)}}({\boldsymbol{\zeta }_{-in}^{(k)}})-{\dot{\mathbf{S}}_{-in}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})|={O_{P}}(1){O_{P}}({n^{-1/2}})={O_{P}}({n^{-1/2}}).\]
Applying Lemmas 1 and 2 in the same way as in (47) to ${\dot{\mathbf{s}}_{j;n}^{(l)}}$ we obtain
(61)
\[ \underset{i=1,\dots ,n}{\max }|{\dot{\mathbf{S}}_{-in}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})-{\dot{\mathbf{S}}_{n}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})|={O_{P}}({n^{\beta -1}}).\]
Variances of each entry of ${\dot{\mathbf{S}}_{n}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})$ can be estimated as $O({n^{-1}})$, so
(62)
\[ |{\dot{\mathbf{S}}_{n}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})-{\mathbf{M}^{(k)}}|={O_{P}}({n^{-1/2}}).\]
Since ${\mathbf{M}_{-in}^{(k)}}={\dot{\mathbf{S}}_{-in}^{(k)}}({\boldsymbol{\zeta }_{-in}^{(k)}})$, formulas (60)–(62) yield
(63)
\[ \underset{i=1,\dots ,n}{\max }|{\mathbf{M}_{-in}^{(k)}}-{\mathbf{M}^{(k)}}|={O_{P}}({n^{-1/2}}).\]
So, due to Assumption (AN5),
(64)
\[ \underset{i=1,\dots ,n}{\max }|{\Delta _{i}^{(k)M}}|=\underset{i=1,\dots ,n}{\max }|{({\mathbf{M}_{-in}^{(k)}})^{-1}}-{({\mathbf{M}^{(k)}})^{-1}}|={O_{P}}({n^{-1/2}}).\]
Let us bound ${\Delta _{i}^{(k)U}}$. By Lemmas 1 and 2 and Assumption (JK2),
(65)
\[ \begin{aligned}{}\underset{i=1,\dots ,n}{\max }|{\Delta _{i}^{(k)U}}|\le & \underset{i=1,\dots ,n}{\max }(|{a_{i;n}^{k}}|\cdot |{\dot{\mathbf{s}}_{i;n}}({\boldsymbol{\zeta }_{i}})|\cdot |{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}-{\boldsymbol{\vartheta }^{(k)}}|\\ {} & +\sum \limits_{j\ne i}|{a_{j;n}^{k}}-{a_{j;-in}^{k}}|\cdot |{\dot{\mathbf{s}}_{j;n}}({\boldsymbol{\zeta }_{j}})|\cdot |{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}-{\boldsymbol{\vartheta }^{(k)}}|)\end{aligned}\]
(here ${\boldsymbol{\zeta }_{j}}$ are some intermediate points between ${\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}$ and ${\boldsymbol{\vartheta }^{(k)}}$). By Lemma 2, ${\max _{j=1,\dots ,n}}{\sup _{\boldsymbol{\gamma }\in \Theta }}|{\dot{\mathbf{s}}_{j;n}}(\boldsymbol{\gamma })|={O_{P}}({n^{\beta }})$. So (65), Lemma 1 and (JK2) imply
(66)
\[ \underset{i=1,\dots ,n}{\max }|{\Delta _{i}^{(k)U}}|={O_{P}}({n^{\beta -3/2}}).\]
Now we bound ${\mathbf{v}_{i}^{l}}$ defined by (58).
By (63), (64) and (66),
\[\begin{aligned}{}\underset{i=1,\dots ,n}{\max }|{\mathbf{v}_{i}^{1}}|& \le \underset{i=1,\dots ,n}{\max }|{\Delta _{i}^{(k)M}}|\cdot (|{\mathbf{U}_{i}^{(k)}}|+|{\Delta _{i}^{(k)U}}|)\\ {} & \cdot (|{\mathbf{U}_{i}^{(m)}}|+|{\Delta _{i}^{(m)U}}|)(|{({\mathbf{M}^{(m)}})^{-1}}|+|{\Delta _{i}^{(m)M}}|)\\ {} & ={O_{P}}({n^{-1/2}}){O_{P}}{({n^{\beta -1}})^{2}}{O_{P}}(1)={O_{P}}({n^{2\beta -5/2}}).\end{aligned}\]
Similarly,
\[ \underset{i=1,\dots ,n}{\max }|{\mathbf{v}_{i}^{4}}|={O_{P}}({n^{2\beta -5/2}}).\]
For ${\mathbf{v}_{i}^{2}}$ (and, similarly, ${\mathbf{v}_{i}^{3}}$), we have
\[\begin{aligned}{}\underset{i=1,\dots ,n}{\max }|{\mathbf{v}_{i}^{2}}|& \le \underset{i=1,\dots ,n}{\max }|{({\mathbf{M}^{(k)}})^{-1}}|\cdot |{\Delta _{i}^{(k)U}}|\\ {} & \cdot (|{\mathbf{U}_{i}^{(m)}}|+|{\Delta _{i}^{(m)U}}|)(|{({\mathbf{M}^{(m)}})^{-1}}|+|{\Delta _{i}^{(m)M}}|)\\ {} & ={O_{P}}({n^{\beta -3/2}}){O_{P}}({n^{\beta -1}}){O_{P}}(1)={O_{P}}({n^{2\beta -5/2}}).\end{aligned}\]
Therefore
\[ |{\hat{\mathbf{V}}_{n}^{(k,m)}}-{\tilde{\mathbf{V}}_{n}^{(k,m)}}|\le n{\sum \limits_{i=1}^{n}}|{\mathbf{v}_{i}}-{\tilde{\mathbf{v}}_{i}}|\le {n^{2}}\underset{i=1,\dots ,n}{\max }{\sum \limits_{l=1}^{4}}|{\mathbf{v}_{i}^{l}}|={O_{P}}({n^{2\beta -1/2}})={o_{P}}(1)\]
for $1/\alpha \le \beta <1/4$. (Recall that we can take any $\beta \ge 1/\alpha $ and $\alpha >4$.)
Consider
\[ {\hat{\mathbf{Z}}_{n}^{(k,m)}}=n{\sum \limits_{i=1}^{n}}{\mathbf{U}_{i}^{(k)}}{({\mathbf{U}_{i}^{(m)}})^{T}}.\]
Then $\operatorname{\mathsf{E}}{\hat{\mathbf{Z}}_{n}^{(k,m)}}={\bar{\mathbf{Z}}_{1,n}}+{\bar{\mathbf{Z}}_{2,n}}$, where
\[ {\bar{\mathbf{Z}}_{1,n}}=n{\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{a_{j;n}^{m}}\operatorname{\mathsf{E}}{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(k)}}){\mathbf{s}_{j;n}}{({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(m)}})^{T}},\]
\[ {\bar{\mathbf{Z}}_{2,n}}=n{\sum \limits_{i=1}^{n}}\sum \limits_{j\ne i}({a_{j;n}^{k}}-{a_{j;-in}^{k}})({a_{j;n}^{m}}-{a_{j;-in}^{m}})\operatorname{\mathsf{E}}{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(k)}}){\mathbf{s}_{j;n}}{({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(m)}})^{T}}.\]
By Lemma 1,
\[ \underset{i=1,\dots ,n}{\max }|\sum \limits_{j\ne i}({a_{j;n}^{k}}-{a_{j;-in}^{k}})({a_{j;n}^{m}}-{a_{j;-in}^{m}})\operatorname{\mathsf{E}}{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(k)}}){\mathbf{s}_{j;n}}{({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(m)}})^{T}}|=O({n^{-3}}),\]
so ${\bar{\mathbf{Z}}_{2,n}}={n^{2}}O({n^{-3}})=O({n^{-1}})$. This implies
(67)
\[ \operatorname{\mathsf{E}}{\hat{\mathbf{Z}}_{n}^{(k,m)}}\sim {\bar{\mathbf{Z}}_{1,n}}\to {\mathbf{Z}^{(k,m)}}.\]
Let us bound
(68)
\[ \operatorname{\mathsf{E}}|{\hat{\mathbf{Z}}_{n}^{(k,m)}}-\operatorname{\mathsf{E}}{\hat{\mathbf{Z}}_{n}^{(k,m)}}{|^{2}}\le {\sum \limits_{p,q=1}^{d}}\operatorname{\mathsf{Var}}({\hat{Z}_{n}^{pq(k,m)}})\]
(here and below ${\hat{Z}_{n}^{pq(k,m)}}$ is the $(p,q)$ entry of the matrix ${\hat{\mathbf{Z}}_{n}^{(k,m)}}$, ${U_{i}^{p(k)}}$ is the p-th entry of the vector ${\mathbf{U}_{i}^{(k)}}$).
Consider
(69)
\[ \begin{aligned}{}\operatorname{\mathsf{Var}}({\hat{Z}_{n}^{pq(k,m)}})& ={n^{2}}{\sum \limits_{i=1}^{n}}\operatorname{\mathsf{Var}}({U_{i}^{p(k)}}{U_{i}^{q(m)}})\le {n^{3}}\underset{i,p,q}{\max }\operatorname{\mathsf{E}}{({U_{i}^{p(k)}}{U_{i}^{q(m)}})^{2}}\\ {} & \le {n^{3}}\sqrt{\underset{i,p,q}{\max }\operatorname{\mathsf{E}}{({U_{i}^{p(k)}})^{4}}\operatorname{\mathsf{E}}{({U_{i}^{q(m)}})^{4}}}\le {n^{3}}\underset{i,p,l}{\max }\operatorname{\mathsf{E}}{({U_{i}^{p(l)}})^{4}}.\end{aligned}\]
(We applied the Cauchy–Schwarz inequality here.)
Let ${\eta _{j}}={s_{j;n}^{p}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(k)}})$, ${b_{ij}}={a_{j;n}^{k}}-{a_{j;-in}^{k}}$. Then $\operatorname{\mathsf{E}}{\eta _{i}}=0$, so
(70)
\[ \operatorname{\mathsf{E}}{({U_{i}^{p(k)}})^{4}}=\operatorname{\mathsf{E}}{({a_{i;n}^{k}}{\eta _{i}}+\sum \limits_{j\ne i}{b_{ij}}{\eta _{j}})^{4}}={J_{1;n}}+{J_{2;n}}+{J_{3;n}},\]
where
\[ {J_{1;n}}={({a_{i;n}^{k}})^{4}}\operatorname{\mathsf{E}}{({\eta _{i}})^{4}},\hspace{2.5pt}{J_{2;n}}=6{({a_{i;n}^{k}})^{2}}\operatorname{\mathsf{E}}{({\eta _{i}})^{2}}\operatorname{\mathsf{E}}{(\sum \limits_{j\ne i}{b_{ij}}{\eta _{j}})^{2}},\hspace{2.5pt}{J_{3;n}}=\operatorname{\mathsf{E}}{(\sum \limits_{j\ne i}{b_{ij}}{\eta _{j}})^{4}}.\]
By Assumption (JK1) and Lemma 1, ${J_{1;n}}=O({n^{-4}})$,
\[ {J_{2;n}}\le \frac{C}{{n^{2}}}\sum \limits_{j\ne i}{({b_{ij}})^{2}}\operatorname{\mathsf{E}}{({\eta _{j}})^{2}}=O({n^{-5}}),\]
\[ {J_{3;n}}\le \sum \limits_{j\ne i}{({b_{ij}})^{4}}\operatorname{\mathsf{E}}{({\eta _{j}})^{4}}+6\sum \limits_{{j_{1}},{j_{2}}\ne i}{({b_{i{j_{1}}}})^{2}}{({b_{i{j_{2}}}})^{2}}\operatorname{\mathsf{E}}{({\eta _{{j_{1}}}})^{2}}\operatorname{\mathsf{E}}{({\eta _{{j_{2}}}})^{2}}=O({n^{-6}}).\]
So, from (70) we obtain
\[ \underset{i,p,l}{\max }\operatorname{\mathsf{E}}{({U_{i}^{p(l)}})^{4}}=O({n^{-4}}),\]
and (69), (68) yield
\[ \operatorname{\mathsf{E}}|{\hat{\mathbf{Z}}_{n}^{(k,m)}}-\operatorname{\mathsf{E}}{\hat{\mathbf{Z}}_{n}^{(k,m)}}{|^{2}}=O({n^{-4}}){n^{3}}=O({n^{-1}}).\]
This with (67) implies (57). □