Modern Stochastics: Theory and Applications

Jackknife for nonlinear estimating equations
Volume 9, Issue 4 (2022), pp. 377–399
Rostyslav Maiboroda, Vitalii Miroshnychenko, Olena Sugakova

https://doi.org/10.15559/22-VMSTA208
Pub. online: 14 June 2022      Type: Research Article      Open Access

Received: 23 January 2022
Revised: 26 May 2022
Accepted: 26 May 2022
Published: 14 June 2022

Abstract

In the mixture with varying concentrations (MVC) model one deals with a nonhomogeneous sample consisting of subjects that belong to a fixed number of different populations (mixture components). The population to which a subject belongs is unknown, but the probabilities of belonging to each component are known and vary from observation to observation. The distribution of a subject's observed features depends on the component to which it belongs.
Generalized estimating equations (GEE) for Euclidean parameters in MVC models are considered. Under suitable assumptions the obtained estimators are asymptotically normal. A jackknife (JK) technique for the estimation of their asymptotic covariance matrices is described, and consistency of the JK estimators is demonstrated. An application to a model of mixture of nonlinear regressions and a real-life example are presented.

1 Introduction

This paper continues the study of applications of the jackknife (JK) technique to statistical inference based on the model of mixture with varying concentrations (MVC). JK, introduced by Quenouille (1949) and Tukey (1958), is a powerful tool for estimating the asymptotic covariance of asymptotically normal statistics. On its applications to homogeneous samples, see [11] and [1]. The JK technique was applied to heteroscedastic nonlinear regression models in [9]. Applications to errors-in-variables models are considered in [12, 13].
In MVC models one deals with a nonhomogeneous sample which consists of subjects belonging to M different subpopulations (mixture components). One knows the probabilities with which a subject belongs to the mixture components, and these probabilities differ from subject to subject. So the observations are independent but not identically distributed. Modifying JK for the analysis of such data is a challenging problem.
On parametric inference in regression MVC models, see [2]. Estimation in nonparametric MVC models is discussed in [3]. In [5] a jackknife application to MVC of linear regression models with errors in variables is considered. It is shown there that the JK estimators are consistent and allow one to construct asymptotic confidence intervals for regression coefficients based on orthogonal regression estimators. In [7] a general result on asymptotic normality of generalized estimating equation (GEE) estimators for MVC is obtained and applied to derive asymptotic normality of a modification of least squares (LS) estimators for MVC of nonlinear regression models. A JK estimator for the asymptotic covariance was also introduced in [7], but its properties were not investigated analytically.
In this paper we consider JK estimation of the asymptotic covariance of GEE estimators in MVC models and show its consistency. The MVC model and the GEE estimator are discussed in Section 2. A version of JK for MVC is described in Section 3. The main results on consistency and asymptotic normality of the GEE estimator and on consistency of the JK estimator of the asymptotic covariance are presented in Section 4, where we also consider an application to a nonlinear regression model. In Section 5 the developed statistical techniques are applied to real-life sociological data. Concluding remarks are placed in Section 6. Section 7 contains technical proofs.

2 MVC model and GEE estimation

In the MVC model we assume that each observed subject O belongs to one of M different mixture components (subpopulations) ${\mathcal{P}_{k}}$, $k=1,\dots ,M$. The sample contains n subjects ${O_{1}}$,…, ${O_{n}}$. Let ${\kappa _{j}}=k$ iff ${O_{j}}\in {\mathcal{P}_{k}}$. The true ${\kappa _{j}}$ are unknown, but one knows the mixing probabilities
\[ {p_{j;n}^{k}}=\operatorname{\mathsf{P}}\{{\kappa _{j}}=k\}.\]
These probabilities are also called the concentrations of the k-th component at the j-th observation.
The D-dimensional vector of observed variables of O will be denoted by $\boldsymbol{\xi }(O)={({\xi ^{1}}(O),\dots ,{\xi ^{D}}(O))^{T}}\in {\mathbb{R}^{D}}$, ${\boldsymbol{\xi }_{j}}={\boldsymbol{\xi }_{j;n}}=\boldsymbol{\xi }({O_{j}})$.
Let ${F^{(k)}}$ be the distribution of $\boldsymbol{\xi }(O)$ for $O\in {\mathcal{P}_{k}}$, i.e.
\[ {F^{(k)}}(A)=\operatorname{\mathsf{P}}\{\boldsymbol{\xi }(O)\in A\hspace{2.5pt}|\hspace{2.5pt}O\in {\mathcal{P}_{k}}\}\]
for all Borel sets $A\subseteq {\mathbb{R}^{D}}$. Then
(1)
\[ \operatorname{\mathsf{P}}\{{\boldsymbol{\xi }_{j}}\in A\}={\sum \limits_{k=1}^{M}}{p_{j;n}^{k}}{F^{(k)}}(A).\]
So, in the MVC model one observes independent ${\boldsymbol{\xi }_{j}}$, $j=1,\dots ,n$, with the distribution defined by (1). In this paper we adopt a semiparametric model of components’ distributions
(2)
\[ {F^{(k)}}(A)=F(A,{\boldsymbol{\vartheta }^{(k)}},{\boldsymbol{\nu }^{(k)}}),\hspace{2.5pt}k=1,\dots ,M,\]
where F is some known function of its arguments, ${\boldsymbol{\vartheta }^{(k)}}\in \Theta \subseteq {\mathbb{R}^{d}}$ are unknown Euclidean parameters of interest, ${\boldsymbol{\nu }^{(k)}}$ are some nonparametric nuisance parameters.
In what follows we will denote by ${\boldsymbol{\xi }_{(k)}}$ a random vector with distribution ${F^{(k)}}$ which can be considered as the value of $\boldsymbol{\xi }(O)$ for a subject O selected at random from the component ${\mathcal{P}_{k}}$.
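Before turning to estimation, the data-generating mechanism (1)–(2) can be sketched in Python for intuition (a hedged illustration only: simulate_mvc is our name, and component_samplers is a hypothetical list of M functions, each returning one draw from the corresponding ${F^{(k)}}$):
import numpy as np

def simulate_mvc(p, component_samplers, rng=None):
    # Draw xi_1, ..., xi_n according to (1); row j of p holds (p_{j;n}^1, ..., p_{j;n}^M).
    rng = rng or np.random.default_rng(0)
    n, M = p.shape
    kappa = np.array([rng.choice(M, p=p[j]) for j in range(n)])  # hidden component labels
    xi = np.stack([component_samplers[k](rng) for k in kappa])   # xi_j drawn from F^{(kappa_j)}
    return xi, kappa
The labels kappa are then discarded: only xi and the concentration matrix p are available for inference.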
Example.
Consider the model of mixture of regressions from [7]. In this model the observations are ${\boldsymbol{\xi }_{j}}={({Y_{j}},{X_{j}^{1}},\dots ,{X_{j}^{m}})^{T}}$, where ${Y_{j}}$ is the response and ${\mathbf{X}_{j}}={({X_{j}^{1}},\dots ,{X_{j}^{m}})^{T}}$ is the vector of regressors in the regression model
(3)
\[ {Y_{j}}=g({\mathbf{X}_{j}};{\boldsymbol{\vartheta }^{({\kappa _{j}})}})+{\varepsilon _{j}},\]
where g is a known regression function, ${\boldsymbol{\vartheta }^{(k)}}$ is a vector of unknown regression coefficients in the k-th mixture component, and ${\varepsilon _{j}}$ are regression error terms. (In [7] a somewhat more general model is considered, in which the regression functions and parameter spaces may differ between components. Here we restrict ourselves to this simpler setting to keep the notation light. The main result on JK-consistency can be extended to the general case considered in [7].)
We assume that ${\varepsilon _{j}}$ are independent for different j and, for each j, ${\varepsilon _{j}}$ and ${\mathbf{X}_{j}}$ are conditionally independent given ${\kappa _{j}}$. Let ${F_{X}^{(k)}}$ and ${F_{\varepsilon }^{(k)}}$ be the conditional distributions of ${\mathbf{X}_{j}}$ and ${\varepsilon _{j}}$ given ${O_{j}}\in {\mathcal{P}_{k}}$. We assume that $\operatorname{\mathsf{E}}[{\varepsilon _{j}}\hspace{2.5pt}|\hspace{2.5pt}{\kappa _{j}}=k]=\textstyle\int x{F_{\varepsilon }^{(k)}}(dx)=0$ for all $k=1,\dots ,M$.
Model (3) is a special case of model (1)–(2) in which the nuisance parameters are the distributions of regressors and errors for all components, i.e., ${\boldsymbol{\nu }^{(k)}}=({F_{X}^{(k)}},{F_{\varepsilon }^{(k)}})$.
To estimate ${\boldsymbol{\vartheta }^{(k)}}$ in (1)–(2) we apply the generalized estimating equations (GEE) technique considered in [7]. (On the GEE estimation technique and its relation to least squares, maximum likelihood and M-estimators in the context of i.i.d. observations, see Section 5.4 in [10].) Let us choose an elementary estimating function $\mathbf{s}:{\mathbb{R}^{D}}\times \Theta \to {\mathbb{R}^{d}}$ such that
(4)
\[ \operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma })=0\hspace{2.5pt}\text{iff}\hspace{2.5pt}\boldsymbol{\gamma }={\boldsymbol{\vartheta }^{(k)}}.\]
So, considering (4) as an equation in $\boldsymbol{\gamma }={({\gamma ^{1}},\dots ,{\gamma ^{d}})^{T}}\in \Theta \subseteq {\mathbb{R}^{d}}$, we observe that its unique solution is ${\boldsymbol{\vartheta }^{(k)}}$. To obtain an estimator for ${\boldsymbol{\vartheta }^{(k)}}$ we replace $\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma })$ by its estimator
(5)
\[ {\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })={\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}\mathbf{s}({\boldsymbol{\xi }_{j;n}};\boldsymbol{\gamma }),\]
where ${a_{j;n}^{k}}$ are nonrandom weights satisfying the assumption
(6)
\[ {\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{p_{j;n}^{m}}=\left\{\begin{array}{l@{\hskip10.0pt}l}1\hspace{1em}& \hspace{2.5pt}\text{if}\hspace{2.5pt}k=m,\\ {} 0\hspace{1em}& \hspace{2.5pt}\text{if}\hspace{2.5pt}k\ne m,\end{array}\right.\hspace{2.5pt}\hspace{2.5pt}\hspace{2.5pt}\text{for all}\hspace{2.5pt}m=1,\dots ,M.\]
Observe that under (6) ${\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })$ is an unbiased estimator for $\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma })$, i.e., $\operatorname{\mathsf{E}}{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })=\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma })$. The GEE estimator of ${\boldsymbol{\vartheta }^{(k)}}$ is any statistic ${\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}$ such that
(7)
\[ {\mathbf{S}_{n}^{(k)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(k)}})=0,\hspace{2.5pt}\text{a.s.}\]
For example, in model (3) differentiation of the least squares functional yields the elementary estimating function
(8)
\[ \mathbf{s}(\boldsymbol{\gamma },Y,\mathbf{X})=(Y-g(\mathbf{X},\boldsymbol{\gamma }))\dot{\mathbf{g}}(\mathbf{X},\boldsymbol{\gamma }),\]
where
\[ \dot{\mathbf{g}}(\mathbf{X},\boldsymbol{\gamma })={\left(\frac{\partial g(\mathbf{X},\boldsymbol{\gamma })}{\partial {\gamma ^{1}}},\dots ,\frac{\partial g(\mathbf{X},\boldsymbol{\gamma })}{\partial {\gamma ^{d}}}\right)^{T}}.\]
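To make the origin of (8) explicit, here is a short sketch (the notation ${J_{n}^{(k)}}$ is ours and is used only in this remark): for the k-th component the weighted least squares functional, with the weights ${a_{j;n}^{k}}$ from (5)–(6), is
\[ {J_{n}^{(k)}}(\boldsymbol{\gamma })={\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{({Y_{j}}-g({\mathbf{X}_{j}},\boldsymbol{\gamma }))^{2}},\]
and
\[ -\frac{1}{2}\frac{\partial {J_{n}^{(k)}}(\boldsymbol{\gamma })}{\partial \boldsymbol{\gamma }}={\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}({Y_{j}}-g({\mathbf{X}_{j}},\boldsymbol{\gamma }))\dot{\mathbf{g}}({\mathbf{X}_{j}},\boldsymbol{\gamma })={\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma }),\]
so setting this gradient to zero is exactly the GEE (7) with the elementary estimating function (8).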
In this paper we consider only the minimax weights ${a_{j;n}^{k}}$, which can be defined as follows. Let ${\mathbf{p}_{;n}}$ be the matrix of all concentrations for all components of the mixture:
\[ {\mathbf{p}_{;n}}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{p_{1;n}^{1}}& \dots & {p_{1;n}^{M}}\\ {} \vdots & \ddots & \vdots \\ {} {p_{n;n}^{1}}& \dots & {p_{n;n}^{M}}\end{array}\right).\]
Then the matrix of all weights ${\mathbf{a}_{;n}}={({a_{j;n}^{m}})_{j=1,\dots ,n,m=1,\dots ,M}}$ is defined as
(9)
\[ {\mathbf{a}_{;n}}={\mathbf{p}_{;n}}{\boldsymbol{\Gamma }_{;n}^{-1}},\]
where ${\boldsymbol{\Gamma }_{;n}}={\mathbf{p}_{;n}^{T}}{\mathbf{p}_{;n}}$. (We assume that $\det {\boldsymbol{\Gamma }_{;n}}\ne 0$). Minimax properties of ${\mathbf{a}_{;n}}$ were discussed in [3]. For one alternative approach to weighting in GEE for MVC, see [4].
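For illustration, a minimal numerical sketch of (5)–(9) in Python (the names minimax_weights and gee_estimate are ours and purely hypothetical; s is a user-supplied elementary estimating function, xi a list of the observations ${\boldsymbol{\xi }_{j;n}}$, and p the $n\times M$ concentration matrix; NumPy and SciPy are assumed to be available):
import numpy as np
from scipy.optimize import root

def minimax_weights(p):
    # Minimax weights a_{;n} = p_{;n} Gamma_{;n}^{-1}, Gamma_{;n} = p^T p, cf. (9).
    return p @ np.linalg.inv(p.T @ p)

def gee_estimate(xi, p, s, gamma0, k):
    # Solve S_n^{(k)}(gamma) = sum_j a_{j;n}^k s(xi_j; gamma) = 0, cf. (5) and (7).
    a = minimax_weights(p)[:, k]
    def S(gamma):
        return sum(a_j * np.asarray(s(x_j, gamma)) for a_j, x_j in zip(a, xi))
    return root(S, gamma0).x   # numerical root of the estimating equation
By construction minimax_weights(p).T @ p is the $M\times M$ identity matrix, which is exactly condition (6).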

3 Jackknife for MVC

Consider the set of parameters ${\boldsymbol{\vartheta }^{(k)}}$, $k=1,\dots ,M$, for different components as one long vector parameter $\boldsymbol{\vartheta }={({({\boldsymbol{\vartheta }^{(1)}})^{T}},\dots ,{({\boldsymbol{\vartheta }^{(M)}})^{T}})^{T}}$ and similarly for the set of estimators ${\hat{\boldsymbol{\vartheta }}_{n}}={({({\hat{\boldsymbol{\vartheta }}_{n}^{(1)}})^{T}},\dots ,{({\hat{\boldsymbol{\vartheta }}_{n}^{(M)}})^{T}})^{T}}$. (Recall that ${\boldsymbol{\vartheta }^{(k)}},{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}\in {\mathbb{R}^{d}}$.) It was shown in [7] that under suitable assumptions (see Theorem 3 below) the estimator ${\hat{\boldsymbol{\vartheta }}_{n}}$ is asymptotically normal, i.e.
\[ \sqrt{n}({\hat{\boldsymbol{\vartheta }}_{n}}-\boldsymbol{\vartheta })\stackrel{\text{W}}{\longrightarrow }N(0,\mathbf{V}).\]
To use asymptotic normality for hypothesis testing one needs an estimator of the dispersion matrix (asymptotic covariance) V given by (16) below. The jackknife (JK) is a powerful tool for constructing such estimators. We now consider its modification for MVC models (cf. [5]). Let
\[ {\mathbf{p}_{;-i,n}}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{p_{1;n}^{1}}& \dots & {p_{1;n}^{M}}\\ {} \vdots & \ddots & \vdots \\ {} {p_{i-1;n}^{1}}& \dots & {p_{i-1;n}^{M}}\\ {} 0& \dots & 0\\ {} {p_{i+1;n}^{1}}& \dots & {p_{i+1;n}^{M}}\\ {} \vdots & \ddots & \vdots \\ {} {p_{n;n}^{1}}& \dots & {p_{n;n}^{M}}\end{array}\right),\]
i.e. ${\mathbf{p}_{;-i,n}}$ is the matrix ${\mathbf{p}_{;n}}$ with the i-th row replaced by the zero row. Then ${\boldsymbol{\Gamma }_{;-i,n}}={\mathbf{p}_{;-i,n}^{T}}{\mathbf{p}_{;-i,n}}$ and
(10)
\[ {\mathbf{a}_{;-i,n}}={\mathbf{p}_{;-i,n}}{\boldsymbol{\Gamma }_{;-i,n}^{-1}}.\]
Let
(11)
\[ {\mathbf{S}_{-in}^{(k)}}(\boldsymbol{\gamma })=\sum \limits_{j\ne i}{a_{j;-i,n}^{k}}\mathbf{s}({\boldsymbol{\xi }_{j;n}};\boldsymbol{\gamma })\]
and define ${\hat{\boldsymbol{\vartheta }}_{-in}^{(k)}}$ as a statistic which satisfies
(12)
\[ {\mathbf{S}_{-in}^{(k)}}({\hat{\boldsymbol{\vartheta }}_{-in}^{(k)}})=0,\hspace{2.5pt}\text{a.s.},\]
${\hat{\boldsymbol{\vartheta }}_{-in}}={({({\hat{\boldsymbol{\vartheta }}_{-in}^{(1)}})^{T}},\dots ,{({\hat{\boldsymbol{\vartheta }}_{-in}^{(M)}})^{T}})^{T}}$. In fact, ${\hat{\boldsymbol{\vartheta }}_{-in}}$ is the GEE estimator for $\boldsymbol{\vartheta }$ calculated from the sample which contains all the observed subjects ${O_{j}}$ except the i-th one. Then the JK estimator for V is
(13)
\[ {\hat{\mathbf{V}}_{n}}=n{\sum \limits_{i=1}^{n}}({\hat{\boldsymbol{\vartheta }}_{-in}}-{\hat{\boldsymbol{\vartheta }}_{n}}){({\hat{\boldsymbol{\vartheta }}_{-in}}-{\hat{\boldsymbol{\vartheta }}_{n}})^{T}}.\]
(On some efficient algorithms for calculation of ${\mathbf{a}_{;-i,n}}$ and ${\hat{\mathbf{V}}_{n}}$ see [5]).
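For illustration, a brute-force sketch of (10)–(13) in Python, reusing the hypothetical minimax_weights and gee_estimate from the sketch in Section 2 (each leave-one-out estimator is recomputed from scratch here; see [5] for more efficient algorithms):
import numpy as np

def jackknife_covariance(xi, p, s, theta_hat, M):
    # theta_hat is the stacked estimate (theta_hat^(1), ..., theta_hat^(M)) of length M*d.
    n = p.shape[0]
    V = np.zeros((theta_hat.size, theta_hat.size))
    starts = np.split(theta_hat, M)          # full-sample estimates, used as starting points
    for i in range(n):
        p_minus_i = p.copy()
        p_minus_i[i, :] = 0.0                # zero the i-th row of p_{;n}, giving p_{;-i,n}
        theta_minus_i = np.concatenate(
            [gee_estimate(xi, p_minus_i, s, starts[k], k) for k in range(M)]  # cf. (11)-(12)
        )
        diff = theta_minus_i - theta_hat
        V += np.outer(diff, diff)
    return n * V                             # JK estimator (13)
Since the i-th row of p_minus_i is zero, the corresponding weight ${a_{i;-i,n}^{k}}$ vanishes automatically, so the i-th observation drops out of the sum in (11).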

4 Main theorems

In this section we consider the asymptotic behavior of ${\hat{\boldsymbol{\vartheta }}_{n}}$ and ${\hat{\mathbf{V}}_{n}}$ as $n\to \infty $. Note that we do not assume any relationship between the samples $\{{\boldsymbol{\xi }_{j;n}},\hspace{2.5pt}j=1,\dots ,n\}$ for different n. They can be independent or dependent, or a smaller sample can be a part of a larger one. The concentration arrays ${\mathbf{p}_{;n}}$ are also unrelated for different n.
To formulate the theorems we need some notation and assumptions. In what follows we assume that the limit
(14)
\[ {\boldsymbol{\Gamma }_{\infty }}=\underset{n\to \infty }{\lim }\frac{1}{n}{\mathbf{p}_{;n}^{T}}{\mathbf{p}_{;n}}\]
exists and $\det {\boldsymbol{\Gamma }_{\infty }}\ne 0$.
For a vector x, the symbol $|\mathbf{x}|$ denotes the Euclidean norm. For a matrix A, $|\mathbf{A}|$ is the operator norm. Let $\boldsymbol{\psi }(\mathbf{x},\boldsymbol{\gamma })$ be any function of $\mathbf{x}\in {\mathbb{R}^{D}}$, $\boldsymbol{\gamma }\in \Theta $, possibly vector- or matrix-valued, and let $h:{\mathbb{R}^{D}}\to \mathbb{R}$, $\rho :\Theta \times \Theta \to \mathbb{R}$. We say that $\boldsymbol{\psi }$ satisfies Condition ${\boldsymbol{\Psi }_{h,\rho }}$ iff, for all ${\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}}\in \Theta $ and all $\mathbf{x}\in {\mathbb{R}^{D}}$,
\[ |\boldsymbol{\psi }(\mathbf{x},{\boldsymbol{\gamma }_{1}})-\boldsymbol{\psi }(\mathbf{x},{\boldsymbol{\gamma }_{2}})|\le h(\mathbf{x})\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}}).\]
A set of functions $\{{\boldsymbol{\psi }_{i}},i\in I\}$ satisfies Condition ${\boldsymbol{\Psi }_{h,\rho }}$ if ${\boldsymbol{\psi }_{i}}$ satisfies ${\boldsymbol{\Psi }_{h,\rho }}$ for each $i\in I$.
The following theorem states conditions for consistency of ${\hat{\boldsymbol{\vartheta }}_{n}}$ and for consistency of ${\hat{\boldsymbol{\vartheta }}_{-in}}$ uniformly in i.
Theorem 1 (Consistency).
Let the following assumptions hold.
  • (C1) Θ is a compact set in ${\mathbb{R}^{d}}$.
  • (C2) Condition ${\boldsymbol{\Psi }_{h,\rho }}$ holds for the elementary estimating function s with some functions ρ and h.
  • (C3) ρ is a continuous function on $\Theta \times \Theta $ with $\rho (\boldsymbol{\gamma },\boldsymbol{\gamma })=0$ for all $\boldsymbol{\gamma }\in \Theta $.
  • (C4) For all $l=1,\dots ,M$, $\operatorname{\mathsf{E}}|\mathbf{s}({\boldsymbol{\xi }_{(l)}},{\boldsymbol{\vartheta }^{(l)}}){|^{2}}<\infty $ and $\operatorname{\mathsf{E}}{(h({\boldsymbol{\xi }_{(l)}}))^{2}}<\infty $.
  • (C5) $\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma })=0$ if and only if $\boldsymbol{\gamma }={\boldsymbol{\vartheta }^{(k)}}$.
  • (C6) $\det {\boldsymbol{\Gamma }_{\infty }}>0$.
  • (C7) $\operatorname{\mathsf{P}}\{\exists \boldsymbol{\gamma }\in \Theta \textit{, such that}\hspace{2.5pt}{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })=0\}\to 1$ as $n\to \infty $.
  • (C7’) $\operatorname{\mathsf{P}}\{\forall i=1,\dots ,n,\exists {\boldsymbol{\gamma }_{i}}\in \Theta \textit{, such that}\hspace{2.5pt}{\mathbf{S}_{-in}^{(k)}}({\boldsymbol{\gamma }_{i}})=0\}\to 1$ as $n\to \infty $.
Then, under the assumptions (C1)–(C7), ${\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}\stackrel{\text{P}}{\longrightarrow }{\boldsymbol{\vartheta }^{(k)}}$ as $n\to \infty $, and under the assumptions (C1)–(C6), (C7’),
(15)
\[ \underset{i=1,\dots ,n}{\sup }|{\hat{\boldsymbol{\vartheta }}_{-in}^{(k)}}-{\boldsymbol{\vartheta }^{(k)}}|\stackrel{\text{P}}{\longrightarrow }0\]
as $n\to \infty $.
Assumptions (C7) and (C7’) require the existence of GEE solutions with probability tending to 1 as $n\to \infty $. They may be hard to verify directly. The following theorem provides conditions under which they hold.
Let $\dot{\mathbf{S}}(\boldsymbol{\gamma })$ denote the Jacobian matrix of a vector-valued function S with respect to $\boldsymbol{\gamma }$, and let ${\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })=\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma })$.
Theorem 2 (Existence).
Let the assumptions (C1)–(C6) of Theorem 1 hold and, moreover,
  • (E1) ${\boldsymbol{\vartheta }^{(k)}}$ is an inner point of Θ.
  • (E2) ${\dot{\mathbf{S}}_{\infty }^{(k)}}({\boldsymbol{\vartheta }^{(k)}})$ exists and $\det {\dot{\mathbf{S}}_{\infty }^{(k)}}({\boldsymbol{\vartheta }^{(k)}})\ne 0$.
Then assumptions (C7) and (C7’) of Theorem 1 hold.
To formulate the asymptotic normality result we need some additional notation. Let
\[ {\mathbf{M}^{(k)}}(\boldsymbol{\gamma })=\operatorname{\mathsf{E}}\dot{\mathbf{s}}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma }),\hspace{2.5pt}{\mathbf{M}^{(k)}}={\mathbf{M}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})=\operatorname{\mathsf{E}}\dot{\mathbf{s}}({\boldsymbol{\xi }_{(k)}},{\boldsymbol{\vartheta }^{(k)}}),\]
\[ \langle {\mathbf{a}^{k}}{\mathbf{a}^{m}}{\mathbf{p}^{l}}{\mathbf{p}^{i}}\rangle =\underset{n\to \infty }{\lim }n{\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{a_{j;n}^{m}}{p_{j;n}^{l}}{p_{j;n}^{i}},\]
\[ \langle {\mathbf{a}^{k}}{\mathbf{a}^{m}}{\mathbf{p}^{l}}\rangle =\underset{n\to \infty }{\lim }n{\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{a_{j;n}^{m}}{p_{j;n}^{l}}\]
(existence of these limits is a condition in the following Theorem 3). Now
\[\begin{aligned}{}{\mathbf{Z}^{(m,l)}}& ={\sum \limits_{i=1}^{M}}\langle {\mathbf{a}^{m}}{\mathbf{a}^{l}}{\mathbf{p}^{i}}\rangle \operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(i)}},{\boldsymbol{\vartheta }^{(m)}})\mathbf{s}{({\boldsymbol{\xi }_{(i)}},{\boldsymbol{\vartheta }^{(l)}})^{T}}\\ {} & \hspace{1em}-{\sum \limits_{{i_{1}},{i_{2}}=1}^{M}}\langle {\mathbf{a}^{m}}{\mathbf{a}^{l}}{\mathbf{p}^{{i_{1}}}}{\mathbf{p}^{{i_{2}}}}\rangle \operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{({i_{1}})}},{\boldsymbol{\vartheta }^{(m)}})\operatorname{\mathsf{E}}\mathbf{s}{({\boldsymbol{\xi }_{({i_{2}})}},{\boldsymbol{\vartheta }^{(l)}})^{T}},\end{aligned}\]
\[ {\mathbf{V}^{(m,l)}}={({\mathbf{M}^{(m)}})^{-1}}{\mathbf{Z}^{(m,l)}}{({\mathbf{M}^{(l)}})^{-T}}\]
(here and below ${\mathbf{M}^{-T}}={({\mathbf{M}^{-1}})^{T}}$). Let us pack all the matrices ${\mathbf{V}^{(m,l)}}$ into one $(Md)\times (Md)$ matrix
(16)
\[ \mathbf{V}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{\mathbf{V}^{(1,1)}}& \dots & {\mathbf{V}^{(1,M)}}\\ {} \vdots & \ddots & \vdots \\ {} {\mathbf{V}^{(M,1)}}& \dots & {\mathbf{V}^{(M,M)}}\end{array}\right).\]
Let $\mathbf{s}(\mathbf{x},\boldsymbol{\gamma })={({s^{1}}(\mathbf{x},\boldsymbol{\gamma }),\dots ,{s^{d}}(\mathbf{x},\boldsymbol{\gamma }))^{T}}$.
Theorem 3 (Asymptotic normality).
Let the following assumptions hold.
  • (AN1) $\boldsymbol{\vartheta }$ is an inner point of ${\Theta ^{M}}=\Theta \times \cdots \times \Theta $.
  • (AN2) There exists an open ball B centered in $\boldsymbol{\vartheta }$, such that the derivatives
    \[ \frac{{\partial ^{2}}{s^{l}}(\mathbf{x},\boldsymbol{\gamma })}{\partial {\gamma ^{i}}\partial {\gamma ^{j}}}\]
    exist for all $\boldsymbol{\gamma }={({\gamma ^{1}},\dots ,{\gamma ^{d}})^{T}}\in B$, all $l,i,j=1,\dots ,d$, and almost all x (w.r.t. all ${F^{(k)}}$, $k=1,\dots ,M$).
  • (AN3) There exists a function $h:{\mathbb{R}^{D}}\to \mathbb{R}$ such that
    \[ \underset{l,i,j}{\max }\underset{\boldsymbol{\gamma }\in B}{\sup }\left|\frac{{\partial ^{2}}{s^{l}}(\mathbf{x},\boldsymbol{\gamma })}{\partial {\gamma ^{i}}\partial {\gamma ^{j}}}\right|\le h(\mathbf{x})\]
    and $\operatorname{\mathsf{E}}{(h({\boldsymbol{\xi }_{(k)}}))^{\alpha }}<\infty $ for some $\alpha >1$ and all $k=1,\dots ,M$.
  • (AN4) $\operatorname{\mathsf{E}}|\mathbf{s}({\boldsymbol{\xi }_{(k)}},{\boldsymbol{\vartheta }^{(k)}}){|^{2}}<\infty $ for all $k=1,\dots ,M$.
  • (AN5) ${\mathbf{M}^{(k)}}$ are finite and nonsingular for all $k=1,\dots ,M$.
  • (AN6) The limits $\langle {\mathbf{a}^{k}}{\mathbf{a}^{m}}{\mathbf{p}^{i}}{\mathbf{p}^{l}}\rangle $ exist for all $k,m,i,l=1,\dots ,M$.
  • (AN7) Matrix ${\boldsymbol{\Gamma }_{\infty }}$ exists and is nonsingular.
  • (AN8) ${\hat{\boldsymbol{\vartheta }}_{n}}$ exists and is a consistent estimator for $\boldsymbol{\vartheta }$.
Then
\[ \sqrt{n}({\hat{\boldsymbol{\vartheta }}_{n}}-\boldsymbol{\vartheta })\stackrel{\text{W}}{\longrightarrow }N(0,\mathbf{V})\]
as $n\to \infty $.
(Note that in Theorems 1–4 and in the lemmas of Section 7 below the functions h may be different in different statements.)
In fact, Theorem 3 is just Theorem 2 from [7] reformulated in terms of the present paper.
Now we are ready to formulate the theorem on consistency of the JK estimator of V.
Theorem 4.
Assume that assumptions (AN1), (AN5), (AN6), (AN7) of Theorem 3 hold and, moreover:
  • (JK1) There exists a function $h:{\mathbb{R}^{D}}\to \mathbb{R}$ such that
    \[ \underset{\boldsymbol{\gamma }\in \Theta }{\sup }|\mathbf{s}(\mathbf{x},\boldsymbol{\gamma })|\le h(\mathbf{x}),\hspace{2.5pt}\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|\dot{\mathbf{s}}(\mathbf{x},\boldsymbol{\gamma })|\le h(\mathbf{x}),\]
    \[ \underset{l,i,j}{\max }\underset{\boldsymbol{\gamma }\in B}{\sup }\left|\frac{{\partial ^{2}}{s^{l}}(\mathbf{x},\boldsymbol{\gamma })}{\partial {\gamma ^{i}}\partial {\gamma ^{j}}}\right|\le h(\mathbf{x})\]
    and, for some $\alpha >4$ and all $l=1,\dots ,M$,
    \[ \operatorname{\mathsf{E}}{(h({\boldsymbol{\xi }_{(l)}}))^{\alpha }}<\infty ,\]
  • (JK2) ${\hat{\boldsymbol{\vartheta }}_{n}}$ is a $\sqrt{n}$-consistent estimator of $\boldsymbol{\vartheta }$,
  • (JK3) ${\sup _{i=1,\dots ,n}}|{\hat{\boldsymbol{\vartheta }}_{-in}}-\boldsymbol{\vartheta }|\stackrel{\text{P}}{\longrightarrow }0$ as $n\to \infty $.
Then ${\hat{\mathbf{V}}_{n}}\stackrel{\text{P}}{\longrightarrow }\mathbf{V}$ as $n\to \infty $.
Example.
Let the observed data be ${\boldsymbol{\xi }_{j}}={({Y_{j}},{X_{j}})^{T}}$, $j=1,\dots ,n$, where dependence between ${X_{j}}$ and ${Y_{j}}$ is described by the regression model (3) with
(17)
\[ g({X_{j}},{\boldsymbol{\vartheta }^{(k)}})=\frac{1}{1+\exp (-{\boldsymbol{\vartheta }_{0}^{(k)}}-{\boldsymbol{\vartheta }_{1}^{(k)}}{X_{j}})},\]
where ${\boldsymbol{\vartheta }^{(k)}}={({\boldsymbol{\vartheta }_{0}^{(k)}},{\boldsymbol{\vartheta }_{1}^{(k)}})^{T}}$ are the vectors of regression coefficients for the k-th mixture component.
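For concreteness, the regression function (17), its gradient, and the corresponding elementary estimating function (8) might be coded as follows (a hypothetical sketch; the resulting s can be passed to the gee_estimate sketch from Section 2):
import numpy as np

def g(x, gamma):
    # Logistic regression function (17); gamma = (theta_0, theta_1).
    theta0, theta1 = gamma
    return 1.0 / (1.0 + np.exp(-theta0 - theta1 * x))

def g_dot(x, gamma):
    # Gradient of g with respect to gamma: (g*(1-g), x*g*(1-g)).
    val = g(x, gamma)
    return np.array([val * (1.0 - val), x * val * (1.0 - val)])

def s(xi_j, gamma):
    # Elementary estimating function (8) for one observation xi_j = (Y_j, X_j).
    y, x = xi_j
    return (y - g(x, gamma)) * g_dot(x, gamma)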
Assume that $\boldsymbol{\gamma }\in \Theta $, where Θ is a compact set in ${\mathbb{R}^{2}}$. Then for the elementary estimating function s defined by (8) we obtain
\[ |\mathbf{s}({\boldsymbol{\xi }_{j}},\boldsymbol{\gamma })|\le C(1+|{X_{j}}|)(1+|{\varepsilon _{j}}|)\hspace{2.5pt}\text{and}\hspace{2.5pt}\left|\frac{{\partial ^{2}}{s^{l}}({\boldsymbol{\xi }_{j}},\boldsymbol{\gamma })}{\partial {\gamma ^{i}}\partial {\gamma ^{{i^{\prime }}}}}\right|\le C(1+|{X_{j}}{|^{3}})(1+|{\varepsilon _{j}}|),\]
where $C<\infty $ is some constant. So Assumption (JK1) holds if $\operatorname{\mathsf{E}}[{({\varepsilon _{j}})^{4}}\hspace{2.5pt}|\hspace{2.5pt}{\kappa _{j}}=k]<\infty $ and $\operatorname{\mathsf{E}}[|{X_{j}}{|^{12}}\hspace{2.5pt}|\hspace{2.5pt}{\kappa _{j}}=k]<\infty $ for all $k=1,\dots ,M$. Assumption (AN5) holds if
(18)
\[ \operatorname{\mathsf{Var}}[{X_{j}}\hspace{2.5pt}|\hspace{2.5pt}{\kappa _{j}}=k]>0.\]
Assumption (C5) also holds under (18), see Theorem 2 in [8]. So, under rather mild assumptions, Theorems 1–4 hold for the generalized least squares estimator in this model. In [7] confidence sets for ${\boldsymbol{\vartheta }^{(k)}}$ are constructed based on the asymptotic normality of ${\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}$ and the consistency of ${\hat{\mathbf{V}}_{n}^{(k,k)}}$. Namely, the confidence ellipsoid for ${\boldsymbol{\vartheta }^{(k)}}$ is defined as
\[ {B_{\alpha ,n}}=\{\boldsymbol{\gamma }\in {\mathbb{R}^{d}}:\hspace{2.5pt}n{(\boldsymbol{\gamma }-{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}})^{T}}{({\hat{\mathbf{V}}_{n}^{(k,k)}})^{-1}}(\boldsymbol{\gamma }-{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}})\le {Q^{\eta }}(1-\alpha )\},\]
where ${Q^{\eta }}(1-\alpha )$ is the quantile of level $1-\alpha $ of the ${\chi ^{2}}$-distribution with d degrees of freedom. Then, under the assumptions of Theorems 1–4, if $\det {\mathbf{Z}^{(k,k)}}\ne 0$,
\[ \underset{n\to \infty }{\lim }\operatorname{\mathsf{P}}\{{\boldsymbol{\vartheta }^{(k)}}\in {B_{\alpha ,n}}\}=1-\alpha .\]
In [7], simulation results are presented which show that these ellipsoids can be used for sufficiently large samples.
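A sketch of how the ellipsoid ${B_{\alpha ,n}}$ can be evaluated numerically (a hypothetical helper of ours; theta_hat_k is the estimate ${\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}$ of length d, V_hat_kk the corresponding $d\times d$ block of ${\hat{\mathbf{V}}_{n}}$, and n the sample size; SciPy provides the ${\chi ^{2}}$ quantile):
import numpy as np
from scipy.stats import chi2

def in_confidence_ellipsoid(gamma, theta_hat_k, V_hat_kk, n, alpha):
    # Check whether gamma lies in B_{alpha,n}:
    # n (gamma - theta_hat)^T V_hat^{-1} (gamma - theta_hat) <= Q(1 - alpha).
    theta_hat_k = np.asarray(theta_hat_k)
    diff = np.asarray(gamma) - theta_hat_k
    stat = n * diff @ np.linalg.inv(V_hat_kk) @ diff
    return stat <= chi2.ppf(1.0 - alpha, df=theta_hat_k.size)
With the Bonferroni correction used in Section 5 below, this would be called with alpha = alpha_0/M.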

5 Application to sociological data

In this section we show how the considered technique can be applied to the statistical analysis of real-life data. In many sociological problems one deals with two sets of data from two different sources. The first set (the I-set) consists of individual records containing the values of variables that represent personal information on the investigated persons. The second set (the A-set) contains information on some variable of the investigated persons which is not present in the I-set, averaged over large groups. The problem is how to merge the information from the A- and I-sets to make inference on a model involving variables from both sets.
As the I-set we consider data on the results of the Ukrainian External Independent Testing (EIT) in 2016 from the official site of the Ukrainian Center for Educational Quality Assessment. EIT exams are to be passed by high school graduates for admission to universities. Information on the scores in Ukrainian language and literature (Ukr) and in Mathematics (Math) is available for nearly 246 000 examinees of EIT-2016. In [5] a linear regression dependence between Ukr and Math is assumed. In this paper we consider the model
(19)
\[ \text{Ukr}=\frac{1}{1+\exp (-{\boldsymbol{\vartheta }_{0}^{(k)}}-{\boldsymbol{\vartheta }_{1}^{(k)}}\text{Math})}+\varepsilon ,\]
in which the coefficients ${\boldsymbol{\vartheta }_{0}^{(k)}}$ and ${\boldsymbol{\vartheta }_{1}^{(k)}}$ depend on the political attitudes of the adult environment in which the student was brought up. This can be a family of adherents of Ukrainian independence or an environment critical to the existence of the Ukrainian state and culture.
Fig. 1.
Estimated logistic regression lines by the EIT-2016 data. Solid line: 1st component; dashed line: 2nd; dotted line: 3rd.
The EIT-2016 data do not contain information on political issues, but for each examinee the region of Ukraine where he/she graduated is recorded. So we used data on the results of the 2014 Ukrainian Parliament (Verhovna Rada) elections to obtain approximate proportions of adherents of different political choices in the regions of Ukraine (the A-set). All possible electoral choices at these elections (voting for one of the parties, voting against all, or not taking part in the voting) were divided into three groups (components): (1) pro-Ukrainian, (2) contra-Ukrainian and (3) neutral (see [5] for details). The concentrations of the components for the j-th examinee are taken as the frequencies of adherents of the corresponding electoral choices in the region where the examinee attended high school. The fitted regression lines are presented in Fig. 1. The dependence between Math and Ukr appears to differ substantially across the three components: in the pro-Ukrainian component it is increasing and seemingly nonlinear, in the contra-Ukrainian component it is decreasing, and in the neutral one it is increasing and quite close to linear.
Fig. 2.
Confidence ellipsoids for the logistic regression parameters by the EIT-2016 data.
To verify the significance of these differences we constructed the confidence ellipsoids for the parameters as described in Section 4. By the Bonferroni rule, to infer at the significance level ${\alpha _{0}}=0.05$ we took the level of each ellipsoid $\alpha ={\alpha _{0}}/3\approx 0.0167$. The obtained ellipsoids are presented in Fig. 2. Since they do not intersect, we conclude that the differences between the parameters are significant for all the components.

6 Conclusion

So, we have obtained conditions under which the JK estimator of the asymptotic covariance matrix is consistent for nonlinear GEE estimators in MVC models. The presented example of sociological data analysis demonstrates the possibilities of practical application of this estimator.

7 Proofs

We start with some auxiliary lemmas.
Lemma 1.
Assume that ${\boldsymbol{\Gamma }_{\infty }}$ exists and is nonsingular. Then for some constant ${C_{a}}<\infty $,
\[ |{a_{j;n}^{k}}|\le \frac{{C_{a}}}{n},\hspace{2.5pt}|{a_{j;n}^{k}}-{a_{j;-in}^{k}}|\le \frac{{C_{a}}}{{n^{2}}}\]
for all $k=1,\dots ,M$, $1\le i\ne j\le n$, $n=1,2,\dots \hspace{0.1667em}$.
For the proof, see Lemma 1 in [5].
Let ${\psi _{j;n}}$, $j=1,\dots ,n$, $n=1,2,\dots \hspace{0.1667em}$, be any set of functions defined on ${\mathbb{R}^{D}}\times \Theta $ with values in a set of scalars, vectors, or matrices. We use the following notation:
\[ {\Psi _{n}^{(k)}}(\boldsymbol{\gamma })={\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{\psi _{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma }),\]
\[ {\Psi _{-in}^{(k)}}(\boldsymbol{\gamma })=\sum \limits_{j\ne i}{a_{j;-in}^{k}}{\psi _{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma }).\]
In this notation ψ can be replaced by any other symbol, e.g., $\tilde{\psi }$, s or $\dot{\mathbf{s}}$.
Lemma 2.
Let $H:{\mathbb{R}^{D}}\to \mathbb{R}$ be some function, such that
  • (i) ${\max _{j,n}}{\sup _{\boldsymbol{\gamma }\in \Theta }}|{\psi _{j;n}}(\mathbf{x},\boldsymbol{\gamma })|\le H(\mathbf{x})$ for all $\mathbf{x}\in {\mathbb{R}^{D}}$,
  • (ii) for some $\alpha >1$ and all $k=1,\dots ,M$, $\operatorname{\mathsf{E}}{(H({\boldsymbol{\xi }_{(k)}}))^{\alpha }}<\infty $.
Then
\[ \underset{j=1,\dots ,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\psi _{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })|={o_{P}}({n^{1/\alpha }}).\]
Proof.
Let
\[ {\tilde{\psi }_{j;n}}=\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\psi _{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })|.\]
Then
(20)
\[ {\mathbb{P}_{n}}(C)=\operatorname{\mathsf{P}}\{\underset{j=1,\dots ,n}{\max }{\tilde{\psi }_{j;n}}>C\}=1-{\prod \limits_{j=1}^{n}}(1-\operatorname{\mathsf{P}}\{{\tilde{\psi }_{j;n}}>C\}).\]
Put ${C_{n}}={C_{0}}{n^{1/\alpha }}$ for some ${C_{0}}>0$. Then
(21)
\[ {\tilde{p}_{n}}=\underset{j=1,\dots ,n}{\sup }\operatorname{\mathsf{P}}\{{\tilde{\psi }_{j;n}}>{C_{n}}\}=o(1/n).\]
Indeed,
\[ \operatorname{\mathsf{P}}\{{\tilde{\psi }_{j;n}}>{C_{n}}\}\le \operatorname{\mathsf{P}}\{H({\boldsymbol{\xi }_{j;n}})>{C_{n}}\}\]
\[ \le {\sum \limits_{k=1}^{M}}{p_{j;n}^{k}}\operatorname{\mathsf{P}}\{H({\boldsymbol{\xi }_{(k)}})>{C_{n}}\}\le {\sum \limits_{k=1}^{M}}\operatorname{\mathsf{P}}\{H({\boldsymbol{\xi }_{(k)}})>{C_{n}}\}.\]
So, to show (21) one needs only to observe that
(22)
\[ n\operatorname{\mathsf{P}}\{H({\boldsymbol{\xi }_{(k)}})>{C_{n}}\}=o(1)\hspace{2.5pt}\text{for all}\hspace{2.5pt}k=1,\dots ,M.\]
But
\[ n\operatorname{\mathsf{P}}\{H({\boldsymbol{\xi }_{(k)}})>{C_{n}}\}=n\operatorname{\mathsf{E}}\mathbf{1}\{H({\boldsymbol{\xi }_{(k)}})>{C_{n}}\}\]
\[ \le n\operatorname{\mathsf{E}}\mathbf{1}\{H({\boldsymbol{\xi }_{(k)}})>{C_{n}}\}\frac{H{({\boldsymbol{\xi }_{(k)}})^{\alpha }}}{{C_{n}^{\alpha }}}\le \frac{1}{{C_{0}^{\alpha }}}\operatorname{\mathsf{E}}H{({\boldsymbol{\xi }_{(k)}})^{\alpha }}\mathbf{1}\{H({\boldsymbol{\xi }_{(k)}})>{C_{n}}\}=o(1)\]
due to the assumption (ii) of the lemma. So (21) holds. From (20) and (21) we obtain
\[ {\mathbb{P}_{n}}({C_{n}})\le 1-\exp (n\log (1-{\tilde{p}_{n}}))=1-\exp (-n\cdot o(1/n))=o(1)\]
for any ${C_{0}}>0$.  □
Lemma 3.
Let the following assumptions hold.
  • (i) $\det {\Gamma _{\infty }}\ne 0$.
  • (ii) A set of functions ${\psi _{j;n}}$, $j=1,\dots ,n$, $n\in \mathbb{N}$, satisfies Condition ${\boldsymbol{\Psi }_{h,\rho }}$ with a function h such that $\operatorname{\mathsf{E}}{(h({\boldsymbol{\xi }_{(k)}}))^{2}}<\infty $ for all $k=1,\dots ,M$.
Then there exists a sequence of random variables ${\zeta _{n}}={O_{P}}(1)$ such that for all $n=1,2,\dots \hspace{0.1667em}$, all ${\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}}\in \Theta $ and all $i=1,\dots ,n$ the following inequalities hold:
(23)
\[ |{\Psi _{n}^{(k)}}({\boldsymbol{\gamma }_{1}})-{\Psi _{n}^{(k)}}({\boldsymbol{\gamma }_{2}})|\le {\zeta _{n}}\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}}),\]
(24)
\[ |{\Psi _{-in}^{(k)}}({\boldsymbol{\gamma }_{1}})-{\Psi _{-in}^{(k)}}({\boldsymbol{\gamma }_{2}})|\le {\zeta _{n}}\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}}).\]
Proof.
Let us start with (23). Observe that by Condition ${\boldsymbol{\Psi }_{h,\rho }}$ and Lemma 1,
\[ |{\Psi _{n}^{(k)}}({\boldsymbol{\gamma }_{1}})-{\Psi _{n}^{(k)}}({\boldsymbol{\gamma }_{2}})|\le {\sum \limits_{j=1}^{n}}|{a_{j;n}^{k}}|h({\boldsymbol{\xi }_{j;n}})\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}})\le \frac{{C_{a}}\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}})}{n}{\sum \limits_{j=1}^{n}}h({\boldsymbol{\xi }_{j;n}}).\]
By assumption (ii) of the lemma
\[ {A_{1}}=\underset{j,n}{\max }\operatorname{\mathsf{E}}h({\boldsymbol{\xi }_{j;n}})\le \underset{k=1,\dots ,M}{\max }\operatorname{\mathsf{E}}h({\boldsymbol{\xi }_{(k)}})<\infty \]
and ${A_{2}}={\max _{j,n}}\operatorname{\mathsf{E}}{(h({\boldsymbol{\xi }_{j;n}}))^{2}}<\infty $. So, for any $\lambda >{A_{1}}$,
\[ \operatorname{\mathsf{P}}\left\{\frac{1}{n}{\sum \limits_{j=1}^{n}}h({\xi _{j;n}})>\lambda \right\}\le \frac{\operatorname{\mathsf{Var}}\frac{1}{n}{\textstyle\textstyle\sum _{j=1}^{n}}h({\xi _{j;n}})}{{\left(\lambda -\operatorname{\mathsf{E}}\frac{1}{n}{\textstyle\textstyle\sum _{j=1}^{n}}h({\xi _{j;n}})\right)^{2}}}\le \frac{{A_{2}}}{n{(\lambda -{A_{1}})^{2}}}\to 0\]
as $n\to \infty $. So (23) holds with ${\zeta _{n}}=\frac{{C_{a}}}{n}{\textstyle\sum _{j=1}^{n}}h({\xi _{j;n}})$.
To show (24) observe that
\[ |{\Psi _{-in}^{(k)}}({\boldsymbol{\gamma }_{1}})-{\Psi _{-in}^{(k)}}({\boldsymbol{\gamma }_{2}})|\le {\sum \limits_{j=1}^{n}}|{a_{j;-in}^{k}}|h({\boldsymbol{\xi }_{j;n}})\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}})\le \frac{{C_{a}}\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}})}{n}{\sum \limits_{j=1}^{n}}h({\boldsymbol{\xi }_{j;n}})\]
and
\[ |{a_{j;-in}^{k}}|\le |{a_{j;n}^{k}}|+|{a_{j;n}^{k}}-{a_{j;-in}^{k}}|\le \frac{{C^{\prime }_{a}}}{n}\]
for some ${C^{\prime }_{a}}<\infty $ due to Lemma 1. The rest of the proof is the same as for (23).  □
Lemma 4.
Let the following assumptions hold.
  • (i) Θ is a compact in ${\mathbb{R}^{d}}$.
  • (ii) Condition ${\boldsymbol{\Psi }_{h,\rho }}$ holds for $\{{\psi _{j;n}},\hspace{2.5pt}j=1,\dots ,n,\hspace{2.5pt}n=1,2,\dots \hspace{0.1667em}\}$.
  • (iii) ρ is a continuous function on $\Theta \times \Theta $ and $\rho (\boldsymbol{\gamma },\boldsymbol{\gamma })=0$ for all $\boldsymbol{\gamma }\in \Theta $.
  • (iv) For all $k=1,\dots ,M$, $\operatorname{\mathsf{E}}{(h({\boldsymbol{\xi }_{(k)}}))^{2}}<\infty $, ${\max _{j,n}}\operatorname{\mathsf{E}}|{\psi _{j;n}}({\boldsymbol{\xi }_{(k)}},{\boldsymbol{\vartheta }^{(k)}}){|^{2}}<\infty $.
  • (v) $\det {\Gamma _{\infty }}\ne 0$.
Then
(25)
\[ \underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\Psi _{n}^{(k)}}(\boldsymbol{\gamma })-\operatorname{\mathsf{E}}{\Psi _{n}^{(k)}}(\boldsymbol{\gamma })|\stackrel{\text{P}}{\longrightarrow }0,\]
(26)
\[ \underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\Psi _{-in}^{(k)}}(\boldsymbol{\gamma })-\operatorname{\mathsf{E}}{\Psi _{-in}^{(k)}}(\boldsymbol{\gamma })|\stackrel{\text{P}}{\longrightarrow }0\]
as $n\to \infty $.
Proof.
Let ${\bar{\psi }_{j;n}}(\boldsymbol{\gamma })=\operatorname{\mathsf{E}}{\psi _{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })$,
\[ {\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })={\psi _{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })-{\bar{\psi }_{j;n}}(\boldsymbol{\gamma }).\]
Then ${\tilde{\psi }_{j;n}}$ satisfies Condition ${\boldsymbol{\Psi }_{\tilde{h},\rho }}$ with $\tilde{h}(\mathbf{x})=h(\mathbf{x})+{C_{\Psi }}$, where ${C_{\Psi }}$ is some constant, e.g.,
\[ {C_{\Psi }}=\underset{k=1,\dots ,M}{\max }\operatorname{\mathsf{E}}h({\boldsymbol{\xi }_{(k)}})<\infty .\]
By the assumption (iv),
(27)
\[ \operatorname{\mathsf{E}}{(\tilde{h}({\boldsymbol{\xi }_{(k)}}))^{2}}<\infty \]
and
(28)
\[ \underset{j,n}{\max }\operatorname{\mathsf{E}}|{\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{(k)}},{\boldsymbol{\vartheta }^{(k)}}){|^{2}}<\infty ,\text{for all}\hspace{2.5pt}k=1,\dots ,M.\]
To prove the lemma it is sufficient to show that
(29)
\[ \underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })|\stackrel{\text{P}}{\longrightarrow }0\]
and
(30)
\[ \underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\Psi }_{-in}^{(k)}}(\boldsymbol{\gamma })|\stackrel{\text{P}}{\longrightarrow }0\]
as $n\to \infty $.
Let us show (29). Consider the case when ${\psi _{j;n}}$ are scalar-valued. Then
(31)
\[ \operatorname{\mathsf{E}}{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })={\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}\operatorname{\mathsf{E}}{\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })=0\]
and
(32)
\[ \operatorname{\mathsf{Var}}{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })={\sum \limits_{j=1}^{n}}{({a_{j;n}^{k}})^{2}}\operatorname{\mathsf{Var}}{\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })\le \frac{{C_{a}^{2}}}{n}\underset{j,n}{\max }\operatorname{\mathsf{Var}}{\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma }).\]
Assumptions (iii) and (iv) with inequalities (27) and (28) imply
\[ \underset{j,n}{\max }\operatorname{\mathsf{Var}}{\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })<\infty .\]
So, by (31) and (32), we get
(33)
\[ {\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })\stackrel{\text{P}}{\longrightarrow }0\hspace{2.5pt}\text{as}\hspace{2.5pt}n\to \infty \]
for any $\boldsymbol{\gamma }\in \Theta $.
It is obvious that if ${\psi _{j;n}}$ are vector- or matrix-valued, then (33) holds coordinatewise.
Applying Lemma 3 to ${\tilde{\psi }_{j;n}}$ one obtains
(34)
\[ |{\tilde{\Psi }_{n}^{(k)}}({\boldsymbol{\gamma }_{1}})-{\tilde{\Psi }_{n}^{(k)}}({\boldsymbol{\gamma }_{2}})|\le {\zeta _{n}}\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}})\hspace{2.5pt}\text{for all}\hspace{2.5pt}{\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}}\in \Theta \]
with some ${\zeta _{n}}={O_{P}}(1)$. To prove (29) we have to show that for any $\delta >0$ and $\varepsilon >0$ there exists ${n_{0}}$ such that for all $n>{n_{0}}$
(35)
\[ \operatorname{\mathsf{P}}\{\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })|>\delta \}<\varepsilon .\]
Fix ε and δ. Choose a nonrandom ${C_{\zeta }}<\infty $ such that $\operatorname{\mathsf{P}}\{{\zeta _{n}}>{C_{\zeta }}\}\le \varepsilon /2$ for all n.
Since Θ is compact and ρ is continuous with $\rho (\boldsymbol{\gamma },\boldsymbol{\gamma })=0$ (assumptions (i) and (iii) of the lemma), there exists a finite set $T=\{{\mathbf{t}_{1}},\dots ,{\mathbf{t}_{L}}\}\subset \Theta $ such that for each $\boldsymbol{\gamma }\in \Theta $ one can choose $l(\boldsymbol{\gamma })\in \{1,\dots ,L\}$ with
(36)
\[ \rho (\boldsymbol{\gamma },{\mathbf{t}_{l(\boldsymbol{\gamma })}})<\frac{\delta }{2{C_{\zeta }}}.\]
Note that
\[ |{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })|\le |{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })-{\tilde{\Psi }_{n}^{(k)}}({\mathbf{t}_{l(\boldsymbol{\gamma })}})|+|{\tilde{\Psi }_{n}^{(k)}}({\mathbf{t}_{l(\boldsymbol{\gamma })}})|,\]
so
\[ \underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })|\le \underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })-{\tilde{\Psi }_{n}^{(k)}}({\mathbf{t}_{l(\boldsymbol{\gamma })}})|+\underset{l=1,\dots ,L}{\max }|{\tilde{\Psi }_{n}^{(k)}}({\mathbf{t}_{l}})|\]
\[ \le {\zeta _{n}}\underset{\boldsymbol{\gamma }\in \Theta }{\sup }\rho (\boldsymbol{\gamma },{\mathbf{t}_{l(\boldsymbol{\gamma })}})+\underset{l=1,\dots ,L}{\max }|{\tilde{\Psi }_{n}^{(k)}}({\mathbf{t}_{l}})|.\]
Therefore
(37)
\[ \operatorname{\mathsf{P}}\{\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })|>\delta \}\le \operatorname{\mathsf{P}}\{{\zeta _{n}}\underset{\boldsymbol{\gamma }\in \Theta }{\sup }\rho (\boldsymbol{\gamma },{\mathbf{t}_{l(\boldsymbol{\gamma })}})>\delta /2\}+\operatorname{\mathsf{P}}\{\underset{l=1,\dots ,L}{\max }|{\tilde{\Psi }_{n}^{(k)}}({\mathbf{t}_{l}})|>\delta /2\}.\]
The second term on the RHS of (37) tends to 0 as $n\to \infty $ due to (33), so it is less than $\varepsilon /2$ for n large enough. By (36) the first term is at most $\operatorname{\mathsf{P}}\{{\zeta _{n}}>{C_{\zeta }}\}\le \varepsilon /2$.
So (35) holds and (25) is shown.
Let us show (26). Observe that
\[ {\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })-{\tilde{\Psi }_{-in}^{(k)}}(\boldsymbol{\gamma })={a_{i;n}^{k}}{\tilde{\psi }_{i}}({\boldsymbol{\xi }_{i,n}},\boldsymbol{\gamma })+\sum \limits_{j\ne i}({a_{j;n}^{k}}-{a_{j;-in}^{k}}){\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma }).\]
To estimate ${\max _{j,n}}{\sup _{\boldsymbol{\gamma }\in \Theta }}|{\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })|$ we apply Lemma 2 with $H(x)=|x|$. Assumption (iv) of Lemma 4 implies that the assertion of Lemma 2 holds with $\alpha =2$. So
\[ \underset{j,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\psi }_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })|={O_{P}}({n^{1/2}}).\]
From Lemma 1 we get
\[ |{a_{i;n}^{k}}|\le {C_{a}}/n,\hspace{2.5pt}|{a_{j;n}^{k}}-{a_{j;-in}^{k}}|\le {C_{a}}/{n^{2}},\]
so
\[ \underset{i=1,\dots ,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\tilde{\Psi }_{n}^{(k)}}(\boldsymbol{\gamma })-{\tilde{\Psi }_{-in}^{(k)}}(\boldsymbol{\gamma })|={O_{P}}({n^{-1/2}}).\]
This with (25) implies (26).  □
Proof of Theorem 1.
1. We will show that ${\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}\stackrel{\text{P}}{\longrightarrow }{\boldsymbol{\vartheta }^{(k)}}$ as $n\to \infty $.
Let ${\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })=\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(k)}},\boldsymbol{\gamma })$. Assumptions (C2)–(C4) imply that ${\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })$ is continuous in $\boldsymbol{\gamma }$ on Θ. By (C5), $|{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })|>0$ for all $\boldsymbol{\gamma }\ne {\boldsymbol{\vartheta }^{(k)}}$.
Fix any $\varepsilon >0$ and consider ${\mathcal{N}_{\varepsilon }}=\{\boldsymbol{\gamma }\in \Theta :\hspace{2.5pt}|\boldsymbol{\gamma }-{\boldsymbol{\vartheta }^{(k)}}|\ge \varepsilon \}$. Then
\[ {s_{\text{min}}}=\underset{\boldsymbol{\gamma }\in {\mathcal{N}_{\varepsilon }}}{\inf }|{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })|>0.\]
For $\boldsymbol{\gamma }\in {\mathcal{N}_{\varepsilon }}$
\[ |{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })|\ge |{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })|-|{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })-{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })|\]
\[ \ge {s_{\text{min}}}-\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })-{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })|.\]
So
\[ \operatorname{\mathsf{P}}\left\{\underset{\boldsymbol{\gamma }\in {\mathcal{N}_{\varepsilon }}}{\inf }|{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })|\le \frac{{s_{\text{min}}}}{2}\right\}\le \operatorname{\mathsf{P}}\left\{\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })-{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })|\ge \frac{{s_{\text{min}}}}{2}\right\}.\]
Applying Lemma 4 with ${\psi _{j;n}}=\mathbf{s}$ one obtains that
(38)
\[ \underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })-{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })|\stackrel{\text{P}}{\longrightarrow }0\]
as $n\to \infty $. Therefore
\[ \operatorname{\mathsf{P}}\left\{\underset{\boldsymbol{\gamma }\in {\mathcal{N}_{\varepsilon }}}{\inf }|{\mathbf{S}_{n}^{(k)}}(\boldsymbol{\gamma })|\le \frac{{s_{\text{min}}}}{2}\right\}\to 0.\]
Since ${\mathbf{S}_{n}^{(k)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(k)}})=0$, this implies $\operatorname{\mathsf{P}}\{{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}\in {\mathcal{N}_{\varepsilon }}\}\to 0$ as $n\to \infty $, i.e. ${\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}\stackrel{\text{P}}{\longrightarrow }{\boldsymbol{\vartheta }^{(k)}}$.
2. Let us show (15).
By Lemma 4,
(39)
\[ \underset{i=1,\dots ,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\mathbf{S}_{-in}^{(k)}}(\boldsymbol{\gamma })-{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })|\stackrel{\text{P}}{\longrightarrow }0.\]
Then
\[ \operatorname{\mathsf{P}}\left\{\underset{i=1,\dots ,n}{\min }\underset{\boldsymbol{\gamma }\in {\mathcal{N}_{\varepsilon }}}{\inf }|{\mathbf{S}_{-in}^{(k)}}(\boldsymbol{\gamma })|\le \frac{{s_{\text{min}}}}{2}\right\}\]
\[ \le \operatorname{\mathsf{P}}\left\{\underset{i=1,\dots ,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\mathbf{S}_{\infty }^{(k)}}(\boldsymbol{\gamma })-{\mathbf{S}_{-in}^{(k)}}(\boldsymbol{\gamma })|\ge \frac{{s_{\text{min}}}}{2}\right\}\to 0.\]
From this we obtain (15) in the same way as in the first part of the proof.  □
Proof of Theorem 2.
The proof follows the lines of the proof of Theorem A.10 in [6] with the use of (38) and (39) instead of the law of large numbers.  □
Proof of Theorem 4.
Let
\[ {\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })=\mathbf{s}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })-\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma }).\]
Then, due to (6),
\[ {\sum \limits_{j=1}^{n}}{a_{j;n}^{l}}{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })={\sum \limits_{j=1}^{n}}{a_{j;n}^{l}}\mathbf{s}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })-{\sum \limits_{m=1}^{M}}{\sum \limits_{j=1}^{n}}{a_{j;n}^{l}}{p_{j;n}^{m}}\operatorname{\mathsf{E}}\mathbf{s}({\boldsymbol{\xi }_{(m)}},\boldsymbol{\gamma })\]
\[ ={\sum \limits_{j=1}^{n}}{a_{j;n}^{l}}\mathbf{s}({\boldsymbol{\xi }_{j;n}},\boldsymbol{\gamma })={\mathbf{S}_{n}^{(l)}}(\boldsymbol{\gamma }).\]
So
(40)
\[ {\mathbf{S}_{n}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})={\sum \limits_{j=1}^{n}}{a_{j;n}^{l}}{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})=0.\]
Similarly,
(41)
\[ {\mathbf{S}_{-in}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}})=\sum \limits_{j\ne i}{a_{j;-in}^{l}}{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}})=0.\]
By the Mean Value theorem, there exists ${t_{ni}^{l}}\in [0,1]$ such that
(42)
\[ -{\mathbf{S}_{-in}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})={\mathbf{S}_{-in}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}})-{\mathbf{S}_{-in}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})={\dot{\mathbf{S}}_{-in}^{(l)}}({\boldsymbol{\zeta }_{-in}^{(l)}})({\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}}-{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}),\]
where
\[ {\boldsymbol{\zeta }_{-in}^{(l)}}=(1-{t_{ni}^{l}}){\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}}+{t_{ni}^{l}}{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}.\]
Observe that, by Assumption (JK1),
\[ |{\dot{\mathbf{s}}_{j;n}}(\mathbf{x},{\boldsymbol{\gamma }_{1}})-{\dot{\mathbf{s}}_{j;n}}(\mathbf{x},{\boldsymbol{\gamma }_{2}})|\le 2h(\mathbf{x})|{\boldsymbol{\gamma }_{1}}-{\boldsymbol{\gamma }_{2}}|.\]
Applying Lemma 4 with ${\psi _{j;n}}={\dot{\mathbf{s}}_{j;n}}$ and $\rho ({\boldsymbol{\gamma }_{1}},{\boldsymbol{\gamma }_{2}})=|{\boldsymbol{\gamma }_{1}}-{\boldsymbol{\gamma }_{2}}|$, we obtain
(43)
\[ \underset{i=1,\dots ,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\dot{\mathbf{S}}_{-in}^{(l)}}(\boldsymbol{\gamma })-{\mathbf{M}^{(l)}}(\boldsymbol{\gamma })|\stackrel{\text{P}}{\longrightarrow }0\]
as $n\to \infty $.
By Assumption (JK1), applying the Lebesgue Dominated Convergence theorem, we obtain ${\mathbf{M}^{(l)}}(\boldsymbol{\gamma })\to {\mathbf{M}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})$ as $\boldsymbol{\gamma }\to {\boldsymbol{\vartheta }^{(l)}}$. So, by Assumptions (JK2) and (JK3), we obtain
\[ \underset{i=1,\dots ,n}{\max }|{\dot{\mathbf{S}}_{-in}^{(l)}}({\boldsymbol{\zeta }_{-in}^{(l)}})-{\mathbf{M}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})|\stackrel{\text{P}}{\longrightarrow }0.\]
By (AN5) ${\mathbf{M}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})={\mathbf{M}^{(l)}}$ is nonsingular, so
\[ \operatorname{\mathsf{P}}\{\det {\dot{\mathbf{S}}_{-in}^{(l)}}({\boldsymbol{\zeta }_{-in}^{(l)}})\ne 0,\forall i=1,\dots ,n\}\to 1\]
and
(44)
\[ {\Lambda _{n}^{l}}=\underset{i=1,\dots ,n}{\max }|{({\dot{\mathbf{S}}_{-in}^{(l)}}({\boldsymbol{\zeta }_{-in}^{(l)}}))^{-1}}|={O_{p}}(1).\]
So, with probability which tends to 1 as $n\to \infty $,
(45)
\[ \begin{aligned}{}|{\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}}-{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}|& =|{({\dot{\mathbf{S}}_{-in}^{(l)}}({\boldsymbol{\zeta }_{-in}^{(l)}}))^{-1}}(-{\mathbf{S}_{-in}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{n}}))|\le {\Lambda _{n}^{l}}|{\mathbf{S}_{-in}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})|\\ {} & ={\Lambda _{n}^{l}}|{\mathbf{S}_{-in}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})+{\dot{\mathbf{S}}_{-in}^{(l)}}({\tilde{\boldsymbol{\zeta }}_{-in}^{(l)}})({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}-{\boldsymbol{\vartheta }^{(l)}})|,\end{aligned}\]
where ${\tilde{\boldsymbol{\zeta }}_{-in}^{(l)}}$ are some intermediate points between ${\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}$ and ${\boldsymbol{\vartheta }^{(l)}}$. From (43) and Assumption (JK2) we obtain
(46)
\[ {\dot{\mathbf{S}}_{-in}^{(l)}}({\tilde{\boldsymbol{\zeta }}_{ni}^{(l)}})({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}-{\boldsymbol{\vartheta }^{(l)}})={O_{P}}({n^{-1/2}}).\]
Then
\[ {\mathbf{S}_{-in}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})={\mathbf{S}_{n}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})-{a_{i;n}^{l}}{\mathbf{s}_{i;n}}({\boldsymbol{\xi }_{i;n}},{\boldsymbol{\vartheta }^{(l)}})-\sum \limits_{j\ne i}({a_{j;n}^{l}}-{a_{j;-in}^{l}}){\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(l)}}).\]
Observe that $\operatorname{\mathsf{E}}{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(l)}})=0$ and
\[ \operatorname{\mathsf{E}}|{\mathbf{S}_{n}^{(l)}}({\boldsymbol{\vartheta }^{(l)}}){|^{2}}={\sum \limits_{j=1}^{n}}{({a_{j;n}^{(l)}})^{2}}\operatorname{\mathsf{E}}|{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(l)}}){|^{2}}=O({n^{-1}})\]
due to Lemma 1. So ${\mathbf{S}_{n}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})={O_{P}}({n^{-1/2}})$.
By Lemmas 1, 2 and Assumption (JK1),
(47)
\[ \underset{i=1,\dots ,n}{\max }|{a_{i;n}^{l}}{\mathbf{s}_{i;n}}({\boldsymbol{\xi }_{i;n}},{\boldsymbol{\vartheta }^{(l)}})+\sum \limits_{j\ne i}({a_{j;n}^{l}}-{a_{j;-in}^{l}}){\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(l)}})|={O_{P}}({n^{\beta -1}})\]
for any $\beta \ge 1/\alpha $. With $\beta =1/2$ we get
\[ \underset{i=1,\dots ,n}{\max }|{\mathbf{S}_{-in}^{(l)}}({\boldsymbol{\vartheta }^{(l)}})|={O_{P}}({n^{-1/2}}).\]
This with (44)–(46) yields
(48)
\[ \underset{i=1,\dots ,n}{\max }|{\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}}-{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}|={O_{P}}({n^{-1/2}}).\]
Let ${\mathbf{M}_{-in}^{(l)}}={\dot{\mathbf{S}}_{-in}^{(l)}}({\boldsymbol{\zeta }_{-in}^{(l)}})$. By (40)–(42), we obtain
(49)
\[ \begin{aligned}{}{\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}}-{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}& ={({\mathbf{M}_{-in}^{(l)}})^{-1}}({\mathbf{S}_{n}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})-{\mathbf{S}_{-in}^{(l)}}({\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}))\\ {} & ={({\mathbf{M}_{-in}^{(l)}})^{-1}}\left({a_{i;n}^{l}}{\mathbf{s}_{i;n}}({\boldsymbol{\xi }_{i;n}},{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})+\sum \limits_{j\ne i}({a_{j;n}^{l}}-{a_{j;-in}^{l}}){\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})\right).\end{aligned}\]
Put
(50)
\[ {\mathbf{U}_{i}^{(l)}}={a_{i;n}^{l}}{\mathbf{s}_{i;n}}({\boldsymbol{\xi }_{i;n}},{\boldsymbol{\vartheta }^{(l)}})+\sum \limits_{j\ne i}({a_{j;n}^{l}}-{a_{j;-in}^{l}}){\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(l)}}),\]
(51)
\[ \begin{aligned}{}{\Delta _{i}^{(l)U}}=& {a_{i;n}^{l}}({\mathbf{s}_{i;n}}({\boldsymbol{\xi }_{i;n}},{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}})-{\mathbf{s}_{i;n}}({\boldsymbol{\xi }_{i;n}},{\boldsymbol{\vartheta }^{(l)}}))\\ {} & +\sum \limits_{j\ne i}({a_{j;n}^{l}}-{a_{j;-in}^{l}})({\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\hat{\boldsymbol{\vartheta }}_{n}})-{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(l)}})),\end{aligned}\]
(52)
\[ {\Delta _{i}^{(l)M}}={({\mathbf{M}_{-in}^{(l)}})^{-1}}-{({\mathbf{M}^{(l)}})^{-1}}.\]
Then, by (49),
\[ {\hat{\boldsymbol{\vartheta }}_{-in}^{(l)}}-{\hat{\boldsymbol{\vartheta }}_{n}^{(l)}}=({({\mathbf{M}^{(l)}})^{-1}}+{\Delta _{i}^{(l)M}})({\mathbf{U}_{i}^{(l)}}+{\Delta _{i}^{(l)U}}).\]
So
(53)
\[ {\hat{\mathbf{V}}_{n}^{(k,m)}}=n{\sum \limits_{i=1}^{n}}({\hat{\boldsymbol{\vartheta }}_{-in}^{(k)}}-{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}){({\hat{\boldsymbol{\vartheta }}_{-in}^{(m)}}-{\hat{\boldsymbol{\vartheta }}_{n}^{(m)}})^{T}}=n{\sum \limits_{i=1}^{n}}{\mathbf{v}_{i}},\]
where
(54)
\[ {\mathbf{v}_{i}}=({({\mathbf{M}^{(k)}})^{-1}}+{\Delta _{i}^{(k)M}})({\mathbf{U}_{i}^{(k)}}+{\Delta _{i}^{(k)U}}){({\mathbf{U}_{i}^{(m)}}+{\Delta _{i}^{(m)U}})^{T}}{({({\mathbf{M}^{(m)}})^{-1}}+{\Delta _{i}^{(m)M}})^{T}}.\]
Consider
(55)
\[ {\tilde{\mathbf{V}}_{n}^{(k,m)}}=n{\sum \limits_{i=1}^{n}}{\tilde{\mathbf{v}}_{i}},\hspace{2.5pt}{\tilde{\mathbf{v}}_{i}}={({\mathbf{M}^{(k)}})^{-1}}{\mathbf{U}_{i}^{(k)}}{({\mathbf{U}_{i}^{(m)}})^{T}}{({\mathbf{M}^{(m)}})^{-T}}.\]
We will show that
(56)
\[ |{\hat{\mathbf{V}}_{n}^{(k,m)}}-{\tilde{\mathbf{V}}_{n}^{(k,m)}}|\stackrel{\text{P}}{\longrightarrow }0,\hspace{2.5pt}\text{as}\hspace{2.5pt}n\to \infty ,\]
and
(57)
\[ {\tilde{\mathbf{V}}_{n}^{(k,m)}}\stackrel{\text{P}}{\longrightarrow }{\mathbf{V}^{(k,m)}},\hspace{2.5pt}\text{as}\hspace{2.5pt}n\to \infty .\]
Convergences (56) and (57) for all $k,m=1,\dots ,M$ imply the statement of the theorem.
To show (56) consider the following expansion
\[ {\mathbf{v}_{i}}-{\tilde{\mathbf{v}}_{i}}={\mathbf{v}_{i}^{1}}+{\mathbf{v}_{i}^{2}}+{\mathbf{v}_{i}^{3}}+{\mathbf{v}_{i}^{4}},\]
where
(58)
\[ \begin{aligned}{}{\mathbf{v}_{i}^{1}}& ={\Delta _{i}^{(k)M}}({\mathbf{U}_{i}^{(k)}}+{\Delta _{i}^{(k)U}}){({\mathbf{U}_{i}^{(m)}}+{\Delta _{i}^{(m)U}})^{T}}{({({\mathbf{M}^{(m)}})^{-1}}+{\Delta _{i}^{(m)M}})^{T}},\\ {} {\mathbf{v}_{i}^{2}}& ={({\mathbf{M}^{(k)}})^{-1}}{\Delta _{i}^{(k)U}}{({\mathbf{U}_{i}^{(m)}}+{\Delta _{i}^{(m)U}})^{T}}{({({\mathbf{M}^{(m)}})^{-1}}+{\Delta _{i}^{(m)M}})^{T}},\\ {} {\mathbf{v}_{i}^{3}}& ={({\mathbf{M}^{(k)}})^{-1}}{\mathbf{U}_{i}^{(k)}}{({\Delta _{i}^{(m)U}})^{T}}{({({\mathbf{M}^{(m)}})^{-1}}+{\Delta _{i}^{(m)M}})^{T}},\\ {} {\mathbf{v}_{i}^{4}}& ={({\mathbf{M}^{(k)}})^{-1}}{\mathbf{U}_{i}^{(k)}}{({\mathbf{U}_{i}^{(m)}})^{T}}{({\Delta _{i}^{(m)M}})^{T}}.\end{aligned}\]
Let us estimate each ${\mathbf{v}_{i}^{l}}$ separately (in ${\mathbf{v}_{i}^{l}}$ the $l$-th factor of ${\mathbf{v}_{i}}$ is replaced by its perturbation and the preceding factors by their limits, so the four terms telescope to ${\mathbf{v}_{i}}-{\tilde{\mathbf{v}}_{i}}$). First we bound ${\Delta _{i}^{(k)M}}$.
Applying Lemma 4 to ${\psi _{j;n}}(\mathbf{x},\boldsymbol{\gamma })=\frac{\partial }{\partial {\gamma ^{l}}}{\dot{\mathbf{s}}_{j;n}}(\mathbf{x},\boldsymbol{\gamma })$, $l=1,\dots ,d$, in the same way as for ${\mathbf{S}_{-in}^{(k)}}$, one obtains
(59)
\[ \underset{i=1,\dots ,n}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }\left|\frac{\partial }{\partial {\gamma ^{l}}}{\dot{\mathbf{S}}_{-in}^{(k)}}(\boldsymbol{\gamma })\right|={O_{P}}(1).\]
Then, by the Mean Value theorem and Assumption (JK2), we get
(60)
\[ \underset{i=1,\dots ,n}{\max }|{\dot{\mathbf{S}}_{-in}^{(k)}}({\boldsymbol{\zeta }_{-in}^{(k)}})-{\dot{\mathbf{S}}_{-in}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})|={O_{P}}(1){O_{P}}({n^{-1/2}})={O_{P}}({n^{-1/2}}).\]
Applying Lemmas 1 and 2 in the same way as in (47) to ${\dot{\mathbf{s}}_{j;n}^{(l)}}$, we obtain
(61)
\[ \underset{i=1,\dots ,n}{\max }|{\dot{\mathbf{S}}_{-in}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})-{\dot{\mathbf{S}}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})|={O_{P}}({n^{\beta -1}}).\]
The variance of each entry of ${\dot{\mathbf{S}}_{n}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})$ is $O({n^{-1}})$, so
(62)
\[ |{\dot{\mathbf{S}}_{n}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})-{\mathbf{M}^{(k)}}|={O_{P}}({n^{-1/2}}).\]
Since ${\mathbf{M}_{-in}^{(k)}}={\dot{\mathbf{S}}_{-in}^{(k)}}({\boldsymbol{\zeta }_{-in}^{(k)}})$, formulas (60)–(62) yield
\[ \underset{i=1,\dots ,n}{\max }|{\mathbf{M}_{-in}^{(k)}}-{\mathbf{M}^{(k)}}|={O_{P}}({n^{-1/2}}).\]
So, due to Assumption (AN5),
(63)
\[ \underset{i=1,\dots ,n}{\max }|{\Delta _{i}^{(k)M}}|={O_{P}}({n^{-1/2}}).\]
By Lemmas 1 and 2 and Assumption (JK2),
(64)
\[ \underset{i=1,\dots ,n}{\max }|{\mathbf{U}_{i}^{(k)}}|={O_{P}}({n^{\beta -1}}).\]
Next we bound ${\Delta _{i}^{(k)U}}$. By the mean value theorem,
(65)
\[ \begin{aligned}{}\underset{i=1,\dots ,n}{\max }|{\Delta _{i}^{(k)U}}|\le & \underset{i=1,\dots ,n}{\max }\Big(|{a_{i;n}^{k}}|\cdot |{\dot{\mathbf{s}}_{i;n}}({\boldsymbol{\zeta }_{i}})|\cdot |{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}-{\boldsymbol{\vartheta }^{(k)}}|\\ {} & +\sum \limits_{j\ne i}|{a_{j;n}^{k}}-{a_{j;-in}^{k}}|\cdot |{\dot{\mathbf{s}}_{j;n}}({\boldsymbol{\zeta }_{j}})|\cdot |{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}-{\boldsymbol{\vartheta }^{(k)}}|\Big)\end{aligned}\]
(here ${\boldsymbol{\zeta }_{i}}$, ${\boldsymbol{\zeta }_{j}}$ are some intermediate points between ${\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}$ and ${\boldsymbol{\vartheta }^{(k)}}$).
By Lemma 2, ${\max _{j=1,\dots ,n}}{\sup _{\boldsymbol{\gamma }\in \Theta }}|{\dot{\mathbf{s}}_{j;n}}(\boldsymbol{\gamma })|={O_{P}}({n^{\beta }})$. So (65), Lemma 1 and (JK2) imply
(66)
\[ \underset{i=1,\dots ,n}{\max }|{\Delta _{i}^{(k)U}}|={O_{P}}({n^{\beta -3/2}}).\]
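In more detail (the uniform weight bounds $\max _{i}|{a_{i;n}^{k}}|=O({n^{-1}})$ and $\max _{i}{\textstyle\sum _{j\ne i}}|{a_{j;n}^{k}}-{a_{j;-in}^{k}}|=O({n^{-1}})$ are the reading of Lemma 1 used throughout this proof and are taken here in exactly this form),
\[ \begin{aligned}{}\underset{i=1,\dots ,n}{\max }|{\Delta _{i}^{(k)U}}|&\le \Big(\underset{i}{\max }|{a_{i;n}^{k}}|+\underset{i}{\max }\sum \limits_{j\ne i}|{a_{j;n}^{k}}-{a_{j;-in}^{k}}|\Big)\underset{j}{\max }\underset{\boldsymbol{\gamma }\in \Theta }{\sup }|{\dot{\mathbf{s}}_{j;n}}(\boldsymbol{\gamma })|\cdot |{\hat{\boldsymbol{\vartheta }}_{n}^{(k)}}-{\boldsymbol{\vartheta }^{(k)}}|\\ {} &=O({n^{-1}}){O_{P}}({n^{\beta }}){O_{P}}({n^{-1/2}})={O_{P}}({n^{\beta -3/2}}).\end{aligned}\]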
Now we bound ${\mathbf{v}_{i}^{l}}$ defined by (58).
By (63), (64) and (66),
\[\begin{aligned}{}\underset{i=1,\dots ,n}{\max }|{\mathbf{v}_{i}^{1}}|& \le \underset{i=1,\dots ,n}{\max }|{\Delta _{i}^{(k)M}}|\cdot (|{\mathbf{U}_{i}^{(k)}}|+|{\Delta _{i}^{(k)U}}|)\\ {} & \cdot (|{\mathbf{U}_{i}^{(m)}}|+|{\Delta _{i}^{(m)U}}|)(|{({\mathbf{M}^{(m)}})^{-1}}|+|{\Delta _{i}^{(m)M}}|)\\ {} & ={O_{P}}({n^{-1/2}}){O_{P}}{({n^{\beta -1}})^{2}}{O_{P}}(1)={O_{P}}({n^{2\beta -5/2}}).\end{aligned}\]
Similarly,
\[ \underset{i=1,\dots ,n}{\max }|{\mathbf{v}_{i}^{4}}|={O_{P}}({n^{2\beta -5/2}}).\]
For ${\mathbf{v}_{i}^{2}}$ (and, similarly, ${\mathbf{v}_{i}^{3}}$), we have
\[\begin{aligned}{}\underset{i=1,\dots ,n}{\max }|{\mathbf{v}_{i}^{2}}|& \le \underset{i=1,\dots ,n}{\max }|{({\mathbf{M}^{(k)}})^{-1}}|\cdot |{\Delta _{i}^{(k)U}}|\\ {} & \cdot (|{\mathbf{U}_{i}^{(m)}}|+|{\Delta _{i}^{(m)U}}|)(|{({\mathbf{M}^{(m)}})^{-1}}|+|{\Delta _{i}^{(m)M}}|)\\ {} & ={O_{P}}({n^{\beta -3/2}}){O_{P}}({n^{\beta -1}}){O_{P}}(1)={O_{P}}({n^{2\beta -5/2}}).\end{aligned}\]
Therefore
\[ |{\hat{\mathbf{V}}_{n}^{(k,m)}}-{\tilde{\mathbf{V}}_{n}^{(k,m)}}|\le n{\sum \limits_{i=1}^{n}}|{\mathbf{v}_{i}}-{\tilde{\mathbf{v}}_{i}}|\le {n^{2}}\underset{i=1,\dots ,n}{\max }{\sum \limits_{l=1}^{4}}|{\mathbf{v}_{i}^{l}}|={O_{P}}({n^{2\beta -1/2}})={o_{P}}(1)\]
for $1/\alpha \le \beta <1/4$. (Recall that we can take any $\beta \ge 1/\alpha $ and $\alpha >4$).
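The exponent bookkeeping here is elementary:
\[ {n^{2}}{O_{P}}({n^{2\beta -5/2}})={O_{P}}({n^{2\beta -1/2}}),\hspace{1em}\text{and}\hspace{1em}2\beta -1/2<0\hspace{2.5pt}\text{if and only if}\hspace{2.5pt}\beta <1/4;\]
since $\alpha >4$ gives $1/\alpha <1/4$, a suitable $\beta \in [1/\alpha ,1/4)$ exists.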
So (56) holds. To complete the proof it remains to verify (57).
Consider
\[ {\hat{\mathbf{Z}}_{n}^{(k,m)}}=n{\sum \limits_{i=1}^{n}}{\mathbf{U}_{i}^{(k)}}{({\mathbf{U}_{i}^{(m)}})^{T}}.\]
Then $\operatorname{\mathsf{E}}{\hat{\mathbf{Z}}_{n}^{(k,m)}}={\bar{\mathbf{Z}}_{1,n}}+{\bar{\mathbf{Z}}_{2,n}}$, where
\[ {\bar{\mathbf{Z}}_{1,n}}=n{\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{a_{j;n}^{m}}\operatorname{\mathsf{E}}{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(k)}}){({\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(m)}}))^{T}},\]
\[ {\bar{\mathbf{Z}}_{2,n}}=n{\sum \limits_{i=1}^{n}}\sum \limits_{j\ne i}({a_{j;n}^{k}}-{a_{j;-in}^{k}})({a_{j;n}^{m}}-{a_{j;-in}^{m}})\operatorname{\mathsf{E}}{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(k)}}){({\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(m)}}))^{T}}.\]
By Lemma 1,
\[ \underset{i=1,\dots ,n}{\max }\Big|\sum \limits_{j\ne i}({a_{j;n}^{k}}-{a_{j;-in}^{k}})({a_{j;n}^{m}}-{a_{j;-in}^{m}})\operatorname{\mathsf{E}}{\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(k)}}){({\mathbf{s}_{j;n}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(m)}}))^{T}}\Big|=O({n^{-3}}),\]
so ${\bar{\mathbf{Z}}_{2,n}}={n^{2}}O({n^{-3}})=O({n^{-1}})$. This implies
(67)
\[ \operatorname{\mathsf{E}}{\hat{\mathbf{Z}}_{n}^{(k,m)}}\sim {\bar{\mathbf{Z}}_{1,n}}\to {\mathbf{Z}^{(k,m)}}.\]
Let us bound
(68)
\[ \operatorname{\mathsf{E}}|{\hat{\mathbf{Z}}_{n}^{(k,m)}}-\operatorname{\mathsf{E}}{\hat{\mathbf{Z}}_{n}^{(k,m)}}{|^{2}}\le {\sum \limits_{p,q=1}^{d}}\operatorname{\mathsf{Var}}({\hat{Z}_{n}^{pq(k,m)}})\]
(here and below ${\hat{Z}_{n}^{pq(k,m)}}$ denotes the $(p,q)$ entry of the matrix ${\hat{\mathbf{Z}}_{n}^{(k,m)}}$ and ${U_{i}^{p(k)}}$ the $p$-th entry of the vector ${\mathbf{U}_{i}^{(k)}}$).
Consider
(69)
\[ \begin{aligned}{}\operatorname{\mathsf{Var}}({\hat{Z}_{n}^{pq(k,m)}})& ={n^{2}}{\sum \limits_{i=1}^{n}}\operatorname{\mathsf{Var}}({U_{i}^{p(k)}}{U_{i}^{q(m)}})\le {n^{3}}\underset{i,p,q}{\max }\operatorname{\mathsf{E}}{({U_{i}^{p(k)}}{U_{i}^{q(m)}})^{2}}\\ {} & \le {n^{3}}\sqrt{\underset{i,p,q}{\max }\operatorname{\mathsf{E}}{({U_{i}^{p(k)}})^{4}}\operatorname{\mathsf{E}}{({U_{i}^{q(m)}})^{4}}}\le {n^{3}}\underset{i,p,l}{\max }\operatorname{\mathsf{E}}{({U_{i}^{p(l)}})^{4}}.\end{aligned}\]
(We applied the Cauchy–Schwarz inequality here.)
Let ${\eta _{j}}={s_{j;n}^{p}}({\boldsymbol{\xi }_{j;n}},{\boldsymbol{\vartheta }^{(k)}})$ and ${b_{ij}}={a_{j;n}^{k}}-{a_{j;-in}^{k}}$. Then $\operatorname{\mathsf{E}}{\eta _{j}}=0$ for all $j$, so
(70)
\[ \operatorname{\mathsf{E}}{({U_{i}^{p(k)}})^{4}}=\operatorname{\mathsf{E}}{({a_{i;n}^{k}}{\eta _{i}}+\sum \limits_{j\ne i}{b_{ij}}{\eta _{j}})^{4}}={J_{1;n}}+{J_{2;n}}+{J_{3;n}},\]
where
\[ {J_{1;n}}={({a_{i;n}^{k}})^{4}}\operatorname{\mathsf{E}}{({\eta _{i}})^{4}},\hspace{2.5pt}{J_{2;n}}=6{({a_{i;n}^{k}})^{2}}\operatorname{\mathsf{E}}{({\eta _{i}})^{2}}\operatorname{\mathsf{E}}{(\sum \limits_{j\ne i}{b_{ij}}{\eta _{j}})^{2}},\]
\[ {J_{3;n}}=\operatorname{\mathsf{E}}{(\sum \limits_{j\ne i}{b_{ij}}{\eta _{j}})^{4}}.\]
By Assumption (JK1) and Lemma 1, ${J_{1;n}}=O({n^{-4}})$,
\[ {J_{2;n}}\le \frac{C}{{n^{2}}}\sum \limits_{j\ne i}{({b_{ij}})^{2}}\operatorname{\mathsf{E}}{({\eta _{j}})^{2}}=O({n^{-5}}),\]
\[ \begin{aligned}{}{J_{3;n}}&\le \sum \limits_{j\ne i}{({b_{ij}})^{4}}\operatorname{\mathsf{E}}{({\eta _{j}})^{4}}+6\sum \limits_{{j_{1}},{j_{2}}\ne i}{({b_{i{j_{1}}}})^{2}}{({b_{i{j_{2}}}})^{2}}\operatorname{\mathsf{E}}{({\eta _{{j_{1}}}})^{2}}\operatorname{\mathsf{E}}{({\eta _{{j_{2}}}})^{2}}\\ {} &=O({n^{-7}})+{n^{2}}O({n^{-8}})=O({n^{-6}}).\end{aligned}\]
So, from (70) we obtain
\[ \underset{i,p,l}{\max }\operatorname{\mathsf{E}}{({U_{i}^{p(l)}})^{4}}=O({n^{-4}})\]
and (69), (68) yield
\[ \operatorname{\mathsf{E}}|{\hat{\mathbf{Z}}_{n}^{(k,m)}}-\operatorname{\mathsf{E}}{\hat{\mathbf{Z}}_{n}^{(k,m)}}{|^{2}}=O({n^{-4}}){n^{3}}=O({n^{-1}}).\]
Since ${\tilde{\mathbf{V}}_{n}^{(k,m)}}={({\mathbf{M}^{(k)}})^{-1}}{\hat{\mathbf{Z}}_{n}^{(k,m)}}{({\mathbf{M}^{(m)}})^{-T}}$ by (55), this together with (67) implies (57).  □
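Theorem 4 is what licenses plugging ${\hat{\mathbf{V}}_{n}}$ into asymptotic confidence sets such as the ellipsoids shown in Fig. 2. As an illustration only, here is a minimal sketch of the standard construction based on the normal limit of Theorem 3 and a chi-square quantile; whether it matches the exact construction used in Section 5 is an assumption, and all names in the code are illustrative.

import numpy as np
from scipy.stats import chi2

def in_confidence_ellipsoid(theta, theta_hat_k, V_hat_kk, n, level=0.95):
    """Check whether a candidate parameter vector lies in the asymptotic
    confidence ellipsoid: accept theta when
        n * (theta_hat - theta)' V_hat^{-1} (theta_hat - theta)
    does not exceed the chi-square quantile with d degrees of freedom.

    theta_hat_k : (d,) GEE estimate for the k-th mixture component,
    V_hat_kk    : (d, d) jackknife estimate of its asymptotic covariance block.
    """
    diff = np.asarray(theta_hat_k, dtype=float) - np.asarray(theta, dtype=float)
    stat = n * diff @ np.linalg.solve(V_hat_kk, diff)
    return stat <= chi2.ppf(level, df=diff.size)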

Footnotes

1 Original EIT scores range from 100 to 200. In this paper we rescale them to $[0,1]$.

References

[1] 
Efron, B., Stein, C.: The jackknife estimate of variance. Ann. Stat. 9, 586–596 (1981). MR0615434
[2] 
Grün, B., Leisch, F.: Fitting finite mixtures of linear regression models with varying & fixed effects in R. In: Rizzi, A., Vichi, M. (eds.) Compstat 2006 – Proceedings in Computational Statistics, pp. 853–860. Physica Verlag, Heidelberg, Germany (2006). MR2173118
[3] 
Maiboroda, R., Sugakova, O.: Statistics of mixtures with varying concentrations with application to DNA microarray data analysis. J. Nonparametr. Stat. 24(1), 201–205 (2012). MR2885834. https://doi.org/10.1080/10485252.2011.630076
[4] 
Maiboroda, R., Sugakova, O., Doronin, A.: Generalized estimating equations for mixtures with varying concentrations. Can. J. Stat. 41(2), 217–236 (2013). MR3061876. https://doi.org/10.1002/cjs.11170
[5] 
Maiboroda, R., Sugakova, O.: Jackknife covariance matrix estimation for observations from mixture. Mod. Stoch. Theory Appl. 6(4), 495–513 (2019). MR4047396. https://doi.org/10.15559/19-vmsta145
[6] 
Masiuk, S., Kukush, A., Shklyar, S., Chepurny, M., Likhtarov, I.: Radiation Risk Estimation: Based on Measurement Error Models. De Gruyter, Berlin, (2017). MR3726857
[7] 
Miroshnichenko, V., Maiboroda, R.: Asymptotic normality of modified LS estimator for mixture of nonlinear regressions. Mod. Stoch. Theory Appl. 7(4), 435–448 (2020). MR4195645
[8] 
Miroshnychenko, V.O.: Generalized least squares estimates for mixture of nonlinear regressions. Bulletin of Taras Shevchenko National University of Kyiv Series: Physics & Mathematics 5, 25–29 (2019)
[9] 
Shao, J.: Consistency of least-squares estimator and its jackknife variance estimator in nonlinear models. Can. J. Stat. 20(4), 415–428 (1992). MR1208353. https://doi.org/10.2307/3315611
[10] 
Shao, J.: Mathematical Statistics. Springer (2003). MR2002723. https://doi.org/10.1007/b97553
[11] 
Shao, J., Tu, D.: The Jackknife and Bootstrap. Springer (2012). MR1351010. https://doi.org/10.1007/978-1-4612-0795-5
[12] 
Wang, L., Yu, F.: Jackknife resample method for precision estimation of weighted total least squares. Commun. Stat., Simul. Comput. (2019). MR4253814. https://doi.org/10.1080/03610918.2019.1580727
[13] 
Wang, L., Yu, F., Li, Z., Zou, C.: Jackknife method for variance components estimation of partial EIV model. J. Surv. Eng. 146(4) (2020). https://doi.org/10.1061/(ASCE)SU.1943-5428.0000327
Copyright
© 2022 The Author(s). Published by VTeX. Open access article under the CC BY license.

Keywords
Finite mixture model, nonlinear regression, mixture with varying concentrations, generalized estimating equations, jackknife, confidence ellipsoid
MSC
62F40, 62G10
