1 Introduction
Nonlinear regression models are widely used in the analysis of statistical data [14, 16]. In many applications the observed data are derived from a mixture of components with different dependencies between the variables in different components. In this case a finite mixture model can be used to describe the data [15, 19, 2]. If the concentrations of the components in the mixture are different for different observations, then the model of mixture with varying concentrations (MVC) can be applied [1, 12, 11]. Parametric models of nonlinear regression mixtures were considered in [5, 4]. Estimation in linear regression MVC models was also studied in [6, 8].
In this paper we adopt a semiparametric approach based on the modified least squares (mLS) technique. The consistency of mLS estimators in regression MVC models was demonstrated in [9]. Our aim is to derive conditions for the asymptotic normality of the mLS estimators and to construct confidence sets for the true values of the parameters.
The rest of the paper is organized as follows. In Section 2 we introduce the regression mixture model and the mLS estimator for the regression parameters. Asymptotic behavior of the estimator is discussed in Section 3. Confidence ellipsoids for the parameters are constructed in Section 4. Results of simulations are presented in Section 5. Concluding remarks are made in Section 6.
2 The model and estimator
In this paper we consider the application of regression techniques to data described by the model of mixture with varying concentrations. This means that each observed subject belongs to one of M different sub-populations (i.e. components of the mixture). We observe n such subjects ${O_{1}}$, …, ${O_{n}}$. The true number of the component to which ${O_{j}}$ belongs will be denoted by ${\kappa _{j}}$. These numbers are not observed, but one knows the probabilities
\[ {p_{j;n}^{m}}=\operatorname{\mathsf{P}}\{{\kappa _{j}}=m\},\hspace{2.5pt}m=1,\dots ,M.\]
These probabilities are called the mixing probabilities or concentrations of the components at the j-th observation.
For each subject ${O_{j}}$, one observes a set of numerical variables ${\boldsymbol{\xi }_{j;n}}={\boldsymbol{\xi }_{j}}=({Y_{j}},{X_{j}^{1}},\dots ,{X_{j}^{m}})$, where ${Y_{j}}$ is the response and ${\mathbf{X}_{j}}={({X_{j}^{1}},\dots ,{X_{j}^{m}})^{T}}$ is the vector of independent variables in the regression model
\[ {Y_{j}}=g({\mathbf{X}_{j}},\boldsymbol{\vartheta })+\varepsilon ,\]
where g is a known regression function, $\boldsymbol{\vartheta }={({\vartheta _{1}},\dots ,{\vartheta _{d}})^{T}}$ is a vector of unknown regression coefficients, and ε is an unobservable regression error. In fact, the coefficients of the model can be different for different components: $\boldsymbol{\vartheta }={\boldsymbol{\vartheta }^{(k)}}$ if ${\kappa _{j}}=k$. The distribution of ε can also depend on ${\kappa _{j}}$. These dependencies are described by the following model:
(1)
\[ {Y_{j}}=g({\mathbf{X}_{j}},{\boldsymbol{\vartheta }^{({\kappa _{j}})}})+{\varepsilon _{j}^{({\kappa _{j}})}}.\]
Here ${\boldsymbol{\vartheta }^{(k)}}\in {\Theta ^{(k)}}\subseteq {\mathbb{R}^{d}}$ is the vector of unknown regression coefficients corresponding to the k-th mixture component, and ${\varepsilon _{j}^{(k)}}$, $j=1,\dots ,n$, $k=1,\dots ,M$, are independent random variables whose distribution depends on k but not on j. We will assume that
\[ \operatorname{\mathsf{E}}{\varepsilon _{j}^{(k)}}=0,\hspace{2.5pt}\operatorname{Var}{\varepsilon _{j}^{(k)}}={\sigma ^{2(k)}}<\infty .\]
(The values of ${\sigma ^{2(k)}}$ are unknown.) The vectors of independent variables ${\mathbf{X}_{j}}$ are considered as random vectors whose distribution may depend on ${\kappa _{j}}$. It is assumed that the error term ${\varepsilon _{j}^{({\kappa _{j}})}}$ and ${\mathbf{X}_{j}}$ are conditionally independent given ${\kappa _{j}}$. The vectors $({\boldsymbol{\xi }_{j}},{\kappa _{j}})$ are independent for different j. In what follows we will frequently use expectations and probabilities connected with different mixture components. To present them in a compact form, we introduce formal random vectors $({Y^{(m)}},{\mathbf{X}^{(m)}},{\varepsilon ^{(m)}})$ which have the conditional distribution of $({Y_{j}},{\mathbf{X}_{j}},{\varepsilon _{j}^{({\kappa _{j}})}})$ given ${\kappa _{j}}=m$, i.e., the distribution of the m-th component.
We will also denote by ${\mathbf{p}_{;n}}$ the matrix of all concentrations for all observations and all components:
\[ {\mathbf{p}_{;n}}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{p_{1;n}^{1}}& \dots & {p_{1;n}^{M}}\\ {} \vdots & \ddots & \vdots \\ {} {p_{n;n}^{1}}& \dots & {p_{n;n}^{M}}\end{array}\right),\]
Here ${\mathbf{p}_{j;n}}={({p_{j;n}^{1}},\dots ,{p_{j;n}^{M}})^{T}}$ denotes its j-th row and ${\mathbf{p}_{;n}^{m}}={({p_{1;n}^{m}},\dots ,{p_{n;n}^{m}})^{T}}$ its m-th column. Similar notation is used for the weight matrix ${\mathbf{a}_{;n}}$ introduced below.
We are interested in estimating the parameters ${\boldsymbol{\vartheta }^{(k)}}$ for different components. The considered estimators are based on the modified least squares (mLS) approach. Namely, we consider the weighted least squares functional
\[ {J^{(k)}}(\mathbf{t})={\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{({Y_{j}}-g({\mathbf{X}_{j}},\mathbf{t}))^{2}},\]
where $\mathbf{t}\in {\Theta ^{(k)}}$ is a formal parameter and ${a_{j;n}^{k}}$ are weights aimed at singling out the k-th mixture component and suppressing the influence of all other components on the functional ${J^{(k)}}$. In this presentation, we restrict ourselves to the minimax weight matrix defined as
(2)
\[ {\mathbf{a}_{;n}}={\mathbf{p}_{;n}}{\boldsymbol{\Gamma }_{;n}^{-1}},\]
where
\[ {\boldsymbol{\Gamma }_{;n}}={\mathbf{p}_{;n}^{T}}{\mathbf{p}_{;n}}.\]
(It is assumed here that ${\boldsymbol{\Gamma }_{;n}}$ is nonsingular. See [11, 12] for the minimax properties of these weights.) It is readily seen that
(3)
\[ {\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{p_{j;n}^{m}}=\mathbb{1}\{k=m\}.\]
(Here $\mathbb{1}\{A\}$ is the indicator function of an event A.) So
\[\begin{array}{l}\displaystyle \operatorname{\mathsf{E}}{J^{(k)}}(\mathbf{t})={\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{\sum \limits_{m=1}^{M}}{p_{j;n}^{m}}\operatorname{\mathsf{E}}{({Y^{(m)}}-g({\mathbf{X}^{(m)}},\mathbf{t}))^{2}}=\operatorname{\mathsf{E}}{({Y^{(k)}}-g({\mathbf{X}^{(k)}},\mathbf{t}))^{2}}\\ {} \displaystyle =\operatorname{\mathsf{E}}{(g({\mathbf{X}^{(k)}},{\boldsymbol{\vartheta }^{(k)}})-g({\mathbf{X}^{(k)}},\mathbf{t}))^{2}}+{\sigma ^{2(k)}}\stackrel{\text{def}}{=}\bar{J}(\mathbf{t}).\end{array}\]
The minimum of $\bar{J}(\mathbf{t})$ is attained at $\mathbf{t}={\boldsymbol{\vartheta }^{(k)}}$. Thus, if ${\boldsymbol{\vartheta }^{(k)}}$ is the unique minimum point, one expects that under suitable conditions ${J^{(k)}}(\mathbf{t})\to \bar{J}(\mathbf{t})$ by the law of large numbers, and ${\operatorname{argmin}_{\mathbf{t}\in {\Theta ^{(k)}}}}{J^{(k)}}(\mathbf{t})\to {\boldsymbol{\vartheta }^{(k)}}$ as $n\to \infty $. If g is smooth enough, the argmin can be found as a solution to the equation ${\dot{\mathbf{J}}^{(k)}}(\mathbf{t})=0$, where ${\dot{\mathbf{J}}^{(k)}}(\mathbf{t})$ denotes the vector of partial derivatives of ${J^{(k)}}(\mathbf{t})$ with respect to each entry of t. In what follows we define the mLS estimator ${\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}$ for ${\boldsymbol{\vartheta }^{(k)}}$ as a statistic which is a solution to
(4)
\[ {\dot{\mathbf{J}}^{(k)}}(\mathbf{t})={\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}({Y_{j}}-g({\mathbf{X}_{j}},\mathbf{t}))\dot{\mathbf{g}}({\mathbf{X}_{j}},\mathbf{t})=0,\]
where
\[ \dot{\mathbf{g}}({\mathbf{X}_{j}},\mathbf{t})={\left(\frac{\partial g({\mathbf{X}_{j}},\mathbf{t})}{\partial {t^{1}}},\dots ,\frac{\partial g({\mathbf{X}_{j}},\mathbf{t})}{\partial {t^{d}}}\right)^{T}}.\]
If there are many solutions to (4), then ${\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}$ can be taken to be any of them, but it must be a measurable function of the observed data $({Y_{j}},{\mathbf{X}_{j}})$, $j=1,\dots ,n$. Note that the mLS estimator defined in this way can be a point of a local minimum of ${J^{(k)}}(\mathbf{t})$. We still call it mLS since it was obtained by a modification of the LS technique.
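To make the construction concrete, the following sketch (ours, not part of the paper) computes the minimax weights (2), checks the identity (3) numerically, and obtains an mLS fit by direct numerical minimization of ${J^{(k)}}$. The regression function, parameter values and noise level are arbitrary illustrations, and the minimizer found may be only a local one, as discussed above.

```python
# Illustrative sketch (not the authors' code): minimax weights (2), identity (3),
# and an mLS fit obtained by numerical minimization of J^(k).
import numpy as np
from scipy.optimize import minimize

def minimax_weights(p):
    """p: (n, M) matrix of concentrations p_{j;n}^m; returns the (n, M) weight matrix."""
    Gamma = p.T @ p                          # Gamma_{;n} = p^T p
    return p @ np.linalg.inv(Gamma)          # a_{;n} = p Gamma^{-1}, hence a^T p = I_M

def mls_fit(y, X, a_k, g, t0):
    """Minimize J^(k)(t) = sum_j a_{j;n}^k (Y_j - g(X_j, t))^2; may be a local minimum."""
    J = lambda t: np.sum(a_k * (y - g(X, t)) ** 2)
    return minimize(J, t0, method="Nelder-Mead").x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 2000
    u = rng.uniform(size=n)
    p = np.column_stack([u, 1.0 - u])                     # M = 2 components
    a = minimax_weights(p)
    print(np.round(a.T @ p, 10))                          # ~ identity matrix, property (3)

    # toy mixture of logistic regressions (cf. Section 5); all values are arbitrary
    g = lambda X, t: 1.0 / (1.0 + np.exp(t[0] + t[1] * X))
    theta = np.array([[0.5, 2.0], [0.5, -1.0 / 3.0]])
    kappa = (rng.uniform(size=n) < p[:, 1]).astype(int)   # hidden component labels
    X = rng.normal(loc=kappa, scale=np.sqrt(2.0))
    y = g(X, theta[kappa].T) + 0.5 * rng.normal(size=n)
    for k in range(2):
        print(k + 1, mls_fit(y, X, a[:, k], g, t0=np.zeros(2)))
```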
3 Asymptotic behavior of mLS estimators
In this section, we consider asymptotic behavior of ${\hat{\vartheta }_{;n}^{(k)}}$ as the sample size n tends to infinity. Let us start with some general assumptions on the model.
In this paper we make no assumptions on the connection between ${\mathbf{p}_{;n}}$ and ${\mathbf{p}_{;m}}$ for $n\ne m$, and we do not assume that they tend to some limit as $n\to \infty $. Assumptions are made only on the asymptotic behavior of certain averaged characteristics of the concentrations.
Note that if a significant fraction of the ${p_{j;n}^{k}}$ is bounded away from zero, then the entries of the matrix ${\boldsymbol{\Gamma }_{;n}}={\mathbf{p}_{;n}^{T}}{\mathbf{p}_{;n}}$ are of order n as $n\to \infty $. In what follows we will assume that the limit matrix
(5)
\[ \boldsymbol{\Gamma }\stackrel{\text{def}}{=}\underset{n\to \infty }{\lim }\frac{1}{n}{\boldsymbol{\Gamma }_{;n}}\]
exists and is nonsingular.
Then the weights ${a_{j;n}^{k}}$ are of order $1/n$ as $n\to \infty $ and the sums of the form ${\textstyle\sum _{j=1}^{n}}{a_{j;n}^{k}}{a_{j;n}^{m}}{p_{j;n}^{l}}{p_{j;n}^{i}}$ are of order $1/n$ as well.
We will assume that the limits
(6)
\[ \underset{n\to \infty }{\lim }n{\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{a_{j;n}^{m}}{p_{j;n}^{l}}{p_{j;n}^{i}}\stackrel{\text{def}}{=}\langle {\mathbf{a}^{k}}{\mathbf{a}^{m}}{\mathbf{p}^{l}}{\mathbf{p}^{i}}\rangle \]
exist, for all $k,m,l,i=1,\dots ,M$. Then also
\[ \underset{n\to \infty }{\lim }n{\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{a_{j;n}^{m}}{p_{j;n}^{l}}\stackrel{\text{def}}{=}\langle {\mathbf{a}^{k}}{\mathbf{a}^{m}}{\mathbf{p}^{l}}\rangle ={\sum \limits_{i=1}^{M}}\langle {\mathbf{a}^{k}}{\mathbf{a}^{m}}{\mathbf{p}^{l}}{\mathbf{p}^{i}}\rangle ,\]
since ${\textstyle\sum _{i=1}^{M}}{p_{j;n}^{i}}=1$. We will also denote
(7)
\[ \mathbf{h}({\boldsymbol{\xi }_{j}},\mathbf{t})=({Y_{j}}-g({\mathbf{X}_{j}},\mathbf{t}))\dot{\mathbf{g}}({\mathbf{X}_{j}},\mathbf{t}),\]
so ${\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}$ is a solution to
(8)
\[ {\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}\mathbf{h}({\boldsymbol{\xi }_{j}},\mathbf{t})=0.\]
Note that (8) is an unbiased generalized estimating equation (GEE, see [17], Section 5.4). Conditions for the consistency of ${\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}$ are presented in the next statement.
Theorem 1.
Assume the following.
1. Γ is nonsingular.
2. ${\Theta ^{(k)}}$ is a compact set in ${\mathbb{R}^{d}}$.
3. There exists $\delta >0$ such that $\operatorname{\mathsf{E}}|{\varepsilon ^{(m)}}{|^{\delta }}<\infty $, $\operatorname{\mathsf{E}}\| {\mathbf{X}^{(m)}}{\| ^{\delta }}<\infty $ and
\[ \operatorname{\mathsf{E}}\underset{\mathbf{t}\in {\Theta ^{(m)}}}{\sup }\| \mathbf{h}({\xi ^{(m)}},\mathbf{t}){\| ^{1+\delta }}<\infty ,\]
for all $m=1,\dots ,M$.
4. The families of functions $g(\mathbf{x},\cdot )$, $\mathbf{x}\in {\mathbb{R}^{m}}$, and $\dot{\mathbf{g}}(\mathbf{x},\cdot )$, $\mathbf{x}\in {\mathbb{R}^{m}}$, are equicontinuous on ${\Theta ^{(k)}}$.
5. $\operatorname{\mathsf{E}}\mathbf{h}({\xi ^{(k)}},\mathbf{t})\ne 0$ if $\mathbf{t}\ne {\boldsymbol{\vartheta }^{(k)}}$.
Then ${\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}\to {\boldsymbol{\vartheta }^{(k)}}$ in probability as $n\to \infty $.
Now consider the asymptotic normality of ${\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}$. We will start with a result formulated in more general terms of GEE estimation.
Assume that ${\Xi _{;n}}=({\boldsymbol{\xi }_{j;n}}$, $j=1,\dots ,n)$ are random observations in a measurable space $\mathfrak{X}$ described by the model of mixture with varying concentrations (MVC), i.e.,
(9)
\[ \operatorname{\mathsf{P}}\{{\boldsymbol{\xi }_{j;n}}\in A\}={\sum \limits_{m=1}^{M}}{p_{j;n}^{m}}{F^{(m)}}(A),\]
where ${F^{(m)}}(A)=\operatorname{\mathsf{P}}\{{\boldsymbol{\xi }^{(m)}}\in A\}$ is the distribution of the observed variable $\boldsymbol{\xi }$ for subjects from the m-th mixture component. Let $\boldsymbol{\vartheta }=\boldsymbol{\vartheta }(F)\in {\mathbb{R}^{d}}$ be a functional on the set of possible components' distributions. To estimate ${\boldsymbol{\vartheta }^{(k)}}=\boldsymbol{\vartheta }({F^{(k)}})$, we consider an estimating equation of the form (8), where $\mathbf{h}={({h^{1}},\dots ,{h^{d}})^{T}}$ is some estimating function $\mathbf{h}:\mathfrak{X}\times {\mathbb{R}^{d}}\to {\mathbb{R}^{d}}$ such that
\[ \operatorname{\mathsf{E}}\mathbf{h}({\boldsymbol{\xi }^{(k)}},{\boldsymbol{\vartheta }^{(k)}})=0.\]
(I.e., h is an unbiased estimating function.) Any statistic ${\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}$ is called a GEE-estimator for ${\boldsymbol{\vartheta }^{(k)}}$ if it is an a.s. solution to (8), i.e.,
(10)
\[ {\mathbf{H}^{(k)}}({\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}})\stackrel{\text{def}}{=}{\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}\mathbf{h}({\boldsymbol{\xi }_{j}},{\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}})=0\hspace{2.5pt}\text{a.s.}\]
We consider the joint parameter vector $\boldsymbol{\vartheta }={({({\boldsymbol{\vartheta }^{(1)}})^{T}},\dots ,{({\boldsymbol{\vartheta }^{(M)}})^{T}})^{T}}$ and the corresponding estimator ${\hat{\boldsymbol{\vartheta }}_{;n}}={({({\hat{\boldsymbol{\vartheta }}_{;n}^{(1)}})^{T}},\dots ,{({\hat{\boldsymbol{\vartheta }}_{;n}^{(M)}})^{T}})^{T}}$ and derive conditions under which
(11)
\[ \sqrt{n}({\hat{\boldsymbol{\vartheta }}_{;n}}-\boldsymbol{\vartheta })\stackrel{\text{W}}{\longrightarrow }N(0,\mathbf{S}),\]
where S is a matrix which we now describe. Denote
(12)
\[ {\mathbf{V}^{(m)}}\stackrel{\text{def}}{=}\operatorname{\mathsf{E}}\dot{\mathbf{h}}({\boldsymbol{\xi }^{(m)}},{\boldsymbol{\vartheta }^{(m)}}),\]
where
\[ \dot{\mathbf{h}}({\boldsymbol{\xi }^{(m)}},\mathbf{t})=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}\frac{\partial {h^{1}}({\boldsymbol{\xi }^{(m)}},\mathbf{t})}{\partial {t^{1}}}& \dots & \frac{\partial {h^{1}}({\boldsymbol{\xi }^{(m)}},\mathbf{t})}{\partial {t^{d}}}\\ {} \vdots & \ddots & \vdots \\ {} \frac{\partial {h^{d}}({\boldsymbol{\xi }^{(m)}},\mathbf{t})}{\partial {t^{1}}}& \dots & \frac{\partial {h^{d}}({\boldsymbol{\xi }^{(m)}},\mathbf{t})}{\partial {t^{d}}}\end{array}\right).\]
(Here and below ${\mathbf{V}^{-T}}\stackrel{\text{def}}{=}{({\mathbf{V}^{-1}})^{T}}$.)
Then
\[\begin{array}{l}\displaystyle {\mathbf{Z}^{(m,l)}}={\sum \limits_{i=1}^{M}}\langle {\mathbf{a}^{m}}{\mathbf{a}^{l}}{\mathbf{p}^{i}}\rangle \operatorname{\mathsf{E}}\mathbf{h}({\boldsymbol{\xi }^{(i)}},{\boldsymbol{\vartheta }^{(m)}}){\mathbf{h}^{T}}({\boldsymbol{\xi }^{(i)}},{\boldsymbol{\vartheta }^{(l)}})\\ {} \displaystyle -{\sum \limits_{i=1}^{M}}{\sum \limits_{k=1}^{M}}\langle {\mathbf{a}^{m}}{\mathbf{a}^{l}}{\mathbf{p}^{i}}{\mathbf{p}^{k}}\rangle \operatorname{\mathsf{E}}\mathbf{h}({\boldsymbol{\xi }^{(i)}},{\boldsymbol{\vartheta }^{(m)}})\operatorname{\mathsf{E}}{\mathbf{h}^{T}}({\boldsymbol{\xi }^{(k)}},{\boldsymbol{\vartheta }^{(l)}}),\\ {} \displaystyle \mathbf{S}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{\mathbf{S}^{(1,1)}}& \dots & {\mathbf{S}^{(1,M)}}\\ {} \vdots & \ddots & \vdots \\ {} {\mathbf{S}^{(M,1)}}& \dots & {\mathbf{S}^{(M,M)}}\end{array}\right),\end{array}\]
where
(13)
\[ {\mathbf{S}^{(m,l)}}={({\mathbf{V}^{(m)}})^{-1}}{\mathbf{Z}^{(m,l)}}{({\mathbf{V}^{(l)}})^{-T}}.\]
Theorem 2.
Let the following assumptions hold.
1. $\boldsymbol{\vartheta }$ is an interior point of $\Theta ={\Theta ^{(1)}}\times \cdots \times {\Theta ^{(M)}}$.
2. $\mathbf{h}(\mathbf{x},\mathbf{t})$ is continuously differentiable with respect to t, for almost all x ($\hspace{0.3em}\mathrm{mod} \hspace{0.3em}{F^{(m)}}$), for all $m=1,\dots ,M$.
3. For some $\delta >0$ and some open ball B, such that $\boldsymbol{\vartheta }\in B\subseteq \Theta $
\[ \operatorname{\mathsf{E}}\underset{\mathbf{t}\in B}{\sup }{\left\| \dot{\mathbf{h}}({\boldsymbol{\xi }^{(m)}},\mathbf{t})\right\| ^{1+\delta }}<\infty ,\]
for all $m=1,\dots ,M$.
4. $\operatorname{\mathsf{E}}\| \mathbf{h}({\boldsymbol{\xi }^{(m)}},\boldsymbol{\vartheta }){\| ^{2}}<\infty $, for all $m=1,\dots ,M$.
5. The matrices ${\mathbf{V}^{(m)}}$ are finite and nonsingular, for all $m=1,\dots ,M$.
6. The limits $\langle {\mathbf{a}^{k}}{\mathbf{a}^{m}}{\mathbf{p}^{l}}{\mathbf{p}^{i}}\rangle $ defined in (6) exist, for all $k,m,l,i=1,\dots ,M$.
7. The matrix Γ defined in (5) exists and is nonsingular.
8. ${\hat{\boldsymbol{\vartheta }}_{;n}}$ is a consistent estimator of $\boldsymbol{\vartheta }$.
Then (11) holds with the matrix S defined by (13).
Proof.
The proof of the theorem is quite standard. Applying the Taylor expansion to the LHS of (10), one obtains
\[ \sqrt{n}({\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}-{\boldsymbol{\vartheta }^{(k)}})=-{[{\dot{\mathbf{H}}^{(k)}}(\zeta )]^{-1}}(\sqrt{n}{\mathbf{H}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})),\]
where ζ is an intermediate point between ${\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}$ and ${\boldsymbol{\vartheta }^{(k)}}$. In view of Assumptions 2–5 and 7 of the theorem, in the same way as in [17], Theorem 5.14 and Lemma 5.3, it can be shown that, as $n\to \infty $,
\[ {\dot{\mathbf{H}}^{(k)}}(\zeta )\to \operatorname{\mathsf{E}}{\dot{\mathbf{H}}^{(k)}}({\boldsymbol{\vartheta }^{(k)}}).\]
Then a straightforward calculation with (3) in mind yields
\[ \operatorname{\mathsf{E}}{\dot{\mathbf{H}}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})={\mathbf{V}^{(k)}}.\]
Note that by (3)
\[ \operatorname{\mathsf{E}}{\mathbf{H}^{(m)}}({\boldsymbol{\vartheta }^{(m)}})={\sum \limits_{i=1}^{M}}\Big({\sum \limits_{j=1}^{n}}{a_{j;n}^{m}}{p_{j;n}^{i}}\Big)\operatorname{\mathsf{E}}\mathbf{h}({\boldsymbol{\xi }^{(i)}},{\boldsymbol{\vartheta }^{(m)}})=\operatorname{\mathsf{E}}\mathbf{h}({\boldsymbol{\xi }^{(m)}},{\boldsymbol{\vartheta }^{(m)}})=0.\]
Hence
\[\begin{array}{l}\displaystyle \operatorname{Cov}({\mathbf{H}^{(m)}}({\boldsymbol{\vartheta }^{(m)}}),{\mathbf{H}^{(l)}}({\boldsymbol{\vartheta }^{(l)}}))={\sum \limits_{j=1}^{n}}{a_{j;n}^{m}}{a_{j;n}^{l}}\operatorname{Cov}(\mathbf{h}({\boldsymbol{\xi }_{j}},{\boldsymbol{\vartheta }^{(m)}}),\mathbf{h}({\boldsymbol{\xi }_{j}},{\boldsymbol{\vartheta }^{(l)}})),\\ {} \displaystyle \operatorname{Cov}(\mathbf{h}({\boldsymbol{\xi }_{j}},{\boldsymbol{\vartheta }^{(m)}}),\mathbf{h}({\boldsymbol{\xi }_{j}},{\boldsymbol{\vartheta }^{(l)}}))=\operatorname{\mathsf{E}}\mathbf{h}({\boldsymbol{\xi }_{j}},{\boldsymbol{\vartheta }^{(m)}}){\mathbf{h}^{T}}({\boldsymbol{\xi }_{j}},{\boldsymbol{\vartheta }^{(l)}})-\operatorname{\mathsf{E}}\mathbf{h}({\boldsymbol{\xi }_{j}},{\boldsymbol{\vartheta }^{(m)}}){\big(\operatorname{\mathsf{E}}\mathbf{h}({\boldsymbol{\xi }_{j}},{\boldsymbol{\vartheta }^{(l)}})\big)^{T}}\\ {} \displaystyle ={\sum \limits_{i=1}^{M}}{p_{j;n}^{i}}\operatorname{\mathsf{E}}\mathbf{h}({\boldsymbol{\xi }^{(i)}},{\boldsymbol{\vartheta }^{(m)}}){\mathbf{h}^{T}}({\boldsymbol{\xi }^{(i)}},{\boldsymbol{\vartheta }^{(l)}})-{\sum \limits_{i,k=1}^{M}}{p_{j;n}^{i}}{p_{j;n}^{k}}\operatorname{\mathsf{E}}\mathbf{h}({\boldsymbol{\xi }^{(i)}},{\boldsymbol{\vartheta }^{(m)}}){\big(\operatorname{\mathsf{E}}\mathbf{h}({\boldsymbol{\xi }^{(k)}},{\boldsymbol{\vartheta }^{(l)}})\big)^{T}}.\end{array}\]
So,
(14)
\[ \underset{n\to \infty }{\lim }n\operatorname{Cov}({\mathbf{H}^{(m)}}({\boldsymbol{\vartheta }^{(m)}}),{\mathbf{H}^{(l)}}({\boldsymbol{\vartheta }^{(l)}}))={\mathbf{Z}^{(m,l)}}.\]
Then, applying the central limit theorem with the Lindeberg condition as in the proof of Theorem 3.1.1 in [11], one shows that the system of vectors $(\sqrt{n}{\mathbf{H}^{(k)}}({\boldsymbol{\vartheta }^{(k)}})$, $k=1,\dots ,M)$ converges weakly to a Gaussian system of vectors $({\mathbf{u}^{(k)}},k=1,\dots ,M)$ such that
\[ \operatorname{\mathsf{E}}{\mathbf{u}^{(k)}}=0,\hspace{2.5pt}\operatorname{\mathsf{E}}{\mathbf{u}^{(k)}}{({\mathbf{u}^{(m)}})^{T}}={\mathbf{Z}^{(k,m)}}.\]
This implies that the system of vectors $(\sqrt{n}({\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}-{\boldsymbol{\vartheta }^{(k)}}),k=1,\dots ,M)$ converges weakly to the system $({({\mathbf{V}^{(k)}})^{-1}}{\mathbf{u}^{(k)}},k=1,\dots ,M)$. This result is just the statement of the theorem.  □
Let us return to the regression mixture model (1). Obviously it is a particular case of the MVC model (9). How can the matrices ${\mathbf{V}^{(m)}}$ and ${\mathbf{Z}^{(m,l)}}$ be represented for the regression mixture model?
Assume that h is defined by (7) and the function $g(\mathbf{x},\mathbf{t})$ has second derivatives with respect to t:
\[ \ddot{\mathbf{g}}({\mathbf{X}_{j}},\mathbf{t})=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}\frac{{\partial ^{2}}g({\mathbf{X}_{j}},\mathbf{t})}{\partial {t^{1}}\partial {t^{1}}}& \dots & \frac{{\partial ^{2}}g({\mathbf{X}_{j}},\mathbf{t})}{\partial {t^{1}}\partial {t^{d}}}\\ {} \vdots & \ddots & \vdots \\ {} \frac{{\partial ^{2}}g({\mathbf{X}_{j}},\mathbf{t})}{\partial {t^{d}}\partial {t^{1}}}& \dots & \frac{{\partial ^{2}}g({\mathbf{X}_{j}},\mathbf{t})}{\partial {t^{d}}\partial {t^{d}}}\end{array}\right).\]
Then
\[\begin{array}{l}\displaystyle {\mathbf{V}^{(m)}}=\operatorname{\mathsf{E}}{\left.\frac{\partial }{\partial \mathbf{t}}[(g({\mathbf{X}^{(m)}},{\boldsymbol{\vartheta }^{(m)}})+{\varepsilon ^{(m)}}-g({\mathbf{X}^{(m)}},\mathbf{t}))\dot{\mathbf{g}}({\mathbf{X}^{(m)}},\mathbf{t})]\right|_{\mathbf{t}={\boldsymbol{\vartheta }^{(m)}}}}\\ {} \displaystyle =-\operatorname{\mathsf{E}}\dot{\mathbf{g}}({\mathbf{X}^{(m)}},{\boldsymbol{\vartheta }^{(m)}}){(\dot{\mathbf{g}}({\mathbf{X}^{(m)}},{\boldsymbol{\vartheta }^{(m)}}))^{T}}+\operatorname{\mathsf{E}}{\varepsilon ^{(m)}}\ddot{\mathbf{g}}({\mathbf{X}^{(m)}},{\boldsymbol{\vartheta }^{(m)}}).\end{array}\]
The second term is zero since ${\varepsilon ^{(m)}}$ is independent of ${\mathbf{X}^{(m)}}$ and $\operatorname{\mathsf{E}}{\varepsilon ^{(m)}}=0$. So
(15)
\[ {\mathbf{V}^{(m)}}=-\operatorname{\mathsf{E}}\dot{\mathbf{g}}({\mathbf{X}^{(m)}},{\boldsymbol{\vartheta }^{(m)}}){(\dot{\mathbf{g}}({\mathbf{X}^{(m)}},{\boldsymbol{\vartheta }^{(m)}}))^{T}}.\]
A similar algebra yields
\[\begin{array}{l}\displaystyle {\mathbf{Z}^{(m,k)}}={\sum \limits_{l=1}^{M}}\langle {\mathbf{a}^{m}}{\mathbf{a}^{k}}{\mathbf{p}^{l}}\rangle [{\sigma ^{2(l)}}A(l,k,m)+B(l,k,m)]\\ {} \displaystyle -{\sum \limits_{i,l=1}^{M}}\langle {\mathbf{a}^{m}}{\mathbf{a}^{k}}{\mathbf{p}^{l}}{\mathbf{p}^{i}}\rangle (G(i,i,k)-G(i,k,k))(G(l,l,m)-G(l,m,m)),\end{array}\]
where
\[\begin{array}{l}\displaystyle A(l,k,m)=\operatorname{\mathsf{E}}\dot{\mathbf{g}}({\mathbf{X}^{(l)}},{\boldsymbol{\vartheta }^{(k)}}){\dot{\mathbf{g}}^{T}}({\mathbf{X}^{(l)}},{\boldsymbol{\vartheta }^{(m)}}),\\ {} \displaystyle B(l,k,m)=\operatorname{\mathsf{E}}(g({\mathbf{X}^{(l)}},{\boldsymbol{\vartheta }^{(l)}})-g({\mathbf{X}^{(l)}},{\boldsymbol{\vartheta }^{(k)}}))\\ {} \displaystyle \times (g({\mathbf{X}^{(l)}},{\boldsymbol{\vartheta }^{(l)}})-g({\mathbf{X}^{(l)}},{\boldsymbol{\vartheta }^{(m)}}))\dot{\mathbf{g}}({\mathbf{X}^{(l)}},{\boldsymbol{\vartheta }^{(k)}}){\dot{\mathbf{g}}^{T}}({\mathbf{X}^{(l)}},{\boldsymbol{\vartheta }^{(m)}}),\\ {} \displaystyle G(l,m,k)=\operatorname{\mathsf{E}}g({\mathbf{X}^{(l)}},{\boldsymbol{\vartheta }^{(m)}})\dot{\mathbf{g}}({\mathbf{X}^{(l)}},{\boldsymbol{\vartheta }^{(k)}}).\end{array}\]
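As a simple sanity check (our remark, not stated in the paper), consider the degenerate case $M=1$: then ${p_{j;n}^{1}}\equiv 1$, ${a_{j;n}^{1}}=1/n$, $\langle {\mathbf{a}^{1}}{\mathbf{a}^{1}}{\mathbf{p}^{1}}\rangle =\langle {\mathbf{a}^{1}}{\mathbf{a}^{1}}{\mathbf{p}^{1}}{\mathbf{p}^{1}}\rangle =1$, $B(1,1,1)=0$, and the second sum in the formula for ${\mathbf{Z}^{(1,1)}}$ vanishes, so that
\[ {\mathbf{Z}^{(1,1)}}={\sigma ^{2(1)}}\operatorname{\mathsf{E}}\dot{\mathbf{g}}({\mathbf{X}^{(1)}},{\boldsymbol{\vartheta }^{(1)}}){\dot{\mathbf{g}}^{T}}({\mathbf{X}^{(1)}},{\boldsymbol{\vartheta }^{(1)}}),\hspace{1em}{\mathbf{S}^{(1,1)}}={\sigma ^{2(1)}}{\big(\operatorname{\mathsf{E}}\dot{\mathbf{g}}({\mathbf{X}^{(1)}},{\boldsymbol{\vartheta }^{(1)}}){\dot{\mathbf{g}}^{T}}({\mathbf{X}^{(1)}},{\boldsymbol{\vartheta }^{(1)}})\big)^{-1}},\]
which is the classical asymptotic covariance matrix of the nonlinear least squares estimator.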
4 Confidence ellipsoids for regression parameters
Let us apply the results of Section 3 to the construction of asymptotic confidence sets for ${\boldsymbol{\vartheta }^{(k)}}$. For any $\mathbf{t}\in {\mathbb{R}^{d}}$ and any nonsingular $\mathbf{S}\in {\mathbb{R}^{d\times d}}$, define
\[ {T^{(k)}}(\mathbf{t},\mathbf{S})=n{({\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}-\mathbf{t})^{T}}{\mathbf{S}^{-1}}({\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}-\mathbf{t}).\]
It is obvious that if Theorem 2 holds and ${\mathbf{S}^{(k,k)}}$ is nonsingular, then
(16)
\[ {T^{(k)}}({\boldsymbol{\vartheta }^{(k)}},{\mathbf{S}^{(k,k)}})\stackrel{\text{W}}{\longrightarrow }{\chi _{d}^{2}},\]
where ${\chi _{d}^{2}}$ is the ${\chi ^{2}}$-distribution with d degrees of freedom. Note that (16) also holds if ${\mathbf{S}^{(k,k)}}$ is replaced by a consistent estimator ${\hat{\mathbf{S}}_{;n}^{(k,k)}}$. Let ${\chi _{\alpha }}$ be the α-upper quantile of ${\chi _{d}^{2}}$. Then the set
\[ {B_{;n}^{k}}(\alpha )=\{\mathbf{t}\in {\mathbb{R}^{d}}:{T^{(k)}}(\mathbf{t},{\hat{\mathbf{S}}_{;n}^{(k,k)}})<{\chi _{\alpha }}\}\]
is an asymptotic α-level confidence set for ${\boldsymbol{\vartheta }^{(k)}}$ in the sense that
\[ \operatorname{\mathsf{P}}\{{\boldsymbol{\vartheta }^{(k)}}\in {B_{;n}^{k}}(\alpha )\}\to 1-\alpha \]
as $n\to \infty $. To complete the construction of the confidence set, we need convenient conditions for the nonsingularity of ${\mathbf{S}^{(k,k)}}$ and a consistent estimator of this matrix.
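For implementation, ${B_{;n}^{k}}(\alpha )$ is just an ellipsoid in ${\mathbb{R}^{d}}$, and checking whether a point belongs to it requires only the quadratic form ${T^{(k)}}$. Below is a minimal sketch (ours, not the authors' code); `theta_hat` and `S_hat` are assumed to come from the procedures of Sections 2 and 4.2, with `S_hat` estimating the covariance matrix of $\sqrt{n}({\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}-{\boldsymbol{\vartheta }^{(k)}})$ as in Theorem 2.

```python
# Sketch: evaluate T^(k)(t, S_hat) and test membership in the confidence ellipsoid.
import numpy as np
from scipy.stats import chi2

def T_stat(t, theta_hat, S_hat, n):
    d = theta_hat - np.asarray(t, dtype=float)
    return n * d @ np.linalg.solve(S_hat, d)

def in_ellipsoid(t, theta_hat, S_hat, n, alpha=0.05):
    chi_alpha = chi2.ppf(1.0 - alpha, df=len(theta_hat))   # alpha-upper quantile of chi^2_d
    return T_stat(t, theta_hat, S_hat, n) < chi_alpha

if __name__ == "__main__":
    # hypothetical numbers, for illustration only
    theta_hat = np.array([0.52, 1.94])
    S_hat = np.array([[0.8, 0.1], [0.1, 1.5]])
    print(in_ellipsoid([0.5, 2.0], theta_hat, S_hat, n=1000))
```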
4.1 Nonsingularity of ${\mathbf{S}^{(k,k)}}$
Since
\[ {\mathbf{S}^{(k,k)}}={({\mathbf{V}^{(k)}})^{-1}}{\mathbf{Z}^{(k,k)}}{({\mathbf{V}^{(k)}})^{-T}},\]
we need conditions for the nonsingularity of ${\mathbf{V}^{(k)}}$ and ${\mathbf{Z}^{(k,k)}}$.
Assumption $\mathbf{{I_{lk}}}$.
For all $\mathbf{c}\in {\mathbb{R}^{d}}$ such that $\mathbf{c}\ne 0$,
\[ \operatorname{\mathsf{P}}\{{\mathbf{c}^{T}}\dot{\mathbf{g}}({\mathbf{X}^{(l)}},{\boldsymbol{\vartheta }^{(k)}})\ne 0\}>0.\]
This assumption means that the functions
\[ {\dot{g}_{i}}(\mathbf{x})={\big(\dot{\mathbf{g}}(\mathbf{x},{\boldsymbol{\vartheta }^{(k)}})\big)_{i}},\hspace{2.5pt}i=1,\dots ,d,\]
are linearly independent a.s. with respect to the distribution of X for the l-th component.
Lemma 1.
If assumption $\mathbf{{I_{lk}}}$ holds, then the matrix ${\mathbf{A}^{l,k}}\stackrel{\text{def}}{=}\operatorname{\mathsf{E}}\dot{\mathbf{g}}({\mathbf{X}^{(l)}},{\boldsymbol{\vartheta }^{(k)}}){\dot{\mathbf{g}}^{T}}({\mathbf{X}^{(l)}},{\boldsymbol{\vartheta }^{(k)}})$ is nonsingular.
Proof.
Observe that ${\mathbf{A}^{l,k}}$ is the Gram matrix of the set of functions $G=({\dot{g}_{1}},\dots ,{\dot{g}_{d}})$ in the ${L_{2}}$ space of functions on ${\mathbb{R}^{m}}$ with the inner product
\[ \langle {f_{1}},{f_{2}}\rangle =\operatorname{\mathsf{E}}{f_{1}}({\mathbf{X}^{(l)}}){f_{2}}({\mathbf{X}^{(l)}}).\]
Assumption $\mathbf{{I_{lk}}}$ implies that the functions in G are linearly independent in this space. So, their Gram matrix is nonsingular. □
Theorem 3.
Assume that the matrix ${\mathbf{Z}^{(k,k)}}$ exists, is finite, assumption $\mathbf{{I_{kk}}}$ holds and ${\sigma ^{2(k)}}>0$. Then ${\mathbf{S}^{(k,k)}}$ exists and is nonsingular.
Proof.
From ${\mathbf{V}^{(k)}}=-{\mathbf{A}^{k,k}}$ one readily obtains the nonsingularity of ${\mathbf{V}^{(k)}}$. It remains to show the nonsingularity of ${\mathbf{Z}^{(k,k)}}$.
In what follows ≥ means the Loewner order for matrices, i.e., $\mathbf{A}\ge \mathbf{Z}$ means that $\mathbf{A}-\mathbf{Z}$ is a positive semidefinite matrix.
Observe that
\[\begin{array}{l}\displaystyle \operatorname{Cov}[\mathbf{h}({\xi _{j}},{\boldsymbol{\vartheta }^{(k)}})]=\operatorname{\mathsf{E}}\left[\operatorname{Cov}[\mathbf{h}({\xi _{j}},{\boldsymbol{\vartheta }^{(k)}})\hspace{2.5pt}|{\kappa _{j}}]\right]+\operatorname{Cov}\left[\operatorname{\mathsf{E}}[\mathbf{h}({\xi _{j}},{\boldsymbol{\vartheta }^{(k)}})\hspace{2.5pt}|{\kappa _{j}}]\right]\\ {} \displaystyle \ge \operatorname{\mathsf{E}}\left[\operatorname{Cov}[\mathbf{h}({\xi _{j}},{\boldsymbol{\vartheta }^{(k)}})\hspace{2.5pt}|{\kappa _{j}}]\right]={\sum \limits_{l=1}^{M}}{p_{j;n}^{l}}\operatorname{Cov}[\mathbf{h}({\xi ^{(l)}},{\boldsymbol{\vartheta }^{(k)}})]\\ {} \displaystyle \ge {p_{j;n}^{k}}\operatorname{Cov}[\mathbf{h}({\xi ^{(k)}},{\boldsymbol{\vartheta }^{(k)}})]={p_{j;n}^{k}}{\sigma ^{2(k)}}{\mathbf{A}^{k,k}}.\end{array}\]
So, by (14)
\[ {\mathbf{Z}^{(k,k)}}\ge \langle {({\mathbf{a}^{(k)}})^{2}}{\mathbf{p}^{k}}\rangle {\sigma ^{2(k)}}{\mathbf{A}^{k,k}}.\]
Since ${\mathbf{A}^{k,k}}\ge 0$ and $\det {\mathbf{A}^{k,k}}\ne 0$, to prove the theorem it is enough to show that $\langle {({\mathbf{a}^{(k)}})^{2}}{\mathbf{p}^{k}}\rangle >0$. To do this, observe that by (3)
\[ {\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{p_{j;n}^{k}}=1.\]
Then
\[ {\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{p_{j;n}^{k}}\mathbb{1}\{{a_{j;n}^{k}}>1/(2n)\}=1-{\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{p_{j;n}^{k}}\mathbb{1}\{{a_{j;n}^{k}}\le 1/(2n)\}\ge 1/2,\]
since $0\le {p_{j;n}^{k}}\le 1$, and
\[ {\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{p_{j;n}^{k}}\mathbb{1}\{{a_{j;n}^{k}}\le 1/(2n)\}\le n\cdot \frac{1}{2n}=\frac{1}{2}.\]
Therefore,
\[ n{\sum \limits_{j=1}^{n}}{({a_{j;n}^{k}})^{2}}{p_{j;n}^{k}}\ge n{\sum \limits_{j=1}^{n}}{({a_{j;n}^{k}})^{2}}{p_{j;n}^{k}}\mathbb{1}\{{a_{j;n}^{k}}>1/(2n)\}\ge \frac{1}{2}{\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}{p_{j;n}^{k}}\mathbb{1}\{{a_{j;n}^{k}}>1/(2n)\}\ge \frac{1}{4},\]
so $\langle {({\mathbf{a}^{(k)}})^{2}}{\mathbf{p}^{k}}\rangle \ge 1/4>0$.  □
4.2 Estimation of ${\mathbf{S}^{(k,k)}}$
There are at least two ways to estimate ${\mathbf{S}^{(k,k)}}$. The first is based on the plug-in technique. Namely, we construct empirical counterparts to ${\mathbf{V}^{(k)}}$ and ${\mathbf{Z}^{(k,k)}}$ and substitute them into (13) to obtain an estimator for ${\mathbf{S}^{(k,k)}}$. Formula (15) suggests the following estimator of ${\mathbf{V}^{(k)}}$:
\[ {\hat{\mathbf{V}}_{;n}^{k}}=-{\sum \limits_{j=1}^{n}}{a_{j;n}^{k}}\dot{\mathbf{g}}({\mathbf{X}_{j}},{\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}){(\dot{\mathbf{g}}({\mathbf{X}_{j}},{\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}))^{T}}.\]
Estimation of ${\mathbf{Z}^{(k,k)}}$ is more complicated. We can estimate ${\mathbf{M}^{(i,k)}}=\operatorname{\mathsf{E}}\mathbf{h}({\boldsymbol{\xi }^{(i)}},{\boldsymbol{\vartheta }^{(k)}})$ by
\[ {\hat{\mathbf{M}}_{;n}^{(i,k)}}={\sum \limits_{j=1}^{n}}{a_{j;n}^{i}}\mathbf{h}({\boldsymbol{\xi }_{j;n}},{\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}),\]
and
\[ {\mathbf{D}^{(i,k)}}=\operatorname{\mathsf{E}}\mathbf{h}({\boldsymbol{\xi }^{(i)}},{\boldsymbol{\vartheta }^{(k)}}){\mathbf{h}^{T}}({\boldsymbol{\xi }^{(i)}},{\boldsymbol{\vartheta }^{(k)}})\]
by
\[ {\hat{\mathbf{D}}_{;n}^{(i,k)}}={\sum \limits_{j=1}^{n}}{a_{j;n}^{i}}\mathbf{h}({\boldsymbol{\xi }_{j;n}},{\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}){\mathbf{h}^{T}}({\boldsymbol{\xi }_{j;n}},{\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}).\]
We also replace the limits $\langle {({\mathbf{a}^{(i)}})^{2}}{\mathbf{p}^{l}}\rangle $ and $\langle {({\mathbf{a}^{(i)}})^{2}}{\mathbf{p}^{l}}{\mathbf{p}^{m}}\rangle $ with their approximations
\[ \alpha (i,l)={\sum \limits_{j=1}^{n}}{({a_{j;n}^{i}})^{2}}{p_{j;n}^{l}},\hspace{2.5pt}\alpha (k,i,l)={\sum \limits_{j=1}^{n}}{({a_{j;n}^{k}})^{2}}{p_{j;n}^{i}}{p_{j;n}^{l}}.\]
Then the estimator of ${\mathbf{Z}^{(k,k)}}$ is
\[ {\hat{\mathbf{Z}}_{;n}^{(k,k)}}={\sum \limits_{i=1}^{M}}\alpha (k,i){\hat{\mathbf{D}}_{;n}^{(i,k)}}-{\sum \limits_{i,l=1}^{M}}\alpha (k,i,l){\hat{\mathbf{M}}_{;n}^{(i,k)}}{({\hat{\mathbf{M}}_{;n}^{(l,k)}})^{T}}.\]
Now, the resulting plug-in estimator for ${\mathbf{S}^{(k,k)}}$ is
\[ {\phantom{S}^{plug}}{\hat{\mathbf{S}}_{;n}^{(k,k)}}={({\hat{\mathbf{V}}_{;n}^{k}})^{-1}}{\hat{\mathbf{Z}}_{;n}^{(k,k)}}{({\hat{\mathbf{V}}_{;n}^{k}})^{-T}}.\]
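The plug-in formulas above translate directly into code. The following sketch is ours and only illustrative: the data, the concentration matrix `p`, the weight matrix `a`, the fitted parameters `theta_hat` and the functions `g`, `g_dot` (the gradient $\dot{\mathbf{g}}$) are assumed to be available.

```python
# Sketch of the plug-in estimator of S^(k,k); all identifiers are illustrative.
import numpy as np

def plug_in_S(y, X, p, a, theta_hat, g, g_dot, k):
    """y: (n,), X: (n, ...), p, a: (n, M), theta_hat: list of M parameter vectors,
    g(x, t) -> scalar, g_dot(x, t) -> (d,) gradient.  Returns the (d, d) estimate."""
    n, M = p.shape
    h = lambda j, t: (y[j] - g(X[j], t)) * g_dot(X[j], t)          # estimating function (7)

    V_hat = -sum(a[j, k] * np.outer(g_dot(X[j], theta_hat[k]),
                                    g_dot(X[j], theta_hat[k])) for j in range(n))
    M_hat = [sum(a[j, i] * h(j, theta_hat[k]) for j in range(n)) for i in range(M)]
    D_hat = [sum(a[j, i] * np.outer(h(j, theta_hat[k]), h(j, theta_hat[k]))
                 for j in range(n)) for i in range(M)]
    alpha2 = lambda i: np.sum(a[:, k] ** 2 * p[:, i])              # alpha(k, i) of the text
    alpha3 = lambda i, l: np.sum(a[:, k] ** 2 * p[:, i] * p[:, l]) # alpha(k, i, l)

    Z_hat = sum(alpha2(i) * D_hat[i] for i in range(M)) \
        - sum(alpha3(i, l) * np.outer(M_hat[i], M_hat[l]) for i in range(M) for l in range(M))
    V_inv = np.linalg.inv(V_hat)
    return V_inv @ Z_hat @ V_inv.T                                 # sandwich formula (13)
```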
By the same methods as in Theorem 5.15 in [17] it can be shown that, under the assumptions of Theorem 2, the plug-in estimator ${\phantom{S}^{plug}}{\hat{\mathbf{S}}_{;n}^{(k,k)}}$ is consistent.
The second approach to the estimation of ${\mathbf{S}^{(k,k)}}$ is based on the jackknife technique. Consider the dataset ${\Xi _{;-i,n}}=({\boldsymbol{\xi }_{1;n}},\dots ,{\boldsymbol{\xi }_{i-1;n}},{\boldsymbol{\xi }_{i+1;n}},\dots ,{\boldsymbol{\xi }_{n;n}})$, which consists of all observations from ${\Xi _{;n}}$ without the i-th one. Similarly, the matrix ${\mathbf{p}_{;-i,n}}$ contains all rows of ${\mathbf{p}_{;n}}$ except the i-th one, ${\boldsymbol{\Gamma }_{;-i,n}}={\mathbf{p}_{;-i,n}^{T}}{\mathbf{p}_{;-i,n}}$ and ${\mathbf{a}_{;-i,n}}={\mathbf{p}_{;-i,n}}{\boldsymbol{\Gamma }_{;-i,n}^{-1}}$. Then ${\hat{\boldsymbol{\vartheta }}_{;-i,n}^{(k)}}$ is the GEE estimator of ${\boldsymbol{\vartheta }^{(k)}}$ constructed from the data ${\Xi _{;-i,n}}$ with the weights ${\mathbf{a}_{;-i,n}}$, i.e., it is a solution to the estimating equation (8) with ${\Xi _{;n}}$ and ${\mathbf{a}_{;n}}$ replaced by ${\Xi _{;-i,n}}$ and ${\mathbf{a}_{;-i,n}}$.
The jackknife estimator of ${\mathbf{S}^{(k,k)}}$ is defined as
\[ {\phantom{S}^{jn}}{\hat{\mathbf{S}}_{;n}^{(k,k)}}=n{\sum \limits_{i=1}^{n}}({\hat{\boldsymbol{\vartheta }}_{;-i,n}^{(k)}}-{\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}}){({\hat{\boldsymbol{\vartheta }}_{;-i,n}^{(k)}}-{\hat{\boldsymbol{\vartheta }}_{;n}^{(k)}})^{T}}.\]
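A sketch of the jackknife computation (ours, illustrative only); `mls_fit` stands for any routine returning the mLS estimate for given data and weights, e.g. the one sketched at the end of Section 2.

```python
# Sketch of the jackknife estimator of S^(k,k); mls_fit(y, X, weights) is assumed
# to return the mLS estimate for the given data and weights (identifiers are ours).
import numpy as np

def jackknife_S(y, X, p, k, mls_fit):
    n = len(y)
    a_full = p @ np.linalg.inv(p.T @ p)                  # minimax weights, full sample
    theta_full = mls_fit(y, X, a_full[:, k])
    deltas = []
    for i in range(n):
        keep = np.arange(n) != i                         # drop the i-th observation
        p_i = p[keep]
        a_i = p_i @ np.linalg.inv(p_i.T @ p_i)           # recompute the minimax weights
        deltas.append(mls_fit(y[keep], X[keep], a_i[:, k]) - theta_full)
    D = np.array(deltas)                                 # (n, d) matrix of differences
    return n * (D.T @ D)                                 # n * sum_i outer(delta_i, delta_i)
```

Note that the minimax weights are recomputed for every reduced sample, as required by the definition of ${\hat{\boldsymbol{\vartheta }}_{;-i,n}^{(k)}}$.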
Jackknife estimators for i.i.d. samples are considered in [18]. Consistency of the jackknife is demonstrated in [8] for samples from mixtures with varying concentrations in which the components are described by linear errors-in-variables regression models. We do not state the consistency conditions for ${\phantom{S}^{jn}}{\hat{\mathbf{S}}_{;n}^{(k,k)}}$, but analyze its applicability to the construction of confidence ellipsoids in a small simulation study.
5 Simulation results
In the simulation study, the performance of the confidence ellipsoids constructed in Section 4 is tested on $N=1000$ simulated samples in each experiment. In all the experiments, we constructed confidence ellipsoids for the regression parameters with nominal covering probability 95% and calculated the obtained covering frequencies, i.e., the percentage of ellipsoids that cover the true parameter vector.
The data were generated from a mixture of two components (i.e., $M=2$) with mixing probabilities which were also obtained by random generation from variables ${u_{j}^{i}}$, $i=1,2$, $j=1,\dots ,n$, independent and uniformly distributed on $[0,1]$.
Each observation contains two variables $(Y,X)$, and their distribution for the m-th component follows the logistic regression model with continuous response:
\[\begin{array}{l}\displaystyle {Y^{(m)}}=g({X^{(m)}},{\boldsymbol{\vartheta }^{(m)}})+{\varepsilon ^{(m)}},\hspace{2.5pt}{X^{(m)}}\sim N({\mu ^{(m)}},{\Sigma ^{(m)}}),\\ {} \displaystyle g(X,\boldsymbol{\vartheta })=\frac{1}{1+\exp ({\vartheta _{0}}+{\vartheta _{1}}X)}.\end{array}\]
Here ${\boldsymbol{\vartheta }^{(m)}}={({\vartheta _{0}^{(m)}},{\vartheta _{1}^{(m)}})^{T}}$ is the vector of unknown regression parameters for the m-th component to be estimated. The true values of the parameters with which the data were generated are presented in Table 1. The ${\varepsilon ^{(m)}}$ are zero-mean regression errors independent of ${X^{(m)}}$; their distributions were different in different experiments.
Table 1.
True parameter values for the regression model
m | 1 | 2 |
${\mu ^{(m)}}$ | 0.0 | 1.0 |
${\Sigma ^{(m)}}$ | 2.0 | 2.0 |
${\vartheta _{0}^{(m)}}$ | 0.5 | 0.5 |
${\vartheta _{1}^{(m)}}$ | 2 | −1/3 |
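The plug-in estimator of Section 4.2 also requires the gradient $\dot{\mathbf{g}}$ of the regression function. For the logistic g above it has the closed form (our computation, added for completeness)
\[ \frac{\partial g(X,\boldsymbol{\vartheta })}{\partial {\vartheta _{0}}}=-g(X,\boldsymbol{\vartheta })\big(1-g(X,\boldsymbol{\vartheta })\big),\hspace{2em}\frac{\partial g(X,\boldsymbol{\vartheta })}{\partial {\vartheta _{1}}}=-X\hspace{0.1667em}g(X,\boldsymbol{\vartheta })\big(1-g(X,\boldsymbol{\vartheta })\big).\]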
In each experiment, we calculated
(i) oracle 95% covering sets, in which the true value of S is used;
(ii) plug-in 95% confidence ellipsoids based on ${\phantom{S}^{plug}}{\hat{\mathbf{S}}_{;n}^{(k,k)}}$;
(iii) jackknife ellipsoids based on ${\phantom{S}^{jn}}{\hat{\mathbf{S}}_{;n}^{(k,k)}}$.
The ellipsoids were constructed for each of the 1000 simulated samples, and the covering frequencies were calculated. These frequencies are presented in the tables below, for each experiment.
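The experimental protocol can be summarized by the following schematic loop (our sketch, not the authors' code); `simulate_sample` and `fit_and_estimate` stand for the data generation described above and for the estimation procedures of Sections 2 and 4.2, respectively.

```python
# Schematic coverage-frequency loop (our sketch, not the authors' code).
import numpy as np
from scipy.stats import chi2

def coverage_frequency(simulate_sample, fit_and_estimate, theta_true, n, k,
                       N=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    chi_alpha = chi2.ppf(1.0 - alpha, df=len(theta_true))
    hits = 0
    for _ in range(N):
        y, X, p = simulate_sample(n, rng)            # mixture sample with concentrations p
        # S_hat is expected on the scale of Theorem 2 (covariance of sqrt(n)*(est - true))
        theta_hat, S_hat = fit_and_estimate(y, X, p, k)
        d = theta_hat - theta_true
        if n * d @ np.linalg.solve(S_hat, d) < chi_alpha:
            hits += 1
    return hits / N                                  # compare with the nominal 0.95
```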
Experiment 1.
Here the error terms were zero-mean normal with variance ${\sigma ^{2(k)}}=0.25$ for $k=1,2$. The resulting covering frequencies are presented in Table 2. It seems that the accuracy of the plug-in confidence ellipsoids in this experiment is not high, but it is sufficient for practical purposes for sample sizes larger than 1000. The accuracy of the plug-in ellipsoids is nearly the same as that of the oracle covering sets, so the observed deviations of the covering frequencies from the nominal confidence probability cannot be explained by errors in the estimation of S. The jackknife ellipsoids are almost as accurate as the plug-in ones.
Table 2.
Covering frequencies for normal regression errors
first component | second component | |||||
n | oracle | plug-in | jk | oracle | plug-in | jk |
100 | 0.668 | 0.942 | 0.955 | 0.729 | 0.939 | 0.953 |
500 | 0.929 | 0.931 | 0.956 | 0.948 | 0.934 | 0.939 |
1 000 | 0.954 | 0.951 | 0.95 | 0.944 | 0.937 | 0.939 |
5 000 | 0.959 | 0.951 | 0.943 | 0.952 | 0.940 | 0.931 |
7 500 | 0.961 | 0.942 | 0.933 | 0.951 | 0.938 | 0.957 |
10 000 | 0.954 | 0.949 | 0.944 | 0.944 | 0.947 | 0.954
Experiment 2.
Here we consider bounded regression errors, namely ${\varepsilon ^{(k)}}$ are uniform on $[-0.25,0.25]$. The resulting covering frequencies are presented in Table 3. It seems that the accuracy of plug-in and jackknife ellipsoids is nearly the same as in Experiment 1. Paradoxically, the oracle ellipsoids perform somewhat worse than the ones in Experiment 1.
Table 3.
Covering frequencies for uniform regression errors
first component | second component | |||||
n | oracle | plug-in | jk | oracle | plug-in | jk |
100 | 0.593 | 0.952 | 0.909 | 0.684 | 0.964 | 0.939 |
500 | 0.907 | 0.929 | 0.938 | 0.924 | 0.929 | 0.934 |
1 000 | 0.917 | 0.959 | 0.946 | 0.951 | 0.944 | 0.939 |
5 000 | 0.938 | 0.941 | 0.934 | 0.947 | 0.959 | 0.933 |
7 500 | 0.934 | 0.948 | 0.948 | 0.958 | 0.956 | 0.944 |
10 000 | 0.937 | 0.947 | 0.943 | 0.955 | 0.945 | 0.950
Experiment 3.
Here we compare the accuracy of the ellipsoids for regression with heavy-tailed errors. The errors are taken to have the distribution of $\eta /10$, where η has Student's t distribution with four degrees of freedom. The results are presented in Table 4. In this case, the accuracy of the jackknife ellipsoids seems significantly worse than in Experiments 1 and 2. The plug-in ellipsoids show nearly the same performance as in the previous experiments.
Table 4.
Covering frequencies for heavy-tailed regression errors
first component | second component | |||||
n | oracle | plug-in | jk | oracle | plug-in | jk |
100 | 0.568 | 0.942 | 0.903 | 0.701 | 0.932 | 0.931 |
500 | 0.900 | 0.938 | 0.944 | 0.917 | 0.929 | 0.956 |
1 000 | 0.936 | 0.932 | 0.926 | 0.944 | 0.931 | 0.919 |
5 000 | 0.930 | 0.949 | 0.945 | 0.942 | 0.936 | 0.948 |
7 500 | 0.946 | 0.939 | 0.953 | 0.941 | 0.953 | 0.926 |
10 000 | 0.955 | 0.947 | 0.935 | 0.925 | 0.944 | 0.936
6 Conclusion
We presented theoretical results on the asymptotic normality of modified least squares estimators for mixtures of nonlinear regressions. These results were applied to the construction of confidence ellipsoids for the regression coefficients. Simulation results show that the proposed ellipsoids can be used for sufficiently large samples.