Linear regression by observations from mixture with varying concentrations

Liubashenko, Daryna; Maiboroda, Rostyslav

doi:10.15559/15-VMSTA41

Abstract

We consider a finite mixture model with varying mixing probabilities. Linear regression models are assumed for observed variables with coefficients depending on the mixture component the observed subject belongs to. A modification of the least-squares estimator is proposed for estimation of the regression coefficients. Consistency and asymptotic normality of the estimates is demonstrated.

1 Introduction

In this paper, we discuss a structural linear regression technique in the context of model of mixture with varying concentrations (MVC). MVC means that the observed subjects belong to M different subpopulations (mixture components). The true numbers of components to which the subjects $O_{j}$, $j=1,\dots ,N$, belong, say, $\kappa _{j}=\kappa (O_{j})$, are unknown, but we know the probabilities

\[{p_{j;N}^{k}}\stackrel{\mathrm{def}}{=}\operatorname{\mathsf{P}}\{\kappa _{j}=k\},\hspace{1em}k=1,\dots ,M\]

(mixing probabilities or concentrations of the mixture components). MVC models arise naturally in the description of medical, biologic, and sociologic data [1, 8, 9, 12]. They can be considered as a generalization of finite mixture models (FMM). Classical theory of FMMs can be found in monographs [10, 13].

Let

\[\xi (O)={\big(Y(O),{X}^{1}(O),\dots ,{X}^{d}(O)\big)}^{T}\]

be a vector of observed features (random variables) of a subject O. We consider the following linear regression model for these variables:

(1)

\[Y(O)=\sum \limits_{i=1}^{d}{b_{i}^{(\kappa (O))}}{X}^{i}(O)+\varepsilon (O),\]

where ${\mathbf{b}}^{(m)}={({b_{1}^{(m)}},\dots ,{b_{d}^{(m)}})}^{T}$ are nonrandom regression coefficients for the mth component, $\varepsilon (O)$ is an error term, which is assumed to be zero mean and conditionally independent of the regressors vector $\mathbf{X}(O)={({X}^{1}(O),\dots ,{X}^{d}(O))}^{T}$ given $\kappa (O)$.

Note. We consider a subject O as taken at random from an infinite population, so it is random in this sense. The vector of observed variables $\xi (O)$ can be considered as a random vector even for a fixed O.

Our aim is to estimate the vectors of regression coefficients ${\mathbf{b}}^{(k)}$, $k=1,\dots ,M$, by the observations $\varXi _{N}=(\xi _{1},\dots ,\xi _{N})$, where $\xi _{j}=\xi (O_{j})$. We assume that $(\kappa _{j},\xi _{j})$ are independent for different j.

A statistical model similar to MVC with (1) is considered in [5], where a parametric model for the conditional distributions of $\varepsilon (O)$ given $\kappa (O)$ is assumed. For this case, maximum likelihood estimation is proposed in [5], and a version of EM-algorithm is developed for numerical computation of the estimates.

In this paper, we adopt a nonparametric approach assuming no parametric models for $\varepsilon (O)$ and $\mathbf{X}(O)$ distributions. Nonparametric and semiparametric technique for MVC was developed in [6, 7, 4]. We use the weighted empirical moment technique to derive estimates for the regression coefficients and then obtain conditions of consistency and asymptotic normality of the estimates. These results are based on general ideas of least squares [11] and moment estimates [3].

The rest of the paper is organized as follows. In Section 2, we recall some results on nonparametric estimation of functional moments in general MVC. The estimates are introduced, and conditions of their consistency and asymptotic normality are presented in Section 3. Section 4 contains proofs of the statements of Section 3. Results of computer simulations are presented in Section 5.

2 Nonparametric estimation for MVC

Let us start with some notation and definitions. We denote by $F_{m}$ the distribution of $\xi (O)$ for O belonging to the mth component of the mixture, that is,

\[F_{m}(A)\stackrel{\mathrm{def}}{=}\operatorname{\mathsf{P}}\big\{\xi (O)\in A\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=m\big\}\]

for all measurable sets A. Then by the definition of MVC

(2)

\[\operatorname{\mathsf{P}}\{\xi _{j}\in A\}=\sum \limits_{k=1}^{M}{p_{j;N}^{k}}F_{k}(A).\]

In the asymptotic statements, we will consider the data $\varXi _{n}=(\xi _{1},\dots ,\xi _{N})$ as an element of (imaginary) series of data $\varXi _{1},\varXi _{2},\dots ,\varXi _{N},\dots \hspace{0.1667em}$ in which no link between observations for different N is assumed. So, in formal notation, it should be more correct to write $\xi _{j;N}$ instead of $\xi _{j}$, but we will drop the subscript N when it is insignificant.

We consider an array of all concentrations for all data sizes

\[\mathbf{p}\stackrel{\mathrm{def}}{=}\big({p_{j;N}^{k}},k=1,\dots ,M;\hspace{2.5pt}j=1,\dots ,N;\hspace{2.5pt}N=1,2,\dots \hspace{0.1667em}\big).\]

Its subarrays

\[{\mathbf{p}_{N}^{k}}\stackrel{\mathrm{def}}{=}{\big({p_{1;N}^{k}},\dots ,{p_{N;N}^{k}}\big)}^{T}\hspace{1em}\text{and}\hspace{1em}\mathbf{p}_{j;N}={\big({p_{j;N}^{1}},\dots ,{p_{j;N}^{M}}\big)}^{T}\]

are considered as vector columns, and

\[\mathbf{p}_{N}=\big({p_{j;N}^{k}},\hspace{2.5pt}j=1,\dots ,N;\hspace{2.5pt}k=1,\dots ,M\big)\]

as an $N\times M$-matrix. We will also consider a weight array a of the same structure as p with similar notation for its subarrays.

By the angle brackets with subscript n we denote the averaging by $j=1,\dots ,n$:

\[\big\langle {\mathbf{a}}^{k}\big\rangle _{N}\stackrel{\mathrm{def}}{=}\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{k}}.\]

Multiplication, summation, and other operations in the angle brackets are made elementwise:

\[\big\langle {\mathbf{a}}^{k}{\mathbf{p}}^{m}\big\rangle _{N}=\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{k}}{p_{j;N}^{m}};\hspace{2em}\big\langle {\big({\mathbf{a}}^{k}\big)}^{2}\big\rangle _{N}=\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{k}}\big)}^{2}.\]

We define $\langle {\mathbf{p}}^{m}\rangle \stackrel{\mathrm{def}}{=}\lim _{N\to \infty }\langle {\mathbf{p}}^{m}\rangle _{N}$ if this limit exists. Let $\boldsymbol{\varGamma }_{N}={(\langle {\mathbf{p}}^{l}{\mathbf{p}}^{m}\rangle _{N})_{l,m=1}^{M}}=\frac{1}{N}{\mathbf{p}_{N}^{T}}\mathbf{p}_{N}$ be an $M\times M$ matrix, and $\gamma _{lm;N}$ be its $(l,m)$th minor. The matrix $\boldsymbol{\varGamma }_{N}$ can be considered as the Gramian matrix of vectors $({\mathbf{p}}^{1},\dots ,{\mathbf{p}}^{M})$ in the inner product $\langle {\mathbf{p}}^{i}{\mathbf{p}}^{k}\rangle _{N}$, so that it is nonsingular if these vectors are linearly independent.

In what follows, $\stackrel{\mathrm{P}}{\longrightarrow }$ means convergence in probability, and $\stackrel{\mathrm{W}}{\longrightarrow }$ means weak convergence.

Assume now that model (2) holds for the data $\varXi _{N}$. Then the distribution $F_{m}$ of the mth component can be estimated by the weighted empirical measure

(3)

\[\hat{F}_{m;N}(A)\stackrel{\mathrm{def}}{=}\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}\mathbb{1}\{\xi _{j}\in A\},\]

where

(4)

\[{a_{j;N}^{m}}=\frac{1}{\det \boldsymbol{\varGamma }_{N}}\sum \limits_{k=1}^{M}{(-1)}^{k+m}\gamma _{mk;N}{p_{j;N}^{k}}.\]

It is shown in [8] that if $\boldsymbol{\varGamma }_{N}$ is nonsingular, then $\hat{F}_{m;N}$ is the minimax unbiased estimate for $F_{m}$. The consistency of $\hat{F}_{m;N}$ is demonstrated in [6] (see also [8]).

Consider now functional moment estimation based on weighted empirical moments. Let $g:{\mathbb{R}}^{d+1}\to {\mathbb{R}}^{k}$ be a measurable function. Then to estimate

\[{\bar{g}}^{(m)}=\operatorname{\mathsf{E}}\big[g\big(\xi (O)\big)\hspace{2.5pt}\big|\hspace{2.5pt}\kappa (O)=m\big]=\int g(x)F_{m}(dx),\]

we can use

\[{\hat{g}_{;N}^{(m)}}=\int g(x)\hat{F}_{m;N}(dx)=\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}g(\xi _{j}).\]

Lemma 1 (Consistency).

Assume that

1. $\operatorname{\mathsf{E}}[|g(\xi (O))|\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=k]<\infty $ for all $k=1,\dots ,M$.
2. There exists $C>0$ such that $\det \boldsymbol{\varGamma }_{N}>C$ for all N large enough.

Then ${\hat{g}_{;N}^{(m)}}\stackrel{\mathrm{P}}{\longrightarrow }{\bar{g}}^{(m)}$ as $N\to \infty $.

This lemma is a simple corollary of Theorem 4.2 in [8]. (See also Theorem 3.1.1 in [7]).

Lemma 2 (Asymptotic normality).

Assume that

1. $\operatorname{\mathsf{E}}[|g(\xi (O)){|}^{2}\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=k]<\infty $ for all $k=1,\dots ,M$.
2. There exists $C>0$ such that $\det \boldsymbol{\varGamma }_{N}>C$ for all N large enough.
3. There exists the limit
\[\boldsymbol{\varSigma }=\underset{N\to \infty }{\lim }N\operatorname{Cov}\big({\hat{g}_{;N}^{(m)}}\big).\]

Then

\[\sqrt{N}\big({\hat{g}_{;N}^{(m)}}-{\bar{g}}^{(m)}\big)\stackrel{\mathrm{W}}{\longrightarrow }N(0,\boldsymbol{\varSigma }).\]

For univariate ${\hat{g}_{;N}^{(m)}}$, the statement of the lemma is contained in Theorem 4.2 from [8] (or Theorem 3.1.2 in [7]). The multivariate case can be obtained from the univariate one applying the Cramér–Wold device (see [2], p. 382).

3 Estimate for $\mathbf{b}_{m}$ and its asymptotics

In view of Lemma 1, we expect that, under suitable assumptions,

\[\begin{array}{r@{\hskip0pt}l}\displaystyle J_{m;N}(\mathbf{b})& \displaystyle \stackrel{\mathrm{def}}{=}\frac{1}{N}\sum \limits_{j=1}^{N}{\mathbf{a}_{j;N}^{m}}{\Bigg(Y_{j;N}-\sum \limits_{i=1}^{d}b_{i}{X_{j:N}^{i}}\Bigg)}^{2}\\{} & \displaystyle =\int {\Bigg(y-\sum \limits_{i=1}^{d}b_{i}{x}^{i}\Bigg)}^{2}\hat{F}_{m;N}\big(dy,d{x}^{1},\dots ,d{x}^{d}\big)\end{array}\]

converges to

\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \int {\Bigg(y-\sum \limits_{i=1}^{d}b_{i}{x}^{i}\Bigg)}^{2}F_{m}\big(dy,d{x}^{1},\dots ,d{x}^{d}\big)\\{} & \displaystyle \hspace{1em}=\operatorname{\mathsf{E}}\Bigg[{\Bigg(Y(O)-\sum \limits_{i=1}^{d}b_{i}{X}^{i}(O)\Bigg)}^{2}\hspace{2.5pt}\Bigg|\hspace{2.5pt}\kappa (O)=m\Bigg]\stackrel{\mathrm{def}}{=}J_{m;\infty }(\mathbf{b})\end{array}\]

as $N\to \infty $.

Since $J_{m;\infty }(\mathbf{b})$ attains its minimal value at ${\mathbf{b}}^{(m)}$, it is natural to suggest the argmin of $J_{m;N}(\mathbf{b})$ as an estimate for ${\mathbf{b}}^{(m)}$. If the weights ${\mathbf{a}}^{m}$ were positive, then this argmin would be

(5)

\[{\hat{\mathbf{b}}_{;N}^{(m)}}\stackrel{\mathrm{def}}{=}{\big({\mathbf{X}}^{T}\mathbf{A}\mathbf{X}\big)}^{-1}{\mathbf{X}}^{T}\mathbf{A}\mathbf{Y},\]

where $\mathbf{X}\stackrel{\mathrm{def}}{=}({X_{j}^{i}})_{j=1,\dots ,N;\hspace{2.5pt}i=1,\dots ,d}$ is the $N\times d$ matrix of observed regressors, $\mathbf{Y}\stackrel{\mathrm{def}}{=}{(Y_{1},\dots ,Y_{N})}^{T}$ is the vector of observed responses, and $\mathbf{A}\stackrel{\mathrm{def}}{=}\operatorname{diag}({a_{1;N}^{m}},\dots ,{a_{N;N}^{m}})$ is the diagonal weight matrix for estimation of mth component. (Obviously, A depends on m, but we do not show it explicitly by a subscript since the number m of the component for which $\mathbf{b}_{m}$ is estimated will be further fixed.)

Generally speaking, by (4) ${a_{j;N}^{m}}$ must be negative for some j, so ${\hat{\mathbf{b}}_{;N}^{(m)}}$ is not necessarily an argmin of $\hat{J}_{m;N}(\mathbf{b})$. But we will take ${\hat{\mathbf{b}}_{;N}^{(m)}}$ as an estimate for $\mathbf{b}_{m}$ and call it a modified least-squares estimate for $\mathbf{b}_{m}$ in MVC model (MVC-LS estimate).

Let

\[{\mathbf{D}}^{(k)}\stackrel{\mathrm{def}}{=}\operatorname{\mathsf{E}}\big[\mathbf{X}(O){\mathbf{X}}^{T}(O)\hspace{2.5pt}\big|\hspace{2.5pt}\kappa (O)=k\big]\]

be the matrix of second moments of the regressors for subjects belonging to the kth component. Denote the variance of the kth component’s error term by

\[{\big({\sigma }^{(k)}\big)}^{2}=\operatorname{\mathsf{E}}\big[{\big(\varepsilon (O)\big)}^{2}\hspace{2.5pt}\big|\hspace{2.5pt}\kappa (O)=k\big].\]

(Recall that $\operatorname{\mathsf{E}}[(\varepsilon (O))\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=k]=0$). In what follows, we assume that these moments and variances exist for all components.

Theorem 1 (Consistency).

Assume that

1. ${\mathbf{D}}^{(k)}$ and ${({\sigma }^{(k)})}^{2}$ are finite for all $k=1,\dots ,M$.
2. ${\mathbf{D}}^{(m)}$ is nonsingular.
3. There exists $C>0$ such that $\det \boldsymbol{\varGamma }_{N}>C$ for all N large enough.

Then ${\hat{\mathbf{b}}_{;N}^{(m)}}\stackrel{\mathrm{P}}{\longrightarrow }{\mathbf{b}}^{(m)}$ as $N\to \infty $.

Note. Assumption 3 can be weakened. Applying Theorem 4.2 from [8], we can show that ${\hat{\mathbf{b}}_{;N}^{(m)}}$ is consistent if the vector ${\mathbf{p}_{N}^{m}}$ is asymptotically linearly independent from the vectors ${\mathbf{p}_{N}^{i}}$, $i\ne m$, as $N\to \infty $. To avoid complexities in this presentation, we do not formulate the strict meaning of this statement.

Denote ${D}^{ik(s)}\stackrel{\mathrm{def}}{=}\operatorname{\mathsf{E}}\big[{X}^{i}(O){X}^{k}(O)\hspace{2.5pt}\big|\hspace{2.5pt}\kappa (O)=s\big]$,

\[\begin{array}{r@{\hskip10.0pt}c}& \displaystyle {\mathbf{L}}^{ik(s)}\stackrel{\mathrm{def}}{=}{\big(\operatorname{\mathsf{E}}\big[{X}^{i}(O){X}^{k}(O){X}^{q}(O){X}^{l}(O)\hspace{2.5pt}\big|\hspace{2.5pt}\kappa (O)=s\big]\big)_{l,q=1}^{d}},\\{} & \displaystyle {\mathbf{M}}^{ik(s,p)}\stackrel{\mathrm{def}}{=}{\big({D}^{il(s)}{D}^{kq(p)}\big)_{l,q=1}^{d}}.\end{array}\]

Theorem 2 (Asymptotic normality).

Assume that

1. $\operatorname{\mathsf{E}}[{({X}^{i}(O))}^{4}\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=k]<\infty $ and $\operatorname{\mathsf{E}}[{(\varepsilon (O))}^{4}\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=k]<\infty $ for all $k=1,\dots ,M$.
2. Matrix $\mathbf{D}={\mathbf{D}}^{(m)}$ is nonsingular.
3. There exists $C>0$ such that $\det \boldsymbol{\varGamma }_{N}>C$ for all N large enough.
4. For all $s,p=1,\dots ,M$, there exist $\langle {({\mathbf{a}}^{m})}^{2}{\mathbf{p}}^{s}{\mathbf{p}}^{p}\rangle $.

Then $\sqrt{N}({\hat{\mathbf{b}}_{;N}^{(m)}}-{\mathbf{b}}^{(m)})\stackrel{\mathrm{W}}{\longrightarrow }N(0,\mathbf{V})$, where

(6)

\[\mathbf{V}\stackrel{\mathrm{def}}{=}{\mathbf{D}}^{-1}\boldsymbol{\varSigma }{\mathbf{D}}^{-1}\]

with

(7)

\[\begin{array}{r@{\hskip10.0pt}c}& \displaystyle \boldsymbol{\varSigma }={\big({\varSigma }^{ik}\big)_{ik=1}^{d}},\\{} & \displaystyle {\varSigma }^{ik}=\sum \limits_{s=1}^{M}\big\langle {\big({\mathbf{a}}^{m}\big)}^{2}{\mathbf{p}}^{s}\big\rangle \big({D}^{ik(s)}{\big({\sigma }^{(s)}\big)}^{2}+{\big({\mathbf{b}}^{(s)}-{\mathbf{b}}^{(m)}\big)}^{T}{\mathbf{L}}^{ik(s)}\big({\mathbf{b}}^{(s)}-{\mathbf{b}}^{(m)}\big)\big)\hspace{-6.0pt}\\{} & \displaystyle \hspace{2em}-\sum \limits_{s=1}^{M}\sum \limits_{p=1}^{M}\big\langle {\big({\mathbf{a}}^{m}\big)}^{2}{\mathbf{p}}^{s}{\mathbf{p}}^{p}\big\rangle {\big({\mathbf{b}}^{(s)}-{\mathbf{b}}^{(m)}\big)}^{T}{\mathbf{M}}^{ik(s,p)}\big({\mathbf{b}}^{(p)}-{\mathbf{b}}^{(m)}\big).\end{array}\]

4 Proofs

Proof of Theorem 1.

Note that if ${\mathbf{D}}^{(m)}$ is nonsingular, then

\[{\mathbf{b}}^{(m)}=\operatorname{argmin}_{\mathbf{b}\in {\mathbb{R}}^{d}}J_{m;\infty }(\mathbf{b})={\big({\mathbf{D}}^{(m)}\big)}^{-1}\operatorname{\mathsf{E}}\big[\big(Y(O)\mathbf{X}(O)\big)\hspace{2.5pt}\big|\hspace{2.5pt}\kappa (O)=m\big].\]

By Lemma 1,

\[{\mathbf{X}}^{T}\mathbf{A}\mathbf{X}=\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}\mathbf{X}(O_{j}){\mathbf{X}}^{T}(O_{j})\stackrel{\mathrm{P}}{\longrightarrow }{\mathbf{D}}^{(m)}\]

and

\[{\mathbf{X}}^{T}\mathbf{A}\mathbf{Y}=\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}Y(O_{j})\mathbf{X}(O_{j})\stackrel{\mathrm{P}}{\longrightarrow }\operatorname{\mathsf{E}}\big[\big(Y(O)\mathbf{X}(O)\big)\hspace{2.5pt}\big|\hspace{2.5pt}\kappa (O)=m\big]\]

as $N\to \infty $. This implies the statement of the theorem. □

Proof of Theorem 2.

Let us introduce a set of random vectors ${\xi _{j}^{(k)}}={({Y_{j}^{(k)}},{X_{j}^{1(k)}},\dots ,{X_{j}^{d(k)}})}^{T}$, $j=1,2,\dots \hspace{0.1667em}$, with distributions $F_{k}$ that are independent for different j and k and independent from $\kappa _{j}$. Denote ${\delta _{j}^{(k)}}=\mathbb{1}\{\kappa _{j}=k\}$,

\[{\xi ^{\prime }_{j}}\stackrel{\mathrm{def}}{=}\sum \limits_{k=1}^{M}{\delta _{j}^{(k)}}{\xi _{j}^{(k)}}.\]

Then the distribution of ${\varXi ^{\prime }_{N}}=({\xi ^{\prime }_{1}},\dots ,{\xi ^{\prime }_{N}})$ is the same as that of $\varXi _{N}$. Since in this theorem we are interested in weak convergence only, without loss of generality, let us assume that $\varXi _{N}={\varXi ^{\prime }_{N}}$ . By $\mathfrak{F}$ we denote the sigma-algebra generated by ${\xi _{j}^{(k)}}$, $j=1,\dots ,N$, $k=1,\dots ,M$.

Let us show that $\sqrt{N}({\hat{\mathbf{b}}_{;N}^{(m)}}-{\mathbf{b}}^{(m)})$ converges weakly to $N(0,\mathbf{V})$. It is readily seen that

\[\sqrt{N}\big({\hat{\mathbf{b}}_{;N}^{(m)}}-{\mathbf{b}}^{(m)}\big)={\bigg[\frac{1}{N}\big({\mathbf{X}}^{T}\mathbf{A}\mathbf{X}\big)\bigg]}^{-1}\bigg[\frac{1}{\sqrt{N}}\big({\mathbf{X}}^{T}\mathbf{A}\mathbf{Y}-{\mathbf{X}}^{T}\mathbf{A}\mathbf{X}{\mathbf{b}}^{(m)}\big)\bigg].\]

Since $\frac{1}{N}({\mathbf{X}}^{T}\mathbf{A}\mathbf{X})\stackrel{\mathrm{P}}{\longrightarrow }\mathbf{D}$, we need only to show week convergence of the random vectors $\frac{1}{\sqrt{N}}({\mathbf{X}}^{T}\mathbf{A}\mathbf{Y}-{\mathbf{X}}^{T}\mathbf{A}\mathbf{X}{\mathbf{b}}^{(m)})$ to $N(0,\boldsymbol{\varSigma })$.

Denote

\[\begin{array}{r@{\hskip0pt}l}& \displaystyle g(\xi _{j})\stackrel{\mathrm{def}}{=}{\Bigg({X_{j}^{i}}\Bigg(Y_{j}-\sum \limits_{l=1}^{d}{X_{j}^{l}}{b_{l}^{(m)}}\Bigg)\Bigg)_{i=1}^{d}},\\{} & \displaystyle {\gamma _{j}^{i}}\stackrel{\mathrm{def}}{=}\frac{1}{\sqrt{N}}{a_{j;N}^{(m)}}{X_{j}^{i}}\Bigg(Y_{j}-\sum \limits_{l=1}^{d}{X_{j}^{l}}{b_{l}^{(m)}}\Bigg).\end{array}\]

Obviously,

\[\zeta _{N}\stackrel{\mathrm{def}}{=}\frac{1}{\sqrt{N}}\big({\mathbf{X}}^{T}\mathbf{A}\mathbf{Y}-{\mathbf{X}}^{T}\mathbf{A}\mathbf{X}{\mathbf{b}}^{(m)}\big)=\sqrt{N}{\hat{g}_{;N}^{(m)}}={\Bigg(\sum \limits_{j=1}^{N}{\gamma _{j}^{i}}\Bigg)_{i=1}^{d}}.\]

We will apply Lemma 2 to show that $\sqrt{N}{\hat{g}_{;N}^{(m)}}\stackrel{\mathrm{W}}{\longrightarrow }N(0,\boldsymbol{\varSigma })$.

First, let us show that ${\bar{g}}^{(m)}=\operatorname{\mathsf{E}}{\hat{g}_{;N}^{(m)}}=0$. It is equivalent to $\operatorname{\mathsf{E}}{\sum _{j=1}^{N}}{\gamma _{j}^{i}}=0$ for all $i=1,\dots ,d$. In fact,

\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \operatorname{\mathsf{E}}\sum \limits_{j=1}^{N}{\gamma _{j}^{i}}\\{} & \displaystyle \hspace{1em}=\operatorname{\mathsf{E}}\Bigg[\operatorname{\mathsf{E}}\frac{1}{\sqrt{N}}\sum \limits_{j=1}^{N}{a_{j;N}^{(m)}}\sum \limits_{s=1}^{M}{\delta _{j}^{(s)}}{X_{j}^{i(s)}}\Bigg(\sum \limits_{s=1}^{M}{\delta _{j}^{(s)}}Y_{j}(s)-\sum \limits_{l=1}^{d}\sum \limits_{s=1}^{M}{\delta _{j}^{(s)}}{X_{j}^{l(s)}}{b_{l}^{(m)}}\Bigg)\Bigg|\mathfrak{F}\Bigg]\\{} & \displaystyle \hspace{1em}=\frac{1}{\sqrt{N}}\operatorname{\mathsf{E}}\sum \limits_{j=1}^{N}{a_{j;N}^{(m)}}\sum \limits_{s=1}^{M}{p_{j;N}^{s}}{X_{j}^{i(s)}}\Bigg({Y_{j}^{(s)}}-\sum \limits_{l=1}^{d}{X_{j}^{l(s)}}{b_{l}^{(m)}}\Bigg)\\{} & \displaystyle \hspace{1em}=\sqrt{N}\sum \limits_{s=1}^{M}\underset{\mathbb{1}\{m=s\}}{\underbrace{\big\langle {a}^{(m)}{\mathbf{p}}^{s}\big\rangle _{N}}}\operatorname{\mathsf{E}}\Bigg[{X_{1}^{i(s)}}\Bigg({Y_{1}^{(s)}}-\sum \limits_{l=1}^{d}{X_{1}^{l(s)}}{b_{l}^{(m)}}\Bigg)\Bigg]\\{} & \displaystyle \hspace{1em}=\sqrt{N}\operatorname{\mathsf{E}}\Bigg[{X_{1}^{i(m)}}\underset{{\varepsilon _{1}^{(m)}}}{\underbrace{\Bigg({Y_{1}^{(m)}}-\sum \limits_{l=1}^{d}{X_{1}^{l(m)}}{b_{l}^{(m)}}\Bigg)}}\Bigg]=\sqrt{N}\operatorname{\mathsf{E}}{X_{1}^{i(m)}}\operatorname{\mathsf{E}}{\varepsilon _{1}^{(m)}}=0.\end{array}\]

\[\zeta _{N}={\Bigg(\sum \limits_{j=1}^{N}{\gamma _{j}^{i}}\Bigg)_{i=1}^{d}}={\Bigg(\sum \limits_{j=1}^{N}{\gamma _{j}^{i}}-\operatorname{\mathsf{E}}{\gamma _{j}^{i}}\Bigg)_{i=1}^{d}}.\]

In view of Lemma 2, to complete the proof, we only need to show that$\operatorname{Cov}(\zeta _{N})\to \boldsymbol{\varSigma }$.

Denote

\[{\zeta _{j}^{i(s)}}={\delta _{j}^{(s)}}{X_{j}^{i(s)}}\Bigg(\sum \limits_{l=1}^{d}{X_{j}^{l(s)}}\big({b_{l}^{(s)}}-{b_{l}^{(m)}}\big)+{\varepsilon _{j}^{(s)}}\Bigg)\]

and

\[{\eta _{j}^{i(s)}}={\delta _{j}^{(s)}}\Bigg(\sum \limits_{l=1}^{d}{D}^{il(s)}\big({b_{l}^{(s)}}-{b_{l}^{(m)}}\big)\Bigg)\]

Then

\[\zeta _{N}={\Bigg(\frac{1}{\sqrt{N}}\sum \limits_{j=1}^{N}{a_{j}^{(m)}}\sum \limits_{s=1}^{M}\big(\big({\zeta _{j}^{i(s)}}-{\eta _{j}^{i(s)}}\big)+\big({\eta _{j}^{i(s)}}-\operatorname{\mathsf{E}}{\zeta _{j}^{i(s)}}\big)\big)\Bigg)_{i=1}^{d}}={\big({S_{1}^{i}}+{S_{2}^{i}}\big)_{i=1}^{d}},\]

where

\[{S_{1}^{i}}=\frac{1}{\sqrt{N}}\sum \limits_{j=1}^{N}{a_{j}^{(m)}}\sum \limits_{s=1}^{M}\big({\zeta _{j}^{i(s)}}-{\eta _{j}^{i(s)}}\big),\hspace{2em}{S_{2}^{i}}=\frac{1}{\sqrt{N}}\sum \limits_{j=1}^{N}{a_{j}^{(m)}}\sum \limits_{s=1}^{M}\big({\eta _{j}^{i(s)}}-\operatorname{\mathsf{E}}{\zeta _{j}^{i(s)}}\big).\]

Note that

\[\operatorname{\mathsf{E}}\big({\zeta _{j}^{i(s)}}-{\eta _{j}^{i(s)}}\big)=\operatorname{\mathsf{E}}\operatorname{\mathsf{E}}\big[\big({\zeta _{j}^{i(s)}}-{\eta _{j}^{i(s)}}\big)\hspace{2.5pt}\big|\hspace{2.5pt}\kappa _{j}\big]=0.\]

Now

(8)

\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \operatorname{Cov}\big({S_{1}^{i}},{S_{2}^{k}}\big)\\{} & \displaystyle \hspace{1em}=\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}\sum \limits_{p=1}^{M}\operatorname{\mathsf{E}}\big({\zeta _{j}^{i(s)}}-{\eta _{j}^{i(s)}}\big){\eta _{j}^{k(p)}}\\{} & \displaystyle \hspace{1em}=\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}\sum \limits_{p=1}^{M}\operatorname{\mathsf{E}}{\delta _{j}^{(s)}}\Bigg[\sum \limits_{l=1}^{d}\big({X_{j}^{i(s)}}{X_{j}^{l(s)}}-{D}^{il(s)}\big)\big({b_{l}^{(s)}}-{b_{l}^{(m)}}\big)\\{} & \displaystyle \hspace{2em}+{X_{j}^{i(s)}}{\varepsilon _{j}^{i(s)}}\Bigg]\cdot {\delta _{j}^{(p)}}\Bigg(\sum \limits_{q=1}^{d}{D}^{kq(p)}\big({b_{q}^{(p)}}-{b_{q}^{(m)}}\big)\Bigg)\\{} & \displaystyle \hspace{1em}=\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}{p_{j;N}^{s}}\Bigg(\sum \limits_{l=1}^{d}\big(\underset{0}{\underbrace{{D}^{il(s)}-{D}^{il(s)}}}\big)\big({b_{l}^{(s)}}-{b_{l}^{(m)}}\big)+\operatorname{\mathsf{E}}{X_{j}^{i(s)}}\underset{0}{\underbrace{\operatorname{\mathsf{E}}{\varepsilon _{j}^{(s)}}}}\Bigg)\\{} & \displaystyle \hspace{2em}\times \Bigg(\sum \limits_{q=1}^{d}{D}^{kq(s)}\big({b_{q}^{(s)}}-{b_{q}^{(m)}}\big)\Bigg)=0;\end{array}\]

\[\begin{array}{r@{\hskip0pt}l}\displaystyle \operatorname{Cov}\big({S_{1}^{i}},{S_{1}^{k}}\big)& \displaystyle =\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}{p_{j;N}^{s}}\sum \limits_{q=1}^{d}\sum \limits_{l=1}^{d}\big({b_{l}^{(s)}}-{b_{l}^{(m)}}\big)\operatorname{\mathsf{E}}\big(\big({X_{j}^{i(s)}}{X_{j}^{l(s)}}-{D}^{il(s)}\big)\\{} & \displaystyle \hspace{1em}\times \big({X_{j}^{k(s)}}{X_{j}^{q(s)}}-{D}^{kq(s)}\big)\big)\big({b_{q}^{(s)}}-{b_{q}^{(m)}}\big)\\{} & \displaystyle \hspace{1em}+\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}{p_{j;N}^{s}}{D}^{ik(s)}){\big({\sigma }^{(s)}\big)}^{2};\\{} \displaystyle \operatorname{Cov}\big({S_{2}^{i}},{S_{2}^{k}}\big)& \displaystyle =\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}{p_{j;N}^{s}}\sum \limits_{q=1}^{d}\sum \limits_{l=1}^{d}\big({b_{l}^{(s)}}-{b_{l}^{(m)}}\big){D}^{il(s)}\\{} & \displaystyle \hspace{1em}\times \Bigg[\big({b_{q}^{(s)}}-{b_{q}^{(m)}}\big){D}^{kq(s)}-\sum \limits_{r=1}^{M}{p_{j;N}^{r}}\big({b_{q}^{(r)}}-{b_{q}^{(m)}}\big){D}^{kq(r)}\Bigg].\end{array}\]

Thus,

\[\begin{array}{r@{\hskip0pt}l}\displaystyle \operatorname{Cov}\zeta _{N}& \displaystyle =\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}{p_{j;N}^{s}}{D}^{ik(s)}{\big({\sigma }^{(s)}\big)}^{2}\\{} & \displaystyle \hspace{1em}+\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}{p_{j;N}^{s}}\sum \limits_{q=1}^{d}\sum \limits_{l=1}^{d}\big({b_{l}^{(s)}}-{b_{l}^{(m)}}\big)\Bigg[\big({b_{q}^{(s)}}-{b_{q}^{(m)}}\big)\\{} & \displaystyle \hspace{1em}\times \underset{{L_{lq}^{ik(s)}}}{\underbrace{\operatorname{\mathsf{E}}{X_{j}^{i(s)}}{X_{j}^{k(s)}}{X_{j}^{l(s)}}{X_{j}^{q(s)}}}}-{D}^{il(s)}\sum \limits_{r=1}^{M}{p_{j;N}^{r}}\big({b_{q}^{(r)}}-{b_{q}^{(m)}}\big){D}^{kq(r)}\Bigg]{\Bigg)_{i,k=1}^{d}}.\end{array}\]

From the last equation we get $\operatorname{Cov}(\zeta _{N})\to \boldsymbol{\varSigma }$ as $N\to \infty $. □

5 Results of simulation

To assess the accuracy of the asymptotic results from Section 3, we performed a small simulation study. We considered a two-component mixture ($M=2$) with mixing probabilities ${p_{j;N}^{1}}=j/N$ and ${p_{j;N}^{2}}=1-{p_{j;N}^{1}}$. For each subject, there were two observed variables X and Y, which were simulated based on the simple linear regression model

\[Y_{j}={b_{0}^{\kappa _{j}}}+{b_{1}^{\kappa _{j}}}{X_{j}^{(\kappa _{j})}}+{\varepsilon _{j}^{(\kappa _{j})}},\]

where $\kappa _{i}$ is the number of component the jth observation belongs to, ${X_{j}^{(1)}}$ was simulated as $N(1,1)$, ${X_{j}^{(2)}}$ as $N(2,2.25)$, and ${\varepsilon _{j}^{(k)}}$ were zero-mean Gaussians with standard deviations 0.01 for the first component and 0.05 for the second one. The values of the regression coefficients were ${b_{0}^{1}}=3$, ${b_{1}^{1}}=0.5$, ${b_{0}^{2}}=-2$, ${b_{1}^{2}}=1$.

The means and covariances of the estimates were calculated over 2000 replications.

The results of simulation are presented in Table 1. The true values of parameters and asymptotic covariances are placed in the last rows of the tables.

Table 1.

Simulation results

First component

n	$\operatorname{\mathsf{E}}{\hat{b}_{0}^{1}}$	$\operatorname{\mathsf{E}}{\hat{b}_{1}^{1}}$	$N\operatorname{Var}{\hat{b}_{0}^{1}}$	$N\operatorname{Var}{\hat{b}_{1}^{1}}$	$N\operatorname{Cov}({\hat{b}_{0}^{1}},{\hat{b}_{1}^{1}})$
500	3.0014	0.5235	47.61	45.83	− 42.27
1000	3.0033	0.512	41.19	37.50	−35.04
2000	3.0011	0.5032	38.84	34.71	−32.87
5000	3.0003	0.5016	39.54	34.49	−32.83
∞	3	0.5	39.13	33.96	−32.53
Second component

n	$\operatorname{\mathsf{E}}{\hat{b}_{0}^{2}}$	$\operatorname{\mathsf{E}}{\hat{b}_{1}^{2}}$	$N\operatorname{Var}{\hat{b}_{0}^{2}}$	$N\operatorname{Var}{\hat{b}_{1}^{2}}$	$N\operatorname{Cov}({\hat{b}_{0}^{2}},{\hat{b}_{1}^{2}})$
500	−2.0243	1.0084	67.37	7.94	−22.17
1000	−42.0100	1.0027	63.04	7.57	−20.90
2000	−2.0039	1.0016	63.57	7.52	−20.95
5000	−2.0074	1.0025	62.41	7.32	−20.48
∞	−2	1	62.20	7.34	−20.47

The presented data show good concordance with the asymptotic theory for $n>1000$.

6 Conclusions

We considered a modification of least-squares estimators for linear regression coefficients in the case where observations are obtained from a mixture with varying concentrations. Conditions of consistency and asymptotic normality of the estimators were derived, and dispersion matrices were evaluated. The results of simulations confirm good concordance of estimators covariances with the asymptotic formulas for sample sizes larger then 1000 observations.

In real-life data analysis, concentrations (mixing probabilities) are usually not known exactly but estimated. So, to apply the proposed technique, we also need to analyze sensitivity of the estimates to perturbations of the concentrations model. (We are thankful to the unknown referee for this observation). It is worth noting that performance of these estimates will be poor if the true concentrations of the components are nearly linearly dependent ($\det \boldsymbol{\varGamma }_{N}\approx 0$). We also expect stability of the estimates w.r.t. concentration perturbations if $\det \boldsymbol{\varGamma }_{N}$ is bounded away from zero. More deep analysis of sensitivity will be a part of our further work.

Modern Stochastics: Theory and Applications

Abstract

1 Introduction

(1)

2 Nonparametric estimation for MVC

(2)

(3)

(4)

Lemma 1 (Consistency).

Lemma 2 (Asymptotic normality).

3 Estimate for $\mathbf{b}_{m}$ and its asymptotics

(5)

Theorem 1 (Consistency).

Theorem 2 (Asymptotic normality).

(6)

(7)

4 Proofs

Proof of Theorem 1.

Proof of Theorem 2.

(8)

5 Results of simulation

Table 1.

6 Conclusions

References

Table 1.

Theorem 1 (Consistency).

Theorem 2 (Asymptotic normality).

Authors

Abstract

1 Introduction

(1)

2 Nonparametric estimation for MVC

(2)

(3)

(4)

Lemma 1 (Consistency).

Lemma 2 (Asymptotic normality).

3 Estimate for $\mathbf{b}_{m}$ and its asymptotics

(5)

Theorem 1 (Consistency).

Theorem 2 (Asymptotic normality).

(6)

(7)

4 Proofs

Proof of Theorem 1.

Proof of Theorem 2.

(8)

5 Results of simulation

Table 1.

6 Conclusions

References

Export citation

Copy and paste formatted citation

Download citation in file

Table 1.

Theorem 1 (Consistency).

Theorem 2 (Asymptotic normality).