Linear regression by observations from mixture with varying concentrations
Volume 2, Issue 4 (2015), pp. 343–353
Daryna Liubashenko, Rostyslav Maiboroda

https://doi.org/10.15559/15-VMSTA41
Pub. online: 4 December 2015. Type: Research Article. Open Access.

Received
20 October 2015
Revised
22 November 2015
Accepted
26 November 2015
Published
4 December 2015

Abstract

We consider a finite mixture model with varying mixing probabilities. Linear regression models are assumed for the observed variables, with coefficients depending on the mixture component to which the observed subject belongs. A modification of the least-squares estimator is proposed for estimation of the regression coefficients. Consistency and asymptotic normality of the estimates are demonstrated.

1 Introduction

In this paper, we discuss a structural linear regression technique in the context of the mixture with varying concentrations (MVC) model. In an MVC model the observed subjects belong to M different subpopulations (mixture components). The true indices of the components to which the subjects $O_{j}$, $j=1,\dots ,N$, belong, say, $\kappa _{j}=\kappa (O_{j})$, are unknown, but we know the probabilities
\[{p_{j;N}^{k}}\stackrel{\mathrm{def}}{=}\operatorname{\mathsf{P}}\{\kappa _{j}=k\},\hspace{1em}k=1,\dots ,M\]
(mixing probabilities or concentrations of the mixture components). MVC models arise naturally in the description of medical, biological, and sociological data [1, 8, 9, 12]. They can be considered as a generalization of finite mixture models (FMMs). The classical theory of FMMs can be found in the monographs [10, 13].
Let
\[\xi (O)={\big(Y(O),{X}^{1}(O),\dots ,{X}^{d}(O)\big)}^{T}\]
be a vector of observed features (random variables) of a subject O. We consider the following linear regression model for these variables:
(1)
\[Y(O)=\sum \limits_{i=1}^{d}{b_{i}^{(\kappa (O))}}{X}^{i}(O)+\varepsilon (O),\]
where ${\mathbf{b}}^{(m)}={({b_{1}^{(m)}},\dots ,{b_{d}^{(m)}})}^{T}$ are nonrandom regression coefficients for the mth component, and $\varepsilon (O)$ is an error term, which is assumed to have zero mean and to be conditionally independent of the regressor vector $\mathbf{X}(O)={({X}^{1}(O),\dots ,{X}^{d}(O))}^{T}$ given $\kappa (O)$.
Note. We consider a subject O as taken at random from an infinite population, so it is random in this sense. The vector of observed variables $\xi (O)$ can be considered as a random vector even for a fixed O.
Our aim is to estimate the vectors of regression coefficients ${\mathbf{b}}^{(k)}$, $k=1,\dots ,M$, by the observations $\varXi _{N}=(\xi _{1},\dots ,\xi _{N})$, where $\xi _{j}=\xi (O_{j})$. We assume that $(\kappa _{j},\xi _{j})$ are independent for different j.
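To make the sampling scheme concrete, here is a minimal Python/NumPy sketch of how data obeying model (1) with prescribed concentrations could be generated; the function name and the standard normal regressor distribution are our own illustrative assumptions, not part of the model.

    import numpy as np

    rng = np.random.default_rng(0)

    def draw_mvc_sample(p, b, noise_sd):
        """Draw (kappa_j, Y_j, X_j), j = 1..N, following model (1).

        p        : (N, M) concentrations p_{j;N}^k, each row summing to 1
        b        : (M, d) regression coefficients b^{(m)}
        noise_sd : (M,) standard deviations of the error terms epsilon
        """
        N, M = p.shape
        d = b.shape[1]
        # component label kappa_j is drawn from the j-th row of concentrations
        kappa = np.array([rng.choice(M, p=p[j]) for j in range(N)])
        X = rng.normal(size=(N, d))               # illustrative regressor distribution
        eps = rng.normal(scale=noise_sd[kappa])   # error with component-specific variance
        Y = np.einsum("ji,ji->j", X, b[kappa]) + eps
        return kappa, Y, X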
A statistical model similar to MVC with (1) is considered in [5], where a parametric model for the conditional distributions of $\varepsilon (O)$ given $\kappa (O)$ is assumed. For this case, maximum likelihood estimation is proposed in [5], and a version of the EM algorithm is developed for numerical computation of the estimates.
In this paper, we adopt a nonparametric approach, assuming no parametric models for the distributions of $\varepsilon (O)$ and $\mathbf{X}(O)$. Nonparametric and semiparametric techniques for MVC were developed in [4, 6, 7]. We use the weighted empirical moment technique to derive estimates for the regression coefficients and then obtain conditions for consistency and asymptotic normality of these estimates. These results are based on general ideas of least squares [11] and moment estimation [3].
The rest of the paper is organized as follows. In Section 2, we recall some results on nonparametric estimation of functional moments in general MVC. The estimates are introduced, and conditions of their consistency and asymptotic normality are presented in Section 3. Section 4 contains proofs of the statements of Section 3. Results of computer simulations are presented in Section 5.

2 Nonparametric estimation for MVC

Let us start with some notation and definitions. We denote by $F_{m}$ the distribution of $\xi (O)$ for O belonging to the mth component of the mixture, that is,
\[F_{m}(A)\stackrel{\mathrm{def}}{=}\operatorname{\mathsf{P}}\big\{\xi (O)\in A\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=m\big\}\]
for all measurable sets A. Then by the definition of MVC
(2)
\[\operatorname{\mathsf{P}}\{\xi _{j}\in A\}=\sum \limits_{k=1}^{M}{p_{j;N}^{k}}F_{k}(A).\]
In the asymptotic statements, we will consider the data $\varXi _{N}=(\xi _{1},\dots ,\xi _{N})$ as an element of an (imaginary) series of data $\varXi _{1},\varXi _{2},\dots ,\varXi _{N},\dots \hspace{0.1667em}$, in which no link between observations for different N is assumed. So, formally, it would be more correct to write $\xi _{j;N}$ instead of $\xi _{j}$, but we will drop the subscript N when it is insignificant.
We consider an array of all concentrations for all data sizes
\[\mathbf{p}\stackrel{\mathrm{def}}{=}\big({p_{j;N}^{k}},k=1,\dots ,M;\hspace{2.5pt}j=1,\dots ,N;\hspace{2.5pt}N=1,2,\dots \hspace{0.1667em}\big).\]
Its subarrays
\[{\mathbf{p}_{N}^{k}}\stackrel{\mathrm{def}}{=}{\big({p_{1;N}^{k}},\dots ,{p_{N;N}^{k}}\big)}^{T}\hspace{1em}\text{and}\hspace{1em}\mathbf{p}_{j;N}={\big({p_{j;N}^{1}},\dots ,{p_{j;N}^{M}}\big)}^{T}\]
are considered as column vectors, and
\[\mathbf{p}_{N}=\big({p_{j;N}^{k}},\hspace{2.5pt}j=1,\dots ,N;\hspace{2.5pt}k=1,\dots ,M\big)\]
as an $N\times M$ matrix. We will also consider a weight array $\mathbf{a}$ of the same structure as $\mathbf{p}$, with similar notation for its subarrays.
By angle brackets with subscript N we denote averaging over $j=1,\dots ,N$:
\[\big\langle {\mathbf{a}}^{k}\big\rangle _{N}\stackrel{\mathrm{def}}{=}\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{k}}.\]
Multiplication, summation, and other operations in the angle brackets are made elementwise:
\[\big\langle {\mathbf{a}}^{k}{\mathbf{p}}^{m}\big\rangle _{N}=\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{k}}{p_{j;N}^{m}};\hspace{2em}\big\langle {\big({\mathbf{a}}^{k}\big)}^{2}\big\rangle _{N}=\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{k}}\big)}^{2}.\]
We define $\langle {\mathbf{p}}^{m}\rangle \stackrel{\mathrm{def}}{=}\lim _{N\to \infty }\langle {\mathbf{p}}^{m}\rangle _{N}$ if this limit exists. Let $\boldsymbol{\varGamma }_{N}={(\langle {\mathbf{p}}^{l}{\mathbf{p}}^{m}\rangle _{N})_{l,m=1}^{M}}=\frac{1}{N}{\mathbf{p}_{N}^{T}}\mathbf{p}_{N}$ be an $M\times M$ matrix, and $\gamma _{lm;N}$ be its $(l,m)$th minor. The matrix $\boldsymbol{\varGamma }_{N}$ can be considered as the Gramian matrix of vectors $({\mathbf{p}}^{1},\dots ,{\mathbf{p}}^{M})$ in the inner product $\langle {\mathbf{p}}^{i}{\mathbf{p}}^{k}\rangle _{N}$, so that it is nonsingular if these vectors are linearly independent.
In what follows, $\stackrel{\mathrm{P}}{\longrightarrow }$ means convergence in probability, and $\stackrel{\mathrm{W}}{\longrightarrow }$ means weak convergence.
Assume now that model (2) holds for the data $\varXi _{N}$. Then the distribution $F_{m}$ of the mth component can be estimated by the weighted empirical measure
(3)
\[\hat{F}_{m;N}(A)\stackrel{\mathrm{def}}{=}\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}\mathbb{1}\{\xi _{j}\in A\},\]
where
(4)
\[{a_{j;N}^{m}}=\frac{1}{\det \boldsymbol{\varGamma }_{N}}\sum \limits_{k=1}^{M}{(-1)}^{k+m}\gamma _{mk;N}{p_{j;N}^{k}}.\]
It is shown in [8] that if $\boldsymbol{\varGamma }_{N}$ is nonsingular, then $\hat{F}_{m;N}$ is the minimax unbiased estimate for $F_{m}$. The consistency of $\hat{F}_{m;N}$ is demonstrated in [6] (see also [8]).
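For implementation purposes, note that since $\boldsymbol{\varGamma }_{N}$ is symmetric, the cofactor formula (4) coincides with the $(j,m)$ entry of $\mathbf{p}_{N}{\boldsymbol{\varGamma }_{N}^{-1}}$. A minimal sketch (Python/NumPy; the function name and the tolerance are ours) computes all weight columns ${\mathbf{a}}^{1},\dots ,{\mathbf{a}}^{M}$ at once:

    import numpy as np

    def mvc_weights(p):
        """All weight columns a^1, ..., a^M of (4), stacked as an (N, M) array.

        p : (N, M) concentrations p_{j;N}^k.  Because Gamma_N is symmetric,
        formula (4) equals the (j, m) entry of p_N Gamma_N^{-1}.
        """
        N = p.shape[0]
        Gamma = p.T @ p / N                      # Gramian <p^l p^m>_N
        if abs(np.linalg.det(Gamma)) < 1e-12:    # condition det Gamma_N > C fails
            raise ValueError("concentrations are (nearly) linearly dependent")
        return np.linalg.solve(Gamma, p.T).T     # = p @ inv(Gamma)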
Consider now functional moment estimation based on weighted empirical moments. Let $g:{\mathbb{R}}^{d+1}\to {\mathbb{R}}^{k}$ be a measurable function. Then to estimate
\[{\bar{g}}^{(m)}=\operatorname{\mathsf{E}}\big[g\big(\xi (O)\big)\hspace{2.5pt}\big|\hspace{2.5pt}\kappa (O)=m\big]=\int g(x)F_{m}(dx),\]
we can use
\[{\hat{g}_{;N}^{(m)}}=\int g(x)\hat{F}_{m;N}(dx)=\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}g(\xi _{j}).\]
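In code, ${\hat{g}_{;N}^{(m)}}$ is a plain average with the (possibly negative) weights of the mth column; a sketch using a weight column produced by mvc_weights above (the helper name is ours):

    def weighted_moment(g, xi, a_m):
        """Estimate E[g(xi(O)) | kappa(O) = m] by (1/N) * sum_j a_{j;N}^m g(xi_j).

        g   : function mapping one observation xi_j to an array of shape (k,)
        xi  : (N, d+1) array whose rows are xi_j = (Y_j, X_j^1, ..., X_j^d)
        a_m : (N,) weight column for the m-th component
        """
        vals = np.stack([np.atleast_1d(g(x)) for x in xi])   # (N, k)
        return (a_m[:, None] * vals).mean(axis=0)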
Lemma 1 (Consistency).
Assume that
  • 1. $\operatorname{\mathsf{E}}[|g(\xi (O))|\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=k]<\infty $ for all $k=1,\dots ,M$.
  • 2. There exists $C>0$ such that $\det \boldsymbol{\varGamma }_{N}>C$ for all N large enough.
Then ${\hat{g}_{;N}^{(m)}}\stackrel{\mathrm{P}}{\longrightarrow }{\bar{g}}^{(m)}$ as $N\to \infty $.
This lemma is a simple corollary of Theorem 4.2 in [8] (see also Theorem 3.1.1 in [7]).
Lemma 2 (Asymptotic normality).
Assume that
  • 1. $\operatorname{\mathsf{E}}[|g(\xi (O)){|}^{2}\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=k]<\infty $ for all $k=1,\dots ,M$.
  • 2. There exists $C>0$ such that $\det \boldsymbol{\varGamma }_{N}>C$ for all N large enough.
  • 3. There exists the limit
    \[\boldsymbol{\varSigma }=\underset{N\to \infty }{\lim }N\operatorname{Cov}\big({\hat{g}_{;N}^{(m)}}\big).\]
Then
\[\sqrt{N}\big({\hat{g}_{;N}^{(m)}}-{\bar{g}}^{(m)}\big)\stackrel{\mathrm{W}}{\longrightarrow }N(0,\boldsymbol{\varSigma }).\]
For univariate ${\hat{g}_{;N}^{(m)}}$, the statement of the lemma is contained in Theorem 4.2 from [8] (or Theorem 3.1.2 in [7]). The multivariate case can be obtained from the univariate one by applying the Cramér–Wold device (see [2], p. 382).

3 Estimate for ${\mathbf{b}}^{(m)}$ and its asymptotics

In view of Lemma 1, we expect that, under suitable assumptions,
\[\begin{array}{r@{\hskip0pt}l}\displaystyle J_{m;N}(\mathbf{b})& \displaystyle \stackrel{\mathrm{def}}{=}\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}{\Bigg(Y_{j;N}-\sum \limits_{i=1}^{d}b_{i}{X_{j;N}^{i}}\Bigg)}^{2}\\{} & \displaystyle =\int {\Bigg(y-\sum \limits_{i=1}^{d}b_{i}{x}^{i}\Bigg)}^{2}\hat{F}_{m;N}\big(dy,d{x}^{1},\dots ,d{x}^{d}\big)\end{array}\]
converges to
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \int {\Bigg(y-\sum \limits_{i=1}^{d}b_{i}{x}^{i}\Bigg)}^{2}F_{m}\big(dy,d{x}^{1},\dots ,d{x}^{d}\big)\\{} & \displaystyle \hspace{1em}=\operatorname{\mathsf{E}}\Bigg[{\Bigg(Y(O)-\sum \limits_{i=1}^{d}b_{i}{X}^{i}(O)\Bigg)}^{2}\hspace{2.5pt}\Bigg|\hspace{2.5pt}\kappa (O)=m\Bigg]\stackrel{\mathrm{def}}{=}J_{m;\infty }(\mathbf{b})\end{array}\]
as $N\to \infty $.
Since $J_{m;\infty }(\mathbf{b})$ attains its minimal value at ${\mathbf{b}}^{(m)}$, it is natural to take the argmin of $J_{m;N}(\mathbf{b})$ as an estimate for ${\mathbf{b}}^{(m)}$. If the weights ${a_{j;N}^{m}}$ were all positive, this argmin would be
(5)
\[{\hat{\mathbf{b}}_{;N}^{(m)}}\stackrel{\mathrm{def}}{=}{\big({\mathbf{X}}^{T}\mathbf{A}\mathbf{X}\big)}^{-1}{\mathbf{X}}^{T}\mathbf{A}\mathbf{Y},\]
where $\mathbf{X}\stackrel{\mathrm{def}}{=}({X_{j}^{i}})_{j=1,\dots ,N;\hspace{2.5pt}i=1,\dots ,d}$ is the $N\times d$ matrix of observed regressors, $\mathbf{Y}\stackrel{\mathrm{def}}{=}{(Y_{1},\dots ,Y_{N})}^{T}$ is the vector of observed responses, and $\mathbf{A}\stackrel{\mathrm{def}}{=}\operatorname{diag}({a_{1;N}^{m}},\dots ,{a_{N;N}^{m}})$ is the diagonal weight matrix for estimation of the mth component. (Obviously, $\mathbf{A}$ depends on m, but we do not show this explicitly by a subscript since the index m of the component for which ${\mathbf{b}}^{(m)}$ is estimated is fixed in what follows.)
Generally speaking, by (4) the weights ${a_{j;N}^{m}}$ can be negative for some j, so ${\hat{\mathbf{b}}_{;N}^{(m)}}$ is not necessarily an argmin of $J_{m;N}(\mathbf{b})$. Nevertheless, we take ${\hat{\mathbf{b}}_{;N}^{(m)}}$ as an estimate for ${\mathbf{b}}^{(m)}$ and call it the modified least-squares estimate for ${\mathbf{b}}^{(m)}$ in the MVC model (MVC-LS estimate).
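Computationally, (5) is an ordinary weighted least-squares formula with the weights ${a_{j;N}^{m}}$ on the diagonal of $\mathbf{A}$. A minimal sketch reusing mvc_weights from Section 2 (the function name and zero-based indexing of m are our conventions):

    def mvc_ls(Y, X, p, m):
        """MVC-LS estimate (5) of b^(m): (X^T A X)^{-1} X^T A Y.

        Y : (N,) responses, X : (N, d) regressors,
        p : (N, M) concentrations, m : component index (0-based here).
        """
        a_m = mvc_weights(p)[:, m]     # diagonal of A for the m-th component
        XtA = X.T * a_m                # X^T A without forming diag(a_m) explicitly
        return np.linalg.solve(XtA @ X, XtA @ Y)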
Let
\[{\mathbf{D}}^{(k)}\stackrel{\mathrm{def}}{=}\operatorname{\mathsf{E}}\big[\mathbf{X}(O){\mathbf{X}}^{T}(O)\hspace{2.5pt}\big|\hspace{2.5pt}\kappa (O)=k\big]\]
be the matrix of second moments of the regressors for subjects belonging to the kth component. Denote the variance of the kth component’s error term by
\[{\big({\sigma }^{(k)}\big)}^{2}=\operatorname{\mathsf{E}}\big[{\big(\varepsilon (O)\big)}^{2}\hspace{2.5pt}\big|\hspace{2.5pt}\kappa (O)=k\big].\]
(Recall that $\operatorname{\mathsf{E}}[\varepsilon (O)\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=k]=0$.) In what follows, we assume that these moments and variances exist for all components.
Theorem 1 (Consistency).
Assume that
  • 1. ${\mathbf{D}}^{(k)}$ and ${({\sigma }^{(k)})}^{2}$ are finite for all $k=1,\dots ,M$.
  • 2. ${\mathbf{D}}^{(m)}$ is nonsingular.
  • 3. There exists $C>0$ such that $\det \boldsymbol{\varGamma }_{N}>C$ for all N large enough.
Then ${\hat{\mathbf{b}}_{;N}^{(m)}}\stackrel{\mathrm{P}}{\longrightarrow }{\mathbf{b}}^{(m)}$ as $N\to \infty $.
Note. Assumption 3 can be weakened. Applying Theorem 4.2 from [8], we can show that ${\hat{\mathbf{b}}_{;N}^{(m)}}$ is consistent if the vector ${\mathbf{p}_{N}^{m}}$ is asymptotically linearly independent of the vectors ${\mathbf{p}_{N}^{i}}$, $i\ne m$, as $N\to \infty $. To keep this presentation simple, we do not formulate the precise meaning of this statement.
Denote ${D}^{ik(s)}\stackrel{\mathrm{def}}{=}\operatorname{\mathsf{E}}\big[{X}^{i}(O){X}^{k}(O)\hspace{2.5pt}\big|\hspace{2.5pt}\kappa (O)=s\big]$,
\[\begin{array}{r@{\hskip10.0pt}c}& \displaystyle {\mathbf{L}}^{ik(s)}\stackrel{\mathrm{def}}{=}{\big(\operatorname{\mathsf{E}}\big[{X}^{i}(O){X}^{k}(O){X}^{q}(O){X}^{l}(O)\hspace{2.5pt}\big|\hspace{2.5pt}\kappa (O)=s\big]\big)_{l,q=1}^{d}},\\{} & \displaystyle {\mathbf{M}}^{ik(s,p)}\stackrel{\mathrm{def}}{=}{\big({D}^{il(s)}{D}^{kq(p)}\big)_{l,q=1}^{d}}.\end{array}\]
Theorem 2 (Asymptotic normality).
Assume that
  • 1. $\operatorname{\mathsf{E}}[{({X}^{i}(O))}^{4}\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=k]<\infty $ and $\operatorname{\mathsf{E}}[{(\varepsilon (O))}^{4}\hspace{2.5pt}|\hspace{2.5pt}\kappa (O)=k]<\infty $ for all $k=1,\dots ,M$.
  • 2. Matrix $\mathbf{D}={\mathbf{D}}^{(m)}$ is nonsingular.
  • 3. There exists $C>0$ such that $\det \boldsymbol{\varGamma }_{N}>C$ for all N large enough.
  • 4. For all $s,p=1,\dots ,M$, the limits $\langle {({\mathbf{a}}^{m})}^{2}{\mathbf{p}}^{s}{\mathbf{p}}^{p}\rangle $ exist.
Then $\sqrt{N}({\hat{\mathbf{b}}_{;N}^{(m)}}-{\mathbf{b}}^{(m)})\stackrel{\mathrm{W}}{\longrightarrow }N(0,\mathbf{V})$, where
(6)
\[\mathbf{V}\stackrel{\mathrm{def}}{=}{\mathbf{D}}^{-1}\boldsymbol{\varSigma }{\mathbf{D}}^{-1}\]
with
(7)
\[\begin{array}{r@{\hskip10.0pt}c}& \displaystyle \boldsymbol{\varSigma }={\big({\varSigma }^{ik}\big)_{i,k=1}^{d}},\\{} & \displaystyle {\varSigma }^{ik}=\sum \limits_{s=1}^{M}\big\langle {\big({\mathbf{a}}^{m}\big)}^{2}{\mathbf{p}}^{s}\big\rangle \big({D}^{ik(s)}{\big({\sigma }^{(s)}\big)}^{2}+{\big({\mathbf{b}}^{(s)}-{\mathbf{b}}^{(m)}\big)}^{T}{\mathbf{L}}^{ik(s)}\big({\mathbf{b}}^{(s)}-{\mathbf{b}}^{(m)}\big)\big)\\{} & \displaystyle \hspace{2em}-\sum \limits_{s=1}^{M}\sum \limits_{p=1}^{M}\big\langle {\big({\mathbf{a}}^{m}\big)}^{2}{\mathbf{p}}^{s}{\mathbf{p}}^{p}\big\rangle {\big({\mathbf{b}}^{(s)}-{\mathbf{b}}^{(m)}\big)}^{T}{\mathbf{M}}^{ik(s,p)}\big({\mathbf{b}}^{(p)}-{\mathbf{b}}^{(m)}\big).\end{array}\]
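As a quick sanity check (our remark, not part of the theorem), consider the degenerate case of a single component, $M=m=1$: then ${\mathbf{p}}^{1}\equiv 1$, ${\mathbf{a}}^{1}\equiv 1$, all coefficient differences in (7) vanish, and
\[{\varSigma }^{ik}={D}^{ik(1)}{\big({\sigma }^{(1)}\big)}^{2},\hspace{2em}\mathbf{V}={\big({\sigma }^{(1)}\big)}^{2}{\mathbf{D}}^{-1},\]
so (6) reduces to the classical asymptotic covariance matrix of the ordinary least-squares estimator.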

4 Proofs

Proof of Theorem 1.
Note that if ${\mathbf{D}}^{(m)}$ is nonsingular, then
\[{\mathbf{b}}^{(m)}=\operatorname{argmin}_{\mathbf{b}\in {\mathbb{R}}^{d}}J_{m;\infty }(\mathbf{b})={\big({\mathbf{D}}^{(m)}\big)}^{-1}\operatorname{\mathsf{E}}\big[\big(Y(O)\mathbf{X}(O)\big)\hspace{2.5pt}\big|\hspace{2.5pt}\kappa (O)=m\big].\]
By Lemma 1,
\[{\mathbf{X}}^{T}\mathbf{A}\mathbf{X}=\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}\mathbf{X}(O_{j}){\mathbf{X}}^{T}(O_{j})\stackrel{\mathrm{P}}{\longrightarrow }{\mathbf{D}}^{(m)}\]
and
\[{\mathbf{X}}^{T}\mathbf{A}\mathbf{Y}=\frac{1}{N}\sum \limits_{j=1}^{N}{a_{j;N}^{m}}Y(O_{j})\mathbf{X}(O_{j})\stackrel{\mathrm{P}}{\longrightarrow }\operatorname{\mathsf{E}}\big[\big(Y(O)\mathbf{X}(O)\big)\hspace{2.5pt}\big|\hspace{2.5pt}\kappa (O)=m\big]\]
as $N\to \infty $. This implies the statement of the theorem.  □
Proof of Theorem 2.
Let us introduce a set of random vectors ${\xi _{j}^{(k)}}={({Y_{j}^{(k)}},{X_{j}^{1(k)}},\dots ,{X_{j}^{d(k)}})}^{T}$, $j=1,2,\dots \hspace{0.1667em}$, with distributions $F_{k}$ that are independent for different j and k and independent of $\kappa _{j}$. Denote ${\delta _{j}^{(k)}}=\mathbb{1}\{\kappa _{j}=k\}$,
\[{\xi ^{\prime }_{j}}\stackrel{\mathrm{def}}{=}\sum \limits_{k=1}^{M}{\delta _{j}^{(k)}}{\xi _{j}^{(k)}}.\]
Then the distribution of ${\varXi ^{\prime }_{N}}=({\xi ^{\prime }_{1}},\dots ,{\xi ^{\prime }_{N}})$ is the same as that of $\varXi _{N}$. Since in this theorem we are interested in weak convergence only, without loss of generality, let us assume that $\varXi _{N}={\varXi ^{\prime }_{N}}$. By $\mathfrak{F}$ we denote the sigma-algebra generated by ${\xi _{j}^{(k)}}$, $j=1,\dots ,N$, $k=1,\dots ,M$.
Let us show that $\sqrt{N}({\hat{\mathbf{b}}_{;N}^{(m)}}-{\mathbf{b}}^{(m)})$ converges weakly to $N(0,\mathbf{V})$. It is readily seen that
\[\sqrt{N}\big({\hat{\mathbf{b}}_{;N}^{(m)}}-{\mathbf{b}}^{(m)}\big)={\bigg[\frac{1}{N}\big({\mathbf{X}}^{T}\mathbf{A}\mathbf{X}\big)\bigg]}^{-1}\bigg[\frac{1}{\sqrt{N}}\big({\mathbf{X}}^{T}\mathbf{A}\mathbf{Y}-{\mathbf{X}}^{T}\mathbf{A}\mathbf{X}{\mathbf{b}}^{(m)}\big)\bigg].\]
Since $\frac{1}{N}({\mathbf{X}}^{T}\mathbf{A}\mathbf{X})\stackrel{\mathrm{P}}{\longrightarrow }\mathbf{D}$, we only need to show weak convergence of the random vectors $\frac{1}{\sqrt{N}}({\mathbf{X}}^{T}\mathbf{A}\mathbf{Y}-{\mathbf{X}}^{T}\mathbf{A}\mathbf{X}{\mathbf{b}}^{(m)})$ to $N(0,\boldsymbol{\varSigma })$.
Denote
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle g(\xi _{j})\stackrel{\mathrm{def}}{=}{\Bigg({X_{j}^{i}}\Bigg(Y_{j}-\sum \limits_{l=1}^{d}{X_{j}^{l}}{b_{l}^{(m)}}\Bigg)\Bigg)_{i=1}^{d}},\\{} & \displaystyle {\gamma _{j}^{i}}\stackrel{\mathrm{def}}{=}\frac{1}{\sqrt{N}}{a_{j;N}^{(m)}}{X_{j}^{i}}\Bigg(Y_{j}-\sum \limits_{l=1}^{d}{X_{j}^{l}}{b_{l}^{(m)}}\Bigg).\end{array}\]
Obviously,
\[\zeta _{N}\stackrel{\mathrm{def}}{=}\frac{1}{\sqrt{N}}\big({\mathbf{X}}^{T}\mathbf{A}\mathbf{Y}-{\mathbf{X}}^{T}\mathbf{A}\mathbf{X}{\mathbf{b}}^{(m)}\big)=\sqrt{N}{\hat{g}_{;N}^{(m)}}={\Bigg(\sum \limits_{j=1}^{N}{\gamma _{j}^{i}}\Bigg)_{i=1}^{d}}.\]
We will apply Lemma 2 to show that $\sqrt{N}{\hat{g}_{;N}^{(m)}}\stackrel{\mathrm{W}}{\longrightarrow }N(0,\boldsymbol{\varSigma })$.
First, let us show that ${\bar{g}}^{(m)}=\operatorname{\mathsf{E}}{\hat{g}_{;N}^{(m)}}=0$. It is equivalent to $\operatorname{\mathsf{E}}{\sum _{j=1}^{N}}{\gamma _{j}^{i}}=0$ for all $i=1,\dots ,d$. In fact,
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \operatorname{\mathsf{E}}\sum \limits_{j=1}^{N}{\gamma _{j}^{i}}\\{} & \displaystyle \hspace{1em}=\operatorname{\mathsf{E}}\Bigg[\operatorname{\mathsf{E}}\frac{1}{\sqrt{N}}\sum \limits_{j=1}^{N}{a_{j;N}^{(m)}}\sum \limits_{s=1}^{M}{\delta _{j}^{(s)}}{X_{j}^{i(s)}}\Bigg(\sum \limits_{s=1}^{M}{\delta _{j}^{(s)}}{Y_{j}^{(s)}}-\sum \limits_{l=1}^{d}\sum \limits_{s=1}^{M}{\delta _{j}^{(s)}}{X_{j}^{l(s)}}{b_{l}^{(m)}}\Bigg)\Bigg|\mathfrak{F}\Bigg]\\{} & \displaystyle \hspace{1em}=\frac{1}{\sqrt{N}}\operatorname{\mathsf{E}}\sum \limits_{j=1}^{N}{a_{j;N}^{(m)}}\sum \limits_{s=1}^{M}{p_{j;N}^{s}}{X_{j}^{i(s)}}\Bigg({Y_{j}^{(s)}}-\sum \limits_{l=1}^{d}{X_{j}^{l(s)}}{b_{l}^{(m)}}\Bigg)\\{} & \displaystyle \hspace{1em}=\sqrt{N}\sum \limits_{s=1}^{M}\underset{\mathbb{1}\{m=s\}}{\underbrace{\big\langle {\mathbf{a}}^{m}{\mathbf{p}}^{s}\big\rangle _{N}}}\operatorname{\mathsf{E}}\Bigg[{X_{1}^{i(s)}}\Bigg({Y_{1}^{(s)}}-\sum \limits_{l=1}^{d}{X_{1}^{l(s)}}{b_{l}^{(m)}}\Bigg)\Bigg]\\{} & \displaystyle \hspace{1em}=\sqrt{N}\operatorname{\mathsf{E}}\Bigg[{X_{1}^{i(m)}}\underset{{\varepsilon _{1}^{(m)}}}{\underbrace{\Bigg({Y_{1}^{(m)}}-\sum \limits_{l=1}^{d}{X_{1}^{l(m)}}{b_{l}^{(m)}}\Bigg)}}\Bigg]=\sqrt{N}\operatorname{\mathsf{E}}{X_{1}^{i(m)}}\operatorname{\mathsf{E}}{\varepsilon _{1}^{(m)}}=0.\end{array}\]
So
\[\zeta _{N}={\Bigg(\sum \limits_{j=1}^{N}{\gamma _{j}^{i}}\Bigg)_{i=1}^{d}}={\Bigg(\sum \limits_{j=1}^{N}{\gamma _{j}^{i}}-\operatorname{\mathsf{E}}{\gamma _{j}^{i}}\Bigg)_{i=1}^{d}}.\]
In view of Lemma 2, to complete the proof, we only need to show that $\operatorname{Cov}(\zeta _{N})\to \boldsymbol{\varSigma }$.
Denote
\[{\zeta _{j}^{i(s)}}={\delta _{j}^{(s)}}{X_{j}^{i(s)}}\Bigg(\sum \limits_{l=1}^{d}{X_{j}^{l(s)}}\big({b_{l}^{(s)}}-{b_{l}^{(m)}}\big)+{\varepsilon _{j}^{(s)}}\Bigg)\]
and
\[{\eta _{j}^{i(s)}}={\delta _{j}^{(s)}}\Bigg(\sum \limits_{l=1}^{d}{D}^{il(s)}\big({b_{l}^{(s)}}-{b_{l}^{(m)}}\big)\Bigg).\]
Then
\[\zeta _{N}={\Bigg(\frac{1}{\sqrt{N}}\sum \limits_{j=1}^{N}{a_{j}^{(m)}}\sum \limits_{s=1}^{M}\big(\big({\zeta _{j}^{i(s)}}-{\eta _{j}^{i(s)}}\big)+\big({\eta _{j}^{i(s)}}-\operatorname{\mathsf{E}}{\zeta _{j}^{i(s)}}\big)\big)\Bigg)_{i=1}^{d}}={\big({S_{1}^{i}}+{S_{2}^{i}}\big)_{i=1}^{d}},\]
where
\[{S_{1}^{i}}=\frac{1}{\sqrt{N}}\sum \limits_{j=1}^{N}{a_{j}^{(m)}}\sum \limits_{s=1}^{M}\big({\zeta _{j}^{i(s)}}-{\eta _{j}^{i(s)}}\big),\hspace{2em}{S_{2}^{i}}=\frac{1}{\sqrt{N}}\sum \limits_{j=1}^{N}{a_{j}^{(m)}}\sum \limits_{s=1}^{M}\big({\eta _{j}^{i(s)}}-\operatorname{\mathsf{E}}{\zeta _{j}^{i(s)}}\big).\]
Note that
\[\operatorname{\mathsf{E}}\big({\zeta _{j}^{i(s)}}-{\eta _{j}^{i(s)}}\big)=\operatorname{\mathsf{E}}\operatorname{\mathsf{E}}\big[\big({\zeta _{j}^{i(s)}}-{\eta _{j}^{i(s)}}\big)\hspace{2.5pt}\big|\hspace{2.5pt}\kappa _{j}\big]=0.\]
Now
(8)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \operatorname{Cov}\big({S_{1}^{i}},{S_{2}^{k}}\big)\\{} & \displaystyle \hspace{1em}=\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}\sum \limits_{p=1}^{M}\operatorname{\mathsf{E}}\big({\zeta _{j}^{i(s)}}-{\eta _{j}^{i(s)}}\big){\eta _{j}^{k(p)}}\\{} & \displaystyle \hspace{1em}=\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}\sum \limits_{p=1}^{M}\operatorname{\mathsf{E}}{\delta _{j}^{(s)}}\Bigg[\sum \limits_{l=1}^{d}\big({X_{j}^{i(s)}}{X_{j}^{l(s)}}-{D}^{il(s)}\big)\big({b_{l}^{(s)}}-{b_{l}^{(m)}}\big)\\{} & \displaystyle \hspace{2em}+{X_{j}^{i(s)}}{\varepsilon _{j}^{(s)}}\Bigg]\cdot {\delta _{j}^{(p)}}\Bigg(\sum \limits_{q=1}^{d}{D}^{kq(p)}\big({b_{q}^{(p)}}-{b_{q}^{(m)}}\big)\Bigg)\\{} & \displaystyle \hspace{1em}=\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}{p_{j;N}^{s}}\Bigg(\sum \limits_{l=1}^{d}\big(\underset{0}{\underbrace{{D}^{il(s)}-{D}^{il(s)}}}\big)\big({b_{l}^{(s)}}-{b_{l}^{(m)}}\big)+\operatorname{\mathsf{E}}{X_{j}^{i(s)}}\underset{0}{\underbrace{\operatorname{\mathsf{E}}{\varepsilon _{j}^{(s)}}}}\Bigg)\\{} & \displaystyle \hspace{2em}\times \Bigg(\sum \limits_{q=1}^{d}{D}^{kq(s)}\big({b_{q}^{(s)}}-{b_{q}^{(m)}}\big)\Bigg)=0;\end{array}\]
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \operatorname{Cov}\big({S_{1}^{i}},{S_{1}^{k}}\big)& \displaystyle =\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}{p_{j;N}^{s}}\sum \limits_{q=1}^{d}\sum \limits_{l=1}^{d}\big({b_{l}^{(s)}}-{b_{l}^{(m)}}\big)\operatorname{\mathsf{E}}\big(\big({X_{j}^{i(s)}}{X_{j}^{l(s)}}-{D}^{il(s)}\big)\\{} & \displaystyle \hspace{1em}\times \big({X_{j}^{k(s)}}{X_{j}^{q(s)}}-{D}^{kq(s)}\big)\big)\big({b_{q}^{(s)}}-{b_{q}^{(m)}}\big)\\{} & \displaystyle \hspace{1em}+\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}{p_{j;N}^{s}}{D}^{ik(s)}{\big({\sigma }^{(s)}\big)}^{2};\\{} \displaystyle \operatorname{Cov}\big({S_{2}^{i}},{S_{2}^{k}}\big)& \displaystyle =\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}{p_{j;N}^{s}}\sum \limits_{q=1}^{d}\sum \limits_{l=1}^{d}\big({b_{l}^{(s)}}-{b_{l}^{(m)}}\big){D}^{il(s)}\\{} & \displaystyle \hspace{1em}\times \Bigg[\big({b_{q}^{(s)}}-{b_{q}^{(m)}}\big){D}^{kq(s)}-\sum \limits_{r=1}^{M}{p_{j;N}^{r}}\big({b_{q}^{(r)}}-{b_{q}^{(m)}}\big){D}^{kq(r)}\Bigg].\end{array}\]
Thus,
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \operatorname{Cov}\zeta _{N}& \displaystyle =\Bigg(\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}{p_{j;N}^{s}}{D}^{ik(s)}{\big({\sigma }^{(s)}\big)}^{2}\\{} & \displaystyle \hspace{1em}+\frac{1}{N}\sum \limits_{j=1}^{N}{\big({a_{j;N}^{(m)}}\big)}^{2}\sum \limits_{s=1}^{M}{p_{j;N}^{s}}\sum \limits_{q=1}^{d}\sum \limits_{l=1}^{d}\big({b_{l}^{(s)}}-{b_{l}^{(m)}}\big)\Bigg[\big({b_{q}^{(s)}}-{b_{q}^{(m)}}\big)\\{} & \displaystyle \hspace{1em}\times \underset{{L_{lq}^{ik(s)}}}{\underbrace{\operatorname{\mathsf{E}}{X_{j}^{i(s)}}{X_{j}^{k(s)}}{X_{j}^{l(s)}}{X_{j}^{q(s)}}}}-{D}^{il(s)}\sum \limits_{r=1}^{M}{p_{j;N}^{r}}\big({b_{q}^{(r)}}-{b_{q}^{(m)}}\big){D}^{kq(r)}\Bigg]{\Bigg)_{i,k=1}^{d}}.\end{array}\]
From the last equation we get $\operatorname{Cov}(\zeta _{N})\to \boldsymbol{\varSigma }$ as $N\to \infty $.  □

5 Results of simulation

To assess the accuracy of the asymptotic results from Section 3, we performed a small simulation study. We considered a two-component mixture ($M=2$) with mixing probabilities ${p_{j;N}^{1}}=j/N$ and ${p_{j;N}^{2}}=1-{p_{j;N}^{1}}$. For each subject, there were two observed variables X and Y, which were simulated based on the simple linear regression model
\[Y_{j}={b_{0}^{\kappa _{j}}}+{b_{1}^{\kappa _{j}}}{X_{j}^{(\kappa _{j})}}+{\varepsilon _{j}^{(\kappa _{j})}},\]
where $\kappa _{j}$ is the index of the component to which the jth observation belongs, ${X_{j}^{(1)}}$ was simulated as $N(1,1)$, ${X_{j}^{(2)}}$ as $N(2,2.25)$, and ${\varepsilon _{j}^{(k)}}$ were zero-mean Gaussian with standard deviation 0.01 for the first component and 0.05 for the second one. The values of the regression coefficients were ${b_{0}^{1}}=3$, ${b_{1}^{1}}=0.5$, ${b_{0}^{2}}=-2$, ${b_{1}^{2}}=1$.
The means and covariances of the estimates were calculated over 2000 replications.
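The following Python/NumPy sketch (reusing mvc_ls from Section 3; the helper name and zero-based component indexing are ours) implements one replication of this design; averaging the estimates and their rescaled sample covariances over 2000 such replications yields the figures reported below. The intercept is handled by appending a constant regressor, so that model (1) covers the pair $({b_{0}^{k}},{b_{1}^{k}})$ with $d=2$.

    import numpy as np

    def one_replication(N, rng):
        """One replication of the Section 5 design (two components, intercept + slope)."""
        j = np.arange(1, N + 1)
        p = np.column_stack([j / N, 1.0 - j / N])        # p^1_j = j/N, p^2_j = 1 - j/N
        kappa = (rng.random(N) > p[:, 0]).astype(int)    # 0 = first, 1 = second component
        x = np.where(kappa == 0,
                     rng.normal(1.0, 1.0, N),            # X | component 1 ~ N(1, 1)
                     rng.normal(2.0, 1.5, N))            # X | component 2 ~ N(2, 2.25)
        eps = rng.normal(0.0, np.where(kappa == 0, 0.01, 0.05))
        b = np.array([[3.0, 0.5], [-2.0, 1.0]])          # (b_0, b_1) for each component
        Y = b[kappa, 0] + b[kappa, 1] * x + eps
        X = np.column_stack([np.ones(N), x])             # design matrix with intercept
        return np.array([mvc_ls(Y, X, p, m) for m in (0, 1)])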
The results of the simulation are presented in Table 1. The true values of the parameters and the asymptotic covariances are given in the last row of each part of the table.
Table 1.
Simulation results
First component
n $\operatorname{\mathsf{E}}{\hat{b}_{0}^{1}}$ $\operatorname{\mathsf{E}}{\hat{b}_{1}^{1}}$ $N\operatorname{Var}{\hat{b}_{0}^{1}}$ $N\operatorname{Var}{\hat{b}_{1}^{1}}$ $N\operatorname{Cov}({\hat{b}_{0}^{1}},{\hat{b}_{1}^{1}})$
500 3.0014 0.5235 47.61 45.83 −42.27
1000 3.0033 0.512 41.19 37.50 −35.04
2000 3.0011 0.5032 38.84 34.71 −32.87
5000 3.0003 0.5016 39.54 34.49 −32.83
∞ 3 0.5 39.13 33.96 −32.53
Second component
n $\operatorname{\mathsf{E}}{\hat{b}_{0}^{2}}$ $\operatorname{\mathsf{E}}{\hat{b}_{1}^{2}}$ $N\operatorname{Var}{\hat{b}_{0}^{2}}$ $N\operatorname{Var}{\hat{b}_{1}^{2}}$ $N\operatorname{Cov}({\hat{b}_{0}^{2}},{\hat{b}_{1}^{2}})$
500 −2.0243 1.0084 67.37 7.94 −22.17
1000 −2.0100 1.0027 63.04 7.57 −20.90
2000 −2.0039 1.0016 63.57 7.52 −20.95
5000 −2.0074 1.0025 62.41 7.32 −20.48
∞ −2 1 62.20 7.34 −20.47
The presented data show good concordance with the asymptotic theory for $n>1000$.

6 Conclusions

We considered a modification of the least-squares estimator for linear regression coefficients in the case where observations are obtained from a mixture with varying concentrations. Conditions for consistency and asymptotic normality of the estimators were derived, and their dispersion matrices were evaluated. The results of simulations confirm good concordance of the estimators' covariances with the asymptotic formulas for sample sizes larger than 1000 observations.
In real-life data analysis, the concentrations (mixing probabilities) are usually not known exactly but estimated. So, to apply the proposed technique, we also need to analyze the sensitivity of the estimates to perturbations of the concentrations model. (We are thankful to the anonymous referee for this observation.) It is worth noting that the performance of these estimates will be poor if the true concentrations of the components are nearly linearly dependent ($\det \boldsymbol{\varGamma }_{N}\approx 0$). We also expect stability of the estimates w.r.t. concentration perturbations if $\det \boldsymbol{\varGamma }_{N}$ is bounded away from zero. A deeper analysis of sensitivity will be part of our further work.

References

[1] 
Autin, F., Pouet, Ch.: Test on the components of mixture densities. Stat. Risk. Model. 28(4), 389–410 (2011). MR2877572. doi:10.1524/strm.2011.1065
[2] 
Billingsley, P.: Probability and Measure, 3rd edn. Wiley, New York (1995). MR1324786
[3] 
Borovkov, A.A.: Mathematical Statistics. Gordon and Breach Science Publishers, Amsterdam (1998). MR1712750
[4] 
Doronin, O.V.: Lower bound for a dispersion matrix for the semiparametric estimation in a model of mixtures. Theory Probab. Math. Stat. 90, 71–85 (2015). MR3241861. doi:10.1090/tpms/950
[5] 
Grün, B., Leisch, F.: Fitting finite mixtures of linear regression models with varying & fixed effects in R. In: Rizzi, A., Vichi, M. (eds.) Compstat 2006 – Proceedings in Computational Statistics, pp. 853–860. Physica Verlag, Heidelberg, Germany (2006)
[6] 
Maiboroda, R.: Statistical Analysis of Mixtures. Kyiv University Publishers, Kyiv (2003) (in Ukrainian)
[7] 
Maiboroda, R.E., Sugakova, O.V.: Estimation and classification by observations from mixture. Kyiv University Publishers, Kyiv (2008) (in Ukrainian)
[8] 
Maiboroda, R., Sugakova, O.: Statistics of mixtures with varying concentrations with application to DNA microarray data analysis. J. Nonparametr. Stat. 24(1), 201–205 (2012). MR2885834. doi:10.1080/10485252.2011.630076
[9] 
Maiboroda, R.E., Sugakova, O.V., Doronin, A.V.: Generalized estimating equations for mixtures with varying concentrations. Can. J. Stat. 41(2), 217–236 (2013). MR3061876. doi:10.1002/cjs.11170
[10] 
McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley–Interscience, New York (2000). MR1789474. doi:10.1002/0471721182
[11] 
Seber, G.A.F., Lee, A.J.: Linear Regression Analysis. Wiley, New York (2003). MR1958247. doi:10.1002/9780471722199
[12] 
Shcherbina, A.: Estimation of the mean value in the model of mixtures with varying concentrations. Theory Probab. Math. Stat. 84, 151–164 (2012). MR2857425. doi:10.1090/S0094-9000-2012-00866-1
[13] 
Titterington, D.M., Smith, A.F., Makov, U.E.: Analysis of Finite Mixture Distributions. Wiley, New York (1985)
Copyright
© 2015 The Author(s). Published by VTeX
Open access article under the CC BY license.

Keywords
Finite mixture model, linear regression, mixture with varying concentrations, nonparametric estimation, asymptotic normality, consistency

MSC2010
62J05, 62G20
