1 Introduction
Models of mixtures with varying concentrations (MVC) naturally arise in the statistical analysis of sociological and biomedical data [9, 11]. The MVC model generalizes the classical finite mixture model (FMM) [10, 12] by allowing the concentrations of the components to differ across observations. Statistical analysis of data described by MVC models is considered in [8].
Regression models provide a powerful tool for analyzing dependencies in multivariate data. For applications of parametric regression mixture models in behavioral science, see [5]. In the context of the MVC model, linear [6] and nonlinear [7] regression models have been considered. A modification of the Nadaraya–Watson estimator for MVC is studied in [1]. It is well known that for homogeneous data the Nadaraya–Watson estimator suffers from the so-called boundary effect: its bias is particularly severe at the boundary points of the regressor support [3]. As a remedy, the local-linear regression (LLR) technique (or, more generally, local-polynomial regression) is used [2]. A modification of the local-linear estimator for MVC is proposed in [4].
This paper investigates the properties of the modified LLR estimator. In Section 2 we describe the MVC model and, in particular, the model of a regression mixture with varying concentrations. In Section 3 modifications of the Nadaraya–Watson (mNWE) and local-linear (mLLE) estimators for MVC are presented. Section 4 is devoted to conditions for the consistency of the mLLE. Results of simulations are presented in Section 5. Concluding remarks are given in Section 6.
2 Regression mixture
Consider a sample of n subjects ${O_{1}},\dots ,{O_{n}}$, where each subject belongs to one of M subpopulations (mixture components). The number ${\kappa _{j}}=\kappa ({O_{j}})$ of the component to which the j-th subject belongs is unknown, but the probability ${p_{j;n}^{m}}=\mathbf{P}({\kappa _{j}}=m)$ that ${O_{j}}$ belongs to the m-th component is known for all $1\le j\le n$ and $1\le m\le M$. The probabilities $\{{p_{j;n}^{m}}\}$ are called the mixing probabilities or concentrations of the components in the mixture.
For each subject one observes a vector of variables ${\xi _{j}}=\xi ({O_{j}})\in {\mathbb{R}^{d}}$. Let ${F^{m}}(A):=\mathbf{P}(\xi (O)\in A\mid \kappa (O)=m)$ be the (unknown) distribution of $\xi (O)$ given that O belongs to the m-th component. Then
(1)
\[ \mathbf{P}({\xi _{j}}\in A)={\sum \limits_{m=1}^{M}}{p_{j;n}^{m}}{F^{m}}(A),\hspace{1em}A\in \mathcal{B}\big({\mathbb{R}^{d}}\big),\]
where $\mathcal{B}({\mathbb{R}^{d}})$ is the Borel σ-algebra on ${\mathbb{R}^{d}}$.
In what follows we assume that ${\xi _{j}}$, $j=1,\dots ,n$, are independent random vectors. Formula (1) is called the MVC model. A particular case of MVC is the mixture of regressions, in which a regression model is assumed for each mixture component distribution ${F^{m}}(A)$, $m=1,\dots ,M$.
In this paper we restrict ourselves to bivariate vectors of observed features. For each subject ${O_{j}}$ in the sample a vector ${\xi _{j}}=({X_{j}},{Y_{j}})$ is observed, where ${X_{j}}=X({O_{j}})$ and ${Y_{j}}=Y({O_{j}})$ are the regressor and the response, connected by the following regression model:
(2)
\[ {Y_{j}}={g^{({\kappa _{j}})}}({X_{j}})+{\varepsilon _{j}},\hspace{1em}j=1,\dots ,n,\]
where ${g^{(m)}}$ is an unknown regression function corresponding to the m-th component of the mixture and ${\varepsilon _{j}}=\varepsilon ({O_{j}})$ is a random error term. The error terms have zero mean and finite variance for each mixture component:
\[ \mathbf{E}\big[\varepsilon ({O_{j}})\mid \kappa ({O_{j}})=m\big]=0,\hspace{0.2222em}\operatorname{\mathbf{Var}}\big[\varepsilon ({O_{j}})\mid \kappa ({O_{j}})=m\big]={\sigma _{(m)}^{2}}\lt \infty ,\hspace{1em}1\le j\le n.\]
It is assumed that ${X_{j}}$ and ${\varepsilon _{j}}$ are conditionally independent given ${\kappa _{j}}$. Let ${F_{X}^{m}}(A)=\mathbf{P}(X(O)\in A\mid \kappa (O)=m)$ be the distribution of the regressor for the m-th component. In what follows we assume that ${F_{X}^{m}}(A)$ is dominated by the Lebesgue measure for all $1\le m\le M$. The corresponding probability densities are denoted by ${f^{(m)}}(x)$. These density functions are unknown.
Our aim is to estimate the unknown regression function ${g^{(m)}}(x)$ of the m-th mixture component in (2).
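To make the model concrete, here is a minimal simulation sketch of the regression mixture (2) with varying concentrations. The particular regression functions, the uniform regressor law, the Gaussian errors and the function names below are illustrative assumptions of ours, not part of the model itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_mvc_regression(p, g_funcs, sigma):
    """Draw a sample (X_j, Y_j), j = 1..n, from the regression mixture (2).

    p       : (n, M) array of concentrations, p[j, m] = P(kappa_j = m + 1)
    g_funcs : list of M component regression functions g^(m)
    sigma   : list of M error standard deviations
    """
    n, M = p.shape
    # hidden component labels kappa_j (unobservable in practice)
    kappa = np.array([rng.choice(M, p=p[j]) for j in range(n)])
    X = rng.uniform(0.0, 1.0, size=n)                # illustrative regressor law
    eps = rng.normal(0.0, np.asarray(sigma)[kappa])  # zero-mean error terms
    Y = np.array([g_funcs[k](x) for k, x in zip(kappa, X)]) + eps
    return X, Y, kappa

# two-component example with concentrations p^1_{j;n} = j/n, p^2_{j;n} = 1 - j/n
n = 500
j = np.arange(1, n + 1)
p = np.column_stack([j / n, 1 - j / n])
g_funcs = [lambda x: np.sin(2 * np.pi * x), lambda x: 1.0 + x]
X, Y, kappa = simulate_mvc_regression(p, g_funcs, sigma=[0.5, 0.5])
```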
3 Estimators
3.1 Nonparametric regression in homogeneous samples
Let us recall how nonparametric regression estimators are defined in the case of homogeneous data. That is, we will assume here that (2) holds with $M=1$ and $g(x)={g^{(1)}}(x)$ is to be estimated.
The classical Nadaraya–Watson estimator (NWE) ${\hat{g}_{n}^{NW}}({x_{0}})$ for $g({x_{0}})$ is defined by
\[ {\hat{g}_{n}^{NW}}({x_{0}})=\frac{{\textstyle\textstyle\sum _{j=1}^{n}}{Y_{j}}K(\frac{{x_{0}}-{X_{j}}}{h})}{{\textstyle\textstyle\sum _{j=1}^{n}}K(\frac{{x_{0}}-{X_{j}}}{h})},\]
where $K:\mathbb{R}\to \mathbb{R}$ is a kernel function and $h\gt 0$ is a bandwidth. The kernel K defines the weight of an observation $({X_{j}},{Y_{j}})$ in the estimator. A typical choice of K in the NWE is the Epanechnikov kernel ${K^{Ep}}(t)=\frac{3}{4}(1-{t^{2}})\mathbf{1}\{|t|\lt 1\}$, where $\mathbf{1}\{A\}$ is the indicator function of an event A. The bandwidth can be interpreted as the width of the neighborhood around ${x_{0}}$ to which the estimator of $g({x_{0}})$ is localized.
To construct the local linear estimator (LLE) one considers the localized least squares functional
(3)
\[ J({x_{0}};a,b)={\sum \limits_{j=1}^{n}}K\bigg(\frac{{x_{0}}-{X_{j}}}{h}\bigg){\big({Y_{j}}-\big(a+b({x_{0}}-{X_{j}})\big)\big)^{2}}.\]
Let $(\hat{a}({x_{0}}),\hat{b}({x_{0}}))$ be the point of minimum of $J({x_{0}};a,b)$ over $(a,b)\in {\mathbb{R}^{2}}$. Then the LLE for $g({x_{0}})$ is ${\hat{g}^{LL}}({x_{0}}):=\hat{a}({x_{0}})$. The estimator ${\hat{g}^{LL}}({x_{0}})$ can be calculated by (8)–(9) below with the weights ${w_{j;n}}=K(({x_{0}}-{X_{j}})/h)$.
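As an illustration, here is a minimal sketch of the two classical estimators for homogeneous data, using the Epanechnikov kernel and the representation (8)–(9) with unit weights; the helper names and the toy sample are ours.

```python
import numpy as np

def epanechnikov(t):
    # K^Ep(t) = 0.75 * (1 - t^2) * 1{|t| < 1}
    return 0.75 * (1.0 - t**2) * (np.abs(t) < 1.0)

def nw_estimate(x0, X, Y, h, K=epanechnikov):
    """Classical Nadaraya-Watson estimate of g(x0)."""
    w = K((x0 - X) / h)
    return np.sum(w * Y) / np.sum(w)

def ll_estimate(x0, X, Y, h, K=epanechnikov):
    """Local-linear estimate: the intercept of the weighted LS fit (3),
    computed via (8)-(9) with unit weights."""
    w = K((x0 - X) / h)
    u = x0 - X
    S1, SX, SXX = w.sum(), (w * u).sum(), (w * u**2).sum()
    SY, SXY = (w * Y).sum(), (w * u * Y).sum()
    return (SY * SXX - SX * SXY) / (S1 * SXX - SX**2)

# toy homogeneous sample; x0 = 1.0 is a boundary point of the design
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 500)
Y = np.sin(2 * np.pi * X) + rng.normal(0.0, 0.3, 500)
print(nw_estimate(1.0, X, Y, h=0.2), ll_estimate(1.0, X, Y, h=0.2))
```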
3.2 The minimax weighting coefficients
To modify the estimators for MVC data we need the minimax weighting coefficients for the estimation of the component distributions, defined in [9].
Let ${\mathbf{p}^{m}}={({p_{1;n}^{m}},\dots ,{p_{n;n}^{m}})^{T}}$ be the vector of concentrations for the m-th component of mixture, $m=1,\dots ,M$.
The averaging operation will be denoted by angle brackets:
\[ {\langle \mathbf{v}\rangle _{n}}:=\frac{1}{n}{\sum \limits_{j=1}^{n}}{v_{j}},\hspace{1em}\text{for any}\hspace{2.5pt}\mathbf{v}={({v_{1}},\dots ,{v_{n}})^{T}}\in {\mathbb{R}^{n}}.\]
Arithmetic operations with vectors in the angle brackets are performed entry-wise:
\[ {\big\langle {\mathbf{p}^{m}}{\mathbf{p}^{k}}\big\rangle _{n}}=\frac{1}{n}{\sum \limits_{j=1}^{n}}{p_{j;n}^{m}}{p_{j;n}^{k}}.\]
Note that ${\langle {\mathbf{p}^{m}}{\mathbf{p}^{k}}\rangle _{n}}$ can be considered as an inner product on ${\mathbb{R}^{n}}$. In what follows we assume that ${\{{\mathbf{p}^{m}}\}_{m=1}^{M}}$ are linearly independent. Denote by ${\boldsymbol{\Gamma }_{n}}={({\langle {\mathbf{p}^{k}}{\mathbf{p}^{l}}\rangle _{n}})_{k,l=1}^{M}}$ the Gram matrix of ${\{{\mathbf{p}^{m}}\}_{m=1}^{M}}$. The weighting coefficients ${a_{j;n}^{k}}$, defined by
(4)
\[ {a_{j;n}^{k}}=\frac{1}{\det {\boldsymbol{\Gamma }_{n}}}{\sum \limits_{m=1}^{M}}{(-1)^{m+k}}{\gamma _{km}}{p_{j;n}^{m}},\]
where ${\gamma _{km}}$ is the $(k,m)$-th minor of ${\boldsymbol{\Gamma }_{n}}$, are called the minimax weighting coefficients. These weights can also be obtained by the formula
\[ \big({a_{j;n}^{1}},\dots ,{a_{j;n}^{M}}\big)=\big({p_{j;n}^{1}},\dots ,{p_{j;n}^{M}}\big){\boldsymbol{\Gamma }_{n}^{-1}}.\]
The vector of minimax coefficients for the m-th component will be denoted by ${\mathbf{a}^{m}}={({a_{1;n}^{m}},\dots ,{a_{n;n}^{m}})^{T}}$. These coefficients can be used in weighted empirical distributions
\[ {\hat{F}_{n}^{m}}(A)=\frac{1}{n}{\sum \limits_{j=1}^{n}}{a_{j;n}^{m}}\mathbf{1}\{{\xi _{j}}\in A\}.\]
Observe that
(5)
\[ {\big\langle {\mathbf{p}^{k}}{\mathbf{a}^{m}}\big\rangle _{n}}=\left\{\begin{array}{l@{\hskip10.0pt}l}1\hspace{1em}& \text{if}\hspace{2.5pt}k=m,\\ {} 0\hspace{1em}& \text{if}\hspace{2.5pt}k\ne m,\end{array}\right.\hspace{1em}\text{for all}\hspace{2.5pt}1\le m\le M.\]
By (5) and (1),
\[ \mathbf{E}\big[{\hat{F}_{n}^{m}}(A)\big]={\sum \limits_{k=1}^{M}}{\big\langle {\mathbf{a}^{m}}{\mathbf{p}^{k}}\big\rangle _{n}}{F^{k}}(A)={F^{m}}(A),\]
so ${\hat{F}_{n}^{m}}(A)$ is an unbiased estimator of ${F^{m}}(A)$. It is shown in [9] that, under the MVC model (1), ${\hat{F}_{n}^{m}}(A)$ is a minimax estimator of ${F^{m}}(A)$ in the class of all unbiased estimators.
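The following sketch shows how the minimax weights and the weighted empirical distribution can be computed in practice. The function names are ours, and the numerical check of the orthogonality relation (5) uses the concentrations ${p_{j;n}^{1}}=j/n$, ${p_{j;n}^{2}}=1-j/n$ only as an example.

```python
import numpy as np

def minimax_weights(p):
    """Minimax weights a[j, m] from the (n x M) concentration matrix p, cf. (4)."""
    n, M = p.shape
    Gamma = p.T @ p / n              # Gram matrix (<p^k p^l>_n)
    return p @ np.linalg.inv(Gamma)  # row j: (a^1_{j;n}, ..., a^M_{j;n})

def weighted_ecdf(x, a_m, xi):
    """Weighted empirical estimate of F^m((-inf, x]) for scalar observations xi."""
    return np.mean(a_m * (xi <= x))

# check of the orthogonality relation (5): <p^k a^m>_n = delta_{km}
n = 1000
j = np.arange(1, n + 1)
p = np.column_stack([j / n, 1 - j / n])
a = minimax_weights(p)
print(p.T @ a / n)   # the identity matrix, up to rounding
```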
3.3 Nonparametric estimators for MVC
Here and below we assume that ${\xi _{j}}$, $j=1,\dots ,n$, are described by the regression MVC model (2).
In [1] a modified NWE (mNWE) for ${g^{(m)}}({x_{0}})$ is proposed in the form
(6)
\[ {\hat{g}_{n}^{NW(m)}}({x_{0}})=\frac{{\textstyle\textstyle\sum _{j=1}^{n}}{a_{j;n}^{m}}{Y_{j}}K(\frac{{x_{0}}-{X_{j}}}{h})}{{\textstyle\textstyle\sum _{j=1}^{n}}{a_{j;n}^{m}}K(\frac{{x_{0}}-{X_{j}}}{h})}.\]
It is shown in [1] that, under suitable assumptions, ${\hat{g}_{n}^{NW(m)}}({x_{0}})$ is a consistent and asymptotically normal estimator of ${g^{(m)}}({x_{0}})$.
To derive a modified local linear estimator (mLLE) we start with the weighted least squares functional
(7)
\[ {J^{(m)}}({x_{0}};a,b)={\sum \limits_{j=1}^{n}}{a_{j;n}^{m}}K\bigg(\frac{{x_{0}}-{X_{j}}}{h}\bigg){\big({Y_{j}}-\big(a+b({x_{0}}-{X_{j}})\big)\big)^{2}}.\]
Observe that some weights ${a_{j;n}^{m}}$ can be negative, so this functional can attain negative values and be unbounded from below. Despite this, we consider the stationary point of ${J^{(m)}}({x_{0}};a,b)$, i.e. the solution $(\hat{a}({x_{0}}),\hat{b}({x_{0}}))$ to the equation system
\[ \left\{\begin{array}{l}\frac{\partial }{\partial a}{J^{(m)}}({x_{0}};a,b)=0,\hspace{1em}\\ {} \frac{\partial }{\partial b}{J^{(m)}}({x_{0}};a,b)=0.\hspace{1em}\end{array}\right.\]
The mLLE for ${g^{(m)}}({x_{0}})$ is defined as ${\hat{g}_{n}^{LL(m)}}({x_{0}}):=\hat{a}({x_{0}})$. By simple algebra it can be calculated as
(8)
\[ {\hat{g}_{n}^{LL(m)}}({x_{0}})=\frac{{S_{Y}}{S_{XX}}-{S_{X}}{S_{XY}}}{{S_{1}}{S_{XX}}-{({S_{X}})^{2}}},\]
(9)
\[ \begin{aligned}{}{S_{1}}& ={\sum \limits_{j=1}^{n}}{w_{j;n}^{m}},\hspace{0.2222em}{S_{X}}={\sum \limits_{j=1}^{n}}{w_{j;n}^{m}}({x_{0}}-{X_{j}}),\hspace{0.2222em}{S_{Y}}={\sum \limits_{j=1}^{n}}{w_{j;n}^{m}}{Y_{j}},\\ {} {S_{XX}}& ={\sum \limits_{j=1}^{n}}{w_{j;n}^{m}}{({x_{0}}-{X_{j}})^{2}},\hspace{0.2222em}{S_{XY}}={\sum \limits_{j=1}^{n}}{w_{j;n}^{m}}{Y_{j}}({x_{0}}-{X_{j}}),\end{aligned}\]
where
\[ {w_{j;n}^{m}}={a_{j;n}^{m}}K\big(({x_{0}}-{X_{j}})/h\big).\]
Since ${\hat{g}_{n}^{NW(m)}}({x_{0}})={S_{Y}}/{S_{1}}$, we have
(10)
\[ {\hat{g}_{n}^{LL(m)}}({x_{0}})={\hat{g}_{n}^{NW(m)}}({x_{0}})-\frac{{S_{1}}{S_{XY}}-{S_{X}}{S_{Y}}}{{S_{1}}{S_{XX}}-{({S_{X}})^{2}}}\cdot \frac{{S_{X}}}{{S_{1}}}.\]
So, the mLLE can be considered as the mNWE with an additional correction term. This term is close to zero if ${S_{X}}/{S_{1}}\approx 0$. As we will see below, this is usually the case if ${x_{0}}$ is an interior point of the regressor support and K is an even function. Therefore, if ${x_{0}}$ lies in the interior of the regressor support, the behavior of the mLLE is nearly the same as that of the mNWE, but at boundary points the correction makes the bias of the mLLE smaller than that of the mNWE.
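Here is a minimal sketch of the mNWE (6) and the mLLE (8)–(9). The argument `a_m` stands for the vector of minimax weights $({a_{1;n}^{m}},\dots ,{a_{n;n}^{m}})$ of the chosen component (for instance, a column of the matrix returned by the `minimax_weights` helper sketched in Section 3.2); the function names are ours.

```python
import numpy as np

def epanechnikov(t):
    return 0.75 * (1.0 - t**2) * (np.abs(t) < 1.0)

def mnwe(x0, X, Y, a_m, h, K=epanechnikov):
    """Modified Nadaraya-Watson estimate (6) of g^(m)(x0)."""
    w = a_m * K((x0 - X) / h)
    return np.sum(w * Y) / np.sum(w)

def mlle(x0, X, Y, a_m, h, K=epanechnikov):
    """Modified local-linear estimate of g^(m)(x0) via formulas (8)-(9)."""
    w = a_m * K((x0 - X) / h)   # weights w^m_{j;n} = a^m_{j;n} K((x0 - X_j)/h)
    u = x0 - X
    S1, SX, SXX = w.sum(), (w * u).sum(), (w * u**2).sum()
    SY, SXY = (w * Y).sum(), (w * u * Y).sum()
    return (SY * SXX - SX * SXY) / (S1 * SXX - SX**2)
```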
4 Consistency of the modified local-linear regression estimator
The conditions for consistency of the weighted local-linear estimator are given in the following theorem.
Theorem 1.
Let $1\le m\le M$ be a fixed component number. Assume that:
1. ${x_{0}}$ is a continuity point of ${f^{(m)}}(x)$ and ${g^{(m)}}(x)$;
2. $h={h_{n}}$ is such that ${h_{n}}\to 0$ and $n{h_{n}}\to +\infty $ as $n\to \infty $;
3. the kernel function K is bounded on $\mathbb{R}$ and satisfies ${\textstyle\int _{-\infty }^{\infty }}|K(z)|{z^{2}}dz\lt \infty $ and ${\textstyle\int _{-\infty }^{\infty }}{K^{2}}(z){z^{4}}dz\lt \infty $;
4. ${g^{(k)}}(x)$, ${f^{(k)}}(x)$ are bounded on $x\in \mathbb{R}$ for all $1\le k\le M$;
5. ${f^{(m)}}({x_{0}})\gt 0$;
6. $\Delta (K):={\textstyle\int _{-\infty }^{+\infty }}K(z)dz{\textstyle\int _{-\infty }^{+\infty }}{z^{2}}K(z)dz-{({\textstyle\int _{-\infty }^{+\infty }}zK(z)dz)^{2}}\ne 0$;
7. there exists ${c_{0}}\gt 0$ such that $\det {\boldsymbol{\Gamma }_{n}}\ge {c_{0}}$ for all $n\ge 1$.
Then ${\hat{g}_{n}^{LL(m)}}({x_{0}})$ is a consistent estimator of ${g^{(m)}}({x_{0}})$, i.e.
\[ {\hat{g}_{n}^{LL(m)}}({x_{0}})\stackrel{\text{P}}{\longrightarrow }{g^{(m)}}({x_{0}}),\hspace{1em}\text{as}\hspace{2.5pt}n\to \infty .\]
Remarks.
1. Assumption 2 of the theorem is necessary. It is also required for the LLR estimator in the case of a homogeneous sample, see Theorem 1 of [2].
2. If the kernel K has bounded support, then Assumption 4 can be relaxed to a local version: ${g^{(k)}}$ and ${f^{(k)}}$ are bounded in some open neighborhood of ${x_{0}}$.
3. If the kernel $K(z)$ is nonnegative, the assumption $\Delta (K)\ne 0$ is equivalent to $K(\cdot )\ne 0$ on a set of positive Lebesgue measure (see the worked example after these remarks).
4. The requirement ${f^{(m)}}({x_{0}})\gt 0$ is crucial. If ${f^{(m)}}(x)=0$ for all x in some neighborhood of ${x_{0}}$, then ${X^{(m)}}$ does not attain values in this neighborhood, hence ${g^{(m)}}({x_{0}})$ cannot be estimated nonparametrically. On the other hand, when ${x_{0}}$ is a boundary point of the support of ${X^{(m)}}$, consistent estimation is possible.
5. The condition $\det {\boldsymbol{\Gamma }_{n}}\gt 0$ is equivalent to the linear independence of the concentration vectors $\{{\mathbf{p}^{k}}\}$, $k=1,\dots ,M$. Assumption 7 can be considered as an asymptotic version of this linear independence condition. It is worth noting that this assumption rules out the classical FMMs, in which ${p_{j;n}^{m}}={p^{m}}$ does not depend on j.
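For example, for the Epanechnikov kernel ${K^{Ep}}$ introduced in Section 3.1 one can check directly that Assumptions 3 and 6 hold:
\[ {\int _{-\infty }^{+\infty }}{K^{Ep}}(z)dz=1,\hspace{1em}{\int _{-\infty }^{+\infty }}z{K^{Ep}}(z)dz=0,\hspace{1em}{\int _{-\infty }^{+\infty }}{z^{2}}{K^{Ep}}(z)dz=\frac{3}{4}{\int _{-1}^{1}}\big({z^{2}}-{z^{4}}\big)dz=\frac{1}{5},\]
so $\Delta ({K^{Ep}})=1\cdot \frac{1}{5}-{0^{2}}=\frac{1}{5}\ne 0$; moreover, ${K^{Ep}}$ is bounded with compact support, so the integrability conditions of Assumption 3 are satisfied as well.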
Proof.
Multiplying both the numerator and denominator in (8) by ${(n{h_{n}^{2}})^{-2}}$, one obtains
(11)
\[ {\hat{g}_{n}^{LL(m)}}({x_{0}})=\frac{\frac{{S_{Y}}}{n{h_{n}}}\frac{{S_{XX}}}{n{h_{n}^{3}}}-\frac{{S_{X}}}{n{h_{n}^{2}}}\frac{{S_{XY}}}{n{h_{n}^{2}}}}{\frac{{S_{1}}}{n{h_{n}}}\frac{{S_{XX}}}{n{h_{n}^{3}}}-{\big(\frac{{S_{X}}}{n{h_{n}^{2}}}\big)^{2}}}.\]
Let us analyze each term separately.
We will start from ${S_{XX}}/(n{h_{n}^{3}})$.
\[\begin{aligned}{}\mathbf{E}\bigg[\frac{{S_{XX}}}{n{h_{n}^{3}}}\bigg]& =\frac{1}{n{h_{n}^{3}}}{\sum \limits_{j=1}^{n}}{a_{j;n}^{m}}\mathbf{E}\big[K\big(({x_{0}}-{X_{j}})/{h_{n}}\big){({x_{0}}-{X_{j}})^{2}}\big]\\ {} & =\frac{1}{n{h_{n}^{3}}}{\sum \limits_{j=1}^{n}}{a_{j;n}^{m}}{\sum \limits_{k=1}^{M}}{p_{j;n}^{k}}\mathbf{E}\big[K\big(\big({x_{0}}\hspace{-0.1667em}-\hspace{-0.1667em}X(O)\big)/{h_{n}}\big){\big({x_{0}}\hspace{-0.1667em}-\hspace{-0.1667em}X(O)\big)^{2}}\mid \kappa (O)\hspace{-0.1667em}=\hspace{-0.1667em}k\big]\\ {} & =\frac{1}{{h_{n}^{3}}}{\sum \limits_{k=1}^{M}}{\big\langle {\mathbf{a}^{m}}{\mathbf{p}^{k}}\big\rangle _{n}}\mathbf{E}\big[K\big(\big({x_{0}}-X(O)\big)/{h_{n}}\big){\big({x_{0}}-X(O)\big)^{2}}\mid \kappa (O)=k\big]\\ {} & =\frac{1}{{h_{n}^{3}}}\mathbf{E}\big[K\big(\big({x_{0}}-X(O)\big)/{h_{n}}\big){\big({x_{0}}-X(O)\big)^{2}}\mid \kappa (O)=m\big],\end{aligned}\]
due to (5). So
\[ \mathbf{E}\bigg[\frac{{S_{XX}}}{n{h_{n}^{3}}}\bigg]=\frac{1}{{h_{n}^{3}}}{\underset{-\infty }{\overset{+\infty }{\int }}}K\big(({x_{0}}-x)/{h_{n}}\big){({x_{0}}-x)^{2}}{f^{(m)}}(x)dx.\]
The substitution
(12)
\[ z=({x_{0}}-x)/{h_{n}}\]
yields
\[ \mathbf{E}\bigg[\frac{{S_{XX}}}{n{h_{n}^{3}}}\bigg]={\underset{-\infty }{\overset{+\infty }{\int }}}K(z){z^{2}}{f^{(m)}}({x_{0}}-{h_{n}}z)dz.\]
By Assumptions 1 and 2, $K(z){z^{2}}{f^{(m)}}({x_{0}}-{h_{n}}z)\to K(z){z^{2}}{f^{(m)}}({x_{0}})$ for all $z\in \mathbb{R}$. By Assumption 4, there exists ${C_{f}}\lt \infty $ such that ${f^{(m)}}(x)\lt {C_{f}}$ for all $x\in \mathbb{R}$. By Assumption 3, ${\textstyle\int _{-\infty }^{\infty }}|K(z)|{z^{2}}dz\lt \infty $, so ${C_{f}}|K(z)|{z^{2}}$ is an integrable majorant of $K(z){z^{2}}{f^{(m)}}({x_{0}}-{h_{n}}z)$, and by the Lebesgue dominated convergence theorem
(13)
\[ \mathbf{E}\bigg[\frac{{S_{XX}}}{n{h_{n}^{3}}}\bigg]\to {f^{(m)}}({x_{0}}){\underset{-\infty }{\overset{+\infty }{\int }}}K(z){z^{2}}dz,\hspace{1em}\text{as}\hspace{2.5pt}n\to \infty .\]
Let us bound the variance of ${S_{XX}}/(n{h_{n}^{3}})$:
\[\begin{aligned}{}& \operatorname{\mathbf{Var}}\bigg[\frac{{S_{XX}}}{n{h_{n}^{3}}}\bigg]=\frac{1}{{(n{h_{n}^{3}})^{2}}}{\sum \limits_{j=1}^{n}}{\big({a_{j;n}^{m}}\big)^{2}}\operatorname{\mathbf{Var}}\big[K\big(({x_{0}}-{X_{j}})/{h_{n}}\big){({x_{0}}-{X_{j}})^{2}}\big]\\ {} & \hspace{1em}\le \frac{1}{{n^{2}}{h_{n}^{6}}}{\sum \limits_{j=1}^{n}}{\big({a_{j;n}^{m}}\big)^{2}}\mathbf{E}\big[{\big(K\big(({x_{0}}-{X_{j}})/{h_{n}}\big){({x_{0}}-{X_{j}})^{2}}\big)^{2}}\big]\\ {} & \hspace{1em}=\frac{1}{{n^{2}}{h_{n}^{6}}}{\sum \limits_{j=1}^{n}}{\big({a_{j;n}^{m}}\big)^{2}}{\sum \limits_{k=1}^{M}}{p_{j;n}^{k}}\mathbf{E}\big[{\big(K\big(\big({x_{0}}\hspace{-0.1667em}-\hspace{-0.1667em}X(O)\big)/{h_{n}}\big)\big)^{2}}{\big({x_{0}}\hspace{-0.1667em}-\hspace{-0.1667em}X(O)\big)^{4}}\mid \kappa (O)=k\big]\\ {} & \hspace{1em}=\frac{1}{n{h_{n}^{6}}}{\sum \limits_{k=1}^{M}}{\big\langle {\big({\mathbf{a}^{m}}\big)^{2}}{\mathbf{p}^{k}}\big\rangle _{n}}{\underset{-\infty }{\overset{+\infty }{\int }}}{\big(K\big(({x_{0}}-x)/{h_{n}}\big)\big)^{2}}{({x_{0}}-x)^{4}}{f^{(k)}}(x)dx.\end{aligned}\]
Applying once more the substitution (12), we obtain
(14)
\[ \operatorname{\mathbf{Var}}\bigg[\frac{{S_{XX}}}{n{h_{n}^{3}}}\bigg]\le \frac{1}{n{h_{n}}}{\sum \limits_{k=1}^{M}}{\big\langle {\big({\mathbf{a}^{m}}\big)^{2}}{\mathbf{p}^{k}}\big\rangle _{n}}{\underset{-\infty }{\overset{+\infty }{\int }}}{\big(K(z)\big)^{2}}{z^{4}}{f^{(k)}}({x_{0}}-{h_{n}}z)dz.\]
Observe that
\[ {\big\langle {\big({\mathbf{a}^{m}}\big)^{2}}{\mathbf{p}^{k}}\big\rangle _{n}}\le {\Big(\underset{1\le j\le n}{\max }|{a_{j;n}^{m}}|\Big)^{2}}\]
since $0\le {p_{j;n}^{k}}\le 1$. By (4) and Assumption 7,
\[ |{a_{j;n}^{k}}|=\bigg|\frac{1}{\det {\Gamma _{n}}}{\sum \limits_{m=1}^{M}}{(-1)^{m+k}}{\gamma _{km}}{p_{j;n}^{m}}\bigg|\le {c_{0}^{-1}}{\sum \limits_{m=1}^{M}}|{\gamma _{km}}|\le {c_{0}^{-1}}M!=:{C_{0}}\lt \infty ,\]
since $|{\gamma _{km}}|\le (M-1)!$. So, for some $C\lt \infty $,
\[ \operatorname{\mathbf{Var}}\bigg[\frac{{S_{XX}}}{n{h_{n}^{3}}}\bigg]\le \frac{C}{n{h_{n}}}\to 0\]
by Assumption 2.
Combining this with (13) we obtain
(15)
\[ \frac{{S_{XX}}}{n{h_{n}^{3}}}\stackrel{\text{P}}{\longrightarrow }{f^{(m)}}({x_{0}}){\underset{-\infty }{\overset{+\infty }{\int }}}K(z){z^{2}}dz,\hspace{1em}\text{as}\hspace{2.5pt}n\to \infty .\]
Consider now the asymptotics of ${S_{XY}}/(n{h_{n}^{2}})$. With (2) in mind, in the same way as above we obtain
\[\begin{aligned}{}& \mathbf{E}\bigg[\frac{{S_{XY}}}{n{h_{n}^{2}}}\bigg]=\frac{1}{{h_{n}^{2}}}{\sum \limits_{k=1}^{M}}{\big\langle {\mathbf{a}^{m}}{\mathbf{p}^{k}}\big\rangle _{n}}\mathbf{E}\big[K\big(\big({x_{0}}-X(O)\big)/{h_{n}}\big)\big({x_{0}}-X(O)\big)Y(O)\mid \kappa (O)=k\big]\\ {} & \hspace{1em}=\frac{1}{{h_{n}^{2}}}{\sum \limits_{k=1}^{M}}{\big\langle {\mathbf{a}^{m}}{\mathbf{p}^{k}}\big\rangle _{n}}\mathbf{E}\big[K\big(\big({x_{0}}\hspace{-0.1667em}-\hspace{-0.1667em}X(O)\big)/{h_{n}}\big)\big({x_{0}}\hspace{-0.1667em}-\hspace{-0.1667em}X(O)\big){g^{(\kappa (O))}}\big(X(O)\big)\mid \kappa (O)\hspace{-0.1667em}=\hspace{-0.1667em}k\big]\\ {} & \hspace{1em}=\frac{1}{{h_{n}^{2}}}{\underset{-\infty }{\overset{+\infty }{\int }}}K\big(({x_{0}}-x)/{h_{n}}\big)({x_{0}}-x){g^{(m)}}(x){f^{(m)}}(x)dx.\end{aligned}\]
(Here the assumption $\mathbf{E}[\varepsilon (O)\mid \kappa (O)=m]=0$ and the conditional independence of $X(O)$ and $\varepsilon (O)$ were used.) The substitution (12) yields
\[ \mathbf{E}\bigg[\frac{{S_{XY}}}{n{h_{n}^{2}}}\bigg]={\underset{-\infty }{\overset{+\infty }{\int }}}K(z)z{g^{(m)}}({x_{0}}-{h_{n}}z){f^{(m)}}({x_{0}}-{h_{n}}z)dz.\]
Applying the Lebesgue dominated convergence theorem, in view of Assumptions 1 and 6, we obtain
(16)
\[ \mathbf{E}\bigg[\frac{{S_{XY}}}{n{h_{n}^{2}}}\bigg]\to {f^{(m)}}({x_{0}}){g^{(m)}}({x_{0}}){\underset{-\infty }{\overset{+\infty }{\int }}}zK(z)dz,\hspace{1em}\text{as}\hspace{2.5pt}n\to \infty .\]
Then
\[\begin{aligned}{}& \operatorname{\mathbf{Var}}\bigg[\frac{{S_{XY}}}{n{h_{n}^{2}}}\bigg]\le \frac{1}{{(n{h_{n}^{2}})^{2}}}{\sum \limits_{j=1}^{n}}{\big({a_{j;n}^{m}}\big)^{2}}\mathbf{E}\big[{\big(K\big(({x_{0}}-{X_{j}})/{h_{n}}\big)({x_{0}}-{X_{j}}){Y_{j}}\big)^{2}}\big]\\ {} & =\frac{1}{n{h_{n}^{4}}}\hspace{-0.1667em}{\sum \limits_{k=1}^{M}}{\big\langle {\big({\mathbf{a}^{m}}\big)^{2}}{\mathbf{p}^{k}}\big\rangle _{n}}\mathbf{E}\big[{\big({g^{(k)}}\hspace{-0.1667em}\big(X(O)\big)\hspace{-0.1667em}K\big(\big({x_{0}}\hspace{-0.1667em}-\hspace{-0.1667em}X(O)\big)/{h_{n}}\big)\big({x_{0}}\hspace{-0.1667em}-\hspace{-0.1667em}X(O)\big)\big)^{2}}\mid \kappa (O)\hspace{-0.1667em}=\hspace{-0.1667em}k\big]\\ {} & \hspace{1em}+\frac{1}{n{h_{n}^{4}}}{\sum \limits_{k=1}^{M}}{\sigma _{(k)}^{2}}{\big\langle {\big({\mathbf{a}^{m}}\big)^{2}}{\mathbf{p}^{k}}\big\rangle _{n}}\mathbf{E}\big[{\big(K\big(\big({x_{0}}-X(O)\big)/{h_{n}}\big)\big({x_{0}}-X(O)\big)\big)^{2}}\mid \kappa (O)=k\big]\\ {} & =\frac{1}{n{h_{n}}}{\sum \limits_{k=1}^{M}}{\big\langle {\big({\mathbf{a}^{m}}\big)^{2}}{\mathbf{p}^{k}}\big\rangle _{n}}{\int _{-\infty }^{\infty }}{\big(K(z)z\big)^{2}}{\big({g^{(k)}}({x_{0}}-{h_{n}}z)\big)^{2}}{f^{(k)}}({x_{0}}-{h_{n}}z)dz\\ {} & \hspace{1em}+\frac{1}{n{h_{n}}}{\sum \limits_{k=1}^{M}}{\sigma _{(k)}^{2}}{\big\langle {\big({\mathbf{a}^{m}}\big)^{2}}{\mathbf{p}^{k}}\big\rangle _{n}}{\int _{-\infty }^{\infty }}{\big(K(z)z\big)^{2}}{f^{(k)}}({x_{0}}-{h_{n}}z)dz.\end{aligned}\]
So, by Assumptions 2, 3 and 4, for some $C\lt \infty $ we obtain
\[ \operatorname{\mathbf{Var}}\bigg[\frac{{S_{XY}}}{n{h_{n}^{2}}}\bigg]\le \frac{C}{n{h_{n}}}\to 0,\hspace{1em}\text{as}\hspace{2.5pt}n\to \infty .\]
This with (16) yields
(17)
\[ \frac{{S_{XY}}}{n{h_{n}^{2}}}\stackrel{\text{P}}{\longrightarrow }{f^{(m)}}({x_{0}}){g^{(m)}}({x_{0}}){\int _{-\infty }^{+\infty }}K(z)zdz,\hspace{1em}\text{as}\hspace{2.5pt}n\to \infty .\]
In the same way we obtain
\[\begin{aligned}{}\frac{{S_{X}}}{n{h_{n}^{2}}}& \stackrel{\text{P}}{\longrightarrow }{f^{(m)}}({x_{0}}){\int _{-\infty }^{+\infty }}zK(z)dz,\\ {} \frac{{S_{Y}}}{n{h_{n}}}& \stackrel{\text{P}}{\longrightarrow }{f^{(m)}}({x_{0}}){g^{(m)}}({x_{0}}){\int _{-\infty }^{+\infty }}K(z)dz,\\ {} \frac{{S_{1}}}{n{h_{n}}}& \stackrel{\text{P}}{\longrightarrow }{f^{(m)}}({x_{0}}){\int _{-\infty }^{+\infty }}K(z)dz,\end{aligned}\]
as $n\to \infty $. Combining this with (11), (15) and (17) we obtain
\[\begin{aligned}{}{\hat{g}_{n}^{LL(m)}}({x_{0}})& =\frac{\frac{{S_{Y}}}{n{h_{n}}}\frac{{S_{XX}}}{n{h_{n}^{3}}}-\frac{{S_{X}}}{n{h_{n}^{2}}}\frac{{S_{XY}}}{n{h_{n}^{2}}}}{\frac{{S_{1}}}{n{h_{n}}}\frac{{S_{XX}}}{n{h_{n}^{3}}}-{(\frac{{S_{X}}}{n{h_{n}^{2}}})^{2}}}\\ {} & \stackrel{\text{P}}{\longrightarrow }\frac{{g^{(m)}}({x_{0}}){({f^{(m)}}({x_{0}}))^{2}}\Delta (K)}{{({f^{(m)}}({x_{0}}))^{2}}\Delta (K)}={g^{(m)}}({x_{0}}),\hspace{1em}\text{as}\hspace{2.5pt}n\to \infty .\end{aligned}\]
(Here Assumptions 5 and 6 were used.) This is just the statement of the theorem. □
5 Simulation results
5.1 Description of simulations
To assess the quality of the mLLE and to compare it with the mNWE, we performed a small simulation study. In all the simulation experiments we used a two-component MVC model with concentrations defined as follows:
\[ {p_{j;n}^{1}}=\frac{j}{n},\hspace{0.2222em}{p_{j;n}^{2}}=1-\frac{j}{n},\hspace{1em}1\le j\le n.\]
The regression functions of the components were defined as
The distribution of the regressor X is uniform on $[0,1]$ for both components. The error terms had different distributions in different experiments. Performance of the estimators was examined at two points: ${x_{0}}=0.5$ (an interior point of the regressor support) and ${x_{0}}=0$ (a boundary point).
For sample sizes n from 100 to 10000 we generated $B=1000$ independent samples from the model. For fixed ${x_{0}}$, the mLLE ${\hat{g}_{n}^{LL(m)}}({x_{0}})$ and the mNWE ${\hat{g}_{n}^{NW(m)}}({x_{0}})$ were calculated from each generated sample. From the obtained samples of estimates we calculated the observed bias
\[ \text{Bias}\big({\hat{g}_{n}^{LL(m)}}({x_{0}})\big)={\mathbf{E}_{\ast }}\big[{\hat{g}_{n}^{LL(m)}}({x_{0}})-{g^{(m)}}({x_{0}})\big]\]
and variance
\[ \text{Var}\big({\hat{g}_{n}^{LL(m)}}({x_{0}})\big)={\mathbf{E}_{\ast }}\big[{\big({\hat{g}_{n}^{LL(m)}}({x_{0}})-{\mathbf{E}_{\ast }}\big[{\hat{g}_{n}^{LL(m)}}({x_{0}})\big]\big)^{2}}\big].\]
(Similarly for ${\hat{g}_{n}^{NW(m)}}$.) Here $m=1,2$ and ${\mathbf{E}_{\ast }}[\cdot ]$ denotes the sample mean over the B replications. For all of the experiments we used the Epanechnikov kernel, and the bandwidth was ${h_{n}}=H{n^{-1/5}}$, where $H={H_{opt}^{NW}}=1.816$ is a theoretically optimal value for the mNWE, see [1]. Three experiments were performed.
• Experiment 1. The regression errors were Gaussian, ${\varepsilon _{j}}\sim N(0,1.25)$ for both components, so this is the case of light-tailed errors.
• Experiment 2. The regression errors were generated from Student's t distribution with 10 degrees of freedom, i.e. ${\varepsilon _{j}}\sim T(10)$ for both components. In this case the variance of the errors is the same as in the previous case, but the distribution is heavy-tailed.
• Experiment 3. The error distributions were different in the first and second components: ${\varepsilon _{j}}|\{{\kappa _{j}}=1\}\sim N(0,1.25)$ and ${\varepsilon _{j}}|\{{\kappa _{j}}=2\}\sim T(10)$.
For all experiments, the numerical results are shown only for the first component of the mixture, since the biases and variances of the estimators for the second component are of nearly the same magnitude and are therefore of no additional interest.
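For reproducibility, the following sketch outlines the Monte Carlo scheme of Experiment 1 for the mLLE at the boundary point. The regression functions `g` are placeholders (the functions used in the experiments are not reproduced here), and the number of replications is reduced for speed; the rest of the setup (concentrations ${p_{j;n}^{1}}=j/n$, uniform regressor, Gaussian errors with variance 1.25, Epanechnikov kernel, bandwidth ${h_{n}}=1.816\,{n^{-1/5}}$) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(1)

def epanechnikov(t):
    return 0.75 * (1.0 - t**2) * (np.abs(t) < 1.0)

def minimax_weights(p):
    # minimax weights (4), as sketched in Section 3.2
    n = p.shape[0]
    return p @ np.linalg.inv(p.T @ p / n)

def mlle(x0, X, Y, a_m, h):
    # modified local-linear estimate (8)-(9), as sketched in Section 3.3
    w = a_m * epanechnikov((x0 - X) / h)
    u = x0 - X
    S1, SX, SXX = w.sum(), (w * u).sum(), (w * u**2).sum()
    SY, SXY = (w * Y).sum(), (w * u * Y).sum()
    return (SY * SXX - SX * SXY) / (S1 * SXX - SX**2)

# placeholder regression functions -- not the ones used in the paper
g = [lambda x: np.sin(2 * np.pi * x), lambda x: 1.0 + x]

def one_replication(n, x0, h):
    j = np.arange(1, n + 1)
    p = np.column_stack([j / n, 1 - j / n])        # p^1_{j;n} = j/n, p^2_{j;n} = 1 - j/n
    kappa = (rng.random(n) > p[:, 0]).astype(int)  # 0 <-> component 1, 1 <-> component 2
    X = rng.uniform(0.0, 1.0, n)
    eps = rng.normal(0.0, np.sqrt(1.25), n)        # Experiment 1: Gaussian errors, variance 1.25
    Y = np.where(kappa == 0, g[0](X), g[1](X)) + eps
    a = minimax_weights(p)
    return mlle(x0, X, Y, a[:, 0], h)              # estimate for the first component

n, B, x0 = 1000, 200, 0.0                          # B reduced from 1000 for speed
h = 1.816 * n ** (-1 / 5)
est = np.array([one_replication(n, x0, h) for _ in range(B)])
print(f"Bias = {est.mean() - g[0](x0):.4f}, Var = {est.var():.4f}")
```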
5.2 Performance of mLLE
The results of Experiment 1 for mLLE are presented in Table 1.
Table 1.
Experiment 1 results on the mLLE
n | Bias (${x_{0}}=0$) | Var (${x_{0}}=0$) | Bias (${x_{0}}=0.5$) | Var (${x_{0}}=0.5$)
100 | −0.14934 | 9.78188 | 0.0783 | 0.05874 |
250 | −0.04822 | 0.15793 | 0.0709 | 0.02123 |
500 | −0.03528 | 0.09786 | 0.05721 | 0.01149 |
1000 | −0.01801 | 0.04796 | 0.04349 | 0.00697 |
2500 | −0.01372 | 0.0254 | 0.02951 | 0.00329 |
5000 | −0.01285 | 0.01394 | 0.02405 | 0.00193 |
10000 | −0.00887 | 0.00828 | 0.01492 | 0.0011 |
In this experiment the bias of the mLLE at the boundary point $x=0$ was even smaller in magnitude than at the interior point $x=0.5$ for large samples $(n\ge 250)$. The variance of the mLLE was smaller at the point $x=0.5$. For the sample size $n=100$ the mLLE had an extremely large variance at $x=0$. Since only about 50 observations in such samples belong to the first component, this is a very small sample size for nonparametric estimation.
The results of Experiment 2 are shown in Table 2.
Table 2.
Experiment 2 results on the mLLE
n | Bias (${x_{0}}=0$) | Var (${x_{0}}=0$) | Bias (${x_{0}}=0.5$) | Var (${x_{0}}=0.5$)
100 | −0.07368 | 0.50335 | 0.06813 | 0.05452 |
250 | −0.0472 | 0.19456 | 0.06015 | 0.02402 |
500 | −0.02049 | 0.09438 | 0.0523 | 0.01202 |
1000 | −0.02602 | 0.05402 | 0.03873 | 0.00683 |
2500 | −0.01697 | 0.02364 | 0.03016 | 0.00322 |
5000 | −0.00879 | 0.01374 | 0.02307 | 0.00171 |
10000 | −0.01294 | 0.00728 | 0.01571 | 0.00112 |
These results show that the heavy-tailed regression errors did not significantly affect the performance of the mLLE. The bias at the boundary point is still of the same magnitude as at the interior point of the regressor support.
The results of Experiment 3 are shown in Table 3.
Table 3.
Experiment 3 results on the mLLE
n | Bias (${x_{0}}=0$) | Var (${x_{0}}=0$) | Bias (${x_{0}}=0.5$) | Var (${x_{0}}=0.5$)
100 | −0.0646 | 0.47884 | 0.06129 | 0.05877 |
250 | −0.06522 | 0.18535 | 0.06101 | 0.02189 |
500 | −0.0309 | 0.09735 | 0.05424 | 0.0113 |
1000 | −0.03353 | 0.05129 | 0.04028 | 0.00645 |
2500 | −0.02261 | 0.02417 | 0.02939 | 0.00298 |
5000 | −0.01109 | 0.01384 | 0.02022 | 0.00187 |
10000 | −0.01169 | 0.00796 | 0.01528 | 0.00099 |
In this experiment we observe the same pattern of decrease of the mLLE biases and variances as in Experiments 1 and 2. So the difference between the error distributions of the components had no significant effect on the estimator.
5.3 Results on comparison of mLLE and mNWE
To compare the performance of the mLLE and the mNWE, we calculated the bias ratios $\text{Bias}({\hat{g}_{n}^{LL(1)}}({x_{0}}))/\text{Bias}({\hat{g}_{n}^{NW(1)}}({x_{0}}))$ and the variance ratios $\text{Var}({\hat{g}_{n}^{LL(1)}}({x_{0}}))/\text{Var}({\hat{g}_{n}^{NW(1)}}({x_{0}}))$.
The results of the experiments for ${x_{0}}=0.5$ are presented in Figure 1. Both the bias and variance ratios are very close to 1 even for small sample sizes, and they apparently tend to 1 as n increases.
Fig. 1.
Ratios of biases (left panel) and variances (right panel) of mLLE and mNWE at ${x_{0}}=0.5$ in Experiment 1 (∘), Experiment 2 (△) and Experiment 3 (+)
The same ratios for ${x_{0}}=0$ are presented in Figure 2. (The extremely high ratio $\text{Var}({\hat{g}_{n}^{LL(1)}}({x_{0}}))/\text{Var}({\hat{g}_{n}^{NW(1)}}({x_{0}}))=109.92$ for $n=100$ in Experiment 1 is not shown, to avoid compressing all the other points.)
Fig. 2.
Ratios of biases (left panel) and variances (right panel) of mLLE and mNWE at ${x_{0}}=0$ in Experiment 1 (∘), Experiment 2 (△) and Experiment 3 (+)
The left panel of Figure 2 shows that the bias of the mLLE is significantly smaller than that of the mNWE at the boundary point ${x_{0}}=0$. The bias ratio for $n=10000$ varies from 0.097 in Experiment 1 to 0.14 in Experiment 2. On the other hand, the variances of the mLLE are larger than those of the mNWE (three to four times larger for $n=10000$). Recall that the bandwidth h in the experiments was taken to be asymptotically optimal for the mNWE, not for the mLLE. By increasing h one can achieve a reduction in variance at the cost of a moderate increase in bias.
As an anonymous referee noted, it can be of interest to compare the mean squared errors of the estimators, i.e.
\[ \mathbf{MSE}\big[{\hat{g}_{n}}({x_{0}})\big]={\mathbf{E}_{\ast }}\big[{\big({\hat{g}_{n}}({x_{0}})-{g^{(m)}}({x_{0}})\big)^{2}}\big]\]
for ${\hat{g}_{n}}={\hat{g}_{n}^{LL(m)}}$ and ${\hat{g}_{n}}={\hat{g}_{n}^{NW(m)}}$.
In all the experiments the ratio
\[ R=\mathbf{MSE}\big[{\hat{g}_{n}^{LL(m)}}({x_{0}})\big]/\mathbf{MSE}\big[{\hat{g}_{n}^{NW(m)}}({x_{0}})\big]\]
was close to 1 at ${x_{0}}=0.5$ for all sample sizes. At ${x_{0}}=0$, R was larger than 1 for smaller n (from 2.280 in Experiment 2 to 2.31 in Experiment 1 for $n=250$) and smaller than 1 for $n=10000$ (from 0.8037 in Experiment 1 to 0.7461 in Experiment 3). So, the mLLE outperformed the mNWE for large sample sizes.
6 Conclusion
We have discussed a modification of the local linear regression technique for nonparametric estimation in the regression mixture model with varying concentrations. Consistency of the obtained estimator has been demonstrated. Results of simulations confirm a significant reduction of the boundary effect when the mLLE is used. Further effort is needed to develop a practical algorithm for bandwidth selection, especially at boundary points of the support of the regressor distribution.