Finite mixtures with different regression models for different mixture components naturally arise in the statistical analysis of biological and sociological data. In this paper a model of mixtures with varying concentrations is considered, in which the mixing probabilities differ across observations. A modified local linear estimation (mLLE) technique is developed to estimate the regression functions of the mixture components nonparametrically. Consistency of the mLLE is demonstrated. The performance of the mLLE and of a modified Nadaraya–Watson estimator (mNWE) is assessed via simulations. The results confirm that the mLLE technique overcomes the boundary effect typical of the NWE.

Models of mixtures with varying concentrations (MVC) naturally arise in statistical analysis of sociological and biomedical data [

Regression models provide a powerful tool for analyzing dependencies in multivariate data. For applications of parametric regression mixture models in behavioral science, see [

This paper focuses on the investigation of the properties of the modified LLR estimator. In Section

Consider a sample with

For each subject one observes a set of observed variables

In what follows we assume that

In this paper we restrict ourselves to bivariate vectors of observed features. For each subject

Let

Our aim is to estimate the unknown regression function

Let us recall how nonparametric regression estimators are defined in the case of homogeneous data. That is, we will assume here that (

The classical Nadaraya–Watson estimator (NWE)
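For homogeneous data the NWE reduces to a kernel-weighted average of the responses. The following minimal Python sketch illustrates this; the Gaussian kernel and all names are illustrative choices, not necessarily those used in the paper:

```python
import numpy as np

def nadaraya_watson(x, y, x0, h, kernel=lambda u: np.exp(-0.5 * u**2)):
    """Nadaraya-Watson estimate of E[Y | X = x0] with bandwidth h.

    A sketch for homogeneous data; the Gaussian kernel is an
    illustrative choice.
    """
    w = kernel((x - x0) / h)          # localization weights
    return np.sum(w * y) / np.sum(w)  # kernel-weighted average of responses
```

At an interior point the estimate is close to the true regression function, while at a boundary point of the regressor support the same code exhibits the well-known bias: for a linear regression function on [0, 1] the kernel window at 0 sees only points to the right, which pulls the estimate away from the true value.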

To construct the local linear estimator (LLE) one considers the localized least squares functional of the form
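Minimizing the localized least squares functional in the intercept and slope, and reporting the fitted intercept at the point of interest, yields the LLE. A minimal sketch under the same illustrative assumptions (Gaussian kernel, hypothetical names), solving the weighted normal equations in closed form:

```python
import numpy as np

def local_linear(x, y, x0, h, kernel=lambda u: np.exp(-0.5 * u**2)):
    """Local linear estimate of the regression function at x0:
    minimize sum_i K((x_i - x0)/h) * (y_i - a - b*(x_i - x0))^2
    over (a, b) and return the intercept a. A sketch only."""
    w = kernel((x - x0) / h)
    X = np.column_stack([np.ones_like(x), x - x0])  # design: [1, x - x0]
    A = X.T @ (w[:, None] * X)   # weighted normal equations (X' W X) beta = X' W y
    b = X.T @ (w * y)
    beta = np.linalg.solve(A, b)
    return beta[0]               # intercept = estimate of m(x0)
```

Because the local model contains a slope term, an exactly linear regression function is reproduced without bias even at the boundary of the regressor support, which is the source of the boundary-effect advantage over the NWE.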

To modify the estimators for MVC data we will need the minimax coefficients for the estimation of the component distributions defined in [

Let

The averaging operation will be denoted by angle brackets:

In what follows we assume that

Here and below we assume that

In [

To derive a modified local linear estimator (mLLE) we start with the weighted least squares functional
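In this weighted functional each observation enters with an additional per-observation weight alongside the kernel localization weight; in the paper these extra weights are built from the minimax coefficients of the component of interest. The sketch below takes the weight vector `a` as given and does not reproduce its construction; all names are hypothetical, and, unlike kernel weights, the entries of `a` may be negative:

```python
import numpy as np

def mlle(x, y, a, x0, h, kernel=lambda u: np.exp(-0.5 * u**2)):
    """Modified local linear estimate for mixture data: observation j
    enters the localized least-squares functional with the extra weight
    a[j] (in the paper, derived from the minimax coefficients of the
    component of interest). A sketch with an illustrative Gaussian kernel."""
    w = a * kernel((x - x0) / h)     # mixture weight times localization weight
    X = np.column_stack([np.ones_like(x), x - x0])
    A = X.T @ (w[:, None] * X)       # weighted normal equations
    b = X.T @ (w * y)
    return np.linalg.solve(A, b)[0]  # intercept = estimate at x0
```

With all weights equal to one this reduces to the ordinary LLE, which is a convenient sanity check for an implementation.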

The consistency conditions for the weighted local linear estimator are given in the following theorem.

Assumption 2 of the theorem is necessary. It is also required for the LLR estimator in the case of a homogeneous sample; see Theorem 1 of [

If the kernel

If the kernel

The requirement that

Condition

Multiplying both the numerator and denominator in (

We will start from

Let us bound the variance of

By (

So

Combining this with (

Consider now asymptotics of

The substitution (

Then

This with (

To assess the quality of the mLLE and compare it to the mNWE, we performed a small simulation study. In all simulation experiments we used a two-component MVC model with concentrations defined as follows:

Performance of the estimators was examined at two points

For sample sizes

Three experiments were performed.

For all experiments, the numerical results are shown only for the first mixture component, since the biases and variances of the estimators for the second component are of nearly the same magnitude and thus of little additional interest.

The results of Experiment 1 for mLLE are presented in Table

Experiment 1 results on the mLLE

n | Bias | Var | Bias | Var
100 | −0.14934 | 9.78188 | 0.0783 | 0.05874
250 | −0.04822 | 0.15793 | 0.0709 | 0.02123
500 | −0.03528 | 0.09786 | 0.05721 | 0.01149
1000 | −0.01801 | 0.04796 | 0.04349 | 0.00697
2500 | −0.01372 | 0.0254 | 0.02951 | 0.00329
5000 | −0.01285 | 0.01394 | 0.02405 | 0.00193
10000 | −0.00887 | 0.00828 | 0.01492 | 0.0011

In this experiment the bias of mLLE at the boundary point

The results of Experiment 2, shown in Table

Experiment 2 results on the mLLE

n | Bias | Var | Bias | Var
100 | −0.07368 | 0.50335 | 0.06813 | 0.05452
250 | −0.0472 | 0.19456 | 0.06015 | 0.02402
500 | −0.02049 | 0.09438 | 0.0523 | 0.01202
1000 | −0.02602 | 0.05402 | 0.03873 | 0.00683
2500 | −0.01697 | 0.02364 | 0.03016 | 0.00322
5000 | −0.00879 | 0.01374 | 0.02307 | 0.00171
10000 | −0.01294 | 0.00728 | 0.01571 | 0.00112

These results show that heavy-tailed regression errors did not significantly affect the performance of the mLLE. The bias at the boundary point remains of the same magnitude as at the interior point of the regressor support.

The results of Experiment 3 are shown in Table

Experiment 3 results on the mLLE

n | Bias | Var | Bias | Var
100 | −0.0646 | 0.47884 | 0.06129 | 0.05877
250 | −0.06522 | 0.18535 | 0.06101 | 0.02189
500 | −0.0309 | 0.09735 | 0.05424 | 0.0113
1000 | −0.03353 | 0.05129 | 0.04028 | 0.00645
2500 | −0.02261 | 0.02417 | 0.02939 | 0.00298
5000 | −0.01109 | 0.01384 | 0.02022 | 0.00187
10000 | −0.01169 | 0.00796 | 0.01528 | 0.00099

In this experiment we observe the same pattern of decreasing mLLE biases and variances as in Experiments 1 and 2. Thus the difference in the error distributions had no significant effect on the estimator.

To compare the performance of the mLLE and the mNWE, we calculated the ratios of the biases of these two estimators

The results of the experiments for

Ratios of biases (left panel) and variances (right panel) of mLLE and mNWE at

The same ratios for

Ratios of biases (left panel) and variances (right panel) of mLLE and mNWE at

The left panel of Figure

As an anonymous referee noted, it can be of interest to compare the mean squared errors of the estimators, i.e.

In all the experiments the ratio

We have discussed a modification of the local linear regression technique for nonparametric estimation in a regression mixture model. Consistency of the obtained estimator has been demonstrated. Simulation results confirm a significant reduction of the boundary effect when the mLLE is used. Much effort is still needed to develop a practical algorithm for bandwidth selection, especially at boundary points of the support of the regressor distribution.

We are grateful to the anonymous referees for their interest in our paper and their fruitful comments.