A general jackknife estimator for the asymptotic covariance of moment estimators is considered in the case when the sample is taken from a mixture with varying concentrations of components. Consistency of the estimator is demonstrated, and a fast algorithm for its calculation is described. The estimator is applied to the construction of confidence sets for regression parameters in linear regression with errors in variables. An application to sociological data analysis is considered.

Finite Mixture Models (FMM) are widely used in the analysis of biological, economic and sociological data. For a comprehensive survey of different statistical techniques based on FMMs, see [

In this paper we consider the application of the jackknife technique to the estimation of the asymptotic covariance matrix (the covariance matrix of an asymptotically normal estimator, ACM) in the case when the data are described by the MVC model. The jackknife is a well-known resampling technique usually applied to i.i.d. samples (see Section 5.5.2 in [

We obtain a general theorem on the consistency of jackknife ACM estimators for moment estimators in MVC models and apply this result to construct confidence sets for regression coefficients in linear errors-in-variables models for MVC data. On general errors-in-variables models, see [

The rest of the paper is organized as follows. In Section

We consider a dataset in which each observed subject

We observe variables of

So, the distribution of

To estimate

To obtain unbiased estimates in (

In this notation the unbiasedness condition (

There can be many choices of
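To make the construction concrete, the following Python fragment is a minimal sketch of weighted moment estimation in the MVC setting. It is our own illustration, not the paper's notation: the function names and the use of the Gram matrix of the concentration vectors to build minimax-type weights are assumptions.

```python
import numpy as np

def minimax_weights(P):
    """Minimax-type weights a[j, k] built from the concentration
    matrix P (n x M), where P[j, m] is the concentration of component
    m for subject j.  By construction a.T @ P / n = I, which is the
    unbiasedness condition for the weighted moment estimates."""
    n = P.shape[0]
    gram = P.T @ P / n                 # M x M Gram matrix of concentrations
    return P @ np.linalg.inv(gram)

def moment_estimate(xi, P, g):
    """Weighted estimate of E g(xi) for each of the M components."""
    return minimax_weights(P).T @ g(xi) / len(xi)
```

Because `a.T @ P / n` equals the identity matrix, each component-wise estimate is unbiased whenever the concentrations are known exactly.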

In fact, in [

To describe the asymptotic behavior of

Notice that

So, under this assumption,

This theorem is a simple corollary of Theorem 4.3 in [

In what follows, we will consider unknown parameters of the component distribution

This theorem is a simple implication of our Theorem

So,

The jackknife is one of the most general techniques of ACM estimation. Let

For

Direct calculation of

Notice that

Let us denote
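For illustration, a naive leave-one-out version of the jackknife ACM estimator can be sketched as follows. The interface is hypothetical, and the scaling follows the classical jackknife convention; the fast algorithm described in this section avoids the n full re-estimations that make this naive version costly.

```python
import numpy as np

def jackknife_acm(xi, P, estimator):
    """Naive leave-one-out jackknife estimate of the ACM of
    sqrt(n) * (theta_hat - theta).  `estimator(xi, P)` is assumed to
    return a parameter vector computed from the sample and the
    concentration matrix."""
    n = len(xi)
    loo = np.array([estimator(np.delete(xi, l), np.delete(P, l, axis=0))
                    for l in range(n)])
    devs = loo - loo.mean(axis=0)
    # classical jackknife scaling: (n - 1) times the sum of outer
    # products estimates the ACM of the sqrt(n)-normalized estimator
    return (n - 1) * devs.T @ devs
```

For the sample mean of an i.i.d. sample this reduces exactly to the sample variance, which is the classical sanity check for jackknife variance estimation.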

In this section we consider a mixture of simple linear regressions with errors in variables. A modification of the orthogonal regression estimation technique was proposed for estimation of the regression coefficients in [

Recall the errors-in-variables regression model in the context of a mixture with varying concentrations.

We consider the case when each subject

The true values of

As in Section

In the case of a homogeneous sample, when there is no mixture, the classical way to estimate
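For reference, the classical orthogonal regression fit for a homogeneous sample has a closed form. The sketch below uses the standard total-least-squares slope formula, assuming equal error variances in both variables and a nonzero sample covariance:

```python
import numpy as np

def orthogonal_regression(x, y):
    """Classical orthogonal (total least squares) fit of y = b0 + b1*x,
    assuming equal error variances in x and y and s_xy != 0."""
    sxx = np.var(x)
    syy = np.var(y)
    sxy = np.cov(x, y, bias=True)[0, 1]
    # slope minimizing the sum of squared orthogonal distances
    b1 = (syy - sxx + np.hypot(syy - sxx, 2.0 * sxy)) / (2.0 * sxy)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1
```

Unlike ordinary least squares, this estimator is not attenuated by measurement error in x when the two error variances are equal.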

The ACM

Notice that the estimator

So we can apply the technique developed in Sections

This theorem is a simple combination of Theorem

In what follows we assume that

We can construct a confidence set (ellipsoid) for the unknown parameter
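A minimal sketch of the coverage check for such an ellipsoid is given below. It is our own illustration and assumes a two-dimensional parameter, for which the chi-square quantile has the closed form -2*log(alpha):

```python
import numpy as np

def ellipsoid_covers(b_hat, acm_hat, n, b, alpha=0.05):
    """Check whether the asymptotic confidence ellipsoid
    { b : n * (b_hat - b)' V^{-1} (b_hat - b) <= chi2_{2, 1-alpha} }
    covers the point b, where V is the estimated ACM.  For 2 degrees
    of freedom the chi-square quantile equals -2 * log(alpha)."""
    d = np.asarray(b_hat) - np.asarray(b)
    stat = n * d @ np.linalg.solve(acm_hat, d)
    return stat <= -2.0 * np.log(alpha)
```

In a simulation, calling this with the true parameter in place of `b` and counting hits yields the covering frequencies reported below.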

Consider

To assess the performance of the proposed technique, we performed a small simulation study. In the following three experiments we calculated covering frequencies of confidence sets for regression coefficients in the model (

In all experiments for sample size

Then the number of cases in which the confidence set covered the true value of the estimated parameter was calculated and divided by
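The covering-frequency computation can be illustrated schematically. The sketch below replaces the paper's regression setting with a scalar mean and a 95% normal-approximation interval, purely to show the Monte Carlo loop:

```python
import numpy as np

def covering_frequency(B=2000, n=200, mu=1.0, seed=1):
    """Schematic Monte Carlo estimate of a covering frequency: in each
    replication, build a 95% asymptotic confidence interval for a
    scalar mean and check whether it covers the true value mu."""
    rng = np.random.default_rng(seed)
    z = 1.96                       # two-sided 95% normal quantile
    hits = 0
    for _ in range(B):
        x = rng.normal(mu, 1.0, size=n)
        half = z * x.std(ddof=1) / np.sqrt(n)
        hits += abs(x.mean() - mu) <= half
    return hits / B
```

With a nominal level of 0.95, the resulting frequency should fluctuate around 0.95 within binomial Monte Carlo error, as in the tables below.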

In all the experiments we considered a two-component mixture (

In this experiment we let

The covering frequencies for confidence sets are presented in Table

In this experiment we enlarged the variance of the error terms taking it as

Here we consider the case when the error distributions are heavy-tailed. We generate the data with

Covering frequencies for confidence sets in Experiment 1

100 | 0.935 | 0.961 | 0.948 | 0.936 | 0.987 | 0.957 |
250 | 0.953 | 0.960 | 0.950 | 0.964 | 0.980 | 0.950 |
500 | 0.940 | 0.954 | 0.939 | 0.958 | 0.973 | 0.962 |
1000 | 0.946 | 0.949 | 0.943 | 0.954 | 0.971 | 0.935 |
2500 | 0.961 | 0.949 | 0.948 | 0.937 | 0.953 | 0.947 |
5000 | 0.947 | 0.949 | 0.948 | 0.954 | 0.956 | 0.958 |

Covering frequencies for confidence sets in Experiment 2

100 | 0.969 | 0.942 | 0.918 | 0.950 | 0.974 | 0.958 |
250 | 0.958 | 0.956 | 0.945 | 0.946 | 0.962 | 0.959 |
500 | 0.949 | 0.945 | 0.936 | 0.953 | 0.966 | 0.960 |
1000 | 0.959 | 0.946 | 0.954 | 0.947 | 0.958 | 0.942 |
2500 | 0.956 | 0.949 | 0.950 | 0.947 | 0.961 | 0.958 |
5000 | 0.953 | 0.941 | 0.952 | 0.955 | 0.955 | 0.968 |

Covering frequencies for confidence sets in Experiment 3

100 | 0.935 | 0.961 | 0.948 | 0.936 | 0.987 | 0.957 |
250 | 0.953 | 0.960 | 0.950 | 0.964 | 0.980 | 0.950 |
500 | 0.940 | 0.954 | 0.939 | 0.958 | 0.973 | 0.962 |
1000 | 0.946 | 0.949 | 0.943 | 0.954 | 0.971 | 0.935 |
2500 | 0.961 | 0.949 | 0.948 | 0.937 | 0.953 | 0.947 |
5000 | 0.947 | 0.949 | 0.948 | 0.954 | 0.956 | 0.958 |

We would like to demonstrate the advantages of the proposed technique by applying it to the analysis of the External Independent Testing (EIT) data (see [

In this presentation we consider only the data on scores on two subjects:

Our aim is to investigate how the dependence between Ukr and Math scores differs for examinees who grew up in different environments. There can be, e.g., an environment of adherents of Ukrainian culture and the Ukrainian state, or an environment of persons critical toward Ukrainian independence. EIT-2016 does not contain information on such issues, so we use data on Ukrainian Parliament (Verhovna Rada) election results to deduce approximate proportions of adherents of different political choices in different regions of Ukraine.

We divided adherents of the 29 parties and blocs that took part in the elections into three large groups, which are the components of our mixture:

Pro-Ukrainian persons, who voted for the parties that then created the ruling coalition (BPP, Batkivschyna, Narodny Front, Radicals and Samopomich);

Contra-Ukrainian persons, who voted for the Opposition block, voted against all, or voted for small parties that were below the 5% threshold in these elections;

Neutral persons, who did not take part in the voting.

Combining these data with EIT-2016 we obtain the sample

In [

So, in this presentation, we assumed that the data are described by the model (

Confidence sets for coefficients of regression between Math and Ukr

Pro | Contra | Neutral |
low | upp | low | upp | low | upp |
40.12 | 40.22 | 236.3 | 240.1 | 84.21 | 87.19 |
0.8562 | -0.366 | -0.345 | -2.80 | 0.335 | 0.359 |

In this model we calculated the confidence intervals of the level

The orthogonal regression lines corresponding to different components are presented in Fig

Estimated orthogonal regression lines for EIT-2016 data

These results have a simple and plausible explanation. In the Pro-component, success in Ukr correlates positively with general school success, and hence with Math scores too; this is natural for persons interested in Ukrainian culture and literature. In the Contra-component the correlation is negative. Why? Persons with high Math grades in this component do not feel the need to learn Ukrainian, while persons with less success in Math try to improve their average score (by which admission to universities is decided) by increasing their Ukr score. The Neutral component shows a positive correlation between Math and Ukr, but it is smaller than in the Pro-component.

Surely, these explanations are too simple to be absolutely correct. We consider them only as examples of hypotheses which can be deduced from the data by the proposed technique.

To demonstrate Theorem

By definition,

Then by (

Let

Let

By the Chebyshev inequality we obtain that for some

The lemma is proved. □

Let

We start from (

Let us show (

Combining these estimates with (

Combining (

We introduced a modification of the jackknife technique for ACM estimation for moment estimators based on observations from mixtures with varying concentrations. A fast algorithm implementing this technique is proposed. Consistency of the derived estimator is demonstrated. Simulation results show its practical applicability for sample sizes