Principal Component Analysis (PCA) is a classical dimension reduction technique for multivariate data. When the data are a mixture of subjects from different subpopulations, one may be interested in PCA of some (or each) subpopulation separately. In this paper estimators are considered for the PC directions and the corresponding eigenvalues of subpopulations in the nonparametric model of mixture with varying concentrations. Consistency and asymptotic normality of the obtained estimators are proved. These results allow one to construct confidence sets for the PC model parameters. The performance of such confidence intervals for the leading eigenvalues is investigated via simulations.
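For a single (sub)population, the PC directions and eigenvalues referred to above come from the eigendecomposition of the sample covariance matrix. A minimal sketch on synthetic data (the data-generating choices below are illustrative and not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic data: three coordinates with standard deviations 3, 1, 0.5
X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.5])

Xc = X - X.mean(axis=0)                # center the data
S = Xc.T @ Xc / (len(X) - 1)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]      # reorder so the leading eigenvalue comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
# columns of eigvecs are the PC directions; eigvals are the explained variances
```

In the mixture setting of the paper, the pooled sample covariance is replaced by weighted covariance estimators for each subpopulation, but the eigendecomposition step is the same.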
We consider a mixture with varying concentrations in which each component is described by a nonlinear regression model. A modified least squares estimator is used to estimate the regression parameters. Asymptotic normality of the derived estimators is demonstrated. This result is applied to the construction of confidence sets. The performance of the confidence sets is assessed by simulations.
A multivariate errors-in-variables (EIV) model with an intercept term, and a polynomial EIV model, are considered. The focus is on the structural homoskedastic case, where the vectors of covariates are i.i.d. and the measurement errors are i.i.d. as well. The covariates contaminated with errors are normally distributed, and the corresponding classical errors are also assumed normal. In both models, it is shown that the (inconsistent) ordinary least squares estimators of the regression parameters yield an a.s. approximation to the best prediction of the response given the values of the observable covariates. Thus, not only in the linear EIV model but also in the polynomial EIV model, consistent estimators of the regression parameters are useless in the prediction problem, provided the size and covariance structure of the observation errors for the predicted subject do not differ from those in the data used for the model fitting.
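This claim can be checked in a small simulation for the linear case: with a normal covariate and normal classical errors, the attenuated OLS slope predicts the response from the observed covariate better than the true slope does. A hedged sketch with illustrative parameter values (unit variances throughout, true slope 3):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
beta = 3.0                                   # true regression slope

def sample(n):
    xi = rng.normal(size=n)                  # latent true covariate
    x = xi + rng.normal(size=n)              # observed covariate with classical error
    y = beta * xi + rng.normal(size=n)       # response
    return x, y

x, y = sample(n)
# OLS is attenuated: it converges to beta * var(xi) / (var(xi) + var(error)) = 1.5
b_ols = np.sum(x * y) / np.sum(x * x)

# new subjects with the same error structure as the fitting data
x_new, y_new = sample(n)
mse_ols = np.mean((y_new - b_ols * x_new) ** 2)   # prediction with the OLS slope
mse_true = np.mean((y_new - beta * x_new) ** 2)   # prediction with the true slope
```

Here mse_ols ≈ 5.5 while mse_true ≈ 10, matching the abstract's point: under the same error structure, the inconsistent OLS fit is the right tool for prediction.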
A general jackknife estimator for the asymptotic covariance of moment estimators is considered in the case when the sample is taken from a mixture with varying concentrations of components. Consistency of the estimator is demonstrated, and a fast algorithm for its calculation is described. The estimator is applied to the construction of confidence sets for regression parameters in linear regression with errors in variables. An application to sociological data analysis is considered.
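For a plain i.i.d. sample and the sample mean as the moment estimator, the leave-one-out jackknife variance can already be computed in O(n) via the sum trick; a minimal sketch of this building block (the mixture-weighted estimator in the paper generalizes the idea):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200)     # illustrative i.i.d. sample

n = len(x)
theta_hat = x.mean()                         # moment estimator: the sample mean
# leave-one-out replicates computed in O(n): removing x_i from the sum
loo = (x.sum() - x) / (n - 1)
theta_bar = loo.mean()
var_jack = (n - 1) / n * np.sum((loo - theta_bar) ** 2)
# var_jack estimates the variance of theta_hat; for the sample mean it
# coincides exactly with s^2 / n
```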
Confidence ellipsoids for linear regression coefficients are constructed from observations drawn from a mixture with varying concentrations. Two approaches are discussed. The first is a nonparametric approach based on the weighted least squares technique. The second is approximate maximum likelihood estimation, with the EM algorithm used to compute the estimates.
This paper deals with a homoskedastic errors-in-variables linear regression model and properties of the total least squares (TLS) estimator. We partly revise the consistency results for the TLS estimator previously obtained by the author [18] and present complete and comprehensive proofs of the consistency theorems. A theoretical foundation for the construction of the TLS estimator and its relation to the generalized eigenvalue problem are explained; in particular, the uniqueness of the estimate is proved. The Frobenius norm in the definition of the estimator can be replaced by the spectral norm, or by any other unitarily invariant norm, and the consistency results remain valid.
We consider a multivariable functional errors-in-variables model $AX\approx B$, where the data matrices $A$ and $B$ are observed with errors and a matrix parameter $X$ is to be estimated. A goodness-of-fit test based on the total least squares estimator is constructed. The proposed test statistic is asymptotically chi-squared under the null hypothesis. The power of the test under local alternatives is discussed.
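Under standard identifiability conditions, the TLS estimate of $X$ is obtained from the right singular vectors of the compound matrix $[A\ B]$: partitioning $V$ conformably, $\widehat X = -V_{12}V_{22}^{-1}$. A minimal sketch on synthetic data (dimensions and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, d = 200, 3, 2                     # n observations, m columns of A, d of B
A0 = rng.normal(size=(n, m))
X_true = rng.normal(size=(m, d))
B0 = A0 @ X_true                        # exact relation A0 @ X = B0
A = A0 + 0.01 * rng.normal(size=A0.shape)    # errors in A
B = B0 + 0.01 * rng.normal(size=B0.shape)    # errors in B

# TLS: SVD of the compound matrix [A B]; partition V conformably
_, _, Vt = np.linalg.svd(np.hstack([A, B]), full_matrices=False)
V = Vt.T
V12, V22 = V[:m, m:], V[m:, m:]
X_tls = -V12 @ np.linalg.inv(V22)       # classical TLS solution
```

The formula is invariant to the sign ambiguity of singular vectors, since any sign change multiplies $V_{12}$ and $V_{22}$ by the same factor.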
We consider the two-line fitting problem. True points lie on two straight lines and are observed with Gaussian perturbations. For each observed point, it is not known on which line the corresponding true point lies. The parameters of the lines are estimated.
This model is a restriction of the conic section fitting model, because a pair of lines is a degenerate conic section. The following estimators are constructed: two projections of the adjusted least squares estimator in the conic section fitting model, the orthogonal regression estimator, the parametric maximum likelihood estimator in the Gaussian model, and a regular best asymptotically normal moment estimator.
The conditions for the consistency and asymptotic normality of the projections of the adjusted least squares estimator are provided. All the estimators constructed in the paper are equivariant. The estimators are compared numerically.
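The orthogonal regression building block for a single line, which minimizes perpendicular distances via the smallest right singular vector of the centered data, can be sketched as follows; the two-line estimators in the paper are more involved, and the numbers below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
t = rng.uniform(-5, 5, size=300)
pts = np.column_stack([t, 2.0 * t + 1.0])    # true line y = 2x + 1
pts += 0.05 * rng.normal(size=pts.shape)     # Gaussian perturbations

c = pts.mean(axis=0)                         # centroid lies on the fitted line
_, _, Vt = np.linalg.svd(pts - c)            # SVD of the centered data
normal = Vt[-1]                              # unit normal (a, b): smallest singular vector
# fitted line: a*(x - cx) + b*(y - cy) = 0, i.e. slope = -a/b
slope = -normal[0] / normal[1]
intercept = c[1] - slope * c[0]
```

Unlike ordinary least squares, this fit is equivariant under rotations of the plane, the property emphasized in the abstract.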
We consider a finite mixture model with varying mixing probabilities. Linear regression models are assumed for the observed variables, with coefficients depending on the mixture component to which the observed subject belongs. A modification of the least squares estimator is proposed for estimating the regression coefficients. Consistency and asymptotic normality of the estimates are demonstrated.
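A modified estimator of this general flavor can be sketched as follows, assuming the mixing probabilities are known. The weights follow the weighted-empirical-distribution idea for mixtures with varying concentrations and are not necessarily the paper's exact construction; all numeric choices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
# known mixing probabilities (concentrations) of two components for each subject
p1 = rng.uniform(0.1, 0.9, size=n)
P = np.column_stack([p1, 1.0 - p1])
labels = (rng.uniform(size=n) < P[:, 1]).astype(int)   # latent component label
x = rng.normal(size=n)
beta = np.array([1.0, -2.0])                  # component-specific slopes
y = beta[labels] * x + 0.1 * rng.normal(size=n)

# weights a^(k) chosen so that (1/n) * sum_j a_j^(k) p_j^(m) = 1{k == m},
# which makes each weighted average target one component's distribution
G = P.T @ P / n
A = P @ np.linalg.inv(G)                      # column k: weights for component k

# weighted least squares slope for each component
beta_hat = np.array([np.sum(A[:, k] * x * y) / np.sum(A[:, k] * x * x)
                     for k in range(2)])
```

Although no component labels are used, each weighted estimate converges to its own component's slope, since the cross-component terms are cancelled by the weight normalization.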