1 Introduction
We deal with errors-in-variables (EIV) models which are widely used in system identification [10], epidemiology [2], econometrics [12], etc. In such regression models (with unknown parameter β), the response variable y depends on the covariates z and ξ, where z is observed precisely and ξ is observed with error. We consider the classical measurement error δ, i.e., instead of ξ the surrogate data $x=\xi +\delta $ is observed; moreover, the model is structural, i.e. z, ξ and δ are mutually independent, and we have i.i.d. copies of the model (${z_{i}}$, ${\xi _{i}}$, ${\delta _{i}}$, ${x_{i}}={\xi _{i}}+{\delta _{i}}$, ${y_{i}}$), $i=1,\dots ,n$. The measurement error can be nondifferential, when the distribution of y given $\left(\xi ,z,x\right)$ depends only on $\left(\xi ,z\right)$, and differential, otherwise [2, Section 2.5].
The present paper is devoted to the prediction of the response variable from ξ and z. Based on the observations (${y_{i}}$, ${z_{i}}$, ${x_{i}}$), $i=1,\dots ,n$, and given new values ${z_{0}}$ and ${x_{0}}$ of z and x variables, we want to predict either the new ${y_{0}}$ (this procedure is called individual prediction) or the exact relation ${\eta _{0}}=\operatorname{\mathbf{E}}\left[\left.{y_{0}}\right|{z_{0}},{\xi _{0}}\right]$, where ${\xi _{0}}$ is a new value for ξ (this procedure is called mean prediction). Both prediction problems are important in econometrics [5]. The individual prediction is used in the Leave-one-out cross-validation procedure.
The best mean squared error individual predictor is
(1)
\[ {\hat{y}_{0}}:=\operatorname{\mathbf{E}}\left[\left.{y_{0}}\right|{z_{0}},{x_{0}}\right],\]
and the best mean squared error predictor of ${\eta _{0}}$ is
(2)
\[ {\hat{\eta }_{0}}:=\operatorname{\mathbf{E}}\left[\left.{\eta _{0}}\right|{z_{0}},{x_{0}}\right].\]
For the nondifferential measurement error, $\operatorname{\mathbf{E}}\left[\left.{y_{0}}\right|{z_{0}},{\xi _{0}},{x_{0}}\right]=\operatorname{\mathbf{E}}\left[\left.{y_{0}}\right|{z_{0}},{\xi _{0}}\right]={\eta _{0}}$, hence ${\hat{y}_{0}}=\operatorname{\mathbf{E}}\left[\left.{\eta _{0}}\right|{z_{0}},{x_{0}}\right]={\hat{\eta }_{0}}$, and the best mean predictor coincides with the best individual predictor; this need not hold for the differential measurement error.
Both predictors (1) and (2) are infeasible, because they involve unknown model parameters. Our goal is to construct consistent estimators of these predictors as the sample size n grows.
The nonparametric individual prediction under errors in covariates is studied in [7]. Below we consider only parametric models.
For scalar linear EIV models with normally distributed ξ and δ, it is stated in [4, Section 2.5.1] that the ordinary least squares (OLS) predictor should be used even when dealing with the EIV model. This is quite surprising, since the OLS estimator of β is inconsistent due to the attenuation effect [4]. In fact, it is not surprising that in a Gaussian model the linear OLS estimator provides a consistent prediction, since in the Gaussian case conditional expectations are always linear functions of the conditioning variables. In the present paper, we consider a non-Gaussian regression model, since the distribution of the observable covariate z is not assumed Gaussian; therefore, the consistency of OLS predictions in such a model is a nontrivial feature.
We confirm the assertion that the OLS estimator yields a suitable prediction under the model validity for two kinds of EIV models: multivariate linear and polynomial. For this purpose, we just follow the recommendation of [4, Section 2.6] and analyze the regression of y on the observable z and x. In other nonlinear EIV models, the OLS predictor (obtained from the initial regression of y on $\left(z,\xi \right)$ by naively substituting x for ξ) is inconsistent; instead, the least squares predictor based on the regression of y on $\left(z,x\right)$ can be used.
The paper is organized as follows. In Sections 2 and 3, we state the results on prediction in multivariate linear and polynomial EIV models, respectively. Section 4 briefly studies some other nonlinear EIV models, and Section 5 concludes.
Throughout the paper, all vectors are column vectors, $\operatorname{\mathbf{E}}$ stands for the expectation and applies to the whole product that follows it, and $\operatorname{\mathbf{Cov}}\left(x\right)$ denotes the covariance matrix of a random vector x. By ${I_{p}}$ we denote the identity matrix of size p. For symmetric matrices A and B of the same size, $A>B$ and $A\ge B$ mean that $A-B$ is positive definite or positive semidefinite, respectively.
2 Prediction in a multivariate linear EIV model
2.1 Model and main assumptions
Consider a multivariate linear EIV model with the intercept term (structural case):
(3)
\[ y=b+{C^{T}}z+{B^{T}}\xi +e+\epsilon ,\]
(4)
\[ x=\xi +\delta .\]
Here the random vector y is the response variable distributed in ${\mathbb{R}^{d}}$; the random vector z is the observable covariate distributed in ${\mathbb{R}^{q}}$, the random vector ξ is the unobservable (latent) covariate distributed in ${\mathbb{R}^{m}}$; x is the surrogate data observed instead of ξ; $e+\epsilon $ is the random error in y, δ is the measurement error in the latent covariate; $C\in {\mathbb{R}^{q\times d}}$, $B\in {\mathbb{R}^{m\times d}}$ and $b\in {\mathbb{R}^{d}}$ contain unknown regression parameters, where b is the intercept term. The random vector e models the error in the regression equation, and ϵ models the measurement error in y; ϵ can be correlated with δ.
Such models are studied, e.g., in [11, 10, 9] in relation to system identification problems and numerical linear algebra. We list the model assumptions.
-
(i) The vectors z, ξ, e and the augmented measurement error vector ${\left({\epsilon ^{T}},{\delta ^{T}}\right)^{T}}$ are mutually independent and have finite 2nd moments; the errors ϵ and δ can be correlated.
-
(ii) The covariance matrices ${\Sigma _{z}}:=\operatorname{\mathbf{Cov}}(z)$ and ${\Sigma _{x}}:=\operatorname{\mathbf{Cov}}(x)$ are nonsingular.
-
(iii) The errors e, ϵ and δ have zero means.
-
(iv) The errors ϵ, δ and covariate ξ are jointly Gaussian.
Introduce the cross-covariance matrix ${\Sigma _{\epsilon \delta }}:=\operatorname{\mathbf{E}}\epsilon {\delta ^{T}}$.
The classical measurement error δ is nondifferential if, and only if, ϵ and δ are independent, i.e. ${\Sigma _{\epsilon \delta }}=0$ (see Section 1 for the definition of the nondifferential error).
We denote also
(5)
\[\begin{array}{l}\displaystyle \mu =\operatorname{\mathbf{E}}x,\hspace{2em}{\Sigma _{\xi }}=\operatorname{\mathbf{Cov}}(\xi ),\hspace{2em}\\ {} \displaystyle {\Sigma _{e}}=\operatorname{\mathbf{Cov}}(e),\hspace{2em}{\Sigma _{\epsilon }}=\operatorname{\mathbf{Cov}}(\epsilon ),\hspace{2em}{\Sigma _{\delta }}=\operatorname{\mathbf{Cov}}(\delta ),\\ {} \displaystyle {\Sigma _{11}}=\text{block-diag}({\Sigma _{\xi }},{\Sigma _{\epsilon }}),\hspace{2em}{\Sigma _{12}}=\left[\substack{{\Sigma _{\xi }}\\ {} {\Sigma _{\epsilon \delta }}}\right],\hspace{2em}{\Sigma _{22}}={\Sigma _{x}}.\end{array}\]
Thus, ${\Sigma _{11}}$ is a block-diagonal matrix, and sometimes we will use ${\Sigma _{22}}$ for the covariance matrix of x.
2.2 Regression of y on z and x
Lemma 1.
Assume conditions (i) to (iv).
-
(a) Then the response variable (3) admits the representation
(6)
\[ y={b_{x}}+{C^{T}}z+{B_{x}^{T}}x+u,\]
where z, x and u are independent, $\operatorname{\mathbf{E}}u=0$, and ${b_{x}}\in {\mathbb{R}^{d}}$, ${B_{x}}\in {\mathbb{R}^{m\times d}}$ are transformed (nonrandom) regression parameters,
(7)
\[ {b_{x}}=b+\left({B^{T}}{\Sigma _{\delta }}-{\Sigma _{\epsilon \delta }}\right){\Sigma _{x}^{-1}}\mu ,\]
(8)
\[ {B_{x}}={\Sigma _{x}^{-1}}\left({\Sigma _{\xi }}B+{\Sigma _{\epsilon \delta }^{T}}\right).\]
-
(b) Assume additionally the following condition:
(v) either ${\Sigma _{e}}>0$, or the matrix ${V_{1|2}}$ given in (9) below is positive definite.
Then the error term u in (6) has a positive definite covariance matrix, ${\Sigma _{u}}$.
Proof.
(a) Introduce the jointly Gaussian vectors
\[ {x^{(1)}}:={\left({\xi ^{T}},{\epsilon ^{T}}\right)^{T}},\hspace{2em}{x^{(2)}}:=x.\]
We have
\[\begin{array}{l}\displaystyle {\mu ^{(1)}}:=\operatorname{\mathbf{E}}{x^{(1)}}=\left(\substack{\mu \\ {} 0}\right),\hspace{2em}{\mu ^{(2)}}:=\operatorname{\mathbf{E}}{x^{(2)}}=\mu ;\\ {} \displaystyle \operatorname{\mathbf{Cov}}\left({x^{(1)}}\right)={\Sigma _{11}},\hspace{2em}\operatorname{\mathbf{Cov}}\left({x^{(2)}}\right)={\Sigma _{22}},\end{array}\]
which is positive definite by assumption (ii), and $\operatorname{\mathbf{Cov}}\left({x^{(1)}},{x^{(2)}}\right)={\Sigma _{12}}$,
where the matrices ${\Sigma _{11}}$, ${\Sigma _{12}}$, ${\Sigma _{22}}$ are given in (5). Now, according to [1, Theorem 2.5.1], the conditional distribution of ${x^{(1)}}$ given ${x^{(2)}}$ is
(9)
\[\begin{array}{l}\displaystyle \left[\left.{x^{(1)}}\right|{x^{(2)}}\right]\sim \mathcal{N}\left({\mu _{1|2}},{V_{1|2}}\right),\\ {} \displaystyle {\mu _{1|2}}={\mu _{1|2}}({x^{(2)}})={\mu ^{(1)}}+{\Sigma _{12}}{\Sigma _{22}^{-1}}\left({x^{(2)}}-{\mu ^{(2)}}\right)=\left(\substack{{\Sigma _{\delta }}{\Sigma _{x}^{-1}}\mu +{\Sigma _{\xi }}{\Sigma _{x}^{-1}}x\\ {} {\Sigma _{\epsilon \delta }}{\Sigma _{x}^{-1}}(x-\mu )}\right),\\ {} \displaystyle {V_{1|2}}={\Sigma _{11}}-{\Sigma _{12}}{\Sigma _{22}^{-1}}{\Sigma _{12}^{T}}.\end{array}\]
Hence ${({\xi ^{T}},{\epsilon ^{T}})^{T}}-{\mu _{1|2}}(x)=:{({\gamma _{1}^{T}},{\gamma _{2}^{T}})^{T}}$ is uncorrelated with x and has the Gaussian distribution $\mathcal{N}\left(0,{V_{1|2}}\right)$; being jointly Gaussian with x, it is moreover independent of x. Therefore,
(10)
\[ \xi ={\Sigma _{\delta }}{\Sigma _{x}^{-1}}\mu +{\Sigma _{\xi }}{\Sigma _{x}^{-1}}x+{\gamma _{1}},\]
(11)
\[ \epsilon ={\Sigma _{\epsilon \delta }}{\Sigma _{x}^{-1}}(x-\mu )+{\gamma _{2}}.\]
Substitute (10) and (11) into (3) and obtain the desired relations (6)–(8) with $u:=e+{B^{T}}{\gamma _{1}}+{\gamma _{2}}$.
Here $(z,e,x)$ and the couple $({\gamma _{1}},{\gamma _{2}})$ are independent; hence z, x and u are independent as well. This implies statement (a).
(b) We have
(12)
\[ \operatorname{\mathbf{Cov}}(u)={\Sigma _{e}}+\operatorname{\mathbf{Cov}}\left({B^{T}}{\gamma _{1}}+{\gamma _{2}}\right)=:{\Sigma _{u}}.\]
If ${\Sigma _{e}}>0$ then ${\Sigma _{u}}\ge {\Sigma _{e}}>0$, thus ${\Sigma _{u}}>0$; and if ${V_{1|2}}>0$ then ${\Sigma _{u}}\ge \operatorname{\mathbf{Cov}}\left({B^{T}}{\gamma _{1}}+{\gamma _{2}}\right)>0$, thus ${\Sigma _{u}}>0$. This completes the proof of Lemma 1. □
As a particular case, take a model with a univariate response and a univariate regressor ξ.
Lemma 2.
Proof.
First suppose that ${\Sigma _{\xi }}>0$. According to Lemma 1, it is enough to check that ${V_{1|2}}$ given in (9) is positive definite.
A direct computation shows that
\[ {V_{1|2}}=\frac{1}{{\sigma _{x}^{2}}}\left(\begin{array}{c@{\hskip10.0pt}c}{\sigma _{\xi }^{2}}{\sigma _{\delta }^{2}}& -{\sigma _{\xi }^{2}}{\sigma _{\epsilon \delta }}\\ {} -{\sigma _{\xi }^{2}}{\sigma _{\epsilon \delta }}& {\sigma _{\epsilon }^{2}}{\sigma _{x}^{2}}-{\sigma _{\epsilon \delta }^{2}}\end{array}\right)=:\frac{V}{{\sigma _{x}^{2}}}.\]
Here in the scalar case we write ${\sigma _{\xi }^{2}}={\Sigma _{\xi }}$, ${\sigma _{\delta }^{2}}={\Sigma _{\delta }}$, ${\sigma _{\epsilon \delta }}={\Sigma _{\epsilon \delta }}$, etc. The matrix V is positive definite, because ${\sigma _{\xi }^{2}}{\sigma _{\delta }^{2}}>0$ and
\[ \det V={\sigma _{\xi }^{2}}{\sigma _{x}^{2}}\left({\sigma _{\epsilon }^{2}}{\sigma _{\delta }^{2}}-{\sigma _{\epsilon \delta }^{2}}\right)>0\]
due to condition (13).
Now, suppose that ${\Sigma _{\xi }}=0$. Then $\xi =\mu $ almost surely. With some computations, it can be shown that $u=e+\epsilon -{\sigma _{\epsilon \delta }}{\sigma _{\delta }^{-2}}\delta $ almost surely, whence ${\sigma _{u}^{2}}={\sigma _{e}^{2}}+{\sigma _{\epsilon }^{2}}-{\sigma _{\epsilon \delta }^{2}}{\sigma _{\delta }^{-2}}>0$. Lemma 2 is proved. □
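As a quick numerical illustration (not part of the paper), the following NumPy sketch evaluates the Gaussian conditioning formula (9) used in the proofs of Lemmas 1 and 2; the function name `gaussian_conditional` and the toy parameter values are our own.

```python
import numpy as np

def gaussian_conditional(mu1, mu2, S11, S12, S22, x2):
    """Conditional law of x^(1) given x^(2) = x2 for a jointly Gaussian vector,
    as in (9): mean mu1 + S12 S22^{-1} (x2 - mu2), covariance S11 - S12 S22^{-1} S12^T."""
    S22_inv = np.linalg.inv(S22)        # S22 = Sigma_x is nonsingular by assumption (ii)
    cond_mean = mu1 + S12 @ S22_inv @ (x2 - mu2)
    cond_cov = S11 - S12 @ S22_inv @ S12.T
    return cond_mean, cond_cov

# Toy scalar blocks as in Lemma 2: x^(1) = (xi, eps)^T, x^(2) = x
s_xi2, s_eps2, s_delta2, s_epsdelta, mu = 1.0, 0.5, 0.4, 0.1, 2.0
S11 = np.diag([s_xi2, s_eps2])
S12 = np.array([[s_xi2], [s_epsdelta]])
S22 = np.array([[s_xi2 + s_delta2]])    # Sigma_x = Sigma_xi + Sigma_delta
m12, V12 = gaussian_conditional(np.array([mu, 0.0]), np.array([mu]), S11, S12, S22, np.array([2.3]))
print(m12, V12)                         # V12 equals V_{1|2}, i.e. the matrix V of Lemma 2 divided by sigma_x^2
```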
2.3 Individual prediction
Now, consider independent copies of the multivariate model (3), (4):
\[ \left({y_{i}},{z_{i}},{\xi _{i}},{e_{i}},{\epsilon _{i}},{x_{i}},{\delta _{i}}\right),\hspace{1em}i=1,\dots ,n.\]
Based on the observations
(14)
\[ \left({y_{i}},{z_{i}},{x_{i}}\right),\hspace{1em}i=1,\dots ,n,\]
and for given ${z_{0}}$, ${x_{0}}$, we want to estimate the individual predictor ${\hat{y}_{0}}$ presented in (1) and the mean predictor ${\hat{\eta }_{0}}$ presented in (2).
Assume conditions (i) to (iv) and suppose that all model parameters are unknown. Lemma 1 implies the expansion (6) with $\operatorname{\mathbf{E}}u=0$. All the underlying random vectors have finite 2nd moments, hence
(15)
\[ {\hat{y}_{0}}={b_{x}}+{C^{T}}{z_{0}}+{B_{x}^{T}}{x_{0}}\]
is the best mean squared error predictor of ${y_{0}}$. Since it is infeasible, we have to estimate the coefficients ${b_{x}}$, C and ${B_{x}}$ using the sample (14). The OLS estimator $\left({\hat{b}_{x}},\hat{C},{\hat{B}_{x}}\right)$ minimizes the penalty function
\[ Q\left({b_{x}},C,{B_{x}}\right):={\sum \limits_{i=1}^{n}}{\left\| {y_{i}}-{b_{x}}-{C^{T}}{z_{i}}-{B_{x}^{T}}{x_{i}}\right\| ^{2}}.\]
Let bar denote the average over $i=1,\dots ,n$, e.g.,
\[ \bar{x}=\frac{1}{n}{\sum \limits_{i=1}^{n}}{x_{i}},\]
and ${S_{uv}}$ denote the sample covariance matrix of u and v variables, e.g.,
(16)
\[ {S_{xy}}=\frac{1}{n}{\sum \limits_{i=1}^{n}}\left({x_{i}}-\bar{x}\right){\left({y_{i}}-\bar{y}\right)^{T}},\hspace{2em}{S_{xx}}=\frac{1}{n}{\sum \limits_{i=1}^{n}}\left({x_{i}}-\bar{x}\right){\left({x_{i}}-\bar{x}\right)^{T}},\]
etc. The OLS estimator can be computed from the relations [11]
(17)
\[ \bar{y}={\hat{b}_{x}}+{\hat{C}^{T}}\bar{z}+{\hat{B}_{x}^{T}}\bar{x},\]
(18)
\[ \left(\substack{\hat{C}\\ {} {\hat{B}_{x}}}\right)={S_{rr}^{+}}{S_{ry}},\hspace{1em}r:={({z^{T}},{x^{T}})^{T}}.\]
Hereafter ${A^{+}}$ is the pseudo-inverse of a square matrix A; see the properties of ${A^{+}}$ in [8]. The corresponding OLS predictor is
(19)
\[ {\tilde{y}_{0}}:={\hat{b}_{x}}+{\hat{C}^{T}}{z_{0}}+{\hat{B}_{x}^{T}}{x_{0}}.\]
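For readers who wish to reproduce the computation, here is a minimal NumPy sketch (ours, not the authors' code) of the estimators (17)–(18) and the OLS predictor (19); the function name `ols_predict` and the array layout are illustrative assumptions.

```python
import numpy as np

def ols_predict(Y, Z, X, z0, x0):
    """OLS predictor (19) from the regression of y on (z, x), computed via (16)-(18).
    Y: (n, d), Z: (n, q), X: (n, m) data arrays; z0: (q,), x0: (m,) new covariate values."""
    n = Y.shape[0]
    R = np.hstack([Z, X])                              # r_i = (z_i^T, x_i^T)^T
    Rc, Yc = R - R.mean(axis=0), Y - Y.mean(axis=0)
    S_rr = Rc.T @ Rc / n                               # sample covariances, cf. (16)
    S_ry = Rc.T @ Yc / n
    coef = np.linalg.pinv(S_rr) @ S_ry                 # stacked (C_hat; B_x_hat), cf. (18)
    b_x_hat = Y.mean(axis=0) - R.mean(axis=0) @ coef   # intercept from (17)
    return b_x_hat + np.concatenate([z0, x0]) @ coef   # predictor (19)
```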
Theorem 1.
Proof.
By the strong law of large numbers, we have a.s. as $n\to \infty $:
\[\begin{array}{l}\displaystyle {S_{rr}}\to \text{block-diag}\left({\Sigma _{z}},{\Sigma _{x}}\right)>0,\\ {} \displaystyle {S_{ry}}\to \left(\substack{\operatorname{\mathbf{Cov}}\left(z,y\right)\\ {} \operatorname{\mathbf{Cov}}\left(x,y\right)}\right)=\left(\substack{{\Sigma _{z}}\cdot C\\ {} {\Sigma _{x}}\cdot {B_{x}}}\right),\\ {} \displaystyle \left(\substack{\hat{C}\\ {} {\hat{B}_{x}}}\right)\to \left(\substack{{\Sigma _{z}^{-1}}{\Sigma _{z}}\cdot C\\ {} {\Sigma _{x}^{-1}}{\Sigma _{x}}\cdot {B_{x}}}\right)=\left(\substack{C\\ {} {B_{x}}}\right).\end{array}\]
This convergence, relation (17) and the a.s. convergence of the sample means imply that ${\hat{b}_{x}}\to {b_{x}}\hspace{2.5pt}\text{a.s.}$ Now, both statements of Theorem 1 follow from (19) and (15). □
It is interesting to construct an asymptotic confidence region for the response ${y_{0}}$ based on the OLS predictor. Assume (i) to (iv). It holds
\[ \operatorname{\mathbf{Cov}}\left(\left.{y_{0}}-{\hat{y}_{0}}\right|{z_{0}},{x_{0}}\right)=\operatorname{\mathbf{Cov}}({u_{0}})={\Sigma _{u}},\]
see (12). Introduce the estimator
\[ {\hat{\Sigma }_{u}}=\frac{1}{n}{\sum \limits_{i=1}^{n}}\left({y_{i}}-{\hat{b}_{x}}-{\hat{C}^{T}}{z_{i}}-{\hat{B}_{x}^{T}}{x_{i}}\right){\left({y_{i}}-{\hat{b}_{x}}-{\hat{C}^{T}}{z_{i}}-{\hat{B}_{x}^{T}}{x_{i}}\right)^{T}}.\]
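As an illustration only, the following sketch computes the residual covariance estimator displayed above and checks membership in an ellipsoidal region of the form (23) below; it assumes that ${\chi _{d\alpha }^{2}}$ denotes the upper α-quantile of the chi-square law with d degrees of freedom, and the function names are our own.

```python
import numpy as np
from scipy.stats import chi2

def residual_cov(Y, Z, X, b_x_hat, C_hat, B_x_hat):
    """Residual covariance estimator: average of the outer products of the
    residuals y_i - b_x_hat - C_hat^T z_i - B_x_hat^T x_i."""
    U = Y - b_x_hat - Z @ C_hat - X @ B_x_hat    # C_hat: (q, d), B_x_hat: (m, d)
    return U.T @ U / Y.shape[0]

def in_ellipsoid(h, y0_pred, Sigma_u_hat, alpha):
    """Membership check for an ellipsoidal confidence region around the OLS
    predictor, cf. (23), with chi2_{d,alpha} the upper alpha-quantile."""
    diff = h - y0_pred
    quad = diff @ np.linalg.pinv(Sigma_u_hat) @ diff   # = ||(Sigma_u_hat^+)^{1/2} diff||^2
    return quad <= chi2.ppf(1 - alpha, df=len(h))
```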
Theorem 2.
Suppose that conditions (i) to (iv) hold. Fix the confidence probability $1-\alpha $.
-
(a) Assume additionally (v) and define Then
-
(b) Let the model (3)–(4) be purely normal, i.e. z is normally distributed and $e=0$. Assume additionally that the matrix ${V_{1|2}}$ given in (9) is nonsingular. Define
(23)
\[ {D_{\alpha }}=\left\{h\in {\mathbb{R}^{d}}:{\left\| {\left({\hat{\Sigma }_{u}^{+}}\right)^{1\hspace{-0.1667em}/2}}\left(h-{\tilde{y}_{0}}\right)\right\| ^{2}}\le {\chi _{d\alpha }^{2}}\right\},\]
Proof.
If ${b_{x}}$, C, and ${B_{x}}$ were known, then we could approximate ${\Sigma _{u}}$ as follows:
(25)
\[\begin{array}{l}\displaystyle \frac{1}{n}{\sum \limits_{i=1}^{n}}{u_{i}}{u_{i}^{T}}\to {\Sigma _{u}}\hspace{1em}\text{a.s. as}\hspace{2.5pt}n\to \infty ,\\ {} \displaystyle {u_{i}}:={y_{i}}-{b_{x}}-{C^{T}}{z_{i}}-{B_{x}^{T}}{x_{i}}.\end{array}\]
Since ${u_{i}}{u_{i}^{T}}$ is a quadratic function of the coefficients ${b_{x}}$, C, ${B_{x}}$, and the OLS estimators of those coefficients are strongly consistent, the convergence (25) remains valid if we replace all ${u_{i}}$ with the residuals ${\hat{u}_{i}}:={y_{i}}-{\hat{b}_{x}}-{\hat{C}^{T}}{z_{i}}-{\hat{B}_{x}^{T}}{x_{i}}$. Hence
(26)
\[ {\hat{\Sigma }_{u}}\to {\Sigma _{u}}\hspace{1em}\text{a.s. as}\hspace{2.5pt}n\to \infty .\]
(a) Under (v), ${\Sigma _{u}}$ is nonsingular by Lemma 1(b). It holds
\[ \operatorname{\mathbf{P}}\left(\left.{\left\| {\Sigma _{u}^{-1\hspace{-0.1667em}/2}}\left({y_{0}}-{\hat{y}_{0}}\right)\right\| ^{2}}>\frac{d}{\alpha }\right|{z_{0}},{x_{0}}\right)\le \alpha \frac{\operatorname{\mathbf{E}}{\left\| {\Sigma _{u}^{-1\hspace{-0.1667em}/2}}\cdot u\right\| ^{2}}}{d}=\alpha .\]
Since the relations (20) and (26) hold true, the relations (22), (21) follow.
(b) Again, in this purely normal model the matrix ${\Sigma _{u}}$ is nonsingular; conditional on ${z_{0}}$ and ${x_{0}}$, the difference ${y_{0}}-{\hat{y}_{0}}={u_{0}}$ has the normal distribution $\mathcal{N}\left(0,{\Sigma _{u}}\right)$. Then
\[ \operatorname{\mathbf{P}}\left(\left.{\left\| {\Sigma _{u}^{-1\hspace{-0.1667em}/2}}\left({y_{0}}-{\hat{y}_{0}}\right)\right\| ^{2}}>{\chi _{d\alpha }^{2}}\right|{z_{0}},{x_{0}}\right)=\alpha .\]
Since the relations (26) and (20) hold true, the relations (24), (23) follow. □
2.4 Mean prediction
Still consider the model (3), (4) under conditions (i) to (iv). We want to estimate the mean predictor ${\hat{\eta }_{0}}$ presented in (2). We have
\[\begin{array}{l}\displaystyle {\hat{\eta }_{0}}={\hat{y}_{0}}-\operatorname{\mathbf{E}}\left[\left.{e_{0}}\right|{z_{0}},{x_{0}}\right]-\operatorname{\mathbf{E}}\left[\left.{\epsilon _{0}}\right|{z_{0}},{x_{0}}\right],\\ {} \displaystyle \operatorname{\mathbf{E}}\left[\left.{e_{0}}\right|{z_{0}},{x_{0}}\right]=\operatorname{\mathbf{E}}{e_{0}}=0,\end{array}\]
and by (11),
\[ \operatorname{\mathbf{E}}\left[\left.{\epsilon _{0}}\right|{z_{0}},{x_{0}}\right]={\Sigma _{\epsilon \delta }}{\Sigma _{x}^{-1}}\left({x_{0}}-\mu \right).\]
Thus,
(27)
\[ {\hat{\eta }_{0}}={\hat{y}_{0}}-{\Sigma _{\epsilon \delta }}{\Sigma _{x}^{-1}}\left({x_{0}}-\mu \right).\]
Based on observations (14), strongly consistent and unbiased estimators of μ and ${\Sigma _{x}}$ are as follows:
(28)
\[ \hat{\mu }=\bar{x}=\frac{1}{n}{\sum \limits_{i=1}^{n}}{x_{i}},\]
(29)
\[ {\hat{\Sigma }_{x}}=\frac{1}{n-1}{\sum \limits_{i=1}^{n}}\left({x_{i}}-\bar{x}\right){\left({x_{i}}-\bar{x}\right)^{T}}.\]
Theorem 3.
Assume conditions (i) to (iv) and suppose that ${\Sigma _{\epsilon \delta }}$ is the only model parameter which is known. Consider the estimators (19), (28), and (29). Then
\[ {\tilde{\eta }_{0}}:={\tilde{y}_{0}}-{\Sigma _{\epsilon \delta }}{\hat{\Sigma }_{x}^{-1}}\left({x_{0}}-\hat{\mu }\right)\]
is a strongly consistent estimator of the mean predictor (2), and moreover
Notice that more model parameters should be known in order to construct a confidence region for ${\eta _{0}}$ around ${\tilde{\eta }_{0}}$.
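A minimal sketch of the mean predictor of Theorem 3 follows; it is ours, not the authors', and assumes that ${\Sigma _{\epsilon \delta }}$ is supplied by the user as a known matrix, as in the theorem.

```python
import numpy as np

def mean_predict(y0_pred, X, x0, Sigma_eps_delta):
    """Mean predictor of Theorem 3: eta0_tilde = y0_tilde - Sigma_{eps delta} Sigma_x_hat^{-1} (x0 - mu_hat),
    with mu_hat = x_bar as in (28) and the unbiased sample covariance Sigma_x_hat as in (29)."""
    n = X.shape[0]
    mu_hat = X.mean(axis=0)                       # (28)
    Xc = X - mu_hat
    Sigma_x_hat = Xc.T @ Xc / (n - 1)             # (29)
    return y0_pred - Sigma_eps_delta @ np.linalg.inv(Sigma_x_hat) @ (x0 - mu_hat)
```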
3 Prediction in a polynomial EIV model
3.1 Model and main assumptions
For a fixed and known $k\ge 2$, consider a polynomial EIV model (structural case):
(30)
\[ y={c^{T}}z+{\beta _{0}}+{\sum \limits_{j=1}^{k}}{\beta _{j}}{\xi ^{j}}+e+\epsilon ,\]
(31)
\[ x=\xi +\delta .\]
Here the random variable (r.v.) y is the response variable; the random vector z is the observable covariate distributed in ${\mathbb{R}^{q}}$, r.v. ξ is the unobservable covariate; x is the surrogate data observed instead of ξ; e is the random error in the equation, ϵ and δ are the measurement errors in the response and in the latent covariate; $c\in {\mathbb{R}^{q}}$, ${\beta _{0}}\in \mathbb{R}$ and $\beta ={\left({\beta _{1}},\dots ,{\beta _{k}}\right)^{T}}\in {\mathbb{R}^{k}}$ contain unknown regression parameters; ϵ and δ can be correlated.
Such models are studied, e.g., in [3, 6] and applied, for instance, in econometrics. Let us introduce the model assumptions.
-
(a) The random variables ξ, e and random vectors z, ${\left(\epsilon ,\delta \right)^{T}}$ are independent, with finite 2nd moments; the random variables ϵ and δ can be correlated.
-
(b) The covariance matrix ${\Sigma _{z}}:=\operatorname{\mathbf{Cov}}(z)$ is nonsingular, and ${\sigma _{x}^{2}}:=\operatorname{\mathbf{Var}}(x)>0$.
-
(c) The errors e, ϵ and δ have zero mean.
-
(d) The errors ϵ, δ and ξ are jointly Gaussian.
3.2 Regression of y on z and x
Let us denote
(32)
\[ {\sigma _{\epsilon \delta }}=\operatorname{\mathbf{E}}\epsilon \delta ,\hspace{2em}\mu =\operatorname{\mathbf{E}}x,\hspace{2em}{\sigma _{\xi }^{2}}=\operatorname{\mathbf{Var}}(\xi ),\hspace{2em}{\sigma _{e}^{2}}=\operatorname{\mathbf{Var}}(e),\hspace{2em}{\sigma _{\delta }^{2}}=\operatorname{\mathbf{Var}}(\delta ).\]
Lemma 3.
Assume conditions (a) to (d). Then the response variable (30) admits the representation
(33)
\[ y={c^{T}}z+{\beta _{0x}}+{\beta _{x}^{T}}{\left(x,{x^{2}},\dots ,{x^{k}}\right)^{T}}+u,\]
where z and ${(x,u)^{T}}$ are independent, the vector c remains unchanged compared with (30), $\operatorname{\mathbf{E}}\left[\left.u\right|x\right]=0$, $\operatorname{\mathbf{E}}\left[\left.{u^{2}}\right|x\right]<\infty $, and ${\beta _{0x}}\in \mathbb{R}$, ${\beta _{x}}\in {\mathbb{R}^{k}}$ are transformed (nonrandom) parameters of the polynomial regression.
Proof.
In the new notation, we have from (10) and (11):
(34)
\[ \xi =a+Kx+{\gamma _{1}},\hspace{1em}a:={\sigma _{\delta }^{2}}{\sigma _{x}^{-2}}\mu ,\hspace{1em}K:={\sigma _{\xi }^{2}}{\sigma _{x}^{-2}},\]
(35)
\[ \epsilon =b+fx+{\gamma _{2}},\hspace{1em}b:=-{\sigma _{\epsilon \delta }}{\sigma _{x}^{-2}}\mu ,\hspace{1em}f:={\sigma _{\epsilon \delta }}{\sigma _{x}^{-2}},\]
where z, x and ${({\gamma _{1}},{\gamma _{2}})^{T}}$ are independent, and ${({\gamma _{1}},{\gamma _{2}})^{T}}$ has the Gaussian distribution $\mathcal{N}\left(0,{V_{1|2}}\right)$.
Now, substitute (34) and (35) into (30) and get
(36)
\[ y={c^{T}}z+{\beta _{0}}+{\sum \limits_{j=1}^{k}}{\beta _{j}}{(a+Kx+{\gamma _{1}})^{j}}+b+fx+e+{\gamma _{2}},\]
(37)
\[ y={c^{T}}z+{\beta _{0}}+{\sum \limits_{j=1}^{k}}{\beta _{j}}{\sum \limits_{p=0}^{j}}\left(\genfrac{}{}{0.0pt}{}{j}{p}\right){(a+Kx)^{j-p}}\operatorname{\mathbf{E}}\left[\left.{\gamma _{1}^{p}}\right|x\right]+b+fx+u.\]
It holds $\operatorname{\mathbf{E}}\left[\left.u\right|x\right]=0$, $\operatorname{\mathbf{E}}\left[\left.{u^{2}}\right|x\right]<\infty $, and relations (36)–(37) imply the statement. □
3.3 Individual and mean prediction
We consider independent copies of the polynomial model (30)–(31):
\[ \left({y_{i}},{z_{i}},{\xi _{i}},{e_{i}},{\epsilon _{i}},{x_{i}},{\delta _{i}}\right),\hspace{1em}i=1,\dots ,n.\]
Based on observations (14) and for given ${z_{0}}$, ${x_{0}}$, we want to estimate the individual predictor ${\hat{y}_{0}}$ and the mean predictor ${\hat{\eta }_{0}}$ for the polynomial model.
Assume conditions (a) to (d) and suppose that all model parameters are unknown. Lemma 3 implies the expansion (33) with $\operatorname{\mathbf{E}}\left[u|x,z\right]=0$. All the underlying r.v.’s and the random vector z have finite 2nd moments, hence
\[ {\hat{y}_{0}}:={c^{T}}{z_{0}}+{\beta _{0x}}+{\beta _{x}^{T}}{\left({x_{0}},{x_{0}^{2}},\dots ,{x_{0}^{k}}\right)^{T}}\]
is the best mean squared error predictor of ${y_{0}}$. We estimate the coefficients c, ${\beta _{0x}}$ and ${\beta _{x}}$ using the sample (14) from the polynomial model. The OLS estimator minimizes the penalty function
\[ Q\left(c,{\beta _{0}},\beta \right):={\sum \limits_{i=1}^{n}}{\left({y_{i}}-{c^{T}}{z_{i}}-{\beta _{0}}-{\beta ^{T}}{\left({x_{i}},{x_{i}^{2}},\dots ,{x_{i}^{k}}\right)^{T}}\right)^{2}},\]
$c\in {\mathbb{R}^{q}}$, ${\beta _{0}}\in \mathbb{R}$, $\beta \in {\mathbb{R}^{k}}$. The OLS estimator can be computed by relations similar to (17)–(18):
(38)
\[\begin{array}{l}\displaystyle \bar{y}={\hat{c}^{T}}\bar{z}+{\hat{\beta }_{0x}}+{\hat{\beta }_{x}^{T}}{\left(\overline{x},\overline{{x^{2}}},\dots ,\overline{{x^{k}}}\right)^{T}},\\ {} \displaystyle \left(\substack{\hat{c}\\ {} {\hat{\beta }_{x}}}\right)={S_{rr}^{+}}{S_{ry}},\hspace{1em}r:={({z^{T}},x,\dots ,{x^{k}})^{T}};\end{array}\]
the sample covariance matrices ${S_{rr}}$ and ${S_{ry}}$ are defined in (16). The corresponding OLS predictor is
(39)
\[ {\tilde{y}_{0}}:={\hat{c}^{T}}{z_{0}}+{\hat{\beta }_{0x}}+{\hat{\beta }_{x}^{T}}{\left({x_{0}},{x_{0}^{2}},\dots ,{x_{0}^{k}}\right)^{T}}.\]
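As an illustration (ours, not from the paper), a NumPy sketch of the OLS predictor (39) computed via (38); the function name `poly_ols_predict` and the data layout are assumptions.

```python
import numpy as np

def poly_ols_predict(y, Z, x, z0, x0, k):
    """OLS predictor (39): regress y on (z, x, x^2, ..., x^k) via (38) and predict at (z0, x0).
    y: (n,), Z: (n, q), x: (n,) data; z0: (q,), x0: scalar; k: polynomial degree."""
    n = len(y)
    powers = np.vander(x, N=k + 1, increasing=True)[:, 1:]   # columns x, x^2, ..., x^k
    R = np.hstack([Z, powers])                                # r_i = (z_i^T, x_i, ..., x_i^k)^T
    Rc, yc = R - R.mean(axis=0), y - y.mean()
    coef = np.linalg.pinv(Rc.T @ Rc / n) @ (Rc.T @ yc / n)    # (c_hat; beta_x_hat), cf. (38)
    beta0_hat = y.mean() - R.mean(axis=0) @ coef              # intercept from (38)
    r0 = np.concatenate([z0, x0 ** np.arange(1, k + 1)])
    return beta0_hat + r0 @ coef
```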
Theorem 4.
Proof.
Following the lines of the proof of Theorem 2, it is enough to check the strong consistency of the estimators $\hat{c}$ and ${\hat{\beta }_{x}}$. We have a.s. as $n\to \infty $:
(40)
\[ {S_{rr}}\to \operatorname{\mathbf{Cov}}(r)=\text{block-diag}\left({\Sigma _{z}},D\right),\hspace{1em}D:=\operatorname{\mathbf{Cov}}\left({(x,{x^{2}},\dots ,{x^{k}})^{T}}\right),\]
(41)
\[ {S_{ry}}\to \left(\substack{\operatorname{\mathbf{Cov}}\left(z,y\right)\\ {} \operatorname{\mathbf{Cov}}\left({(x,\dots ,{x^{k}})^{T}},y\right)}\right)=\left(\substack{{\Sigma _{z}}\cdot c\\ {} D\cdot {\beta _{x}}}\right).\]
By conditions (b) and (d), x is a nondegenerate Gaussian r.v., therefore, r.v.’s $1,x,\dots ,{x^{k}}$ are linearly independent in the Hilbert space ${L_{2}}\left(\Omega ,\operatorname{\mathbf{P}}\right)$ of square integrable r.v.’s, and the covariance matrix D is nonsingular. Relations (38), (40), and (41) imply that a.s. as $n\to \infty $
\[ \left(\substack{\hat{c}\\ {} {\hat{\beta }_{x}}}\right)\to \left(\substack{{\Sigma _{z}^{-1}}{\Sigma _{z}}c\\ {} {D^{-1}}D{\beta _{x}}}\right)=\left(\substack{c\\ {} {\beta _{x}}}\right).\]
And the statements of Theorem 4 follow. □
Similarly to Theorem 3 one can construct a consistent estimator of the mean predictor (2) in the polynomial EIV model. The strongly consistent estimator of μ is given in (28) and the one of ${\sigma _{x}^{2}}$ is constructed similarly to (29):
(42)
\[ {\hat{\sigma }_{x}^{2}}=\frac{1}{n-1}{\sum \limits_{i=1}^{n}}{\left({x_{i}}-\bar{x}\right)^{2}}.\]
Theorem 5.
Assume conditions (a) to (d) and suppose that ${\sigma _{\epsilon \delta }}$ defined in (32) is the only parameter which is known in the model (30), (31). Consider the estimators (39), (28), and (42). Then
\[ {\tilde{\eta }_{0}}:={\tilde{y}_{0}}-{\sigma _{\epsilon \delta }}{\hat{\sigma }_{x}^{-2}}({x_{0}}-\hat{\mu })\]
is a strongly consistent estimator of the mean predictor (2), and moreover,
3.4 Confidence interval for response in quadratic model
Consider a quadratic EIV model
(43)
\[ y={\beta _{0}}+{\beta _{1}}\xi +{\beta _{2}}{\xi ^{2}}+e,\]
(44)
\[ x=\xi +\delta .\]
It is a particular case of the model (30), (31) with $k=2$, $z=0$ and $\epsilon =0$.
We use notations (32). Our conditions are similar to (a)–(d), but we assume additionally that the reliability ratio
(45)
\[ K:=\frac{{\sigma _{\xi }^{2}}}{{\sigma _{x}^{2}}}\]
is bounded away from zero. Thus, assume the following conditions.
-
(e) The random variables ξ, e and δ are independent; ξ and δ are Gaussian; e and δ have zero mean and ${\sigma _{e}^{2}}<\infty $; ${\sigma _{x}^{2}}>0$.
-
(f) Model parameters are unknown, but a lower bound ${K_{0}}$ for the reliability ratio (45) is given, with $0<{K_{0}}\le 1\hspace{-0.1667em}/2$.
Consider independent copies of the quadratic model
\[ \left({y_{i}},{\xi _{i}},{e_{i}},{x_{i}},{\delta _{i}}\right),\hspace{1em}i=1,\dots ,n.\]
Based on observations $({y_{i}},{x_{i}}),\hspace{0.2778em}i=1,\dots ,n$, and for a given ${x_{0}}$, we can construct the OLS predictor ${\tilde{y}_{0}}$, see (39), for ${y_{0}}$ with $k=2$, ${z_{0}}=0$. Now, we show how to construct an asymptotic confidence interval for ${y_{0}}$. (In a similar way this can be done for a polynomial EIV model of higher order.)
First we write down the representation (36), (37). Denote
(46)
\[ {m_{x}}:=\operatorname{\mathbf{E}}\left[\left.\xi \right|x\right]=(1-K)\mu +Kx.\]
We have with independent ${m_{x}}$ and γ:
(47)
\[ \xi ={m_{x}}+\gamma ,\hspace{1em}\gamma \sim \mathcal{N}\left(0,K{\sigma _{\delta }^{2}}\right).\]
Then
(48)
\[\begin{array}{l}\displaystyle y={\beta _{0}}+{\beta _{1}}({m_{x}}+\gamma )+{\beta _{2}}{({m_{x}}+\gamma )^{2}}+e\\ {} \displaystyle \hspace{1em}={\beta _{0}}+{\beta _{1}}{m_{x}}+{\beta _{2}}\left({m_{x}^{2}}+K{\sigma _{\delta }^{2}}\right)+u=:\hat{y}+u,\end{array}\]
where
(49)
\[ u:=e+\left({\beta _{1}}+2{\beta _{2}}{m_{x}}\right)\gamma +{\beta _{2}}\left({\gamma ^{2}}-K{\sigma _{\delta }^{2}}\right).\]
Here $\operatorname{\mathbf{E}}\left(\left.u\right|x\right)=0$. From (46) and (48) we get that the best prediction is
(50)
\[\begin{array}{l}\displaystyle \hat{y}={\beta _{0x}}+{\beta _{1x}}\cdot x+{\beta _{2x}}\cdot {x^{2}},\\ {} \displaystyle {\beta _{1x}}={\beta _{1}}K+2{\beta _{2}}K(1-K)\mu ,\hspace{2.5pt}{\beta _{2x}}={\beta _{2}}\cdot {K^{2}}.\end{array}\]
Those coefficients can be estimated using the strongly consistent OLS estimator, cf. (38),
\[ \left(\substack{{\hat{\beta }_{1x}}\\ {} {\hat{\beta }_{2x}}}\right)={S_{rr}^{+}}{S_{ry}},\hspace{1em}r:={(x,{x^{2}})^{T}}.\]
The OLS estimator ${\hat{\beta }_{0x}}$ satisfies
\[ \bar{y}={\hat{\beta }_{0x}}+{\hat{\beta }_{1x}}\bar{x}+{\hat{\beta }_{2x}}\overline{{x^{2}}},\]
and the OLS predictor of ${y_{0}}$ is equal to
\[ {\tilde{y}_{0}}={\hat{\beta }_{0x}}+{\hat{\beta }_{1x}}{x_{0}}+{\hat{\beta }_{2x}}{x_{0}^{2}}.\]
To construct a confidence interval for ${y_{0}}$, we have to bound the conditional variance of u given ${x_{0}}$. From (49) we have
\[ \operatorname{\mathbf{Var}}\left(\left.u\right|x\right)={\sigma _{e}^{2}}+{({\beta _{1}}+2{m_{x}}{\beta _{2}})^{2}}K{\sigma _{\delta }^{2}}+{\beta _{2}^{2}}\cdot 2{\left(K{\sigma _{\delta }^{2}}\right)^{2}},\]
where $2{\left(K{\sigma _{\delta }^{2}}\right)^{2}}=\operatorname{\mathbf{Var}}({\gamma ^{2}})$. Denote
\[ {m_{{u^{2}}}}=\operatorname{\mathbf{E}}\left[\operatorname{\mathbf{Var}}\left(\left.u\right|x\right)\right].\]
It holds $\hspace{2.5pt}\text{a.s. as}\hspace{2.5pt}n\to \infty $:
\[ \frac{1}{n}{\sum \limits_{i=1}^{n}}{\left({y_{i}}-{\beta _{0x}}-{\beta _{1x}}{x_{i}}-{\beta _{2x}}{x_{i}^{2}}\right)^{2}}\to {m_{{u^{2}}}}.\]
Therefore, we have $\hspace{2.5pt}\text{a.s. as}\hspace{2.5pt}n\to \infty $:
\[ {\hat{m}_{{u^{2}}}}:=\frac{1}{n}{\sum \limits_{i=1}^{n}}{\left({y_{i}}-{\hat{\beta }_{0x}}-{\hat{\beta }_{1x}}{x_{i}}-{\hat{\beta }_{2x}}{x_{i}^{2}}\right)^{2}}\to {m_{{u^{2}}}}.\]
We have to bound the difference
(52)
\[\begin{array}{l}\displaystyle \operatorname{\mathbf{Var}}\left(\left.u\right|x\right)-{m_{{u^{2}}}}=4K{\sigma _{\delta }^{2}}\left({\beta _{2}^{2}}{K^{2}}\cdot F(K,x,\mu )+{\beta _{1}}{\beta _{2}}K(x-\mu )\right),\\ {} \displaystyle F(k,x,\mu ):={x^{2}}-{\mu ^{2}}-{\sigma _{x}^{2}}+2K(1-K)\mu (x-\mu ).\end{array}\]
Here we used the relations
\[\begin{array}{l}\displaystyle {m_{x}}-\operatorname{\mathbf{E}}{m_{x}}=K(x-\mu ),\\ {} \displaystyle {m_{x}^{2}}-\operatorname{\mathbf{E}}{m_{x}^{2}}={K^{2}}\left({x^{2}}-{\mu ^{2}}-{\sigma _{x}^{2}}\right)+2K(1-K)\mu (x-\mu ).\end{array}\]
Next, we express (52) through ${\beta _{ix}}$ rather than ${\beta _{i}}$. Using (50) we get:
\[\begin{array}{l}\displaystyle {\sigma _{\delta }^{2}}={\sigma _{x}^{2}}(1-K),\\ {} \displaystyle \begin{aligned}{}& \operatorname{\mathbf{Var}}\left(\left.u\right|x\right)-{m_{{u^{2}}}}=4(1-K){\sigma _{x}^{2}}\cdot \frac{{\beta _{2x}^{2}}}{K}\left(F(K,x,\mu )-\frac{2(1-K)}{K}\mu (x-\mu )\right)+\\ {} & \hspace{2em}+4(1-K){\sigma _{x}^{2}}{\beta _{1x}}{\beta _{2x}}\cdot \frac{x-\mu }{K}\le 4\left(\frac{1}{{K_{0}}}-1\right){\sigma _{x}^{2}}\cdot G(x,\mu ,{\sigma _{x}^{2}},{\beta _{1x}},{\beta _{2x}}),\end{aligned}\\ {} \displaystyle \begin{aligned}{}& G(x,\mu ,{\sigma _{x}^{2}},{\beta _{1x}},{\beta _{2x}})={\beta _{2x}^{2}}\bigg[{x^{2}}-{\mu ^{2}}-{\sigma _{x}^{2}}+\\ {} & \hspace{2em}\hspace{2em}\hspace{2em}+2{\left(\mu (x-\mu )\right)_{-}}{(1-{K_{0}})^{2}}\left(1+\frac{1}{{K_{0}}}\right)\bigg]+{\left({\beta _{1x}}{\beta _{2x}}(x-\mu )\right)_{+}}.\end{aligned}\end{array}\]
Here ${A_{+}}:=\max (A,0)$, ${A_{-}}:=-\min (A,0)$, $A\in \mathbb{R}$. Finally,
(53)
\[ \operatorname{\mathbf{Var}}\left(\left.u\right|x\right)\le {m_{{u^{2}}}}+4\left(\frac{1}{{K_{0}}}-1\right){\sigma _{x}^{2}}\cdot G(x,\mu ,{\sigma _{x}^{2}},{\beta _{1x}},{\beta _{2x}}).\]
We are ready to construct a confidence interval for ${y_{0}}$.
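For illustration, here is a sketch (ours, not the authors' code) of the interval ${I_{\alpha }}$ defined in Theorem 6 below, using the function G introduced above; the function names `G` and `conf_interval` are assumptions.

```python
import numpy as np

def G(x, mu, sigma_x2, b1x, b2x, K0):
    """The bounding function G from Section 3.4, with A_+ = max(A, 0), A_- = -min(A, 0)."""
    a_minus = -min(mu * (x - mu), 0.0)
    a_plus = max(b1x * b2x * (x - mu), 0.0)
    return b2x**2 * (x**2 - mu**2 - sigma_x2
                     + 2.0 * a_minus * (1 - K0)**2 * (1 + 1.0 / K0)) + a_plus

def conf_interval(y0_pred, x0, mu_hat, sigma_x2_hat, m_u2_hat, b1x_hat, b2x_hat, K0, alpha):
    """Interval I_alpha of Theorem 6 around the OLS predictor y0_pred = y0_tilde."""
    bound = m_u2_hat + 4.0 * (1.0 / K0 - 1.0) * sigma_x2_hat * G(
        x0, mu_hat, sigma_x2_hat, b1x_hat, b2x_hat, K0)
    half_width = alpha ** (-0.5) * np.sqrt(max(bound, 0.0))   # [...]_+^{1/2}
    return y0_pred - half_width, y0_pred + half_width
```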
Theorem 6.
For the model (43)–(44), assume conditions (e) and (f). Fix the confidence probability $1-\alpha $. Define
\[\begin{aligned}{}{I_{\alpha }}& =\Bigg\{h\in \mathbb{R}:\left|h-{\tilde{y}_{0}}\right|\le {\alpha ^{-1\hspace{-0.1667em}/2}}\times \\ {} & \hspace{2em}\hspace{2em}\hspace{2em}\hspace{1em}\hspace{2.5pt}\times {\left[{\hat{m}_{{u^{2}}}}+4\left(\frac{1}{{K_{0}}}-1\right){\hat{\sigma }_{x}^{2}}G({x_{0}},\hat{\mu },{\hat{\sigma }_{x}^{2}},{\hat{\beta }_{1x}},{\hat{\beta }_{2x}})\right]_{+}^{1\hspace{-0.1667em}/2}}\Bigg\},\end{aligned}\]
where ${\hat{m}_{{u^{2}}}}$, ${\hat{\sigma }_{x}^{2}}$, $\hat{\mu }$, ${\hat{\beta }_{1x}}$ and ${\hat{\beta }_{2x}}$ are strongly consistent estimators of the corresponding parameters; the estimators were presented above. Then
Proof.
It holds for $t>0$:
\[ \operatorname{\mathbf{P}}\left(\left.\left|{y_{0}}-{\hat{y}_{0}}\right|>t\right|{x_{0}}\right)\le \frac{\operatorname{\mathbf{Var}}\left(\left.u\right|{x_{0}}\right)}{{t^{2}}}\le \alpha \]
if t is selected such that $t\ge {\alpha ^{-1\hspace{-0.1667em}/2}}{\left[\operatorname{\mathbf{Var}}\left(\left.u\right|{x_{0}}\right)\right]^{1\hspace{-0.1667em}/2}}$. Now, the statement follows from the inequality (53) and the consistency of ${\tilde{y}_{0}}$, ${\hat{m}_{{u^{2}}}}$, ${\hat{\sigma }_{x}^{2}}$, $\hat{\mu }$, ${\hat{\beta }_{1x}}$ and ${\hat{\beta }_{2x}}$. □
4 Prediction in other EIV models
The OLS predictor $\tilde{y}$ approximates the best mean squared error predictor $\hat{y}$ presented in (1) not only in the polynomial EIV model. Let us consider the model with an exponential regression function
(54)
\[ y=\beta {e^{\lambda \xi }}+e,\hspace{2em}x=\xi +\delta ,\]
where the real numbers β and λ are unknown regression parameters, and assume condition (e) from Section 3.4. Using the expansion (46)–(47), we get
(56)
\[\begin{array}{l}\displaystyle {\beta _{x}}=\beta {e^{\lambda (1-K)\mu }}\cdot \operatorname{\mathbf{E}}{e^{\lambda \gamma }},\hspace{2.5pt}{\lambda _{x}}=K\lambda ,\hspace{2.5pt}\operatorname{\mathbf{E}}{e^{\lambda \gamma }}=\exp \left(\frac{{\lambda ^{2}}K{\sigma _{\delta }^{2}}}{2}\right),\\ {} \displaystyle u={\beta _{x}}{e^{{\lambda _{x}}\cdot x}}({e^{\lambda \gamma }}-\operatorname{\mathbf{E}}{e^{\lambda \gamma }}).\end{array}\]
Under mild conditions, the OLS predictor ${\tilde{y}_{0}}:={\hat{\beta }_{x}}\exp \left({\hat{\lambda }_{x}}\cdot {x_{0}}\right)$ is a strongly consistent estimator of ${\hat{y}_{0}}$, where ${\hat{\beta }_{x}}$ and ${\hat{\lambda }_{x}}$ are the OLS estimators of the regression parameters in the model (54).
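A minimal sketch of such a nonlinear least squares fit on the surrogate data (ours, not the paper's procedure); the function name `exp_ols_predict` and the starting point `p0` are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_ols_predict(x, y, x0, p0=(1.0, 0.1)):
    """Fit y ~ beta_x * exp(lambda_x * x) to the surrogate data by least squares and predict at x0.
    The fitted coefficients estimate the transformed parameters (beta_x, lambda_x) of (56),
    not the original (beta, lambda); p0 is a starting point for the nonlinear optimizer."""
    model = lambda t, beta_x, lambda_x: beta_x * np.exp(lambda_x * t)
    (beta_x_hat, lambda_x_hat), _ = curve_fit(model, x, y, p0=p0)
    return beta_x_hat * np.exp(lambda_x_hat * x0)
```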
Similar conclusion can be made for the trigonometric model
\[ y={a_{0}}+{\sum \limits_{k=1}^{m}}\left({a_{k}}\cos k\omega \xi +{b_{k}}\sin k\omega \xi \right)+e,\hspace{2em}x=\xi +\delta ,\]
where ${a_{k}},0\le k\le m$, ${b_{k}},1\le k\le m$, and $\omega >0$ are unknown regression parameters.
Finally, we give an example of a model where the OLS predictor does not approximate the best mean squared error predictor. Let
(57)
\[ y=\beta \left|\xi +a\right|+e,\hspace{2em}x=\xi +\delta ,\]
where the real numbers β and a are unknown regression parameters, and assume condition (e) from Section 3.4; suppose also that ${\sigma _{\xi }^{2}}$ and ${\sigma _{\delta }^{2}}$ are positive.
For ${\gamma _{0}}\sim \mathcal{N}(0,1)$, evaluate
\[ F(a):=\operatorname{\mathbf{E}}|{\gamma _{0}}+a|=2\phi (a)+a(2\Phi (a)-1),\hspace{1em}a\in \mathbb{R},\]
where ϕ and Φ are the pdf and cdf of ${\gamma _{0}}$. Then the best mean squared error predictor is as follows:
\[\begin{array}{l}\displaystyle \begin{aligned}{}& \hat{y}=\operatorname{\mathbf{E}}\left(\left.y\right|x\right)=\beta \operatorname{\mathbf{E}}\bigg[\Big|a+Kx+(1-K)\mu +{\sigma _{\delta }}\sqrt{K}{\gamma _{0}}\Big|\bigg|x\bigg]=\\ {} & \hspace{2em}\hspace{2em}\hspace{2em}\hspace{2em}\hspace{2em}={\beta _{x}}F\left({k_{x}}\cdot x+{b_{x}}\right),\hspace{1em}{k_{x}}>0,\hspace{2.5pt}{\beta _{x}}\in \mathbb{R},\hspace{2.5pt}{b_{x}}\in \mathbb{R},\end{aligned}\\ {} \displaystyle {\beta _{x}}=\beta {\sigma _{\delta }}\sqrt{K},\hspace{2.5pt}{k_{x}}=\frac{\sqrt{K}}{{\sigma _{\delta }}},\hspace{2.5pt}{b_{x}}=\frac{a+(1-K)\mu }{{\sigma _{\delta }}\sqrt{K}}.\end{array}\]
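Before turning to the least squares fit, a quick Monte Carlo check (ours, not from the paper) of the closed form for F:

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo check of the identity F(a) = E|gamma_0 + a| = 2*phi(a) + a*(2*Phi(a) - 1).
rng = np.random.default_rng(0)
a = 0.7
mc_value = np.abs(rng.standard_normal(10**6) + a).mean()
closed_form = 2 * norm.pdf(a) + a * (2 * norm.cdf(a) - 1)
print(mc_value, closed_form)   # the two numbers agree up to Monte Carlo error
```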
The LS estimators ${\hat{k}_{x}}$, ${\hat{\beta }_{x}}$ and ${\hat{b}_{x}}$ of ${k_{x}}$, ${\beta _{x}}$ and ${b_{x}}$ minimize the penalty function
\[ {\sum \limits_{i=1}^{n}}{\left({y_{i}}-{\beta _{x}}F\left({k_{x}}{x_{i}}+{b_{x}}\right)\right)^{2}}.\]
Under mild additional conditions, the LS estimators are strongly consistent, and the LS predictor
\[ {\tilde{y}_{0}}:={\hat{\beta }_{x}}F\left({\hat{k}_{x}}{x_{0}}+{\hat{b}_{x}}\right)\]
converges a.s. to ${\hat{y}_{0}}=\operatorname{\mathbf{E}}\left(\left.{y_{0}}\right|{x_{0}}\right)={\beta _{x}}F\left({k_{x}}\cdot {x_{0}}+{b_{x}}\right)$ as the sample size grows. Notice that for this model (57), the OLS predictor $\hat{\beta }\left|{x_{0}}+\hat{a}\right|$ need not converge in probability to ${\hat{y}_{0}}$, where the OLS estimators $\hat{\beta }$ and $\hat{a}$ minimize the penalty function
\[ {\sum \limits_{i=1}^{n}}{\left({y_{i}}-\beta \left|{x_{i}}+a\right|\right)^{2}}.\]
5 Conclusion
We considered structural EIV models with the classical measurement error. We gave a list of models where the OLS predictor of response ${y_{0}}$ converges with probability one to the best mean squared error predictor ${\hat{y}_{0}}=\operatorname{\mathbf{E}}\left[\left.{y_{0}}\right|{z_{0}},{x_{0}}\right]$. In such models, a functional dependence ${\hat{y}_{0}}={\hat{y}_{0}}({z_{0}},{x_{0}})$ belongs to the same parametric family as the initial regression function ${\eta _{0}}({z_{0}},{\xi _{0}})=\operatorname{\mathbf{E}}\left[\left.{y_{0}}\right|{z_{0}},{\xi _{0}}\right]$. Such a situation looks exceptional for nonlinear models, and we gave an example of model (57), where the OLS predictor does not perform well.
We dealt with both the mean and individual prediction. They coincide in the case of nondifferential errors, where it is known that the errors in response and in covariates are uncorrelated. Otherwise, to construct the mean prediction, one has to know the covariance of the errors.
In linear models, we managed to construct an asymptotic confidence region for the response around the OLS prediction with all model parameters unknown. In the quadratic model, we did it given a known lower bound for the reliability ratio. The procedure can be extended to polynomial models of higher order.
Notice that in linear models without intercept and in incomplete polynomial models (such as $y={\beta _{0}}+{\beta _{2}}{\xi ^{2}}+e$, $x=\xi +\delta $), a prediction with $(z,x)$ naively substituted for $(z,\xi )$ in the regression of y on $(z,\xi )$ can have large prediction errors. As stated in [2, Section 2.6], predicting y from $(z,x)$ is merely a matter of substituting known values of x and z into the regression model for y on $(z,x)$. We can add that, in nonlinear EIV models, the corresponding error $v=y-\operatorname{\mathbf{E}}\left[\left.y\right|z,x\right]$ has a variance depending on x, i.e., the regression of y on $(z,x)$ is heteroskedastic; this should be taken into account in order to construct a confidence region for y in a proper way.
Finally, we make a caveat for practitioners. Consistent estimators of the EIV regression parameters are especially useful for prediction when the observation errors for the predicted subject differ from those in the data used for the model fitting. This is usually the case when the model is fitted to some experimental data while the prediction is made for a real-world subject. Using the inconsistent OLS estimators for prediction in this case is a bad idea.