1 Introduction
We deal with errors-in-variables (EIV) models which are widely used in system identification [10], epidemiology [2], econometrics [12], etc. In such regression models (with unknown parameter β), the response variable y depends on the covariates z and ξ, where z is observed precisely and ξ is observed with error. We consider the classical measurement error δ, i.e., instead of ξ the surrogate data $x=\xi +\delta $ is observed; moreover, the model is structural, i.e. z, ξ and δ are mutually independent, and we have i.i.d. copies of the model (${z_{i}}$, ${\xi _{i}}$, ${\delta _{i}}$, ${x_{i}}={\xi _{i}}+{\delta _{i}}$, ${y_{i}}$), $i=1,\dots ,n$. The measurement error can be nondifferential, when the distribution of y given $\left(\xi ,z,x\right)$ depends only on $\left(\xi ,z\right)$, and differential, otherwise [2, Section 2.5].
The present paper is devoted to the prediction of the response variable from ξ and z. Based on the observations (${y_{i}}$, ${z_{i}}$, ${x_{i}}$), $i=1,\dots ,n$, and given new values ${z_{0}}$ and ${x_{0}}$ of z and x variables, we want to predict either the new ${y_{0}}$ (this procedure is called individual prediction) or the exact relation ${\eta _{0}}=\operatorname{\mathbf{E}}\left[\left.{y_{0}}\right|{z_{0}},{\xi _{0}}\right]$, where ${\xi _{0}}$ is a new value for ξ (this procedure is called mean prediction). Both prediction problems are important in econometrics [5]. The individual prediction is used in the Leave-one-out cross-validation procedure.
The best mean squared error individual predictor is
(1)
\[ {\hat{y}_{0}}:=\operatorname{\mathbf{E}}\left[\left.{y_{0}}\right|{z_{0}},{x_{0}}\right],\]
and the best mean squared error predictor of ${\eta _{0}}$ is
(2)
\[ {\hat{\eta }_{0}}:=\operatorname{\mathbf{E}}\left[\left.{\eta _{0}}\right|{z_{0}},{x_{0}}\right].\]
For the nondifferential measurement error, $\operatorname{\mathbf{E}}\left[\left.{y_{0}}\right|{z_{0}},{\xi _{0}},{x_{0}}\right]=\operatorname{\mathbf{E}}\left[\left.{y_{0}}\right|{z_{0}},{\xi _{0}}\right]={\eta _{0}}$, hence ${\hat{y}_{0}}=\operatorname{\mathbf{E}}\left[\left.{\eta _{0}}\right|{z_{0}},{x_{0}}\right]={\hat{\eta }_{0}}$, and the best mean predictor coincides with the best individual predictor; this need not hold for the differential measurement error.
Both predictors (1) and (2) are infeasible, because they involve unknown model parameters. Our goal is to construct consistent estimators of these predictors as the sample size n grows.
The nonparametric individual prediction under errors in covariates is studied in [7]. Below we consider only parametric models.
For scalar linear EIV models with normally distributed ξ and δ, it is stated in [4, Section 2.5.1] that the ordinary least squares (OLS) predictor should be used even when dealing with the EIV model. This is quite surprising, since the OLS estimator of β is inconsistent due to the attenuation effect [4]. In fact, it is not surprising that in a Gaussian model the linear OLS estimator provides a consistent prediction, since in the Gaussian case conditional expectations are always linear functions of the conditioning variables. In the present paper, we consider a non-Gaussian regression model, since the distribution of the observable covariate z is not assumed Gaussian; therefore, the consistency of OLS predictions in such a model is a nontrivial feature.
We confirm the assertion that the OLS estimator yields a suitable prediction under the model validity for two kinds of EIV models: multivariate linear and polynomial. For this purpose, we just follow the recommendation of [4, Section 2.6] and analyze the regression of y on the observable z and x. In other nonlinear EIV models, the OLS predictor (obtained from the initial regression of y on $\left(z,\xi \right)$ by naively substituting x for ξ) is inconsistent; instead, the least squares predictor based on the regression of y on $\left(z,x\right)$ can be used.
The paper is organized as follows. In Sections 2 and 3, we state the results on prediction in multivariate linear and polynomial EIV models, respectively. Section 4 briefly studies some other nonlinear EIV models, and Section 5 concludes.
Throughout the paper, all vectors are column vectors, $\operatorname{\mathbf{E}}$ stands for the expectation and applies to the whole product that follows it, and $\operatorname{\mathbf{Cov}}\left(x\right)$ denotes the covariance matrix of a random vector x. By ${I_{p}}$ we denote the identity matrix of size p. For symmetric matrices A and B of the same size, $A>B$ and $A\ge B$ mean that $A-B$ is positive definite or positive semidefinite, respectively.
2 Prediction in a multivariate linear EIV model
2.1 Model and main assumptions
Consider a multivariate linear EIV model with the intercept term (structural case):
(3)
\[ y=b+{C^{T}}z+{B^{T}}\xi +e+\epsilon ,\]
(4)
\[ x=\xi +\delta .\]
Here the random vector y is the response variable distributed in ${\mathbb{R}^{d}}$; the random vector z is the observable covariate distributed in ${\mathbb{R}^{q}}$, the random vector ξ is the unobservable (latent) covariate distributed in ${\mathbb{R}^{m}}$; x is the surrogate data observed instead of ξ; $e+\epsilon $ is the random error in y, δ is the measurement error in the latent covariate; $C\in {\mathbb{R}^{q\times d}}$, $B\in {\mathbb{R}^{m\times d}}$ and $b\in {\mathbb{R}^{d}}$ contain unknown regression parameters, where b is the intercept term. The random vector e models the error in the regression equation, and ϵ models the measurement error in y; ϵ can be correlated with δ.
Such models are studied, e.g., in [11, 10, 9] in relation to system identification problems and numerical linear algebra. We list the model assumptions.
-
(i) The vectors z, ξ, e and the augmented measurement error vector ${\left({\epsilon ^{T}},{\delta ^{T}}\right)^{T}}$ are mutually independent and have finite 2nd moments; the errors ϵ and δ can be correlated.
-
(ii) The covariance matrices ${\Sigma _{z}}:=\operatorname{\mathbf{Cov}}(z)$ and ${\Sigma _{x}}:=\operatorname{\mathbf{Cov}}(x)$ are nonsingular.
-
(iii) The errors e, ϵ and δ have zero means.
-
(iv) The errors ϵ, δ and covariate ξ are jointly Gaussian.
Introduce the cross-covariance matrix ${\Sigma _{\epsilon \delta }}:=\operatorname{\mathbf{E}}\epsilon {\delta ^{T}}$.
The classical measurement error δ is nondifferential if, and only if, ϵ and δ are independent, i.e. ${\Sigma _{\epsilon \delta }}=0$ (see Section 1 for the definition of the nondifferential error).
We denote also
(5)
\[\begin{array}{l}\displaystyle \mu =\operatorname{\mathbf{E}}x,\hspace{2em}{\Sigma _{\xi }}=\operatorname{\mathbf{Cov}}(\xi ),\hspace{2em}\\ {} \displaystyle {\Sigma _{e}}=\operatorname{\mathbf{Cov}}(e),\hspace{2em}{\Sigma _{\epsilon }}=\operatorname{\mathbf{Cov}}(\epsilon ),\hspace{2em}{\Sigma _{\delta }}=\operatorname{\mathbf{Cov}}(\delta ),\\ {} \displaystyle {\Sigma _{11}}=\text{block-diag}({\Sigma _{\xi }},{\Sigma _{\epsilon }}),\hspace{2em}{\Sigma _{12}}=\left[\substack{{\Sigma _{\xi }}\\ {} {\Sigma _{\epsilon \delta }}}\right],\hspace{2em}{\Sigma _{22}}={\Sigma _{x}}.\end{array}\]
Thus, ${\Sigma _{11}}$ is a block-diagonal matrix, and sometimes we will use ${\Sigma _{22}}$ for the covariance matrix of x.
2.2 Regression of y on z and x
Lemma 1.
Assume conditions (i) to (iv).
-
(a) Then the response variable (3) admits the representation
(6)
\[ y={b_{x}}+{C^{T}}z+{B_{x}^{T}}x+u,\]
where z, x and u are independent, $\operatorname{\mathbf{E}}u=0$, and ${b_{x}}\in {\mathbb{R}^{d}}$, ${B_{x}}\in {\mathbb{R}^{m\times d}}$ are transformed (nonrandom) regression parameters,
(7)
\[ {b_{x}}=b+\left({B^{T}}{\Sigma _{\delta }}-{\Sigma _{\epsilon \delta }}\right){\Sigma _{x}^{-1}}\mu ,\]
(8)
\[ {B_{x}}={\Sigma _{x}^{-1}}\left({\Sigma _{\xi }}B+{\Sigma _{\epsilon \delta }^{T}}\right).\]
-
(b) Assume additionally the following condition:
(v) either ${\Sigma _{e}}>0$, or the matrix ${V_{1|2}}$ given in (9) below is positive definite.
Then the error term u in (6) has a positive definite covariance matrix, ${\Sigma _{u}}$.
Proof.
(a) Introduce the jointly Gaussian vectors
\[ {x^{(1)}}:={\left({\xi ^{T}},{\epsilon ^{T}}\right)^{T}},\hspace{2em}{x^{(2)}}:=x.\]
We have
\[\begin{array}{l}\displaystyle {\mu ^{(1)}}:=\operatorname{\mathbf{E}}{x^{(1)}}=\left(\substack{\mu \\ {} 0}\right),\hspace{2em}{\mu ^{(2)}}:=\operatorname{\mathbf{E}}{x^{(2)}}=\mu ;\\ {} \displaystyle \operatorname{\mathbf{Cov}}\left({x^{(1)}}\right)={\Sigma _{11}},\hspace{2em}\operatorname{\mathbf{Cov}}\left({x^{(2)}}\right)={\Sigma _{22}},\end{array}\]
which is positive definite by assumption (ii), and $\operatorname{\mathbf{Cov}}\left({x^{(1)}},{x^{(2)}}\right)={\Sigma _{12}}$,
where the matrices ${\Sigma _{11}}$, ${\Sigma _{12}}$, ${\Sigma _{22}}$ are given in (5). Now, according to [1, Theorem 2.5.1], the conditional distribution of ${x^{(1)}}$ given ${x^{(2)}}$ is
(9)
\[\begin{array}{l}\displaystyle \left[\left.{x^{(1)}}\right|{x^{(2)}}\right]\sim \mathcal{N}\left({\mu _{1|2}},{V_{1|2}}\right),\\ {} \displaystyle {\mu _{1|2}}={\mu _{1|2}}({x^{(2)}})={\mu ^{(1)}}+{\Sigma _{12}}{\Sigma _{22}^{-1}}\left({x^{(2)}}-{\mu ^{(2)}}\right)=\left(\substack{{\Sigma _{\delta }}{\Sigma _{x}^{-1}}\mu +{\Sigma _{\xi }}{\Sigma _{x}^{-1}}x\\ {} {\Sigma _{\epsilon \delta }}{\Sigma _{x}^{-1}}(x-\mu )}\right),\\ {} \displaystyle {V_{1|2}}={\Sigma _{11}}-{\Sigma _{12}}{\Sigma _{22}^{-1}}{\Sigma _{12}^{T}}.\end{array}\]
Hence ${({\xi ^{T}},{\epsilon ^{T}})^{T}}-{\mu _{1|2}}(x)=:{({\gamma _{1}^{T}},{\gamma _{2}^{T}})^{T}}$ is uncorrelated with x and has the Gaussian distribution $\mathcal{N}\left(0,{V_{1|2}}\right)$; being jointly Gaussian with x, it is moreover independent of x. Therefore,
(10)
\[ \xi ={\Sigma _{\delta }}{\Sigma _{x}^{-1}}\mu +{\Sigma _{\xi }}{\Sigma _{x}^{-1}}x+{\gamma _{1}},\]
(11)
\[ \epsilon ={\Sigma _{\epsilon \delta }}{\Sigma _{x}^{-1}}(x-\mu )+{\gamma _{2}}.\]
Substitute (10) and (11) into (3) and obtain the desired relations (6)–(8) with $u:=e+{B^{T}}{\gamma _{1}}+{\gamma _{2}}$.
Here $(z,e,x)$ and the couple $({\gamma _{1}},{\gamma _{2}})$ are independent; hence z, x and u are independent as well. This implies statement (a).
(b) We have
(12)
\[ \operatorname{\mathbf{Cov}}(u)={\Sigma _{e}}+\operatorname{\mathbf{Cov}}\left({B^{T}}{\gamma _{1}}+{\gamma _{2}}\right)=:{\Sigma _{u}}.\]
If ${\Sigma _{e}}>0$ then ${\Sigma _{u}}\ge {\Sigma _{e}}>0$, thus ${\Sigma _{u}}>0$; and if ${V_{1|2}}>0$ then ${\Sigma _{u}}\ge \operatorname{\mathbf{Cov}}\left({B^{T}}{\gamma _{1}}+{\gamma _{2}}\right)>0$, thus ${\Sigma _{u}}>0$. This completes the proof of Lemma 1. □
As a particular case, take a model with a univariate response and a univariate regressor ξ.
Lemma 2.
Proof.
First suppose that ${\Sigma _{\xi }}>0$. According to Lemma 1, it is enough to check that ${V_{1|2}}$ given in (9) is positive definite.
A direct computation shows that
\[ {V_{1|2}}=\frac{1}{{\sigma _{x}^{2}}}\left(\begin{array}{c@{\hskip10.0pt}c}{\sigma _{\xi }^{2}}{\sigma _{\delta }^{2}}& -{\sigma _{\xi }^{2}}{\sigma _{\epsilon \delta }}\\ {} -{\sigma _{\xi }^{2}}{\sigma _{\epsilon \delta }}& {\sigma _{\epsilon }^{2}}{\sigma _{x}^{2}}-{\sigma _{\epsilon \delta }^{2}}\end{array}\right)=:\frac{V}{{\sigma _{x}^{2}}}.\]
Here in the scalar case we write ${\sigma _{\xi }^{2}}={\Sigma _{\xi }}$, ${\sigma _{\delta }^{2}}={\Sigma _{\delta }}$, ${\sigma _{\epsilon \delta }}={\Sigma _{\epsilon \delta }}$, etc. The matrix V is positive definite, because ${\sigma _{\xi }^{2}}{\sigma _{\delta }^{2}}>0$ and
\[ \det V={\sigma _{\xi }^{2}}{\sigma _{x}^{2}}\left({\sigma _{\epsilon }^{2}}{\sigma _{\delta }^{2}}-{\sigma _{\epsilon \delta }^{2}}\right)>0\]
due to condition (13).
Now, suppose that ${\Sigma _{\xi }}=0$. Then $\xi =\mu $ almost surely. With some computations, it can be shown that $u=e+\epsilon -{\sigma _{\epsilon \delta }}{\sigma _{\delta }^{-2}}\delta $ almost surely, whence ${\sigma _{u}^{2}}={\sigma _{e}^{2}}+{\sigma _{\epsilon }^{2}}-{\sigma _{\epsilon \delta }^{2}}{\sigma _{\delta }^{-2}}>0$. Lemma 2 is proved. □
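As a quick numerical illustration (not part of the paper), the following NumPy sketch evaluates the Gaussian conditioning formula (9) used in the proofs of Lemmas 1 and 2; the function name `gaussian_conditional` and the toy parameter values are our own.

```python
import numpy as np

def gaussian_conditional(mu1, mu2, S11, S12, S22, x2):
    """Conditional law of x^(1) given x^(2) = x2 for a jointly Gaussian vector,
    as in (9): mean mu1 + S12 S22^{-1} (x2 - mu2), covariance S11 - S12 S22^{-1} S12^T."""
    S22_inv = np.linalg.inv(S22)        # S22 = Sigma_x is nonsingular by assumption (ii)
    cond_mean = mu1 + S12 @ S22_inv @ (x2 - mu2)
    cond_cov = S11 - S12 @ S22_inv @ S12.T
    return cond_mean, cond_cov

# Toy scalar blocks as in Lemma 2: x^(1) = (xi, eps)^T, x^(2) = x
s_xi2, s_eps2, s_delta2, s_epsdelta, mu = 1.0, 0.5, 0.4, 0.1, 2.0
S11 = np.diag([s_xi2, s_eps2])
S12 = np.array([[s_xi2], [s_epsdelta]])
S22 = np.array([[s_xi2 + s_delta2]])    # Sigma_x = Sigma_xi + Sigma_delta
m12, V12 = gaussian_conditional(np.array([mu, 0.0]), np.array([mu]), S11, S12, S22, np.array([2.3]))
print(m12, V12)                         # V12 equals V_{1|2}, i.e. the matrix V of Lemma 2 divided by sigma_x^2
```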
2.3 Individual prediction
Now, consider independent copies of the multivariate model (3), (4):
\[ \left({y_{i}},{z_{i}},{\xi _{i}},{e_{i}},{\epsilon _{i}},{x_{i}},{\delta _{i}}\right),\hspace{1em}i=1,\dots ,n.\]
Based on the observations
(14)
\[ \left({y_{i}},{z_{i}},{x_{i}}\right),\hspace{1em}i=1,\dots ,n,\]
and for given ${z_{0}}$, ${x_{0}}$, we want to estimate the individual predictor ${\hat{y}_{0}}$ presented in (1) and the mean predictor ${\hat{\eta }_{0}}$ presented in (2).
Assume conditions (i) to (iv) and suppose that all model parameters are unknown. Lemma 1 implies the expansion (6) with $\operatorname{\mathbf{E}}u=0$. All the underlying random vectors have finite 2nd moments, hence
(15)
\[ {\hat{y}_{0}}={b_{x}}+{C^{T}}{z_{0}}+{B_{x}^{T}}{x_{0}}\]
is the best mean squared error predictor of ${y_{0}}$. Since it is infeasible, we have to estimate the coefficients ${b_{x}}$, C and ${B_{x}}$ using the sample (14). The OLS estimator $\left({\hat{b}_{x}},\hat{C},{\hat{B}_{x}}\right)$ minimizes the penalty function
\[ Q\left({b_{x}},C,{B_{x}}\right):={\sum \limits_{i=1}^{n}}{\left\| {y_{i}}-{b_{x}}-{C^{T}}{z_{i}}-{B_{x}^{T}}{x_{i}}\right\| ^{2}}.\]
Let bar denote the average over $i=1,\dots ,n$, e.g.,
\[ \bar{x}=\frac{1}{n}{\sum \limits_{i=1}^{n}}{x_{i}},\]
and ${S_{uv}}$ denote the sample covariance matrix of u and v variables, e.g.,
(16)
\[ {S_{xy}}=\frac{1}{n}{\sum \limits_{i=1}^{n}}\left({x_{i}}-\bar{x}\right){\left({y_{i}}-\bar{y}\right)^{T}},\hspace{2em}{S_{xx}}=\frac{1}{n}{\sum \limits_{i=1}^{n}}\left({x_{i}}-\bar{x}\right){\left({x_{i}}-\bar{x}\right)^{T}},\]
etc. The OLS estimator can be computed from the relations [11]
(17)
\[ \bar{y}={\hat{b}_{x}}+{\hat{C}^{T}}\bar{z}+{\hat{B}_{x}^{T}}\bar{x},\]
(18)
\[ \left(\substack{\hat{C}\\ {} {\hat{B}_{x}}}\right)={S_{rr}^{+}}{S_{ry}},\hspace{1em}r:={({z^{T}},{x^{T}})^{T}}.\]
Hereafter ${A^{+}}$ is the pseudo-inverse of a square matrix A; see the properties of ${A^{+}}$ in [8]. The corresponding OLS predictor is
(19)
\[ {\tilde{y}_{0}}:={\hat{b}_{x}}+{\hat{C}^{T}}{z_{0}}+{\hat{B}_{x}^{T}}{x_{0}}.\]
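For readers who wish to reproduce the computation, here is a minimal NumPy sketch (ours, not the authors' code) of the estimators (17)–(18) and the OLS predictor (19); the function name `ols_predict` and the array layout are illustrative assumptions.

```python
import numpy as np

def ols_predict(Y, Z, X, z0, x0):
    """OLS predictor (19) from the regression of y on (z, x), computed via (16)-(18).
    Y: (n, d), Z: (n, q), X: (n, m) data arrays; z0: (q,), x0: (m,) new covariate values."""
    n = Y.shape[0]
    R = np.hstack([Z, X])                              # r_i = (z_i^T, x_i^T)^T
    Rc, Yc = R - R.mean(axis=0), Y - Y.mean(axis=0)
    S_rr = Rc.T @ Rc / n                               # sample covariances, cf. (16)
    S_ry = Rc.T @ Yc / n
    coef = np.linalg.pinv(S_rr) @ S_ry                 # stacked (C_hat; B_x_hat), cf. (18)
    b_x_hat = Y.mean(axis=0) - R.mean(axis=0) @ coef   # intercept from (17)
    return b_x_hat + np.concatenate([z0, x0]) @ coef   # predictor (19)
```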
Theorem 1.
Proof.
By the strong law of large numbers, we have a.s. as $n\to \infty $:
\[\begin{array}{l}\displaystyle {S_{rr}}\to \text{block-diag}\left({\Sigma _{z}},{\Sigma _{x}}\right)>0,\\ {} \displaystyle {S_{ry}}\to \left(\substack{\operatorname{\mathbf{Cov}}\left(z,y\right)\\ {} \operatorname{\mathbf{Cov}}\left(x,y\right)}\right)=\left(\substack{{\Sigma _{z}}\cdot C\\ {} {\Sigma _{x}}\cdot {B_{x}}}\right),\\ {} \displaystyle \left(\substack{\hat{C}\\ {} {\hat{B}_{x}}}\right)\to \left(\substack{{\Sigma _{z}^{-1}}{\Sigma _{z}}\cdot C\\ {} {\Sigma _{x}^{-1}}{\Sigma _{x}}\cdot {B_{x}}}\right)=\left(\substack{C\\ {} {B_{x}}}\right).\end{array}\]
This convergence, relation (17) and the a.s. convergence of the sample means imply that ${\hat{b}_{x}}\to {b_{x}}\hspace{2.5pt}\text{a.s.}$ Now, both statements of Theorem 1 follow from (19) and (15). □
It is interesting to construct an asymptotic confidence region for the response ${y_{0}}$ based on the OLS predictor. Assume (i) to (iv). It holds
\[ \operatorname{\mathbf{Cov}}\left(\left.{y_{0}}-{\hat{y}_{0}}\right|{z_{0}},{x_{0}}\right)=\operatorname{\mathbf{Cov}}({u_{0}})={\Sigma _{u}},\]
see (12). Introduce the estimator
\[ {\hat{\Sigma }_{u}}=\frac{1}{n}{\sum \limits_{i=1}^{n}}\left({y_{i}}-{\hat{b}_{x}}-{\hat{C}^{T}}{z_{i}}-{\hat{B}_{x}^{T}}{x_{i}}\right){\left({y_{i}}-{\hat{b}_{x}}-{\hat{C}^{T}}{z_{i}}-{\hat{B}_{x}^{T}}{x_{i}}\right)^{T}}.\]
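As an illustration only, the following sketch computes the residual covariance estimator displayed above and checks membership in an ellipsoidal region of the form (23) below; it assumes that ${\chi _{d\alpha }^{2}}$ denotes the upper α-quantile of the chi-square law with d degrees of freedom, and the function names are our own.

```python
import numpy as np
from scipy.stats import chi2

def residual_cov(Y, Z, X, b_x_hat, C_hat, B_x_hat):
    """Residual covariance estimator: average of the outer products of the
    residuals y_i - b_x_hat - C_hat^T z_i - B_x_hat^T x_i."""
    U = Y - b_x_hat - Z @ C_hat - X @ B_x_hat    # C_hat: (q, d), B_x_hat: (m, d)
    return U.T @ U / Y.shape[0]

def in_ellipsoid(h, y0_pred, Sigma_u_hat, alpha):
    """Membership check for an ellipsoidal confidence region around the OLS
    predictor, cf. (23), with chi2_{d,alpha} the upper alpha-quantile."""
    diff = h - y0_pred
    quad = diff @ np.linalg.pinv(Sigma_u_hat) @ diff   # = ||(Sigma_u_hat^+)^{1/2} diff||^2
    return quad <= chi2.ppf(1 - alpha, df=len(h))
```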
Theorem 2.
Suppose that conditions (i) to (iv) hold. Fix the confidence probability $1-\alpha $.
-
(a) Assume additionally (v) and define Then
-
(b) Let the model (3)–(4) be purely normal, i.e. z is normally distributed and $e=0$. Assume additionally that the matrix ${V_{1|2}}$ given in (9) is nonsingular. Define
(23)
\[ {D_{\alpha }}=\left\{h\in {\mathbb{R}^{d}}:{\left\| {\left({\hat{\Sigma }_{u}^{+}}\right)^{1\hspace{-0.1667em}/2}}\left(h-{\tilde{y}_{0}}\right)\right\| ^{2}}\le {\chi _{d\alpha }^{2}}\right\},\]
Proof.
If ${b_{x}}$, C, and ${B_{x}}$ were known, then we could approximate ${\Sigma _{u}}$ as follows:
(25)
\[\begin{array}{l}\displaystyle \frac{1}{n}{\sum \limits_{i=1}^{n}}{u_{i}}{u_{i}^{T}}\to {\Sigma _{u}}\hspace{1em}\text{a.s. as}\hspace{2.5pt}n\to \infty ,\\ {} \displaystyle {u_{i}}:={y_{i}}-{b_{x}}-{C^{T}}{z_{i}}-{B_{x}^{T}}{x_{i}}.\end{array}\]
Since ${u_{i}}{u_{i}^{T}}$ is a quadratic function of the coefficients ${b_{x}}$, C, ${B_{x}}$, and the OLS estimators of those coefficients are strongly consistent, the convergence (25) remains valid if we replace all ${u_{i}}$ with the residuals ${\hat{u}_{i}}:={y_{i}}-{\hat{b}_{x}}-{\hat{C}^{T}}{z_{i}}-{\hat{B}_{x}^{T}}{x_{i}}$. Hence
(26)
\[ {\hat{\Sigma }_{u}}\to {\Sigma _{u}}\hspace{1em}\text{a.s. as}\hspace{2.5pt}n\to \infty .\]
(a) Under (v), ${\Sigma _{u}}$ is nonsingular by Lemma 1(b). It holds
\[ \operatorname{\mathbf{P}}\left(\left.{\left\| {\Sigma _{u}^{-1\hspace{-0.1667em}/2}}\left({y_{0}}-{\hat{y}_{0}}\right)\right\| ^{2}}>\frac{d}{\alpha }\right|{z_{0}},{x_{0}}\right)\le \alpha \frac{\operatorname{\mathbf{E}}{\left\| {\Sigma _{u}^{-1\hspace{-0.1667em}/2}}\cdot u\right\| ^{2}}}{d}=\alpha .\]
Since the relations (20) and (26) hold true, the relations (22), (21) follow.
(b) Again, in this purely normal model the matrix ${\Sigma _{u}}$ is nonsingular; conditional on ${z_{0}}$ and ${x_{0}}$, the difference ${y_{0}}-{\hat{y}_{0}}={u_{0}}$ has the normal distribution $\mathcal{N}\left(0,{\Sigma _{u}}\right)$. Then
\[ \operatorname{\mathbf{P}}\left(\left.{\left\| {\Sigma _{u}^{-1\hspace{-0.1667em}/2}}\left({y_{0}}-{\hat{y}_{0}}\right)\right\| ^{2}}>{\chi _{d\alpha }^{2}}\right|{z_{0}},{x_{0}}\right)=\alpha .\]
Since the relations (26) and (20) hold true, the relations (24), (23) follow. □
2.4 Mean prediction
Still consider the model (3), (4) under conditions (i) to (iv). We want to estimate the mean predictor ${\hat{\eta }_{0}}$ presented in (2). We have
\[\begin{array}{l}\displaystyle {\hat{\eta }_{0}}={\hat{y}_{0}}-\operatorname{\mathbf{E}}\left[\left.{e_{0}}\right|{z_{0}},{x_{0}}\right]-\operatorname{\mathbf{E}}\left[\left.{\epsilon _{0}}\right|{z_{0}},{x_{0}}\right],\\ {} \displaystyle \operatorname{\mathbf{E}}\left[\left.{e_{0}}\right|{z_{0}},{x_{0}}\right]=\operatorname{\mathbf{E}}{e_{0}}=0,\end{array}\]
and by (11),
\[ \operatorname{\mathbf{E}}\left[\left.{\epsilon _{0}}\right|{z_{0}},{x_{0}}\right]={\Sigma _{\epsilon \delta }}{\Sigma _{x}^{-1}}\left({x_{0}}-\mu \right).\]
Thus,
(27)
\[ {\hat{\eta }_{0}}={\hat{y}_{0}}-{\Sigma _{\epsilon \delta }}{\Sigma _{x}^{-1}}\left({x_{0}}-\mu \right).\]
Based on observations (14), strongly consistent and unbiased estimators of μ and ${\Sigma _{x}}$ are as follows:
(28)
\[ \hat{\mu }=\bar{x}=\frac{1}{n}{\sum \limits_{i=1}^{n}}{x_{i}},\]
(29)
\[ {\hat{\Sigma }_{x}}=\frac{1}{n-1}{\sum \limits_{i=1}^{n}}\left({x_{i}}-\bar{x}\right){\left({x_{i}}-\bar{x}\right)^{T}}.\]
Theorem 3.
Assume conditions (i) to (iv) and suppose that ${\Sigma _{\epsilon \delta }}$ is the only model parameter which is known. Consider the estimators (19), (28), and (29). Then
\[ {\tilde{\eta }_{0}}:={\tilde{y}_{0}}-{\Sigma _{\epsilon \delta }}{\hat{\Sigma }_{x}^{-1}}\left({x_{0}}-\hat{\mu }\right)\]
is a strongly consistent estimator of the mean predictor (2), and moreover
Notice that more model parameters should be known in order to construct a confidence region for ${\eta _{0}}$ around ${\tilde{\eta }_{0}}$.
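A minimal sketch of the mean predictor of Theorem 3 follows; it is ours, not the authors', and assumes that ${\Sigma _{\epsilon \delta }}$ is supplied by the user as a known matrix, as in the theorem.

```python
import numpy as np

def mean_predict(y0_pred, X, x0, Sigma_eps_delta):
    """Mean predictor of Theorem 3: eta0_tilde = y0_tilde - Sigma_{eps delta} Sigma_x_hat^{-1} (x0 - mu_hat),
    with mu_hat = x_bar as in (28) and the unbiased sample covariance Sigma_x_hat as in (29)."""
    n = X.shape[0]
    mu_hat = X.mean(axis=0)                       # (28)
    Xc = X - mu_hat
    Sigma_x_hat = Xc.T @ Xc / (n - 1)             # (29)
    return y0_pred - Sigma_eps_delta @ np.linalg.inv(Sigma_x_hat) @ (x0 - mu_hat)
```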
3 Prediction in a polynomial EIV model
3.1 Model and main assumptions
For a fixed and known $k\ge 2$, consider a polynomial EIV model (structural case):
(30)
\[ y={c^{T}}z+{\beta _{0}}+{\sum \limits_{j=1}^{k}}{\beta _{j}}{\xi ^{j}}+e+\epsilon ,\]
(31)
\[ x=\xi +\delta .\]
Here the random variable (r.v.) y is the response variable; the random vector z is the observable covariate distributed in ${\mathbb{R}^{q}}$, r.v. ξ is the unobservable covariate; x is the surrogate data observed instead of ξ; e is the random error in the equation, ϵ and δ are the measurement errors in the response and in the latent covariate; $c\in {\mathbb{R}^{q}}$, ${\beta _{0}}\in \mathbb{R}$ and $\beta ={\left({\beta _{1}},\dots ,{\beta _{k}}\right)^{T}}\in {\mathbb{R}^{k}}$ contain unknown regression parameters; ϵ and δ can be correlated.
Such models are studied, e.g., in [3, 6] and applied, for instance, in econometrics. Let us introduce the model assumptions.
-
(a) The random variables ξ, e and random vectors z, ${\left(\epsilon ,\delta \right)^{T}}$ are independent, with finite 2nd moments; the random variables ϵ and δ can be correlated.
-
(b) The covariance matrix ${\Sigma _{z}}:=\operatorname{\mathbf{Cov}}(z)$ is nonsingular, and ${\sigma _{x}^{2}}:=\operatorname{\mathbf{Var}}(x)>0$.
-
(c) The errors e, ϵ and δ have zero mean.
-
(d) The errors ϵ, δ and ξ are jointly Gaussian.
3.2 Regression of y on z and x
Let us denote
(32)
\[ {\sigma _{\epsilon \delta }}=\operatorname{\mathbf{E}}\epsilon \delta ,\hspace{2em}\mu =\operatorname{\mathbf{E}}x,\hspace{2em}{\sigma _{\xi }^{2}}=\operatorname{\mathbf{Var}}(\xi ),\hspace{2em}{\sigma _{e}^{2}}=\operatorname{\mathbf{Var}}(e),\hspace{2em}{\sigma _{\delta }^{2}}=\operatorname{\mathbf{Var}}(\delta ).\]
Lemma 3.
Assume conditions (a) to (d). Then the response variable (30) admits the representation
(33)
\[ y={c^{T}}z+{\beta _{0x}}+{\beta _{x}^{T}}{\left(x,{x^{2}},\dots ,{x^{k}}\right)^{T}}+u,\]
where z and ${(x,u)^{T}}$ are independent, the vector c remains unchanged compared with (30), $\operatorname{\mathbf{E}}\left[\left.u\right|x\right]=0$, $\operatorname{\mathbf{E}}\left[\left.{u^{2}}\right|x\right]<\infty $, and ${\beta _{0x}}\in \mathbb{R}$, ${\beta _{x}}\in {\mathbb{R}^{k}}$ are transformed (nonrandom) parameters of the polynomial regression.
Proof.
In the new notation, we have from (10) and (11):
(34)
\[ \xi =a+Kx+{\gamma _{1}},\hspace{1em}a:={\sigma _{\delta }^{2}}{\sigma _{x}^{-2}}\mu ,\hspace{1em}K:={\sigma _{\xi }^{2}}{\sigma _{x}^{-2}},\]
(35)
\[ \epsilon =b+fx+{\gamma _{2}},\hspace{1em}b:=-{\sigma _{\epsilon \delta }}{\sigma _{x}^{-2}}\mu ,\hspace{1em}f:={\sigma _{\epsilon \delta }}{\sigma _{x}^{-2}},\]
where z, x and ${({\gamma _{1}},{\gamma _{2}})^{T}}$ are independent, and ${({\gamma _{1}},{\gamma _{2}})^{T}}$ has the Gaussian distribution $\mathcal{N}\left(0,{V_{1|2}}\right)$.
Now, substitute (34) and (35) into (30) and get
(36)
\[ y={c^{T}}z+{\beta _{0}}+{\sum \limits_{j=1}^{k}}{\beta _{j}}{(a+Kx+{\gamma _{1}})^{j}}+b+fx+e+{\gamma _{2}},\]
(37)
\[ y={c^{T}}z+{\beta _{0}}+{\sum \limits_{j=1}^{k}}{\beta _{j}}{\sum \limits_{p=0}^{j}}\left(\genfrac{}{}{0.0pt}{}{j}{p}\right){(a+Kx)^{j-p}}\operatorname{\mathbf{E}}\left[\left.{\gamma _{1}^{p}}\right|x\right]+b+fx+u.\]
It holds $\operatorname{\mathbf{E}}\left[\left.u\right|x\right]=0$, $\operatorname{\mathbf{E}}\left[\left.{u^{2}}\right|x\right]<\infty $, and relations (36)–(37) imply the statement. □
3.3 Individual and mean prediction
We consider independent copies of the polynomial model (30)–(31):
\[ \left({y_{i}},{z_{i}},{\xi _{i}},{e_{i}},{\epsilon _{i}},{x_{i}},{\delta _{i}}\right),\hspace{1em}i=1,\dots ,n.\]
Based on observations (14) and for given ${z_{0}}$, ${x_{0}}$, we want to estimate the individual predictor ${\hat{y}_{0}}$ and the mean predictor ${\hat{\eta }_{0}}$ for the polynomial model.
Assume conditions (a) to (d) and suppose that all model parameters are unknown. Lemma 3 implies the expansion (33) with $\operatorname{\mathbf{E}}\left[u|x,z\right]=0$. All the underlying r.v.’s and the random vector z have finite 2nd moments, hence
\[ {\hat{y}_{0}}:={c^{T}}{z_{0}}+{\beta _{0x}}+{\beta _{x}^{T}}{\left({x_{0}},{x_{0}^{2}},\dots ,{x_{0}^{k}}\right)^{T}}\]
is the best mean squared error predictor of ${y_{0}}$. We estimate the coefficients c, ${\beta _{0x}}$ and ${\beta _{x}}$ using the sample (14) from the polynomial model. The OLS estimator minimizes the penalty function
\[ Q\left(c,{\beta _{0}},\beta \right):={\sum \limits_{i=1}^{n}}{\left({y_{i}}-{c^{T}}{z_{i}}-{\beta _{0}}-{\beta ^{T}}{\left({x_{i}},{x_{i}^{2}},\dots ,{x_{i}^{k}}\right)^{T}}\right)^{2}},\]
$c\in {\mathbb{R}^{q}}$, ${\beta _{0}}\in \mathbb{R}$, $\beta \in {\mathbb{R}^{k}}$. The OLS estimator can be computed by relations similar to (17)–(18):
(38)
\[\begin{array}{l}\displaystyle \bar{y}={\hat{c}^{T}}\bar{z}+{\hat{\beta }_{0x}}+{\hat{\beta }_{x}^{T}}{\left(\overline{x},\overline{{x^{2}}},\dots ,\overline{{x^{k}}}\right)^{T}},\\ {} \displaystyle \left(\substack{\hat{c}\\ {} {\hat{\beta }_{x}}}\right)={S_{rr}^{+}}{S_{ry}},\hspace{1em}r:={({z^{T}},x,\dots ,{x^{k}})^{T}};\end{array}\]
the sample covariance matrices ${S_{rr}}$ and ${S_{ry}}$ are defined in (16). The corresponding OLS predictor is
(39)
\[ {\tilde{y}_{0}}:={\hat{c}^{T}}{z_{0}}+{\hat{\beta }_{0x}}+{\hat{\beta }_{x}^{T}}{\left({x_{0}},{x_{0}^{2}},\dots ,{x_{0}^{k}}\right)^{T}}.\]
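As an illustration (ours, not from the paper), a NumPy sketch of the OLS predictor (39) computed via (38); the function name `poly_ols_predict` and the data layout are assumptions.

```python
import numpy as np

def poly_ols_predict(y, Z, x, z0, x0, k):
    """OLS predictor (39): regress y on (z, x, x^2, ..., x^k) via (38) and predict at (z0, x0).
    y: (n,), Z: (n, q), x: (n,) data; z0: (q,), x0: scalar; k: polynomial degree."""
    n = len(y)
    powers = np.vander(x, N=k + 1, increasing=True)[:, 1:]   # columns x, x^2, ..., x^k
    R = np.hstack([Z, powers])                                # r_i = (z_i^T, x_i, ..., x_i^k)^T
    Rc, yc = R - R.mean(axis=0), y - y.mean()
    coef = np.linalg.pinv(Rc.T @ Rc / n) @ (Rc.T @ yc / n)    # (c_hat; beta_x_hat), cf. (38)
    beta0_hat = y.mean() - R.mean(axis=0) @ coef              # intercept from (38)
    r0 = np.concatenate([z0, x0 ** np.arange(1, k + 1)])
    return beta0_hat + r0 @ coef
```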
Theorem 4.
Proof.
Following the lines of the proof of Theorem 2, it is enough to check the strong consistency of the estimators $\hat{c}$ and ${\hat{\beta }_{x}}$. We have a.s. as $n\to \infty $:
(40)
\[ {S_{rr}}\to \operatorname{\mathbf{Cov}}(r)=\text{block-diag}\left({\Sigma _{z}},D\right),\hspace{1em}D:=\operatorname{\mathbf{Cov}}\left({(x,{x^{2}},\dots ,{x^{k}})^{T}}\right),\]
(41)
\[ {S_{ry}}\to \left(\substack{\operatorname{\mathbf{Cov}}\left(z,y\right)\\ {} \operatorname{\mathbf{Cov}}\left({(x,\dots ,{x^{k}})^{T}},y\right)}\right)=\left(\substack{{\Sigma _{z}}\cdot c\\ {} D\cdot {\beta _{x}}}\right).\]
By conditions (b) and (d), x is a nondegenerate Gaussian r.v., therefore, r.v.’s $1,x,\dots ,{x^{k}}$ are linearly independent in the Hilbert space ${L_{2}}\left(\Omega ,\operatorname{\mathbf{P}}\right)$ of square integrable r.v.’s, and the covariance matrix D is nonsingular. Relations (38), (40), and (41) imply that a.s. as $n\to \infty $
\[ \left(\substack{\hat{c}\\ {} {\hat{\beta }_{x}}}\right)\to \left(\substack{{\Sigma _{z}^{-1}}{\Sigma _{z}}c\\ {} {D^{-1}}D{\beta _{x}}}\right)=\left(\substack{c\\ {} {\beta _{x}}}\right).\]
And the statements of Theorem 4 follow. □
Similarly to Theorem 3 one can construct a consistent estimator of the mean predictor (2) in the polynomial EIV model. The strongly consistent estimator of μ is given in (28) and the one of ${\sigma _{x}^{2}}$ is constructed similarly to (29):
(42)
\[ {\hat{\sigma }_{x}^{2}}=\frac{1}{n-1}{\sum \limits_{i=1}^{n}}{\left({x_{i}}-\bar{x}\right)^{2}}.\]
Theorem 5.
Assume conditions (a) to (d) and suppose that ${\sigma _{\epsilon \delta }}$ defined in (32) is the only parameter which is known in the model (30), (31). Consider the estimators (39), (28), and (42). Then
\[ {\tilde{\eta }_{0}}:={\tilde{y}_{0}}-{\sigma _{\epsilon \delta }}{\hat{\sigma }_{x}^{-2}}({x_{0}}-\hat{\mu })\]
is a strongly consistent estimator of the mean predictor (2), and moreover,
3.4 Confidence interval for response in quadratic model
Consider a quadratic EIV model
(43)
\[ y={\beta _{0}}+{\beta _{1}}\xi +{\beta _{2}}{\xi ^{2}}+e,\]
(44)
\[ x=\xi +\delta .\]
It is a particular case of the model (30), (31) with $k=2$, $z=0$ and $\epsilon =0$.
We use notations (32). Our conditions are similar to (a)–(d), but we assume additionally that the reliability ratio
(45)
\[ K:=\frac{{\sigma _{\xi }^{2}}}{{\sigma _{x}^{2}}}\]
is bounded away from zero. Thus, assume the following conditions.
-
(e) The random variables ξ, e and δ are independent; ξ and δ are Gaussian; e and δ have zero mean and ${\sigma _{e}^{2}}<\infty $; ${\sigma _{x}^{2}}>0$.
-
(f) Model parameters are unknown, but a lower bound ${K_{0}}$ for the reliability ratio (45) is given, with $0<{K_{0}}\le 1\hspace{-0.1667em}/2$.
Consider independent copies of the quadratic model
\[ \left({y_{i}},{\xi _{i}},{e_{i}},{x_{i}},{\delta _{i}}\right),\hspace{1em}i=1,\dots ,n.\]
Based on observations $({y_{i}},{x_{i}}),\hspace{0.2778em}i=1,\dots ,n$, and for a given ${x_{0}}$, we can construct the OLS predictor ${\tilde{y}_{0}}$, see (39), for ${y_{0}}$ with $k=2$, ${z_{0}}=0$. Now, we show how to construct an asymptotic confidence interval for ${y_{0}}$. (In a similar way this can be done for a polynomial EIV model of higher order.)
First we write down the representation (36), (37). Denote
(46)
\[ {m_{x}}:=\operatorname{\mathbf{E}}\left[\left.\xi \right|x\right]=(1-K)\mu +Kx.\]
We have with independent ${m_{x}}$ and γ:
(47)
\[ \xi ={m_{x}}+\gamma ,\hspace{1em}\gamma \sim \mathcal{N}\left(0,K{\sigma _{\delta }^{2}}\right).\]
Then
(48)
\[\begin{array}{l}\displaystyle y={\beta _{0}}+{\beta _{1}}({m_{x}}+\gamma )+{\beta _{2}}{({m_{x}}+\gamma )^{2}}+e\\ {} \displaystyle \hspace{1em}={\beta _{0}}+{\beta _{1}}{m_{x}}+{\beta _{2}}\left({m_{x}^{2}}+K{\sigma _{\delta }^{2}}\right)+u=:\hat{y}+u,\end{array}\]
where
(49)
\[ u:=e+\left({\beta _{1}}+2{\beta _{2}}{m_{x}}\right)\gamma +{\beta _{2}}\left({\gamma ^{2}}-K{\sigma _{\delta }^{2}}\right).\]
Here $\operatorname{\mathbf{E}}\left(\left.u\right|x\right)=0$. From (46) and (48) we get that the best prediction is
(50)
\[\begin{array}{l}\displaystyle \hat{y}={\beta _{0x}}+{\beta _{1x}}\cdot x+{\beta _{2x}}\cdot {x^{2}},\\ {} \displaystyle {\beta _{1x}}={\beta _{1}}K+2{\beta _{2}}K(1-K)\mu ,\hspace{2.5pt}{\beta _{2x}}={\beta _{2}}\cdot {K^{2}}.\end{array}\]
Those coefficients can be estimated using the strongly consistent OLS estimator, cf. (38),
\[ \left(\substack{{\hat{\beta }_{1x}}\\ {} {\hat{\beta }_{2x}}}\right)={S_{rr}^{+}}{S_{ry}},\hspace{1em}r:={(x,{x^{2}})^{T}}.\]
The OLS estimator ${\hat{\beta }_{0x}}$ satisfies
\[ \bar{y}={\hat{\beta }_{0x}}+{\hat{\beta }_{1x}}\bar{x}+{\hat{\beta }_{2x}}\overline{{x^{2}}},\]
and the OLS predictor of ${y_{0}}$ is equal to
\[ {\tilde{y}_{0}}={\hat{\beta }_{0x}}+{\hat{\beta }_{1x}}{x_{0}}+{\hat{\beta }_{2x}}{x_{0}^{2}}.\]
To construct a confidence interval for ${y_{0}}$, we have to bound the conditional variance of u given ${x_{0}}$. From (49) we have
\[ \operatorname{\mathbf{Var}}\left(\left.u\right|x\right)={\sigma _{e}^{2}}+{({\beta _{1}}+2{m_{x}}{\beta _{2}})^{2}}K{\sigma _{\delta }^{2}}+{\beta _{2}^{2}}\cdot 2{\left(K{\sigma _{\delta }^{2}}\right)^{2}},\]
where $2{\left(K{\sigma _{\delta }^{2}}\right)^{2}}=\operatorname{\mathbf{Var}}({\gamma ^{2}})$. Denote
\[ {m_{{u^{2}}}}=\operatorname{\mathbf{E}}\left[\operatorname{\mathbf{Var}}\left(\left.u\right|x\right)\right].\]
It holds $\hspace{2.5pt}\text{a.s. as}\hspace{2.5pt}n\to \infty $:
\[ \frac{1}{n}{\sum \limits_{i=1}^{n}}{\left({y_{i}}-{\beta _{0x}}-{\beta _{1x}}{x_{i}}-{\beta _{2x}}{x_{i}^{2}}\right)^{2}}\to {m_{{u^{2}}}}.\]
Therefore, we have $\hspace{2.5pt}\text{a.s. as}\hspace{2.5pt}n\to \infty $:
\[ {\hat{m}_{{u^{2}}}}:=\frac{1}{n}{\sum \limits_{i=1}^{n}}{\left({y_{i}}-{\hat{\beta }_{0x}}-{\hat{\beta }_{1x}}{x_{i}}-{\hat{\beta }_{2x}}{x_{i}^{2}}\right)^{2}}\to {m_{{u^{2}}}}.\]
We have to bound the difference
(52)
\[\begin{array}{l}\displaystyle \operatorname{\mathbf{Var}}\left(\left.u\right|x\right)-{m_{{u^{2}}}}=4K{\sigma _{\delta }^{2}}\left({\beta _{2}^{2}}{K^{2}}\cdot F(K,x,\mu )+{\beta _{1}}{\beta _{2}}K(x-\mu )\right),\\ {} \displaystyle F(k,x,\mu ):={x^{2}}-{\mu ^{2}}-{\sigma _{x}^{2}}+2K(1-K)\mu (x-\mu ).\end{array}\]
Here we used the relations
\[\begin{array}{l}\displaystyle {m_{x}}-\operatorname{\mathbf{E}}{m_{x}}=K(x-\mu ),\\ {} \displaystyle {m_{x}^{2}}-\operatorname{\mathbf{E}}{m_{x}^{2}}={K^{2}}\left({x^{2}}-{\mu ^{2}}-{\sigma _{x}^{2}}\right)+2K(1-K)\mu (x-\mu ).\end{array}\]
Next, we express (52) through ${\beta _{ix}}$ rather than ${\beta _{i}}$. Using (50) we get:
\[\begin{array}{l}\displaystyle {\sigma _{\delta }^{2}}={\sigma _{x}^{2}}(1-K),\\ {} \displaystyle \begin{aligned}{}& \operatorname{\mathbf{Var}}\left(\left.u\right|x\right)-{m_{{u^{2}}}}=4(1-K){\sigma _{x}^{2}}\cdot \frac{{\beta _{2x}^{2}}}{K}\left(F(K,x,\mu )-\frac{2(1-K)}{K}\mu (x-\mu )\right)+\\ {} & \hspace{2em}+4(1-K){\sigma _{x}^{2}}{\beta _{1x}}{\beta _{2x}}\cdot \frac{x-\mu }{K}\le 4\left(\frac{1}{{K_{0}}}-1\right){\sigma _{x}^{2}}\cdot G(x,\mu ,{\sigma _{x}^{2}},{\beta _{1x}},{\beta _{2x}}),\end{aligned}\\ {} \displaystyle \begin{aligned}{}& G(x,\mu ,{\sigma _{x}^{2}},{\beta _{1x}},{\beta _{2x}})={\beta _{2x}^{2}}\bigg[{x^{2}}-{\mu ^{2}}-{\sigma _{x}^{2}}+\\ {} & \hspace{2em}\hspace{2em}\hspace{2em}+2{\left(\mu (x-\mu )\right)_{-}}{(1-{K_{0}})^{2}}\left(1+\frac{1}{{K_{0}}}\right)\bigg]+{\left({\beta _{1x}}{\beta _{2x}}(x-\mu )\right)_{+}}.\end{aligned}\end{array}\]
Here ${A_{+}}:=\max (A,0)$, ${A_{-}}:=-\min (A,0)$, $A\in \mathbb{R}$. Finally,
(53)
\[ \operatorname{\mathbf{Var}}\left(\left.u\right|x\right)\le {m_{{u^{2}}}}+4\left(\frac{1}{{K_{0}}}-1\right){\sigma _{x}^{2}}\cdot G(x,\mu ,{\sigma _{x}^{2}},{\beta _{1x}},{\beta _{2x}}).\]
We are ready to construct a confidence interval for ${y_{0}}$.
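For illustration, here is a sketch (ours, not the authors' code) of the interval ${I_{\alpha }}$ defined in Theorem 6 below, using the function G introduced above; the function names `G` and `conf_interval` are assumptions.

```python
import numpy as np

def G(x, mu, sigma_x2, b1x, b2x, K0):
    """The bounding function G from Section 3.4, with A_+ = max(A, 0), A_- = -min(A, 0)."""
    a_minus = -min(mu * (x - mu), 0.0)
    a_plus = max(b1x * b2x * (x - mu), 0.0)
    return b2x**2 * (x**2 - mu**2 - sigma_x2
                     + 2.0 * a_minus * (1 - K0)**2 * (1 + 1.0 / K0)) + a_plus

def conf_interval(y0_pred, x0, mu_hat, sigma_x2_hat, m_u2_hat, b1x_hat, b2x_hat, K0, alpha):
    """Interval I_alpha of Theorem 6 around the OLS predictor y0_pred = y0_tilde."""
    bound = m_u2_hat + 4.0 * (1.0 / K0 - 1.0) * sigma_x2_hat * G(
        x0, mu_hat, sigma_x2_hat, b1x_hat, b2x_hat, K0)
    half_width = alpha ** (-0.5) * np.sqrt(max(bound, 0.0))   # [...]_+^{1/2}
    return y0_pred - half_width, y0_pred + half_width
```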
Theorem 6.
For the model (43)–(44), assume conditions (e) and (f). Fix the confidence probability $1-\alpha $. Define
\[\begin{aligned}{}{I_{\alpha }}& =\Bigg\{h\in \mathbb{R}:\left|h-{\tilde{y}_{0}}\right|\le {\alpha ^{-1\hspace{-0.1667em}/2}}\times \\ {} & \hspace{2em}\hspace{2em}\hspace{2em}\hspace{1em}\hspace{2.5pt}\times {\left[{\hat{m}_{{u^{2}}}}+4\left(\frac{1}{{K_{0}}}-1\right){\hat{\sigma }_{x}^{2}}G({x_{0}},\hat{\mu },{\hat{\sigma }_{x}^{2}},{\hat{\beta }_{1x}},{\hat{\beta }_{2x}})\right]_{+}^{1\hspace{-0.1667em}/2}}\Bigg\},\end{aligned}\]
where ${\hat{m}_{{u^{2}}}}$, ${\hat{\sigma }_{x}^{2}}$, $\hat{\mu }$, ${\hat{\beta }_{1x}}$ and ${\hat{\beta }_{2x}}$ are strongly consistent estimators of the corresponding parameters; the estimators were presented above. Then
Proof.
It holds for $t>0$:
\[ \operatorname{\mathbf{P}}\left(\left.\left|{y_{0}}-{\hat{y}_{0}}\right|>t\right|{x_{0}}\right)\le \frac{\operatorname{\mathbf{Var}}\left(\left.u\right|{x_{0}}\right)}{{t^{2}}}\le \alpha \]
if t is selected such that $t\ge {\alpha ^{-1\hspace{-0.1667em}/2}}{\left[\operatorname{\mathbf{Var}}\left(\left.u\right|{x_{0}}\right)\right]^{1\hspace{-0.1667em}/2}}$. Now, the statement follows from the inequality (53) and the consistency of ${\tilde{y}_{0}}$, ${\hat{m}_{{u^{2}}}}$, ${\hat{\sigma }_{x}^{2}}$, $\hat{\mu }$, ${\hat{\beta }_{1x}}$ and ${\hat{\beta }_{2x}}$. □
4 Prediction in other EIV models
The OLS predictor $\tilde{y}$ approximates the best mean squared error predictor $\hat{y}$ presented in (1) not only in the polynomial EIV model. Let us consider the model with an exponential regression function
(54)
\[ y=\beta {e^{\lambda \xi }}+e,\hspace{2em}x=\xi +\delta ,\]
where the real numbers β and λ are unknown regression parameters, and assume condition (e) from Section 3.4. Using the expansion (46)–(47), we get
(56)
\[\begin{array}{l}\displaystyle {\beta _{x}}=\beta {e^{\lambda (1-K)\mu }}\cdot \operatorname{\mathbf{E}}{e^{\lambda \gamma }},\hspace{2.5pt}{\lambda _{x}}=K\lambda ,\hspace{2.5pt}\operatorname{\mathbf{E}}{e^{\lambda \gamma }}=\exp \left(\frac{{\lambda ^{2}}K{\sigma _{\delta }^{2}}}{2}\right),\\ {} \displaystyle u={\beta _{x}}{e^{{\lambda _{x}}\cdot x}}({e^{\lambda \gamma }}-\operatorname{\mathbf{E}}{e^{\lambda \gamma }}).\end{array}\]
Under mild conditions, the OLS predictor ${\tilde{y}_{0}}:={\hat{\beta }_{x}}\exp \left({\hat{\lambda }_{x}}\cdot {x_{0}}\right)$ is a strongly consistent estimator of ${\hat{y}_{0}}$, where ${\hat{\beta }_{x}}$ and ${\hat{\lambda }_{x}}$ are the OLS estimators of the regression parameters in the model (54).
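A minimal sketch of such a nonlinear least squares fit on the surrogate data (ours, not the paper's procedure); the function name `exp_ols_predict` and the starting point `p0` are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_ols_predict(x, y, x0, p0=(1.0, 0.1)):
    """Fit y ~ beta_x * exp(lambda_x * x) to the surrogate data by least squares and predict at x0.
    The fitted coefficients estimate the transformed parameters (beta_x, lambda_x) of (56),
    not the original (beta, lambda); p0 is a starting point for the nonlinear optimizer."""
    model = lambda t, beta_x, lambda_x: beta_x * np.exp(lambda_x * t)
    (beta_x_hat, lambda_x_hat), _ = curve_fit(model, x, y, p0=p0)
    return beta_x_hat * np.exp(lambda_x_hat * x0)
```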
Similar conclusion can be made for the trigonometric model
\[ y={a_{0}}+{\sum \limits_{k=1}^{m}}\left({a_{k}}\cos k\omega \xi +{b_{k}}\sin k\omega \xi \right)+e,\hspace{2em}x=\xi +\delta ,\]
where ${a_{k}},0\le k\le m$, ${b_{k}},1\le k\le m$, and $\omega >0$ are unknown regression parameters.
Finally, we give an example of a model where the OLS predictor does not approximate the best mean squared error predictor. Let
(57)
\[ y=\beta \left|\xi +a\right|+e,\hspace{2em}x=\xi +\delta ,\]
where the real numbers β and a are unknown regression parameters, and assume condition (e) from Section 3.4; suppose also that ${\sigma _{\xi }^{2}}$ and ${\sigma _{\delta }^{2}}$ are positive.
For ${\gamma _{0}}\sim \mathcal{N}(0,1)$, evaluate
\[ F(a):=\operatorname{\mathbf{E}}|{\gamma _{0}}+a|=2\phi (a)+a(2\Phi (a)-1),\hspace{1em}a\in \mathbb{R},\]
where ϕ and Φ are the pdf and cdf of ${\gamma _{0}}$. Then the best mean squared error predictor is as follows:
\[\begin{array}{l}\displaystyle \begin{aligned}{}& \hat{y}=\operatorname{\mathbf{E}}\left(\left.y\right|x\right)=\beta \operatorname{\mathbf{E}}\bigg[\Big|a+Kx+(1-K)\mu +{\sigma _{\delta }}\sqrt{K}{\gamma _{0}}\Big|\bigg|x\bigg]=\\ {} & \hspace{2em}\hspace{2em}\hspace{2em}\hspace{2em}\hspace{2em}={\beta _{x}}F\left({k_{x}}\cdot x+{b_{x}}\right),\hspace{1em}{k_{x}}>0,\hspace{2.5pt}{\beta _{x}}\in \mathbb{R},\hspace{2.5pt}{b_{x}}\in \mathbb{R},\end{aligned}\\ {} \displaystyle {\beta _{x}}=\beta {\sigma _{\delta }}\sqrt{K},\hspace{2.5pt}{k_{x}}=\frac{\sqrt{K}}{{\sigma _{\delta }}},\hspace{2.5pt}{b_{x}}=\frac{a+(1-K)\mu }{{\sigma _{\delta }}\sqrt{K}}.\end{array}\]
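Before turning to the least squares fit, a quick Monte Carlo check (ours, not from the paper) of the closed form for F:

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo check of the identity F(a) = E|gamma_0 + a| = 2*phi(a) + a*(2*Phi(a) - 1).
rng = np.random.default_rng(0)
a = 0.7
mc_value = np.abs(rng.standard_normal(10**6) + a).mean()
closed_form = 2 * norm.pdf(a) + a * (2 * norm.cdf(a) - 1)
print(mc_value, closed_form)   # the two numbers agree up to Monte Carlo error
```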
The LS estimators ${\hat{k}_{x}}$, ${\hat{\beta }_{x}}$ and ${\hat{b}_{x}}$ of ${k_{x}}$, ${\beta _{x}}$ and ${b_{x}}$ minimize the penalty function
\[ {\sum \limits_{i=1}^{n}}{\left({y_{i}}-{\beta _{x}}F\left({k_{x}}{x_{i}}+{b_{x}}\right)\right)^{2}}.\]
Under mild additional conditions, the LS estimators are strongly consistent, and the LS predictor
\[ {\tilde{y}_{0}}:={\hat{\beta }_{x}}F\left({\hat{k}_{x}}{x_{0}}+{\hat{b}_{x}}\right)\]
converges a.s. to ${\hat{y}_{0}}=\operatorname{\mathbf{E}}\left(\left.{y_{0}}\right|{x_{0}}\right)={\beta _{x}}F\left({k_{x}}\cdot {x_{0}}+{b_{x}}\right)$ as the sample size grows. Notice that for this model (57), the OLS predictor $\hat{\beta }\left|{x_{0}}+\hat{a}\right|$ need not converge in probability to ${\hat{y}_{0}}$, where the OLS estimators $\hat{\beta }$ and $\hat{a}$ minimize the penalty function
\[ {\sum \limits_{i=1}^{n}}{\left({y_{i}}-\beta \left|{x_{i}}+a\right|\right)^{2}}.\]
5 Conclusion
We considered structural EIV models with the classical measurement error. We gave a list of models where the OLS predictor of response ${y_{0}}$ converges with probability one to the best mean squared error predictor ${\hat{y}_{0}}=\operatorname{\mathbf{E}}\left[\left.{y_{0}}\right|{z_{0}},{x_{0}}\right]$. In such models, a functional dependence ${\hat{y}_{0}}={\hat{y}_{0}}({z_{0}},{x_{0}})$ belongs to the same parametric family as the initial regression function ${\eta _{0}}({z_{0}},{\xi _{0}})=\operatorname{\mathbf{E}}\left[\left.{y_{0}}\right|{z_{0}},{\xi _{0}}\right]$. Such a situation looks exceptional for nonlinear models, and we gave an example of model (57), where the OLS predictor does not perform well.
We dealt with both the mean and individual prediction. They coincide in the case of nondifferential errors, where it is known that the errors in response and in covariates are uncorrelated. Otherwise, to construct the mean prediction, one has to know the covariance of the errors.
In linear models, we managed to construct an asymptotic confidence region for the response around the OLS prediction with all model parameters unknown. In the quadratic model, we did it given a known lower bound for the reliability ratio. The procedure can be extended to polynomial models of higher order.
Notice that in linear models without intercept and in incomplete polynomial models (such as $y={\beta _{0}}+{\beta _{2}}{\xi ^{2}}+e$, $x=\xi +\delta $), a prediction with $(z,x)$ naively substituted for $(z,\xi )$ in the regression of y on $(z,\xi )$ can have large prediction errors. As stated in [2, Section 2.6], predicting y from $(z,x)$ is merely a matter of substituting known values of x and z into the regression model for y on $(z,x)$. We can add that, in nonlinear EIV models, the corresponding error $v=y-\operatorname{\mathbf{E}}\left[\left.y\right|z,x\right]$ has a variance depending on x, i.e., the regression of y on $(z,x)$ is heteroskedastic; this should be taken into account in order to construct a confidence region for y in a proper way.
Finally, we make a caveat for practitioners. Consistent estimators of the EIV regression parameters are especially useful for prediction when the observation errors for the predicted subject differ from those in the data used for the model fitting. This is usually the case when the model is fitted to some experimental data while the prediction is made for a real-world subject. Using the inconsistent OLS estimators for prediction in this case is a bad idea.