1 Introduction
Regression models with measurement errors in covariates are widely studied [1, 2, 4]; see also [5] for a comparison of various estimation methods in such models.
We consider a linear regression model in the presence of classical and Berkson errors in the covariate:
(1.1)
\[ y={\beta _{0}}+{\beta _{1}}\xi +\varepsilon ,\hspace{2.5pt}w=x+\delta ,\hspace{2.5pt}\xi =x+u.\]
Here, y is the observable response variable, ξ and x are unobservable latent variables, and w is the observable surrogate variable; ε, δ and u are centred errors: ε is the error in the response, δ is the classical measurement error, and u is the Berkson measurement error; the random variables x, ε, δ and u are independent.
In model (1.1), we have a mixture of the classical and Berkson errors. Let $\mathsf{D}$ stand for the variance. We indicate two extreme cases. If $\mathsf{D}\delta =0$, then $w=x$ and (1.1) reduces to the Berkson model
(1.2)
\[ y={\beta _{0}}+{\beta _{1}}\xi +\varepsilon ,\hspace{2.5pt}\xi =w+u.\]
If $\mathsf{D}u=0$, then $\xi =x$ and (1.1) reduces to the classical measurement error model
(1.3)
\[ y={\beta _{0}}+{\beta _{1}}\xi +\varepsilon ,\hspace{2.5pt}w=\xi +\delta .\]
Thus, the model (1.1) combines seminal models (1.2) and (1.3).
Models with a mixture of the classical and Berkson errors appear in radio-epidemiology. In [4, Section 7.2] the following measurement error model is considered:
(1.4)
\[ {D_{i}^{mes}}={\bar{{D_{i}}}^{tr}}+{\sigma _{i}}{\gamma _{i}},\hspace{2.5pt}{D_{i}^{tr}}={\bar{{D_{i}}}^{tr}}{\delta _{F,i}}.\]
Here, ${D_{i}^{mes}}$ is the measured individual instrumental absorbed thyroid dose for the ith person of a cohort of persons residing in Ukrainian regions that suffered from the Chornobyl accident, ${D_{i}^{tr}}$ is the corresponding true absorbed thyroid dose (i.e., the first latent variable), and ${\bar{{D_{i}}}^{tr}}$ is the second latent variable; ${\sigma _{i}}{\gamma _{i}}$ is the additive classical error, ${\delta _{F,i}}$ is the multiplicative Berkson error, ${\sigma _{i}}$ is the standard deviation of the heteroscedastic classical measurement error, ${\gamma _{i}}$ is standard normal and ${\delta _{F,i}}$ is a lognormal random variable; ${\bar{{D_{i}}}^{tr}}$, ${\gamma _{i}}$ and ${\delta _{F,i}}$ are independent random variables.
In [4], the model (1.4) is combined with the binary model which resembles a logistic one:
(1.5)
\[ \operatorname{\mathsf{P}}({Y_{i}}=1|{D_{i}^{tr}})=\frac{{\lambda _{i}}}{1+{\lambda _{i}}},\hspace{2.5pt}\operatorname{\mathsf{P}}({Y_{i}}=0|{D_{i}^{tr}})=\frac{1}{1+{\lambda _{i}}},\]
where ${\lambda _{i}}$ is the total incidence rate related to cases of thyroid cancer,
(1.6)
\[ {\lambda _{i}}={\lambda _{0}}+EAR\cdot {D_{i}^{tr}}.\]
Here, the positive regression coefficients ${\lambda _{0}}$ and $EAR$ are the background incidence rate and the excess absolute risk, respectively. In the binary model (1.5), (1.6), (1.4), the observed sample consists of pairs $({Y_{i}},{D_{i}^{mes}}),i=1,\dots ,N$, where ${Y_{i}}=1$ in the case of detected disease, and ${Y_{i}}=0$ in the absence of disease within some time interval.
The presented linear model (1.1) is a simplified analogue of this binary measurement error model: ξ, w and x are counterparts of ${D_{i}^{tr}}$, ${D_{i}^{mes}}$ and ${\bar{{D_{i}}}^{tr}}$, respectively, the binary model (1.5), (1.6) is replaced with the linear regression, and the multiplicative Berkson error ${\delta _{F,i}}$ is replaced with the additive Berkson error u.
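For illustration, the following Python sketch generates synthetic observations $({Y_{i}},{D_{i}^{mes}})$ according to (1.4)–(1.6); all numerical values and the particular lognormal parameters are chosen arbitrarily here and are not taken from [4].

import numpy as np

rng = np.random.default_rng(0)
n = 1000
lam0, EAR = 0.001, 0.005                       # illustrative background rate and excess absolute risk
D_bar_tr = rng.lognormal(0.0, 1.0, n)          # second latent variable, \bar{D}_i^{tr}
sigma_i = 0.3 * D_bar_tr                       # illustrative heteroscedastic error standard deviation
gamma_i = rng.standard_normal(n)
D_mes = D_bar_tr + sigma_i * gamma_i           # additive classical error, first equation of (1.4)
delta_F = rng.lognormal(-0.125, 0.5, n)        # multiplicative lognormal Berkson error, E delta_F = 1
D_tr = D_bar_tr * delta_F                      # true dose, second equation of (1.4)
lam = lam0 + EAR * D_tr                        # total incidence rate, cf. (1.6)
Y = rng.binomial(1, lam / (1.0 + lam))         # binary response, (1.5)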
The goal of the present paper is to study asymptotic properties of estimators of model parameters in the linear regression (1.1). The modest aim is to have a better understanding of the binary model (1.5), (1.6), (1.4) and similar models.
The paper is organized as follows. In Section 2, we present the observation model in more detail and, under the normality of x and u, derive from the underlying model one like (1.3) with the classical error only. Along the way, we obtain consistent estimators for ${\beta _{0}}$ and ${\beta _{1}}$ which unexpectedly coincide with the adjusted least squares estimators [2, 4] constructed by ignoring the Berkson error u. The proposed estimators remain consistent without the normality of x and u. Section 3 gives conditions for the asymptotic normality of the estimators, and we divide them into two asymptotically independent groups. In doing so, we reparametrize the model similarly to [3], where the basic model (1.3) was studied. Section 4 presents a small simulation study, and Section 5 concludes our findings.
We use the following notation. The symbol $\mathsf{E}$ denotes expectation and acts as an operator on the entire product that follows it, $\operatorname{\mathbf{cov}}$ stands for the covariance of two random variables and for the covariance matrix of a random vector. The upper index ⊤ denotes transposition. In the paper, all vectors are column vectors. The bar means averaging over $i=1,\dots ,n$, e.g., $\overline{a}:={n^{-1}}{\textstyle\sum _{i=1}^{n}}{a_{i}}$, $\overline{a{b^{\top }}}:={n^{-1}}{\textstyle\sum _{i=1}^{n}}{a_{i}}{b_{i}^{\top }}$. The sample covariance of random variables $\{{a_{i}},{b_{i}},\hspace{2.5pt}i=1,\dots ,n\}$ is denoted as ${S_{ab}}$, i.e., ${S_{ab}}={n^{-1}}{\textstyle\sum _{i=1}^{n}}({a_{i}}-\overline{a})({b_{i}}-\overline{b})$. Convergence with probability 1 and convergence in distribution are denoted as $\stackrel{\text{P1}}{\to }$ and $\stackrel{\text{d}}{\to }$, respectively. A sequence of random variables that converges to zero in probability is denoted as ${o_{p}}(1)$, and a sequence of random variables bounded in probability is denoted as ${O_{p}}(1)$. ${I_{p}}$ stands for the identity $p\times p$ matrix.
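In numerical work, the bar and sample-covariance notation corresponds to the following minimal Python sketch (the function names are ours and purely illustrative):

import numpy as np

def bar(a):
    # \overline{a} = n^{-1} \sum_{i=1}^n a_i
    return np.mean(a, axis=0)

def S(a, b):
    # S_{ab} = n^{-1} \sum_{i=1}^n (a_i - \overline{a})(b_i - \overline{b}), normalized by n, not n - 1
    return np.mean((a - bar(a)) * (b - bar(b)))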
2 Construction of estimators for the normal latent variable and the normal Berkson error
2.1 Model and assumptions
We consider the structural model (1.1). Denote $\mu =\mathsf{E}\xi $ and let ${\sigma _{y}^{2}}$, ${\sigma _{\xi }^{2}}$, ${\sigma _{\varepsilon }^{2}}$, ${\sigma _{w}^{2}}$, ${\sigma _{x}^{2}}$, ${\sigma _{\delta }^{2}}$ and ${\sigma _{u}^{2}}$ be the variances of y, ξ, ε, w, x, δ and u, respectively. We need the following conditions for the consistency of the proposed estimators of model parameters.
- (i) Random variables x, ε, δ and u are independent.
- (ii) Random variables ε, δ and u have zero expectations and finite variances, and x has a finite and positive variance ${\sigma _{x}^{2}}$.
- (iii) The variances ${\sigma _{\delta }^{2}}$ and ${\sigma _{u}^{2}}$ are positive and known, while the other model parameters ${\beta _{0}}$, ${\beta _{1}}$, μ, ${\sigma _{\varepsilon }^{2}}$, ${\sigma _{x}^{2}}$ are unknown.
Consider independent copies of model (1.1):
\[ {y_{i}}={\beta _{0}}+{\beta _{1}}{\xi _{i}}+{\varepsilon _{i}},\hspace{2.5pt}{w_{i}}={x_{i}}+{\delta _{i}},\hspace{2.5pt}{\xi _{i}}={x_{i}}+{u_{i}},\hspace{2.5pt}i=1,2,\dots \]
Under assumption (i), this means that the random vectors ${({x_{i}},{\varepsilon _{i}},{\delta _{i}},{u_{i}})^{\top }},i=1,2,\dots $, are i.i.d. and have the same distribution as ${(x,\varepsilon ,\delta ,u)^{\top }}$. Based on observations $({y_{i}}$, ${w_{i}})$, $i=1,\dots ,n$, we want to estimate the unknown model parameters.
Remark 1.
We allow ${\sigma _{\varepsilon }^{2}}=0$. The corresponding model (with $\varepsilon =0$) is called a data model.
Now, we explain why we impose condition (iii). The classical errors-in-variables model (1.3), with normally distributed ξ, ε and δ and 6 unknown parameters ${\beta _{0}}$, ${\beta _{1}}$, μ, ${\sigma _{\xi }^{2}}$, ${\sigma _{\varepsilon }^{2}}$, ${\sigma _{\delta }^{2}}$, is not identifiable [2]. Hence for the model (1.1), condition (iii) assumes ${\sigma _{\delta }^{2}}$ to be known. The next statement explains why we suppose that ${\sigma _{u}^{2}}$ is known as well.
Lemma 1.
Consider the model (1.1) under conditions (i) and (ii). Let ${\sigma _{\delta }^{2}}$ be known and random variables x, ε, δ and u be Gaussian. Then this model with 6 unknown parameters ${\beta _{0}}$, ${\beta _{1}}$, μ, ${\sigma _{x}^{2}}$, ${\sigma _{\varepsilon }^{2}}$, ${\sigma _{u}^{2}}$ is not identifiable.
Proof.
The distribution of the observed Gaussian vector $Z:={(y,w)^{\top }}$ is uniquely defined by $\mathsf{E}Z$ and $C:=\operatorname{\mathbf{cov}}(Z)$. Introduce two different collections of model parameters which differ only in the last two components:
\[ ({\beta _{0}},{\beta _{1}},\mu ,{\sigma _{x}^{2}},{\sigma _{\varepsilon }^{2}},{\sigma _{u}^{2}})\hspace{2.5pt}\text{and}\hspace{2.5pt}({\beta _{0}},{\beta _{1}},\mu ,{\sigma _{x}^{2}},{\sigma _{\varepsilon }^{2}}+{\beta _{1}^{2}}c,{\sigma _{u}^{2}}-c),\hspace{2.5pt}0<c<{\sigma _{u}^{2}}.\]
In both cases it holds
\[ \mathsf{E}Z={({\beta _{0}}+{\beta _{1}}\mu ,\hspace{2.5pt}\mu )^{\top }},\hspace{2.5pt}\hspace{2.5pt}C=\left[\begin{array}{c@{\hskip10.0pt}c}{\beta _{1}^{2}}({\sigma _{x}^{2}}+{\sigma _{u}^{2}})+{\sigma _{\varepsilon }^{2}}& {\beta _{1}}{\sigma _{x}^{2}}\\ {} {\beta _{1}}{\sigma _{x}^{2}}& {\sigma _{x}^{2}}+{\sigma _{\delta }^{2}}\end{array}\right].\]
Therefore, the distribution of Z is the same for both collections of parameters, and the model is not identifiable. □
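For instance (a purely illustrative choice), the two collections
\[ ({\beta _{0}},{\beta _{1}},\mu ,{\sigma _{x}^{2}},{\sigma _{\varepsilon }^{2}},{\sigma _{u}^{2}})=(0,1,0,1,1,2)\hspace{2.5pt}\text{and}\hspace{2.5pt}(0,1,0,1,2,1)\]
yield the same
\[ \mathsf{E}Z={(0,0)^{\top }},\hspace{2.5pt}\hspace{2.5pt}C=\left[\begin{array}{c@{\hskip10.0pt}c}4& 1\\ {} 1& 1+{\sigma _{\delta }^{2}}\end{array}\right],\]
whatever the known value ${\sigma _{\delta }^{2}}>0$ is.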
Notice that under conditions of Lemma 1, the parameters ${\beta _{0}}$ and ${\beta _{1}}$ are identifiable (see [2] for the definition of an identifiable parameter). Moreover, in the next subsection we will construct consistent estimators, as $n\to \infty $, for ${\beta _{0}}$ and ${\beta _{1}}$ under the only known parameter ${\sigma _{\delta }^{2}}$.
2.2 Consistent estimators of model parameters
Now, out of (1.1) we derive a linear model with the classical error only. In this section we additionally assume the following.
- (iv) Random variables x and u are normally distributed.
Under (iv), the conditional distribution of x given ξ is as follows [1, 4]:
\[ x|\xi \sim N\big(K\xi +(1-K)\mu ,\hspace{2.5pt}K{\sigma _{u}^{2}}\big).\]
Here, $K:={\sigma _{x}^{2}}/{\sigma _{\xi }^{2}}$ is the reliability ratio [2]; under conditions (ii) and (iii), $0<K<1$. Moreover, x can be decomposed as
\[ x=K\xi +(1-K)\mu +\sqrt{K}{\sigma _{u}}\gamma ,\]
where γ is a standard normal random variable and ξ, γ, ε, δ are mutually independent. Then
\[\begin{array}{l}\displaystyle w=K\xi +(1-K)\mu +\sqrt{K}{\sigma _{u}}\gamma +\delta .\\ {} \displaystyle \frac{w}{K}-\frac{1-K}{K}\mu =\xi +\frac{{\sigma _{u}}}{\sqrt{K}}\gamma +\frac{\delta }{K}.\end{array}\]
Introduce new variables
\[ z:=\frac{w}{K}-\frac{1-K}{K}\mu ,\hspace{2.5pt}\hspace{2.5pt}v:=\frac{{\sigma _{u}}}{\sqrt{K}}\gamma +\frac{\delta }{K}.\]
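Since γ and δ are independent with $\mathsf{D}\gamma =1$, the variance of v is computed directly:
\[ \mathsf{D}v=\frac{{\sigma _{u}^{2}}}{K}\mathsf{D}\gamma +\frac{1}{{K^{2}}}\mathsf{D}\delta =\frac{{\sigma _{u}^{2}}}{K}+\frac{{\sigma _{\delta }^{2}}}{{K^{2}}},\]
which is the value ${\sigma _{v}^{2}}$ used below.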
We derived a linear model with the classical error:
(2.1)
\[ y={\beta _{0}}+{\beta _{1}}\xi +\varepsilon ,\hspace{2.5pt}z=\xi +v,\]
with independent ξ, ε, v and ${\sigma _{v}^{2}}:=\mathsf{D}v={\sigma _{u}^{2}}/K+{\sigma _{\delta }^{2}}/{K^{2}}$.
Suppose for the moment that K is known. Then the adjusted least squares (ALS) estimator ${\widetilde{\beta }_{1}}$ of ${\beta _{1}}$ is consistent and given as [2, 4]:
(2.2)
\[ {\widetilde{\beta }_{1}}:=\frac{{S_{zy}}}{{S_{zz}}-{\sigma _{v}^{2}}}=\frac{\frac{1}{K}{S_{wy}}}{\frac{1}{{K^{2}}}{S_{ww}}-{\sigma _{v}^{2}}}=\frac{{S_{wy}}}{\frac{{S_{ww}}-{\sigma _{\delta }^{2}}}{K}-{\sigma _{u}^{2}}}.\]
When K is unknown, we can estimate it consistently as
(2.3)
\[ \widehat{K}=\frac{{\widehat{\sigma }_{x}^{2}}}{{\widehat{\sigma }_{x}^{2}}+{\sigma _{u}^{2}}}=\frac{{S_{ww}}-{\sigma _{\delta }^{2}}}{{S_{ww}}-{\sigma _{\delta }^{2}}+{\sigma _{u}^{2}}}.\]
Now, we insert (2.3) into (2.2) instead of K and obtain the desired estimator
(2.4)
\[ {\widehat{\beta }_{1}}=\frac{{S_{wy}}}{\frac{{S_{ww}}-{\sigma _{\delta }^{2}}}{\widehat{K}}-{\sigma _{u}^{2}}}=\frac{{S_{wy}}}{{S_{ww}}-{\sigma _{\delta }^{2}}}.\]
Next, in model (2.1) the ALS estimator of ${\beta _{0}}$ is as follows [2, 4]:
\[ {\widetilde{\beta }_{0}}:=\overline{y}-{\widetilde{\beta }_{1}}\overline{z}=\overline{y}-{\widetilde{\beta }_{1}}\bigg(\frac{\overline{w}}{K}-\frac{1-K}{K}\mu \bigg).\]
But K and μ are unknown, and instead of them we substitute the corresponding consistent estimators (2.3) and
(2.5)
\[ \widehat{\mu }=\overline{w}.\]
Then ${\widetilde{\beta }_{1}}$ changes to ${\widehat{\beta }_{1}}$, and we obtain the desired estimator
(2.6)
\[ {\widehat{\beta }_{0}}=\overline{y}-{\widehat{\beta }_{1}}\bigg(\frac{\overline{w}}{\widehat{K}}-\frac{1-\widehat{K}}{\widehat{K}}\widehat{\mu }\bigg)=\overline{y}-{\widehat{\beta }_{1}}\overline{w}.\]
It is remarkable that ${\widehat{\beta }_{0}}$ and ${\widehat{\beta }_{1}}$ are the so-called naive ALS estimators in the model (1.1), where we neglected the presence of the Berkson error u. To be precise, ${\widehat{\beta }_{0}}$ and ${\widehat{\beta }_{1}}$ are the ALS estimators for the classical model (1.3). The estimators (2.4), (2.6) use ${\sigma _{\delta }^{2}}$ but not ${\sigma _{u}^{2}}$.
In our model, we have to estimate 5 parameters ${\beta _{0}}$, ${\beta _{1}}$, μ, ${\sigma _{x}^{2}}$, ${\sigma _{\varepsilon }^{2}}$. We already possess 3 estimators (2.6), (2.4) and (2.5). Moreover, we used the estimator
(2.7)
\[ {\widehat{\sigma }_{x}^{2}}={S_{ww}}-{\sigma _{\delta }^{2}}.\]
Finally, in the model (2.1) the ALS estimator of ${\sigma _{\varepsilon }^{2}}$ is as follows [4]:
\[ {\widetilde{\sigma }_{\varepsilon }^{2}}={S_{yy}}-{\widetilde{\beta }_{1}}{S_{zy}}={S_{yy}}-\frac{{\widetilde{\beta }_{1}}}{K}{S_{wy}}.\]
Instead of unknown K, we substitute (2.3) and get the final estimator
(2.8)
\[ {\widehat{\sigma }_{\varepsilon }^{2}}={S_{yy}}-\frac{{\widehat{\beta }_{1}}}{\widehat{K}}{S_{wy}}={S_{yy}}-\frac{{S_{wy}^{2}}({S_{ww}}-{\sigma _{\delta }^{2}}+{\sigma _{u}^{2}})}{{({S_{ww}}-{\sigma _{\delta }^{2}})^{2}}}.\]
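For completeness, a direct numerical transcription of the estimators (2.3)–(2.8) could look as follows (a sketch only; sigma_delta2 and sigma_u2 stand for the known error variances ${\sigma _{\delta }^{2}}$ and ${\sigma _{u}^{2}}$):

import numpy as np

def estimators(y, w, sigma_delta2, sigma_u2):
    # sample covariances with the 1/n normalization used in the paper
    S_ww = np.mean((w - w.mean()) ** 2)
    S_wy = np.mean((w - w.mean()) * (y - y.mean()))
    S_yy = np.mean((y - y.mean()) ** 2)
    sigma_x2_hat = S_ww - sigma_delta2                       # (2.7); requires S_ww > sigma_delta2
    K_hat = sigma_x2_hat / (sigma_x2_hat + sigma_u2)         # (2.3)
    beta1_hat = S_wy / sigma_x2_hat                          # (2.4)
    beta0_hat = y.mean() - beta1_hat * w.mean()              # (2.6)
    mu_hat = w.mean()                                        # (2.5)
    sigma_eps2_hat = S_yy - beta1_hat / K_hat * S_wy         # (2.8)
    return beta0_hat, beta1_hat, mu_hat, sigma_x2_hat, sigma_eps2_hat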
Though we derived the estimators under the normality assumption (iv), they remain consistent without this restriction.
Theorem 1.
In model (1.1), assume conditions (i)–(iii). Then there exists a random number ${n_{0}}$ such that expressions (2.4), (2.6), (2.5), (2.7), (2.8) are well defined with probability 1 for all $n\ge {n_{0}}$ and yield strongly consistent estimators of ${\beta _{1}}$, ${\beta _{0}}$, μ, ${\sigma _{x}^{2}}$, ${\sigma _{\varepsilon }^{2}}$, respectively, i.e., they converge a.s. to the corresponding true values as $n\to \infty $.
Proof.
Here, we check the strong consistency of ${\widehat{\beta }_{1}}$ only. We have
\[\begin{array}{l}\displaystyle \operatorname{\mathbf{cov}}(w,y)=\operatorname{\mathbf{cov}}(x,{\beta _{1}}\xi )={\beta _{1}}\operatorname{\mathbf{cov}}(x,x+u)={\beta _{1}}{\sigma _{x}^{2}},\\ {} \displaystyle {\widehat{\beta }_{1}}\stackrel{\text{P1}}{\to }\frac{\operatorname{\mathbf{cov}}(w,y)}{\mathsf{D}w-{\sigma _{\delta }^{2}}}=\frac{{\beta _{1}}{\sigma _{x}^{2}}}{{\sigma _{x}^{2}}}={\beta _{1}},\end{array}\]
where $\stackrel{\text{P1}}{\to }$ denotes the convergence with probability 1 and indicates the strong consistency of the estimator. □
3 Asymptotic normality of the estimators
3.1 Asymptotic variance of the estimator of slope coefficient
We need the following moment assumption.
- (v) $\mathsf{E}{\delta ^{4}}<\infty $.
Theorem 2.
Assume conditions (i)–(iii) and (v). Then
(3.1)
\[ \sqrt{n}({\widehat{\beta }_{1}}-{\beta _{1}})\stackrel{\text{d}}{\to }N(0,{\sigma _{{\beta _{1}}}^{2}})\hspace{2.5pt}\mathit{as}\hspace{2.5pt}n\to \infty ,\]
with
(3.2)
\[ {\sigma _{{\beta _{1}}}^{2}}=\frac{{\beta _{1}^{2}}\big(\mathsf{D}({\delta ^{2}})+{\sigma _{\delta }^{2}}({\sigma _{w}^{2}}-{\sigma _{\delta }^{2}})+{\sigma _{u}^{2}}{\sigma _{w}^{2}}\big)+{\sigma _{\varepsilon }^{2}}{\sigma _{w}^{2}}}{{({\sigma _{w}^{2}}-{\sigma _{\delta }^{2}})^{2}}}.\]
Proof.
We follow the line of the proof of Theorem 2.22 in [4], and use expansions of sample covariances and Slutsky’s lemma [4, p. 44]. We centralize x as
\[ \rho :=x-\mu .\]
Then
(3.3)
\[\begin{array}{l}\displaystyle y={\beta _{0}}+{\beta _{1}}\mu +{\beta _{1}}\rho +\varepsilon +{\beta _{1}}u,\hspace{2.5pt}\hspace{2.5pt}\hspace{2.5pt}w=\mu +\rho +\delta ,\\ {} \displaystyle {S_{wy}}={S_{\rho +\delta ,\hspace{2.5pt}{\beta _{1}}\rho +\varepsilon +{\beta _{1}}u}}={\beta _{1}}{S_{\rho \rho }}+{S_{\rho \varepsilon }}+{\beta _{1}}{S_{\rho u}}+{\beta _{1}}{S_{\delta \rho }}+{S_{\delta \varepsilon }}+{\beta _{1}}{S_{\delta u}}.\end{array}\]
Using (2.4) and expansions (3.3)–(3.5), we obtain
(3.6)
\[ \sqrt{n}({\widehat{\beta }_{1}}-{\beta _{1}})=\frac{-{\beta _{1}}\sqrt{n}({S_{\delta \delta }}-{\sigma _{\delta }^{2}}-{S_{u\rho }}+{S_{\delta \rho }}-{S_{u\delta }})+\sqrt{n}({S_{\rho \varepsilon }}+{S_{\delta \varepsilon }})}{{\sigma _{x}^{2}}+{o_{p}}(1)}.\]
Next,
\[\begin{array}{l}\displaystyle {S_{\delta \delta }}-{\sigma _{\delta }^{2}}=\overline{{\delta ^{2}}}-{\sigma _{\delta }^{2}}+\frac{{o_{p}}(1)}{\sqrt{n}},\\ {} \displaystyle {S_{u\rho }}=\overline{u\rho }+\frac{{o_{p}}(1)}{\sqrt{n}},\hspace{2.5pt}{S_{\delta \rho }}=\overline{\delta \rho }+\frac{{o_{p}}(1)}{\sqrt{n}},\\ {} \displaystyle {S_{u\delta }}=\overline{u\delta }+\frac{{o_{p}}(1)}{\sqrt{n}},\hspace{2.5pt}{S_{\rho \varepsilon }}=\overline{\rho \varepsilon }+\frac{{o_{p}}(1)}{\sqrt{n}},\hspace{2.5pt}{S_{\delta \varepsilon }}=\overline{\delta \varepsilon }+\frac{{o_{p}}(1)}{\sqrt{n}}.\end{array}\]
We insert these relations into (3.6) and get
(3.7)
\[ \sqrt{n}({\widehat{\beta }_{1}}-{\beta _{1}})={o_{p}}(1)+\frac{-{\beta _{1}}\sqrt{n}(\overline{{\delta ^{2}}}-{\sigma _{\delta }^{2}}-\overline{u\rho }+\overline{\delta \rho }-\overline{u\delta })+\sqrt{n}(\overline{\rho \varepsilon }+\overline{\delta \varepsilon })}{{\sigma _{x}^{2}}}.\]
Using condition (v) and the Central Limit Theorem, we get
\[\begin{array}{l}\displaystyle \sqrt{n}{(\overline{{\delta ^{2}}}-{\sigma _{\delta }^{2}},\overline{u\rho },\overline{\delta \rho },\overline{u\delta },\overline{\rho \varepsilon },\overline{\delta \varepsilon })^{\top }}\stackrel{\text{d}}{\to }\gamma ={({\gamma _{i}})_{1}^{6}}\sim {N_{6}}(0,S),\\ {} \displaystyle S=diag(\mathsf{D}({\delta ^{2}}),{\sigma _{u}^{2}}{\sigma _{x}^{2}},{\sigma _{\delta }^{2}}{\sigma _{x}^{2}},{\sigma _{\delta }^{2}}{\sigma _{u}^{2}},{\sigma _{x}^{2}}{\sigma _{\varepsilon }^{2}},{\sigma _{\delta }^{2}}{\sigma _{\varepsilon }^{2}}).\end{array}\]
The diagonal of S contains the variances of the averaged random variables, e.g., ${S_{22}}=\mathsf{D}(u\rho )=\mathsf{E}{(u\rho )^{2}}={\sigma _{u}^{2}}{\sigma _{x}^{2}}$, and the off-diagonal entries of S vanish because δ, ρ, ε, u are independent. Then the numerator in (3.7) converges in distribution to
\[ -{\beta _{1}}({\gamma _{1}}-{\gamma _{2}}+{\gamma _{3}}-{\gamma _{4}})+{\gamma _{5}}+{\gamma _{6}}\sim N(0,{\beta _{1}^{2}}({S_{11}}+{S_{22}}+{S_{33}}+{S_{44}})+{S_{55}}+{S_{66}}).\]
Relation (3.7) and Slutsky’s lemma imply (3.1) with
(3.8)
\[ {\sigma _{{\beta _{1}}}^{2}}=\frac{{\beta _{1}^{2}}(\mathsf{D}({\delta ^{2}})+{\sigma _{u}^{2}}{\sigma _{x}^{2}}+{\sigma _{\delta }^{2}}{\sigma _{x}^{2}}+{\sigma _{u}^{2}}{\sigma _{\delta }^{2}})+{\sigma _{x}^{2}}{\sigma _{\varepsilon }^{2}}+{\sigma _{\delta }^{2}}{\sigma _{\varepsilon }^{2}}}{{\sigma _{x}^{4}}}.\]
Since ${\sigma _{w}^{2}}={\sigma _{x}^{2}}+{\sigma _{\delta }^{2}}$, the right-hand sides of (3.8) and (3.2) coincide. □
Remark 2.
The convergence (3.1), (3.2) can be applied to construct an asymptotic confidence interval for ${\beta _{1}}$. For this purpose we have to ensure that ${\sigma _{{\beta _{1}}}^{2}}>0$ (this holds if either ${\sigma _{\varepsilon }^{2}}>0$ or ${\beta _{1}}\ne 0$) and to estimate ${\sigma _{{\beta _{1}}}^{2}}$ consistently. The latter is possible for normal δ, since all the parameters on the right-hand side of (3.9) are estimated consistently due to Theorem 1. Without the normality of δ, it is problematic to estimate the 4th moment $\mathsf{D}({\delta ^{2}})$ in (3.8). If $\mathsf{E}{\delta ^{4}}$ is assumed known, then ${\sigma _{{\beta _{1}}}^{2}}$ can be estimated consistently as well.
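As an illustration of this remark for normal δ (so that $\mathsf{D}({\delta ^{2}})=2{\sigma _{\delta }^{4}}$), a plug-in confidence interval for ${\beta _{1}}$ based on (3.8) can be sketched as follows (illustrative code only):

import numpy as np

def ci_beta1(y, w, sigma_delta2, sigma_u2, z=1.959964):
    # plug-in estimates of Section 2
    n = len(y)
    S_ww = np.mean((w - w.mean()) ** 2)
    S_wy = np.mean((w - w.mean()) * (y - y.mean()))
    S_yy = np.mean((y - y.mean()) ** 2)
    sx2 = S_ww - sigma_delta2                                # sigma_x^2 hat, (2.7)
    K = sx2 / (sx2 + sigma_u2)                               # (2.3)
    b1 = S_wy / sx2                                          # (2.4)
    se2 = S_yy - b1 / K * S_wy                               # (2.8)
    D_delta2 = 2.0 * sigma_delta2 ** 2                       # D(delta^2) for normal delta
    var_b1 = (b1 ** 2 * (D_delta2 + sigma_u2 * sx2 + sigma_delta2 * sx2 + sigma_u2 * sigma_delta2)
              + sx2 * se2 + sigma_delta2 * se2) / sx2 ** 2   # plug-in version of (3.8)
    half = z * np.sqrt(var_b1 / n)                           # z = 1.96 for the 95% level
    return b1 - half, b1 + half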
Analysis of formula (3.8) allows us to find out in which proportion the classical and Berkson errors affect the quality of the slope estimation. Denote
\[ {\Lambda _{u}}=|{\beta _{1}}|{\sigma _{u}}{\sigma _{x}},\hspace{2.5pt}{\Lambda _{\delta }}=\sqrt{{\beta _{1}^{2}}({\sigma _{\delta }^{2}}{\sigma _{x}^{2}}+\mathsf{D}({\delta ^{2}}))+{\sigma _{\delta }^{2}}{\sigma _{\varepsilon }^{2}}},\hspace{2.5pt}{\Lambda _{u\delta }}=|{\beta _{1}}|{\sigma _{u}}{\sigma _{\delta }}.\]
Then
\[ {\sigma _{{\beta _{1}}}^{2}}={\sigma _{x}^{-4}}({\Lambda _{\delta }^{2}}+{\Lambda _{u}^{2}}+{\Lambda _{u\delta }^{2}}+{\sigma _{x}^{2}}{\sigma _{\varepsilon }^{2}}).\]
The normalized slope estimator $\sqrt{n}({\widehat{\beta }_{1}}-{\beta _{1}})$ can be approximated in distribution by a random variable
\[ {\sigma _{x}^{-2}}({\Lambda _{\delta }}{\gamma _{1}}+{\Lambda _{u}}{\gamma _{2}}+{\Lambda _{u\delta }}{\gamma _{3}}+{\sigma _{x}}{\sigma _{\varepsilon }}{\gamma _{4}}),\]
with i.i.d. standard normal ${\gamma _{1}},\dots ,{\gamma _{4}}$. Given ${\sigma _{x}^{2}}$ and ${\sigma _{\varepsilon }^{2}}$, the terms ${\Lambda _{\delta }}{\gamma _{1}}$, ${\Lambda _{u}}{\gamma _{2}}$ and ${\Lambda _{u\delta }}{\gamma _{3}}$ capture the influence of the classical error, the Berkson error and the cumulative effect of both errors, respectively, on the precision of the slope estimator. Thus, this influence can be evaluated in the proportion ${\Lambda _{\delta }}:{\Lambda _{u}}:{\Lambda _{u\delta }}$. Suppose that ${\beta _{1}}\ne 0$ and $\mathsf{E}{\delta ^{4}}$ is known. Then the influence can be estimated in the proportion ${\widehat{\Lambda }_{\delta }}:{\widehat{\Lambda }_{u}}:{\widehat{\Lambda }_{u\delta }}$, with
\[ {\widehat{\Lambda }_{u}}:=|{\widehat{\beta }_{1}}|{\sigma _{u}}{\widehat{\sigma }_{x}},\hspace{2.5pt}{\widehat{\Lambda }_{\delta }}:=\sqrt{{\widehat{\beta }_{1}^{2}}({\sigma _{\delta }^{2}}{\widehat{\sigma }_{x}^{2}}+\mathsf{D}({\delta ^{2}}))+{\sigma _{\delta }^{2}}{\widehat{\sigma }_{\varepsilon }^{2}}},\hspace{2.5pt}{\widehat{\Lambda }_{u\delta }}:=|{\widehat{\beta }_{1}}|{\sigma _{u}}{\sigma _{\delta }}.\]
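For instance, with the illustrative values ${\beta _{1}}=2$, ${\sigma _{x}^{2}}={\sigma _{\delta }^{2}}={\sigma _{\varepsilon }^{2}}={\sigma _{u}^{2}}=1$ and normal δ (so that $\mathsf{D}({\delta ^{2}})=2{\sigma _{\delta }^{4}}=2$), we obtain
\[ {\Lambda _{\delta }}=\sqrt{4(1+2)+1}=\sqrt{13},\hspace{2.5pt}{\Lambda _{u}}=2,\hspace{2.5pt}{\Lambda _{u\delta }}=2,\hspace{2.5pt}{\sigma _{{\beta _{1}}}^{2}}=13+4+4+1=22,\]
so the classical error dominates the Berkson one roughly in the proportion $\sqrt{13}:2\approx 1.8:1$.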
3.2 Asymptotic independence of groups of estimators
We slightly reparametrize the model (1.1) to the form
(3.10)
\[ y={\mu _{y}}+{\beta _{1}}(\xi -\mu )+\varepsilon ,\hspace{2.5pt}w=x+\delta ,\hspace{2.5pt}\xi =x+u.\]
This model is obtained from (1.1) after introducing a new parameter ${\mu _{y}}={\beta _{0}}+{\beta _{1}}\mu $ in place of ${\beta _{0}}$. Based on independent copies of the model
\[ {y_{i}}={\mu _{y}}+{\beta _{1}}({\xi _{i}}-\mu )+{\varepsilon _{i}},\hspace{2.5pt}{w_{i}}={x_{i}}+{\delta _{i}},\hspace{2.5pt}{\xi _{i}}={x_{i}}+{u_{i}}\]
(here, independent random vectors $({x_{i}},{\varepsilon _{i}},{\delta _{i}},{u_{i}}),\hspace{2.5pt}i\ge 1$, are distributed as a random vector $(x,\varepsilon ,\delta ,u)$ in (3.10)) and on observations $({y_{i}},{w_{i}}),i=1,\dots ,n$, we estimate a vector of unknown parameters
\[ \theta ={(\mu ,{\mu _{y}},{\sigma _{w}^{2}},{\beta _{1}},{\sigma _{\varepsilon }^{2}})^{\top }},\]
assuming condition (iii) which states that ${\sigma _{\delta }^{2}}$ and ${\sigma _{u}^{2}}$ are known. For model (3.10), we assume also (i), (ii) and impose two additional assumptions.
- (vi) $\mathsf{E}{(x-\mu )^{3}}=\mathsf{E}{\varepsilon ^{3}}=\mathsf{E}{\delta ^{3}}=\mathsf{E}{u^{3}}=0$.
- (vii) The distribution of ε is not concentrated at two or fewer points.
Theorem 1 implies that a strongly consistent estimator
\[ \widehat{\theta }={(\widehat{\mu },{\widehat{\mu }_{y}},{\widehat{\sigma }_{w}^{2}},{\widehat{\beta }_{1}},{\widehat{\sigma }_{\varepsilon }^{2}})^{\top }}\]
of θ can be defined explicitly as
(3.11)
\[ \widehat{\theta }={(\overline{w},\overline{y},{S_{ww}},\frac{{S_{wy}}}{{S_{ww}}-{\sigma _{\delta }^{2}}},{S_{yy}}-\frac{{S_{wy}^{2}}({S_{ww}}-{\sigma _{\delta }^{2}}+{\sigma _{u}^{2}})}{{({S_{ww}}-{\sigma _{\delta }^{2}})^{2}}})^{\top }}.\]
Introduce the corresponding estimating function
(3.12)
\[\begin{array}{l}\displaystyle s=s(\theta ;w,y)={({s^{\mu }},{s^{{\mu _{y}}}},{s^{{\sigma _{w}^{2}}}},{s^{{\beta _{1}}}},{s^{{\sigma _{\varepsilon }^{2}}}})^{\top }},\\ {} \displaystyle {s^{\mu }}:=w-\mu ,\hspace{2.5pt}{s^{{\mu _{y}}}}:=y-{\mu _{y}},\hspace{2.5pt}{s^{{\sigma _{w}^{2}}}}:={(w-\mu )^{2}}-{\sigma _{w}^{2}},\\ {} \displaystyle {s^{{\beta _{1}}}}:={\beta _{1}}{(w-\mu )^{2}}-{\beta _{1}}{\sigma _{\delta }^{2}}-(w-\mu )(y-{\mu _{y}}),\\ {} \displaystyle {s^{{\sigma _{\varepsilon }^{2}}}}:={(y-{\mu _{y}})^{2}}-{\sigma _{\varepsilon }^{2}}-{\beta _{1}^{2}}{(w-\mu )^{2}}+{\beta _{1}^{2}}({\sigma _{\delta }^{2}}-{\sigma _{u}^{2}}).\end{array}\]
With probability one, the estimator (3.11) satisfies the estimating equation
\[ \overline{s(\widehat{\theta };w,y)}=0,\hspace{2.5pt}\text{i.e.,}\hspace{2.5pt}\frac{1}{n}{\textstyle\sum _{i=1}^{n}}s(\widehat{\theta };{w_{i}},{y_{i}})=0.\]
Definition 1.
Let $\widehat{\alpha }$ and $\widehat{\beta }$ be asymptotically normal estimators of $\alpha \in {\mathbb{R}^{p}}$ and $\beta \in {\mathbb{R}^{q}}$, respectively, such that
\[ \sqrt{n}\left[\begin{array}{c}\widehat{\alpha }-\alpha \\ {} \widehat{\beta }-\beta \end{array}\right]\hspace{2.5pt}\stackrel{\text{d}}{\to }\hspace{2.5pt}{N_{p+q}}(0,\Sigma )\hspace{2.5pt}\mathit{as}\hspace{2.5pt}n\to \infty ,\]
with a nonsingular asymptotic covariance matrix Σ. The estimators $\widehat{\alpha }$ and $\widehat{\beta }$ are called asymptotically independent if Σ can be partitioned as
\[ \Sigma =\textit{block-diag}({\Sigma _{\alpha }},{\Sigma _{\beta }})=\left[\begin{array}{c@{\hskip10.0pt}c}{\Sigma _{\alpha }}& 0\\ {} 0& {\Sigma _{\beta }}\end{array}\right],\]
with ${\Sigma _{\alpha }}\in {\mathbb{R}^{p\times p}}$ and ${\Sigma _{\beta }}\in {\mathbb{R}^{q\times q}}$.
It is convenient to deal with asymptotically independent estimators $\widehat{\alpha }$ and $\widehat{\beta }$, because an asymptotic confidence region for the augmented parameter ${({\alpha ^{\top }},{\beta ^{\top }})^{\top }}$ can be constructed as the Cartesian product of asymptotic confidence ellipsoids for α and β.
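For example, if each of the two confidence ellipsoids is constructed at the level $\sqrt{0.95}\approx 0.9747$, then, by the asymptotic independence, their Cartesian product has the asymptotic coverage
\[ \sqrt{0.95}\cdot \sqrt{0.95}=0.95.\]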
Theorem 3.
Assume conditions (i)–(iii), (vii) and that x, ε, δ and u have finite 4th moments. Then:
- (a) the estimator $\widehat{\theta }$ defined in (3.11) is asymptotically normal,
(3.13)
\[ \sqrt{n}(\widehat{\theta }-\theta )\stackrel{\text{d}}{\to }{N_{5}}(0,{\Sigma ^{\theta }})\hspace{2.5pt}\mathit{as}\hspace{2.5pt}n\to \infty ,\]
with a nonsingular asymptotic covariance matrix ${\Sigma ^{\theta }}$;
- (b) under additional assumption (vi), groups of estimators ${(\widehat{\mu },{\widehat{\mu }_{y}})^{\top }}$ and ${({\widehat{\sigma }_{w}^{2}},{\widehat{\beta }_{1}},{\widehat{\sigma }_{\varepsilon }^{2}})^{\top }}$ are asymptotically independent.
Proof.
(a) We prove (3.13) with a nonsingular ${\Sigma ^{\theta }}$.
1. Since all the variances in the underlying model are assumed positive, the true vector θ is an inner point of the parameter set $\Theta ={\mathbb{R}^{2}}\times (0,\infty )\times \mathbb{R}\times (0,\infty )$.
As was mentioned above, $\widehat{\theta }$ is strongly consistent. The estimating function (3.12) is unbiased, i.e. ${\mathsf{E}_{\theta }}s(\theta ;w,y)=0$. Introduce two matrices
\[ V:=-{\mathsf{E}_{\theta }}\frac{\partial s(\theta ;w,y)}{\partial {\theta ^{\top }}}=\text{block-diag}({I_{2}},{V_{2}}),\]
with
\[ {V_{2}}=\left[\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}1& 0& 0\\ {} 0& -{\sigma _{x}^{2}}& 0\\ {} 0& 2{\beta _{1}}({\sigma _{x}^{2}}+{\sigma _{u}^{2}})& 1\end{array}\right],\]
and
\[ B:={\mathsf{E}_{\theta }}\big[s(\theta ;w,y)\hspace{2.5pt}s{(\theta ;w,y)^{\top }}\big]=\operatorname{\mathbf{cov}}\big(s(\theta ;w,y)\big).\]
Since ε, x, δ, u have finite 4th moments, B is well defined. The unbiasedness of $s(\theta ;w,y)$, consistency of $\widehat{\theta }$ and nonsingularity of V imply (3.13) by Theorem A.26 from [4], and ${\Sigma ^{\theta }}$ can be found by the sandwich formula
\[ {\Sigma ^{\theta }}={V^{-1}}B{({V^{-1}})^{\top }}.\]
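As a check of the block structure of V (a direct computation from (3.12)), note for instance that
\[ -{\mathsf{E}_{\theta }}\frac{\partial {s^{{\beta _{1}}}}}{\partial {\beta _{1}}}=-{\mathsf{E}_{\theta }}\big[{(w-\mu )^{2}}-{\sigma _{\delta }^{2}}\big]=-({\sigma _{w}^{2}}-{\sigma _{\delta }^{2}})=-{\sigma _{x}^{2}}\ne 0,\]
while every entry coupling $(\mu ,{\mu _{y}})$ with $({\sigma _{w}^{2}},{\beta _{1}},{\sigma _{\varepsilon }^{2}})$ is either identically zero or an expectation of centered first powers of $w-\mu $ and $y-{\mu _{y}}$ and therefore vanishes.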
2. It remains to prove that B is nonsingular. For this purpose, we have to show that the five random variables
\[\begin{array}{l}\displaystyle {s^{\mu }}={s^{\mu }}(\theta ;w,y),\hspace{2.5pt}{s^{{\mu _{y}}}}={s^{{\mu _{y}}}}(\theta ;w,y),\hspace{2.5pt}{s^{{\sigma _{w}^{2}}}}={s^{{\sigma _{w}^{2}}}}(\theta ;w,y),\\ {} \displaystyle {s^{{\beta _{1}}}}={s^{{\beta _{1}}}}(\theta ;w,y),\hspace{2.5pt}{s^{{\sigma _{\varepsilon }^{2}}}}={s^{{\sigma _{\varepsilon }^{2}}}}(\theta ;w,y)\end{array}\]
are linearly independent for the true value of θ.
Consider a random vector
(3.14)
\[ h:={(w-\mu ,y-{\mu _{y}},{(w-\mu )^{2}},(w-\mu )(y-{\mu _{y}}),{(y-{\mu _{y}})^{2}})^{\top }}.\]
It holds
\[ {({s^{\mu }},{s^{{\mu _{y}}}},{s^{{\sigma _{w}^{2}}}},{s^{{\beta _{1}}}},{s^{{\sigma _{\varepsilon }^{2}}}})^{\top }}=Th+a,\]
where $T=T(\theta )$ is a nonsingular square matrix and $a=a(\theta )$ is a nonrandom vector. Since T is nonsingular, it is enough to show that no nontrivial linear combination of the components of h is constant almost surely.
We use the centralization $\rho =x-\mu $. Suppose that for some real numbers ${a_{11}}$, ${a_{12}}$, ${a_{22}}$, ${a_{1}}$, ${a_{2}}$ and ${a_{3}}$, the following holds with probability one:
\[\begin{array}{l}\displaystyle F:={a_{11}}{(w-\mu )^{2}}+{a_{12}}(w-\mu )(y-{\mu _{y}})+{a_{22}}{(y-{\mu _{y}})^{2}}+\\ {} \displaystyle +{a_{1}}(w-\mu )+{a_{2}}(y-{\mu _{y}})+{a_{3}}=0,\\ {} \displaystyle F={a_{11}}{(\rho +\delta )^{2}}+{a_{12}}(\rho +\delta )({\beta _{1}}\rho +{\beta _{1}}u+\varepsilon )+{a_{22}}{({\beta _{1}}\rho +{\beta _{1}}u+\varepsilon )^{2}}+\\ {} \displaystyle +{a_{1}}(\rho +\delta )+{a_{2}}({\beta _{1}}\rho +{\beta _{1}}u+\varepsilon )+{a_{3}}=0.\end{array}\]
Then a.s.
\[ 0=\mathsf{E}[F|\varepsilon ]={a_{22}}{\varepsilon ^{2}}+{a_{2}}\varepsilon +{b_{3}},\hspace{2.5pt}{b_{3}}\in \mathbb{R},\]
hence due to condition (vii), ${a_{22}}={a_{2}}=0$. And we have a.s.
(3.15)
\[ {a_{11}}{(\rho +\delta )^{2}}+{a_{12}}(\rho +\delta )({\beta _{1}}\rho +{\beta _{1}}u+\varepsilon )+{a_{1}}(\rho +\delta )+{a_{3}}=0.\]
Consider two cases concerning the support of δ.
2.1. Here we suppose that δ is not concentrated at two points. Then (3.15) implies ${a_{11}}={a_{1}}=0$. Next, a.s.
\[ {a_{12}}(\rho +\delta )({\beta _{1}}\rho +{\beta _{1}}u+\varepsilon )+{a_{3}}=0,\]
and since the random variable $(\rho +\delta )({\beta _{1}}\rho +{\beta _{1}}u+\varepsilon )$ has a positive variance (its summand $\delta \varepsilon $ is uncorrelated with the other summands and ${\sigma _{\delta }^{2}}{\sigma _{\varepsilon }^{2}}>0$ by (iii) and (vii)), we get the desired
(3.16)
\[ {a_{11}}={a_{12}}={a_{22}}={a_{1}}={a_{2}}={a_{3}}=0.\]
2.2. Now, we suppose that for some ${\delta _{0}}\ne 0$, it holds
\[ \operatorname{\mathsf{P}}(\delta ={\delta _{0}})=\operatorname{\mathsf{P}}(\delta =-{\delta _{0}})=\frac{1}{2}.\]
Then with probability one,
\[\begin{array}{l}\displaystyle F(\rho ,\varepsilon ,{\delta _{0}},u)=F(\rho ,\varepsilon ,-{\delta _{0}},u)=0,\\ {} \displaystyle 0=F(\rho ,\varepsilon ,{\delta _{0}},u)-F(\rho ,\varepsilon ,-{\delta _{0}},u)=2{\delta _{0}}G,\\ {} \displaystyle 0=G=2{a_{11}}\rho +{a_{12}}({\beta _{1}}\rho +{\beta _{1}}u+\varepsilon )+{a_{1}},\\ {} \displaystyle 0=\mathsf{E}[G|\varepsilon ]={a_{12}}\varepsilon +{a_{1}},\hspace{2.5pt}{a_{12}}={a_{1}}=0;\\ {} \displaystyle {a_{11}}\rho =0\hspace{2.5pt}\text{a.s.},\hspace{2.5pt}{a_{11}}=0.\end{array}\]
Thus, in this case (3.16) holds as well. Statement (a) of Theorem 3 is proven.
(b) Now, we rely additionally on the assumption (vi) about vanishing centered third moments. By statement (a), B is nonsingular. We have to show that it has the block-diagonal structure
(3.17)
\[ B=\text{block-diag}({B_{1}},{B_{2}})=\left[\begin{array}{c@{\hskip10.0pt}c}{B_{1}}& 0\\ {} 0& {B_{2}}\end{array}\right],\]
with some matrices ${B_{1}}\in {\mathbb{R}^{2\times 2}}$ and ${B_{2}}\in {\mathbb{R}^{3\times 3}}$, then ${\Sigma ^{\theta }}$ will be block-diagonal as well, with nonsingular blocks:
\[ {\Sigma ^{\theta }}=\text{block-diag}({\Sigma _{1}},{\Sigma _{2}}),\hspace{2.5pt}{\Sigma _{1}}={B_{1}},\hspace{2.5pt}{\Sigma _{2}}={V_{2}^{-1}}{B_{2}}{({V_{2}^{-1}})^{T}},\]
and statement (b) of Theorem 3 will be proven.
Using assumption (vi), we have:
\[\begin{array}{l}\displaystyle \operatorname{\mathbf{cov}}({s^{\mu }},{s^{{\sigma _{w}^{2}}}})=\mathsf{E}{(x-\mu )^{3}}+\mathsf{E}{\delta ^{3}}=0;\\ {} \displaystyle \operatorname{\mathbf{cov}}({s^{\mu }},{s^{{\beta _{1}}}})=\operatorname{\mathbf{cov}}(w-\mu ,{\beta _{1}}{(w-\mu )^{2}})-\operatorname{\mathbf{cov}}(w-\mu ,(w-\mu )(y-{\mu _{y}}))=\\ {} \displaystyle =-\mathsf{E}{(\rho +\delta )^{2}}({\beta _{1}}\rho +{\beta _{1}}u+\varepsilon )=-{\beta _{1}}\mathsf{E}{\rho ^{3}}=0;\\ {} \displaystyle \operatorname{\mathbf{cov}}({s^{\mu }},{s^{{\sigma _{\varepsilon }^{2}}}})=\operatorname{\mathbf{cov}}(w-\mu ,{(y-{\mu _{y}})^{2}})-{\beta _{1}^{2}}\operatorname{\mathbf{cov}}(w-\mu ,{(w-\mu )^{2}})=\\ {} \displaystyle =\mathsf{E}(w-\mu ){(y-{\mu _{y}})^{2}}=\mathsf{E}(\rho +\delta ){({\beta _{1}}\rho +{\beta _{1}}u+\varepsilon )^{2}}={\beta _{1}^{2}}\mathsf{E}{\rho ^{3}}=0;\\ {} \displaystyle \operatorname{\mathbf{cov}}({s^{{\mu _{y}}}},{s^{{\sigma _{w}^{2}}}})={\beta _{1}}\mathsf{E}{\rho ^{3}}=0;\\ {} \displaystyle \operatorname{\mathbf{cov}}({s^{{\mu _{y}}}},{s^{{\beta _{1}}}})={\beta _{1}}\mathsf{E}{(w-\mu )^{2}}(y-{\mu _{y}})-\mathsf{E}(w-\mu ){(y-{\mu _{y}})^{2}}=\\ {} \displaystyle ={\beta _{1}^{2}}\mathsf{E}{\rho ^{3}}-{\beta _{1}^{2}}\mathsf{E}{\rho ^{3}}=0;\\ {} \displaystyle \operatorname{\mathbf{cov}}({s^{{\mu _{y}}}},{s^{{\sigma _{\varepsilon }^{2}}}})=\mathsf{E}{(y-{\mu _{y}})^{3}}-{\beta _{1}^{2}}\mathsf{E}{(w-\mu )^{2}}(y-{\mu _{y}})=\\ {} \displaystyle =\mathsf{E}{({\beta _{1}}\rho +{\beta _{1}}u+\varepsilon )^{3}}-{\beta _{1}^{3}}\mathsf{E}{\rho ^{3}}={\beta _{1}^{3}}\mathsf{E}{u^{3}}+\mathsf{E}{\varepsilon ^{3}}=0.\end{array}\]
This proves relation (3.17). □
Remark 3.
Theorem 3 is not valid without condition (vii). Indeed, suppose that for some ${\varepsilon _{0}}\ne 0$,
\[ \operatorname{\mathsf{P}}(\varepsilon ={\varepsilon _{0}})=\operatorname{\mathsf{P}}(\varepsilon =-{\varepsilon _{0}})=\frac{1}{2}.\]
If additionally ${\beta _{1}}=0$ then ${(y-{\mu _{y}})^{2}}={\varepsilon ^{2}}={\varepsilon _{0}^{2}}$ a.s., and
\[ {s^{{\sigma _{\varepsilon }^{2}}}}={(y-{\mu _{y}})^{2}}-{\sigma _{\varepsilon }^{2}}={\varepsilon _{0}^{2}}-{\sigma _{\varepsilon }^{2}}=0\hspace{2.5pt}\text{a.s.}\]
Thus, a certain nontrivial linear combination of components of the vector (3.14) is constant, hence the block ${B_{2}}$ in (3.17) is singular, and the asymptotic covariance matrix ${\Sigma ^{\theta }}$ is degenerate in this specific case.
4 Simulation study
We simulated test data in order to evaluate the coverage probability of the asymptotic confidence interval for the slope parameter, which is constructed based on Theorem 2. Observations in model (1.1) were generated as follows: $x\sim N(-1,1)$, $u\sim N(0,{\sigma _{u}^{2}})$ with ${\sigma _{u}^{2}}\in \{10i:i=1,\dots ,15\}$, $\delta \sim N(0,1)$, $\varepsilon \sim N(0,1)$, ${\beta _{1}}=2$, ${\beta _{0}}=1$, with the sample size $n\in \{10i:i=1,\dots ,10\}$. For each collection of model parameters, $N=10,000$ realizations were generated. For each realization, the slope estimate and the estimate of its asymptotic variance were computed (here, we inserted into (3.2) the estimates of all unknown model parameters). For each ensemble of N realizations, the coverage probability of the constructed 95% asymptotic confidence intervals for the slope parameter was calculated. We briefly report the obtained results.
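A condensed sketch of one cell of this experiment is given below (illustrative code; the plug-in variance uses (3.8) with $\mathsf{D}({\delta ^{2}})=2{\sigma _{\delta }^{4}}$ for normal δ).

import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 2.0
sigma_u2, n, N = 10.0, 100, 10_000                  # one (sigma_u^2, n) pair from the considered grids
covered = 0
for _ in range(N):
    x = rng.normal(-1.0, 1.0, n)
    xi = x + rng.normal(0.0, np.sqrt(sigma_u2), n)  # Berkson error u
    w = x + rng.standard_normal(n)                  # classical error delta, sigma_delta^2 = 1
    y = beta0 + beta1 * xi + rng.standard_normal(n) # response error eps, sigma_eps^2 = 1
    S_ww = np.mean((w - w.mean()) ** 2)
    S_wy = np.mean((w - w.mean()) * (y - y.mean()))
    S_yy = np.mean((y - y.mean()) ** 2)
    sx2 = S_ww - 1.0                                # sigma_x^2 hat
    b1 = S_wy / sx2                                 # slope estimate (2.4)
    se2 = S_yy - b1 * S_wy * (sx2 + sigma_u2) / sx2 # sigma_eps^2 hat (2.8)
    var_b1 = (b1 ** 2 * (2.0 + sigma_u2 * sx2 + sx2 + sigma_u2) + sx2 * se2 + se2) / sx2 ** 2
    half = 1.959964 * np.sqrt(var_b1 / n)
    covered += (b1 - half <= beta1 <= b1 + half)
print(covered / N)                                  # empirical coverage of the 95% interval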
Figure 1 shows how the deviation of the coverage probability from 0.95 decreases as the sample size increases. This effect is stable across different values of the Berkson error variance. Figure 2 illustrates how the deviation of the coverage probability from 0.95 increases with the Berkson error variance. As can be seen in Figure 1, the latter effect weakens as the sample size increases.
5 Conclusion
We dealt with a linear observation model (1.1) with a mixture of the classical and Berkson errors in the covariate. Surprisingly enough, we constructed consistent estimators of the regression parameters without knowledge of the variance of the Berkson error. Nevertheless, the size of the Berkson error influences the asymptotic variances of ${\widehat{\beta }_{0}}$ and ${\widehat{\beta }_{1}}$.
Then we modified the model to an equivalent centralized form (3.10). This made it possible to divide the estimators of all unknown model parameters into two asymptotically independent groups.
In the future, we intend to consider the prediction problem for the model (1.1), as was done in [6] for various measurement error models. It would also be interesting to consider a polynomial model with a mixture of the classical and Berkson errors, as well as a version of the linear model with a vector response and a vector covariate.