VMSTA

Modern Stochastics: Theory and Applications

ISSN 2351-6054, 2351-6046

VTeX

Mokslininkų g. 2A, 08412 Vilnius, Lithuania

VMSTA186

10.15559/21-VMSTA186

Research Article

Estimation in a linear errors-in-variables model under a mixture of classical and Berkson errors

https://orcid.org/0000-0003-0153-3047

Yakovliev

Mykyta

mykyta.yakovliev@gmail.com∗

https://orcid.org/0000-0002-4143-8928

Kukush

Alexander

alexander_kukush@univ.kiev.ua

Taras Shevchenko National University of Kyiv, Kyiv, Ukraine

∗Corresponding author.

Vol. 8, No. 3 (2021), pp. 373–386

Received: 10 March 2021, Revised: 16 June 2021, Accepted: 28 June 2021, Published online: 26 July 2021

Open access article under the CC BY license.

A linear structural regression model is studied, where the covariate is observed with a mixture of the classical and Berkson measurement errors. Both variances of the classical and Berkson errors are assumed known. Without normality assumptions, consistent estimators of model parameters are constructed and conditions for their asymptotic normality are given. The estimators are divided into two asymptotically independent groups.

Keywords: Linear errors-in-variables model, mixture of the classical and Berkson errors, consistent estimators, asymptotically independent estimators

2010 MSC: 62J05, 62H12

Research is supported by the National Research Fund of Ukraine grant 2020.02/0026.

1 Introduction

Regression models with measurement errors in covariates are widely used [1, 2, 4]; see also [5] for a comparison of various estimation methods in such models.

We consider a linear regression model in the presence of the classical and Berkson errors in the covariate:

$$ y = \beta_0 + \beta_1 \xi + \varepsilon, \quad w = x + \delta, \quad \xi = x + u. \tag{1.1} $$

Here, $y$ is the observable response variable, $\xi$ and $x$ are unobservable latent variables, and $w$ is the observable surrogate variable; $\varepsilon$, $\delta$ and $u$ are centred errors: $\varepsilon$ is the error in the response, $\delta$ is the classical measurement error, and $u$ is the Berkson measurement error. The random variables $x$, $\varepsilon$, $\delta$ and $u$ are independent.

In model (1.1), we have a mixture of the classical and Berkson errors. Let $\mathbf{D}$ stand for the variance. We indicate two extreme cases.

(a) If $\mathbf{D}\delta = 0$, then $\delta = 0$, and (1.1) yields a linear model with Berkson error [1, 2]:

$$ y = \beta_0 + \beta_1 \xi + \varepsilon, \quad \xi = w + u. \tag{1.2} $$

(b) If $\mathbf{D}u = 0$, then $u = 0$, and (1.1) yields a linear model with the classical error [1, 2]:

$$ y = \beta_0 + \beta_1 \xi + \varepsilon, \quad w = \xi + \delta. \tag{1.3} $$

Thus, the model (1.1) combines seminal models (1.2) and (1.3).

Models with a mixture of the classical and Berkson errors appear in radio-epidemiology. In [4, Section 7.2] the following measurement error model is considered:

$$ D_i^{mes} = \bar{D}_i^{tr} + \sigma_i \gamma_i, \quad D_i^{tr} = \bar{D}_i^{tr}\, \delta_{F,i}. \tag{1.4} $$

Here, $D_i^{mes}$ is the measured individual instrumental absorbed thyroid dose for the $i$th person of a cohort of persons residing in Ukrainian regions that suffered from the Chornobyl accident, $D_i^{tr}$ is the corresponding true absorbed thyroid dose (i.e., the first latent variable), and $\bar{D}_i^{tr}$ is the second latent variable; $\sigma_i \gamma_i$ is the additive classical error, $\delta_{F,i}$ is the multiplicative Berkson error, $\sigma_i$ is the standard deviation of the heteroscedastic classical measurement error, $\gamma_i$ is standard normal and $\delta_{F,i}$ is a lognormal random variable; $\bar{D}_i^{tr}$, $\gamma_i$ and $\delta_{F,i}$ are independent random variables.

In [4], the model (1.4) is combined with a binary model which resembles a logistic one:

$$ \mathbf{P}(Y_i = 1 \mid D_i^{tr}) = \frac{\lambda_i}{1 + \lambda_i}, \quad \mathbf{P}(Y_i = 0 \mid D_i^{tr}) = \frac{1}{1 + \lambda_i}, \tag{1.5} $$

where $\lambda_i$ is the total incidence rate related to cases of thyroid cancer,

$$ \lambda_i = \lambda_0 + EAR \cdot D_i^{tr}. \tag{1.6} $$

Here, the positive regression coefficients $\lambda_0$ and $EAR$ are the background incidence rate and the excess absolute risk, respectively. In the binary model (1.5), (1.6), (1.4), the observed sample consists of pairs $(Y_i, D_i^{mes})$, $i = 1, \dots, N$, where $Y_i = 1$ in the case of detected disease, and $Y_i = 0$ in the absence of disease within some time interval.

The presented linear model (1.1) is a simplified analogue of the binary measurement error model, where $\xi$, $w$ and $x$ are counterparts of $D_i^{tr}$, $D_i^{mes}$ and $\bar{D}_i^{tr}$, respectively, the binary model (1.5), (1.6) is replaced with the linear regression, and the multiplicative Berkson error $\delta_{F,i}$ is replaced with the additive Berkson error $u$.

The goal of the present paper is to study asymptotic properties of estimators of model parameters in the linear regression (1.1). A more modest aim is to gain a better understanding of the binary model (1.5), (1.6), (1.4) and similar models.

The paper is organized as follows. In Section 2, we present the observation model in more detail and, under the normality of $x$ and $u$, derive from the underlying model one like (1.3) with the classical error only. In doing so, we obtain consistent estimators for $\beta_0$ and $\beta_1$ which unexpectedly coincide with the adjusted least squares estimators [2, 4] constructed by ignoring the Berkson error $u$. The proposed estimators remain consistent without the normality of $x$ and $u$. Section 3 gives conditions for the asymptotic normality of the estimators, and we divide them into two asymptotically independent groups. To this end, we reparametrize the model similarly to [3], where the basic model (1.3) was studied. Section 4 presents a simulation study, and Section 5 concludes.

We use the following notation. The symbol $\mathbf{E}$ denotes expectation and acts as an operator on the total product of quantities; $\operatorname{cov}$ stands for the covariance of two random variables and for the covariance matrix of a random vector. The upper index $\top$ denotes transposition. In the paper, all vectors are column vectors. The bar means averaging over $i = 1, \dots, n$, e.g., $\overline{a} := n^{-1} \sum_{i=1}^{n} a_i$, $\overline{a b^\top} := n^{-1} \sum_{i=1}^{n} a_i b_i^\top$. The sample covariance of random variables $\{a_i, b_i,\ i = 1, \dots, n\}$ is denoted by $S_{ab}$, i.e., $S_{ab} = n^{-1} \sum_{i=1}^{n} (a_i - \overline{a})(b_i - \overline{b})$. Convergence with probability 1 and convergence in distribution are denoted by $\xrightarrow{\mathrm{P1}}$ and $\xrightarrow{d}$, respectively. A sequence of random variables that converges to zero in probability is denoted by $o_p(1)$, and a sequence of random variables bounded in probability is denoted by $O_p(1)$. $I_p$ stands for the $p \times p$ identity matrix.

2 Construction of estimators for the normal latent variable and the normal Berkson error

2.1 Model and assumptions

We consider the structural model (1.1). Denote $\mu = \mathbf{E}\xi$ and let $\sigma_y^2$, $\sigma_\xi^2$, $\sigma_\varepsilon^2$, $\sigma_w^2$, $\sigma_x^2$, $\sigma_\delta^2$ and $\sigma_u^2$ be the variances of $y$, $\xi$, $\varepsilon$, $w$, $x$, $\delta$ and $u$, respectively. We need the following conditions for the consistency of the proposed estimators of model parameters.

(i) Random variables $x$, $\varepsilon$, $\delta$ and $u$ are independent.

(ii) Random variables $\varepsilon$, $\delta$ and $u$ have zero expectations and finite variances, and $x$ has a finite and positive variance $\sigma_x^2$.

(iii) The variances $\sigma_\delta^2$ and $\sigma_u^2$ are positive and known, and the other model parameters $\beta_0$, $\beta_1$, $\mu$, $\sigma_\varepsilon^2$, $\sigma_x^2$ are unknown.

Consider independent copies of model (1.1):

$$ y_i = \beta_0 + \beta_1 \xi_i + \varepsilon_i, \quad w_i = x_i + \delta_i, \quad \xi_i = x_i + u_i, \quad i = 1, 2, \dots $$

Under assumption (i), this means that the random vectors $(x_i, \varepsilon_i, \delta_i, u_i)^\top$, $i = 1, 2, \dots$, are i.i.d. and have the same distribution as $(x, \varepsilon, \delta, u)^\top$. Based on observations $(y_i, w_i)$, $i = 1, \dots, n$, we want to estimate the unknown model parameters.
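As an illustration, independent copies of model (1.1) can be generated as in the following minimal sketch. The Gaussian distributions, the function name `simulate_sample` and the argument names are assumptions made only for this example; the model itself does not require normality.

```python
import random


def simulate_sample(n, beta0, beta1, mu, var_x, var_eps, var_delta, var_u, seed=0):
    """Return n observed pairs (y_i, w_i) from model (1.1).

    All four primary variables are drawn as independent Gaussians here,
    which is an illustrative assumption and not a requirement of the model.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.gauss(mu, var_x ** 0.5)            # latent variable x
        eps = rng.gauss(0.0, var_eps ** 0.5)       # error in the response
        delta = rng.gauss(0.0, var_delta ** 0.5)   # classical error
        u = rng.gauss(0.0, var_u ** 0.5)           # Berkson error
        xi = x + u                                 # latent covariate
        y = beta0 + beta1 * xi + eps               # observed response
        w = x + delta                              # observed surrogate
        sample.append((y, w))
    return sample
```

Only the pairs $(y_i, w_i)$ are returned, since $x_i$ and $\xi_i$ are unobservable in the model.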

Remark 1.

We allow $\sigma_\varepsilon^2 = 0$. The corresponding model (with $\varepsilon = 0$) is called a data model.

Now, we explain why we impose condition (iii). The classical errors-in-variables model (1.3), with normally distributed $\xi$, $\varepsilon$ and $\delta$ and six unknown parameters $\beta_0$, $\beta_1$, $\mu$, $\sigma_\xi^2$, $\sigma_\varepsilon^2$, $\sigma_\delta^2$, is not identifiable [2]. Hence, for the model (1.1), condition (iii) assumes $\sigma_\delta^2$ to be known. The next statement explains why we suppose that $\sigma_u^2$ is known as well.

Lemma 1.

Consider the model (1.1) under conditions (i) and (ii). Let $\sigma_\delta^2$ be known and let the random variables $x$, $\varepsilon$, $\delta$ and $u$ be Gaussian. Then this model with six unknown parameters $\beta_0$, $\beta_1$, $\mu$, $\sigma_x^2$, $\sigma_\varepsilon^2$, $\sigma_u^2$ is not identifiable.

Proof.

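The distribution of the observed Gaussian vector $Z := (y, w)^\top$ is uniquely defined by $\mathbf{E}Z$ and $C := \operatorname{cov}(Z)$. Introduce two different collections of model parameters:

(a) $\beta_0 = 0$, $\beta_1 = 1$, $\mu = 0$, $\sigma_x^2 = 1$, $\sigma_\varepsilon^2 = 1$, $\sigma_u^2 = 1$, and

(b) $\beta_0 = 0$, $\beta_1 = 1$, $\mu = 0$, $\sigma_x^2 = 1$, $\sigma_\varepsilon^2 = 0.5$, $\sigma_u^2 = 1.5$.

Since $\mathbf{D}y = \beta_1^2(\sigma_x^2 + \sigma_u^2) + \sigma_\varepsilon^2$, $\operatorname{cov}(y, w) = \beta_1 \sigma_x^2$ and $\mathbf{D}w = \sigma_x^2 + \sigma_\delta^2$, in both cases it holds

$$ \mathbf{E}Z = 0, \quad C = \begin{pmatrix} 3 & 1 \\ 1 & 1 + \sigma_\delta^2 \end{pmatrix}. $$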

Therefore, the distribution of Z is the same for both collections of parameters, and the model is not identifiable. □
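The counterexample in the proof can be checked numerically. The sketch below computes the mean vector and covariance matrix of $Z = (y, w)^\top$ from the moment formulas $\mathbf{D}y = \beta_1^2(\sigma_x^2 + \sigma_u^2) + \sigma_\varepsilon^2$, $\operatorname{cov}(y, w) = \beta_1 \sigma_x^2$, $\mathbf{D}w = \sigma_x^2 + \sigma_\delta^2$, which follow from (1.1) and independence. The function name `observed_moments` and the value chosen for $\sigma_\delta^2$ are hypothetical.

```python
def observed_moments(beta0, beta1, mu, var_x, var_eps, var_u, var_delta):
    """Mean vector and covariance matrix of Z = (y, w) in model (1.1)."""
    mean = (beta0 + beta1 * mu, mu)
    var_y = beta1 ** 2 * (var_x + var_u) + var_eps   # D y
    cov_yw = beta1 * var_x                           # cov(y, w)
    var_w = var_x + var_delta                        # D w
    return mean, ((var_y, cov_yw), (cov_yw, var_w))


var_delta = 0.7  # hypothetical known variance of the classical error

# Parameter collections (a) and (b) from the proof of Lemma 1.
case_a = observed_moments(0.0, 1.0, 0.0, 1.0, 1.0, 1.0, var_delta)
case_b = observed_moments(0.0, 1.0, 0.0, 1.0, 0.5, 1.5, var_delta)

# Identical first and second moments mean identical Gaussian laws of Z.
print(case_a == case_b)
```

Both collections give the same moments because only the sum $\beta_1^2 \sigma_u^2 + \sigma_\varepsilon^2$ enters the distribution of $Z$.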

Notice that under the conditions of Lemma 1, the parameters $\beta_0$ and $\beta_1$ are identifiable (see [2] for the definition of an identifiable parameter). Moreover, in the next subsection we will construct consistent estimators, as $n \to \infty$, for $\beta_0$ and $\beta_1$ when $\sigma_\delta^2$ is the only known parameter.

2.2 Consistent estimators of model parameters

Now, besides conditions (i) to (iii), we assume the following.

(iv) Random variables $x$ and $u$ are Gaussian.

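Now, from (1.1) we derive a linear model with the classical error only. The conditional distribution of $x$ given $\xi$ is as follows [1, 4]:

$$ \mathcal{L}(x \mid \xi) = N\big(K\xi + (1 - K)\mu,\ K\sigma_u^2\big), $$

where $K := \sigma_x^2 / \sigma_\xi^2$ is the reliability ratio [2], $0 \le K \le 1$. Moreover, $x$ can be decomposed as

$$ x = K\xi + (1 - K)\mu + \sqrt{K}\,\sigma_u \gamma, \quad \gamma \sim N(0, 1), $$

where $\xi$, $\gamma$, $\varepsilon$, $\delta$ are mutually independent. Then

$$ w = K\xi + (1 - K)\mu + \sqrt{K}\,\sigma_u \gamma + \delta, \qquad \frac{w}{K} - \frac{1 - K}{K}\mu = \xi + \frac{\sigma_u}{\sqrt{K}}\gamma + \frac{\delta}{K}. $$

Introduce new variables

$$ z := \frac{w}{K} - \frac{1 - K}{K}\mu, \quad v := \frac{\sigma_u}{\sqrt{K}}\gamma + \frac{\delta}{K}. $$

We have derived a linear model with the classical error:

$$ y = \beta_0 + \beta_1 \xi + \varepsilon, \quad z = \xi + v, \tag{2.1} $$

with independent $\xi$, $\varepsilon$, $v$ and $\sigma_v^2 := \mathbf{D}v = \sigma_u^2 / K + \sigma_\delta^2 / K^2$.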

Suppose for the moment that $K$ is known. Then the adjusted least squares (ALS) estimator $\tilde\beta_1$ of $\beta_1$ is consistent and is given by [2, 4]:

$$ \tilde\beta_1 := \frac{S_{zy}}{S_{zz} - \sigma_v^2} = \frac{\frac{1}{K} S_{wy}}{\frac{1}{K^2} S_{ww} - \sigma_v^2} = \frac{S_{wy}}{\frac{S_{ww} - \sigma_\delta^2}{K} - \sigma_u^2}. \tag{2.2} $$

When $K$ is unknown, we can estimate it consistently as

$$ \hat{K} = \frac{\hat\sigma_x^2}{\hat\sigma_x^2 + \sigma_u^2} = \frac{S_{ww} - \sigma_\delta^2}{S_{ww} - \sigma_\delta^2 + \sigma_u^2}. \tag{2.3} $$

Now, we substitute (2.3) for $K$ in (2.2) and obtain the desired estimator

$$ \hat\beta_1 = \frac{S_{wy}}{S_{ww} - \sigma_\delta^2}. \tag{2.4} $$

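Next, in model (2.1) the ALS estimator of $\beta_0$ is as follows [2, 4]:

$$ \tilde\beta_0 := \overline{y} - \tilde\beta_1 \overline{z} = \overline{y} - \tilde\beta_1 \left( \frac{\overline{w}}{K} - \frac{1 - K}{K}\mu \right). $$

But $K$ and $\mu$ are unknown, and we substitute for them the corresponding consistent estimators (2.3) and

$$ \hat\mu := \overline{w}. \tag{2.5} $$

Then $\tilde\beta_1$ changes to $\hat\beta_1$, and we obtain the desired estimator

$$ \hat\beta_0 = \overline{y} - \hat\beta_1 \left( \frac{\overline{w}}{\hat{K}} - \frac{1 - \hat{K}}{\hat{K}}\,\overline{w} \right) = \overline{y} - \hat\beta_1 \overline{w}. \tag{2.6} $$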

It is remarkable that $\hat\beta_0$ and $\hat\beta_1$ are the so-called naive ALS estimators in the model (1.1), where we neglected the presence of the Berkson error $u$. To be precise, $\hat\beta_0$ and $\hat\beta_1$ are the ALS estimators for the classical model (1.3). The estimators (2.4), (2.6) use $\sigma_\delta^2$ but not $\sigma_u^2$.

In our model, we have to estimate 5 parameters $\beta_0$, $\beta_1$, $\mu$, $\sigma_x^2$, $\sigma_\varepsilon^2$. We already possess 3 estimators (2.6), (2.4) and (2.5). Moreover, we used the estimator

$$ \hat\sigma_x^2 = S_{ww} - \sigma_\delta^2. \tag{2.7} $$

Finally, in the model (2.1) the ALS estimator of $\sigma_\varepsilon^2$ is as follows [4]:

$$ \tilde\sigma_\varepsilon^2 = S_{yy} - \tilde\beta_1 S_{zy} = S_{yy} - \frac{\tilde\beta_1}{K} S_{wy}. $$

We substitute (2.3) for the unknown $K$ and get the final estimator

$$ \hat\sigma_\varepsilon^2 := S_{yy} - \frac{\hat\beta_1 S_{wy} (S_{ww} - \sigma_\delta^2 + \sigma_u^2)}{S_{ww} - \sigma_\delta^2}. \tag{2.8} $$
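The five estimators (2.4)–(2.8) are explicit functions of the sample moments, so they can be computed directly. The following is a minimal sketch in plain Python; the function names `sample_cov` and `als_estimates`, the dictionary keys and the error handling are choices made for this illustration only.

```python
def sample_cov(a, b):
    """Sample covariance S_ab with the 1/n normalisation used in the paper."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


def als_estimates(y, w, var_delta, var_u):
    """Estimators (2.4)-(2.8) from observations y_i, w_i and known variances."""
    n = len(w)
    mean_y = sum(y) / n
    mean_w = sum(w) / n
    s_ww = sample_cov(w, w)
    s_wy = sample_cov(w, y)
    s_yy = sample_cov(y, y)

    var_x_hat = s_ww - var_delta                   # (2.7)
    if var_x_hat <= 0.0:
        # The expressions are only well defined when S_ww exceeds sigma_delta^2.
        raise ValueError("S_ww - var_delta must be positive")
    beta1_hat = s_wy / var_x_hat                   # (2.4)
    beta0_hat = mean_y - beta1_hat * mean_w        # (2.6)
    mu_hat = mean_w                                # (2.5)
    var_eps_hat = s_yy - beta1_hat * s_wy * (var_x_hat + var_u) / var_x_hat  # (2.8)
    return {
        "beta0": beta0_hat,
        "beta1": beta1_hat,
        "mu": mu_hat,
        "var_x": var_x_hat,
        "var_eps": var_eps_hat,
    }
```

Note that `var_u` enters only the estimate of $\sigma_\varepsilon^2$, in line with the observation that (2.4) and (2.6) use $\sigma_\delta^2$ but not $\sigma_u^2$.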

Though we derived the estimators under the normality assumption (iv), they remain consistent without this restriction.

Theorem 1.

In model (1.1), assume conditions (i)–(iii). Then there exists a random number $n_0$ such that expressions (2.4), (2.6), (2.5), (2.7), (2.8) are well defined with probability 1 for all $n \ge n_0$ and yield strongly consistent estimators of $\beta_1$, $\beta_0$, $\mu$, $\sigma_x^2$, $\sigma_\varepsilon^2$, respectively, i.e., they converge a.s. to the corresponding true values as $n \to \infty$.

Proof.

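Here, we check the strong consistency of $\hat\beta_1$ only. We have

$$ \operatorname{cov}(w, y) = \operatorname{cov}(x, \beta_1 \xi) = \beta_1 \operatorname{cov}(x, x + u) = \beta_1 \sigma_x^2, $$

$$ \hat\beta_1 \xrightarrow{\mathrm{P1}} \frac{\operatorname{cov}(w, y)}{\mathbf{D}w - \sigma_\delta^2} = \frac{\beta_1 \sigma_x^2}{\sigma_x^2} = \beta_1, $$

where $\xrightarrow{\mathrm{P1}}$ denotes convergence with probability 1, which is the strong consistency of the estimator. □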
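The remaining estimators can be treated along the same lines. The display below is a sketch of the argument for $\hat\sigma_\varepsilon^2$, which is not spelled out in the proof above; it relies only on the strong law of large numbers for the sample moments and on the independence assumed in (i).

```latex
\begin{align*}
S_{yy} &\xrightarrow{\mathrm{P1}} \mathbf{D}y
        = \beta_1^2(\sigma_x^2 + \sigma_u^2) + \sigma_\varepsilon^2, \qquad
S_{wy} \xrightarrow{\mathrm{P1}} \beta_1 \sigma_x^2, \qquad
S_{ww} - \sigma_\delta^2 \xrightarrow{\mathrm{P1}} \sigma_x^2, \\
\hat\sigma_\varepsilon^2
       &= S_{yy} - \hat\beta_1 S_{wy}\,
          \frac{S_{ww} - \sigma_\delta^2 + \sigma_u^2}{S_{ww} - \sigma_\delta^2} \\
       &\xrightarrow{\mathrm{P1}}
          \beta_1^2(\sigma_x^2 + \sigma_u^2) + \sigma_\varepsilon^2
          - \beta_1 \cdot \beta_1 \sigma_x^2 \cdot
            \frac{\sigma_x^2 + \sigma_u^2}{\sigma_x^2}
        = \sigma_\varepsilon^2 .
\end{align*}
```

The limits for $\hat\mu$, $\hat\sigma_x^2$ and $\hat\beta_0$ follow in the same way from $\overline{w} \to \mu$, $\overline{y} \to \beta_0 + \beta_1 \mu$ and the limit of $\hat\beta_1$.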

3 Asymptotic normality of the estimators

3.1 Asymptotic variance of the estimator of slope coefficient

We need the following moment assumption.

(v) $\mathbf{E}\delta^4 < \infty$.

Theorem 2.

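Assume conditions (i), (ii), (v), and suppose that $\sigma_\delta^2$ is known and positive. Then the estimator (2.4) is asymptotically normal; in more detail,

$$ \sqrt{n}\,(\hat\beta_1 - \beta_1) \xrightarrow{d} N(0, \sigma_{\beta_1}^2), \tag{3.1} $$

where

$$ \sigma_{\beta_1}^2 := \frac{\beta_1^2 \big( \sigma_\delta^2 \sigma_x^2 + \mathbf{D}(\delta^2) + \sigma_u^2 \sigma_w^2 \big) + \sigma_\varepsilon^2 \sigma_w^2}{\sigma_x^4}. \tag{3.2} $$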

Proof.

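We follow the line of the proof of Theorem 2.22 in [4] and use expansions of sample covariances and Slutsky’s lemma [4, p. 44]. We center $x$ as $\rho := x - \mu$. Then

$$ \begin{aligned} y &= \beta_0 + \beta_1 \mu + \beta_1 \rho + \varepsilon + \beta_1 u, \quad w = \mu + \rho + \delta, \\ S_{wy} &= S_{\rho + \delta,\ \beta_1 \rho + \varepsilon + \beta_1 u} = \beta_1 S_{\rho\rho} + S_{\rho\varepsilon} + \beta_1 S_{\rho u} + \beta_1 S_{\delta\rho} + S_{\delta\varepsilon} + \beta_1 S_{\delta u}, \end{aligned} \tag{3.3} $$

$$ S_{ww} = S_{\rho + \delta,\ \rho + \delta} = S_{\rho\rho} + 2 S_{\rho\delta} + S_{\delta\delta}, \tag{3.4} $$

$$ S_{ww} - \sigma_\delta^2 = \sigma_x^2 + o_p(1). \tag{3.5} $$

Using (2.4) and expansions (3.3)–(3.5), we obtain

$$ \sqrt{n}\,(\hat\beta_1 - \beta_1) = \frac{-\beta_1 \sqrt{n}\,(S_{\delta\delta} - \sigma_\delta^2 - S_{u\rho} + S_{\delta\rho} - S_{u\delta}) + \sqrt{n}\,(S_{\rho\varepsilon} + S_{\delta\varepsilon})}{\sigma_x^2} + o_p(1). \tag{3.6} $$

Next,

$$ S_{\delta\delta} - \sigma_\delta^2 = \overline{\delta^2} - \sigma_\delta^2 + \frac{o_p(1)}{\sqrt{n}}, \quad S_{u\rho} = \overline{u\rho} + \frac{o_p(1)}{\sqrt{n}}, \quad S_{\delta\rho} = \overline{\delta\rho} + \frac{o_p(1)}{\sqrt{n}}, $$

$$ S_{u\delta} = \overline{u\delta} + \frac{o_p(1)}{\sqrt{n}}, \quad S_{\rho\varepsilon} = \overline{\rho\varepsilon} + \frac{o_p(1)}{\sqrt{n}}, \quad S_{\delta\varepsilon} = \overline{\delta\varepsilon} + \frac{o_p(1)}{\sqrt{n}}. $$

We insert these relations into (3.6) and get

$$ \sqrt{n}\,(\hat\beta_1 - \beta_1) = o_p(1) + \frac{-\beta_1 \sqrt{n}\,(\overline{\delta^2} - \sigma_\delta^2 - \overline{u\rho} + \overline{\delta\rho} - \overline{u\delta}) + \sqrt{n}\,(\overline{\rho\varepsilon} + \overline{\delta\varepsilon})}{\sigma_x^2}. \tag{3.7} $$

Using condition (v) and the Central Limit Theorem, we get

$$ \sqrt{n}\,\big(\overline{\delta^2} - \sigma_\delta^2,\ \overline{u\rho},\ \overline{\delta\rho},\ \overline{u\delta},\ \overline{\rho\varepsilon},\ \overline{\delta\varepsilon}\big)^\top \xrightarrow{d} \gamma = (\gamma_i)_{1}^{6} \sim N_6(0, S), $$

$$ S = \operatorname{diag}\big(\mathbf{D}(\delta^2),\ \sigma_u^2 \sigma_x^2,\ \sigma_\delta^2 \sigma_x^2,\ \sigma_\delta^2 \sigma_u^2,\ \sigma_x^2 \sigma_\varepsilon^2,\ \sigma_\delta^2 \sigma_\varepsilon^2\big). $$

The diagonal of $S$ contains the variances of the averaged random variables, e.g., $S_{22} = \mathbf{D}(u\rho) = \mathbf{E}(u\rho)^2 = \sigma_u^2 \sigma_x^2$, and the off-diagonal entries of $S$ vanish because $\delta$, $\rho$, $\varepsilon$, $u$ are independent. Then the numerator in (3.7) converges in distribution to

$$ -\beta_1 (\gamma_1 - \gamma_2 + \gamma_3 - \gamma_4) + \gamma_5 + \gamma_6 \sim N\big(0,\ \beta_1^2 (S_{11} + S_{22} + S_{33} + S_{44}) + S_{55} + S_{66}\big). $$

Relation (3.7) and Slutsky’s lemma imply (3.1) with

$$ \sigma_{\beta_1}^2 = \frac{\beta_1^2 \big( \mathbf{D}(\delta^2) + \sigma_u^2 \sigma_x^2 + \sigma_\delta^2 \sigma_x^2 + \sigma_u^2 \sigma_\delta^2 \big) + \sigma_x^2 \sigma_\varepsilon^2 + \sigma_\delta^2 \sigma_\varepsilon^2}{\sigma_x^4}. \tag{3.8} $$

Since $\sigma_w^2 = \sigma_x^2 + \sigma_\delta^2$, the right-hand sides of (3.8) and (3.2) coincide. □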

Remark 2.

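If condition (v) is replaced with the assumption $\delta \sim N(0, \sigma_\delta^2)$, then $\mathbf{D}(\delta^2) = 2\sigma_\delta^4$ and (3.2) simplifies to

$$ \sigma_{\beta_1}^2 = \frac{\beta_1^2 \big( \sigma_\delta^4 + \sigma_\delta^2 \sigma_w^2 + \sigma_u^2 \sigma_w^2 \big) + \sigma_w^2 \sigma_\varepsilon^2}{\sigma_x^4}. \tag{3.9} $$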
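Formulas (3.2) and (3.9) are easy to evaluate and to cross-check against each other. The sketch below is only an illustration: the function names and the argument `var_delta_sq`, which stands for $\mathbf{D}(\delta^2)$, are hypothetical.

```python
def slope_asymptotic_variance(beta1, var_x, var_delta, var_u, var_eps, var_delta_sq):
    """Asymptotic variance (3.2); var_delta_sq stands for D(delta^2)."""
    var_w = var_x + var_delta  # sigma_w^2 = sigma_x^2 + sigma_delta^2
    numerator = (
        beta1 ** 2 * (var_delta * var_x + var_delta_sq + var_u * var_w)
        + var_eps * var_w
    )
    return numerator / var_x ** 2


def slope_asymptotic_variance_normal(beta1, var_x, var_delta, var_u, var_eps):
    """Asymptotic variance (3.9), valid when delta is Gaussian."""
    var_w = var_x + var_delta
    numerator = (
        beta1 ** 2 * (var_delta ** 2 + var_delta * var_w + var_u * var_w)
        + var_w * var_eps
    )
    return numerator / var_x ** 2


# For Gaussian delta, D(delta^2) = 2 * sigma_delta^4, so both formulas agree.
general = slope_asymptotic_variance(2.0, 1.0, 1.0, 1.0, 1.0, var_delta_sq=2.0)
gaussian = slope_asymptotic_variance_normal(2.0, 1.0, 1.0, 1.0, 1.0)
print(general, gaussian)
```

With $\beta_1 = 2$ and all variances equal to 1, both expressions reduce to $4(1 + 2 + 2) + 2 = 22$.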

The convergence (3.1), (3.2) can be applied to construct an asymptotic confidence interval for $\beta_1$. For this purpose we have to ensure that $\sigma_{\beta_1}^2 > 0$ (this holds if either $\sigma_\varepsilon^2 > 0$ or $\beta_1 \ne 0$) and to estimate $\sigma_{\beta_1}^2$ consistently. The latter is possible for normal $\delta$, since all the parameters on the right-hand side of (3.9) are estimated consistently due to Theorem 1. Without the normality of $\delta$, it is problematic to estimate the quantity $\mathbf{D}(\delta^2)$ in (3.8), which involves the 4th moment of $\delta$. If $\mathbf{E}\delta^4$ is assumed known, then $\sigma_{\beta_1}^2$ can be estimated consistently as well.

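Analysis of formula (3.8) makes it possible to find out in what proportion the classical error and the Berkson error affect the quality of the slope estimation. Denote

$$ \Lambda_u = |\beta_1| \sigma_u \sigma_x, \quad \Lambda_\delta = \sqrt{\beta_1^2 \big( \sigma_\delta^2 \sigma_x^2 + \mathbf{D}(\delta^2) \big) + \sigma_\delta^2 \sigma_\varepsilon^2}, \quad \Lambda_{u\delta} = |\beta_1| \sigma_u \sigma_\delta. $$

Then

$$ \sigma_{\beta_1}^2 = \sigma_x^{-4} \big( \Lambda_\delta^2 + \Lambda_u^2 + \Lambda_{u\delta}^2 + \sigma_x^2 \sigma_\varepsilon^2 \big). $$

The normalized slope estimator $\sqrt{n}\,(\hat\beta_1 - \beta_1)$ can be approximated in distribution by the random variable

$$ \sigma_x^{-2} \big( \Lambda_\delta \gamma_1 + \Lambda_u \gamma_2 + \Lambda_{u\delta} \gamma_3 + \sigma_x \sigma_\varepsilon \gamma_4 \big), $$

with i.i.d. standard normal $\gamma_1, \dots, \gamma_4$. Given $\sigma_x^2$ and $\sigma_\varepsilon^2$, the terms $\Lambda_\delta \gamma_1$, $\Lambda_u \gamma_2$ and $\Lambda_{u\delta} \gamma_3$ separate the influence of the classical error, of the Berkson error and of the cumulative effect of both errors, respectively, on the precision of the slope estimator. Thus, this influence can be evaluated in the proportion $\Lambda_\delta : \Lambda_u : \Lambda_{u\delta}$. Suppose that $\beta_1 \ne 0$ and $\mathbf{E}\delta^4$ is known. The influence can be estimated in the proportion $\hat\Lambda_\delta : \hat\Lambda_u : \hat\Lambda_{u\delta}$, with

$$ \hat\Lambda_u := |\hat\beta_1| \sigma_u \hat\sigma_x, \quad \hat\Lambda_\delta := \sqrt{\hat\beta_1^2 \big( \sigma_\delta^2 \hat\sigma_x^2 + \mathbf{D}(\delta^2) \big) + \sigma_\delta^2 \hat\sigma_\varepsilon^2}, \quad \hat\Lambda_{u\delta} := |\hat\beta_1| \sigma_u \sigma_\delta. $$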
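The decomposition of $\sigma_{\beta_1}^2$ into the three influence terms can be illustrated numerically. In the sketch below, the function name `influence_terms` and the parameter values are hypothetical; `var_delta_sq` again stands for $\mathbf{D}(\delta^2)$.

```python
import math


def influence_terms(beta1, var_x, var_delta, var_u, var_eps, var_delta_sq):
    """Return (Lambda_delta, Lambda_u, Lambda_u_delta) as defined in the text."""
    lam_u = abs(beta1) * math.sqrt(var_u * var_x)
    lam_delta = math.sqrt(
        beta1 ** 2 * (var_delta * var_x + var_delta_sq) + var_delta * var_eps
    )
    lam_u_delta = abs(beta1) * math.sqrt(var_u * var_delta)
    return lam_delta, lam_u, lam_u_delta


# Hypothetical parameter values; delta is taken Gaussian, so D(delta^2) = 2.
beta1, var_x, var_delta, var_u, var_eps = 2.0, 1.0, 1.0, 1.0, 1.0
lam_delta, lam_u, lam_u_delta = influence_terms(
    beta1, var_x, var_delta, var_u, var_eps, var_delta_sq=2.0
)

# Recombine the terms into the asymptotic variance of the slope estimator.
total = (lam_delta ** 2 + lam_u ** 2 + lam_u_delta ** 2 + var_x * var_eps) / var_x ** 2
print(lam_delta, lam_u, lam_u_delta, total)
```

For these values $\Lambda_\delta = \sqrt{13}$ and $\Lambda_u = \Lambda_{u\delta} = 2$, so in this configuration the classical error contributes the largest share.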

3.2 Asymptotic independence of groups of estimators

We slightly reparametrize the model (1.1) to the form

$$ y = \mu_y + \beta_1 (\xi - \mu) + \varepsilon, \quad w = x + \delta, \quad \xi = x + u. \tag{3.10} $$

This model is obtained from (1.1) by introducing a new parameter $\mu_y = \beta_0 + \beta_1 \mu$ in place of $\beta_0$. Based on independent copies of the model

$$ y_i = \mu_y + \beta_1 (\xi_i - \mu) + \varepsilon_i, \quad w_i = x_i + \delta_i, \quad \xi_i = x_i + u_i $$

(here, the independent random vectors $(x_i, \varepsilon_i, \delta_i, u_i)$, $i \ge 1$, are distributed as the random vector $(x, \varepsilon, \delta, u)$ in (3.10)) and on observations $(y_i, w_i)$, $i = 1, \dots, n$, we estimate the vector of unknown parameters $\theta = (\mu, \mu_y, \sigma_w^2, \beta_1, \sigma_\varepsilon^2)^\top$, assuming condition (iii), which states that $\sigma_\delta^2$ and $\sigma_u^2$ are known. For model (3.10), we also assume (i), (ii) and impose two additional assumptions.

(vi) $x$, $\delta$, $u$ and $\varepsilon$ have zero skewness, i.e., their centered third moments are zero.

(vii) $\sigma_\varepsilon^2 > 0$, $\mathbf{E}\varepsilon^4 < \infty$ and the distribution of $\varepsilon$ is not concentrated at two points.

Theorem 1 implies that a strongly consistent estimator $\hat\theta = (\hat\mu, \hat\mu_y, \hat\sigma_w^2, \hat\beta_1, \hat\sigma_\varepsilon^2)^\top$ of $\theta$ can be defined explicitly as

$$ \hat\theta = \left( \overline{w},\ \overline{y},\ S_{ww},\ \frac{S_{wy}}{S_{ww} - \sigma_\delta^2},\ S_{yy} - \frac{S_{wy}^2 (S_{ww} - \sigma_\delta^2 + \sigma_u^2)}{(S_{ww} - \sigma_\delta^2)^2} \right)^\top. \tag{3.11} $$

Introduce the corresponding estimating function

$$ \begin{aligned} s &= s(\theta; w, y) = (s_\mu, s_{\mu_y}, s_{\sigma_w^2}, s_{\beta_1}, s_{\sigma_\varepsilon^2})^\top, \\ s_\mu &:= w - \mu, \quad s_{\mu_y} := y - \mu_y, \quad s_{\sigma_w^2} := (w - \mu)^2 - \sigma_w^2, \\ s_{\beta_1} &:= \beta_1 (w - \mu)^2 - \beta_1 \sigma_\delta^2 - (w - \mu)(y - \mu_y), \\ s_{\sigma_\varepsilon^2} &:= (y - \mu_y)^2 - \sigma_\varepsilon^2 - \beta_1^2 (w - \mu)^2 + \beta_1^2 (\sigma_\delta^2 - \sigma_u^2). \end{aligned} \tag{3.12} $$

With probability one, the estimator (3.11) satisfies the estimating equation

$$ \sum_{i=1}^{n} s(\hat\theta; w_i, y_i) = 0. $$
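That the explicit estimator (3.11) solves the estimating equation built from (3.12) can be verified on any data set for which $S_{ww} > \sigma_\delta^2$. The following sketch does this in plain Python; the function names and the toy data are assumptions made for this illustration.

```python
def sample_cov(a, b):
    """Sample covariance S_ab with the 1/n normalisation used in the paper."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


def theta_hat(w, y, var_delta, var_u):
    """Explicit estimator (3.11): (mu, mu_y, sigma_w^2, beta_1, sigma_eps^2)."""
    n = len(w)
    s_ww, s_wy, s_yy = sample_cov(w, w), sample_cov(w, y), sample_cov(y, y)
    var_x_hat = s_ww - var_delta  # assumed positive in this sketch
    beta1 = s_wy / var_x_hat
    var_eps = s_yy - s_wy ** 2 * (var_x_hat + var_u) / var_x_hat ** 2
    return (sum(w) / n, sum(y) / n, s_ww, beta1, var_eps)


def estimating_function(theta, wi, yi, var_delta, var_u):
    """Components of s(theta; w, y) from (3.12) for a single observation."""
    mu, mu_y, var_w, beta1, var_eps = theta
    dw, dy = wi - mu, yi - mu_y
    return (
        dw,
        dy,
        dw ** 2 - var_w,
        beta1 * dw ** 2 - beta1 * var_delta - dw * dy,
        dy ** 2 - var_eps - beta1 ** 2 * dw ** 2 + beta1 ** 2 * (var_delta - var_u),
    )


def summed_scores(w, y, var_delta, var_u):
    """Sum of s(theta_hat; w_i, y_i) over the sample; should vanish."""
    theta = theta_hat(w, y, var_delta, var_u)
    totals = [0.0] * 5
    for wi, yi in zip(w, y):
        scores = estimating_function(theta, wi, yi, var_delta, var_u)
        for k, value in enumerate(scores):
            totals[k] += value
    return totals


# Toy data (hypothetical); every component is zero up to rounding error.
print(summed_scores([0.0, 2.0, 4.0, 6.0], [1.0, 3.0, 2.0, 6.0], 1.0, 0.5))
```

The check is purely algebraic, so it holds for every sample and does not depend on the distributional assumptions.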

Definition 1.

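Let $\hat\alpha$ and $\hat\beta$ be asymptotically normal estimators of $\alpha \in \mathbb{R}^p$ and $\beta \in \mathbb{R}^q$, respectively, such that

$$ \sqrt{n} \begin{pmatrix} \hat\alpha - \alpha \\ \hat\beta - \beta \end{pmatrix} \xrightarrow{d} N_{p+q}(0, \Sigma) \quad \text{as } n \to \infty, $$

with a nonsingular asymptotic covariance matrix $\Sigma$. The estimators $\hat\alpha$ and $\hat\beta$ are called asymptotically independent if $\Sigma$ can be partitioned as

$$ \Sigma = \operatorname{block-diag}(\Sigma_\alpha, \Sigma_\beta) = \begin{pmatrix} \Sigma_\alpha & 0 \\ 0 & \Sigma_\beta \end{pmatrix}, $$

with $\Sigma_\alpha \in \mathbb{R}^{p \times p}$ and $\Sigma_\beta \in \mathbb{R}^{q \times q}$.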

It is convenient to deal with asymptotically independent estimators $\hat\alpha$ and $\hat\beta$, because an asymptotic confidence region for the augmented parameter $(\alpha^\top, \beta^\top)^\top$ can be constructed as the Cartesian product of asymptotic confidence ellipsoids for $\alpha$ and $\beta$.

Theorem 3.

Assume conditions (i)–(iii), (vii) and that $x$, $\delta$, $u$ have finite 4th moments. Then:

(a) the estimator (3.11) in model (3.10) is asymptotically normal; in more detail,

$$ \sqrt{n}\,(\hat\theta - \theta) \xrightarrow{d} N_5(0, \Sigma_\theta) \tag{3.13} $$

with a nonsingular asymptotic covariance matrix $\Sigma_\theta$;

(b) under the additional assumption (vi), the groups of estimators $(\hat\mu, \hat\mu_y)^\top$ and $(\hat\sigma_w^2, \hat\beta_1, \hat\sigma_\varepsilon^2)^\top$ are asymptotically independent.

Proof.

(a) We prove (3.13) with a nonsingular $\Sigma_\theta$.

1. Since all the variances in the underlying model are assumed positive, the true vector $\theta$ is an inner point of the parameter set $\Theta = \mathbb{R}^2 \times (0, \infty) \times \mathbb{R} \times (0, \infty)$.

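As was mentioned above, $\hat\theta$ is strongly consistent. The estimating function (3.12) is unbiased, i.e., $\mathbf{E}_\theta\, s(\theta; w, y) = 0$. Introduce two matrices

$$ V := -\mathbf{E}_\theta \frac{\partial s(\theta; w, y)}{\partial \theta^\top} = \operatorname{block-diag}(I_2, V_2), \quad \text{with } V_2 := \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\sigma_x^2 & 0 \\ 0 & 2\beta_1 (\sigma_x^2 + \sigma_u^2) & 1 \end{pmatrix}, $$

and

$$ B := \operatorname{cov}_\theta \big( s(\theta; w, y) \big). $$

Since $\varepsilon$, $x$, $\delta$, $u$ have finite 4th moments, $B$ is well defined.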

The unbiasedness of $s(\theta; w, y)$, the consistency of $\hat\theta$ and the nonsingularity of $V$ imply (3.13) by Theorem A.26 from [4], and $\Sigma_\theta$ can be found by the sandwich formula

$$ \Sigma_\theta = V^{-1} B (V^{-1})^\top. $$

2. It remains to prove that $B$ is nonsingular. For this purpose, we have to show that the five random variables

$$ s_\mu = s_\mu(\theta; w, y), \quad s_{\mu_y} = s_{\mu_y}(\theta; w, y), \quad s_{\sigma_w^2} = s_{\sigma_w^2}(\theta; w, y), \quad s_{\beta_1} = s_{\beta_1}(\theta; w, y), \quad s_{\sigma_\varepsilon^2} = s_{\sigma_\varepsilon^2}(\theta; w, y) $$

are linearly independent for the true value of $\theta$.

Consider the random vector

$$ h := \big( w - \mu,\ y - \mu_y,\ (w - \mu)^2,\ (w - \mu)(y - \mu_y),\ (y - \mu_y)^2 \big)^\top. \tag{3.14} $$

It holds that

$$ (s_\mu, s_{\mu_y}, s_{\sigma_w^2}, s_{\beta_1}, s_{\sigma_\varepsilon^2})^\top = T h + a, $$

where $T = T(\theta)$ is a nonsingular square matrix and $a = a(\theta)$ is a nonrandom vector. Since the matrix $T$ is nonsingular, it is enough to show that no nontrivial linear combination of the components of $h$ is a constant.

We use the centering $\rho = x - \mu$. Suppose that for some real numbers $a_{11}$, $a_{12}$, $a_{22}$, $a_1$, $a_2$ and $a_3$, the following holds with probability one:

$$ \begin{aligned} F :={}& a_{11} (w - \mu)^2 + a_{12} (w - \mu)(y - \mu_y) + a_{22} (y - \mu_y)^2 + a_1 (w - \mu) + a_2 (y - \mu_y) + a_3 = 0, \\ F ={}& a_{11} (\rho + \delta)^2 + a_{12} (\rho + \delta)(\beta_1 \rho + \beta_1 u + \varepsilon) + a_{22} (\beta_1 \rho + \beta_1 u + \varepsilon)^2 \\ & + a_1 (\rho + \delta) + a_2 (\beta_1 \rho + \beta_1 u + \varepsilon) + a_3 = 0. \end{aligned} $$

Then a.s.

$$ 0 = \mathbf{E}[F \mid \varepsilon] = a_{22} \varepsilon^2 + a_2 \varepsilon + b_3, \quad b_3 \in \mathbb{R}, $$

hence, due to condition (vii), $a_{22} = a_2 = 0$. Furthermore, we have a.s.

$$ 0 = \mathbf{E}[F \mid \delta] = a_{11} \delta^2 + a_1 \delta + c_3, \quad c_3 \in \mathbb{R}. \tag{3.15} $$

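Consider two cases concerning the support of $\delta$.

2.1. Here we suppose that $\delta$ is not concentrated at two points. Then (3.15) implies $a_{11} = a_1 = 0$. Next, a.s.

$$ 0 = \mathbf{E}[F \mid \varepsilon, \delta] = a_{12} \delta \varepsilon + d_3, \quad d_3 \in \mathbb{R}, \qquad 0 = \mathbf{D}(a_{12} \delta \varepsilon) = a_{12}^2 \sigma_\delta^2 \sigma_\varepsilon^2, \qquad a_{12} = 0, $$

and we get the desired

$$ a_{11} = a_{12} = a_{22} = a_1 = a_2 = 0. \tag{3.16} $$

2.2. Now, we suppose that for some $\delta_0 \ne 0$, it holds $\mathbf{P}(\delta = \delta_0) = \mathbf{P}(\delta = -\delta_0) = \frac{1}{2}$. Then with probability one,

$$ \begin{aligned} & F(\rho, \varepsilon, \delta_0, u) = F(\rho, \varepsilon, -\delta_0, u) = 0, \\ & 0 = F(\rho, \varepsilon, \delta_0, u) - F(\rho, \varepsilon, -\delta_0, u) = 2\delta_0 G, \\ & 0 = G = 2 a_{11} \rho + a_{12} (\beta_1 \rho + \beta_1 u + \varepsilon) + a_1, \\ & 0 = \mathbf{E}[G \mid \varepsilon] = a_{12} \varepsilon + a_1, \quad a_{12} = a_1 = 0; \\ & a_{11} \rho = 0 \ \text{a.s.}, \quad a_{11} = 0. \end{aligned} $$

Thus, in this case (3.16) holds as well. Statement (a) of Theorem 3 is proven.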

(b) Now, we rely additionally on assumption (vi) about vanishing centered third moments. By statement (a), $B$ is nonsingular. We have to show that it has the block-diagonal structure

$$ B = \operatorname{block-diag}(B_1, B_2) \tag{3.17} $$

with some matrices $B_1 \in \mathbb{R}^{2 \times 2}$ and $B_2 \in \mathbb{R}^{3 \times 3}$. Then $\Sigma_\theta$ will be block-diagonal as well, with nonsingular blocks:

$$ \Sigma_\theta = \operatorname{block-diag}(\Sigma_1, \Sigma_2), \quad \Sigma_1 = B_1, \quad \Sigma_2 = V_2^{-1} B_2 (V_2^{-1})^\top, $$

and statement (b) of Theorem 3 will be proven.

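Using assumption (vi), we have:

$$ \begin{aligned} \operatorname{cov}(s_\mu, s_{\sigma_w^2}) &= \mathbf{E}(x - \mu)^3 + \mathbf{E}\delta^3 = 0; \\ \operatorname{cov}(s_\mu, s_{\beta_1}) &= \operatorname{cov}\big(w - \mu,\ \beta_1 (w - \mu)^2\big) - \operatorname{cov}\big(w - \mu,\ (w - \mu)(y - \mu_y)\big) \\ &= -\mathbf{E}(\rho + \delta)^2 (\beta_1 \rho + \beta_1 u + \varepsilon) = -\beta_1 \mathbf{E}\rho^3 = 0; \\ \operatorname{cov}(s_\mu, s_{\sigma_\varepsilon^2}) &= \operatorname{cov}\big(w - \mu,\ (y - \mu_y)^2\big) - \beta_1^2 \operatorname{cov}\big(w - \mu,\ (w - \mu)^2\big) \\ &= \mathbf{E}(w - \mu)(y - \mu_y)^2 = \mathbf{E}(\rho + \delta)(\beta_1 \rho + \beta_1 u + \varepsilon)^2 = \beta_1^2 \mathbf{E}\rho^3 = 0; \\ \operatorname{cov}(s_{\mu_y}, s_{\sigma_w^2}) &= \beta_1 \mathbf{E}\rho^3 = 0; \\ \operatorname{cov}(s_{\mu_y}, s_{\beta_1}) &= \beta_1 \mathbf{E}(w - \mu)^2 (y - \mu_y) - \mathbf{E}(w - \mu)(y - \mu_y)^2 = \beta_1^2 \mathbf{E}\rho^3 - \beta_1^2 \mathbf{E}\rho^3 = 0; \\ \operatorname{cov}(s_{\mu_y}, s_{\sigma_\varepsilon^2}) &= \mathbf{E}(y - \mu_y)^3 - \beta_1^2 \mathbf{E}(w - \mu)^2 (y - \mu_y) \\ &= \mathbf{E}(\beta_1 \rho + \beta_1 u + \varepsilon)^3 - \beta_1^3 \mathbf{E}\rho^3 = \beta_1^3 \mathbf{E}u^3 + \mathbf{E}\varepsilon^3 = 0. \end{aligned} $$

This proves relation (3.17). □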

Remark 3.

Theorem 3 is not valid without condition (vii). Indeed, suppose that for some $\varepsilon_0 \ne 0$, $\mathbf{P}(\varepsilon = \varepsilon_0) = \mathbf{P}(\varepsilon = -\varepsilon_0) = \frac{1}{2}$. If additionally $\beta_1 = 0$, then $(y - \mu_y)^2 = \varepsilon^2 = \varepsilon_0^2$ a.s., and $F_0 := (y - \mu_y)^2 - \varepsilon_0^2 = 0$ a.s. Thus, a certain nontrivial linear combination of the components of the vector (3.14) is a constant, hence the block $B_2$ in (3.17) is singular, and the asymptotic covariance matrix $\Sigma_\theta$ is degenerate in this specific case.

4 Simulation study

We simulated test data in order to evaluate the coverage probability of the asymptotic confidence interval for the slope parameter, which is constructed based on Theorem 2. Observations in model (1.1) were generated as follows: $x \sim N(-1, 1)$, $u \sim N(0, \sigma_u^2)$ with $\sigma_u^2 \in \{10i : i = 1, \dots, 15\}$, $\delta \sim N(0, 1)$, $\varepsilon \sim N(0, 1)$, $\beta_1 = 2$, $\beta_0 = 1$, with the sample size $n \in \{10i : i = 1, \dots, 10\}$. For each collection of model parameters, $N = 10{,}000$ realizations were generated. For each realization, the slope estimate and the estimate of its asymptotic variance were computed (here, we inserted into (3.2) the estimates of all unknown model parameters). For the ensemble of $N$ realizations, the coverage probability was calculated for the constructed 95% asymptotic confidence intervals for the slope parameter. We briefly report the obtained results.

Fig. 1. Coverage probability plot: sample size effect perspective

Fig. 2. Coverage probability plot: Berkson effect perspective

Figure 1 shows that the deviation of the coverage probability from 0.95 decreases as the sample size increases. This effect is stable across different values of the Berkson error variance. Figure 2 illustrates that the deviation of the coverage probability from 0.95 increases with the Berkson error variance. As can be seen in Figure 1, the latter effect becomes weaker as the sample size increases.
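A scaled-down version of such an experiment can be sketched as follows. This is not the code used to produce the figures: the function names, the number of replications, the values of $n$ and $\sigma_u^2$, the truncation of a negative $\hat\sigma_\varepsilon^2$ at zero and the treatment of samples with $S_{ww} \le \sigma_\delta^2$ as misses are all assumptions of this illustration. Since $\delta$ is Gaussian here, the plug-in variance uses (3.9) with $\sigma_w^2$ estimated by $S_{ww}$.

```python
import math
import random


def slope_ci_covers(n, beta0, beta1, mu, var_x, var_eps, var_delta, var_u, rng, z=1.96):
    """Simulate one sample and report whether the 95% interval covers beta1."""
    w, y = [], []
    for _ in range(n):
        x = rng.gauss(mu, math.sqrt(var_x))
        xi = x + rng.gauss(0.0, math.sqrt(var_u))            # Berkson error
        y.append(beta0 + beta1 * xi + rng.gauss(0.0, math.sqrt(var_eps)))
        w.append(x + rng.gauss(0.0, math.sqrt(var_delta)))   # classical error
    mean_w, mean_y = sum(w) / n, sum(y) / n
    s_ww = sum((a - mean_w) ** 2 for a in w) / n
    s_wy = sum((a - mean_w) * (b - mean_y) for a, b in zip(w, y)) / n
    s_yy = sum((b - mean_y) ** 2 for b in y) / n

    var_x_hat = s_ww - var_delta
    if var_x_hat <= 0.0:
        return False  # estimator undefined; counted as a miss in this sketch
    b1 = s_wy / var_x_hat                                     # (2.4)
    var_eps_hat = max(0.0, s_yy - b1 * s_wy * (var_x_hat + var_u) / var_x_hat)  # (2.8)
    # Plug-in version of (3.9).
    avar = (
        b1 ** 2 * (var_delta ** 2 + var_delta * s_ww + var_u * s_ww)
        + s_ww * var_eps_hat
    ) / var_x_hat ** 2
    half_width = z * math.sqrt(avar / n)
    return abs(b1 - beta1) <= half_width


def coverage(n, var_u, replications, seed=0):
    """Empirical coverage for x ~ N(-1, 1), delta, eps ~ N(0, 1), beta = (1, 2)."""
    rng = random.Random(seed)
    hits = sum(
        slope_ci_covers(n, 1.0, 2.0, -1.0, 1.0, 1.0, 1.0, var_u, rng)
        for _ in range(replications)
    )
    return hits / replications
```

For instance, `coverage(500, 1.0, 200)` returns the fraction of 200 simulated samples of size 500 in which the interval contains the true slope; larger values of `var_u` or smaller `n` can then be tried to reproduce the qualitative effects described above.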

5 Conclusion

We dealt with a linear observation model (1.1) with a mixture of the classical and Berkson errors in the covariate. Surprisingly enough, we constructed consistent estimators for the regression parameters without knowledge of the variance of the Berkson error. Nevertheless, the size of the Berkson error affects the asymptotic variances of $\hat\beta_0$ and $\hat\beta_1$.

Then we modified the model to an equivalent centered form (3.10). This made it possible to divide the estimators of all unknown model parameters into two asymptotically independent groups.

In the future we intend to consider the prediction problem for the model (1.1), as was done in [6] for various measurement error models. It would also be interesting to consider a polynomial model with a mixture of the classical and Berkson errors, as well as a version of the linear model with a vector response and a vector covariate.

References

[1] Carroll, R.J., Ruppert, D., Stefanski, L.A., Crainiceanu, C.M.: Measurement Error in Nonlinear Models: A Modern Perspective, 2nd edn. Monogr. Stat. Appl. Probab., vol. 105, p. 455. Chapman & Hall/CRC, Boca Raton, FL (2006). MR2243417. https://doi.org/10.1201/9781420010138

[2] Cheng, C.-L., Van Ness, J.W.: Statistical Regression with Measurement Error. Kendall’s Library of Statistics, vol. 6. Arnold, London; co-published by Oxford University Press, New York (1999). MR1719513

[3] Kukush, O.G., Tsaregorodtsev, Y.V., Shklyar, S.V.: Asymptotically independent estimators in a structural linear model with measurement errors. Ukr. Math. J. 68(11), 1741–1755 (2017). MR3621452. https://doi.org/10.1007/s11253-017-1324-8

[4] Masiuk, S.V., Kukush, A.G., Shklyar, S.V., Chepurny, M.I., Likhtarov, I.A.: Radiation Risk Estimation: Based on Measurement Error Models, 2nd edn. De Gruyter Series in Mathematics and Life Sciences, vol. 5, p. 238. De Gruyter, Berlin (2017). MR3726857. https://doi.org/10.1515/9783110433661

[5] Schneeweiss, H., Kukush, A.: Comparing the efficiency of structural and functional methods in measurement error models. Theory Probab. Math. Stat. 80, 131–142 (2010). MR2541958. https://doi.org/10.1090/S0094-9000-2010-00800-3

[6] Senko, I., Kukush, A.: Prediction in polynomial errors-in-variables models. Mod. Stoch. Theory Appl. 7(2), 203–219 (2020). MR4120615. https://doi.org/10.15559/20-vmsta154