We obtain limit theorems for extreme residuals in the linear regression model in the case of minimax estimation of parameters.
Keywords: linear regression, minimax estimator, maximal residual
MSC: 60G70, 62J05

Introduction
Consider the linear regression model
$$y_j=\sum_{i=1}^{q}\theta_i x_{ji}+\epsilon_j,\qquad j=\overline{1,N},\tag{1}$$
where $\theta=(\theta_1,\theta_2,\dots,\theta_q)$ is an unknown parameter, $\epsilon_j$ are independent identically distributed (i.i.d.) random variables (r.v.-s) with distribution function (d.f.) $F(x)$, and $X=(x_{ji})$ is the regression design matrix.
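Let $\hat\theta=(\hat\theta_1,\dots,\hat\theta_q)$ be the least squares estimator (LSE) of $\theta$. Introduce the notation
$$\hat y_j=\sum_{i=1}^{q}\hat\theta_i x_{ji},\qquad \hat\epsilon_j=y_j-\hat y_j,\qquad j=\overline{1,N};$$
$$Z_N=\max_{1\le j\le N}\epsilon_j,\qquad \hat Z_N=\max_{1\le j\le N}\hat\epsilon_j,\qquad Z_N^*=\max_{1\le j\le N}|\epsilon_j|,\qquad \hat Z_N^*=\max_{1\le j\le N}|\hat\epsilon_j|.$$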
The asymptotic behavior of the r.v.-s $Z_N$, $Z_N^*$ is studied in extreme value theory (see the classical works by Fréchet [10], Fisher and Tippett [3], and Gnedenko [5], and the monographs [4, 8]). In [6, 7], it was shown that, under mild assumptions, the asymptotic properties of the r.v.-s $Z_N$, $\hat Z_N$, $Z_N^*$, and $\hat Z_N^*$ are similar both when the observation errors $\epsilon_j$ have finite variance and when they have heavy tails.
In the present paper, we study the asymptotic properties of the minimax estimator (MME) of $\theta$ and of the maximal absolute residual. For the MME, we keep the same notation $\hat\theta$.
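A random vector $\hat\theta=(\hat\theta_1,\dots,\hat\theta_q)$ is called an MME of $\theta$ based on the observations (1) if
$$\hat\Delta=\Delta(\hat\theta)=\min_{\tau\in\mathbb R^q}\Delta(\tau),\tag{2}$$
where
$$\Delta(\tau)=\max_{1\le j\le N}\Big|y_j-\sum_{i=1}^{q}\tau_i x_{ji}\Big|.\tag{3}$$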
Denote $W_N=\min_{1\le j\le N}\epsilon_j$, and let $R_N=Z_N-W_N$ and $Q_N=\frac{Z_N+W_N}{2}$ be the range and the midrange of the sequence $\epsilon_j$, $j=\overline{1,N}$.
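The following statement shows an essential difference between the behavior of the MME and that of the LSE.
Statement 1.
(i) If the model (1) contains a constant term, namely, $x_{j1}=1$, $j=\overline{1,N}$, then almost surely (a.s.)
$$\hat\Delta\le\frac{R_N}{2}.$$
(ii) If the model (1) has the form
$$y_j=\theta+\epsilon_j,\qquad j=\overline{1,N},\tag{4}$$
then a.s.
$$\hat\Delta=\frac{R_N}{2},\qquad \hat\theta-\theta=Q_N.$$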
Point (ii) of Statement 1 implies that the MME $\hat\theta$ is not consistent in the model (4) for some errors $\epsilon_j$ that have finite moments of all orders (see Example 2).
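The value $\hat\Delta$ can be represented as the solution of the following linear programming problem (LPP):
$$\hat\Delta=\min_{(\tau,\Delta)\in D}\Delta,\tag{5}$$
$$D=\Big\{(\tau,\Delta)\in\mathbb R^q\times\mathbb R_+:\ \Big|y_j-\sum_{i=1}^q\tau_i x_{ji}\Big|\le\Delta,\ j=\overline{1,N}\Big\}=\Big\{(\tau,\Delta)\in\mathbb R^q\times\mathbb R_+:\ \sum_{i=1}^q\tau_i x_{ji}+\Delta\ge y_j,\ -\sum_{i=1}^q\tau_i x_{ji}+\Delta\ge-y_j,\ j=\overline{1,N}\Big\}.$$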
Thus, the problem (2) of determining $\hat\Delta$ and $\hat\theta$ reduces to solving LPP (5), which can be solved efficiently by the simplex method (see [2, 12]). Investigating the asymptotic properties of the maximal absolute residual $\hat\Delta$ and of the MME $\hat\theta$ is quite difficult for the general model (1). However, under additional assumptions on the regression design and on the observation errors $\epsilon_j$, it is possible to find the limiting distribution of $\hat\Delta$, to prove the consistency of the MME $\hat\theta$, and even to estimate the rate of convergence $\hat\theta\to\theta$ as $N\to\infty$.
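To make the reduction to LPP (5) concrete, here is a minimal Python sketch. It does not implement the simplex method cited above; instead it uses brute-force vertex enumeration (a swapped-in technique, adequate only for tiny $N$ and $q$). It assumes that the design matrix $X$ has full column rank, so that the polyhedron $D$ has vertices and the minimum in (5) is attained at one of them. The helper names `solve_exact` and `minimax_fit` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_exact(a, b):
    """Solve the square system a z = b exactly with Fractions; return None if singular."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(a, b)]
    for col in range(n):
        piv = next((r for r in range(col, n) if m[r][col] != 0), None)
        if piv is None:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


def minimax_fit(x, y):
    """Return (tau_hat, delta_hat) minimizing max_j |y_j - sum_i tau_i x_ji|.

    Each constraint of LPP (5) has the form s*(x_j . tau) + Delta >= s*y_j with
    s = +1 or s = -1. A vertex of D makes q + 1 independent constraints active,
    so we solve every such (q+1) x (q+1) system, keep the feasible solutions,
    and return the one with the smallest Delta. Cost grows like C(2N, q+1).
    """
    q = len(x[0])
    rows = [(xj, yj, s) for xj, yj in zip(x, y) for s in (1, -1)]
    best = None
    for active in combinations(rows, q + 1):
        a = [[s * v for v in xj] + [1] for xj, _, s in active]
        b = [s * yj for _, yj, s in active]
        sol = solve_exact(a, b)
        if sol is None:
            continue
        tau, delta = sol[:q], sol[q]
        resid = [yj - sum(t * v for t, v in zip(tau, xj)) for xj, yj in zip(x, y)]
        if max(abs(r) for r in resid) <= delta and (best is None or delta < best[1]):
            best = (tau, delta)
    return best


# Location model (4): Statement 1(ii) predicts tau_hat = midrange, delta_hat = range / 2.
print(minimax_fit([[1]] * 5, [3, -1, 4, 1, 5]))
```

For the location data above the midrange is 2 and the half-range is 3, which is what the enumeration should return. For realistic sample sizes one would hand LPP (5) to an LP solver rather than enumerate vertices.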
The main theorems
First, we briefly recall some results of extreme value theory. Let the r.v.-s $(\epsilon_j)$ have d.f. $F(x)$. Assume that, for some constants $b_n>0$ and $a_n$, as $n\to\infty$,
$$b_n(Z_n-a_n)\xrightarrow{D}\zeta,\tag{6}$$
where $\zeta$ has a nondegenerate d.f. $G(x)=P(\zeta<x)$. If assumption (6) holds, then we say that the d.f. $F$ belongs to the domain of maximum attraction of the probability distribution $G$ and write $F\in D(G)$.
If $F\in D(G)$, then $G$ is necessarily of one of the following three types [5, 8]:
$$\text{Type I:}\quad \Phi_\alpha(x)=\begin{cases}0,& x\le0,\\ \exp(-x^{-\alpha}),& x>0,\end{cases}\qquad \alpha>0;$$
$$\text{Type II:}\quad \Psi_\alpha(x)=\begin{cases}\exp(-(-x)^{\alpha}),& x\le0,\\ 1,& x>0,\end{cases}\qquad \alpha>0;$$
$$\text{Type III:}\quad \Lambda(x)=\exp(-e^{-x}),\qquad -\infty<x<\infty.\tag{7}$$
Necessary and sufficient conditions for convergence to each of the d.f.-s $\Phi_\alpha$, $\Psi_\alpha$, $\Lambda$ are also well known.
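For readers who want to evaluate the three limit laws (7) numerically, a minimal Python sketch follows. The function names are hypothetical; they use the conventional names Fréchet, reversed Weibull, and Gumbel for the paper's Types I, II, and III.

```python
import math


def frechet_cdf(x, alpha):
    """Type I here: Phi_alpha(x) = exp(-x^(-alpha)) for x > 0, and 0 for x <= 0."""
    return math.exp(-x ** (-alpha)) if x > 0 else 0.0


def reversed_weibull_cdf(x, alpha):
    """Type II here: Psi_alpha(x) = exp(-(-x)^alpha) for x <= 0, and 1 for x > 0."""
    return math.exp(-((-x) ** alpha)) if x <= 0 else 1.0


def gumbel_cdf(x):
    """Type III: Lambda(x) = exp(-e^(-x)) for all real x."""
    # For x < -700, exp(-x) would overflow; Lambda(x) is 0.0 in floating point there anyway.
    return math.exp(-math.exp(-x)) if x > -700 else 0.0


# Max-stability of Lambda: if zeta_1, ..., zeta_n are i.i.d. with d.f. Lambda,
# then P(max_i zeta_i < x) = Lambda(x)^n = Lambda(x - ln n).
n, x = 10, 0.5
print(gumbel_cdf(x) ** n, gumbel_cdf(x - math.log(n)))
```

The last two lines print the same number up to rounding, illustrating why $\Lambda$ can appear as a limit in (6): a maximum of i.i.d. Gumbel variables is again Gumbel after a shift.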
Suppose that in the model (1):
(A1) $(\epsilon_j)$ are symmetric r.v.-s;
(A2) $(\epsilon_j)$ satisfy relation (6), that is, $F\in D(G)$ with normalizing constants $a_n$ and $b_n$, where $G$ is one of the d.f.-s $\Phi_\alpha$, $\Psi_\alpha$, $\Lambda$ defined in (7).
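Assume further that the regression design is organized as follows:
$$x_j=(x_{j1},\dots,x_{jq})\in\{v_1,v_2,\dots,v_k\},\qquad v_l=(v_{l1},\dots,v_{lq})\in\mathbb R^q,\qquad v_m\ne v_l,\ m\ne l;\tag{8}$$
that is, the $x_j$ take only a fixed finite set of values. Besides, suppose that
$$x_j=v_l\ \text{for}\ j\in I_l,\ l=\overline{1,k},\qquad \operatorname{card}(I_l)=n,\qquad I_m\cap I_l=\varnothing,\ m\ne l,\tag{9}$$
where $N=kn$ is the sample size, and let
$$V=\begin{pmatrix}v_{11}&v_{12}&\dots&v_{1q}\\ v_{21}&v_{22}&\dots&v_{2q}\\ \vdots&\vdots&\ddots&\vdots\\ v_{k1}&v_{k2}&\dots&v_{kq}\end{pmatrix}.$$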
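Theorem 1. Under assumptions (A1), (A2), (8), and (9),
$$\Delta_n=b_n(\hat\Delta-a_n)\xrightarrow{D}\Delta_0,\qquad n\to\infty,\tag{10}$$
where
$$\Delta_0=\max_{u\in D^*}L_0^*(u),\qquad L_0^*(u)=\sum_{l=1}^k(u_l\zeta_l+u_l'\zeta_l'),\qquad u=(u_1,\dots,u_k,u_1',\dots,u_k'),$$
$$D^*=\Big\{u\ge0:\ \sum_{l=1}^k(u_l-u_l')v_{li}=0,\ i=\overline{1,q},\ \sum_{l=1}^k(u_l+u_l')=1\Big\},\tag{11}$$
and $\zeta_l,\zeta_l'$, $l=\overline{1,k}$, are independent r.v.-s with d.f. $G(x)$.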
For a numerical sequence $b_n\to\infty$ and a random sequence $(\xi_n)$, we write $\xi_n=O_P(b_n^{-1})$ if
$$\sup_n P\big(b_n|\xi_n|>C\big)\to0\quad\text{as}\ C\to\infty.$$
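Assume that $k\ge q$ and that there exists a square submatrix $\tilde V\subset V$ of order $q$,
$$\tilde V=\begin{pmatrix}v_{l_1 1}&\dots&v_{l_1 q}\\ \vdots&\ddots&\vdots\\ v_{l_q 1}&\dots&v_{l_q q}\end{pmatrix},$$
such that
$$\det\tilde V\ne0.\tag{12}$$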
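Theorem 2. Assume that, under the conditions of Theorem 1, $k\ge q$, assumption (12) holds, and
$$b_n\to\infty\quad\text{as}\ n\to\infty.\tag{13}$$
Then the MME $\hat\theta$ is consistent, and
$$\hat\theta_i-\theta_i=O_P(b_n^{-1}),\qquad i=\overline{1,q}.$$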
Example 1. Consider the simple linear regression model
$$y_j=\theta_0+\theta_1x_j+\epsilon_j,\qquad j=\overline{1,N},\tag{14}$$
with $x_j=v$, $j=\overline{1,N}$, that is, $k=1$ and $q=2$. Then this model can be rewritten in the form (4) with $\theta=\theta_0+\theta_1v$. Clearly, the parameters $\theta_0$, $\theta_1$ cannot be determined unambiguously here. So it does not make sense to speak about the consistency of the MME $\hat\theta$ when $k<q$.
Example 2. Consider the regression model (4) with errors $\epsilon_j$ having the Laplace density $f(x)=\frac12e^{-|x|}$. For this distribution, the well-known von Mises condition for the type III distribution is satisfied ([8], p. 16), that is, $F\in D(\Lambda)$. For symmetric $F\in D(\Lambda)$, we have
$$\lim_{n\to\infty}P\{2b_nQ_n<x\}=\frac{1}{1+e^{-x}}.$$
The limiting distribution is logistic (see [9], p. 62). Using the well-known formulas for the type $\Lambda$ ([9], p. 49), $a_n=F^{-1}(1-\frac1n)$ and $b_n=nf(a_n)$, we find $a_n=\ln\frac n2$ and $b_n=1$. Since $\hat\theta-\theta=Q_N$ by Statement 1 and $2Q_N$ has a nondegenerate (logistic) limit, the MME $\hat\theta$ is not consistent. Thus, condition (13) of Theorem 2 cannot be weakened.
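The constants and the logistic limit in Example 2 can be checked with a short Monte Carlo sketch in Python. The function names, the sample size $n=300$, the 2000 replications, and the seed are arbitrary illustrative choices; the sampler uses the standard fact that the difference of two independent unit-mean exponential variables has the density $\frac12e^{-|x|}$.

```python
import math
import random


def laplace_quantile(p):
    """F^{-1}(p) for the Laplace d.f. with density f(x) = exp(-|x|)/2."""
    return math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p))


def laplace_density(x):
    return 0.5 * math.exp(-abs(x))


def norming_constants(n):
    """Type-Lambda recipe quoted in the text: a_n = F^{-1}(1 - 1/n), b_n = n f(a_n)."""
    a_n = laplace_quantile(1 - 1 / n)
    return a_n, n * laplace_density(a_n)


def twice_midrange(n, rng):
    """One draw of 2 Q_n = Z_n + W_n; a Laplace r.v. is E1 - E2 with E1, E2 i.i.d. Exp(1)."""
    sample = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]
    return max(sample) + min(sample)


def empirical_cdf(draws, x):
    return sum(d < x for d in draws) / len(draws)


rng = random.Random(2024)
draws = [twice_midrange(300, rng) for _ in range(2000)]
for x in (-2.0, 0.0, 1.0):
    logistic = 1 / (1 + math.exp(-x))
    print(f"x={x:+.1f} empirical={empirical_cdf(draws, x):.3f} logistic={logistic:.3f}")
```

Because $b_n=1$ for every $n$, the spread of $2Q_n$ does not shrink as $n$ grows, which is exactly the inconsistency described above.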
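The following lemma allows us to check condition (13).
Lemma 1. Let $F\in D(G)$. Then:
(i) If $G=\Phi_\alpha$, then $x_F=\sup\{x:F(x)<1\}=\infty$, $\gamma_n=F^{-1}(1-\frac1n)\to\infty$, and $b_n=\gamma_n^{-1}\to0$ as $n\to\infty$. Thus, (13) does not hold.
(ii) If $G=\Psi_\alpha$, then $x_F<\infty$ and $1-F(x_F-x)=x^\alpha L(x)$, where $L(x)$ is a slowly varying (s.v.) function at zero, and there exists a function $L_1(x)$, s.v. at infinity, such that $b_n=(x_F-\gamma_n)^{-1}=n^{1/\alpha}L_1(n)\to\infty$ as $n\to\infty$. So (13) holds.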
Similar results can be found in [9], Corollary 2.7, pp. 44–45; see also [4, 8].
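Set
$$Z_{nl}=\max_{j\in I_l}\epsilon_j,\qquad W_{nl}=\min_{j\in I_l}\epsilon_j,\qquad R_{nl}=Z_{nl}-W_{nl},\qquad Q_{nl}=\frac{Z_{nl}+W_{nl}}{2},\qquad l=\overline{1,k}.$$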
It turns out that Theorems 1 and 2 can be significantly simplified in the case k=q.
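Theorem 3. Let conditions (8) and (9) be satisfied for the model (1), let $k=q$, and let the matrix $V$ satisfy condition (12). Then:
(i)
$$\hat\Delta=\frac12\max_{1\le l\le q}R_{nl},\tag{15}$$
$$\hat\theta_i-\theta_i=\frac{\det V_Q(i)}{\det V},\qquad i=\overline{1,q},\tag{16}$$
where the matrix $V_Q(i)$ is obtained from $V$ by replacing the $i$th column by the column $(Q_{n1},\dots,Q_{nq})^T$.
(ii) If, in addition, conditions (A1), (A2) are satisfied, then
$$\lim_{n\to\infty}P\big(2b_n(\hat\Delta-a_n)<x\big)=\big(G\star G(x)\big)^q,\qquad G\star G(x)=\int_{-\infty}^{\infty}G(x-y)\,dG(y),\tag{17}$$
and, for $i=\overline{1,q}$, as $n\to\infty$,
$$2b_n(\hat\theta_i-\theta_i)\xrightarrow{D}\frac{\det V_\zeta(i)}{\det V},\tag{18}$$
where the matrix $V_\zeta(i)$ is obtained from $V$ by replacing the $i$th column by the column $(\zeta_1-\zeta_1',\dots,\zeta_q-\zeta_q')^T$, and all the r.v.-s $\zeta_i,\zeta_i'$ are independent with d.f. $G$.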
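Part (i) of Theorem 3 gives the MME in closed form, so no LPP needs to be solved when $k=q$. A minimal Python sketch follows, with a hypothetical function name. It works directly with the observed responses: on $I_l$ we have $y_j=v_l\cdot\theta+\epsilon_j$, so the group range of $y$ equals $R_{nl}$ and the group midrange of $y$ equals $v_l\cdot\theta+Q_{nl}$; solving $V\tau=(\text{group midranges})^T$ therefore returns $\hat\theta$ by (16). Gaussian elimination replaces Cramer's rule here; for nonsingular $V$ the two give the same solution.

```python
def mme_equal_groups(V, groups):
    """Closed-form MME of Theorem 3 (k = q), applied to observed responses.

    V: q x q list of lists; row l is the design point v_l (assumed nonsingular).
    groups: groups[l] lists the responses y_j with x_j = v_l (j in I_l).
    Returns (theta_hat, delta_hat) with delta_hat = max_l R_nl / 2 as in (15).
    """
    q = len(V)
    ranges = [max(g) - min(g) for g in groups]
    mids = [(max(g) + min(g)) / 2 for g in groups]
    # Gaussian elimination with partial pivoting on the augmented matrix [V | mids].
    m = [[float(v) for v in row] + [mid] for row, mid in zip(V, mids)]
    for col in range(q):
        piv = max(range(col, q), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, q):
            f = m[r][col] / m[col][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    # Back substitution.
    theta_hat = [0.0] * q
    for r in reversed(range(q)):
        tail = sum(m[r][c] * theta_hat[c] for c in range(r + 1, q))
        theta_hat[r] = (m[r][q] - tail) / m[r][r]
    return theta_hat, max(ranges) / 2


# Simple regression with design points x = 2 and x = 5 (rows of V are (1, x)).
theta_hat, delta_hat = mme_equal_groups([[1, 2], [1, 5]], [[-0.5, -1.5, -0.75], [-3, -5]])
print(theta_hat, delta_hat)
```

In this usage example the responses are $\theta=(1,-1)$ plus errors $(0.5,-0.5,0.25)$ and $(1,-1)$; both error groups have midrange $0$, so (16) gives $\hat\theta=\theta$, and (15) gives $\hat\Delta=\max(1,2)/2=1$.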
Remark 3. Suppose that in the model (1), under assumptions (8) and (9), $k<q$ and there exists a nondegenerate submatrix $\tilde V\subset V$ of order $k$. Then
$$\hat\Delta\le\frac12\max_{1\le l\le k}R_{nl}\quad\text{a.s.}$$
For the standard LSE,
$$\hat\theta_i-\theta_i=O_P(n^{-1/2});$$
therefore, if, under the conditions of Theorems 2 and 3,
$$n^{-1/2}b_n\to\infty\quad\text{as}\ n\to\infty,\tag{19}$$
then the MME is more efficient than the LSE.
In [6] (see also [9]), it is proved that if $F\in D(\Lambda)$, then $b_n=O(n^\delta)$ for any $\delta>0$. From this relation and Lemma 1 it follows that (19) is not satisfied for the domains of maximum attraction $D(\Phi_\alpha)$ and $D(\Lambda)$. In the case of the domain $D(\Psi_\alpha)$, condition (19) holds for $\alpha\in(0,2)$. For example, assume that the r.v.-s $(\epsilon_j)$ are symmetrically distributed on the interval $[-1,1]$ and
$$1-F(1-h)=h^\alpha L(h)\quad\text{as}\ h\downarrow0,\qquad \alpha\in(0,2),$$
where $L(h)$ is an s.v. function at zero. Then $b_n=n^{1/\alpha}L_1(n)$, where $L_1$ is an s.v. function at infinity, and, under the conditions of Theorems 2 and 3, as $n\to\infty$,
$$|\hat\theta_i-\theta_i|=O_P\big((n^{1/\alpha}L_1(n))^{-1}\big)=o_P(n^{-1/2}).$$
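As a worked instance of this claim (the choice $L\equiv\frac12$ is ours, made only for illustration), take $1-F(1-h)=\frac12h^\alpha$ for $h\in[0,1]$, extended symmetrically; this is a valid symmetric d.f. on $[-1,1]$, and $\alpha=1$ gives the uniform law. Solving $1-F(\gamma_n)=\frac1n$ for $\gamma_n$,
$$\tfrac12(1-\gamma_n)^\alpha=\tfrac1n\ \Longrightarrow\ 1-\gamma_n=\Big(\frac2n\Big)^{1/\alpha},\qquad b_n=(1-\gamma_n)^{-1}=\Big(\frac n2\Big)^{1/\alpha},$$
so that $L_1\equiv2^{-1/\alpha}$ and
$$n^{-1/2}b_n=2^{-1/\alpha}\,n^{1/\alpha-1/2}\to\infty\iff\alpha<2.$$
For $\alpha=1$ this gives $b_n=n/2$, in agreement with the uniform example below.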
The next example also appears to be interesting.
Example 3. Let $(\epsilon_j)$ be uniformly distributed on $[-1,1]$, that is, $F(x)=\frac{x+1}{2}$, $x\in[-1,1]$. It is well known that $F\in D(\Psi_1)$ with $a_n=1$, $b_n=\frac n2$. Then, under the conditions of Theorem 3, as $n\to\infty$,
$$P\big(n(1-\hat\Delta)<x\big)\to1-\big[P\{\zeta_1+\zeta_2>x\}\big]^q=1-(1+x)^q\exp(-qx),\qquad x>0,$$
where $\zeta_1,\zeta_2$ are i.i.d. r.v.-s with $P(\zeta_i<x)=1-\exp(-x)$, $x>0$.
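Since (15) expresses $\hat\Delta$ through the group ranges alone, the limit in Example 3 can be checked by simulation without specifying $V$. A minimal Python sketch, assuming $q$ groups of $n$ i.i.d. uniform errors each; the function names, $q=2$, $n=200$, the 2000 replications, and the seed are hypothetical choices.

```python
import math
import random


def limit_cdf(x, q):
    """Limit d.f. of n(1 - Delta_hat) in Example 3: 1 - (1 + x)^q exp(-q x) for x > 0."""
    return 1 - (1 + x) ** q * math.exp(-q * x) if x > 0 else 0.0


def scaled_deficit(n, q, rng):
    """One draw of n(1 - Delta_hat), with Delta_hat = max_l R_nl / 2 as in (15),
    for q groups of n i.i.d. Uniform[-1, 1] errors."""
    worst = 0.0
    for _ in range(q):
        g = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        worst = max(worst, (max(g) - min(g)) / 2)
    return n * (1 - worst)


rng = random.Random(7)
q, n = 2, 200
draws = [scaled_deficit(n, q, rng) for _ in range(2000)]
for x in (0.5, 1.0, 2.0, 3.0):
    empirical = sum(d < x for d in draws) / len(draws)
    print(f"x={x}: empirical {empirical:.3f}, limit {limit_cdf(x, q):.3f}")
```

Here $\hat\Delta$ approaches $1$ at rate $1/n$, far faster than the $n^{-1/2}$ rate typical of averages.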
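The following corollary is an immediate consequence of Theorem 3.
Corollary. If, for the simple linear regression (14), conditions (8) and (9) are satisfied, $k=q=2$, and
$$V=\begin{pmatrix}1&v_1\\1&v_2\end{pmatrix},\qquad v_1\ne v_2,$$
then, under assumptions (A1) and (A2), relation (17) holds for $q=2$, and, as $n\to\infty$,
$$2b_n(\hat\theta_1-\theta_1)\xrightarrow{D}\frac{\zeta_2-\zeta_2'-\zeta_1+\zeta_1'}{v_2-v_1},\qquad 2b_n(\hat\theta_0-\theta_0)\xrightarrow{D}\frac{(\zeta_1-\zeta_1')v_2-(\zeta_2-\zeta_2')v_1}{v_2-v_1},$$
where the r.v.-s $\zeta_1,\zeta_1',\zeta_2,\zeta_2'$ are independent with d.f. $G$.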
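Spelling out Cramér's rule (16) for this $2\times2$ matrix shows where these limits come from. With $\det V=v_2-v_1$,
$$\hat\theta_0-\theta_0=\frac{\det\begin{pmatrix}Q_{n1}&v_1\\Q_{n2}&v_2\end{pmatrix}}{v_2-v_1}=\frac{Q_{n1}v_2-Q_{n2}v_1}{v_2-v_1},\qquad \hat\theta_1-\theta_1=\frac{\det\begin{pmatrix}1&Q_{n1}\\1&Q_{n2}\end{pmatrix}}{v_2-v_1}=\frac{Q_{n2}-Q_{n1}}{v_2-v_1};$$
multiplying by $2b_n$ and using $2b_nQ_{nl}\xrightarrow{D}\zeta_l-\zeta_l'$, independently for $l=1,2$ (relation (31) below, applied to each group), yields the two limits stated in the corollary.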
The conditions of Theorem 3 do not require (13). Hence, Theorem 3 describes the asymptotic distribution of $\hat\theta$ even when the MME is not consistent.
Proofs of the main results
Let us start with the following elementary lemma, in which $Z_n(t)$, $W_n(t)$, $R_n(t)$, and $Q_n(t)$ denote, respectively, the maximum, minimum, range, and midrange of a sequence $t=\{t_1,\dots,t_n\}$.
Lemma 2. Let $t_1,\dots,t_n$ be any real numbers, and
$$\alpha_n=\min_{s\in\mathbb R}\max_{1\le j\le n}|t_j-s|.\tag{20}$$
Then $\alpha_n=R_n(t)/2$; moreover, the minimum in (20) is attained at the point $s=Q_n(t)$.
Proof. Choose $s=Q_n(t)$. Then
$$\max_{1\le i\le n}|t_i-s|=Z_n(t)-Q_n(t)=Q_n(t)-W_n(t)=\tfrac12R_n(t).$$
If $s=Q_n(t)+\delta$, then, for $\delta>0$,
$$\max_{1\le i\le n}|t_i-s|=s-W_n(t)=\tfrac12R_n(t)+\delta,$$
and, for $\delta<0$,
$$\max_{1\le i\le n}|t_i-s|=Z_n(t)-s=\tfrac12R_n(t)-\delta,$$
that is, $s=Q_n(t)$ is the point of minimum. □
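Lemma 2 is easy to confirm numerically: the objective in (20) equals $\frac12R_n(t)+|s-Q_n(t)|$, a V-shaped function of $s$. A minimal Python sketch comparing the lemma's answer with a brute-force grid scan; the sample values, the grid, and the function names are illustrative assumptions.

```python
def max_abs_dev(t, s):
    """max_j |t_j - s|, the objective minimized in (20)."""
    return max(abs(tj - s) for tj in t)


def lemma2_minimizer(t):
    """Return (alpha_n, s_star) = (R_n(t) / 2, Q_n(t)) as predicted by Lemma 2."""
    z, w = max(t), min(t)
    return (z - w) / 2, (z + w) / 2


t = [0.3, -1.7, 2.4, 0.9, -0.2]
alpha_n, s_star = lemma2_minimizer(t)

# Brute-force check: scan candidate centres s on a grid of step 0.001 over [-3, 3].
grid = [-3 + k / 1000 for k in range(6001)]
s_grid = min(grid, key=lambda s: max_abs_dev(t, s))
print(alpha_n, s_star, s_grid)
```

The grid minimizer coincides with the midrange up to the grid step, and no grid point does better than $R_n(t)/2$.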
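Proof of Statement 1. (i) We use Lemma 2. Since $x_{j1}=1$,
$$\hat\Delta=\min_{\tau\in\mathbb R^q}\max_{1\le j\le N}\Big|\epsilon_j-\sum_{i=1}^q(\tau_i-\theta_i)x_{ji}\Big|\le\min_{\tau_1\in\mathbb R}\max_{1\le j\le N}\big|\epsilon_j-(\tau_1-\theta_1)\big|=\frac12R_N$$
(we put $\tau_i=\theta_i$, $i\ge2$). (ii) This point of Statement 1 follows directly from Lemma 2. □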
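Proof of Theorem 1. Using the notation
$$d=(d_1,\dots,d_q),\qquad d_i=\tau_i-\theta_i,\quad i=\overline{1,q},$$
and taking into account Eq. (1) and conditions (8) and (9), we rewrite LPP (5) in the following form:
$$\hat\Delta=\min_{(d,\Delta)\in D_1}\Delta,\tag{21}$$
$$D_1=\Big\{(d,\Delta)\in\mathbb R^q\times\mathbb R_+:\ \sum_{i=1}^q d_ix_{ji}+\Delta\ge\epsilon_j,\ -\sum_{i=1}^q d_ix_{ji}+\Delta\ge-\epsilon_j,\ j=\overline{1,N}\Big\}=\Big\{(d,\Delta)\in\mathbb R^q\times\mathbb R_+:\ \sum_{i=1}^q d_iv_{li}+\Delta\ge Z_{nl},\ -\sum_{i=1}^q d_iv_{li}+\Delta\ge-W_{nl},\ l=\overline{1,k}\Big\}.$$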
The LPP dual to (21) has the form
$$\max_{u\in D^*}L_n^*(u),\tag{22}$$
where $L_n^*(u)=\sum_{l=1}^k(u_lZ_{nl}-u_l'W_{nl})$, and the domain $D^*$ is given by (11). According to the basic duality theorem ([11], Chap. 4),
$$\hat\Delta=\max_{u\in D^*}L_n^*(u).$$
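For the location model (4) (so $k=q=1$ and $v_{11}=1$), this duality can be checked by hand: conditions (11) reduce to $u_1-u_1'=0$, $u_1+u_1'=1$, $u_1,u_1'\ge0$, so
$$D^*=\{(\tfrac12,\tfrac12)\},\qquad \max_{u\in D^*}L_n^*(u)=\tfrac12(Z_{n1}-W_{n1})=\tfrac12R_{n1},$$
which is exactly the value $\hat\Delta=\frac12R_N$ given by Statement 1(ii) (here $N=n$).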
Hence, we obtain
$$b_n(\hat\Delta-a_n)=\max_{u\in D^*}b_n\big(L_n^*(u)-a_n\big)=\max_{u\in D^*}g_n(u),\qquad g_n(u)=\sum_{l=1}^k\big[u_lb_n(Z_{nl}-a_n)+u_l'b_n(-W_{nl}-a_n)\big].$$
Denote by $\Gamma^*$ the set of vertices of the domain $D^*$, and let
$$g_0(u)=\sum_{l=1}^k(u_l\zeta_l+u_l'\zeta_l').$$
Since the maximum in LPP (22) is attained at one of the vertices in $\Gamma^*$,
$$\max_{u\in D^*}g_n(u)=\max_{u\in\Gamma^*}g_n(u),\qquad n\ge1.$$
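Obviously, $\operatorname{card}(\Gamma^*)<\infty$. Thus, to prove (10), it suffices to prove that, as $n\to\infty$,
$$\max_{u\in\Gamma^*}g_n(u)\xrightarrow{D}\max_{u\in\Gamma^*}g_0(u),$$
or, which is sufficient by the continuous mapping theorem,
$$(g_n(u),\,u\in\Gamma^*)\xrightarrow{D}(g_0(u),\,u\in\Gamma^*).\tag{23}$$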
The Cramér–Wold argument (see, e.g., §7 of [1]) reduces (23) to the following relation: for any $t_m\in\mathbb R$, as $n\to\infty$,
$$\sum_{u^{(m)}\in\Gamma^*}t_m\,g_n(u^{(m)})\xrightarrow{D}\sum_{u^{(m)}\in\Gamma^*}t_m\,g_0(u^{(m)}).$$
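The last convergence holds if, for any $c_l,c_l'\in\mathbb R$, as $n\to\infty$,
$$\sum_{l=1}^k\big[c_lb_n(Z_{nl}-a_n)+c_l'b_n(-W_{nl}-a_n)\big]\xrightarrow{D}\sum_{l=1}^k(c_l\zeta_l+c_l'\zeta_l').\tag{24}$$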
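Under the conditions of Theorem 1 (by the symmetry (A1), $-W_{nl}$ has the same distribution as $Z_{nl}$),
$$\zeta_{nl}=b_n(Z_{nl}-a_n)\xrightarrow{D}\zeta_l,\qquad \zeta_{nl}'=b_n(-W_{nl}-a_n)\xrightarrow{D}\zeta_l',\qquad l=\overline{1,k}.\tag{25}$$
The vectors $(Z_{nl},W_{nl})$, $l=\overline{1,k}$, are independent; on the other hand, $Z_{nl}$ and $W_{nl}$ are asymptotically independent as $n\to\infty$ ([8], p. 28). To obtain (24), it remains to apply the Cramér–Wold argument once more. □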
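Proof of Theorem 2. Let $(\hat d,\hat\Delta)$, $\hat d=(\hat d_1,\dots,\hat d_q)$, be the solution of LPP (21), and let $\gamma_l=\sum_{i=1}^q\hat d_iv_{li}$. Then, for any $l=\overline{1,k}$,
$$\gamma_l+\hat\Delta\ge Z_{nl},\qquad -\gamma_l+\hat\Delta\ge-W_{nl}.\tag{26}$$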
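Rewrite the asymptotic relations (25) and (10) in the form
$$Z_{nl}=a_n+\frac{\zeta_{nl}}{b_n},\qquad -W_{nl}=a_n+\frac{\zeta_{nl}'}{b_n},\qquad \zeta_{nl}\xrightarrow{D}\zeta_l,\quad \zeta_{nl}'\xrightarrow{D}\zeta_l',\tag{27}$$
and
$$\hat\Delta=a_n+\frac{\Delta_n}{b_n},\qquad \Delta_n\xrightarrow{D}\Delta_0\quad\text{as}\ n\to\infty.\tag{28}$$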
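Combining (26)–(28), we obtain, for $l=\overline{1,k}$,
$$\gamma_l\ge Z_{nl}-\hat\Delta=\frac{\zeta_{nl}-\Delta_n}{b_n}=O_P(b_n^{-1}),\qquad \gamma_l\le W_{nl}+\hat\Delta=\frac{-\zeta_{nl}'+\Delta_n}{b_n}=O_P(b_n^{-1}),$$
since sequences converging in distribution are bounded in probability.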
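Choose $l_1,\dots,l_q$ satisfying (12). Then
$$\sum_{i=1}^q\hat d_iv_{l_ji}=\gamma_{l_j}=O_P(b_n^{-1}),\qquad j=\overline{1,q},$$
and, by Cramér's rule,
$$\hat\theta_i-\theta_i=\hat d_i=\frac{\det\tilde V_\gamma(i)}{\det\tilde V}=O_P(b_n^{-1}),$$
where the matrix $\tilde V_\gamma(i)$ is obtained from $\tilde V$ by replacing the $i$th column by the column $(\gamma_{l_1},\dots,\gamma_{l_q})^T$. □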
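Proof of Theorem 3. (i) We have
$$\hat\Delta=\min_{\tau\in\mathbb R^q}\max_{1\le l\le q}\max_{j\in I_l}\Big|y_j-\sum_{i=1}^q\tau_iv_{li}\Big|=\min_{d\in\mathbb R^q}\max_{1\le l\le q}\max_{j\in I_l}\Big|\epsilon_j-\sum_{i=1}^q d_iv_{li}\Big|.\tag{29}$$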
By Lemma 2,
$$\min_{s\in\mathbb R}\max_{j\in I_l}|\epsilon_j-s|=\tfrac12R_{nl},\quad\text{attained at}\ s=Q_{nl},\qquad l=\overline{1,q}.$$
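Since $V$ is nonsingular, the map $d\mapsto\big(\sum_{i=1}^q d_iv_{li}\big)_{l=1}^q$ is a bijection of $\mathbb R^q$, so the shifts in the $q$ groups can be chosen independently. Therefore, the minimum over $d$ in (29) is attained at the point $\hat d$ that solves the system of linear equations
$$\sum_{i=1}^q d_iv_{li}=Q_{nl},\qquad l=\overline{1,q}.$$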
Since the matrix $V$ is nonsingular, by Cramér's rule,
$$\hat d_i=\hat\theta_i-\theta_i=\frac{\det V_Q(i)}{\det V},\qquad i=\overline{1,q}.$$
Obviously, for this choice of $\hat d$, we get $\hat\Delta=\frac12\max_{1\le l\le q}R_{nl}$; that is, we have obtained formulas (15) and (16).
(ii) Using the asymptotic independence of the r.v.-s $Z_n$ and $W_n$, we derive the following statement.
Lemma 3. If the r.v.-s $(\epsilon_j)$ satisfy conditions (A1), (A2), then, as $n\to\infty$,
$$b_n(R_n-2a_n)\xrightarrow{D}\zeta+\zeta',\tag{30}$$
$$2b_nQ_n\xrightarrow{D}\zeta-\zeta',\tag{31}$$
where $\zeta$ and $\zeta'$ are independent r.v.-s with d.f. $G$.
In fact, this lemma is contained in Theorem 2.9.2 of [4] (see also Theorem 2.10 in [9]).
Equality (17) of Theorem 3 follows immediately from formula (15) and relation (30) of Lemma 3, since the groups $I_1,\dots,I_q$ are independent. Similarly, from the asymptotic relation (31) and Eq. (16) we obtain (18) by applying the Cramér–Wold argument once more. □
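Remark 3 follows directly from Theorem 3. Indeed, let $k<q$, and let there exist a nonsingular submatrix $\tilde V\subset V$,
$$\tilde V=\begin{pmatrix}v_{1i_1}&\dots&v_{1i_k}\\ \vdots&\ddots&\vdots\\ v_{ki_1}&\dots&v_{ki_k}\end{pmatrix}.$$
Setting $d_i=0$ in LPP (21) for all $i\ne i_1,i_2,\dots,i_k$ (i.e., taking $\tau_i=\theta_i$ for such indices $i$) can only increase the minimum, and we pass to a problem of the form (29) with the nonsingular $k\times k$ matrix $\tilde V$. It remains to apply Eq. (15) of Theorem 3.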
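Using the notation $\bar\zeta-\bar\zeta'=(\zeta_1-\zeta_1',\dots,\zeta_q-\zeta_q')^T$, the coordinatewise relation (18) of Theorem 3 can be rewritten in the equivalent vector form
$$2b_n(\hat\theta-\theta)\xrightarrow{D}V^{-1}(\bar\zeta-\bar\zeta')\quad\text{as}\ n\to\infty.\tag{32}$$
If the variance $\operatorname{Var}\zeta=\sigma_G^2$ of an r.v. $\zeta$ with d.f. $G$ exists, then the covariance matrix of the limiting distribution in (32) is $C_G=2\sigma_G^2(V^TV)^{-1}$.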
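The covariance formula follows in one line: the coordinates of $\bar\zeta-\bar\zeta'$ are independent with variance $\operatorname{Var}(\zeta_l-\zeta_l')=2\sigma_G^2$, so
$$\operatorname{Cov}\big(V^{-1}(\bar\zeta-\bar\zeta')\big)=V^{-1}\,(2\sigma_G^2I_q)\,(V^{-1})^T=2\sigma_G^2\,(V^TV)^{-1}.$$
For instance, in the uniform case $G=\Psi_1$, the r.v. $-\zeta$ is exponential with unit mean, so $\sigma_G^2=1$ and $C_G=2(V^TV)^{-1}$.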
References
[1] Billingsley, P.: Wiley, New York (1968). MR0233396
[2] Ermoliev, Y.M., et al.: Vyshcha Shkola, Kyiv (1979)
[3] Fisher, R.A., Tippett, L.H.C.: Limiting forms of the frequency distribution of the largest or smallest member of a sample. 2, 180–190 (1928)
[4] Galambos, J.: Wiley, New York (1978). MR0489334
[5] Gnedenko, B.: Sur la distribution limite du terme maximum d'une série aléatoire. 44, 423–453 (1943). MR0008655
[6] Ivanov, A.V., Matsak, I.K.: Limit theorems for extreme residuals in linear and nonlinear regression models. 86, 79–91 (2013). MR2986451. doi:10.1090/S0094-9000-2013-00890-4
[7] Ivanov, O.V., Matsak, I.K.: Limit theorems for extreme residuals in regression models with heavy tails of observation errors. 88, 99–108 (2014). MR3112637. doi:10.1090/S0094-9000-2014-00921-7
[8] Leadbetter, M.R., Lindgren, G., Rootzén, H.: Springer (1983). MR0691492
[9] Matsak, I.K.: Comprint, Kyiv (2014)
[10] Fréchet, M.: Sur la loi de probabilité de l'écart maximum. 6, 93–116 (1927)
[11] Murtagh, B.A.: McGraw-Hill, New York (1981). MR0609151
[12] Zaychenko, Y.P.: Vyshcha Shkola, Kyiv (1988)