Modern Stochastics: Theory and Applications (VMSTA), 2(3), PRESTO-2015, 297–308 (2015)
ISSN 2351-6054, 2351-6046. Publisher: VTeX, Mokslininkų g. 2A, 08412 Vilnius, Lithuania
DOI: 10.15559/15-VMSTA40CNF
Dates: 26.08.2015, 22.09.2015, 22.09.2015; 05.10.2015
Research Article

Extreme residuals in regression model. Minimax approach

Aleksander Ivanov (alexntuu@gmail.com) a, Ivan Matsak (ivanmatsak@univ.kiev.ua) b, Sergiy Polotskiy (sergiy.polotskiy@gmail.com) b

a National Technical University of Ukraine “Kyiv Polytechnic Institute”, 37 Peremogy Ave., Kyiv, 03056, Ukraine
b Taras Shevchenko National University of Kyiv, 4e Academician Glushkov Ave., Kyiv, 03127, Ukraine

Corresponding author.

© 2015 The Author(s). Published by VTeX. Open access article under the CC BY license.

We obtain limit theorems for extreme residuals in a linear regression model in the case of minimax estimation of the parameters.

Keywords: linear regression, minimax estimator, maximal residual
MSC: 60G70, 62J05

Introduction

Consider the model of linear regression
\[ y_j=\sum_{i=1}^{q}\theta_i x_{ji}+\epsilon_j,\quad j=\overline{1,N}, \tag{1} \]
where $\theta=(\theta_1,\theta_2,\dots,\theta_q)$ is an unknown parameter, $\epsilon_j$ are independent identically distributed (i.i.d.) random variables (r.v.-s) with distribution function (d.f.) $F(x)$, and $X=(x_{ji})$ is a regression design matrix.

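Let $\hat\theta=(\hat\theta_1,\dots,\hat\theta_q)$ be the least squares estimator (LSE) of $\theta$. Introduce the notation
\[ \hat y_j=\sum_{i=1}^{q}\hat\theta_i x_{ji},\quad \hat\epsilon_j=y_j-\hat y_j,\quad j=\overline{1,N}; \]
\[ Z_N=\max_{1\le j\le N}\epsilon_j,\quad \hat Z_N=\max_{1\le j\le N}\hat\epsilon_j,\quad Z_N^{*}=\max_{1\le j\le N}|\epsilon_j|,\quad \hat Z_N^{*}=\max_{1\le j\le N}|\hat\epsilon_j|. \]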

Asymptotic behavior of the r.v.-s $Z_N$, $Z_N^{*}$ is studied in the theory of extreme values (see classical works by Frechet [10], Fisher and Tippett [3], and Gnedenko [5] and monographs [ ]). In the papers [6, 7], it was shown that under mild assumptions asymptotic properties of the r.v.-s $Z_N$, $\hat Z_N$, $Z_N^{*}$, and $\hat Z_N^{*}$ are similar in the cases of both finite variance and heavy tails of observation errors $\epsilon_j$.

In the present paper, we study asymptotic properties of the minimax estimator (MME) of $\theta$ and of the maximal absolute residual. For the MME, we keep the same notation $\hat\theta$.

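Definition 1. A random variable $\hat\theta=(\hat\theta_1,\dots,\hat\theta_q)$ is called the MME for $\theta$ by the observations (1) if
\[ \hat\Delta=\Delta(\hat\theta)=\min_{\tau\in\mathbb{R}^q}\Delta(\tau), \tag{2} \]
where
\[ \Delta(\tau)=\max_{1\le j\le N}\Bigl|y_j-\sum_{i=1}^{q}\tau_i x_{ji}\Bigr|. \]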

Denote $W_N=\min_{1\le j\le N}\epsilon_j$ and let $R_N=Z_N-W_N$ and $Q_N=\frac{Z_N+W_N}{2}$ be the range and midrange of the sequence $\epsilon_j$, $j=\overline{1,N}$.

The following statement shows an essential difference between the behavior of the MME and that of the LSE.

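Statement 1. (i) If the model (1) contains a constant term, namely, $x_{j1}=1$, $j=\overline{1,N}$, then almost surely (a.s.)
\[ \hat\Delta\le\frac{R_N}{2}. \tag{3} \]

(ii) If the model (1) has the form
\[ y_j=\theta+\epsilon_j,\quad j=\overline{1,N}, \tag{4} \]
then a.s.
\[ \hat\Delta=\frac{R_N}{2},\qquad \hat\theta-\theta=Q_N. \]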
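Point (ii) can be checked numerically on a toy sample. The following is a minimal sketch, not part of the original paper; the helper names `midrange_fit` and `max_abs_dev` and the sample values are hypothetical. It computes the midrange and half-range of the observations in the location model (4) and confirms that moving the estimate away from the midrange only increases the maximal absolute residual.

```python
def midrange_fit(ys):
    """Return (theta_hat, delta_hat) for the location model y_j = theta + eps_j.

    theta_hat is the midrange of the observations and delta_hat is half
    of their range, as in point (ii) of Statement 1.
    """
    z, w = max(ys), min(ys)
    return (z + w) / 2, (z - w) / 2


def max_abs_dev(ys, s):
    """Maximal absolute residual Delta(s) = max_j |y_j - s|."""
    return max(abs(y - s) for y in ys)


# Hypothetical toy sample (illustration only).
ys = [0.5, -1.0, 2.0, 1.5]
theta_hat, delta_hat = midrange_fit(ys)
print(theta_hat, delta_hat, max_abs_dev(ys, theta_hat))
```

Since the midrange of the $y_j$ equals $\theta+Q_N$, the estimation error of the MME in this model is exactly the midrange of the errors, whatever the sample size.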

From point (ii) of Statement 1 it follows that the MME $\hat\theta$ is not consistent in the model (4) for some $\epsilon_j$ having all the moments (see Example 2).

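The value $\hat\Delta$ can be represented as a solution of the following linear programming problem (LPP):
\[ \hat\Delta=\min_{(\tau,\Delta)\in\mathcal{D}}\Delta, \tag{5} \]
\[ \mathcal{D}=\Bigl\{(\tau,\Delta)\in\mathbb{R}^q\times\mathbb{R}_+:\ \Bigl|y_j-\sum_{i=1}^{q}\tau_i x_{ji}\Bigr|\le\Delta,\ j=\overline{1,N}\Bigr\} \]
\[ \phantom{\mathcal{D}}=\Bigl\{(\tau,\Delta)\in\mathbb{R}^q\times\mathbb{R}_+:\ \sum_{i=1}^{q}\tau_i x_{ji}+\Delta\ge y_j,\ -\sum_{i=1}^{q}\tau_i x_{ji}+\Delta\ge -y_j,\ j=\overline{1,N}\Bigr\}. \]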

So, the problem (2) of determining the values $\hat\Delta$ and $\hat\theta$ is reduced to solving LPP (5). The LPP can be efficiently solved numerically by the simplex method (see [ ]). Investigation of the asymptotic properties of the maximal absolute residual $\hat\Delta$ and the MME $\hat\theta$ is quite difficult in the case of the general model (1). However, under additional assumptions on the regression experiment design and the observation errors $\epsilon_j$, it is possible to find the limiting distribution of $\hat\Delta$, to prove the consistency of the MME $\hat\theta$, and even to estimate the rate of convergence $\hat\theta\to\theta$, $N\to\infty$.
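To make the structure of LPP (5) concrete, here is a minimal sketch for $q=2$. It is an illustration only: instead of the simplex method mentioned in the text, it swaps in brute-force enumeration of the vertices of the feasible set, which is practical only for very small $N$. The function names `det3`, `solve3`, and `minimax_fit_q2` and the toy data are hypothetical.

```python
from itertools import combinations


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule; None if (near-)singular."""
    det = det3(m)
    if abs(det) < 1e-12:
        return None
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        sol.append(det3(mc) / det)
    return sol


def minimax_fit_q2(xs, ys, tol=1e-9):
    """Minimize Delta subject to |y_j - t0*x_j0 - t1*x_j1| <= Delta.

    Each observation gives two half-spaces s*(x_j . tau) + Delta >= s*y_j,
    s = +1, -1.  Every triple of constraints is made active, the resulting
    3x3 system is solved, and the feasible solution with the smallest Delta
    is kept.  Returns (t0, t1, Delta), or None if no vertex is found.
    """
    cons = [([s * x[0], s * x[1], 1.0], s * y)
            for x, y in zip(xs, ys) for s in (1.0, -1.0)]
    best = None
    for trio in combinations(cons, 3):
        sol = solve3([list(c[0]) for c in trio], [c[1] for c in trio])
        if sol is None:
            continue
        t0, t1, delta = sol
        feasible = all(a[0] * t0 + a[1] * t1 + a[2] * delta >= b - tol
                       for a, b in cons)
        if feasible and (best is None or delta < best[2]):
            best = (t0, t1, delta)
    return best


# Hypothetical toy data: rows x_j = (1, v_j) with v_j in {0, 1}.
xs = [(1.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 1.0)]
ys = [1.0, 3.0, 2.0, 6.0]
print(minimax_fit_q2(xs, ys))
```

The minimizer $\tau$ need not be unique (here only $\tau_0+\tau_1$ is determined by the binding group), but the optimal value $\hat\Delta$ is. For realistic sample sizes a proper LP solver should be used instead of enumeration.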

The main theorems

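First, we recall briefly some results of extreme value theory. Let r.v.-s $(\epsilon_j)$ have the d.f. $F(x)$. Assume that for some constants $b_n>0$ and $a_n$, as $n\to\infty$,
\[ b_n(Z_n-a_n)\xrightarrow{D}\zeta, \tag{6} \]
and $\zeta$ has a nondegenerate d.f. $G(x)=P(\zeta<x)$. If assumption (6) holds, then we say that the d.f. $F$ belongs to the domain of maximum attraction of the probability distribution $G$ and write $F\in D(G)$. If $F\in D(G)$, then $G$ must be of exactly one of the following three types:
\[ \begin{aligned}
\text{Type I:}\quad & \Phi_\alpha(x)=\begin{cases}0, & x\le 0,\\ \exp(-x^{-\alpha}), & \alpha>0,\ x>0;\end{cases}\\
\text{Type II:}\quad & \Psi_\alpha(x)=\begin{cases}\exp(-(-x)^{\alpha}), & \alpha>0,\ x\le 0,\\ 1, & x>0;\end{cases}\\
\text{Type III:}\quad & \Lambda(x)=\exp(-e^{-x}),\quad -\infty<x<\infty.
\end{aligned} \tag{7} \]
Necessary and sufficient conditions for convergence to each of the d.f.-s $\Phi_\alpha$, $\Psi_\alpha$, $\Lambda$ are also well known.

Suppose in the model (1) that:

(A1) $(\epsilon_j)$ are symmetric r.v.-s;

(A2) $(\epsilon_j)$ satisfy relation (6), that is, $F\in D(G)$ with normalizing constants $a_n$ and $b_n$, where $G$ is one of the d.f.-s $\Phi_\alpha$, $\Psi_\alpha$, $\Lambda$ defined in (7).

Assume further that the regression experiment design is organized as follows:
\[ x_j=(x_{j1},\dots,x_{jq})\in\{v_1,v_2,\dots,v_k\},\quad v_l=(v_{l1},\dots,v_{lq})\in\mathbb{R}^q,\quad v_m\ne v_l,\ m\ne l; \tag{8} \]
that is, $x_j$ take some fixed values only. Besides, suppose that
\[ x_j=v_l\quad\text{for } j\in I_l,\ l=\overline{1,k}, \tag{9} \]
$\operatorname{card}(I_l)=n$, $I_m\cap I_l=\emptyset$, $m\ne l$, $N=kn$ is the sample size, and
\[ V=\begin{pmatrix} v_{11} & v_{12} & \dots & v_{1q}\\ v_{21} & v_{22} & \dots & v_{2q}\\ \vdots & \vdots & & \vdots\\ v_{k1} & v_{k2} & \dots & v_{kq}\end{pmatrix}. \]

Theorem 1. Under assumptions (A1), (A2), (8), and (9),
\[ \Delta_n=b_n(\hat\Delta-a_n)\xrightarrow{D}\Delta_0,\quad n\to\infty, \tag{10} \]
where
\[ \Delta_0=\max_{u\in\mathcal{D}^{*}}L_0(u),\quad L_0(u)=\sum_{l=1}^{k}(u_l\zeta_l+u_l'\zeta_l'),\quad u=(u_1,\dots,u_k,u_1',\dots,u_k'), \]
\[ \mathcal{D}^{*}=\Bigl\{u\ge 0:\ \sum_{l=1}^{k}(u_l-u_l')v_{li}=0,\ i=\overline{1,q},\ \sum_{l=1}^{k}(u_l+u_l')=1\Bigr\}, \tag{11} \]
and $\zeta_l$, $\zeta_l'$, $l=\overline{1,k}$, are independent r.v.-s having d.f. $G(x)$.

For a number sequence $b_n$ and a random sequence $(\xi_n)$, we will write $\xi_n=O_P(b_n^{-1})$ if
\[ \sup_n P\bigl(b_n|\xi_n|>C\bigr)\to 0\quad\text{as } C\to\infty. \]

Assume that $k\ge q$ and there exists a square submatrix $\tilde V\subset V$ of order $q$,
\[ \tilde V=\begin{pmatrix} v_{l_1 1} & \dots & v_{l_1 q}\\ \vdots & & \vdots\\ v_{l_q 1} & \dots & v_{l_q q}\end{pmatrix}, \]
such that
\[ \det\tilde V\ne 0. \tag{12} \]

Theorem 2. Assume that, under the conditions of Theorem 1, $k\ge q$, assumption (12) holds, and
\[ b_n\to\infty\quad\text{as } n\to\infty. \tag{13} \]
Then the MME $\hat\theta$ is consistent, and
\[ \hat\theta_i-\theta_i=O_P(b_n^{-1}),\quad i=\overline{1,q}. \]

Example. Let, in the model of simple linear regression
\[ y_j=\theta_0+\theta_1x_j+\epsilon_j,\quad j=\overline{1,N}, \tag{14} \]
$x_j=v$, $j=\overline{1,N}$, that is, $k=1$ and $q=2$. Then such a model can be rewritten in the form (4) with $\theta=\theta_0+\theta_1v$. Clearly, the parameters $\theta_0$, $\theta_1$ cannot be defined unambiguously here. So, it does not make sense to speak about the consistency of the MME $\hat\theta$ when $k<q$.

Example 2. Consider the regression model (4) with errors $\epsilon_j$ having the Laplace density $f(x)=\frac12e^{-|x|}$. For this distribution, the famous von Mises condition is satisfied ([ ], p. 16) for the type III distribution, that is, $F\in D(\Lambda)$. For symmetric $F\in D(\Lambda)$, we have
\[ \lim_{n\to\infty}P\{2b_nQ_n<x\}=\frac{1}{1+e^{-x}}. \]
The limiting distribution is a logistic one (see [ ], p. 62). Using further the well-known formulas for the type $\Lambda$ ([ ], p. 49), $a_n=F^{-1}\bigl(1-\frac1n\bigr)$ and $b_n=nf(a_n)$, we find $a_n=\ln\frac n2$ and $b_n=1$. From Statement 1 it follows now that the MME $\hat\theta$ is not consistent. Thus, condition (13) of Theorem 2 cannot be weakened.

The following lemma allows us to check condition (13).

Lemma 1. Let $F\in D(G)$. Then we have:

(i) If $G=\Phi_\alpha$, then
\[ x_F=\sup\{x:F(x)<1\}=\infty,\quad \gamma_n=F^{-1}\Bigl(1-\frac1n\Bigr),\quad b_n=\gamma_n^{-1}\to 0\quad\text{as } n\to\infty. \]
Thus, (13) does not hold.

(ii) If $G=\Psi_\alpha$, then
\[ x_F<\infty,\quad 1-F(x_F-x)=x^{\alpha}L(x), \]
where $L(x)$ is a slowly varying (s.v.) function at zero, and there exists an s.v. at infinity function $L_1(x)$ such that
\[ b_n=(x_F-\gamma_n)^{-1}=n^{1/\alpha}L_1(n)\to\infty\quad\text{as } n\to\infty. \]
So (13) is true.

(iii) If $G=\Lambda$, then
\[ b_n=r(\gamma_n),\quad\text{where } r(x)=R'(x),\ R(x)=-\ln(1-F(x)). \]
Clearly, (13) holds if $x_F=\infty$ and $r(x)\to\infty$ as $x\to\infty$.

Similar results can be found in [ ], Corollary 2.7, pp. 44–45; see also [ ].

Set
\[ Z_{nl}=\max_{j\in I_l}\epsilon_j,\quad W_{nl}=\min_{j\in I_l}\epsilon_j,\quad R_{nl}=Z_{nl}-W_{nl},\quad Q_{nl}=\frac{Z_{nl}+W_{nl}}{2},\quad l=\overline{1,k}. \]
It turns out that Theorems 1 and 2 can be significantly simplified in the case $k=q$.

Theorem 3. Let, for the model (1), conditions (8) and (9) be satisfied, $k=q$, and let the matrix $V$ satisfy condition (12). Then we have:

(i)
\[ \hat\Delta=\frac12\max_{1\le l\le q}R_{nl}, \tag{15} \]
\[ \hat\theta_i-\theta_i=\frac{\det V_Q(i)}{\det V},\quad i=\overline{1,q}, \tag{16} \]
where the matrix $V_Q(i)$ is obtained from $V$ by replacement of the $i$th column by the column $(Q_{n1},\dots,Q_{nq})^T$.

(ii) If additionally conditions (A1), (A2) are satisfied, then
\[ \lim_{n\to\infty}P\bigl(2b_n(\hat\Delta-a_n)<x\bigr)=\bigl(G\star G(x)\bigr)^q, \tag{17} \]
where $G\star G(x)=\int_{-\infty}^{\infty}G(x-y)\,dG(y)$, and for $i=\overline{1,q}$, as $n\to\infty$,
\[ 2b_n(\hat\theta_i-\theta_i)\xrightarrow{D}\frac{\det V_\zeta(i)}{\det V}, \tag{18} \]
where the matrix $V_\zeta(i)$ is obtained from $V$ by replacement of the $i$th column by the column $(\zeta_1-\zeta_1',\dots,\zeta_q-\zeta_q')^T$, and all the r.v.-s $\zeta_i$, $\zeta_i'$ are independent and have d.f. $G$.

Remark 3. Suppose that in the model (1), under assumptions (8), (9), $k<q$, and there exists a nondegenerate submatrix $\tilde V\subset V$ of order $k$. Then
\[ \hat\Delta\le\frac12\max_{1\le l\le k}R_{nl}\quad\text{a.s.} \]

Remark. For the standard LSE, $\hat\theta_i-\theta_i=O_P(n^{-1/2})$; therefore, if, under the conditions of Theorems 2 and 3,
\[ n^{-1/2}b_n\to\infty\quad\text{as } n\to\infty, \tag{19} \]
then the MME is more efficient than the LSE. In [ ] (see also [ ]), it is proved that if $F\in D(\Lambda)$, then for any $\delta>0$, $b_n=O(n^{\delta})$. From this relation and Lemma 1 it follows that (19) is not satisfied for the domains of maximum attraction $D(\Phi_\alpha)$ and $D(\Lambda)$. In the case of the domain $D(\Psi_\alpha)$, condition (19) holds for $\alpha\in(0,2)$. For example, assume that the r.v.-s $(\epsilon_j)$ are symmetrically distributed on the interval $[-1,1]$ and
\[ 1-F(1-h)=h^{\alpha}L(h)\quad\text{as } h\downarrow 0,\ \alpha\in(0,2), \]
where $L(h)$ is an s.v. function at zero. Then $b_n=n^{1/\alpha}L_1(n)$, where $L_1$ is an s.v. at infinity function, and, under the conditions of Theorems 2 and 3, as $n\to\infty$,
\[ |\hat\theta_i-\theta_i|=O_P\bigl((n^{1/\alpha}L_1(n))^{-1}\bigr)=o_P(n^{-1/2}). \]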
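The efficiency comparison between the MME and the LSE can be illustrated by a small Monte Carlo experiment. The sketch below is not from the original paper. It assumes the location model (4) with errors uniform on $[-1,1]$, which is the case $\alpha=1$, so that the MME error is the midrange of the errors and the LSE error is their sample mean. The function name `mean_abs_errors`, the sample size, the number of replications, and the seed are arbitrary illustrative choices.

```python
import random


def mean_abs_errors(n, reps, seed=0):
    """Monte Carlo estimate of E|theta_hat - theta| for MME and LSE.

    Assumes y_j = theta + eps_j with eps_j uniform on [-1, 1]; the error
    of each estimator then depends on the eps_j only, so theta is not needed.
    Returns (mean |midrange|, mean |sample mean|) over `reps` replications.
    """
    rng = random.Random(seed)
    mme_err = 0.0
    lse_err = 0.0
    for _ in range(reps):
        eps = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        mme_err += abs((max(eps) + min(eps)) / 2)  # MME error: midrange Q_N
        lse_err += abs(sum(eps) / n)               # LSE error: sample mean
    return mme_err / reps, lse_err / reps


mme, lse = mean_abs_errors(n=2000, reps=200)
print("MME:", mme, "LSE:", lse)
```

With these settings the midrange error should come out an order of magnitude smaller than the sample-mean error, in line with the rates $b_n^{-1}\sim n^{-1}$ and $n^{-1/2}$. For errors in $D(\Lambda)$ or $D(\Phi_\alpha)$ the ordering is expected to reverse.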

The next example also appears to be interesting.

Example. Let $(\epsilon_j)$ be uniformly distributed in $[-1,1]$, that is, $F(x)=\frac{x+1}{2}$, $x\in[-1,1]$. It is well known that $F\in D(\Psi_1)$, $a_n=1$, $b_n=\frac n2$. Then, under the conditions of Theorem 3, as $n\to\infty$,
\[ P\bigl(n(1-\hat\Delta)<x\bigr)\to 1-\bigl[P\{\zeta_1+\zeta_2>x\}\bigr]^q=1-(1+x)^q\exp(-qx), \]
where $\zeta_1$, $\zeta_2$ are i.i.d. r.v.-s, and $P(\zeta_i<x)=1-\exp(-x)$, $x>0$.

The following corollary is an immediate consequence of Theorem 3.

Corollary 1. If for the simple linear regression (14), conditions (8) and (9) are satisfied, $k=q=2$, and
\[ V=\begin{pmatrix}1 & v_1\\ 1 & v_2\end{pmatrix},\quad v_1\ne v_2, \]
then we have:

(i)
\[ \hat\Delta=\frac12\max(R_{n1},R_{n2}),\quad \hat\theta_1-\theta_1=\frac{Q_{n2}-Q_{n1}}{v_2-v_1},\quad \hat\theta_0-\theta_0=\frac{Q_{n1}v_2-Q_{n2}v_1}{v_2-v_1}; \]

(ii) under assumptions (A1) and (A2), relation (17) holds for $q=2$, and, as $n\to\infty$,
\[ 2b_n(\hat\theta_1-\theta_1)\xrightarrow{D}\frac{\zeta_2-\zeta_2'-\zeta_1+\zeta_1'}{v_2-v_1}, \]
\[ 2b_n(\hat\theta_0-\theta_0)\xrightarrow{D}\frac{(\zeta_1-\zeta_1')v_2-(\zeta_2-\zeta_2')v_1}{v_2-v_1}, \]
where the r.v.-s $\zeta_1$, $\zeta_1'$, $\zeta_2$, $\zeta_2'$ are independent and have d.f. $G$.

Remark. The conditions of Theorem 3 do not require (13). So it describes the asymptotic distribution of $\hat\theta$ even for a nonconsistent MME.

Proofs of the main results

Let us start with the following elementary lemma, where $Z_n(t)$, $W_n(t)$, $R_n(t)$, and $Q_n(t)$ are determined by a sequence $t=\{t_1,\dots,t_n\}$ and are respectively the maximum, minimum, range, and midrange of the sequence $t$.

Lemma 2. Let $t_1,\dots,t_n$ be any real numbers, and
\[ \alpha_n=\min_{s\in\mathbb{R}}\max_{1\le j\le n}|t_j-s|. \tag{20} \]
Then $\alpha_n=R_n(t)/2$; moreover, the minimum in (20) is attained at the point $s=Q_n(t)$.

Proof. Choose $s=Q_n(t)$. Then
\[ \max_{1\le i\le n}|t_i-s|=Z_n(t)-Q_n(t)=Q_n(t)-W_n(t)=\frac12R_n(t). \]
If $s=Q_n(t)+\delta$, then, for $\delta>0$,
\[ \max_{1\le i\le n}|t_i-s|=s-W_n(t)=\frac12R_n(t)+\delta, \]
and, for $\delta<0$,
\[ \max_{1\le i\le n}|t_i-s|=Z_n(t)-s=\frac12R_n(t)-\delta, \]
that is, $s=Q_n(t)$ is the point of minimum.  □
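Lemma 2, applied separately at each design point, is what makes the closed forms of Corollary 1 directly computable. The following sketch is an illustration only; the function name `minimax_two_point` and the toy data are hypothetical. It fits the simple linear regression (14) with two design points by passing a line through the group midranges of the observations, which is equivalent to the formulas of Corollary 1 because the midrange of the $y_j$ in group $l$ equals $\theta_0+\theta_1v_l+Q_{nl}$.

```python
def minimax_two_point(v1, v2, y1, y2):
    """Closed-form minimax fit of y = th0 + th1 * x when x takes only v1, v2.

    y1, y2 are the observations at x = v1 and x = v2 (v1 != v2).
    Returns (th0, th1, delta), where delta is the maximal absolute residual.
    When the two group ranges differ, other minimizers may exist; this is
    the one given by the group midranges.
    """
    m1 = (max(y1) + min(y1)) / 2          # midrange of group 1
    m2 = (max(y2) + min(y2)) / 2          # midrange of group 2
    r1 = max(y1) - min(y1)                # range of group 1
    r2 = max(y2) - min(y2)                # range of group 2
    th1 = (m2 - m1) / (v2 - v1)
    th0 = (m1 * v2 - m2 * v1) / (v2 - v1)
    return th0, th1, max(r1, r2) / 2


# Hypothetical toy data: two observations at each design point.
th0, th1, delta = minimax_two_point(0.0, 1.0, [1.0, 3.0], [2.0, 6.0])
residuals = [abs(y - th0 - th1 * 0.0) for y in (1.0, 3.0)]
residuals += [abs(y - th0 - th1 * 1.0) for y in (2.0, 6.0)]
print(th0, th1, delta, max(residuals))
```

The printed maximal absolute residual coincides with `delta`, that is, with half of the largest group range.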

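Proof of Statement 1. (i) We will use Lemma 2:
\[ \hat\Delta=\min_{\tau\in\mathbb{R}^q}\max_{1\le j\le N}\Bigl|\epsilon_j-\sum_{i=1}^{q}(\tau_i-\theta_i)x_{ji}\Bigr|\le\min_{\tau_1\in\mathbb{R}}\max_{1\le j\le N}|\epsilon_j-(\tau_1-\theta_1)|=\frac12R_N \]
(we put $\tau_i=\theta_i$, $i\ge 2$, and use $x_{j1}=1$). Point (ii) of Statement 1 follows directly from Lemma 2.  □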

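Proof of Theorem 1. Using the notation
\[ d=(d_1,\dots,d_q),\quad d_i=\tau_i-\theta_i,\quad i=\overline{1,q}, \]
and taking into account Eq. (1) and conditions (8) and (9), we rewrite LPP (5) in the following form:
\[ \hat\Delta=\min_{(d,\Delta)\in\mathcal{D}_1}\Delta, \tag{21} \]
\[ \mathcal{D}_1=\Bigl\{(d,\Delta)\in\mathbb{R}^q\times\mathbb{R}_+:\ \sum_{i=1}^{q}d_ix_{ji}+\Delta\ge\epsilon_j,\ -\sum_{i=1}^{q}d_ix_{ji}+\Delta\ge-\epsilon_j,\ j=\overline{1,N}\Bigr\} \]
\[ \phantom{\mathcal{D}_1}=\Bigl\{(d,\Delta)\in\mathbb{R}^q\times\mathbb{R}_+:\ \sum_{i=1}^{q}d_iv_{li}+\Delta\ge Z_{nl},\ -\sum_{i=1}^{q}d_iv_{li}+\Delta\ge-W_{nl},\ l=\overline{1,k}\Bigr\}. \]
The LPP dual to (21) has the form
\[ \max_{u\in\mathcal{D}^{*}}L_n(u), \tag{22} \]
where $L_n(u)=\sum_{l=1}^{k}(u_lZ_{nl}-u_l'W_{nl})$, and the domain $\mathcal{D}^{*}$ is given by (11).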

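According to the basic duality theorem ([ ], Chap. 4),
\[ \hat\Delta=\max_{u\in\mathcal{D}^{*}}L_n(u). \]
Hence, since $\sum_{l=1}^{k}(u_l+u_l')=1$ on $\mathcal{D}^{*}$, we obtain
\[ b_n(\hat\Delta-a_n)=\max_{u\in\mathcal{D}^{*}}b_n(L_n(u)-a_n)=\max_{u\in\mathcal{D}^{*}}g_n(u),\quad g_n(u)=\sum_{l=1}^{k}\bigl[u_lb_n(Z_{nl}-a_n)+u_l'b_n(-W_{nl}-a_n)\bigr]. \]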

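Denote by $\Gamma$ the set of vertices of the domain $\mathcal{D}^{*}$ and
\[ g_0(u)=\sum_{l=1}^{k}(u_l\zeta_l+u_l'\zeta_l'). \]
Since the maximum in LPP (22) is attained at one of the vertices in $\Gamma$,
\[ \max_{u\in\mathcal{D}^{*}}g_n(u)=\max_{u\in\Gamma}g_n(u),\quad n\ge 1. \]
Obviously, $\operatorname{card}(\Gamma)<\infty$. Thus, to prove (10), it suffices to prove that, as $n\to\infty$,
\[ \max_{u\in\Gamma}g_n(u)\xrightarrow{D}\max_{u\in\Gamma}g_0(u), \]
or, more strongly,
\[ \bigl(g_n(u),\ u\in\Gamma\bigr)\xrightarrow{D}\bigl(g_0(u),\ u\in\Gamma\bigr). \tag{23} \]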

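The Cramer–Wold argument (see, e.g., §7 of the book [1]) reduces (23) to the following relation: for any $t_m\in\mathbb{R}$, as $n\to\infty$,
\[ \sum_{u^{(m)}\in\Gamma}g_n(u^{(m)})t_m\xrightarrow{D}\sum_{u^{(m)}\in\Gamma}g_0(u^{(m)})t_m. \]
The last convergence holds if for any $c_l$, $c_l'$, as $n\to\infty$,
\[ \sum_{l=1}^{k}\bigl[c_lb_n(Z_{nl}-a_n)+c_l'b_n(-W_{nl}-a_n)\bigr]\xrightarrow{D}\sum_{l=1}^{k}(c_l\zeta_l+c_l'\zeta_l'). \tag{24} \]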

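Under the conditions of Theorem 1,
\[ \zeta_{nl}=b_n(Z_{nl}-a_n)\xrightarrow{D}\zeta_l,\quad \zeta_{nl}'=b_n(-W_{nl}-a_n)\xrightarrow{D}\zeta_l',\quad l=\overline{1,k}. \tag{25} \]
The vectors $(Z_{nl},W_{nl})$, $l=\overline{1,k}$, are independent, and, on the other hand, $Z_{nl}$ and $W_{nl}$ are asymptotically independent as $n\to\infty$ ([ ], p. 28). To obtain (24), it remains to apply once more the Cramer–Wold argument.  □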

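Proof of Theorem 2. Let $(\hat d,\hat\Delta)$, $\hat d=(\hat d_1,\dots,\hat d_q)$, be the solution of LPP (21), and $\gamma_l=\sum_{i=1}^{q}\hat d_iv_{li}$. Then, for any $l=\overline{1,k}$,
\[ \gamma_l+\hat\Delta\ge Z_{nl},\quad -\gamma_l+\hat\Delta\ge-W_{nl}. \tag{26} \]
Rewrite the asymptotic relations (25) and (10) in the form
\[ Z_{nl}=a_n+\frac{\zeta_{nl}}{b_n},\quad -W_{nl}=a_n+\frac{\zeta_{nl}'}{b_n},\quad \zeta_{nl}\xrightarrow{D}\zeta_l,\ \zeta_{nl}'\xrightarrow{D}\zeta_l', \tag{27} \]
and
\[ \hat\Delta=a_n+\frac{\Delta_n}{b_n},\quad \Delta_n\xrightarrow{D}\Delta_0\quad\text{as } n\to\infty. \tag{28} \]
Combining (26)–(28), we obtain, for $l=\overline{1,k}$,
\[ \gamma_l\ge Z_{nl}-\hat\Delta=\frac{\zeta_{nl}-\Delta_n}{b_n}=O_P(b_n^{-1}),\quad \gamma_l\le W_{nl}+\hat\Delta=\frac{-\zeta_{nl}'+\Delta_n}{b_n}=O_P(b_n^{-1}). \]
Choose $l_1,\dots,l_q$ satisfying (12). Then
\[ \sum_{i=1}^{q}\hat d_iv_{l_ji}=\gamma_{l_j}=O_P(b_n^{-1}),\quad j=\overline{1,q}, \]
and by Cramer’s rule,
\[ \hat\theta_i-\theta_i=\hat d_i=\frac{\det\tilde V_\gamma(i)}{\det\tilde V}=O_P(b_n^{-1}), \]
where the matrix $\tilde V_\gamma(i)$ is obtained from $\tilde V$ by replacement of the $i$th column by the column $(\gamma_{l_1},\dots,\gamma_{l_q})^T$.  □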

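Proof of Theorem 3. (i) We have
\[ \hat\Delta=\min_{\tau\in\mathbb{R}^q}\max_{1\le l\le q}\max_{j\in I_l}\Bigl|y_j-\sum_{i=1}^{q}\tau_iv_{li}\Bigr|=\min_{d\in\mathbb{R}^q}\max_{1\le l\le q}\max_{j\in I_l}\Bigl|\epsilon_j-\sum_{i=1}^{q}d_iv_{li}\Bigr|. \tag{29} \]
By Lemma 2,
\[ \min_{s\in\mathbb{R}}\max_{j\in I_l}|\epsilon_j-s|=\frac12R_{nl}\quad\text{at } s=Q_{nl},\quad l=\overline{1,q}. \]
Therefore, the minimum in $d$ is attained in (29) at the point $\hat d$ being the solution of the system of linear equations
\[ \sum_{i=1}^{q}d_iv_{li}=Q_{nl},\quad l=\overline{1,q}. \]
Since the matrix $V$ is nonsingular, by Cramer’s rule
\[ \hat d_i=\hat\theta_i-\theta_i=\frac{\det V_Q(i)}{\det V},\quad i=\overline{1,q}. \]
Obviously, for such a choice of $\hat d$, $\hat\Delta=\frac12\max_{1\le l\le q}R_{nl}$, that is, we have obtained formulae (15) and (16).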

(ii) Using the asymptotic independence of the r.v.-s $Z_n$ and $W_n$, we derive the following statement.

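Lemma 3. If the r.v.-s $(\epsilon_j)$ satisfy conditions (A1), (A2), then, as $n\to\infty$,
\[ b_n(R_n-2a_n)\xrightarrow{D}\zeta+\zeta', \tag{30} \]
\[ 2b_nQ_n\xrightarrow{D}\zeta-\zeta', \tag{31} \]
where $\zeta$ and $\zeta'$ are independent r.v.-s and have d.f. $G$.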

In fact, this lemma is contained in Theorem 2.9.2 of the book [ ] (see also Theorem 2.10 in [ ]).

Equality (17) of Theorem 3 follows immediately from relation (30) of Lemma 3.

Similarly, from the asymptotic relation (31) and Eq. (16) we obtain (18), applying once more the Cramer–Wold argument.  □

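Remark 3 follows directly from Theorem 3. Indeed, let $k<q$, and let there exist a nonsingular submatrix $\tilde V\subset V$,
\[ \tilde V=\begin{pmatrix} v_{1i_1} & \dots & v_{1i_k}\\ \vdots & & \vdots\\ v_{ki_1} & \dots & v_{ki_k}\end{pmatrix}. \]
Choosing in LPP (21) from the proof of Theorem 1 $d_i=0$ for all $i\notin\{i_1,i_2,\dots,i_k\}$ (i.e., taking $\tau_i=\theta_i$ for such indices $i$), we pass to the problem (29). It remains to apply Eq. (15) of Theorem 3.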

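Remark. Using the notation $\bar\zeta-\bar\zeta'=(\zeta_1-\zeta_1',\dots,\zeta_q-\zeta_q')^T$, the coordinatewise relation (18) of Theorem 3 can be rewritten in the equivalent vector form
\[ 2b_n(\hat\theta-\theta)\xrightarrow{D}V^{-1}(\bar\zeta-\bar\zeta')\quad\text{as } n\to\infty. \tag{32} \]
If the variance $\operatorname{Var}\zeta=\sigma_G^2$ of an r.v. $\zeta$ having d.f. $G$ exists, then the covariance matrix of the limiting distribution in (32) is $C_G=2\sigma_G^2(V^TV)^{-1}$.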
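The covariance formula can be checked in one line. The derivation below is a sketch added for the reader, using only the independence of the $\zeta_l$, $\zeta_l'$ and the nonsingularity of $V$ assumed in Theorem 3; $I_q$ denotes the identity matrix of order $q$ (a symbol introduced here).

```latex
% Components of \bar\zeta - \bar\zeta' are independent, each with variance
% Var(\zeta_l) + Var(\zeta_l') = 2\sigma_G^2, hence
\operatorname{Cov}(\bar\zeta-\bar\zeta') = 2\sigma_G^2 I_q ,
\qquad
C_G = V^{-1}\bigl(2\sigma_G^2 I_q\bigr)\bigl(V^{-1}\bigr)^{T}
    = 2\sigma_G^2\, V^{-1}\bigl(V^{T}\bigr)^{-1}
    = 2\sigma_G^2\,\bigl(V^{T}V\bigr)^{-1}.
```

The last step uses $(V^TV)^{-1}=V^{-1}(V^T)^{-1}$, which is valid because $V$ is a nonsingular square matrix when $k=q$.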

References

[1] Billingsley, P.: Convergence of Probability Measures. Wiley, New York (1968). MR0233396
[2] Ermoliev, Y.M., et al.: Mathematical Methods of Operations Research. Vyshcha Shkola, Kyiv (1979)
[3] Fisher, R.A., Tippett, L.H.C.: Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proc. Camb. Philos. Soc. 24, 180–190 (1928)
[4] Galambos, J.: The Asymptotic Theory of Extreme Order Statistics. Wiley, New York (1978). MR0489334
[5] Gnedenko, B.: Sur la distribution limite du terme maximum d’une série aléatoire. Ann. Math. 44, 423–453 (1943). MR0008655
[6] Ivanov, A.V., Matsak, I.K.: Limit theorems for extreme residuals in linear and nonlinear regression models. Theory Probab. Math. Stat. 86, 79–91 (2013). MR2986451. doi:10.1090/S0094-9000-2013-00890-4
[7] Ivanov, O.V., Matsak, I.K.: Limit theorems for extreme residuals in regression models with heavy tails of observation errors. Theory Probab. Math. Stat. 88, 99–108 (2014). MR3112637. doi:10.1090/S0094-9000-2014-00921-7
[8] Leadbetter, M.R., Lindgren, G., Rootzén, H.: Extremes and Related Properties of Random Sequences and Processes. Springer (1983). MR0691492
[9] Matsak, I.K.: Elements of the Theory of Extreme Values. Comprint, Kyiv (2014)
[10] Frechet, M.: Sur la loi de probabilité de l’écart maximum. Ann. Soc. Pol. Math. Crac. 6, 93–116 (1927)
[11] Murtagh, B.A.: Advanced Linear Programming: Computation and Practice. McGraw-Hill, New York (1981). MR0609151
[12] Zaychenko, Y.P.: Operations Research. Vyshcha Shkola, Kyiv (1988)