
Modern Stochastics: Theory and Applications (VMSTA)

ISSN: 2351-6054, 2351-6046

Publisher: VTeX, Mokslininkų g. 2A, 08412 Vilnius, Lithuania

Article ID: VMSTA40CNF

DOI: 10.15559/15-VMSTA40CNF

Research Article

Extreme residuals in regression model. Minimax approach

Aleksander Ivanov (a, ∗), alexntuu@gmail.com

Ivan Matsak (b), ivanmatsak@univ.kiev.ua

Sergiy Polotskiy (b), sergiy.polotskiy@gmail.com

(a) National Technical University of Ukraine “Kyiv Polytechnic Institute”, 37 Peremogy Ave., Kyiv, 03056, Ukraine

(b) Taras Shevchenko National University of Kyiv, 4e Academician Glushkov Ave., Kyiv, 03127, Ukraine

∗Corresponding author.

PRESTO-2015. Pages 297–308.

Received: 26 August 2015. Revised: 22 September 2015. Accepted: 22 September 2015. Published online: 5 October 2015.

Open access article under the CC BY license.

We obtain limit theorems for extreme residuals in a linear regression model in the case of minimax estimation of the parameters.

Keywords: linear regression, minimax estimator, maximal residual

2010 MSC: 60G70, 62J05

1 Introduction

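Consider the linear regression model
\[ y_j = \sum_{i=1}^{q} \theta_i x_{ji} + \epsilon_j, \quad j = \overline{1,N}, \tag{1} \]
where $\theta = (\theta_1, \theta_2, \dots, \theta_q)$ is an unknown parameter, $\epsilon_j$ are independent identically distributed (i.i.d.) random variables (r.v.-s) with distribution function (d.f.) $F(x)$, and $X = (x_{ji})$ is the regression design matrix.

Let $\hat\theta = (\hat\theta_1, \dots, \hat\theta_q)$ be the least squares estimator (LSE) of $\theta$. Introduce the notation
\[ \hat y_j = \sum_{i=1}^{q} \hat\theta_i x_{ji}, \quad \hat\epsilon_j = y_j - \hat y_j, \quad j = \overline{1,N}; \]
\[ Z_N = \max_{1\le j\le N} \epsilon_j, \quad \hat Z_N = \max_{1\le j\le N} \hat\epsilon_j, \quad Z_N^* = \max_{1\le j\le N} |\epsilon_j|, \quad \hat Z_N^* = \max_{1\le j\le N} |\hat\epsilon_j|. \]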

The asymptotic behavior of the r.v.-s $Z_N$, $Z_N^*$ is studied in the theory of extreme values (see the classical works by Fréchet [10], Fisher and Tippett [3], and Gnedenko [5] and the monographs [4, 8]). In the papers [6, 7], it was shown that under mild assumptions the asymptotic properties of the r.v.-s $Z_N$, $\hat Z_N$, $Z_N^*$, and $\hat Z_N^*$ are similar in the cases of both finite variance and heavy tails of the observation errors $\epsilon_j$.

In the present paper, we study asymptotic properties of the minimax estimator (MME) of $\theta$ and of the maximal absolute residual. For the MME, we keep the same notation $\hat\theta$.

Definition 1.

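A random variable $\hat\theta = (\hat\theta_1, \dots, \hat\theta_q)$ is called the MME for $\theta$ by the observations (1) if
\[ \hat\Delta = \Delta(\hat\theta) = \min_{\tau\in\mathbb{R}^q} \Delta(\tau), \tag{2} \]
where
\[ \Delta(\tau) = \max_{1\le j\le N} \Big| y_j - \sum_{i=1}^{q} \tau_i x_{ji} \Big|. \]

Denote $W_N = \min_{1\le j\le N} \epsilon_j$, and let $R_N = Z_N - W_N$ and $Q_N = \frac{Z_N + W_N}{2}$ be the range and midrange of the sequence $\epsilon_j$, $j = \overline{1,N}$.

The following statement shows an essential difference in the behavior of MME and LSE.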

Statement 1.

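(i) If the model (1) contains a constant term, namely, $x_{j1} = 1$, $j = \overline{1,N}$, then almost surely (a.s.)
\[ \hat\Delta \le \frac{R_N}{2}. \tag{3} \]

(ii) If the model (1) has the form
\[ y_j = \theta + \epsilon_j, \quad j = \overline{1,N}, \tag{4} \]
then a.s.
\[ \hat\Delta = \frac{R_N}{2}, \quad \hat\theta - \theta = Q_N. \]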
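Point (ii) is easy to check numerically. The following minimal Python sketch computes the MME and the maximal absolute residual in the location model (4) from the midrange and the range of the observations. The function names and the data are illustrative assumptions, not part of the paper.

```python
def minimax_location(y):
    """Minimax estimate of theta in y_j = theta + eps_j.

    Returns (theta_hat, delta_hat): the midrange of y and half its range.
    """
    z, w = max(y), min(y)
    return (z + w) / 2.0, (z - w) / 2.0


def max_abs_residual(y, tau):
    """Delta(tau) = max_j |y_j - tau| for the location model (4)."""
    return max(abs(v - tau) for v in y)


# Hypothetical data: theta = 2 plus hand-picked errors.
theta = 2.0
eps = [-0.75, 0.25, 0.5, -0.125]
y = [theta + e for e in eps]

theta_hat, delta_hat = minimax_location(y)
# theta_hat - theta is the midrange Q_N of the errors,
# delta_hat is half of their range R_N.
```

Moving `tau` away from `theta_hat` in either direction can only increase `max_abs_residual(y, tau)`, which is the content of Lemma 2 below.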

Remark 1.

From point (ii) of Statement 1 it follows that the MME $\hat\theta$ may fail to be consistent in the model (4) even when the errors $\epsilon_j$ have all moments (see Example 2).

Remark 2.

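The value $\hat\Delta$ can be represented as a solution of the following linear programming problem (LPP):
\[ \begin{aligned} \hat\Delta &= \min_{(\tau,\Delta)\in D} \Delta, \\ D &= \Big\{ (\tau,\Delta)\in\mathbb{R}^q\times\mathbb{R}_+ : \Big| y_j - \sum_{i=1}^{q} \tau_i x_{ji} \Big| \le \Delta,\ j = \overline{1,N} \Big\} \\ &= \Big\{ (\tau,\Delta)\in\mathbb{R}^q\times\mathbb{R}_+ : \sum_{i=1}^{q} \tau_i x_{ji} + \Delta \ge y_j,\ -\sum_{i=1}^{q} \tau_i x_{ji} + \Delta \ge -y_j,\ j = \overline{1,N} \Big\}. \end{aligned} \tag{5} \]

So, the problem (2) of determining the values $\hat\Delta$ and $\hat\theta$ is reduced to solving LPP (5). The LPP can be efficiently solved numerically by the simplex method (see [2, 12]). Investigation of the asymptotic properties of the maximal absolute residual $\hat\Delta$ and the MME $\hat\theta$ is quite difficult in the case of the general model (1). However, under additional assumptions on the regression experiment design and the observation errors $\epsilon_j$, it is possible to find the limiting distribution of $\hat\Delta$, to prove the consistency of the MME $\hat\theta$, and even to estimate the rate of convergence $\hat\theta \to \theta$, $N\to\infty$.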
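To make the reduction to LPP (5) concrete, here is a hedged Python sketch that assembles the problem in the generic form "minimize c·z subject to A z ≤ b" with z = (τ_1, …, τ_q, Δ). The helper names `build_minimax_lp` and `is_feasible` are hypothetical, and no particular solver is implied by the paper beyond the simplex method.

```python
def build_minimax_lp(X, y):
    """Assemble LPP (5) as: minimize c.z subject to A z <= b,
    where z = (tau_1, ..., tau_q, Delta).

    X : list of N rows, each a list of q regressors x_j1, ..., x_jq
    y : list of N observations
    """
    q = len(X[0])
    c = [0.0] * q + [1.0]          # the objective picks out Delta
    A, b = [], []
    for row, yj in zip(X, y):
        # sum_i tau_i x_ji + Delta >= y_j, rewritten as a "<=" constraint
        A.append([-x for x in row] + [-1.0])
        b.append(-yj)
        # -sum_i tau_i x_ji + Delta >= -y_j, rewritten as a "<=" constraint
        A.append([x for x in row] + [-1.0])
        b.append(yj)
    return c, A, b


def is_feasible(A, b, z, tol=1e-12):
    """Check A z <= b componentwise, up to a small tolerance."""
    return all(sum(a * v for a, v in zip(row, z)) <= bi + tol
               for row, bi in zip(A, b))
```

The triple `(c, A, b)` can be handed to any LP solver that accepts inequality constraints of this form. The components τ_i must be left unbounded in sign, while Δ ≥ 0 holds automatically at any feasible point.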

2 The main theorems

First, we briefly recall some results of extreme value theory. Let the r.v.-s $(\epsilon_j)$ have the d.f. $F(x)$. Assume that for some constants $b_n > 0$ and $a_n$, as $n\to\infty$,
\[ b_n (Z_n - a_n) \xrightarrow{D} \zeta, \tag{6} \]
and $\zeta$ has a nondegenerate d.f. $G(x) = P(\zeta < x)$. If assumption (6) holds, then we say that the d.f. $F$ belongs to the domain of maximum attraction of the probability distribution $G$ and write $F \in D(G)$.

If $F \in D(G)$, then $G$ must be of one of the following three types [5, 8]:

Type I:
\[ \Phi_\alpha(x) = \begin{cases} 0, & x \le 0, \\ \exp(-x^{-\alpha}),\ \alpha > 0, & x > 0; \end{cases} \]

Type II:
\[ \Psi_\alpha(x) = \begin{cases} \exp(-(-x)^{\alpha}),\ \alpha > 0, & x \le 0, \\ 1, & x > 0; \end{cases} \]

Type III:
\[ \Lambda(x) = \exp(-e^{-x}), \quad -\infty < x < \infty. \tag{7} \]

Necessary and sufficient conditions for convergence to each of the d.f.-s $\Phi_\alpha$, $\Psi_\alpha$, $\Lambda$ are also well known.

Suppose that in the model (1):

(A1) $(\epsilon_j)$ are symmetric r.v.-s;

(A2) $(\epsilon_j)$ satisfy relation (6), that is, $F \in D(G)$ with normalizing constants $a_n$ and $b_n$, where $G$ is one of the d.f.-s $\Phi_\alpha$, $\Psi_\alpha$, $\Lambda$ defined in (7).

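Assume further that the regression experiment design is organized as follows:
\[ x_j = (x_{j1}, \dots, x_{jq}) \in \{v_1, v_2, \dots, v_k\}, \quad v_l = (v_{l1}, \dots, v_{lq}) \in \mathbb{R}^q, \quad v_m \ne v_l,\ m \ne l; \tag{8} \]
that is, the $x_j$ take only some fixed values. Besides, suppose that
\[ x_j = v_l \quad \text{for } j \in I_l,\ l = \overline{1,k}, \tag{9} \]
where $\operatorname{card}(I_l) = n$, $I_m \cap I_l = \varnothing$ for $m \ne l$, $N = kn$ is the sample size, and
\[ V = \begin{pmatrix} v_{11} & v_{12} & \dots & v_{1q} \\ v_{21} & v_{22} & \dots & v_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ v_{k1} & v_{k2} & \dots & v_{kq} \end{pmatrix}. \]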

Theorem 1.

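Under assumptions (A1), (A2), (8), and (9),
\[ \Delta_n = b_n (\hat\Delta - a_n) \xrightarrow{D} \Delta_0, \quad n \to \infty, \tag{10} \]
where
\[ \begin{gathered} \Delta_0 = \max_{u \in D^*} L_0^*(u), \quad L_0^*(u) = \sum_{l=1}^{k} (u_l \zeta_l + u_l' \zeta_l'), \quad u = (u_1, \dots, u_k, u_1', \dots, u_k'), \\ D^* = \Big\{ u \ge 0 : \sum_{l=1}^{k} (u_l - u_l') v_{li} = 0,\ i = \overline{1,q},\ \sum_{l=1}^{k} (u_l + u_l') = 1 \Big\}, \end{gathered} \tag{11} \]
and $\zeta_l$, $\zeta_l'$, $l = \overline{1,k}$, are independent r.v.-s having d.f. $G(x)$.

For a number sequence $b_n \to \infty$ and a random sequence $(\xi_n)$, we will write $\xi_n \overset{P}{=} O(b_n^{-1})$ if
\[ \sup_n P\big( b_n |\xi_n| > C \big) \to 0 \quad \text{as } C \to \infty. \]

Assume that $k \ge q$ and there exists a square submatrix $\tilde V \subset V$ of order $q$,
\[ \tilde V = \begin{pmatrix} v_{l_1 1} & \dots & v_{l_1 q} \\ \vdots & \ddots & \vdots \\ v_{l_q 1} & \dots & v_{l_q q} \end{pmatrix}, \]
such that
\[ \det \tilde V \ne 0. \tag{12} \]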

Theorem 2.

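Assume that, under the conditions of Theorem 1, $k \ge q$, assumption (12) holds, and
\[ b_n \to \infty \quad \text{as } n \to \infty. \tag{13} \]
Then the MME $\hat\theta$ is consistent, and
\[ \hat\theta_i - \theta_i \overset{P}{=} O(b_n^{-1}), \quad i = \overline{1,q}. \]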

Example 1.

In the simple linear regression model
\[ y_j = \theta_0 + \theta_1 x_j + \epsilon_j, \quad j = \overline{1,N}, \tag{14} \]
let $x_j = v$, $j = \overline{1,N}$, that is, $k = 1$ and $q = 2$.

Then the model can be rewritten in the form (4) with $\theta = \theta_0 + \theta_1 v$. Clearly, the parameters $\theta_0$, $\theta_1$ cannot be determined uniquely here. So, it does not make sense to speak about the consistency of the MME $\hat\theta$ when $k < q$.

Example 2.

Consider the regression model (4) with errors $\epsilon_j$ having the Laplace density $f(x) = \frac{1}{2} e^{-|x|}$. For this distribution, the well-known von Mises condition ([8], p. 16) for the type III distribution is satisfied, that is, $F \in D(\Lambda)$. For symmetric $F \in D(\Lambda)$, we have
\[ \lim_{n\to\infty} P\{ 2 b_n Q_n < x \} = \frac{1}{1 + e^{-x}}. \]
The limiting distribution is the logistic one (see [9], p. 62). Using further the well-known formulas for the type $\Lambda$ ([9], p. 49), $a_n = F^{-1}(1 - \frac{1}{n})$ and $b_n = n f(a_n)$, we find $a_n = \ln \frac{n}{2}$ and $b_n = 1$. From Statement 1 it now follows that the MME $\hat\theta$ is not consistent. Thus, condition (13) of Theorem 2 cannot be weakened.

The following lemma allows us to check condition (13).

Lemma 1.

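Let $F \in D(G)$. Then we have:

1. If $G = \Phi_\alpha$, then
\[ x_F = \sup\{x : F(x) < 1\} = \infty, \quad \gamma_n = F^{-1}\Big(1 - \frac{1}{n}\Big) \to \infty, \quad b_n = \gamma_n^{-1} \to 0 \quad \text{as } n \to \infty. \]
Thus, (13) does not hold.

2. If $G = \Psi_\alpha$, then
\[ x_F < \infty, \quad 1 - F(x_F - x) = x^{\alpha} L(x), \]
where $L(x)$ is a slowly varying (s.v.) function at zero, and there exists an s.v. at infinity function $L_1(x)$ such that
\[ b_n = (x_F - \gamma_n)^{-1} = n^{1/\alpha} L_1(n) \to \infty \quad \text{as } n \to \infty. \]
So (13) is true.

3. If $G = \Lambda$, then
\[ b_n = r(\gamma_n), \quad \text{where } r(x) = R'(x),\ R(x) = -\ln(1 - F(x)). \]
Clearly, (13) holds if $x_F = \infty$ and $r(x) \to \infty$ as $x \to \infty$.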

Similar results can be found in [9], Corollary 2.7, pp. 44–45; see also [4, 8].
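The three cases of Lemma 1 can be illustrated by computing $b_n$ for one distribution of each type. The sketch below is only an illustration under stated assumptions: the Pareto tail is our choice of a type I example, the uniform law on $[-1,1]$ is the type II example used later in Example 3, and the Laplace law is the type III example from Example 2. All function names are hypothetical.

```python
import math


def gamma_n(quantile, n):
    """gamma_n = F^{-1}(1 - 1/n) for a given quantile function F^{-1}."""
    return quantile(1.0 - 1.0 / n)


def b_pareto(n, alpha):
    """Type I (assumed example): 1 - F(x) = x**(-alpha), x >= 1."""
    g = gamma_n(lambda p: (1.0 - p) ** (-1.0 / alpha), n)
    return 1.0 / g                      # b_n = 1 / gamma_n -> 0


def b_uniform(n):
    """Type II: uniform on [-1, 1], x_F = 1, alpha = 1."""
    g = gamma_n(lambda p: 2.0 * p - 1.0, n)
    return 1.0 / (1.0 - g)              # b_n = 1 / (x_F - gamma_n) = n / 2


def b_laplace(n):
    """Type III: Laplace density exp(-|x|) / 2 (Example 2)."""
    # Quantile formula valid for p >= 1/2, i.e. n >= 2.
    g = gamma_n(lambda p: -math.log(2.0 * (1.0 - p)), n)
    f = 0.5 * math.exp(-abs(g))
    return n * f                        # b_n = n f(a_n) = r(gamma_n) = 1
```

For growing `n`, `b_pareto` decreases to zero, `b_uniform` grows linearly, and `b_laplace` stays equal to 1 up to rounding, so only the uniform case satisfies condition (13).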

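Set
\[ Z_{nl} = \max_{j \in I_l} \epsilon_j, \quad W_{nl} = \min_{j \in I_l} \epsilon_j, \quad R_{nl} = Z_{nl} - W_{nl}, \quad Q_{nl} = \frac{Z_{nl} + W_{nl}}{2}, \quad l = \overline{1,k}. \]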

It turns out that Theorems 1 and 2 can be significantly simplified in the case k=q .

Theorem 3.

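Let conditions (8) and (9) be satisfied for the model (1), let $k = q$, and let the matrix $V$ satisfy condition (12). Then we have:

(i)
\[ \hat\Delta = \frac{1}{2} \max_{1 \le l \le q} R_{nl}, \tag{15} \]
\[ \hat\theta_i - \theta_i = \frac{\det V_Q(i)}{\det V}, \quad i = \overline{1,q}, \tag{16} \]
where the matrix $V_Q(i)$ is obtained from $V$ by replacing the $i$th column by the column $(Q_{n1}, \dots, Q_{nq})^T$.

(ii) If additionally conditions (A1), (A2) are satisfied, then
\[ \lim_{n\to\infty} P\big( 2 b_n (\hat\Delta - a_n) < x \big) = \big( G \star G(x) \big)^q, \tag{17} \]
where $G \star G(x) = \int_{-\infty}^{\infty} G(x - y)\, dG(y)$, and for $i = \overline{1,q}$, as $n \to \infty$,
\[ 2 b_n (\hat\theta_i - \theta_i) \xrightarrow{D} \frac{\det V_\zeta(i)}{\det V}, \tag{18} \]
where the matrix $V_\zeta(i)$ is obtained from $V$ by replacing the $i$th column by the column $(\zeta_1 - \zeta_1', \dots, \zeta_q - \zeta_q')^T$, and all the r.v.-s $\zeta_i$, $\zeta_i'$ are independent and have d.f. $G$.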
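Formulas (15) and (16) give an explicit recipe for the case $k = q$. The following Python sketch applies them to the observations $y_j$ rather than to the unobservable errors: in group $l$ the midrange of the $y_j$ equals $\sum_i \theta_i v_{li} + Q_{nl}$, so Cramer's rule with the group midranges of $y$ returns $\hat\theta$ itself. The names `det` and `mme_grouped` and the data are illustrative assumptions.

```python
def det(M):
    """Determinant by Laplace expansion along the first row (small q only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for c, a in enumerate(M[0]):
        minor = [row[:c] + row[c + 1:] for row in M[1:]]
        total += (-1) ** c * a * det(minor)
    return total


def mme_grouped(V, groups):
    """MME for k = q via formulas (15)-(16).

    V      : q x q nonsingular design matrix with rows v_1, ..., v_q
    groups : groups[l] lists the observations y_j with x_j = v_l
    Returns (theta_hat, delta_hat).
    """
    q = len(V)
    mid = [(max(g) + min(g)) / 2.0 for g in groups]
    half_range = [(max(g) - min(g)) / 2.0 for g in groups]
    d = det(V)
    theta_hat = []
    for i in range(q):
        # Replace the i-th column of V by the column of group midranges.
        Vi = [row[:i] + [mid[l]] + row[i + 1:] for l, row in enumerate(V)]
        theta_hat.append(det(Vi) / d)
    return theta_hat, max(half_range)


# Hypothetical data: theta = (1, 2), design points v_1 = (1, 0), v_2 = (1, 1).
V = [[1.0, 0.0], [1.0, 1.0]]
groups = [[0.5, 1.25, 1.0], [3.25, 2.75, 3.75]]
theta_hat, delta_hat = mme_grouped(V, groups)
```

The Laplace expansion is chosen only for brevity; for larger $q$ one would solve the linear system $V\tau = (\text{midranges})$ by elimination instead.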

Remark 3.

Suppose that in the model (1), under assumptions (8) and (9), $k < q$, and there exists a nondegenerate submatrix $\tilde V \subset V$ of order $k$. Then
\[ \hat\Delta \le \frac{1}{2} \max_{1 \le l \le k} R_{nl} \quad \text{a.s.} \]

Remark 4.

For the standard LSE,
\[ \hat\theta_i - \theta_i \overset{P}{=} O(n^{-1/2}); \]
therefore, if, under the conditions of Theorems 2 and 3,
\[ n^{-1/2} b_n \to \infty \quad \text{as } n \to \infty, \tag{19} \]
then MME is more efficient than LSE.

In [6] (see also [9]), it is proved that if $F \in D(\Lambda)$, then for any $\delta > 0$, $b_n = O(n^{\delta})$. From this relation and Lemma 1 it follows that (19) is not satisfied for the domains of maximum attraction $D(\Phi_\alpha)$ and $D(\Lambda)$. In the case of the domain $D(\Psi_\alpha)$, condition (19) holds for $\alpha \in (0,2)$. For example, assume that the r.v.-s $(\epsilon_j)$ are symmetrically distributed on the interval $[-1,1]$ and
\[ 1 - F(1 - h) = h^{\alpha} L(h) \quad \text{as } h \downarrow 0,\ \alpha \in (0,2), \]
where $L(h)$ is an s.v. function at zero. Then $b_n = n^{1/\alpha} L_1(n)$, where $L_1$ is an s.v. at infinity function, and, under the conditions of Theorems 2 and 3, as $n \to \infty$,
\[ |\hat\theta_i - \theta_i| \overset{P}{=} O\big( (n^{1/\alpha} L_1(n))^{-1} \big) = o(n^{-1/2}). \]
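The efficiency gain for $\alpha < 2$ can be seen in a small Monte Carlo experiment. The sketch below is an illustration under our own assumptions: the location model (4), errors uniform on $[-1,1]$ (so $\alpha = 1$), and a fixed seed. It compares the mean squared errors of the midrange (MME) and of the sample mean (LSE). The function name is hypothetical.

```python
import random


def compare_mme_lse(n, reps, seed=0):
    """Monte Carlo mean squared errors of the midrange (MME) and the
    sample mean (LSE) in y_j = theta + eps_j with eps_j ~ U[-1, 1]."""
    rng = random.Random(seed)
    theta = 0.0
    se_mme = se_lse = 0.0
    for _ in range(reps):
        y = [theta + rng.uniform(-1.0, 1.0) for _ in range(n)]
        mme = (max(y) + min(y)) / 2.0   # minimax estimate, see Statement 1(ii)
        lse = sum(y) / n                # least squares estimate
        se_mme += (mme - theta) ** 2
        se_lse += (lse - theta) ** 2
    return se_mme / reps, se_lse / reps
```

Here the error of the midrange is of order $n^{-1}$, while that of the sample mean is of order $n^{-1/2}$, so for moderate `n` the first returned value should already be much smaller than the second.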

The next example also appears to be interesting.

Example 3.

Let $(\epsilon_j)$ be uniformly distributed in $[-1,1]$, that is, $F(x) = \frac{x+1}{2}$, $x \in [-1,1]$. It is well known that $F \in D(\Psi_1)$, $a_n = 1$, $b_n = \frac{n}{2}$. Then, under the conditions of Theorem 3, as $n \to \infty$,
\[ P\big( n(1 - \hat\Delta) < x \big) \to 1 - \big[ P\{\zeta_1 + \zeta_2 > x\} \big]^q = 1 - (1 + x)^q \exp(-qx), \]
where $\zeta_1$, $\zeta_2$ are i.i.d. r.v.-s, and $P(\zeta_i < x) = 1 - \exp(-x)$, $x > 0$.
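The limit in Example 3 can be compared with a simulation. The following Python sketch evaluates the limiting d.f. and estimates $P(n(1 - \hat\Delta) < x)$ empirically, with $\hat\Delta$ computed by formula (15) from $q$ independent groups of $n$ uniform errors. The function names, the seed, and the sample sizes are assumptions made for illustration.

```python
import math
import random


def limit_cdf(x, q):
    """Limiting d.f. 1 - (1 + x)^q exp(-q x) of n(1 - Delta_hat), x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - ((1.0 + x) * math.exp(-x)) ** q


def simulate_cdf(x, q, n, reps, seed=0):
    """Empirical P(n(1 - Delta_hat) < x), Delta_hat from formula (15)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        delta = 0.0
        for _ in range(q):
            eps = [rng.uniform(-1.0, 1.0) for _ in range(n)]
            # Half of the range R_nl of the l-th group of errors.
            delta = max(delta, (max(eps) - min(eps)) / 2.0)
        if n * (1.0 - delta) < x:
            hits += 1
    return hits / reps
```

A call such as `simulate_cdf(1.0, 2, 200, 2000)` should be close to `limit_cdf(1.0, 2)`, up to Monte Carlo error.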

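The following corollary is an immediate consequence of Theorem 3.

Corollary 1.

If for the simple linear regression (14) conditions (8) and (9) are satisfied, $k = q = 2$, and
\[ V = \begin{pmatrix} 1 & v_1 \\ 1 & v_2 \end{pmatrix}, \quad v_1 \ne v_2, \]
then we have:

(i)
\[ \hat\Delta = \frac{1}{2} \max(R_{n1}, R_{n2}), \quad \hat\theta_1 - \theta_1 = \frac{Q_{n2} - Q_{n1}}{v_2 - v_1}, \quad \hat\theta_0 - \theta_0 = \frac{Q_{n1} v_2 - Q_{n2} v_1}{v_2 - v_1}; \]

(ii) under assumptions (A1) and (A2), relation (17) holds for $q = 2$, and, as $n \to \infty$,
\[ 2 b_n (\hat\theta_1 - \theta_1) \xrightarrow{D} \frac{\zeta_2 - \zeta_2' - \zeta_1 + \zeta_1'}{v_2 - v_1}, \]
\[ 2 b_n (\hat\theta_0 - \theta_0) \xrightarrow{D} \frac{(\zeta_1 - \zeta_1') v_2 - (\zeta_2 - \zeta_2') v_1}{v_2 - v_1}, \]
where the r.v.-s $\zeta_1$, $\zeta_1'$, $\zeta_2$, $\zeta_2'$ are independent and have d.f. $G$.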

Remark 5.

The conditions of Theorem 3 do not require (13). So it describes the asymptotic distribution of θˆ even for nonconsistent MME.

3 Proofs of the main results

Let us start with the following elementary lemma, where $Z_n(t)$, $W_n(t)$, $R_n(t)$, and $Q_n(t)$ are determined by a sequence $t = \{t_1, \dots, t_n\}$ and are, respectively, the maximum, minimum, range, and midrange of the sequence $t$.

Lemma 2.

Let $t_1, \dots, t_n$ be any real numbers, and
\[ \alpha_n = \min_{s \in \mathbb{R}} \max_{1 \le j \le n} |t_j - s|. \tag{20} \]
Then $\alpha_n = R_n(t)/2$; moreover, the minimum in (20) is attained at the point $s = Q_n(t)$.

Proof.

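Choose $s = Q_n(t)$. Then
\[ \max_{1 \le i \le n} |t_i - s| = Z_n(t) - Q_n(t) = Q_n(t) - W_n(t) = \frac{1}{2} R_n(t). \]
If $s = Q_n(t) + \delta$, then, for $\delta > 0$,
\[ \max_{1 \le i \le n} |t_i - s| = s - W_n(t) = \frac{1}{2} R_n(t) + \delta, \]
and, for $\delta < 0$,
\[ \max_{1 \le i \le n} |t_i - s| = Z_n(t) - s = \frac{1}{2} R_n(t) - \delta, \]
that is, $s = Q_n(t)$ is the point of minimum. □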

Proof of Statement 1.

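We will use Lemma 2:
\[ \hat\Delta = \min_{\tau \in \mathbb{R}^q} \max_{1 \le j \le N} \Big| \epsilon_j - \sum_{i=1}^{q} (\tau_i - \theta_i) x_{ji} \Big| \le \min_{\tau_1 \in \mathbb{R}} \max_{1 \le j \le N} |\epsilon_j - (\tau_1 - \theta_1)| = \frac{1}{2} R_N \]
(we put $\tau_i = \theta_i$, $i \ge 2$, so that only the constant term remains). Point (ii) of Statement 1 follows directly from Lemma 2. □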

Proof of Theorem 1.

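Using the notation
\[ d = (d_1, \dots, d_q), \quad d_i = \tau_i - \theta_i, \quad i = \overline{1,q}, \]
and taking into account Eq. (1) and conditions (8) and (9), we rewrite LPP (5) in the following form:
\[ \begin{aligned} \hat\Delta &= \min_{(d,\Delta) \in D_1} \Delta, \\ D_1 &= \Big\{ (d,\Delta) \in \mathbb{R}^q \times \mathbb{R}_+ : \sum_{i=1}^{q} d_i x_{ji} + \Delta \ge \epsilon_j,\ -\sum_{i=1}^{q} d_i x_{ji} + \Delta \ge -\epsilon_j,\ j = \overline{1,N} \Big\} \\ &= \Big\{ (d,\Delta) \in \mathbb{R}^q \times \mathbb{R}_+ : \sum_{i=1}^{q} d_i v_{li} + \Delta \ge Z_{nl},\ -\sum_{i=1}^{q} d_i v_{li} + \Delta \ge -W_{nl},\ l = \overline{1,k} \Big\}. \end{aligned} \tag{21} \]
The LPP dual to (21) has the form
\[ \max_{u \in D^*} L_n^*(u), \tag{22} \]
where $L_n^*(u) = \sum_{l=1}^{k} (u_l Z_{nl} - u_l' W_{nl})$, and the domain $D^*$ is given by (11).

According to the basic duality theorem ([11], Chap. 4),
\[ \hat\Delta = \max_{u \in D^*} L_n^*(u). \]
Hence, since $\sum_{l=1}^{k} (u_l + u_l') = 1$ on $D^*$, we obtain
\[ b_n (\hat\Delta - a_n) = \max_{u \in D^*} b_n (L_n^*(u) - a_n) = \max_{u \in D^*} g_n(u), \quad g_n(u) = \sum_{l=1}^{k} \big[ u_l b_n (Z_{nl} - a_n) + u_l' b_n (-W_{nl} - a_n) \big]. \]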

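Denote by $\Gamma^*$ the set of vertices of the domain $D^*$ and
\[ g_0(u) = \sum_{l=1}^{k} (u_l \zeta_l + u_l' \zeta_l'). \]
Since the maximum in LPP (22) is attained at one of the vertices in $\Gamma^*$,
\[ \max_{u \in D^*} g_n(u) = \max_{u \in \Gamma^*} g_n(u), \quad n \ge 1. \]
Obviously, $\operatorname{card}(\Gamma^*) < \infty$. Thus, to prove (10), it suffices to prove that, as $n \to \infty$,
\[ \max_{u \in \Gamma^*} g_n(u) \xrightarrow{D} \max_{u \in \Gamma^*} g_0(u) \]
or
\[ \big( g_n(u),\ u \in \Gamma^* \big) \xrightarrow{D} \big( g_0(u),\ u \in \Gamma^* \big). \tag{23} \]

The Cramér–Wold argument (see, e.g., §7 of the book [1]) reduces (23) to the following relation: for any $t_m \in \mathbb{R}$, as $n \to \infty$,
\[ \sum_{u^{(m)} \in \Gamma^*} g_n(u^{(m)})\, t_m \xrightarrow{D} \sum_{u^{(m)} \in \Gamma^*} g_0(u^{(m)})\, t_m. \]
The last convergence holds if for any $c_l$, $c_l'$, as $n \to \infty$,
\[ \sum_{l=1}^{k} \big[ c_l b_n (Z_{nl} - a_n) + c_l' b_n (-W_{nl} - a_n) \big] \xrightarrow{D} \sum_{l=1}^{k} (c_l \zeta_l + c_l' \zeta_l'). \tag{24} \]

Under the conditions of Theorem 1,
\[ \zeta_{nl} = b_n (Z_{nl} - a_n) \xrightarrow{D} \zeta_l, \quad \zeta_{nl}' = b_n (-W_{nl} - a_n) \xrightarrow{D} \zeta_l', \quad l = \overline{1,k}. \tag{25} \]
The vectors $(Z_{nl}, W_{nl})$, $l = \overline{1,k}$, are independent, and, on the other hand, $Z_{nl}$ and $W_{nl}$ are asymptotically independent as $n \to \infty$ ([8], p. 28). To obtain (24), it remains to apply the Cramér–Wold argument once more. □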
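The dual problem (22) used in the proof of Theorem 1 can be inspected directly in the square case $k = q$ with nonsingular $V$. There the constraint $\sum_l (u_l - u_l') v_{li} = 0$, $i = \overline{1,q}$, forces $u = u'$, so the vertices of $D^*$ are $u = u' = e_l/2$; this vertex description is our own remark drawn from the constraints in (11), not a formula stated in the paper. The hedged Python sketch below (hypothetical function names) evaluates the dual objective at these vertices.

```python
def dual_value(u, u_prime, Z, W):
    """Dual objective L_n^*(u) = sum_l (u_l Z_nl - u'_l W_nl)."""
    return sum(a * z - b * w for a, b, z, w in zip(u, u_prime, Z, W))


def in_dual_domain(u, u_prime, V, tol=1e-12):
    """Check the constraints that define D^* in (11)."""
    if min(u) < -tol or min(u_prime) < -tol:
        return False
    q = len(V[0])
    balance = all(
        abs(sum((a - b) * row[i] for a, b, row in zip(u, u_prime, V))) <= tol
        for i in range(q)
    )
    return balance and abs(sum(u) + sum(u_prime) - 1.0) <= tol


def dual_vertices_square(k):
    """Vertices of D^* when k = q and V is nonsingular: u = u' = e_l / 2."""
    for l in range(k):
        e = [0.5 if m == l else 0.0 for m in range(k)]
        yield e, list(e)
```

At the vertex with index $l$ the dual objective equals $R_{nl}/2$, so maximizing over the vertices gives $\frac{1}{2}\max_l R_{nl}$, in agreement with formula (15).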

Proof of Theorem 2.

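Let $\hat d = (\hat d_1, \dots, \hat d_q)$, $\hat\Delta$ be the solution of LPP (21), and let $\gamma_l = \sum_{i=1}^{q} \hat d_i v_{li}$. Then, for any $l = \overline{1,k}$,
\[ \gamma_l + \hat\Delta \ge Z_{nl}, \quad -\gamma_l + \hat\Delta \ge -W_{nl}. \tag{26} \]
Rewrite the asymptotic relations (25) and (10) in the form
\[ Z_{nl} = a_n + \frac{\zeta_{nl}}{b_n}, \quad -W_{nl} = a_n + \frac{\zeta_{nl}'}{b_n}, \quad \zeta_{nl} \xrightarrow{D} \zeta_l, \quad \zeta_{nl}' \xrightarrow{D} \zeta_l', \tag{27} \]
and
\[ \hat\Delta = a_n + \frac{\Delta_n}{b_n}, \quad \Delta_n \xrightarrow{D} \Delta_0 \quad \text{as } n \to \infty. \tag{28} \]
Combining (26)–(28), we obtain, for $l = \overline{1,k}$,
\[ \gamma_l \ge Z_{nl} - \hat\Delta = \frac{\zeta_{nl} - \Delta_n}{b_n} = O(b_n^{-1}), \quad \gamma_l \le W_{nl} + \hat\Delta = \frac{-\zeta_{nl}' + \Delta_n}{b_n} = O(b_n^{-1}). \]
Choose $l_1, \dots, l_q$ satisfying (12). Then
\[ \sum_{i=1}^{q} \hat d_i v_{l_j i} = \gamma_{l_j} = O(b_n^{-1}), \quad j = \overline{1,q}, \]
and by Cramer's rule,
\[ \hat\theta_i - \theta_i = \hat d_i = \frac{\det \tilde V_\gamma(i)}{\det \tilde V} = O(b_n^{-1}), \]
where the matrix $\tilde V_\gamma(i)$ is obtained from $\tilde V$ by replacing the $i$th column by the column $(\gamma_{l_1}, \dots, \gamma_{l_q})^T$. □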

Proof of Theorem 3.

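(i) We have
\[ \hat\Delta = \min_{\tau \in \mathbb{R}^q} \max_{1 \le l \le q} \max_{j \in I_l} \Big| y_j - \sum_{i=1}^{q} \tau_i v_{li} \Big| = \min_{d \in \mathbb{R}^q} \max_{1 \le l \le q} \max_{j \in I_l} \Big| \epsilon_j - \sum_{i=1}^{q} d_i v_{li} \Big|. \tag{29} \]
By Lemma 2,
\[ \min_{s \in \mathbb{R}} \max_{j \in I_l} |\epsilon_j - s| = \frac{1}{2} R_{nl} \quad \text{for } s = Q_{nl},\ l = \overline{1,q}. \]
Therefore, the minimum over $d$ in (29) is attained at the point $\hat d$ that solves the system of linear equations
\[ \sum_{i=1}^{q} d_i v_{li} = Q_{nl}, \quad l = \overline{1,q}. \]
Since the matrix $V$ is nonsingular, by Cramer's rule
\[ \hat d_i = \hat\theta_i - \theta_i = \frac{\det V_Q(i)}{\det V}, \quad i = \overline{1,q}. \]
Obviously, for such a choice of $\hat d$, $\hat\Delta = \frac{1}{2} \max_{1 \le l \le q} R_{nl}$, that is, we have obtained formulae (15) and (16).

(ii) Using the asymptotic independence of the r.v.-s $Z_n$ and $W_n$, we derive the following statement.

Lemma 3.

If the r.v.-s $(\epsilon_j)$ satisfy conditions (A1), (A2), then, as $n \to \infty$,
\[ b_n (R_n - 2 a_n) \xrightarrow{D} \zeta + \zeta', \tag{30} \]
\[ 2 b_n Q_n \xrightarrow{D} \zeta - \zeta', \tag{31} \]
where $\zeta$ and $\zeta'$ are independent r.v.-s and have d.f. $G$.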

In fact, this lemma is contained in Theorem 2.9.2 of the book [4] (see also Theorem 2.10 in [9]).

Equality (17) of Theorem 3 follows immediately from relation (30) of Lemma 3.

Similarly, from the asymptotic relation (31) and Eq. (16) we obtain (18) by applying the Cramér–Wold argument once more. □

Remark 3 follows directly from Theorem 3. Indeed, let $k < q$, and let there exist a nonsingular submatrix $\tilde V \subset V$,
\[ \tilde V = \begin{pmatrix} v_{1 i_1} & \dots & v_{1 i_k} \\ \vdots & \ddots & \vdots \\ v_{k i_1} & \dots & v_{k i_k} \end{pmatrix}. \]
Choosing in LPP (21) from the proof of Theorem 1 $d_i = 0$ for all $i \ne i_1, i_2, \dots, i_k$ (i.e., taking $\tau_i = \theta_i$ for such indices $i$), we pass to the problem (29). It remains to apply Eq. (15) of Theorem 3.

Remark 6.

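Using the notation $\bar\zeta - \bar\zeta' = (\zeta_1 - \zeta_1', \dots, \zeta_q - \zeta_q')^T$, the coordinatewise relation (18) of Theorem 3 can be rewritten in the equivalent vector form
\[ 2 b_n (\hat\theta - \theta) \xrightarrow{D} V^{-1} (\bar\zeta - \bar\zeta') \quad \text{as } n \to \infty. \tag{32} \]
If the variance $\operatorname{Var} \zeta = \sigma_G^2$ of an r.v. $\zeta$ having d.f. $G$ exists, then the covariance matrix of the limiting distribution in (32) is $C_G = 2 \sigma_G^2 (V^T V)^{-1}$.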
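The covariance formula follows because the components of $\bar\zeta - \bar\zeta'$ are independent with variance $2\sigma_G^2$, so the covariance of $V^{-1}(\bar\zeta - \bar\zeta')$ is $2\sigma_G^2 V^{-1} (V^{-1})^T = 2\sigma_G^2 (V^T V)^{-1}$. A minimal Python sketch for the $2 \times 2$ case, with hypothetical helper names, checks this identity numerically.

```python
def inv2(M):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(M):
    return [list(r) for r in zip(*M)]


def limit_covariance(V, var_g):
    """C_G = 2 sigma_G^2 (V^T V)^{-1} for a 2 x 2 design matrix V."""
    G = inv2(matmul(transpose(V), V))
    return [[2.0 * var_g * g for g in row] for row in G]
```

For the design of Corollary 1 with $v_1 = 0$, $v_2 = 1$, one can call `limit_covariance([[1.0, 0.0], [1.0, 1.0]], var_g)` with the variance of the relevant limit law, for instance $\pi^2/6$ for $G = \Lambda$.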

References

[1] Billingsley, P.: Convergence of Probability Measures. Wiley, New York (1968). MR0233396

[2] Ermoliev, Y.M., et al.: Mathematical Methods of Operations Research. Vyshcha Shkola, Kyiv (1979)

[3] Fisher, R.A., Tippett, L.H.C.: Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proc. Camb. Philos. Soc. 24, 180–190 (1928)

[4] Galambos, J.: The Asymptotic Theory of Extreme Order Statistics. Wiley, New York (1978). MR0489334

[5] Gnedenko, B.: Sur la distribution limite du terme maximum d’une série aléatoire. Ann. Math. 44, 423–453 (1943). MR0008655

[6] Ivanov, A.V., Matsak, I.K.: Limit theorems for extreme residuals in linear and nonlinear regression models. Theory Probab. Math. Stat. 86, 79–91 (2013). MR2986451. doi:10.1090/S0094-9000-2013-00890-4

[7] Ivanov, O.V., Matsak, I.K.: Limit theorems for extreme residuals in regression models with heavy tails of observation errors. Theory Probab. Math. Stat. 88, 99–108 (2014). MR3112637. doi:10.1090/S0094-9000-2014-00921-7

[8] Leadbetter, M.R., Lindgren, G., Rootzén, H.: Extremes and Related Properties of Random Sequences and Processes. Springer (1983). MR0691492

[9] Matsak, I.K.: Elements of the Theory of Extreme Values. Comprint, Kyiv (2014)

[10] Fréchet, M.: Sur la loi de probabilité de l’écart maximum. Ann. Soc. Pol. Math. Crac. 6, 93–116 (1927)

[11] Murtagh, B.A.: Advanced Linear Programming: Computation and Practice. McGraw-Hill, New York (1981). MR0609151

[12] Zaychenko, Y.P.: Operations Research. Vyshcha Shkola, Kyiv (1988)