We consider the two-line fitting problem. True points lie on two straight lines and are observed with Gaussian perturbations. For each observed point, it is unknown on which of the two lines the corresponding true point lies. The parameters of the lines are estimated.
This model is a restriction of the conic section fitting model because a pair of lines is a degenerate conic section. The following estimators are constructed: two projections of the adjusted least squares estimator in the conic section fitting model, the orthogonal regression estimator, the parametric maximum likelihood estimator in the Gaussian model, and the regular best asymptotically normal moment estimator.
Conditions for the consistency and asymptotic normality of the projections of the adjusted least squares estimator are provided. All the estimators constructed in the paper are equivariant. The estimators are compared numerically.
Consider the problem of estimating two lines from perturbed observations of points that lie on the lines. Let the true points
The parameters
We consider both functional and structural models. In
In the structural model, the true lines are defined by Eqs. ( The explicit parameterization has the advantage that the number of parameters equals the dimension of the parameter space. (In [ In simulations, confidence intervals for the coordinates of the intersection point of the two lines are obtained from the asymptotic covariance matrix of the intersection point. For the projections of the ALS2 estimator, that asymptotic covariance matrix can be evaluated without using an explicit line parameterization.
Let the true points
The points are observed with Gaussian perturbations, and the perturbed points are denoted as
The vector of coefficients in (
Similarly to the two-line fitting model, the
A pair of lines is a degenerate case of a conic section. Therefore, the conic section fitting model is an extension of the two-line fitting model.
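As an illustration of this degeneracy (a minimal numpy sketch; the helper `lines_to_conic` and the line encoding a*x + b*y + c = 0 are our own, not taken from the paper): multiplying the equations of two lines expands to a quadratic form whose symmetric 3×3 matrix has rank at most 2, hence is singular.

```python
import numpy as np

def lines_to_conic(l1, l2):
    """Expand (a1*x + b1*y + c1)*(a2*x + b2*y + c2) into the symmetric 3x3
    matrix Q of the quadratic form q(x, y) = [x, y, 1] Q [x, y, 1]^T."""
    u = np.asarray(l1, dtype=float)
    v = np.asarray(l2, dtype=float)
    # (u v^T + v u^T)/2 has rank at most 2, so det Q = 0 for any pair of lines.
    return (np.outer(u, v) + np.outer(v, u)) / 2.0

Q = lines_to_conic((1.0, -2.0, 0.5), (3.0, 1.0, -1.0))
print(abs(np.linalg.det(Q)) < 1e-9)  # → True
```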
We consider the adjusted least squares (ALS) estimator for unknown
Denote
The estimator
Equation (
The matrix
The strong consistency of the ALS2 estimator is proved in [
Denote
“Eventually” in the previous statement means that almost surely there exists
Denote the normalized version of the true parameter
Normalize the estimator of 1. 2. 3.
The methods of fitting an algebraic curve (or surface) to observed points can be classified as follows.
The criterion function for the OLS estimator is simple enough and can be adjusted so that the resulting estimator is consistent (under some conditions). Such an estimator is called the
To obtain the parameters of two lines, the observed points are fitted with a conic section, and the parameters of the conic section are then used to recover the parameters of the two lines. Several papers use this idea.
The problem of estimating the fundamental matrix for two-camera view is considered in [
In [
In [
A numerical algorithm for evaluation of the orthogonal regression estimator is presented in monograph [
The orthogonal regression estimator is consistent in the single straight line fitting problem [
Let
The estimator
Let
In Section
The two-line fitting model is a restriction of the conic section fitting model. A pair of lines defined by the equation
The conic section ALS2 estimator provides an estimate of the error variance
Denote by
There are two cases where the structural model is not identifiable. If the common distribution of the true points is concentrated on a straight line and on a single point (presumably not on the line), that is,
In order to estimate the parameters
Substituting the elements of the ALS2 estimator
If the conic section estimated by the ALS2 estimator is a hyperbola, then the “ignore-
Choose the sign ± in (
We need the notation
Now, we state the asymptotic normality of the “ignore-
The matrix
The estimators
Equation (
Perform a one-step update of the estimator. The normalization of the estimator
The sum of squared distances from each observed point to the closer of the two lines equals
In the functional model, the orthogonal regression estimator is the maximum likelihood estimator. However, because the dimension of the parameter space grows as the sample size increases, the orthogonal regression estimator may be inconsistent.
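The orthogonal regression criterion above — the sum of squared distances from each observed point to the closer of the two lines — can be sketched as follows (an illustrative helper; the name `or_criterion` and the line encoding a*x + b*y + c = 0 are our assumptions):

```python
import numpy as np

def or_criterion(points, l1, l2):
    """Sum of squared distances from each point in an (n, 2) array to the
    closer of two lines, each encoded as (a, b, c) with a*x + b*y + c = 0."""
    def sq_dist(line):
        a, b, c = line
        # Squared orthogonal distance from every point to this line.
        return (a * points[:, 0] + b * points[:, 1] + c) ** 2 / (a * a + b * b)
    return float(np.sum(np.minimum(sq_dist(l1), sq_dist(l2))))

pts = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
# Every point lies on y = x or on y = 0, so the criterion vanishes.
print(or_criterion(pts, (1.0, -1.0, 0.0), (0.0, 1.0, 0.0)))  # → 0.0
```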
The estimator is constructed in the structural model, so it should be called the structural maximum likelihood estimator.
If a Gaussian distribution of a random point
If the distribution of a random point
The distribution of
The distribution of the observed points is also a mixture of two Gaussian distributions
The likelihood function for the sample of points with a mixture of two normal distributions is
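Numerically, such a mixture log-likelihood can be evaluated as in the following sketch (illustrative only; a generic two-component planar Gaussian mixture with mixing weight p, not tied to the paper's specific parameterization):

```python
import numpy as np

def mixture_loglik(points, mu1, cov1, mu2, cov2, p):
    """Log-likelihood of (n, 2) points under p*N(mu1, cov1) + (1-p)*N(mu2, cov2)."""
    def dens(mu, cov):
        d = points - mu
        # Quadratic form d^T cov^{-1} d for every point at once.
        quad = np.einsum('ni,ij,nj->n', d, np.linalg.inv(cov), d)
        return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    return float(np.sum(np.log(p * dens(mu1, cov1) + (1 - p) * dens(mu2, cov2))))
```

For a single point at the common mean of two standard components, the value reduces to the log-density of a standard bivariate normal at its mode, i.e. −log(2π).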
One method of evaluating the maximum likelihood estimator is as follows:
Find the point of conditional minimum
Set
Find the estimates
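The maximization is typically carried out iteratively with the EM algorithm (cf. the simulations below). For orientation, one EM iteration for a generic two-component spherical Gaussian mixture in the plane can be sketched as follows (a simplified stand-in, not the paper's exact line-mixture model):

```python
import numpy as np

def em_step(points, mu, sigma2, p):
    """One EM iteration for the mixture p*N(mu[0], s2*I) + (1-p)*N(mu[1], s2*I)
    fitted to an (n, 2) array of points; returns updated (mu, sigma2, p)."""
    w = np.array([p, 1.0 - p])
    # E-step: responsibility of each component for each point.
    d2 = ((points[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (n, 2)
    dens = w * np.exp(-0.5 * d2 / sigma2) / (2.0 * np.pi * sigma2)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: reweighted means, common variance, and mixing weight.
    nk = r.sum(axis=0)
    mu_new = (r.T @ points) / nk[:, None]
    d2_new = ((points[:, None, :] - mu_new[None, :, :]) ** 2).sum(axis=2)
    sigma2_new = (r * d2_new).sum() / (2.0 * points.shape[0])
    return mu_new, sigma2_new, nk[0] / points.shape[0]
```

Iterating `em_step` from a rough initial guess drives the component means toward the cluster centers; the same E-step/M-step pattern carries over to the line-mixture likelihood.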
The denominator
In order to make the statement of consistency easier, assume that
The
Introduce the 14-dimensional vectors whose elements are monomials of the coordinates of the observed points:
Evaluate the average and sample covariance matrix of the vectors
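For illustration, such monomial vectors and their sample moments can be formed as in this sketch (the particular set of 14 monomials — all x^i y^j with 1 ≤ i + j ≤ 4 — is our assumption about the construction):

```python
import numpy as np

def monomial_vector(x, y, max_deg=4):
    """All monomials x**i * y**j with 1 <= i + j <= max_deg;
    for max_deg = 4 this gives a 14-dimensional vector."""
    return np.array([x ** i * y ** (d - i)
                     for d in range(1, max_deg + 1)
                     for i in range(d, -1, -1)])

pts = np.random.default_rng(0).normal(size=(500, 2))
M = np.array([monomial_vector(px, py) for px, py in pts])
avg = M.mean(axis=0)            # sample average of the monomial vectors
S = np.cov(M, rowvar=False)     # 14 x 14 sample covariance matrix
```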
Denote
Denote
In the structural model,
Consider the equation
In the rest of Section
The estimator is defined as a point where
The routines evaluating the RBAN-moment estimator and the estimator of its covariance matrix are developed without a rigorous theoretical basis; see Section
The similarity transformation of
The transformation of a sample of points acting elementwise is also denoted
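Concretely, a similarity transformation (rotation by θ, scaling by s > 0, translation by t) acts on a sample of points as in the following sketch (an illustrative helper; names are ours):

```python
import numpy as np

def similarity(points, s, theta, t):
    """Apply the similarity transformation x -> s * R(theta) @ x + t
    elementwise to an (n, 2) array of points."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return s * points @ R.T + np.asarray(t)

# Rotating (1, 0) by 90 degrees and scaling by 2 sends it to (0, 2).
print(np.allclose(similarity(np.array([[1.0, 0.0]]), 2.0, np.pi / 2, [0.0, 0.0]),
                  [[0.0, 2.0]]))  # → True
```

Such a transformation multiplies all pairwise distances by s, so it maps a pair of lines to a pair of lines, which is what the equivariance discussion below relies on.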
Hereafter, we use vector notation: the observed points are denoted
The underlying statistical structure is
The statistical structure is invariant with respect to transformation
The statistical structure is similarity invariant if it is invariant with respect to all similarity transformations of the form (
To become similarity invariant, the underlying statistical structure needs some extension. We assume that the true points lie on two lines, which
The true lines
The true points
With these restrictions, the statistical structure is invariant with
Let
We treat
The transformation of the lines parameters and the transformation of
The estimator is called equivariant with respect to the transformation
In a fitting problem, an estimator for a “true figure” is called
In the two-line fitting problem, denote by
The similarity fitting equivariant estimator depends on the geometry of the plane and does not depend on the Cartesian coordinate system used.
Because of (
Some difficulties that may arise during estimation have not been addressed yet.
The estimation may fail with small positive probability. For example, the conic section estimated by the ALS2 estimator is an ellipse with some positive probability, and if it is, then the “ignore-
The estimation may also fail, for example, because the estimated line happens to be parallel to the
The optimization problem may have multiple extremal points. For the ALS2 estimator, it may occur that
In order to define the equivariance of an unreliable estimator, we allow that the estimators fail simultaneously in both sides of (
The equivariance of the ALS2 estimator in the conic section fitting problem is verified in [
In order to make the
The
The criterion function for the
Consider a further restriction of the mixture-of-two-normal-distributions model from Section
The statistical structure is invariant in scaling of the
Two samples, one of points
Let
A sample of the true points
a mixture of two singular normal distributions,
a discrete distribution,
a uniform distribution on two line segments.
Three distributions of the true points: a mixture of two singular normal distributions, a discrete distribution, and a uniform distribution on two line segments. For the first case, a sample of 1000 points is plotted, whereas for the second and third cases, the support of the distribution of the true points is plotted. For the first case, the distribution of the
These three distributions of true points are concentrated on the same two lines
For the same sample of true points
For each estimated pair of lines, the point of their intersection is found. The 100 estimates of the intersection point are averaged, and their sample standard deviations are evaluated. For the ALS2-based estimators and the RBAN moment estimator, the standard errors of the estimators are also evaluated.
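The intersection point is obtained from the two estimated lines by solving a 2×2 linear system, as in this sketch (the line encoding a*x + b*y + c = 0 is our assumption):

```python
import numpy as np

def intersection(l1, l2, eps=1e-12):
    """Intersection of two lines a*x + b*y + c = 0; None if (nearly) parallel."""
    A = np.array([l1[:2], l2[:2]], dtype=float)
    if abs(np.linalg.det(A)) < eps:
        return None          # parallel or coincident lines
    return np.linalg.solve(A, -np.array([l1[2], l2[2]], dtype=float))

print(intersection((1.0, 0.0, -1.0), (0.0, 1.0, -2.0)))  # → [1. 2.]
```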
For computation of the
For computation of the
The EM algorithm is iterative. Once the
In case the criterion function
The knowledge or misspecification of the parameter
Averages of the estimated centers over 100 simulations, standard deviations over 100 simulations, and medians of the estimated standard errors are presented in Tables
Means, standard deviations, and median standard errors of the estimates of the intersection point for true points having a mixture of singular normal distributions
Method | Mean (x) | Mean (y) | St. dev. (x) | St. dev. (y) | St. err. (x) | St. err. (y)
True value | −0.08 | 0.31 | | | |
Ignore- | −0.1098 | 0.2753 | 0.8611 | 0.3866 | 0.1918 | 0.1125
Update | −0.0820 | 0.2912 | 0.0706 | 0.0620 | 0.0437 | 0.0479
OR | 0.6533 | 3.4524 | 0.0877 | 0.6783 | |
ML | −0.0795 | 0.3077 | 0.0326 | 0.0269 | |
RBAN | 0.0647 | 0.3759 | 0.3563 | 0.2606 | 0.0350 | 0.0438
Ignore- | −0.0909 | 0.3052 | 0.0646 | 0.0308 | 0.0601 | 0.0303
Update | −0.0796 | 0.3080 | 0.0127 | 0.0156 | 0.0124 | 0.0155
OR | 0.5492 | 3.1488 | 0.0175 | 0.1701 | |
ML | −0.0776 | 0.3083 | 0.0103 | 0.0088 | |
RBAN | −0.0789 | 0.3100 | 0.0126 | 0.0154 | 0.0127 | 0.0154
Ignore- | −0.0799 | 0.3101 | 0.0211 | 0.0095 | 0.0188 | 0.0093
Update | −0.0801 | 0.3098 | 0.0037 | 0.0042 | 0.0039 | 0.0047
OR | 0.5606 | 3.2041 | 0.0063 | 0.0484 | |
ML | −0.0801 | 0.3101 | 0.0030 | 0.0025 | |
RBAN | −0.0801 | 0.3101 | 0.0038 | 0.0042 | 0.0039 | 0.0048
Ignore- | −0.0799 | 0.3099 | 0.0151 | 0.0075 | 0.0147 | 0.0072
Update | −0.0795 | 0.3097 | 0.0052 | 0.0051 | 0.0052 | 0.0049
OR | −0.0792 | 0.3092 | 0.0050 | 0.0044 | |
ML | −0.0794 | 0.3093 | 0.0048 | 0.0043 | |
RBAN | −0.0797 | 0.3098 | 0.0063 | 0.0057 | 0.0052 | 0.0049
Means, standard deviations, and median standard errors of the estimates of the intersection point for the discrete distribution of the true points
Method | Mean (x) | Mean (y) | St. dev. (x) | St. dev. (y) | St. err. (x) | St. err. (y)
True value | −0.08 | 0.31 | | | |
Ignore- | −0.0699 | 0.3077 | 0.0263 | 0.0290 | 0.0241 | 0.0263
Update | −0.0722 | 0.3116 | 0.0197 | 0.0186 | 0.0203 | 0.0175
OR | −0.0755 | 0.3188 | 0.0148 | 0.0144 | |
ML | −0.0958 | 0.3315 | 0.0131 | 0.0120 | |
RBAN | −0.0717 | 0.3105 | 0.0209 | 0.0175 | 0.0205 | 0.0178
Ignore- | −0.0783 | 0.3109 | 0.0092 | 0.0078 | 0.0080 | 0.0083
Update | −0.0785 | 0.3114 | 0.0071 | 0.0054 | 0.0065 | 0.0061
OR | −0.0721 | 0.3157 | 0.0048 | 0.0046 | |
ML | −0.0931 | 0.3278 | 0.0043 | 0.0035 | |
RBAN | −0.0786 | 0.3113 | 0.0071 | 0.0053 | 0.0065 | 0.0061
Ignore- | −0.0798 | 0.3098 | 0.0031 | 0.0024 | 0.0026 | 0.0027
Update | −0.0799 | 0.3099 | 0.0025 | 0.0016 | 0.0021 | 0.0019
OR | −0.0715 | 0.3151 | 0.0017 | 0.0013 | |
ML | −0.0932 | 0.3283 | 0.0013 | 0.0011 | |
RBAN | −0.0799 | 0.3099 | 0.0024 | 0.0017 | 0.0021 | 0.0019
Ignore- | −0.0796 | 0.3094 | 0.0033 | 0.0032 | 0.0036 | 0.0033
Update | −0.0798 | 0.3097 | 0.0030 | 0.0024 | 0.0033 | 0.0023
OR | −0.0782 | 0.3086 | 0.0021 | 0.0018 | |
ML | −0.0786 | 0.3087 | 0.0019 | 0.0018 | |
RBAN | −0.0796 | 0.3092 | 0.0030 | 0.0028 | 0.0033 | 0.0024
Means, standard deviations, and median standard errors of the estimates of the intersection point for the uniform distribution of the true points on two line segments
Method | Mean (x) | Mean (y) | St. dev. (x) | St. dev. (y) | St. err. (x) | St. err. (y)
True value | −0.08 | 0.31 | | | |
Ignore- | −0.0785 | 0.3122 | 0.0363 | 0.0274 | 0.0318 | 0.0301
Update | −0.0794 | 0.3140 | 0.0216 | 0.0258 | 0.0205 | 0.0290
OR | −0.0616 | 0.3127 | 0.0185 | 0.0167 | |
ML | −0.0934 | 0.3118 | 0.0116 | 0.0111 | |
RBAN | −0.0807 | 0.3126 | 0.0219 | 0.0293 | 0.0193 | 0.0292
Ignore- | −0.0796 | 0.3107 | 0.0103 | 0.0103 | 0.0099 | 0.0095
Update | −0.0796 | 0.3110 | 0.0067 | 0.0103 | 0.0065 | 0.0094
OR | −0.0639 | 0.3087 | 0.0064 | 0.0049 | |
ML | −0.0904 | 0.3106 | 0.0042 | 0.0033 | |
RBAN | −0.0797 | 0.3107 | 0.0066 | 0.0104 | 0.0064 | 0.0095
Ignore- | −0.0798 | 0.3098 | 0.0035 | 0.0030 | 0.0032 | 0.0030
Update | −0.0798 | 0.3098 | 0.0021 | 0.0029 | 0.0020 | 0.0030
OR | −0.0625 | 0.3085 | 0.0015 | 0.0014 | |
ML | −0.0891 | 0.3107 | 0.0012 | 0.0011 | |
RBAN | −0.0796 | 0.3097 | 0.0023 | 0.0030 | 0.0020 | 0.0030
Ignore- | −0.0799 | 0.3100 | 0.0041 | 0.0032 | 0.0041 | 0.0035
Update | −0.0798 | 0.3101 | 0.0033 | 0.0032 | 0.0032 | 0.0034
OR | −0.0803 | 0.3103 | 0.0023 | 0.0021 | |
ML | −0.0805 | 0.3101 | 0.0022 | 0.0020 | |
RBAN | −0.0798 | 0.3100 | 0.0035 | 0.0033 | 0.0032 | 0.0034
Using the estimator
The parametric
The
The mean-square distance of the intersection point of the estimated lines from the true intersection point is presented in Table
Mean-square distances between estimated and true intersection points
Sample size | Error level | Ignore- | Update | OR | ML | RBAN
1000 | 0.1 | 0.9403 | 0.0954 | 3.2978 | 0.0421 | 0.6124 |
10000 | 0.1 | 0.0722 | 0.0201 | 2.9127 | 0.0138 | 0.0199 |
100000 | 0.1 | 0.0230 | 0.0056 | 2.9645 | 0.0038 | 0.0056 |
1000 | 0.02 | 0.0168 | 0.0073 | 0.0067 | 0.0065 | 0.0084 |
1000 | 0.1 | 0.0403 | 0.0281 | 0.0228 | 0.0320 | 0.0284 |
10000 | 0.1 | 0.0121 | 0.0091 | 0.0118 | 0.0228 | 0.0090 |
100000 | 0.1 | 0.0039 | 0.0029 | 0.0101 | 0.0226 | 0.0029 |
1000 | 0.02 | 0.0046 | 0.0038 | 0.0036 | 0.0032 | 0.0042 |
1000 | 0.1 | 0.0453 | 0.0367 | 0.0310 | 0.0209 | 0.0365 |
10000 | 0.1 | 0.0145 | 0.0123 | 0.0181 | 0.0117 | 0.0123 |
100000 | 0.1 | 0.0046 | 0.0036 | 0.0177 | 0.0093 | 0.0037 |
1000 | 0.02 | 0.0052 | 0.0046 | 0.0031 | 0.0030 | 0.0048 |
For small errors, the
The parametric
For small errors (
Normalization of the estimator of
Comparison of two versions (equivariant (ev) and nonequivariant (ne)) of the updated before ignore-
Sample size | Error level | Ver. | Mean (x) | Mean (y) | St. dev. (x) | St. dev. (y) | St. err. (x) | St. err. (y)
True value: | | | −0.08 | 0.31 | | | |
1000 | 0.1 | ev | −0.082046 | 0.291175 | 0.070617 | 0.062003 | 0.043713 | 0.047939
| | ne | −0.044038 | 0.247382 | 0.251372 | 0.514473 | 0.038853 | 0.050167
10000 | 0.1 | ev | −0.079623 | 0.308039 | 0.012652 | 0.015582 | 0.012403 | 0.015471
| | ne | −0.085055 | 0.304177 | 0.015924 | 0.018695 | 0.012506 | 0.015550
100000 | 0.1 | ev | −0.080137 | 0.309780 | 0.003710 | 0.004173 | 0.003925 | 0.004749
| | ne | −0.080991 | 0.309386 | 0.003880 | 0.004255 | 0.003926 | 0.004742
1000 | 0.02 | ev | −0.079548 | 0.309703 | 0.005156 | 0.005131 | 0.005174 | 0.004891
| | ne | −0.079918 | 0.309508 | 0.005247 | 0.005149 | 0.005179 | 0.004918
1000 | 0.1 | ev | −0.072202 | 0.311648 | 0.019709 | 0.018553 | 0.020266 | 0.017500
| | ne | −0.071460 | 0.312230 | 0.020049 | 0.018740 | 0.020213 | 0.017457
10000 | 0.1 | ev | −0.078482 | 0.311377 | 0.007087 | 0.005371 | 0.006518 | 0.006066
| | ne | −0.078418 | 0.311436 | 0.007090 | 0.005387 | 0.006520 | 0.006054
100000 | 0.1 | ev | −0.079868 | 0.309929 | 0.002460 | 0.001647 | 0.002060 | 0.001900
| | ne | −0.079863 | 0.309934 | 0.002461 | 0.001647 | 0.002060 | 0.001901
1000 | 0.02 | ev | −0.079772 | 0.309728 | 0.002967 | 0.002376 | 0.003320 | 0.002344
| | ne | −0.079755 | 0.309732 | 0.002963 | 0.002375 | 0.003319 | 0.002339
1000 | 0.1 | ev | −0.079405 | 0.313977 | 0.021551 | 0.025759 | 0.020451 | 0.028992
| | ne | −0.078507 | 0.315350 | 0.022219 | 0.026091 | 0.020512 | 0.029115
10000 | 0.1 | ev | −0.079604 | 0.311024 | 0.006673 | 0.010337 | 0.006467 | 0.009389
| | ne | −0.079576 | 0.311176 | 0.006685 | 0.010349 | 0.006456 | 0.009372
100000 | 0.1 | ev | −0.079795 | 0.309802 | 0.002075 | 0.002919 | 0.001974 | 0.002994
| | ne | −0.079794 | 0.309818 | 0.002076 | 0.002921 | 0.001972 | 0.002992
1000 | 0.02 | ev | −0.079833 | 0.310081 | 0.003252 | 0.003249 | 0.003172 | 0.003418
| | ne | −0.079825 | 0.310097 | 0.003250 | 0.003249 | 0.003169 | 0.003411
Coverage probability and area of confidence ellipsoids (c.e.) for centers by the ALS2 estimator
Sample size | Error level | Coverage 80% (%) | Coverage 95% (%) | Area of 95% c.e. | Coverage 80% (%) | Coverage 95% (%) | Area of 95% c.e.
1000 | 0.1 | 70.6 | 80.2 | 1449. | 70.0 | 79.2 | 1562. |
10000 | 0.1 | 79.4 | 93.8 | 236.2 | 79.6 | 92.9 | 259.4 |
100000 | 0.1 | 80.7 | 94.9 | 15.38 | 80.6 | 94.9 | 15.17 |
1000 | 0.02 | 80.4 | 95.1 | 15.95 | 78.0 | 94.1 | 19.86 |
1000 | 0.1 | 78.1 | 93.9 | 81.80 | 77.4 | 93.4 | 78.94 |
10000 | 0.1 | 81.1 | 95.6 | 12.34 | 80.9 | 95.8 | 12.39 |
100000 | 0.1 | 80.1 | 94.6 | 1.205 | 79.9 | 94.7 | 1.204 |
1000 | 0.02 | 81.0 | 94.9 | 1.984 | 81.3 | 95.2 | 1.988 |
1000 | 0.1 | 82.1 | 94.3 | 152.9 | 81.5 | 94.3 | 138.4 |
10000 | 0.1 | 81.0 | 96.6 | 20.36 | 80.3 | 96.3 | 18.92 |
100000 | 0.1 | 78.4 | 95.0 | 1.823 | 78.6 | 95.0 | 1.842 |
1000 | 0.02 | 78.7 | 94.7 | 2.926 | 78.5 | 94.1 | 3.041 |
The equivariant version of the estimator tends to be more accurate for small samples than the nonequivariant version. The two versions of the estimator are consistent and asymptotically equivalent. When the estimation is precise, the difference between the versions is negligible. When the estimation is imprecise, no inference can be made as to which version is more accurate.
In [
The software developed here can be used for a numerical comparison of the estimates of the asymptotic covariance matrices. The data are generated as described in Section
The sample coverage probability and the median (over 1000 ellipsoids) area of the confidence ellipsoids are presented in Table
Note that standard errors for coverage probability are
The strong consistency of the estimator follows from [
The strong consistency of
The conditions of consistency Theorem 1 in [
The most tedious is the condition
Denote
The matrix
Proposition
Consistency of the “ignore-
The consistency and asymptotic normality of the