1 Introduction
1.1 Two-line fitting model
Consider the problem of estimating two lines from perturbed observations of points that lie on the lines. Let the true points $(\xi _{i},\eta _{i})$ lie on the union of two different lines $\eta =k_{1}\xi +h_{1}$ and $\eta =k_{2}\xi +h_{2}$, that is,
(1)
\[\bigg[\begin{array}{l@{\hskip10.0pt}l}\text{either}\hspace{1em}& \eta _{i}=k_{1}\xi _{i}+h_{1},\\{} \text{or}\hspace{1em}& \eta _{i}=k_{2}\xi _{i}+h_{2},\end{array}\hspace{1em}i=1,2,\dots \hspace{0.1667em}.\]
Let these points be observed with perturbations $(\delta _{i},\varepsilon _{i})$, $i=1,\dots ,n$, that is, the observed points are $(x_{i},y_{i})$, $i=1,\dots ,n$, with
(2)
\[x_{i}=\xi _{i}+\delta _{i},\]
(3)
\[y_{i}=\eta _{i}+\varepsilon _{i}.\]
The perturbations are assumed to be independent and identically normally distributed,
(4)
\[\left(\begin{array}{c}\delta _{i}\\{} \varepsilon _{i}\end{array}\right)\sim N\big(0,{\sigma }^{2}\mathbf{I}\big),\]
where $\mathbf{I}$ is the $2\times 2$ identity matrix. The parameters $k_{1}$, $h_{1}$, $k_{2}$, $h_{2}$, and ${\sigma }^{2}$ are to be estimated.
We consider both functional and structural models. In the functional model, the true points are assumed to be nonrandom. In the structural model, the true points are assumed to be independent and identically distributed (i.i.d.). The errors $(\delta _{i},\varepsilon _{i})$ are i.i.d. and independent of the true points.
In the structural model, $(\xi _{i},\eta _{i},\delta _{i},\varepsilon _{i})$ are i.i.d. random vectors, and thus, the observed points $(x_{i},y_{i})$ are i.i.d. In the functional model, the observed points are independent, Gaussian, with different means but with common covariance matrix.
Remark 1.
The true lines defined by Eq. (1) cannot be parallel to the y-axis. In order to avoid overflows during evaluation of the estimators (except for the RBAN-moment estimator), another parameterization is used internally: ${\vec{\tau }}^{\top }(\boldsymbol{\zeta }-\boldsymbol{\zeta }_{0})=0$, where $\vec{\tau }$ is a unit vector orthogonal to the line, and $\boldsymbol{\zeta }_{0}$ is a point on the line. The computation of the RBAN-moment estimator (see Section 2.4) is implemented for the explicit parameterization only. Computational optimization of the RBAN-moment estimator is a matter for further work.
The explicit parameterization has the advantage that the number of parameters is equal to the dimension of parameter space. (In [12], the second-order equation (5) has six unknown coefficients, but the conic section can be parameterized with five parameters. The parameter space for the parameters of the conic section was the five-dimensional unit sphere in the six-dimensional Euclidean space. Mismatch between the number of parameters and the dimension of the parameter space made the asymptotic covariance matrix of the estimator singular.)
In simulations, the confidence intervals for the coordinates of the intersection point of the two lines are obtained based on the asymptotic covariance matrix of the intersection point. For the projections of the ALS2 estimator, that asymptotic covariance matrix can be evaluated without use of the explicit line parameterization.
1.2 Conic section fitting model
Let the true points $(\xi _{i},\eta _{i})$ lie on the second-order algebraic curve
(5)
\[A{\xi }^{2}+2B\xi \eta +C{\eta }^{2}+2D\xi +2E\eta +F=0.\]
Hereafter, a second-order algebraic curve is called a “conic section” or a “conic.”
The points are observed with Gaussian perturbations, and the perturbed points are denoted as $(x_{i},y_{i})$. We have the same equations
\[\begin{array}{l}x_{i}=\xi _{i}+\delta _{i},\\{} y_{i}=\eta _{i}+\varepsilon _{i},\end{array}\hspace{1em}\left(\begin{array}{c}\delta _{i}\\{} \varepsilon _{i}\end{array}\right)\sim N\big(0,{\sigma }^{2}\mathbf{I}\big),\]
as (2)–(4) in the two-line fitting model. The vector of coefficients in (5) is denoted by $\boldsymbol{\beta }={(A,2B,C,2D,2E,F)}^{\top }$. The nonzero vector $\boldsymbol{\beta }$ and the error variance ${\sigma }^{2}$ are the parameters of interest.
Similarly to the two-line fitting model, the functional and the structural models are distinguished.
A couple of lines is a degenerate case of a conic section. Therefore, the conic section fitting model is an extension of the two-line fitting model.
1.3 ALS2 estimator in conic section fitting model
We consider the adjusted least squares (ALS) estimator for unknown ${\sigma }^{2}$. The estimator is constructed in [7]. Introduce the $6\times 6$ symmetric matrix
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \psi (x,y;v)\\{} & \displaystyle \hspace{1em}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c}{x}^{4}-6{x}^{2}v+3{v}^{2}& ({x}^{3}-3xv)y& ({x}^{2}-v)({y}^{2}-v)& \ast & \ast & \ast \\{} ({x}^{3}-3xv)y& ({x}^{2}-v)({y}^{2}-v)& x\hspace{0.1667em}({y}^{3}-3yv)& \ast & \ast & \ast \\{} ({x}^{2}-v)({y}^{2}-v)& x\hspace{0.1667em}({y}^{3}-3yv)& {y}^{4}-6{y}^{2}v+3{v}^{2}& \ast & \ast & \ast \\{} {x}^{3}-3xv& ({x}^{2}-v)y& x\hspace{0.1667em}({y}^{2}-v)& {x}^{2}-v& xy& x\\{} ({x}^{2}-v)y& x\hspace{0.1667em}({y}^{2}-v)& {y}^{3}-3yv& xy& {y}^{2}-v& y\\{} {x}^{2}-v& xy& {y}^{2}-v& x& y& 1\end{array}\right).\end{array}\]
The asterisks stand for the entries above the diagonal, which are determined by symmetry. The entries of the matrix $\psi (x,y;v)$ are generalized Hermite polynomials in x and y. The matrix $\psi (x,y;v)$ is constructed such that $\operatorname{\mathsf{E}}\psi (x_{i},y_{i};{\sigma }^{2})=\psi (\xi _{i},\eta _{i};0)$ in the functional model and $\psi (\xi _{i},\eta _{i};0)\boldsymbol{\beta }=0$ for the true points and true parameters. The estimator ${\hat{\sigma }}^{2}$ of the error variance ${\sigma }^{2}$ is obtained from equation (6), which involves the matrix $\boldsymbol{\varPsi }_{n}(v)={\sum _{i=1}^{n}}\psi (x_{i},y_{i};v)$.
Equation (6) always has a unique nonnegative solution. If $n\ge 6$, then the solution to (6) is positive almost surely.
The matrix $\boldsymbol{\varPsi }_{n}({\hat{\sigma }}^{2})$ is singular. Define the estimator $\hat{\boldsymbol{\beta }}$ of the vector $\boldsymbol{\beta }$ as a nonzero solution to the equation
\[\boldsymbol{\varPsi }_{n}\big({\hat{\sigma }}^{2}\big)\hat{\boldsymbol{\beta }}=0.\]
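As an illustration only, the following Python/NumPy sketch evaluates the ALS2 estimator numerically. It assumes, consistently with the statement that $\boldsymbol{\varPsi }_{n}({\hat{\sigma }}^{2})$ is singular, that ${\hat{\sigma }}^{2}$ can be located as a nonnegative root of the smallest eigenvalue of $\boldsymbol{\varPsi }_{n}(v)$ and that $\hat{\boldsymbol{\beta }}$ is the corresponding eigenvector; the function names are ours, and this is not a transcription of the exact form of equation (6).

```python
import numpy as np
from scipy.optimize import brentq

def psi(x, y, v):
    """6x6 symmetric matrix psi(x, y; v) of generalized Hermite polynomials."""
    m = np.array([
        [x**4 - 6*x**2*v + 3*v**2, (x**3 - 3*x*v)*y,         (x**2 - v)*(y**2 - v),    x**3 - 3*x*v, (x**2 - v)*y, x**2 - v],
        [0,                        (x**2 - v)*(y**2 - v),    x*(y**3 - 3*y*v),         (x**2 - v)*y, x*(y**2 - v), x*y],
        [0, 0,                     y**4 - 6*y**2*v + 3*v**2, x*(y**2 - v),             y**3 - 3*y*v, y**2 - v],
        [0, 0, 0,                  x**2 - v,                 x*y,                      x],
        [0, 0, 0, 0,               y**2 - v,                 y],
        [0, 0, 0, 0, 0,            1.0],
    ])
    return np.triu(m) + np.triu(m, 1).T   # fill the lower triangle by symmetry

def Psi_n(xs, ys, v):
    """Psi_n(v) = sum_i psi(x_i, y_i; v)."""
    return sum(psi(x, y, v) for x, y in zip(xs, ys))

def als2(xs, ys):
    # sigma^2 estimate: a nonnegative v with lambda_min(Psi_n(v)) = 0 (assumed form of (6))
    lam_min = lambda v: np.linalg.eigvalsh(Psi_n(xs, ys, v))[0]
    v_hi = 1.0
    while lam_min(v_hi) > 0:
        v_hi *= 2.0
    sigma2_hat = brentq(lam_min, 0.0, v_hi)
    # beta estimate: eigenvector of Psi_n(sigma2_hat) for its smallest eigenvalue,
    # i.e. a nonzero solution of Psi_n(sigma2_hat) beta = 0
    _, V = np.linalg.eigh(Psi_n(xs, ys, sigma2_hat))
    return V[:, 0], sigma2_hat
```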
The strong consistency of the ALS2 estimator is proved in [7] and [11] under somewhat different conditions. The asymptotic normality is proved in [12] for the functional model and in [13] for the structural model. Two consistent estimators of the asymptotic covariance matrix are constructed in [13].
Denote
\[\begin{array}{r@{\hskip0pt}l@{\hskip0pt}r@{\hskip0pt}l}& \displaystyle {\psi ^{\prime }_{v}}(x,y;v)=\frac{\partial }{\partial v}\psi (x,y;v),\hspace{2em}& & \displaystyle {\boldsymbol{\varPsi }^{\prime\prime }_{1}}=\frac{{\partial }^{2}}{\partial {v}^{2}}\psi (x,y;v),\\{} & \hspace{2em}& & \displaystyle {\boldsymbol{\varPsi }^{\prime }_{n}}(v)=\frac{\mathrm{d}}{\mathrm{d}v}\boldsymbol{\varPsi }_{n}(v),\\{} & \displaystyle \overline{\boldsymbol{\varPsi }}_{n}=\sum \limits_{i=1}^{n}\psi (\xi _{i},\eta _{i};0),\hspace{2em}& & \displaystyle {\overline{\boldsymbol{\varPsi }}^{\prime }_{n}}=\sum \limits_{i=1}^{n}{\psi ^{\prime }_{v}}(\xi _{i},\eta _{i};0),\\{} & \displaystyle \overline{\boldsymbol{\varPsi }}_{\infty }=\underset{n\to \infty }{\lim }\frac{1}{n}\overline{\boldsymbol{\varPsi }}_{n}=\underset{n\to \infty }{\lim }\frac{1}{n}\boldsymbol{\varPsi }_{n}\big({\sigma }^{2}\big),\hspace{2em}& & \displaystyle {\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}=\underset{n\to \infty }{\lim }\frac{1}{n}{\overline{\boldsymbol{\varPsi }}^{\prime }_{n}}=\underset{n\to \infty }{\lim }\frac{1}{n}{\boldsymbol{\varPsi }^{\prime }_{n}}\big({\sigma }^{2}\big).\end{array}\]
Under the conditions of Proposition 1 stated further, the latter limits exist almost surely. See [12] for explicit expressions of the matrices ${\psi ^{\prime }_{v}}(x,y;v)$, ${\boldsymbol{\varPsi }^{\prime\prime }_{1}}$, $\overline{\boldsymbol{\varPsi }}_{\infty }$, and ${\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}$. Note that ${\boldsymbol{\varPsi }^{\prime\prime }_{1}}$ is a constant matrix.
Proposition 1.
In the functional model, for all integers $p\ge 0$ and $q\ge 0$ such that $p+q\le 4$, let the following limits exist and be finite:
\[\underset{n\to \infty }{\lim }\frac{1}{n}\sum \limits_{i=1}^{n}{\xi _{i}^{p}}{\eta _{i}^{q}}=:\mu _{p,q},\]
whereas in the structural model, let $\operatorname{\mathsf{E}}{\xi _{1}^{4}}<\infty $ and $\operatorname{\mathsf{E}}{\eta _{1}^{4}}<\infty $. In both models, let $\operatorname{rank}\overline{\boldsymbol{\varPsi }}_{\infty }=5$. Then:
1. The estimator $\hat{\boldsymbol{\beta }}$ is strongly consistent in the following sense:
(7)
\[\min \bigg(\bigg\| \frac{\hat{\boldsymbol{\beta }}}{\| \hat{\boldsymbol{\beta }}\| }-\frac{\boldsymbol{\beta }}{\| \boldsymbol{\beta }\| }\bigg\| ,\hspace{0.2222em}\bigg\| \frac{\hat{\boldsymbol{\beta }}}{\| \hat{\boldsymbol{\beta }}\| }+\frac{\boldsymbol{\beta }}{\| \boldsymbol{\beta }\| }\bigg\| \bigg)\to 0\hspace{1em}\textit{a.s.}\]
2. ${\boldsymbol{\beta }}^{\top }{\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}\boldsymbol{\beta }<0$.
3. Eventually, ${\hat{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\hat{\boldsymbol{\beta }}<0$.
“Eventually” in the previous statement means that almost surely there exists $n_{0}$ such that ${\hat{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\hat{\boldsymbol{\beta }}<0$ for all $n\ge n_{0}$. In other words, almost surely, ${\hat{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\hat{\boldsymbol{\beta }}\ge 0$ holds only for finitely many n.
Denote the normalized version of the true parameter by $\boldsymbol{\beta }_{\mathrm{tn}}$.
Normalize the estimator of $\boldsymbol{\beta }$ in such a way that ${\widetilde{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\widetilde{\boldsymbol{\beta }}=-n$ and ${\boldsymbol{\beta }}^{\top }\widetilde{\boldsymbol{\beta }}\ge 0$. Therefore, denote
(9)
\[\widetilde{\boldsymbol{\beta }}=\left\{\begin{array}{l@{\hskip10.0pt}l}\sqrt{\frac{-n}{{\hat{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\hat{\boldsymbol{\beta }}}}\hspace{0.1667em}\hat{\boldsymbol{\beta }}\hspace{1em}& \text{if }{\boldsymbol{\beta }}^{\top }\hat{\boldsymbol{\beta }}\ge 0,\\{} -\sqrt{\frac{-n}{{\hat{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\hat{\boldsymbol{\beta }}}}\hspace{0.1667em}\hat{\boldsymbol{\beta }}\hspace{1em}& \text{if }{\boldsymbol{\beta }}^{\top }\hat{\boldsymbol{\beta }}<0.\end{array}\right.\]
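A small sketch of the normalization (9), reusing Psi_n from the sketch above. Here ${\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})$ is approximated by a central finite difference purely for brevity (the closed form of ${\psi ^{\prime }_{v}}$ is available in [12]), and the sign is resolved against the true β only in a simulation setting.

```python
import numpy as np

def normalize_beta(xs, ys, beta_hat, sigma2_hat, beta_true=None, eps=1e-6):
    """Rescale beta_hat so that beta~' Psi'_n(sigma2_hat) beta~ = -n, as in (9)."""
    n = len(xs)
    # finite-difference approximation of Psi'_n(sigma2_hat)
    dPsi = (Psi_n(xs, ys, sigma2_hat + eps) - Psi_n(xs, ys, sigma2_hat - eps)) / (2 * eps)
    beta_tilde = np.sqrt(-n / (beta_hat @ dPsi @ beta_hat)) * beta_hat
    # resolve the sign using the true beta if it is known (e.g. in simulations)
    if beta_true is not None and beta_true @ beta_tilde < 0:
        beta_tilde = -beta_tilde
    return beta_tilde
```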
Proposition 2.
1. Under the conditions of Proposition 1, the estimator $\widetilde{\boldsymbol{\beta }}$ is a strongly consistent estimator of $\boldsymbol{\beta }_{\mathrm{tn}}={(-{\boldsymbol{\beta }}^{\top }{\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}\boldsymbol{\beta })}^{-1/2}\boldsymbol{\beta }$, that is, $\widetilde{\boldsymbol{\beta }}\to \boldsymbol{\beta }_{\mathrm{tn}}$ a.s.
2. In the functional model, for all integers $p\ge 0$ and $q\ge 0$ such that $p+q\le 6$, let the following limits exist and be finite:
\[\underset{n\to \infty }{\lim }\frac{1}{n}\sum \limits_{i=1}^{n}{\xi _{i}^{p}}{\eta _{i}^{q}}=:\mu _{p,q},\]
whereas in the structural model, let $\operatorname{\mathsf{E}}{\xi _{1}^{6}}<\infty $ and $\operatorname{\mathsf{E}}{\eta _{1}^{6}}<\infty $. In both models, let $\operatorname{rank}\overline{\boldsymbol{\varPsi }}_{\infty }=5$. Then the estimator $\hat{\boldsymbol{\theta }}={({\widetilde{\boldsymbol{\beta }}}^{\top }\hspace{-0.1667em},{\hat{\sigma }}^{2})}^{\top }$ is asymptotically normal in the following sense:
(10)
\[\sqrt{n}\left(\begin{array}{c}\widetilde{\boldsymbol{\beta }}-\boldsymbol{\beta }_{\mathrm{tn}}\\{} {\hat{\sigma }}^{2}-{\sigma }^{2}\end{array}\right)\stackrel{\mathsf{d}}{\longrightarrow }{N}(0,\hspace{0.1667em}\varSigma _{\hat{\theta }}),\]
where
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \varSigma _{\hat{\theta }}=\left(\begin{array}{c@{\hskip10.0pt}c}\overline{\boldsymbol{\varPsi }}_{\infty }& {\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}\boldsymbol{\beta }_{\mathrm{tn}}\\{} {\boldsymbol{\beta }_{\mathrm{tn}}^{\top }}{\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}& \frac{1}{2}{\boldsymbol{\beta }_{\mathrm{tn}}^{\top }}{\boldsymbol{\varPsi }^{\prime\prime }_{1}}\boldsymbol{\beta }_{\mathrm{tn}}\end{array}\right){}^{-1}\mathbf{B}{\left(\begin{array}{c@{\hskip10.0pt}c}\overline{\boldsymbol{\varPsi }}_{\infty }& {\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}\boldsymbol{\beta }_{\mathrm{tn}}\\{} {\boldsymbol{\beta }_{\mathrm{tn}}^{\top }}{\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}& \frac{1}{2}{\boldsymbol{\beta }_{\mathrm{tn}}^{\top }}{\boldsymbol{\varPsi }^{\prime\prime }_{1}}\boldsymbol{\beta }_{\mathrm{tn}}\end{array}\right)}^{-1},\\{} & \displaystyle \hspace{1em}\mathbf{B}=\underset{n\to \infty }{\lim }\frac{1}{n}\sum \limits_{i=1}^{n}\operatorname{\mathsf{E}}{\boldsymbol{s}_{i}^{}}{\boldsymbol{s}_{i}^{\top }},\\{} & \displaystyle \hspace{1em}\boldsymbol{s}_{i}=\left(\begin{array}{c}\psi (x_{i},y_{i};{\sigma }^{2})\boldsymbol{\beta }_{\mathrm{tn}}\\{} \frac{1}{2}{\boldsymbol{\beta }_{\mathrm{tn}}^{\top }}\psi (x_{i},y_{i};{\sigma }^{2})\boldsymbol{\beta }_{\mathrm{tn}}-\frac{1}{2}\end{array}\right).\end{array}\]
3. Under the conditions of part 2 of Proposition 2, the following estimator of the asymptotic covariance matrix is consistent:
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \widehat{\varSigma }_{\hat{\theta }}(n)={\mathbf{A}}^{-1}(n)\mathbf{B}(n){\mathbf{A}}^{-1}(n),\\{} & \displaystyle \hspace{1em}\mathbf{A}(n)=\left(\begin{array}{c@{\hskip10.0pt}c}\frac{1}{n}\boldsymbol{\varPsi }_{n}({\hat{\sigma }}^{2})& \frac{1}{n}{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\widetilde{\boldsymbol{\beta }}\\{} \frac{1}{n}{\widetilde{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})& \frac{1}{2}{\widetilde{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime\prime }_{1}}\widetilde{\boldsymbol{\beta }}\end{array}\right),\hspace{1em}\mathbf{B}(n)=\frac{1}{n}\sum \limits_{i=1}^{n}{\hat{\boldsymbol{s}}_{i}^{}}{\hat{\boldsymbol{s}}_{i}^{\top }},\\{} & \displaystyle \hspace{1em}\hat{\boldsymbol{s}}_{i}=\left(\begin{array}{c}\psi (x_{i},y_{i};{\hat{\sigma }}^{2})\tilde{\boldsymbol{\beta }}\\{} \frac{1}{2}{\tilde{\boldsymbol{\beta }}}^{\top }\psi (x_{i},y_{i};{\hat{\sigma }}^{2})\tilde{\boldsymbol{\beta }}-\frac{1}{2}\end{array}\right),\end{array}\]
that is, $\widehat{\varSigma }_{\hat{\theta }}(n)\to \varSigma _{\hat{\theta }}$ in probability.
1.4 Estimation methods
The methods of fitting an algebraic curve (or surface) to observed points can be classified as follows.
Algebraic distance methods, where the residuals in the equations for the algebraic curve are minimized. For example, the minimum point of the sum of squared residuals ${\sum _{i=1}^{n}}{(A{x_{i}^{2}}+2Bx_{i}y_{i}+C{y_{i}^{2}}+2Dx_{i}+2Ey_{i}+F)}^{2}$ (with some normalizing constraint in order to avoid $A=B=\dots =F=0$) in the conic fitting problem and ${\sum _{i=1}^{n}}{(k_{1}x_{i}+h_{1}-y_{i})}^{2}{(k_{2}x_{i}+h_{2}-y_{i})}^{2}$ in the two-line fitting problem is called the ordinary least squares (OLS) estimator.
The criterion function for the OLS estimator is simple enough and can be adjusted so that the resulting estimator is consistent (under some conditions). Such an estimator is called the adjusted least squares (ALS) estimator. The OLS and ALS estimators are method-of-moments estimators, meaning that the criterion functions for the estimators are polynomials whose coefficients are sample moments of coordinates of the observed points. Hence, the OLS and ALS estimators can be computed efficiently.
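As a small illustration of how cheap these algebraic criteria are to compute, here is a NumPy sketch of the OLS conic fit under the normalizing constraint $\| \boldsymbol{\beta }\| =1$ (one possible choice of the normalizing constraint mentioned above); xs and ys are assumed to be NumPy arrays of the observed coordinates.

```python
import numpy as np

def ols_conic(xs, ys):
    """OLS conic fit: minimize sum_i (beta' t_i)^2 subject to ||beta|| = 1,
    where t_i = (x_i^2, x_i*y_i, y_i^2, x_i, y_i, 1)'."""
    T = np.column_stack([xs**2, xs*ys, ys**2, xs, ys, np.ones_like(xs)])
    # the minimizer is the eigenvector of T'T for its smallest eigenvalue
    _, V = np.linalg.eigh(T.T @ T)
    return V[:, 0]   # (A, 2B, C, 2D, 2E, F), determined up to sign
```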
In order to obtain the parameters of the two lines, the observed points are fitted with a conic section, and then the parameters of the conic section are used to obtain the parameters of the two lines. This idea has been used in several papers.
The problem of estimating the fundamental matrix for two-camera view is considered in [6]. The fundamental matrix is a singular matrix whose left and right null-vectors are the coordinates of each camera in the coordinate system of the other camera. Initially, the ALS estimator of the fundamental matrix is evaluated. Then it is projected so that the estimated fundamental matrix is singular.
In [14], the problem of segmentation of a finite-dimensional vector space into linear subspaces is considered, and the generalized principal component analysis method is introduced. The sample is fitted with an algebraic cone (a set of points that satisfy a homogeneous algebraic equation) by the OLS method. Then subspaces are extracted from the algebraic cone with the use of a small learning sample. An application of segmentation of a vector space into hyperplanes to searching for planes in a binocular image is given in [16].
In [15], an ellipsoid fitting problem with the constraint that the center of the ellipsoid lies on a given line is considered. The algebraic distance with the embedded constraint is minimized. The analytical (behavioral) properties of the optimization problem are studied. We consider a conic section fitting problem but with a different constraint: the conic degenerates to a couple of straight lines.
Geometric distance methods, where the distances between the estimated curve and the observed points are minimized. When the sum of squares of those distances is minimized, the orthogonal regression (OR) estimator is obtained.
A numerical algorithm for evaluation of the orthogonal regression estimator is presented in monograph [1].
Orthogonal regression is consistent in the single-line fitting problem [3, Section 1.3.2(a)]. In nonlinear models, the estimator may be inconsistent. There is a one-step correction procedure for explicit and implicit models [5, 9], with an application to the ellipsoid fitting model [9]. However, in the two-line fitting model, the correction from [9] is unstable.
Probabilistic methods. They are used to obtain the maximum likelihood (ML) estimator and Bayes estimators.
1.5 Notation
Let $\{A_{n},\hspace{0.2778em}n=1,\hspace{0.1667em}2,\hspace{0.1667em}\dots \}$ be a sequence of random events. The random event $A_{n}$ is said to hold eventually if almost surely there exists $n_{0}$ such that $A_{n}$ occurs for all $n\ge n_{0}$. In other words, $A_{n}$ holds eventually if and only if, almost surely, $A_{n}$ fails to occur for only finitely many n.
The estimator $\hat{\beta }$ is called asymptotically normal if $\sqrt{n}(\hat{\beta }-\beta _{\mathrm{true}})\to N(0,\varSigma )$ in distribution, where the asymptotic covariance matrix Σ may be singular, and n is the sample size. This definition differs from the conventional one adopted in asymptotic theory because here only $\sqrt{n}$-asymptotic normality is considered.
Let $\boldsymbol{\zeta }\sim N(\boldsymbol{\mu },\varSigma )$ be a bivariate random vector with nonsingular Σ. Then $E=\{\boldsymbol{z}:{(\boldsymbol{z}-\boldsymbol{\mu })}^{\top }{\varSigma }^{-1}(\boldsymbol{z}-\boldsymbol{\mu })\le 1\}$ is called the 40% ellipsoid of the normal distribution because ${(\boldsymbol{\zeta }-\boldsymbol{\mu })}^{\top }{\varSigma }^{-1}(\boldsymbol{\zeta }-\boldsymbol{\mu })$ has the ${\chi }^{2}$ distribution with 2 degrees of freedom, whence $\operatorname{\mathsf{P}}(\boldsymbol{\zeta }\in E)=1-{e}^{-1/2}\approx 0.3935$. This is the ellipsoid on which the probability density function is at least ${e}^{-1/2}\approx 0.6065$ of its maximum value.
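The two constants can be checked in one line with SciPy:

```python
import numpy as np
from scipy.stats import chi2

print(chi2.cdf(1, df=2))  # 0.3935..., probability of the 40% ellipsoid
print(np.exp(-0.5))       # 0.6065..., density on its boundary relative to the mode
```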
1.6 Outline
In Section 2, we construct five estimators for the parameters of the two-line fitting model. In Section 3, we propose two definitions of the equivariance of an estimator and state that all five estimators are equivariant. The estimators are compared numerically in Section 4. The proofs are given in Appendix A.
2 Estimators
2.1 ALS2 estimator and its projections
The two-line fitting model is a restriction of the conic section fitting model. A couple of lines defined by the equation $(k_{1}\xi -\eta +h_{1})(k_{2}\xi -\eta +h_{2})=0$ is a degenerate conic section
(11)
\[A{\xi }^{2}+2B\xi \eta +C{\eta }^{2}+2D\xi +2E\eta +F=0\]
with coefficients
(12)
\[\begin{array}{r@{\hskip0pt}l@{\hskip10pt}r@{\hskip0pt}l}& \displaystyle A=Ck_{1}k_{2},& & \displaystyle 2D=C(k_{1}h_{2}+k_{2}h_{1}),\\{} & \displaystyle 2B=-C(k_{1}+k_{2}),\hspace{1em}& & \displaystyle 2E=-C(h_{1}+h_{2}),\\{} & & & \displaystyle F=Ch_{1}h_{2},\end{array}\]
with a constraint $C\ne 0$. The conic section ALS2 estimator provides estimation of the error variance ${\sigma }^{2}$ and the coefficients $A,B,\dots ,F$.
Denote by $\nu (i)\in \{1,2\}$ the indicator of the line to which the true point $(\xi _{i},\eta _{i})$ belongs. Equation (1) can be rewritten as
\[\eta _{i}=k_{\nu (i)}\xi _{i}+h_{\nu (i)},\hspace{1em}i=1,2,\dots \hspace{0.1667em}.\]
The indicator $\nu (i)$ is nonrandom in the functional model, and it is a random variable in the structural model.
Proposition 3.
Let, in the functional model,
\[\textit{either}\hspace{1em}k_{1}\ne k_{2}\hspace{1em}\textit{or}\hspace{1em}\left\{\begin{array}{l}h_{1}\ne h_{2},\hspace{1em}\\{} \sup _{n\ge 1}\frac{1}{n}{\textstyle\sum _{i=1}^{n}}{\xi _{i}^{2}}<\infty ;\hspace{2.5pt}\textit{and}\hspace{1em}\end{array}\right.\]
\[\underset{n\to \infty }{\liminf }\lambda _{\min }\left(\frac{1}{n}\sum \limits_{\begin{array}{c} i=1,\dots ,n\\{} \nu (i)=j\end{array}}\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}1& {\xi _{i}^{}}& {\xi _{i}^{2}}\\{} {\xi _{i}^{}}& {\xi _{i}^{2}}& {\xi _{i}^{3}}\\{} {\xi _{i}^{2}}& {\xi _{i}^{3}}& {\xi _{i}^{4}}\end{array}\right)\right)>0\hspace{1em}\textit{for }j=1,\hspace{0.1667em}2.\]
Then the ALS2 estimators $\hat{\boldsymbol{\beta }}$ and ${\hat{\sigma }}^{2}$ are strongly consistent in the sense of (7) and (8).
There are two cases where the structural model is not identifiable. If the common distribution of the true points is concentrated on a straight line and a single point (which need not lie on the line), that is,
(13)
\[\exists \hspace{0.1667em}\text{line}\hspace{0.1667em}\ell \subset {\mathbb{R}}^{2}\hspace{0.2778em}\exists z\in {\mathbb{R}}^{2}:\operatorname{supp}(\xi _{1},\eta _{1})\subset \ell \cup \{z\},\]
then there are many ways to fit the true points with two lines. If the common distribution of the true points is concentrated in four points (condition (14)), then there are three ways to fit the true points with two lines (unless three of the four points lie on a straight line, which is a particular case of (13)).
In order to estimate the parameters $k_{1}$, $h_{1}$, $k_{2}$, and $h_{2}$, we can solve Eqs. (12). Ignoring the last equation $F=Ch_{1}h_{2}$, the solution is given by (15)–(17).
Substituting the elements of the ALS2 estimator $\hat{\boldsymbol{\beta }}={(\hat{A},2\hat{B},\hat{C},2\hat{D},2\hat{E},\hat{F})}^{\top }$ into the right-hand sides of (15)–(17), we obtain an “ignore-$\widehat{F}$” estimator (18).
If the conic section estimated by the ALS2 estimator is a hyperbola, then the “ignore-$\widehat{F}$” estimate of the two lines comprises the asymptotes of the hyperbola.
Choose the sign ± in (18) such that $\hat{k}_{1}<\hat{k}_{2}$.
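Since (15)–(17) amount to solving (12) for $k_{1}$, $k_{2}$, $h_{1}$, $h_{2}$ while ignoring $F=Ch_{1}h_{2}$, the estimator admits a simple closed form. The sketch below is our own algebra, not a transcription of (15)–(18): the slopes are recovered as the roots of $C{t}^{2}+2Bt+A=0$, the intercepts then solve the remaining linear equations, and the sign convention $\hat{k}_{1}<\hat{k}_{2}$ follows the text.

```python
import numpy as np

def ignore_F(beta_hat):
    """'Ignore-F' estimator of (k1, h1, k2, h2) from beta = (A, 2B, C, 2D, 2E, F)'.
    Uses all relations in (12) except F = C*h1*h2; requires B^2 - A*C > 0,
    i.e. the fitted conic is a hyperbola or a pair of intersecting lines."""
    A, B2, C, D2, E2, F = beta_hat
    B, D, E = B2 / 2, D2 / 2, E2 / 2
    disc = B**2 - A * C
    if C == 0 or disc <= 0:
        raise ValueError("fitted conic is not hyperbola-like; the estimator fails")
    root = np.sqrt(disc)
    k1, k2 = (-B - root) / C, (-B + root) / C   # roots of C*t^2 + 2B*t + A = 0
    if k1 > k2:                                  # enforce k1 < k2
        k1, k2 = k2, k1
    # intercepts from 2D = C(k1*h2 + k2*h1) and 2E = -C(h1 + h2)
    h1 = 2 * (D + k1 * E) / (C * (k2 - k1))
    h2 = -2 * (D + k2 * E) / (C * (k2 - k1))
    return k1, h1, k2, h2
```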
We need the notation $\operatorname{lob}(\boldsymbol{\beta })={(k_{1},h_{1},k_{2},h_{2})}^{\top }$ for the function that expresses the line parameters $k_{1}$, $h_{1}$, $k_{2}$, $h_{2}$ in the elements of $\boldsymbol{\beta }$ and is defined by (15)–(17). With this notation, we can write ${(\hat{k}_{1},\hat{h}_{1},\hat{k}_{2},\hat{h}_{2})}^{\top }=\operatorname{lob}(\hat{\boldsymbol{\beta }})$.
Proposition 5.
In the functional model, assume the following:
\[\underset{n\to \infty }{\liminf }\lambda _{\min }\left(\frac{1}{n}\sum \limits_{\begin{array}{c} i=1,\dots ,n\\{} \nu (i)=j\end{array}}\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}1& {\xi _{i}^{}}& {\xi _{i}^{2}}\\{} {\xi _{i}^{}}& {\xi _{i}^{2}}& {\xi _{i}^{3}}\\{} {\xi _{i}^{2}}& {\xi _{i}^{3}}& {\xi _{i}^{4}}\end{array}\right)\right)>0\hspace{1em}\textit{for }j=1,\hspace{0.1667em}2.\]
Then the “ignore-$\widehat{F}$” estimator of the parameters of the two lines is strongly consistent, that is,
\[{(\hat{k}_{1},\hat{h}_{1},\hat{k}_{2},\hat{h}_{2})}^{\top }\to {(k_{1},h_{1},k_{2},h_{2})}^{\top }\hspace{1em}\text{as }n\to \infty \hspace{2.5pt}\text{almost surely}.\]
Now, we state the asymptotic normality of the “ignore-$\widehat{F}$” estimator.
Proposition 7.
In the functional model, assume the following:
Then the “ignore-$\widehat{F}$” estimator ${(\hat{k}_{1},\hat{h}_{1},\hat{k}_{2},\hat{h}_{2})}^{\top }$ is asymptotically normal, namely
(21)
\[\sqrt{n}\left(\begin{array}{c}\hat{k}_{1}-k_{1}\\{} \hat{h}_{1}-h_{1}\\{} \hat{k}_{2}-k_{2}\\{} \hat{h}_{2}-h_{2}\end{array}\right)\stackrel{\mathsf{d}}{\longrightarrow }{N}\big(0,K\varSigma _{\tilde{\beta }}{K}^{\top }\big),\]
where $\varSigma _{\tilde{\beta }}$ is the asymptotic covariance matrix of $\widetilde{\boldsymbol{\beta }}$, and K is the $4\times 6$ matrix of derivatives of the mapping ${(A,2B,C,2D,2E,F)}^{\top }\mapsto {(k_{1},h_{1},k_{2},h_{2})}^{\top }$ defined in (15)–(17), evaluated at the true parameter $\boldsymbol{\beta }_{\mathrm{tn}}$, that is, $K=\frac{\partial \operatorname{lob}(\boldsymbol{\beta })}{\partial {\boldsymbol{\beta }}^{\top }}\big|_{\boldsymbol{\beta }=\boldsymbol{\beta }_{\mathrm{tn}}}$. The matrix $K\varSigma _{\tilde{\beta }}{K}^{\top }$ is nonsingular.
Equation (11) represents a couple of intersecting straight lines if and only if
(22)
\[\left|\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}A& B& D\\{} B& C& E\\{} D& E& F\end{array}\right|=0\hspace{1em}\text{and}\hspace{1em}AC<{B}^{2}.\]
Denote
\[\Delta (\boldsymbol{\beta })=\left|\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}A& B& D\\{} B& C& E\\{} D& E& F\end{array}\right|=ACF+2BDE-A{E}^{2}-C{D}^{2}-{B}^{2}F,\]
\[{\Delta ^{\prime }}(\boldsymbol{\beta })=\frac{\mathrm{d}\Delta (\boldsymbol{\beta })}{\mathrm{d}{\boldsymbol{\beta }}^{\top }}=\big(CF-{E}^{2},DE-BF,AF-{D}^{2},BE-CD,BD-AE,AC-{B}^{2}\big),\]
where the function $\Delta (\boldsymbol{\beta })$ and its derivative ${\Delta ^{\prime }}(\boldsymbol{\beta })$ are evaluated at the point $\boldsymbol{\beta }={(A,2B,C,2D,2E,F)}^{\top }$.
Perform a one-step update of the estimator $\widetilde{\boldsymbol{\beta }}$ to make it closer to the surface $\Delta (\boldsymbol{\beta })=0$:
(23)
\[\widetilde{\boldsymbol{\beta }}_{\mathrm{1}\mathrm{st}}=\widetilde{\boldsymbol{\beta }}-\frac{\Delta (\widetilde{\boldsymbol{\beta }})}{{\Delta ^{\prime }}(\widetilde{\boldsymbol{\beta }})\widehat{\varSigma }_{\tilde{\beta }}{\Delta ^{\prime }}{(\widetilde{\boldsymbol{\beta }})}^{\top }}\widehat{\varSigma }_{\tilde{\beta }}{\Delta ^{\prime }}{(\widetilde{\boldsymbol{\beta }})}^{\top }.\]
Then use expressions (15)–(17) to estimate $k_{1},h_{1},k_{2},h_{2}$:
\[{(\hat{k}_{1,\mathrm{1}\mathrm{st}},\hat{h}_{1,\mathrm{1}\mathrm{st}},\hat{k}_{2,\mathrm{1}\mathrm{st}},\hat{h}_{2,\mathrm{1}\mathrm{st}})}^{\top }=\operatorname{lob}(\widetilde{\boldsymbol{\beta }}_{\mathrm{1}\mathrm{st}}).\]
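A compact sketch of the one-step update (23) followed by the ignore-$\widehat{F}$ step. Delta and Delta_prime implement the displayed formulas for $\Delta (\boldsymbol{\beta })$ and ${\Delta ^{\prime }}(\boldsymbol{\beta })$, Sigma_beta stands for the estimate $\widehat{\varSigma }_{\tilde{\beta }}$ of the asymptotic covariance matrix of $\widetilde{\boldsymbol{\beta }}$, and ignore_F is the routine sketched earlier; the names are ours.

```python
import numpy as np

def Delta(beta):
    """Determinant Delta(beta) for beta = (A, 2B, C, 2D, 2E, F)'."""
    A, B, C, D, E, F = beta[0], beta[1]/2, beta[2], beta[3]/2, beta[4]/2, beta[5]
    return A*C*F + 2*B*D*E - A*E**2 - C*D**2 - B**2*F

def Delta_prime(beta):
    """Row vector dDelta/dbeta' at beta = (A, 2B, C, 2D, 2E, F)'."""
    A, B, C, D, E, F = beta[0], beta[1]/2, beta[2], beta[3]/2, beta[4]/2, beta[5]
    return np.array([C*F - E**2, D*E - B*F, A*F - D**2,
                     B*E - C*D, B*D - A*E, A*C - B**2])

def one_step_update(beta_tilde, Sigma_beta):
    """One step (23) towards the surface Delta(beta) = 0."""
    g = Delta_prime(beta_tilde)
    return beta_tilde - (Delta(beta_tilde) / (g @ Sigma_beta @ g)) * (Sigma_beta @ g)

# afterwards: k1, h1, k2, h2 = ignore_F(one_step_update(beta_tilde, Sigma_beta))
```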
Proposition 9.
Under the conditions of Proposition 7 in the functional model or under the conditions of Proposition 8 in the structural model, the estimator ${(\hat{k}_{1,\mathrm{1}\mathrm{st}},\hat{h}_{1,\mathrm{1}\mathrm{st}},\hat{k}_{2,\mathrm{1}\mathrm{st}},\hat{h}_{2,\mathrm{1}\mathrm{st}})}^{\top }$ is consistent and asymptotically normal, and its asymptotic covariance matrix is equal to
\[K\bigg(\varSigma _{\tilde{\beta }}-\frac{\varSigma _{\tilde{\beta }}{\Delta ^{\prime }}{(\boldsymbol{\beta }_{\mathrm{tn}})}^{\top }{\Delta ^{\prime }}(\boldsymbol{\beta }_{\mathrm{tn}})\varSigma _{\tilde{\beta }}}{{\Delta ^{\prime }}(\boldsymbol{\beta }_{\mathrm{tn}})\varSigma _{\tilde{\beta }}{\Delta ^{\prime }}{(\boldsymbol{\beta }_{\mathrm{tn}})}^{\top }}\bigg){K}^{\top }.\]
Remark 3.
The normalization of the estimator $\hat{\boldsymbol{\beta }}$ affects its asymptotic covariance matrix and hence affects the estimates ${(\hat{k}_{1,\mathrm{1}\mathrm{st}},\hat{h}_{1,\mathrm{1}\mathrm{st}},\hat{k}_{2,\mathrm{1}\mathrm{st}},\hat{h}_{2,\mathrm{1}\mathrm{st}})}^{\top }$ themselves. However, the normalization does not affect the asymptotic covariance matrix of ${(\hat{k}_{1,\mathrm{1}\mathrm{st}},\hat{h}_{1,\mathrm{1}\mathrm{st}},\hat{k}_{2,\mathrm{1}\mathrm{st}},\hat{h}_{2,\mathrm{1}\mathrm{st}})}^{\top }$.
2.2 Orthogonal regression estimator
The sum of squared distances between each observed point and the closer of the two lines is equal to
(24)
\[Q(k_{1},h_{1},k_{2},h_{2})=\sum \limits_{i=1}^{n}\min \bigg(\frac{{(k_{1}x_{i}+h_{1}-y_{i})}^{2}}{1+{k_{1}^{2}}},\hspace{0.2222em}\frac{{(k_{2}x_{i}+h_{2}-y_{i})}^{2}}{1+{k_{2}^{2}}}\bigg).\]
The orthogonal regression estimator is a Borel-measurable function of the observations such that
\[(\hat{k}_{1},\hat{h}_{1},\hat{k}_{2},\hat{h}_{2})\in \underset{k_{1},h_{1},k_{2},h_{2}}{\operatorname{argmin}}\hspace{0.1667em}Q(k_{1},h_{1},k_{2},h_{2}).\]
In the functional model, the orthogonal regression estimator is the maximum likelihood estimator. However, because the dimension of the parameter space grows as the sample size increases, the orthogonal regression estimator may be inconsistent.
2.3 Parametric maximum likelihood estimator
The estimator is constructed in the structural model, so it should be called the structural maximum likelihood estimator.
If a Gaussian distribution of a random point $(\xi ,\eta )$ is concentrated on a straight line $\eta =k\xi +h$, then it is a singular normal distribution:
(25)
\[(\xi ,\eta )\sim N\left(\left(\begin{array}{c}\mu _{\xi }\\{} k\mu _{\xi }+h\end{array}\right),\hspace{0.1667em}\left(\begin{array}{c@{\hskip10.0pt}c}{\sigma _{\xi }^{2}}& k{\sigma _{\xi }^{2}}\\{} k{\sigma _{\xi }^{2}}& {k}^{2}{\sigma _{\xi }^{2}}\end{array}\right)\right),\]
where $\mu _{\xi }$ and ${\sigma _{\xi }^{2}}$ are the expectation and variance of the random variable ξ. Note that the covariance matrix ${\sigma _{\xi }^{2}}\big(\begin{array}{c@{\hskip3.33pt}c}1& k\\{} k& {k}^{2}\end{array}\big)$ is singular and positive semidefinite.
If the distribution of a random point $(\xi _{i},\eta _{i})$ is concentrated on two straight lines $\eta =k_{1}\xi +h_{1}$ and $\eta =k_{2}\xi +h_{2}$ and the distribution on each line is Gaussian, then, due to (25), the conditional distributions are
\[\big[\left(\xi _{i},\eta _{i}\right)\hspace{0.2778em}\mid \hspace{0.2778em}\hspace{-0.1667em}\nu (i)=j\big]\sim N\left(\left(\begin{array}{c}\mu _{j\xi }\\{} k_{j}\mu _{j\xi }+h_{j}\end{array}\right),\hspace{0.1667em}\left(\begin{array}{c@{\hskip10.0pt}c}{\sigma _{j\xi }^{2}}& k_{j}{\sigma _{j\xi }^{2}}\\{} k_{j}{\sigma _{j\xi }^{2}}& {k_{j}^{2}}{\sigma _{j\xi }^{2}}\end{array}\right)\right)=N(\boldsymbol{\mu }_{j},\varSigma _{0j})\]
for $j=1,\hspace{0.1667em}2$. The matrices $\varSigma _{0j}$ are positive semidefinite and singular, that is, $\lambda _{\min }(\varSigma _{01})=\lambda _{\min }(\varSigma _{02})=0$, and the points $\boldsymbol{\mu }_{j}$ are the centers of the Gaussian distributions of the points on each line.
The distribution of $(\xi _{i},\eta _{i})$ is a mixture of two singular normal distributions
(26)
\[\left(\begin{array}{c}\xi _{i}\\{} \eta _{i}\end{array}\right)\sim \text{mixture of}\left\{\begin{array}{l@{\hskip10.0pt}l}N(\boldsymbol{\mu }_{1},\varSigma _{01})\hspace{1em}& \text{with weight }p,\\{} N(\boldsymbol{\mu }_{2},\varSigma _{02})\hspace{1em}& \text{with weight }1-p,\end{array}\right.\]
where $p=\operatorname{\mathsf{P}}(\nu (i)=1)=\operatorname{\mathsf{P}}(\nu (1)=1)$ is the probability that the point $(\xi _{i},\eta _{i})$ lies on the first line.
The distribution of the observed points is also a mixture of two Gaussian distributions
(27)
\[\left(\begin{array}{c}x_{i}\\{} y_{i}\end{array}\right)\sim \text{mixture of}\left\{\begin{array}{l@{\hskip10.0pt}l}N(\boldsymbol{\mu }_{1},\varSigma _{1})\hspace{1em}& \text{with weight }p,\\{} N(\boldsymbol{\mu }_{2},\varSigma _{2})\hspace{1em}& \text{with weight }1-p,\end{array}\right.\]
with $\varSigma _{j}=\varSigma _{0j}+{\sigma }^{2}\mathbf{I}$, where ${\sigma }^{2}$ is the error variance; see (4). Note that $\lambda _{\min }(\varSigma _{1})=\lambda _{\min }(\varSigma _{2})={\sigma }^{2}$.
The likelihood function for the sample of points with a mixture of two normal distributions is denoted $L(p,\boldsymbol{\mu }_{1},\varSigma _{1},\boldsymbol{\mu }_{2},\varSigma _{2})$ in (28), where
\[\phi _{N(\boldsymbol{\mu },\varSigma )}(x,y)=\frac{1}{2\pi \sqrt{\det \varSigma }}\exp \bigg(-\frac{1}{2}{\big({(x,y)}^{\top }-\boldsymbol{\mu }\big)}^{\top }{\varSigma }^{-1}\big({(x,y)}^{\top }-\boldsymbol{\mu }\big)\bigg)\]
is the density of a bivariate normal distribution.
One method of evaluating the maximum likelihood estimator is as follows:
1. Find the point of conditional minimum
\[(\widehat{\boldsymbol{\mu }}_{1},\widehat{\varSigma }_{1},\widehat{\boldsymbol{\mu }}_{2},\widehat{\varSigma }_{2})=\underset{\begin{array}{c} \mu _{1},\varSigma _{1},\mu _{2},\varSigma _{2}\\{} \mathrm{such}\hspace{2.5pt}\mathrm{that}\hspace{2.5pt}\lambda _{\min }(\varSigma _{1})=\lambda _{\min }(\varSigma _{2})\end{array}}{\operatorname{argmin}}\underset{p\in [0,1]}{\min }L(p,\boldsymbol{\mu }_{1},\varSigma _{1},\boldsymbol{\mu }_{2},\varSigma _{2}).\]
2. Set
(29)
\[{\hat{\sigma }}^{2}=\lambda _{\min }(\widehat{\varSigma }_{1})=\frac{1}{2}\big(\hat{\sigma }_{1xx}+\hat{\sigma }_{1yy}-\sqrt{{(\hat{\sigma }_{1xx}-\hat{\sigma }_{1yy})}^{2}+4{\hat{\sigma }_{1xy}^{2}}}\hspace{0.1667em}\big).\]
3. Find the estimates $\hat{k}_{1}$, $\hat{h}_{1}$, $\hat{k}_{2}$, $\hat{h}_{2}$ from the equations
\[\widehat{\boldsymbol{\mu }}_{j}=\left(\begin{array}{c}\mu _{jx}\\{} \hat{k}_{j}\mu _{jx}+\hat{h}_{j}\end{array}\right),\hspace{2em}\widehat{\varSigma }_{j}=\left(\begin{array}{c@{\hskip10.0pt}c}{\hat{\sigma }_{j\xi }^{2}}+{\hat{\sigma }}^{2}& \hat{k}_{j}{\hat{\sigma }_{j\xi }^{2}}\\{} \hat{k}_{j}{\hat{\sigma }_{j\xi }^{2}}& {\hat{k}_{j}^{2}}{\hat{\sigma }_{j\xi }^{2}}+{\hat{\sigma }}^{2}\end{array}\right),\]
that is, set
\[\hat{k}_{j}=\frac{\hat{\sigma }_{jxy}}{\hat{\sigma }_{jxx}-{\hat{\sigma }}^{2}},\hspace{2em}\hat{h}_{j}=\hat{\mu }_{jy}-\hat{k}_{j}\hat{\mu }_{jx},\hspace{1em}j=1,2.\]
The denominator $\hat{\sigma }_{jxx}-{\hat{\sigma }}^{2}$ may be equal to 0 with positive probability. Occurrence of this event means that the estimated figure is a straight line and a single point outside the line rather than two straight lines.
In order to make the statement of consistency easier, assume that $k_{1}<k_{2}$ and choose the estimator such that $\hat{k}_{1}\le \hat{k}_{2}$.
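Steps 2 and 3 of the procedure reduce to elementary formulas once the fitted mixture parameters $\widehat{\boldsymbol{\mu }}_{j}$, $\widehat{\varSigma }_{j}$ are available. A minimal sketch, assuming the fitted covariance matrices are passed as $2\times 2$ NumPy arrays:

```python
import numpy as np

def lines_from_mixture(mu, Sigma):
    """Recover sigma^2 and (k_j, h_j), j = 1, 2, from fitted means mu[j] (2-vectors)
    and covariance matrices Sigma[j] (2x2), following steps 2-3 above."""
    # step 2: sigma^2 is the common smallest eigenvalue, eq. (29)
    sigma2 = np.linalg.eigvalsh(Sigma[0])[0]
    lines = []
    for j in range(2):
        denom = Sigma[j][0, 0] - sigma2        # sigma_{j xx} - sigma^2
        k = Sigma[j][0, 1] / denom             # fails (degenerate figure) if denom == 0
        h = mu[j][1] - k * mu[j][0]
        lines.append((k, h))
    return sigma2, lines
```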
2.4 RBAN moment estimator
The regular best asymptotically normal (RBAN) estimators were developed by Chiang [4]. Our RBAN moment estimator differs from the original RBAN estimator in that not only the observed points $(x_{i},y_{i})$, but also the monomials ${x_{i}^{p}}{y_{i}^{q}}$, $p+q\le 4$, are averaged.
Introduce the 14-dimensional vectors $\boldsymbol{m}_{i}=m(x_{i},y_{i})$ whose elements are the monomials ${x_{i}^{p}}{y_{i}^{q}}$, $1\le p+q\le 4$, of the coordinates of the observed points.
Evaluate the average $\overline{\boldsymbol{m}}$ and the sample covariance matrix $\varSigma _{m}$ of the vectors $\boldsymbol{m}_{i}$.
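A sketch of the empirical side of the construction. The ordering of the 14 monomials inside $\boldsymbol{m}_{i}$ and the $1/n$ normalization of the sample covariance matrix are our own choices for this illustration.

```python
import numpy as np

# exponent pairs (p, q) with 1 <= p + q <= 4, giving 14 monomials x^p * y^q
EXPONENTS = [(p, q) for s in range(1, 5) for p in range(s, -1, -1) for q in [s - p]]

def m_vec(x, y):
    """14-dimensional vector of monomials of an observed point."""
    return np.array([x**p * y**q for p, q in EXPONENTS])

def empirical_moments(xs, ys):
    """Average m-bar and sample covariance Sigma_m of the vectors m_i."""
    M = np.array([m_vec(x, y) for x, y in zip(xs, ys)])
    m_bar = M.mean(axis=0)
    Sigma_m = (M - m_bar).T @ (M - m_bar) / len(xs)   # 1/n normalization assumed
    return m_bar, Sigma_m
```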
Denote
\[f_{1}\big(k,h,{\sigma }^{2};{(\mu _{p})_{p=1}^{4}}\big)=\operatorname{\mathsf{E}}m(\xi +\delta ,k\xi +h+\varepsilon ),\]
where ξ, δ, and ε are independent random variables such that
\[\operatorname{\mathsf{E}}{\xi }^{q}=\mu _{q}\hspace{1em}\text{and}\hspace{1em}{(\delta ,\varepsilon )}^{\top }\sim N(0,{\sigma }^{2}\mathbf{I}).\]
Basically, the function $f_{1}$ is defined for all $\mu _{p}$, $p=1,\dots ,4$, that comprise possible 4-tuples of moments of a random variable, that is, satisfy
\[\begin{array}{c}\mu _{2}-{\mu _{1}^{2}}\ge 0,\\{} \big(\mu _{4}-4\mu _{3}\mu _{1}+6\mu _{2}{\mu _{1}^{2}}-3{\mu _{1}^{4}}\big)\big(\mu _{2}-{\mu _{1}^{2}}\big)-{\big(\mu _{3}-3\mu _{2}\mu _{1}+2{\mu _{1}^{3}}\big)}^{2}-{\big(\mu _{2}-{\mu _{1}^{2}}\big)}^{3}\ge 0;\end{array}\]
see [10]. However, since the elements of the vector-function $f_{1}$ are polynomials in its arguments, it can be extended to ${\mathbb{R}}^{7}$.
Denote
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle f_{2}\big(k_{1},h_{1},k_{2},h_{2},{\sigma }^{2};p,{\big({\mu _{q}^{(j)}}\big)_{j=1,}^{2}}{_{q=1}^{4}}\big)\\{} & \displaystyle \hspace{1em}=pf_{1}\big(k_{1},h_{1},{\sigma }^{2};{\big({\mu _{q}^{(1)}}\big)_{q=1}^{4}}\big)+(1-p)f_{1}\big(k_{2},h_{2},{\sigma }^{2};{\big({\mu _{q}^{(2)}}\big)_{q=1}^{4}}\big).\end{array}\]
In the structural model, $\operatorname{\mathsf{E}}\boldsymbol{m}_{1}=f_{2}\big(k_{1},h_{1},k_{2},h_{2},{\sigma }^{2};p,{({\mu _{q}^{(j)}})_{j=1,}^{2}}{_{q=1}^{4}}\big)$, where ${\mu _{q}^{(j)}}=\operatorname{\mathsf{E}}[{\xi _{1}^{q}}\mid \nu (1)=j]$.
Consider the equation
(33)
\[f_{2}\big(\hat{k}_{1},\hat{h}_{1},\hat{k}_{2},\hat{h}_{2},{\hat{\sigma }}^{2};\hat{p},{\big({\hat{\mu }_{q}^{(j)}}\big)_{j=1,}^{2}}{_{q=1}^{4}}\big)=\overline{\boldsymbol{m}}.\]
It is a system of 14 equations in 14 variables. If (33) has a solution, then the moment estimator can be defined as one of the solutions. However, (33) may have no solution.
In the rest of Section 2.4, ${\mu _{\bullet }^{(\bullet )}}={({\mu _{q}^{(j)}})_{j=1,}^{2}}{_{q=1}^{4}}$ is a $2\times 4$ matrix.
The estimator is defined as a point where ${(f_{2}(\dots )-\overline{\boldsymbol{m}})}^{\top }{\varSigma _{m}^{-1}}(f_{2}(\dots )-\overline{\boldsymbol{m}})$ attains its minimum:
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \big(& \displaystyle \hat{k}_{1},\hat{h}_{1},\hat{k}_{2},\hat{h}_{2},{\hat{\sigma }}^{2}\big)\\{} & \displaystyle =\underset{k_{1},\dots ,{\sigma }^{2}}{\operatorname{argmin}}\underset{p,{\mu _{\bullet }^{(\bullet )}}}{\min }{\big(f_{2}\big(k_{1},\dots ,{\sigma }^{2};p,{\mu _{\bullet }^{(\bullet )}}\big)\hspace{0.1667em}-\hspace{0.1667em}\overline{\boldsymbol{m}}\big)}^{\top }{\varSigma _{m}^{-1}}\big(f_{2}\big(k_{1},\dots ,{\sigma }^{2};p,{\mu _{\bullet }^{(\bullet )}}\big)\hspace{0.1667em}-\hspace{0.1667em}\overline{\boldsymbol{m}}\big).\end{array}\]
This minimization problem is similar to that in Theorem 6 in [4]. The minimum
\[\underset{p\in \mathbb{R}}{\min }\underset{{\mu _{\bullet }^{(\bullet )}}\in {\mathbb{R}}^{2\times 4}}{\min }{\big(f_{2}\big(k_{1},\dots ,{\sigma }^{2};p,{\mu _{\bullet }^{(\bullet )}}\big)-\overline{\boldsymbol{m}}\big)}^{\top }{\varSigma _{m}^{-1}}\big(f_{2}\big(k_{1},\dots ,{\sigma }^{2};p,{\mu _{\bullet }^{(\bullet )}}\big)-\overline{\boldsymbol{m}}\big)\]
can be evaluated explicitly, and this allows us to reduce the dimension of the minimization problem. The reduction of the dimension of the optimization problem was used, for example, in [8].
The routines evaluating the RBAN-moment estimator and the estimator of its covariance matrix are developed without a rigid theoretical basis; see Section 4.3.
3 Equivariance
3.1 Two definitions of equivariance
The similarity transformation of ${\mathbb{R}}^{2}$ is
(34)
\[g(\boldsymbol{z})=K\mathbf{U}\boldsymbol{z}+\Delta \boldsymbol{z},\hspace{1em}\boldsymbol{z}\in {\mathbb{R}}^{2},\]
where U is an orthogonal matrix, $K\ne 0$ is a scaling coefficient, and $\Delta \boldsymbol{z}\in {\mathbb{R}}^{2}$ is an intercept.
The transformation of a sample of points acting elementwise is also denoted $g(Z)$: if $Z=\{\boldsymbol{z}_{i},\hspace{0.2778em}i=1,\dots ,n\}$, then $g(Z)=\{g(\boldsymbol{z}_{i}),\hspace{0.2778em}i=1,\dots ,n\}$.
Hereafter, we use vector notation: the observed points are denoted $\boldsymbol{z}_{i}={(x_{i},y_{i})}^{\top }$, and the true points are denoted $\boldsymbol{\zeta }_{i}={(\xi _{i},\eta _{i})}^{\top }$.
The underlying statistical structure is $({\mathbb{R}}^{n\times 2},\mathcal{B}({\mathbb{R}}^{n\times 2}),P_{Z\mid \theta },\theta \in \varTheta )$, where $Z\in {\mathbb{R}}^{n\times 2}$ is the observed sample, $Z=\{\boldsymbol{z}_{i},\hspace{0.2778em}i=1,\dots ,n\}$, $\mathcal{B}({\mathbb{R}}^{n\times 2})$ is the Borel σ-field, and θ is a parameter that uniquely identifies the distribution of the observed points; $\theta =(\boldsymbol{\zeta }_{1},\dots ,\boldsymbol{\zeta }_{n};{\sigma }^{2})$ in the functional model, and $\theta =(P_{\zeta };{\sigma }^{2})$ in the structural model. Here $\boldsymbol{\zeta }_{1},\dots ,\boldsymbol{\zeta }_{n}$ are points located on two straight lines, and $P_{\zeta }$ is a probability measure concentrated on two straight lines.
The statistical structure is invariant with respect to the transformation g if the change of the probability measure induced by the transformation of the sample can be obtained by some transformation $\tilde{g}$ of the parameters, that is, if there exists a bijection $\tilde{g}:\varTheta \to \varTheta $ such that $P_{g(Z)\mid \theta }=P_{Z\mid \tilde{g}(\theta )}$ for all $\theta \in \varTheta $.
Here $P_{g(Z)\mid \theta }$ is the induced probability measure; it is sometimes denoted $P_{g(Z)\mid \theta }=P_{Z\mid \theta }{g}^{-1}$.
The statistical structure is similarity invariant if it is invariant with respect to all similarity transformations of the form (34).
In order to become similarity invariant, the underlying statistical structure needs some extension. We assume that the true points lie on two lines, which may be parallel to the y-axis. The following restrictions do not ruin the invariance:
• The true lines $\ell _{1}$ and $\ell _{2}$ intersect each other but do not coincide.
• The true points $\boldsymbol{\zeta }_{1},\dots ,\boldsymbol{\zeta }_{n}$ in the functional model or the set $\operatorname{supp}(P_{\zeta })$ where the true points are concentrated in the structural model can be covered with two lines uniquely. In the structural model, this means that the nonidentifiability conditions (13) and (14) do not hold.
With these restrictions, the statistical structure is invariant with
\[\tilde{g}(\theta )=\big(g(\boldsymbol{\zeta }_{1}),\dots ,g(\boldsymbol{\zeta }_{n});\hspace{0.1667em}{K}^{2}{\sigma }^{2}\big)\]
in the functional model and
\[\tilde{g}(\theta )=\big(P_{\zeta }{g}^{-1};\hspace{0.1667em}{K}^{2}{\sigma }^{2}\big)\]
in the structural model.
Let $\ell \ell (\theta )$ and $\operatorname{sigma2}(\theta )$ be functions that extract the parameters of interest. If, in the functional model, $\theta =(\boldsymbol{\zeta }_{1},\dots ,\boldsymbol{\zeta }_{n};{\sigma }^{2})$ and points $\boldsymbol{\zeta }_{1},\dots ,\boldsymbol{\zeta }_{n}$ lie on the lines $\ell _{1}$ and $\ell _{2}$ or if, in the structural model, $\theta =(P_{\zeta };{\sigma }^{2})$ and the probability measure is concentrated on the union of two lines $\ell _{1}\cup \ell _{2}$, then $\ell \ell (\theta )=\{\ell _{1},\ell _{2}\}$. If $\theta =(\dots ;{\sigma }^{2})$, then $\operatorname{sigma2}(\theta )={\sigma }^{2}$.
We treat $\{\ell _{1},\ell _{2}\}$ as an unordered couple, that is, $\{\ell _{1},\ell _{2}\}=\{\ell _{2},\ell _{1}\}$.
The transformation of the line parameters and the transformation of ${\sigma }^{2}$ do not interfere with each other, and neither is affected by the particular location or distribution of the true points on the lines; that is, the parameter transformation $\tilde{g}$ is such that there exist transformations $\tilde{g}_{\ell \ell }$ and $\tilde{g}_{{\sigma }^{2}}$ such that
\[\ell \ell \big(\tilde{g}(\theta )\big)=\tilde{g}_{\ell \ell }\big(\ell \ell (\theta )\big),\hspace{2em}\operatorname{sigma2}\big(\tilde{g}(\theta )\big)=\tilde{g}_{{\sigma }^{2}}\big(\operatorname{sigma2}(\theta )\big).\]
These transformations are
\[\tilde{g}_{\ell \ell }\big(\{\ell _{1},\ell _{2}\}\big)=\big\{ g(\ell _{1}),g(\ell _{2})\big\} \hspace{1em}\text{and}\hspace{1em}\tilde{g}_{{\sigma }^{2}}\big({\sigma }^{2}\big)={K}^{2}{\sigma }^{2}.\]
The estimator is called equivariant with respect to the transformation g if, when the data are transformed, the estimator follows the inducing transformation of the parameters. The estimator $\widehat{\ell \ell }(Z)$ for the two lines and the estimator ${\hat{\sigma }}^{2}(Z)$ for the error variance are equivariant with respect to a similarity transformation g if
(36)
\[\widehat{\ell \ell }\big(g(Z)\big)=\tilde{g}_{\ell \ell }\big(\widehat{\ell \ell }(Z)\big)\]
and
(37)
\[{\hat{\sigma }}^{2}\big(g(Z)\big)=\tilde{g}_{{\sigma }^{2}}\big({\hat{\sigma }}^{2}(Z)\big).\]
The estimator is called similarity equivariant if it is equivariant with respect to any similarity transformation g.
In a fitting problem, an estimator for a “true figure” is called fitting equivariant with respect to a transformation $g(\boldsymbol{z})$, $g:{\mathbb{R}}^{2}\to {\mathbb{R}}^{2}$, if, when the sample is transformed, the estimated “true figure” follows the same transformation g. An estimator is called similarity fitting equivariant if it is fitting equivariant with respect to any similarity transformation.
In the two-line fitting problem, denote by $\cup \{\ell _{1},\ell _{2}\}=\ell _{1}\cup \ell _{2}$ the union of a couple of lines. An estimator $\widehat{\ell \ell }(Z)$ is similarity fitting equivariant if and only if, for any similarity transformation $g(\boldsymbol{z})$,
(38)
\[\cup \widehat{\ell \ell }\big(g(Z)\big)=g\big(\cup \widehat{\ell \ell }(Z)\big).\]
A similarity fitting equivariant estimator depends on the geometry of the plane and does not depend on the Cartesian coordinate system used.
Because of (35), in the two-line fitting model, the estimator for two lines $\widehat{\ell \ell }(Z)$ is similarity equivariant if and only if it is similarity fitting equivariant.
3.2 Similarity equivariance of the five estimators
Some difficulties that may arise during estimation have not been addressed yet.
• The estimation may fail with small positive probability. For example, the conic section estimated with the ALS2 estimator is an ellipse with some positive probability, and if it is, then the “ignore-$\widehat{F}$” estimator fails. (If the estimator is consistent, then the failure probability tends to 0 as $n\to \infty $.)
• The estimation may fail, for example, because the estimated line should be parallel to the y-axis, but the estimating procedure does not handle such a case.
• The optimization problem may have multiple extremal points. For the ALS2 estimator, it may occur that $\dim \{\beta :\varPsi _{n}({\hat{\sigma }}^{2})\beta =0\}>1$.
In order to define the equivariance of an unreliable estimator, we allow the estimators to fail simultaneously on both sides of (36), (37), or (38). Also, we allow that, for a fixed similarity transformation $g(\boldsymbol{z})$, equation (36), (37), or (38) fails on an event of probability 0.
The equivariance of the ALS2 estimator in the conic section fitting problem is verified in [11, Section 5.5] (see Theorem 30 there for similarity fitting equivariance). That implies the equivariance of the “ignore-$\widehat{F}$” estimator.
In order to make the estimator with the one-step update before the ignore-$\widehat{F}$ step equivariant, we use the normalization (9) of the ALS2 estimator rather than $\| \hat{\boldsymbol{\beta }}\| =1$.
The orthogonal regression estimator and the parametric maximum likelihood estimator are maximum likelihood estimators, but in different models. Thus, they are equivariant.
The criterion function for the RBAN-moment estimator is similarity invariant. This means that the criterion function does not change when the data sample follows a similarity transformation and the parameters follow the inducing transformation. Thus, the RBAN-moment estimator is equivariant.
3.3 An example of equivariant but not fitting equivariant estimator
Consider a further restriction of the mixture-of-two-normal-distributions model from Section 2.3. Assume that the covariance matrices $\varSigma _{1}$ and $\varSigma _{2}$ have the same diagonal entries but opposite off-diagonal entries:
\[\varSigma _{1}=\left(\begin{array}{c@{\hskip10.0pt}c}{\sigma _{\xi }^{2}}+{\sigma }^{2}& -k{\sigma _{\xi }^{2}}\\{} -k{\sigma _{\xi }^{2}}& {k}^{2}{\sigma _{\xi }^{2}}+{\sigma }^{2}\end{array}\right)\hspace{1em}\text{and}\hspace{1em}\varSigma _{2}=\left(\begin{array}{c@{\hskip10.0pt}c}{\sigma _{\xi }^{2}}+{\sigma }^{2}& k{\sigma _{\xi }^{2}}\\{} k{\sigma _{\xi }^{2}}& {k}^{2}{\sigma _{\xi }^{2}}+{\sigma }^{2}\end{array}\right).\]
The statistical structure is invariant under scaling of the y-coordinate, $(x_{\mathrm{new}},y_{\mathrm{new}})=(x_{\mathrm{old}},\hspace{0.2222em}ry_{\mathrm{old}})$, $r>0$. This transformation maps the lines $y=-kx+h_{1}$ and $y=kx+h_{2}$ onto the lines $y=-rkx+rh_{1}$ and $y=rkx+rh_{2}$, respectively. The maximum likelihood estimator in this model is equivariant. However, this equivariance is somewhat strange. The transformation of the parameters that induces the scaling of the y-coordinate of the observed points induces neither the same transformation of the true points nor the same mapping of the true lines. The estimated lines follow the transformation of the parameters rather than the transformation of the observed points. This is illustrated in Fig. 1.
Fig. 1.
Two samples, one of points $(x,y_{\mathrm{old}})$ (the unmodified sample) and one of points $(x,y_{\mathrm{new}})=(x,\frac{1}{2}y_{\mathrm{old}})$ (the shrunken-in-y sample), are fitted with two lines (the estimated lines are the solid lines on the figures). The estimated lines for the unmodified sample (blue solid lines on the left figure) when scaled with the same transformation as the observed points are scaled (blue dashed line on the right figure) do not coincide with the actually estimated lines for the shrunken-in-y sample (red solid lines on the right figure). The ellipsoids are the 40% ellipsoids of the estimated normal distributions (the compound distributions of the estimated mixture distributions) (color figure online)
Let $k_{\mathrm{old}}$ be the true value of the parameter k before the transformation. Then after the transformation, the value of the parameter is
\[k_{\mathrm{new}}=\frac{t+\sqrt{{t}^{2}+4{r}^{2}{k_{\mathrm{old}}^{2}}{\sigma _{\xi \hspace{0.1667em}\mathrm{old}}^{4}}}}{2r{k_{\mathrm{old}}^{}}{\sigma _{\xi \hspace{0.1667em}\mathrm{old}}^{2}}}\]
with $t=({r}^{2}{k_{\mathrm{old}}^{2}}-1)\hspace{0.1667em}{\sigma _{\xi \hspace{0.1667em}\mathrm{old}}^{2}}+({r}^{2}-1)\hspace{0.1667em}{\sigma _{\mathrm{old}}^{2}}$. If $0<{r}^{2}\ne 1$, $k_{\mathrm{old}}\ne 0$, and ${\sigma }^{2}>0$, then $k_{\mathrm{new}}\ne rk_{\mathrm{old}}$. Hence, the maximum likelihood estimator is not fitting equivariant with respect to scaling of the y-coordinate here.
4 Simulations
4.1 Simulation setup
A sample of the true points $(\xi _{i},\eta _{i})$, $i=1,\dots ,n$, is generated from a distribution concentrated on (a subset of) two lines. Three distributions of the true points are used: a mixture of two singular normal distributions, a discrete distribution, and a uniform distribution on two line segments; see Fig. 2.
Fig. 2.
Three distributions of the true points: a mixture of two singular normal distributions, a discrete distribution, and a uniform distribution on two line segments. For the first case, a sample of 1000 points is plotted, whereas for the second and third cases, the support of the distribution of the true points is plotted. For the first case, the distribution of the observed points is a mixture of normal distributions, and 40% ellipsoids for its components are plotted
These three distributions of the true points are concentrated on the same two lines, which intersect one another at the point $(-0.08,0.31)$.
For the same sample of true points $\{(\xi _{i},\eta _{i}),\hspace{0.2778em}i=1,\dots ,n\}$, 100 samples of the measurement errors $\{(\delta _{i},\varepsilon _{i}),\hspace{0.2778em}i=1,\dots ,n\}$, ${(\delta _{i},\varepsilon _{i})}^{\top }\sim N(0,{\sigma }^{2}\mathbf{I})$, are simulated, and 100 samples of the observed points $(x_{i},y_{i})$ are obtained; see (2) and (3). For each sample of the observed points, the estimates of the parameters of the true lines are evaluated with the following five methods: two ALS2-based estimators (the ignore-$\widehat{F}$ estimator and the estimator with the one-step update of the ALS2 estimator before the ignore-$\widehat{F}$ step), the orthogonal regression estimator, the parametric maximum likelihood estimator, and the RBAN moment estimator.
For each estimated couple of lines, the point of their intersection is found. The 100 estimates of intersection points are averaged, and their sample standard deviations are evaluated. For the ALS2-based estimators and the RBAN moment estimator, the standard errors of the estimators are also evaluated.
4.2 Notes on computation of particular estimators
For computation of the orthogonal regression estimator, the k-means method is used. Initially, two lines are chosen randomly. Then classification and means steps are alternated. On the classification step, the observed points are split into two clusters based on which line is closer to the point. (The first cluster contains all the observed points that are closer to the first line than to the second line, and the second cluster contains the other observed points.) On the means step, each cluster is fitted with a straight line by the orthogonal regression method (the two lines are updated). The algorithm terminates when the classification step does not change the clusters. The obtained parameters of the two lines deliver a local minimum of the criterion function $Q(k_{1},h_{1},k_{2},h_{2})$ (24). In an attempt to reach the global minimum, the algorithm is restarted several times with different initial lines.
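A sketch of this alternating procedure. Here fit_line_tls performs the orthogonal (total least squares) fit of a single line to one cluster via the principal axis through the cluster centroid; lines are represented by $(k,h)$, so lines parallel to the y-axis are not handled, in line with Remark 1, and degenerate (empty) clusters are left to the random restarts mentioned above.

```python
import numpy as np

def fit_line_tls(xs, ys):
    """Orthogonal regression fit of a single line y = k*x + h (principal axis)."""
    cx, cy = xs.mean(), ys.mean()
    pts = np.column_stack([xs - cx, ys - cy])
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    dx, dy = vt[0]                       # direction of the largest principal axis
    k = dy / dx                          # fails for a vertical line (dx == 0)
    return k, cy - k * cx

def dist2_to_line(xs, ys, k, h):
    """Squared orthogonal distances from the points to the line y = k*x + h."""
    return (k * xs + h - ys)**2 / (1 + k**2)

def or_two_lines(xs, ys, k1, h1, k2, h2, max_iter=100):
    """Alternate classification and orthogonal line fits (k-means-style)."""
    labels = None
    for _ in range(max_iter):
        new_labels = dist2_to_line(xs, ys, k1, h1) > dist2_to_line(xs, ys, k2, h2)
        if labels is not None and np.array_equal(new_labels, labels):
            break                        # classification no longer changes the clusters
        labels = new_labels
        k1, h1 = fit_line_tls(xs[~labels], ys[~labels])
        k2, h2 = fit_line_tls(xs[labels], ys[labels])
    return (k1, h1), (k2, h2)
```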
For computation of the parametric maximum likelihood estimator, the expectation–maximization (EM) algorithm [2] is used. The equations for the optimization problem of finding a minimum of the likelihood function $L(p,\boldsymbol{\mu }_{1},\varSigma _{1},\boldsymbol{\mu }_{2},\varSigma _{2})$ (28) under the constraint $\lambda _{\min }(\varSigma _{1})=\lambda _{\min }(\varSigma _{2})$ are
\[\sum \limits_{i=1}^{n}\frac{\phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})-\phi _{N(\mu _{2},\varSigma _{2})}(x_{i},y_{i})}{p\phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})+(1-p)\phi _{N(\mu _{2},\varSigma _{2})}(x_{i},y_{i})}=0,\]
\[\sum \limits_{i=1}^{n}\frac{p\frac{\partial \phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})}{\partial \textit{par}}+(1-p)\frac{\partial \phi _{N(\mu _{2},\varSigma _{2})}(x_{i},y_{i})}{\partial \textit{par}}}{p\phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})+(1-p)\phi _{N(\mu _{2},\varSigma _{2})}(x_{i},y_{i})}=0,\]
where $\textit{par}$ is a vector parameterization of $(\boldsymbol{\mu }_{1},\varSigma _{1},\boldsymbol{\mu }_{2},\varSigma _{2})$ such that $\lambda _{\min }(\varSigma _{1})=\lambda _{\min }(\varSigma _{2})$. Hence, the maximum likelihood estimator is a stationary point of the function
\[\sum \limits_{i=1}^{n}\big(w_{i}\ln \big(p\phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})\big)+(1-w_{i})\ln \big((1-p)\phi _{N(\mu _{2},\varSigma _{2})}(x_{i},y_{i})\big)\big)\]
with fixed
\[w_{i}=\frac{p\phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})}{p\phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})+(1-p)\phi _{N(\mu _{2},\varSigma _{2})}(x_{i},y_{i})}.\]
The EM algorithm is iterative. Once the $(m-1)$th approximation $({p}^{(m-1)},{\mu _{1}^{(m-1)}},{\varSigma _{1}^{(m-1)}},{\mu _{2}^{(m-1)}},{\varSigma _{2}^{(m-1)}})$ is obtained, the weights are evaluated:
\[{w_{i}^{(m-1)}}=\frac{{p}^{(m-1)}\phi _{N({\mu _{1}^{(m-1)}},{\varSigma _{1}^{(m-1)}})}(x_{i},y_{i})}{{p}^{(m-1)}\phi _{N({\mu _{1}^{(m-1)}},{\varSigma _{1}^{(m-1)}})}(x_{i},y_{i})+(1-{p}^{(m-1)})\phi _{N({\mu _{2}^{(m-1)}},{\varSigma _{2}^{(m-1)}})}(x_{i},y_{i})}.\]
Then the mth approximation $({p}^{(m)},{\mu _{1}^{(m)}},{\varSigma _{1}^{(m)}},{\mu _{2}^{(m)}},{\varSigma _{2}^{(m)}})$ is obtained by minimizing
\[\sum \limits_{i=1}^{n}\big({w_{i}^{(m-1)}}\ln \big(p\phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})\big)+\big(1-{w_{i}^{(m-1)}}\big)\ln \big((1-p)\phi _{N(\mu _{2},\varSigma _{2})}(x_{i},y_{i})\big)\big)\]
under the constraint $\lambda _{\min }(\varSigma _{1})=\lambda _{\min }(\varSigma _{2})$. The point where the minimum is attained can be explicitly expressed in terms of ${w_{i}^{(m-1)}}$, $x_{i}$, and $y_{i}$, $i=1,\dots ,n$.
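The weight computation (the E-step) above translates directly into code; the constrained M-step is not reproduced here. A minimal sketch, using scipy.stats.multivariate_normal for the bivariate normal density φ:

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step_weights(points, p, mu1, Sigma1, mu2, Sigma2):
    """Posterior weights w_i of the first mixture component for an (n, 2) array of points."""
    f1 = p * multivariate_normal.pdf(points, mean=mu1, cov=Sigma1)
    f2 = (1 - p) * multivariate_normal.pdf(points, mean=mu2, cov=Sigma2)
    return f1 / (f1 + f2)
```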
4.3 RBAN-moment estimator
In case the criterion function
\[\begin{array}{r@{\hskip0pt}l}\displaystyle Q(\theta )& \displaystyle =Q\big(k_{1},h_{1},k_{2},h_{2},{\sigma }^{2}\big)\\{} & \displaystyle =\underset{p\in \mathbb{R}}{\min }\underset{\mathbf{M}\in {\mathbb{R}}^{2\times 4}}{\min }{\big(f_{2}(\theta ;p,\mathbf{M})-\overline{\boldsymbol{m}}\big)}^{\top }{\varSigma _{m}^{-1}}\big(f_{2}(\theta ;p,\mathbf{M})-\overline{\boldsymbol{m}}\big)\end{array}\]
has multiple minima, a consistent estimator (namely, the “ignore-$\widehat{F}$” estimator) is used as the initial point, and the criterion function $Q(\theta )$ is searched for a local minimum nearby. Here $\theta ={(k_{1},h_{1},k_{2},h_{2},{\sigma }^{2})}^{\top }$ is the vector of parameters of interest.
The knowledge or misspecification of the parameter p does not affect the estimator of the parameters of interest $k_{1}$, …, ${\sigma }^{2}$. Thus, for estimation of the asymptotic covariance matrix, we assume $p=0.5$ to be known. The estimator of the asymptotic covariance matrix of $(\theta ,\mathbf{M})$ is
\[\widehat{\varSigma }_{\theta ,\mathbf{M}}=\big({f^{\prime }_{2}}{(\hat{\theta };\hspace{0.1667em}0.5,\widehat{\mathbf{M}})}^{\top }{\varSigma _{m}^{-1}}{f^{\prime }_{2}}(\hat{\theta };\hspace{0.1667em}0.5,\widehat{\mathbf{M}})\big){}^{-1},\]
where
\[{f^{\prime }_{2}}(\theta ;0.5,\mathbf{M})=\left(\begin{array}{c@{\hskip10.0pt}c}\frac{\partial f_{2}(\theta ;\hspace{0.1667em}0.5,\mathbf{M})}{\partial {\theta }^{\top }},& \frac{\partial f_{2}(\theta ;\hspace{0.1667em}0.5,\mathbf{M})}{\partial {(\operatorname{vec}\mathbf{M})}^{\top }}\end{array}\right).\]
The estimator of the asymptotic covariance matrix of θ is the corresponding principal submatrix of $\widehat{\varSigma }_{\theta ,\mathbf{M}}$.
4.4 Simulation results
Averages of the estimated intersection points over 100 simulations, standard deviations over 100 simulations, and medians of the estimated standard errors are presented in Tables 1–3.
Table 1.
Means, standard deviations, and median standard errors of the estimates of intersection points for true points having mixture of singular normal distributions
Method | Mean (x) | Mean (y) | St. dev. (x) | St. dev. (y) | St. error (x) | St. error (y)
True value | −0.08 | 0.31 | ||||
$n=1000,\hspace{2em}\sigma =0.1\hspace{1em}({\sigma }^{2}=0.01)$ | ||||||
Ignore-$\widehat{F}$ | −0.1098 | 0.2753 | 0.8611 | 0.3866 | 0.1918 | 0.1125 |
Update | −0.0820 | 0.2912 | 0.0706 | 0.0620 | 0.0437 | 0.0479 |
OR | 0.6533 | 3.4524 | 0.0877 | 0.6783 | ||
ML | −0.0795 | 0.3077 | 0.0326 | 0.0269 | ||
RBAN | 0.0647 | 0.3759 | 0.3563 | 0.2606 | 0.0350 | 0.0438 |
$n=10000,\hspace{2em}\sigma =0.1\hspace{1em}({\sigma }^{2}=0.01)$ | ||||||
Ignore-$\widehat{F}$ | −0.0909 | 0.3052 | 0.0646 | 0.0308 | 0.0601 | 0.0303 |
Update | −0.0796 | 0.3080 | 0.0127 | 0.0156 | 0.0124 | 0.0155 |
OR | 0.5492 | 3.1488 | 0.0175 | 0.1701 | ||
ML | −0.0776 | 0.3083 | 0.0103 | 0.0088 | ||
RBAN | −0.0789 | 0.3100 | 0.0126 | 0.0154 | 0.0127 | 0.0154 |
$n=100000,\hspace{2em}\sigma =0.1$ | ||||||
Ignore-$\widehat{F}$ | −0.0799 | 0.3101 | 0.0211 | 0.0095 | 0.0188 | 0.0093 |
Update | −0.0801 | 0.3098 | 0.0037 | 0.0042 | 0.0039 | 0.0047 |
OR | 0.5606 | 3.2041 | 0.0063 | 0.0484 | ||
ML | −0.0801 | 0.3101 | 0.0030 | 0.0025 | ||
RBAN | −0.0801 | 0.3101 | 0.0038 | 0.0042 | 0.0039 | 0.0048 |
$n=1000,\hspace{2em}\sigma =0.02$ | ||||||
Ignore-$\widehat{F}$ | −0.0799 | 0.3099 | 0.0151 | 0.0075 | 0.0147 | 0.0072 |
Update | −0.0795 | 0.3097 | 0.0052 | 0.0051 | 0.0052 | 0.0049 |
OR | −0.0792 | 0.3092 | 0.0050 | 0.0044 | ||
ML | −0.0794 | 0.3093 | 0.0048 | 0.0043 | ||
RBAN | −0.0797 | 0.3098 | 0.0063 | 0.0057 | 0.0052 | 0.0049 |
Table 2.
Means, standard deviations, and median standard errors of the estimates of intersection points for discrete distribution of the true points
Method | Mean (x) | Mean (y) | St. dev. (x) | St. dev. (y) | St. error (x) | St. error (y)
True value | −0.08 | 0.31 | ||||
$n=1000,\hspace{2em}\sigma =0.1$ | ||||||
Ignore-$\widehat{F}$ | −0.0699 | 0.3077 | 0.0263 | 0.0290 | 0.0241 | 0.0263 |
Update | −0.0722 | 0.3116 | 0.0197 | 0.0186 | 0.0203 | 0.0175 |
OR | −0.0755 | 0.3188 | 0.0148 | 0.0144 | ||
ML | −0.0958 | 0.3315 | 0.0131 | 0.0120 | ||
RBAN | −0.0717 | 0.3105 | 0.0209 | 0.0175 | 0.0205 | 0.0178 |
$n=10000,\hspace{2em}\sigma =0.1$ | ||||||
Ignore-$\widehat{F}$ | −0.0783 | 0.3109 | 0.0092 | 0.0078 | 0.0080 | 0.0083 |
Update | −0.0785 | 0.3114 | 0.0071 | 0.0054 | 0.0065 | 0.0061 |
OR | −0.0721 | 0.3157 | 0.0048 | 0.0046 | ||
ML | −0.0931 | 0.3278 | 0.0043 | 0.0035 | ||
RBAN | −0.0786 | 0.3113 | 0.0071 | 0.0053 | 0.0065 | 0.0061 |
$n=100000,\hspace{2em}\sigma =0.1$ | ||||||
Ignore-$\widehat{F}$ | −0.0798 | 0.3098 | 0.0031 | 0.0024 | 0.0026 | 0.0027 |
Update | −0.0799 | 0.3099 | 0.0025 | 0.0016 | 0.0021 | 0.0019 |
OR | −0.0715 | 0.3151 | 0.0017 | 0.0013 | ||
ML | −0.0932 | 0.3283 | 0.0013 | 0.0011 | ||
RBAN | −0.0799 | 0.3099 | 0.0024 | 0.0017 | 0.0021 | 0.0019 |
$n=1000,\hspace{2em}\sigma =0.02$ | ||||||
Ignore-$\widehat{F}$ | −0.0796 | 0.3094 | 0.0033 | 0.0032 | 0.0036 | 0.0033 |
Update | −0.0798 | 0.3097 | 0.0030 | 0.0024 | 0.0033 | 0.0023 |
OR | −0.0782 | 0.3086 | 0.0021 | 0.0018 | ||
ML | −0.0786 | 0.3087 | 0.0019 | 0.0018 | ||
RBAN | −0.0796 | 0.3092 | 0.0030 | 0.0028 | 0.0033 | 0.0024 |
Table 3.
Means, standard deviations, and median standard errors of the estimates of the intersection points for a uniform distribution of the true points on two line segments
Method | Mean ($x$) | Mean ($y$) | Std. dev. ($x$) | Std. dev. ($y$) | Std. err. ($x$) | Std. err. ($y$)
True value | −0.08 | 0.31 | ||||
$n=1000,\hspace{2em}\sigma =0.1$ | ||||||
Ignore-$\widehat{F}$ | −0.0785 | 0.3122 | 0.0363 | 0.0274 | 0.0318 | 0.0301 |
Update | −0.0794 | 0.3140 | 0.0216 | 0.0258 | 0.0205 | 0.0290 |
OR | −0.0616 | 0.3127 | 0.0185 | 0.0167 | ||
ML | −0.0934 | 0.3118 | 0.0116 | 0.0111 | ||
RBAN | −0.0807 | 0.3126 | 0.0219 | 0.0293 | 0.0193 | 0.0292 |
$n=10000,\hspace{2em}\sigma =0.1$ | ||||||
Ignore-$\widehat{F}$ | −0.0796 | 0.3107 | 0.0103 | 0.0103 | 0.0099 | 0.0095 |
Update | −0.0796 | 0.3110 | 0.0067 | 0.0103 | 0.0065 | 0.0094 |
OR | −0.0639 | 0.3087 | 0.0064 | 0.0049 | ||
ML | −0.0904 | 0.3106 | 0.0042 | 0.0033 | ||
RBAN | −0.0797 | 0.3107 | 0.0066 | 0.0104 | 0.0064 | 0.0095 |
$n=100000,\hspace{2em}\sigma =0.1$ | ||||||
Ignore-$\widehat{F}$ | −0.0798 | 0.3098 | 0.0035 | 0.0030 | 0.0032 | 0.0030 |
Update | −0.0798 | 0.3098 | 0.0021 | 0.0029 | 0.0020 | 0.0030 |
OR | −0.0625 | 0.3085 | 0.0015 | 0.0014 | ||
ML | −0.0891 | 0.3107 | 0.0012 | 0.0011 | ||
RBAN | −0.0796 | 0.3097 | 0.0023 | 0.0030 | 0.0020 | 0.0030 |
$n=1000,\hspace{2em}\sigma =0.02$ | ||||||
Ignore-$\widehat{F}$ | −0.0799 | 0.3100 | 0.0041 | 0.0032 | 0.0041 | 0.0035 |
Update | −0.0798 | 0.3101 | 0.0033 | 0.0032 | 0.0032 | 0.0034 |
OR | −0.0803 | 0.3103 | 0.0023 | 0.0021 | ||
ML | −0.0805 | 0.3101 | 0.0022 | 0.0020 | ||
RBAN | −0.0798 | 0.3100 | 0.0035 | 0.0033 | 0.0032 | 0.0034 |
Using the estimator $\widehat{F}$ (via the one-step update before the ignore-$\widehat{F}$ step) improves the precision of estimation. The precision of the RBAN-moment estimator is close to that of the estimator updated before the ignore-$\widehat{F}$ step.
The parametric maximum likelihood estimator performs best when the normality assumption used in its construction is satisfied; otherwise, it is biased.
The orthogonal regression and maximum likelihood estimators perform well for small error variance (${\sigma }^{2}={0.02}^{2}$). For ${\sigma }^{2}={0.1}^{2}$, the orthogonal regression estimator breaks down when the distribution of the true points is a mixture of two normal distributions and is biased for the two other distributions of the true points.
The mean-square distance of the intersection point of the estimated lines from the true intersection point is presented in Table 4.
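Here the mean-square distance is understood as the root-mean-square Euclidean distance over the $N=100$ simulation runs (our reading; the exact convention is not spelled out in the table):
\[d=\sqrt{\frac{1}{N}\sum \limits_{j=1}^{N}{\big\| {\widehat{\mathbf{z}}}^{(j)}-\mathbf{z}\big\| }^{2}},\hspace{1em}N=100,\]
where ${\widehat{\mathbf{z}}}^{(j)}$ is the intersection point of the lines estimated in the j-th run and $\mathbf{z}={(-0.08,0.31)}^{\top }$ is the true intersection point.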
Table 4.
Mean-square distances between estimated and true intersection points
n | σ | Ignore-$\widehat{F}$ | Update | OR | ML | RBAN |
Distribution of true points is a mixture of normals | ||||||
1000 | 0.1 | 0.9403 | 0.0954 | 3.2978 | 0.0421 | 0.6124 |
10000 | 0.1 | 0.0722 | 0.0201 | 2.9127 | 0.0138 | 0.0199 |
100000 | 0.1 | 0.0230 | 0.0056 | 2.9645 | 0.0038 | 0.0056 |
1000 | 0.02 | 0.0168 | 0.0073 | 0.0067 | 0.0065 | 0.0084 |
Discrete distribution of true points | ||||||
1000 | 0.1 | 0.0403 | 0.0281 | 0.0228 | 0.0320 | 0.0284 |
10000 | 0.1 | 0.0121 | 0.0091 | 0.0118 | 0.0228 | 0.0090 |
100000 | 0.1 | 0.0039 | 0.0029 | 0.0101 | 0.0226 | 0.0029 |
1000 | 0.02 | 0.0046 | 0.0038 | 0.0036 | 0.0032 | 0.0042 |
Uniform distribution of true points | ||||||
1000 | 0.1 | 0.0453 | 0.0367 | 0.0310 | 0.0209 | 0.0365 |
10000 | 0.1 | 0.0145 | 0.0123 | 0.0181 | 0.0117 | 0.0123 |
100000 | 0.1 | 0.0046 | 0.0036 | 0.0177 | 0.0093 | 0.0037 |
1000 | 0.02 | 0.0052 | 0.0046 | 0.0031 | 0.0030 | 0.0048 |
For small errors, the RBAN-moment estimator is slightly less accurate than the estimator updated before the ignore-$\widehat{F}$ step. For ${\sigma }^{2}={0.1}^{2}$, the difference is negligible.
The parametric maximum likelihood estimator has the smallest deviation from the true value in most cases; the exceptions in Table 4 are ${\sigma }^{2}={0.1}^{2}$ with the discrete distribution of the true points and, for the largest sample size, with the uniform distribution.
For small errors (${\sigma }^{2}={0.02}^{2}$), the orthogonal regression estimator outperforms the consistent estimators, and its deviation is almost as small as that of the parametric maximum likelihood estimator.
The normalization of the estimator of $\boldsymbol{\beta }$ affects the ALS2-based estimator of the two lines with a one-step update before the ignore-$\widehat{F}$ step. With the normalization $\| \hat{\boldsymbol{\beta }}\| =1$, the derived estimator of the two lines is not equivariant, whereas with the normalization ${\tilde{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\tilde{\boldsymbol{\beta }}=-n$ it is equivariant; the rescaling is sketched below. A comparison of the equivariant and nonequivariant versions of the estimator is presented in Table 5.
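A minimal sketch of the equivariant normalization step, assuming the matrix ${\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})$ is available as an array Psi_prime and that the quadratic form at the current estimate is negative (a hypothetical interface, not the actual implementation):

    import numpy as np

    def rescale_beta(beta_hat, Psi_prime, n):
        # Rescale beta_hat so that beta^T Psi'_n(sigma_hat^2) beta = -n.
        # Requires beta_hat^T Psi_prime beta_hat < 0; otherwise the normalization
        # cannot be met by a real rescaling.
        q = float(beta_hat @ Psi_prime @ beta_hat)
        if q >= 0.0:
            raise ValueError("quadratic form is nonnegative; cannot normalize to -n")
        return np.sqrt(-float(n) / q) * beta_hat

Multiplying $\hat{\boldsymbol{\beta }}$ by a positive scalar does not change the fitted curve itself; the normalization matters because the subsequent steps of the procedure are not invariant to it, which is presumably why the two versions in Table 5 differ in finite samples.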
Table 5.
Comparison of two versions (equivariant (ev) and nonequivariant (ne)) of the updated before ignore-$\widehat{F}$ step estimator
n | σ | Ver. | Mean ($x$) | Mean ($y$) | Std. dev. ($x$) | Std. dev. ($y$) | Std. err. ($x$) | Std. err. ($y$)
True value: | −0.08 | 0.31 | ||||||
Distribution of true points is a mixture of normals | ||||||||
1000 | 0.1 | ev | −0.082046 | 0.291175 | 0.070617 | 0.062003 | 0.043713 | 0.047939 |
ne | −0.044038 | 0.247382 | 0.251372 | 0.514473 | 0.038853 | 0.050167 | ||
10000 | 0.1 | ev | −0.079623 | 0.308039 | 0.012652 | 0.015582 | 0.012403 | 0.015471 |
ne | −0.085055 | 0.304177 | 0.015924 | 0.018695 | 0.012506 | 0.015550 | ||
100000 | 0.1 | ev | −0.080137 | 0.309780 | 0.003710 | 0.004173 | 0.003925 | 0.004749 |
ne | −0.080991 | 0.309386 | 0.003880 | 0.004255 | 0.003926 | 0.004742 | ||
1000 | 0.02 | ev | −0.079548 | 0.309703 | 0.005156 | 0.005131 | 0.005174 | 0.004891 |
ne | −0.079918 | 0.309508 | 0.005247 | 0.005149 | 0.005179 | 0.004918 | ||
Discrete distribution of true points | ||||||||
1000 | 0.1 | ev | −0.072202 | 0.311648 | 0.019709 | 0.018553 | 0.020266 | 0.017500 |
ne | −0.071460 | 0.312230 | 0.020049 | 0.018740 | 0.020213 | 0.017457 | ||
10000 | 0.1 | ev | −0.078482 | 0.311377 | 0.007087 | 0.005371 | 0.006518 | 0.006066 |
ne | −0.078418 | 0.311436 | 0.007090 | 0.005387 | 0.006520 | 0.006054 | ||
100000 | 0.1 | ev | −0.079868 | 0.309929 | 0.002460 | 0.001647 | 0.002060 | 0.001900 |
ne | −0.079863 | 0.309934 | 0.002461 | 0.001647 | 0.002060 | 0.001901 | ||
1000 | 0.02 | ev | −0.079772 | 0.309728 | 0.002967 | 0.002376 | 0.003320 | 0.002344 |
ne | −0.079755 | 0.309732 | 0.002963 | 0.002375 | 0.003319 | 0.002339 | ||
Uniform distribution of true points | ||||||||
1000 | 0.1 | ev | −0.079405 | 0.313977 | 0.021551 | 0.025759 | 0.020451 | 0.028992 |
ne | −0.078507 | 0.315350 | 0.022219 | 0.026091 | 0.020512 | 0.029115 | ||
10000 | 0.1 | ev | −0.079604 | 0.311024 | 0.006673 | 0.010337 | 0.006467 | 0.009389 |
ne | −0.079576 | 0.311176 | 0.006685 | 0.010349 | 0.006456 | 0.009372 | ||
100000 | 0.1 | ev | −0.079795 | 0.309802 | 0.002075 | 0.002919 | 0.001974 | 0.002994 |
ne | −0.079794 | 0.309818 | 0.002076 | 0.002921 | 0.001972 | 0.002992 | ||
1000 | 0.02 | ev | −0.079833 | 0.310081 | 0.003252 | 0.003249 | 0.003172 | 0.003418 |
ne | −0.079825 | 0.310097 | 0.003250 | 0.003249 | 0.003169 | 0.003411 |
Table 6.
Coverage probability and area of confidence ellipsoids (c.e.) for centers by the ALS2 estimator
n | σ | $\widehat{\varSigma }_{\mathrm{true}}$-based estimator: coverage 80%, % | coverage 95%, % | area of 95% c.e., $\times {10}^{-4}$ | $\widehat{\varSigma }_{\mathrm{sample}}$-based estimator: coverage 80%, % | coverage 95%, % | area of 95% c.e., $\times {10}^{-4}$
Distribution of true points is a mixture of normals | |||||||
1000 | 0.1 | 70.6 | 80.2 | 1449. | 70.0 | 79.2 | 1562. |
10000 | 0.1 | 79.4 | 93.8 | 236.2 | 79.6 | 92.9 | 259.4 |
100000 | 0.1 | 80.7 | 94.9 | 15.38 | 80.6 | 94.9 | 15.17 |
1000 | 0.02 | 80.4 | 95.1 | 15.95 | 78.0 | 94.1 | 19.86 |
Discrete distribution of true points | |||||||
1000 | 0.1 | 78.1 | 93.9 | 81.80 | 77.4 | 93.4 | 78.94 |
10000 | 0.1 | 81.1 | 95.6 | 12.34 | 80.9 | 95.8 | 12.39 |
100000 | 0.1 | 80.1 | 94.6 | 1.205 | 79.9 | 94.7 | 1.204 |
1000 | 0.02 | 81.0 | 94.9 | 1.984 | 81.3 | 95.2 | 1.988 |
Uniform distribution of true points | |||||||
1000 | 0.1 | 82.1 | 94.3 | 152.9 | 81.5 | 94.3 | 138.4 |
10000 | 0.1 | 81.0 | 96.6 | 20.36 | 80.3 | 96.3 | 18.92 |
100000 | 0.1 | 78.4 | 95.0 | 1.823 | 78.6 | 95.0 | 1.842 |
1000 | 0.02 | 78.7 | 94.7 | 2.926 | 78.5 | 94.1 | 3.041 |
There is a tendency for the equivariant version of the estimator to be more accurate in small samples than the nonequivariant version. The two versions of the estimator are consistent and asymptotically equivalent. When the estimation is precise, the difference between the versions is negligible; when it is imprecise, no conclusion can be drawn as to which version is more accurate.
4.5 Comparison of two estimators for asymptotic covariance matrix in the conic section fitting model
In [13] a conic section fitting model is considered, and two estimators ($\widehat{\varSigma }_{\mathrm{true}}$ and $\widehat{\varSigma }_{\mathrm{sample}}$) for the asymptotic covariance matrix of the ALS2 estimator are constructed.
The software developed here can be used for a numerical comparison of the estimates of the asymptotic covariance matrices. The data are generated as described in Section 4.1, with 1000 simulations for each set of true points; hence the true conic is degenerate (a pair of lines). For each simulation, the parameters of the conic section were estimated, its center was found, and two confidence ellipsoids for the center were constructed using the two different estimators of the asymptotic covariance matrix.
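For reference, the center of a conic is the stationary point of the quadratic polynomial and solves a $2\times 2$ linear system; a sketch with generic coefficient names (not necessarily those of Eq. (5)) is given below. For a degenerate conic consisting of two intersecting lines, this center coincides with their intersection point.

    import numpy as np

    def conic_center(a, b, c, d, e):
        # Center of the conic a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0:
        # the stationary point of the polynomial, i.e. the solution of
        #   [2a  b] [x]   [-d]
        #   [ b 2c] [y] = [-e].
        # The 2x2 matrix must be nonsingular (it is singular, e.g., for a parabola).
        A = np.array([[2.0 * a, b], [b, 2.0 * c]])
        return np.linalg.solve(A, np.array([-d, -e], dtype=float))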
The sample coverage probabilities and the median (over 1000 ellipsoids) areas of the confidence ellipsoids are presented in Table 6. The ellipsoids were constructed for confidence levels 0.8 and 0.95. The areas of the 95% confidence ellipsoids are displayed in Table 6; the area of an 80% confidence ellipsoid is $\log _{20}(5)\approx 0.5372$ times the area of the corresponding 95% confidence ellipsoid.
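The factor $\log _{20}(5)$ comes from the quantiles of the ${\chi }^{2}$ distribution with two degrees of freedom: the p-quantile equals $-2\ln (1-p)$, and the area of an elliptical confidence region is proportional to the quantile used, so
\[\frac{S_{80\% }}{S_{95\% }}=\frac{-2\ln (0.20)}{-2\ln (0.05)}=\frac{\ln 5}{\ln 20}=\log _{20}(5)\approx 0.5372.\]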
Note that the standard errors of the coverage probabilities are $1.3\% $ for the 80% confidence ellipsoids and $0.7\% $ for the 95% confidence ellipsoids. The simulations do not allow us to infer which estimator is better. Therefore, because of its simpler explicit expression, the $\widehat{\varSigma }_{\mathrm{sample}}$-based estimator updated before the ignore-$\widehat{F}$ step is the one compared with the other estimators in the simulations of Section 4.4.
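The standard errors quoted above agree with the usual binomial standard error $\sqrt{p(1-p)/N}$ of an empirical proportion based on $N=1000$ simulation runs:
\[\sqrt{\frac{0.8\cdot 0.2}{1000}}\approx 0.013,\hspace{2em}\sqrt{\frac{0.95\cdot 0.05}{1000}}\approx 0.007.\]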