1 Introduction
1.1 Two-line fitting model
Consider the problem of estimating two lines from perturbed observations of points that lie on the lines. Let the true points $(\xi _{i},\eta _{i})$ lie on the union of two different lines $\eta =k_{1}\xi +h_{1}$ and $\eta =k_{2}\xi +h_{2}$, that is,
(1)
\[\bigg[\begin{array}{l@{\hskip10.0pt}l}\text{either}\hspace{1em}& \eta _{i}=k_{1}\xi _{i}+h_{1},\\{} \text{or}\hspace{1em}& \eta _{i}=k_{2}\xi _{i}+h_{2},\end{array}\hspace{1em}i=1,2,\dots \hspace{0.1667em}.\]
Let these points be observed with perturbations $(\delta _{i},\varepsilon _{i})$, $i=1,\dots ,n$, that is, the observed points are $(x_{i},y_{i})$, $i=1,\dots ,n$, with
(2)
\[x_{i}=\xi _{i}+\delta _{i},\]
(3)
\[y_{i}=\eta _{i}+\varepsilon _{i}.\]
The perturbations are assumed to be independent and identically normally distributed,
(4)
\[\left(\begin{array}{c}\delta _{i}\\{} \varepsilon _{i}\end{array}\right)\sim N\big(0,{\sigma }^{2}\mathbf{I}\big),\]
where $\mathbf{I}$ is the $2\times 2$ identity matrix. The parameters $k_{1}$, $h_{1}$, $k_{2}$, $h_{2}$, and ${\sigma }^{2}$ are to be estimated.
We consider both functional and structural models. In the functional model, the true points are assumed to be nonrandom. In the structural model, the true points are assumed to be independent and identically distributed (i.i.d.). The errors $(\delta _{i},\varepsilon _{i})$ are i.i.d. and independent of the true points.
In the structural model, $(\xi _{i},\eta _{i},\delta _{i},\varepsilon _{i})$ are i.i.d. random vectors, and thus, the observed points $(x_{i},y_{i})$ are i.i.d. In the functional model, the observed points are independent, Gaussian, with different means but with common covariance matrix.
Remark 1.
The true lines defined by Eq. (1) cannot be parallel to the y-axis. In order to avoid overflows during evaluation of the estimators (except for the RBAN-moment estimator), another parameterization is used internally: ${\vec{\tau }}^{\top }(\boldsymbol{\zeta }-\boldsymbol{\zeta }_{0})=0$, where $\vec{\tau }$ is a unit vector orthogonal to the line, and $\boldsymbol{\zeta }_{0}$ is a point on the line. The computation of the RBAN-moment estimator (see Section 2.4) is implemented for the explicit parameterization only. Computational optimization of the RBAN-moment estimator is a matter for further work.
The explicit parameterization has the advantage that the number of parameters is equal to the dimension of parameter space. (In [12], the second-order equation (5) has six unknown coefficients, but the conic section can be parameterized with five parameters. The parameter space for the parameters of the conic section was the five-dimensional unit sphere in the six-dimensional Euclidean space. Mismatch between the number of parameters and the dimension of the parameter space made the asymptotic covariance matrix of the estimator singular.)
In simulations, the confidence intervals for the coordinates of the intersection point of the two lines are obtained based on the asymptotic covariance matrix of the intersection point. For the projections of the ALS2 estimator, that asymptotic covariance matrix can be evaluated without use of the explicit line parameterization.
1.2 Conic section fitting model
Let the true points $(\xi _{i},\eta _{i})$ lie on the second-order algebraic curve
(5)
\[A{\xi }^{2}+2B\xi \eta +C{\eta }^{2}+2D\xi +2E\eta +F=0.\]
Hereafter, a second-order algebraic curve is called a “conic section” or a “conic.”
The points are observed with Gaussian perturbations, and the perturbed points are denoted as $(x_{i},y_{i})$. We have the same equations
\[\begin{array}{l}x_{i}=\xi _{i}+\delta _{i},\\{} y_{i}=\eta _{i}+\varepsilon _{i},\end{array}\hspace{1em}\left(\begin{array}{c}\delta _{i}\\{} \varepsilon _{i}\end{array}\right)\sim N\big(0,{\sigma }^{2}\mathbf{I}\big),\]
as (2)–(4) in the two-line fitting model. The vector of coefficients in (5) is denoted by $\boldsymbol{\beta }={(A,2B,C,2D,2E,F)}^{\top }$. The nonzero vector $\boldsymbol{\beta }$ and the error variance ${\sigma }^{2}$ are the parameters of interest.
Similarly to the two-line fitting model, the functional and the structural models are distinguished.
A couple of lines is a degenerate case of a conic section. Therefore, the conic section fitting model is an extension of the two-line fitting model.
1.3 ALS2 estimator in conic section fitting model
We consider the adjusted least squares (ALS) estimator for unknown ${\sigma }^{2}$. The estimator is constructed in [7]. Introduce the $6\times 6$ symmetric matrix
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \psi (x,y;v)\\{} & \displaystyle \hspace{1em}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c}{x}^{4}-6{x}^{2}v+3{v}^{2}& ({x}^{3}-3xv)y& ({x}^{2}-v)({y}^{2}-v)& \ast & \ast & \ast \\{} ({x}^{3}-3xv)y& ({x}^{2}-v)({y}^{2}-v)& x\hspace{0.1667em}({y}^{3}-3yv)& \ast & \ast & \ast \\{} ({x}^{2}-v)({y}^{2}-v)& x\hspace{0.1667em}({y}^{3}-3yv)& {y}^{4}-6{y}^{2}v+3{v}^{2}& \ast & \ast & \ast \\{} {x}^{3}-3xv& ({x}^{2}-v)y& x\hspace{0.1667em}({y}^{2}-v)& {x}^{2}-v& xy& x\\{} ({x}^{2}-v)y& x\hspace{0.1667em}({y}^{2}-v)& {y}^{3}-3yv& xy& {y}^{2}-v& y\\{} {x}^{2}-v& xy& {y}^{2}-v& x& y& 1\end{array}\right).\end{array}\]
The asterisks stand for the entries above the diagonal, which are determined by symmetry. The entries of the matrix $\psi (x,y;v)$ are generalized Hermite polynomials in x and y. The matrix $\psi (x,y;v)$ is constructed such that $\operatorname{\mathsf{E}}\psi (x_{i},y_{i};{\sigma }^{2})=\psi (\xi _{i},\eta _{i};0)$ in the functional model and $\psi (\xi _{i},\eta _{i};0)\boldsymbol{\beta }=0$ for the true points and true parameters. The estimator ${\hat{\sigma }}^{2}$ of the error variance ${\sigma }^{2}$ is obtained from equation (6), which involves the matrix $\boldsymbol{\varPsi }_{n}(v)={\sum _{i=1}^{n}}\psi (x_{i},y_{i};v)$.
Equation (6) always has a unique nonnegative solution. If $n\ge 6$, then the solution to (6) is positive almost surely.
The matrix $\boldsymbol{\varPsi }_{n}({\hat{\sigma }}^{2})$ is singular. Define the estimator $\hat{\boldsymbol{\beta }}$ of the vector $\boldsymbol{\beta }$ as a nonzero solution to the equation
\[\boldsymbol{\varPsi }_{n}\big({\hat{\sigma }}^{2}\big)\hat{\boldsymbol{\beta }}=0.\]
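As an illustration only, the following Python/NumPy sketch evaluates the ALS2 estimator numerically. It assumes, consistently with the statement that $\boldsymbol{\varPsi }_{n}({\hat{\sigma }}^{2})$ is singular, that ${\hat{\sigma }}^{2}$ can be located as a nonnegative root of the smallest eigenvalue of $\boldsymbol{\varPsi }_{n}(v)$ and that $\hat{\boldsymbol{\beta }}$ is the corresponding eigenvector; the function names are ours, and this is not a transcription of the exact form of equation (6).

```python
import numpy as np
from scipy.optimize import brentq

def psi(x, y, v):
    """6x6 symmetric matrix psi(x, y; v) of generalized Hermite polynomials."""
    m = np.array([
        [x**4 - 6*x**2*v + 3*v**2, (x**3 - 3*x*v)*y,         (x**2 - v)*(y**2 - v),    x**3 - 3*x*v, (x**2 - v)*y, x**2 - v],
        [0,                        (x**2 - v)*(y**2 - v),    x*(y**3 - 3*y*v),         (x**2 - v)*y, x*(y**2 - v), x*y],
        [0, 0,                     y**4 - 6*y**2*v + 3*v**2, x*(y**2 - v),             y**3 - 3*y*v, y**2 - v],
        [0, 0, 0,                  x**2 - v,                 x*y,                      x],
        [0, 0, 0, 0,               y**2 - v,                 y],
        [0, 0, 0, 0, 0,            1.0],
    ])
    return np.triu(m) + np.triu(m, 1).T   # fill the lower triangle by symmetry

def Psi_n(xs, ys, v):
    """Psi_n(v) = sum_i psi(x_i, y_i; v)."""
    return sum(psi(x, y, v) for x, y in zip(xs, ys))

def als2(xs, ys):
    # sigma^2 estimate: a nonnegative v with lambda_min(Psi_n(v)) = 0 (assumed form of (6))
    lam_min = lambda v: np.linalg.eigvalsh(Psi_n(xs, ys, v))[0]
    v_hi = 1.0
    while lam_min(v_hi) > 0:
        v_hi *= 2.0
    sigma2_hat = brentq(lam_min, 0.0, v_hi)
    # beta estimate: eigenvector of Psi_n(sigma2_hat) for its smallest eigenvalue,
    # i.e. a nonzero solution of Psi_n(sigma2_hat) beta = 0
    _, V = np.linalg.eigh(Psi_n(xs, ys, sigma2_hat))
    return V[:, 0], sigma2_hat
```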
The strong consistency of the ALS2 estimator is proved in [7] and [11] under somewhat different conditions. The asymptotic normality is proved in [12] for the functional model and in [13] for the structural model. Two consistent estimators of the asymptotic covariance matrix are constructed in [13].
Denote
\[\begin{array}{r@{\hskip0pt}l@{\hskip0pt}r@{\hskip0pt}l}& \displaystyle {\psi ^{\prime }_{v}}(x,y;v)=\frac{\partial }{\partial v}\psi (x,y;v),\hspace{2em}& & \displaystyle {\boldsymbol{\varPsi }^{\prime\prime }_{1}}=\frac{{\partial }^{2}}{\partial {v}^{2}}\psi (x,y;v),\\{} & \hspace{2em}& & \displaystyle {\boldsymbol{\varPsi }^{\prime }_{n}}(v)=\frac{\mathrm{d}}{\mathrm{d}v}\boldsymbol{\varPsi }_{n}(v),\\{} & \displaystyle \overline{\boldsymbol{\varPsi }}_{n}=\sum \limits_{i=1}^{n}\psi (\xi _{i},\eta _{i};0),\hspace{2em}& & \displaystyle {\overline{\boldsymbol{\varPsi }}^{\prime }_{n}}=\sum \limits_{i=1}^{n}{\psi ^{\prime }_{v}}(\xi _{i},\eta _{i};0),\\{} & \displaystyle \overline{\boldsymbol{\varPsi }}_{\infty }=\underset{n\to \infty }{\lim }\frac{1}{n}\overline{\boldsymbol{\varPsi }}_{n}=\underset{n\to \infty }{\lim }\frac{1}{n}\boldsymbol{\varPsi }_{n}\big({\sigma }^{2}\big),\hspace{2em}& & \displaystyle {\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}=\underset{n\to \infty }{\lim }\frac{1}{n}{\overline{\boldsymbol{\varPsi }}^{\prime }_{n}}=\underset{n\to \infty }{\lim }\frac{1}{n}{\boldsymbol{\varPsi }^{\prime }_{n}}\big({\sigma }^{2}\big).\end{array}\]
Under the conditions of Proposition 1 stated further, the latter limits exist almost surely. See [12] for explicit expressions of the matrices ${\psi ^{\prime }_{v}}(x,y;v)$, ${\boldsymbol{\varPsi }^{\prime\prime }_{1}}$, $\overline{\boldsymbol{\varPsi }}_{\infty }$, and ${\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}$. Note that ${\boldsymbol{\varPsi }^{\prime\prime }_{1}}$ is a constant matrix.
Proposition 1.
In the functional model, for all integers $p\ge 0$ and $q\ge 0$ such that $p+q\le 4$, let the following limits exist and be finite:
\[\underset{n\to \infty }{\lim }\frac{1}{n}\sum \limits_{i=1}^{n}{\xi _{i}^{p}}{\eta _{i}^{q}}=:\mu _{p,q},\]
whereas in the structural model, let $\operatorname{\mathsf{E}}{\xi _{1}^{4}}<\infty $ and $\operatorname{\mathsf{E}}{\eta _{1}^{4}}<\infty $. In both models, let $\operatorname{rank}\overline{\boldsymbol{\varPsi }}_{\infty }=5$. Then:
1. The estimator $\hat{\boldsymbol{\beta }}$ is strongly consistent in the following sense:
(7)
\[\min \bigg(\bigg\| \frac{\hat{\boldsymbol{\beta }}}{\| \hat{\boldsymbol{\beta }}\| }-\frac{\boldsymbol{\beta }}{\| \boldsymbol{\beta }\| }\bigg\| ,\hspace{0.2222em}\bigg\| \frac{\hat{\boldsymbol{\beta }}}{\| \hat{\boldsymbol{\beta }}\| }+\frac{\boldsymbol{\beta }}{\| \boldsymbol{\beta }\| }\bigg\| \bigg)\to 0\hspace{1em}\textit{a.s.}\]
2. ${\boldsymbol{\beta }}^{\top }{\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}\boldsymbol{\beta }<0$.
3. Eventually, ${\hat{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\hat{\boldsymbol{\beta }}<0$.
“Eventually” in the previous statement means that almost surely there exists $n_{0}$ such that ${\hat{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\hat{\boldsymbol{\beta }}<0$ for all $n\ge n_{0}$. In other words, almost surely, ${\hat{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\hat{\boldsymbol{\beta }}\ge 0$ holds only for finitely many n.
Denote the normalized version of the true parameter by $\boldsymbol{\beta }_{\mathrm{tn}}$.
Normalize the estimator of $\boldsymbol{\beta }$ in such a way that ${\widetilde{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\widetilde{\boldsymbol{\beta }}=-n$ and ${\boldsymbol{\beta }}^{\top }\widetilde{\boldsymbol{\beta }}\ge 0$. Therefore, denote
(9)
\[\widetilde{\boldsymbol{\beta }}=\left\{\begin{array}{l@{\hskip10.0pt}l}\sqrt{\frac{-n}{{\hat{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\hat{\boldsymbol{\beta }}}}\hspace{0.1667em}\hat{\boldsymbol{\beta }}\hspace{1em}& \text{if }{\boldsymbol{\beta }}^{\top }\hat{\boldsymbol{\beta }}\ge 0,\\{} -\sqrt{\frac{-n}{{\hat{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\hat{\boldsymbol{\beta }}}}\hspace{0.1667em}\hat{\boldsymbol{\beta }}\hspace{1em}& \text{if }{\boldsymbol{\beta }}^{\top }\hat{\boldsymbol{\beta }}<0.\end{array}\right.\]
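A small sketch of the normalization (9), reusing Psi_n from the sketch above. Here ${\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})$ is approximated by a central finite difference purely for brevity (the closed form of ${\psi ^{\prime }_{v}}$ is available in [12]), and the sign is resolved against the true β only in a simulation setting.

```python
import numpy as np

def normalize_beta(xs, ys, beta_hat, sigma2_hat, beta_true=None, eps=1e-6):
    """Rescale beta_hat so that beta~' Psi'_n(sigma2_hat) beta~ = -n, as in (9)."""
    n = len(xs)
    # finite-difference approximation of Psi'_n(sigma2_hat)
    dPsi = (Psi_n(xs, ys, sigma2_hat + eps) - Psi_n(xs, ys, sigma2_hat - eps)) / (2 * eps)
    beta_tilde = np.sqrt(-n / (beta_hat @ dPsi @ beta_hat)) * beta_hat
    # resolve the sign using the true beta if it is known (e.g. in simulations)
    if beta_true is not None and beta_true @ beta_tilde < 0:
        beta_tilde = -beta_tilde
    return beta_tilde
```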
Proposition 2.
1. Under the conditions of Proposition 1, the estimator $\widetilde{\boldsymbol{\beta }}$ is a strongly consistent estimator of $\boldsymbol{\beta }_{\mathrm{tn}}={(-{\boldsymbol{\beta }}^{\top }{\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}\boldsymbol{\beta })}^{-1/2}\boldsymbol{\beta }$, that is, $\widetilde{\boldsymbol{\beta }}\to \boldsymbol{\beta }_{\mathrm{tn}}$ a.s.
2. In the functional model, for all integers $p\ge 0$ and $q\ge 0$ such that $p+q\le 6$, let the following limits exist and be finite:
\[\underset{n\to \infty }{\lim }\frac{1}{n}\sum \limits_{i=1}^{n}{\xi _{i}^{p}}{\eta _{i}^{q}}=:\mu _{p,q},\]
whereas in the structural model, let $\operatorname{\mathsf{E}}{\xi _{1}^{6}}<\infty $ and $\operatorname{\mathsf{E}}{\eta _{1}^{6}}<\infty $. In both models, let $\operatorname{rank}\overline{\boldsymbol{\varPsi }}_{\infty }=5$. Then the estimator $\hat{\boldsymbol{\theta }}={({\widetilde{\boldsymbol{\beta }}}^{\top }\hspace{-0.1667em},{\hat{\sigma }}^{2})}^{\top }$ is asymptotically normal in the following sense:
(10)
\[\sqrt{n}\left(\begin{array}{c}\widetilde{\boldsymbol{\beta }}-\boldsymbol{\beta }_{\mathrm{tn}}\\{} {\hat{\sigma }}^{2}-{\sigma }^{2}\end{array}\right)\stackrel{\mathsf{d}}{\longrightarrow }{N}(0,\hspace{0.1667em}\varSigma _{\hat{\theta }}),\]
where
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \varSigma _{\hat{\theta }}=\left(\begin{array}{c@{\hskip10.0pt}c}\overline{\boldsymbol{\varPsi }}_{\infty }& {\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}\boldsymbol{\beta }_{\mathrm{tn}}\\{} {\boldsymbol{\beta }_{\mathrm{tn}}^{\top }}{\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}& \frac{1}{2}{\boldsymbol{\beta }_{\mathrm{tn}}^{\top }}{\boldsymbol{\varPsi }^{\prime\prime }_{1}}\boldsymbol{\beta }_{\mathrm{tn}}\end{array}\right){}^{-1}\mathbf{B}{\left(\begin{array}{c@{\hskip10.0pt}c}\overline{\boldsymbol{\varPsi }}_{\infty }& {\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}\boldsymbol{\beta }_{\mathrm{tn}}\\{} {\boldsymbol{\beta }_{\mathrm{tn}}^{\top }}{\overline{\boldsymbol{\varPsi }}^{\prime }_{\infty }}& \frac{1}{2}{\boldsymbol{\beta }_{\mathrm{tn}}^{\top }}{\boldsymbol{\varPsi }^{\prime\prime }_{1}}\boldsymbol{\beta }_{\mathrm{tn}}\end{array}\right)}^{-1},\\{} & \displaystyle \hspace{1em}\mathbf{B}=\underset{n\to \infty }{\lim }\frac{1}{n}\sum \limits_{i=1}^{n}\operatorname{\mathsf{E}}{\boldsymbol{s}_{i}^{}}{\boldsymbol{s}_{i}^{\top }},\\{} & \displaystyle \hspace{1em}\boldsymbol{s}_{i}=\left(\begin{array}{c}\psi (x_{i},y_{i};{\sigma }^{2})\boldsymbol{\beta }_{\mathrm{tn}}\\{} \frac{1}{2}{\boldsymbol{\beta }_{\mathrm{tn}}^{\top }}\psi (x_{i},y_{i};{\sigma }^{2})\boldsymbol{\beta }_{\mathrm{tn}}-\frac{1}{2}\end{array}\right).\end{array}\]
3. Under the conditions of part 2 of Proposition 2, the following estimator of the asymptotic covariance matrix is consistent:
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \widehat{\varSigma }_{\hat{\theta }}(n)={\mathbf{A}}^{-1}(n)\mathbf{B}(n){\mathbf{A}}^{-1}(n),\\{} & \displaystyle \hspace{1em}\mathbf{A}(n)=\left(\begin{array}{c@{\hskip10.0pt}c}\frac{1}{n}\boldsymbol{\varPsi }_{n}({\hat{\sigma }}^{2})& \frac{1}{n}{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\widetilde{\boldsymbol{\beta }}\\{} \frac{1}{n}{\widetilde{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})& \frac{1}{2}{\widetilde{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime\prime }_{1}}\widetilde{\boldsymbol{\beta }}\end{array}\right),\hspace{1em}\mathbf{B}(n)=\frac{1}{n}\sum \limits_{i=1}^{n}{\hat{\boldsymbol{s}}_{i}^{}}{\hat{\boldsymbol{s}}_{i}^{\top }},\\{} & \displaystyle \hspace{1em}\hat{\boldsymbol{s}}_{i}=\left(\begin{array}{c}\psi (x_{i},y_{i};{\hat{\sigma }}^{2})\tilde{\boldsymbol{\beta }}\\{} \frac{1}{2}{\tilde{\boldsymbol{\beta }}}^{\top }\psi (x_{i},y_{i};{\hat{\sigma }}^{2})\tilde{\boldsymbol{\beta }}-\frac{1}{2}\end{array}\right),\end{array}\]
that is, $\widehat{\varSigma }_{\hat{\theta }}(n)\to \varSigma _{\hat{\theta }}$ in probability.
1.4 Estimation methods
The methods of fitting an algebraic curve (or surface) to observed points can be classified as follows.
Algebraic distance methods, where the residuals in the equations for the algebraic curve are minimized. For example, the minimum point of the sum of squared residuals ${\sum _{i=1}^{n}}{(A{x_{i}^{2}}+2Bx_{i}y_{i}+C{y_{i}^{2}}+2Dx_{i}+2Ey_{i}+F)}^{2}$ (with some normalizing constraint in order to avoid $A=B=\dots =F=0$) in the conic fitting problem and ${\sum _{i=1}^{n}}{(k_{1}x_{i}+h_{1}-y_{i})}^{2}{(k_{2}x_{i}+h_{2}-y_{i})}^{2}$ in the two-line fitting problem is called the ordinary least squares (OLS) estimator.
The criterion function for the OLS estimator is simple enough and can be adjusted so that the resulting estimator is consistent (under some conditions). Such an estimator is called the adjusted least squares (ALS) estimator. The OLS and ALS estimators are method-of-moments estimators, meaning that the criterion functions for the estimators are polynomials whose coefficients are sample moments of coordinates of the observed points. Hence, the OLS and ALS estimators can be computed efficiently.
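As a small illustration of how cheap these algebraic criteria are to compute, here is a NumPy sketch of the OLS conic fit under the normalizing constraint $\| \boldsymbol{\beta }\| =1$ (one possible choice of the normalizing constraint mentioned above); xs and ys are assumed to be NumPy arrays of the observed coordinates.

```python
import numpy as np

def ols_conic(xs, ys):
    """OLS conic fit: minimize sum_i (beta' t_i)^2 subject to ||beta|| = 1,
    where t_i = (x_i^2, x_i*y_i, y_i^2, x_i, y_i, 1)'."""
    T = np.column_stack([xs**2, xs*ys, ys**2, xs, ys, np.ones_like(xs)])
    # the minimizer is the eigenvector of T'T for its smallest eigenvalue
    _, V = np.linalg.eigh(T.T @ T)
    return V[:, 0]   # (A, 2B, C, 2D, 2E, F), determined up to sign
```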
In order to obtain the parameters of the two lines, the observed points are fitted with a conic section, and then the parameters of the conic section are used to obtain the parameters of the two lines. This idea has been used in several papers.
The problem of estimating the fundamental matrix for two-camera view is considered in [6]. The fundamental matrix is a singular matrix whose left and right null-vectors are the coordinates of each camera in the coordinate system of the other camera. Initially, the ALS estimator of the fundamental matrix is evaluated. Then it is projected so that the estimated fundamental matrix is singular.
In [14], the problem of segmentation of a finite-dimensional vector space into linear subspaces is considered, and the generalized principal component analysis method is introduced. The sample is fitted with an algebraic cone (a set of points that satisfy a homogeneous algebraic equation) by the OLS method. Then subspaces are extracted from the algebraic cone with the use of a small learning sample. An application of segmentation of a vector space into hyperplanes to searching for planes in a binocular image is given in [16].
In [15], an ellipsoid fitting problem with the constraint that the center of the ellipsoid lies on a given line is considered. The algebraic distance with the embedded constraint is minimized. The analytical (behavioral) properties of the optimization problem are studied. We consider a conic section fitting problem but with a different constraint: the conic degenerates to a couple of straight lines.
Geometric distance methods, where the distances between the estimated curve and the observed points are minimized. When the sum of squares of those distances is minimized, the orthogonal regression (OR) estimator is obtained.
A numerical algorithm for evaluation of the orthogonal regression estimator is presented in monograph [1].
Orthogonal regression is consistent in the single-line fitting problem [3, Section 1.3.2(a)]. In nonlinear models, the estimator may be inconsistent. There is a one-step correction procedure for explicit and implicit models [5, 9], with an application to the ellipsoid fitting model [9]. However, in the two-line fitting model, the correction from [9] is unstable.
Probabilistic methods. They are used to obtain the maximum likelihood (ML) estimator and Bayes estimators.
1.5 Notation
Let $\{A_{n},\hspace{0.2778em}n=1,\hspace{0.1667em}2,\hspace{0.1667em}\dots \}$ be a sequence of random events. The random event $A_{n}$ is said to hold eventually if almost surely there exists $n_{0}$ such that $A_{n}$ occurs for all $n\ge n_{0}$. In other words, $A_{n}$ holds eventually if and only if, almost surely, $A_{n}$ fails to occur for only finitely many n.
The estimator $\hat{\beta }$ is called asymptotically normal if $\sqrt{n}(\hat{\beta }-\beta _{\mathrm{true}})\to N(0,\varSigma )$ in distribution, where the asymptotic covariance matrix Σ may be singular, and n is the sample size. This definition differs from the conventional one adopted in asymptotic theory because here only $\sqrt{n}$-asymptotic normality is considered.
Let $\boldsymbol{\zeta }\sim N(\boldsymbol{\mu },\varSigma )$ be a bivariate random vector with nonsingular Σ. Then $E=\{\boldsymbol{z}:{(\boldsymbol{z}-\boldsymbol{\mu })}^{\top }{\varSigma }^{-1}(\boldsymbol{z}-\boldsymbol{\mu })\le 1\}$ is called the 40% ellipsoid of the normal distribution because ${(\boldsymbol{\zeta }-\boldsymbol{\mu })}^{\top }{\varSigma }^{-1}(\boldsymbol{\zeta }-\boldsymbol{\mu })$ has the ${\chi }^{2}$ distribution with 2 degrees of freedom, whence $\operatorname{\mathsf{P}}(\boldsymbol{\zeta }\in E)=1-{e}^{-1/2}\approx 0.3935$. This is the ellipsoid on which the probability density function is at least ${e}^{-1/2}\approx 0.6065$ of its maximum value.
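The two constants can be checked in one line with SciPy:

```python
import numpy as np
from scipy.stats import chi2

print(chi2.cdf(1, df=2))  # 0.3935..., probability of the 40% ellipsoid
print(np.exp(-0.5))       # 0.6065..., density on its boundary relative to the mode
```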
1.6 Outline
In Section 2, we construct five estimators for the parameters of the two-line fitting model. In Section 3, we propose two definitions of the equivariance of an estimator and state that all five estimators are equivariant. The estimators are compared numerically in Section 4. The proofs are given in Appendix A.
2 Estimators
2.1 ALS2 estimator and its projections
The two-line fitting model is a restriction of the conic section fitting model. A couple of lines defined by the equation $(k_{1}\xi -\eta +h_{1})(k_{2}\xi -\eta +h_{2})=0$ is a degenerate conic section
(11)
\[A{\xi }^{2}+2B\xi \eta +C{\eta }^{2}+2D\xi +2E\eta +F=0\]
with coefficients
(12)
\[\begin{array}{r@{\hskip0pt}l@{\hskip10pt}r@{\hskip0pt}l}& \displaystyle A=Ck_{1}k_{2},& & \displaystyle 2D=C(k_{1}h_{2}+k_{2}h_{1}),\\{} & \displaystyle 2B=-C(k_{1}+k_{2}),\hspace{1em}& & \displaystyle 2E=-C(h_{1}+h_{2}),\\{} & & & \displaystyle F=Ch_{1}h_{2},\end{array}\]
with a constraint $C\ne 0$. The conic section ALS2 estimator provides estimation of the error variance ${\sigma }^{2}$ and the coefficients $A,B,\dots ,F$.
Denote by $\nu (i)\in \{1,2\}$ the indicator of the line to which the true point $(\xi _{i},\eta _{i})$ belongs. Equation (1) can be rewritten as
\[\eta _{i}=k_{\nu (i)}\xi _{i}+h_{\nu (i)},\hspace{1em}i=1,2,\dots \hspace{0.1667em}.\]
The indicator $\nu (i)$ is nonrandom in the functional model, and it is a random variable in the structural model.
Proposition 3.
Let, in the functional model,
\[\textit{either}\hspace{1em}k_{1}\ne k_{2}\hspace{1em}\textit{or}\hspace{1em}\left\{\begin{array}{l}h_{1}\ne h_{2},\hspace{1em}\\{} \sup _{n\ge 1}\frac{1}{n}{\textstyle\sum _{i=1}^{n}}{\xi _{i}^{2}}<\infty ;\hspace{2.5pt}\textit{and}\hspace{1em}\end{array}\right.\]
\[\underset{n\to \infty }{\liminf }\lambda _{\min }\left(\frac{1}{n}\sum \limits_{\begin{array}{c} i=1,\dots ,n\\{} \nu (i)=j\end{array}}\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}1& {\xi _{i}^{}}& {\xi _{i}^{2}}\\{} {\xi _{i}^{}}& {\xi _{i}^{2}}& {\xi _{i}^{3}}\\{} {\xi _{i}^{2}}& {\xi _{i}^{3}}& {\xi _{i}^{4}}\end{array}\right)\right)>0\hspace{1em}\textit{for }j=1,\hspace{0.1667em}2.\]
Then the ALS2 estimators $\hat{\boldsymbol{\beta }}$ and ${\hat{\sigma }}^{2}$ are strongly consistent in the sense of (7) and (8).
There are two cases where the structural model is not identifiable. If the common distribution of the true points is concentrated on a straight line and a single point (which need not lie on the line), that is,
(13)
\[\exists \hspace{0.1667em}\text{line}\hspace{0.1667em}\ell \subset {\mathbb{R}}^{2}\hspace{0.2778em}\exists z\in {\mathbb{R}}^{2}:\operatorname{supp}(\xi _{1},\eta _{1})\subset \ell \cup \{z\},\]
then there are many ways to fit the true points with two lines. If the common distribution of the true points is concentrated in four points (condition (14)), then there are three ways to fit the true points with two lines (unless three of the four points lie on a straight line, which is a particular case of (13)).
In order to estimate the parameters $k_{1}$, $h_{1}$, $k_{2}$, and $h_{2}$, we can solve Eqs. (12). Ignoring the last equation $F=Ch_{1}h_{2}$, the solution is given by (15)–(17).
Substituting the elements of the ALS2 estimator $\hat{\boldsymbol{\beta }}={(\hat{A},2\hat{B},\hat{C},2\hat{D},2\hat{E},\hat{F})}^{\top }$ into the right-hand sides of (15)–(17), we obtain an “ignore-$\widehat{F}$” estimator (18).
If the conic section estimated by the ALS2 estimator is a hyperbola, then the “ignore-$\widehat{F}$” estimate of the two lines comprises the asymptotes of the hyperbola.
Choose the sign ± in (18) such that $\hat{k}_{1}<\hat{k}_{2}$.
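Since (15)–(17) amount to solving (12) for $k_{1}$, $k_{2}$, $h_{1}$, $h_{2}$ while ignoring $F=Ch_{1}h_{2}$, the estimator admits a simple closed form. The sketch below is our own algebra, not a transcription of (15)–(18): the slopes are recovered as the roots of $C{t}^{2}+2Bt+A=0$, the intercepts then solve the remaining linear equations, and the sign convention $\hat{k}_{1}<\hat{k}_{2}$ follows the text.

```python
import numpy as np

def ignore_F(beta_hat):
    """'Ignore-F' estimator of (k1, h1, k2, h2) from beta = (A, 2B, C, 2D, 2E, F)'.
    Uses all relations in (12) except F = C*h1*h2; requires B^2 - A*C > 0,
    i.e. the fitted conic is a hyperbola or a pair of intersecting lines."""
    A, B2, C, D2, E2, F = beta_hat
    B, D, E = B2 / 2, D2 / 2, E2 / 2
    disc = B**2 - A * C
    if C == 0 or disc <= 0:
        raise ValueError("fitted conic is not hyperbola-like; the estimator fails")
    root = np.sqrt(disc)
    k1, k2 = (-B - root) / C, (-B + root) / C   # roots of C*t^2 + 2B*t + A = 0
    if k1 > k2:                                  # enforce k1 < k2
        k1, k2 = k2, k1
    # intercepts from 2D = C(k1*h2 + k2*h1) and 2E = -C(h1 + h2)
    h1 = 2 * (D + k1 * E) / (C * (k2 - k1))
    h2 = -2 * (D + k2 * E) / (C * (k2 - k1))
    return k1, h1, k2, h2
```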
We need the notation $\operatorname{lob}(\boldsymbol{\beta })={(k_{1},h_{1},k_{2},h_{2})}^{\top }$ for the function that expresses the line parameters $k_{1}$, $h_{1}$, $k_{2}$, $h_{2}$ in the elements of $\boldsymbol{\beta }$ and is defined by (15)–(17). With this notation, we can write ${(\hat{k}_{1},\hat{h}_{1},\hat{k}_{2},\hat{h}_{2})}^{\top }=\operatorname{lob}(\hat{\boldsymbol{\beta }})$.
Proposition 5.
In the functional model, assume the following:
\[\underset{n\to \infty }{\liminf }\lambda _{\min }\left(\frac{1}{n}\sum \limits_{\begin{array}{c} i=1,\dots ,n\\{} \nu (i)=j\end{array}}\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}1& {\xi _{i}^{}}& {\xi _{i}^{2}}\\{} {\xi _{i}^{}}& {\xi _{i}^{2}}& {\xi _{i}^{3}}\\{} {\xi _{i}^{2}}& {\xi _{i}^{3}}& {\xi _{i}^{4}}\end{array}\right)\right)>0\hspace{1em}\textit{for }j=1,\hspace{0.1667em}2.\]
Then the “ignore-$\widehat{F}$” estimator of the parameters of the two lines is strongly consistent, that is,
\[{(\hat{k}_{1},\hat{h}_{1},\hat{k}_{2},\hat{h}_{2})}^{\top }\to {(k_{1},h_{1},k_{2},h_{2})}^{\top }\hspace{1em}\text{as }n\to \infty \hspace{2.5pt}\text{almost surely}.\]
Now, we state the asymptotic normality of the “ignore-$\widehat{F}$” estimator.
Proposition 7.
In the functional model, assume the following:
Then the “ignore-$\widehat{F}$” estimator ${(\hat{k}_{1},\hat{h}_{1},\hat{k}_{2},\hat{h}_{2})}^{\top }$ is asymptotically normal, namely
(21)
\[\sqrt{n}\left(\begin{array}{c}\hat{k}_{1}-k_{1}\\{} \hat{h}_{1}-h_{1}\\{} \hat{k}_{2}-k_{2}\\{} \hat{h}_{2}-h_{2}\end{array}\right)\stackrel{\mathsf{d}}{\longrightarrow }{N}\big(0,K\varSigma _{\tilde{\beta }}{K}^{\top }\big),\]
where $\varSigma _{\tilde{\beta }}$ is the asymptotic covariance matrix of $\widetilde{\boldsymbol{\beta }}$, and K is the $4\times 6$ matrix of derivatives of the mapping ${(A,2B,C,2D,2E,F)}^{\top }\mapsto {(k_{1},h_{1},k_{2},h_{2})}^{\top }$ defined in (15)–(17), evaluated at the true parameter $\boldsymbol{\beta }_{\mathrm{tn}}$, that is, $K=\frac{\partial \operatorname{lob}(\boldsymbol{\beta })}{\partial {\boldsymbol{\beta }}^{\top }}\big|_{\boldsymbol{\beta }=\boldsymbol{\beta }_{\mathrm{tn}}}$. The matrix $K\varSigma _{\tilde{\beta }}{K}^{\top }$ is nonsingular.
Equation (11) represents a couple of intersecting straight lines if and only if
(22)
\[\left|\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}A& B& D\\{} B& C& E\\{} D& E& F\end{array}\right|=0\hspace{1em}\text{and}\hspace{1em}AC<{B}^{2}.\]
Denote
\[\Delta (\boldsymbol{\beta })=\left|\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}A& B& D\\{} B& C& E\\{} D& E& F\end{array}\right|=ACF+2BDE-A{E}^{2}-C{D}^{2}-{B}^{2}F,\]
\[{\Delta ^{\prime }}(\boldsymbol{\beta })=\frac{\mathrm{d}\Delta (\boldsymbol{\beta })}{\mathrm{d}{\boldsymbol{\beta }}^{\top }}=\big(CF-{E}^{2},DE-BF,AF-{D}^{2},BE-CD,BD-AE,AC-{B}^{2}\big),\]
where the function $\Delta (\boldsymbol{\beta })$ and its derivative ${\Delta ^{\prime }}(\boldsymbol{\beta })$ are evaluated at the point $\boldsymbol{\beta }={(A,2B,C,2D,2E,F)}^{\top }$.
Perform a one-step update of the estimator $\widetilde{\boldsymbol{\beta }}$ to make it closer to the surface $\Delta (\boldsymbol{\beta })=0$:
(23)
\[\widetilde{\boldsymbol{\beta }}_{\mathrm{1}\mathrm{st}}=\widetilde{\boldsymbol{\beta }}-\frac{\Delta (\widetilde{\boldsymbol{\beta }})}{{\Delta ^{\prime }}(\widetilde{\boldsymbol{\beta }})\widehat{\varSigma }_{\tilde{\beta }}{\Delta ^{\prime }}{(\widetilde{\boldsymbol{\beta }})}^{\top }}\widehat{\varSigma }_{\tilde{\beta }}{\Delta ^{\prime }}{(\widetilde{\boldsymbol{\beta }})}^{\top }.\]
Then use expressions (15)–(17) to estimate $k_{1},h_{1},k_{2},h_{2}$:
\[{(\hat{k}_{1,\mathrm{1}\mathrm{st}},\hat{h}_{1,\mathrm{1}\mathrm{st}},\hat{k}_{2,\mathrm{1}\mathrm{st}},\hat{h}_{2,\mathrm{1}\mathrm{st}})}^{\top }=\operatorname{lob}(\widetilde{\boldsymbol{\beta }}_{\mathrm{1}\mathrm{st}}).\]
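A compact sketch of the one-step update (23) followed by the ignore-$\widehat{F}$ step. Delta and Delta_prime implement the displayed formulas for $\Delta (\boldsymbol{\beta })$ and ${\Delta ^{\prime }}(\boldsymbol{\beta })$, Sigma_beta stands for the estimate $\widehat{\varSigma }_{\tilde{\beta }}$ of the asymptotic covariance matrix of $\widetilde{\boldsymbol{\beta }}$, and ignore_F is the routine sketched earlier; the names are ours.

```python
import numpy as np

def Delta(beta):
    """Determinant Delta(beta) for beta = (A, 2B, C, 2D, 2E, F)'."""
    A, B, C, D, E, F = beta[0], beta[1]/2, beta[2], beta[3]/2, beta[4]/2, beta[5]
    return A*C*F + 2*B*D*E - A*E**2 - C*D**2 - B**2*F

def Delta_prime(beta):
    """Row vector dDelta/dbeta' at beta = (A, 2B, C, 2D, 2E, F)'."""
    A, B, C, D, E, F = beta[0], beta[1]/2, beta[2], beta[3]/2, beta[4]/2, beta[5]
    return np.array([C*F - E**2, D*E - B*F, A*F - D**2,
                     B*E - C*D, B*D - A*E, A*C - B**2])

def one_step_update(beta_tilde, Sigma_beta):
    """One step (23) towards the surface Delta(beta) = 0."""
    g = Delta_prime(beta_tilde)
    return beta_tilde - (Delta(beta_tilde) / (g @ Sigma_beta @ g)) * (Sigma_beta @ g)

# afterwards: k1, h1, k2, h2 = ignore_F(one_step_update(beta_tilde, Sigma_beta))
```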
Proposition 9.
Under the conditions of Proposition 7 in the functional model or under the conditions of Proposition 8 in the structural model, the estimator ${(\hat{k}_{1,\mathrm{1}\mathrm{st}},\hat{h}_{1,\mathrm{1}\mathrm{st}},\hat{k}_{2,\mathrm{1}\mathrm{st}},\hat{h}_{2,\mathrm{1}\mathrm{st}})}^{\top }$ is consistent and asymptotically normal, and its asymptotic covariance matrix is equal to
\[K\bigg(\varSigma _{\tilde{\beta }}-\frac{\varSigma _{\tilde{\beta }}{\Delta ^{\prime }}{(\boldsymbol{\beta }_{\mathrm{tn}})}^{\top }{\Delta ^{\prime }}(\boldsymbol{\beta }_{\mathrm{tn}})\varSigma _{\tilde{\beta }}}{{\Delta ^{\prime }}(\boldsymbol{\beta }_{\mathrm{tn}})\varSigma _{\tilde{\beta }}{\Delta ^{\prime }}{(\boldsymbol{\beta }_{\mathrm{tn}})}^{\top }}\bigg){K}^{\top }.\]
Remark 3.
The normalization of the estimator $\hat{\boldsymbol{\beta }}$ affects its asymptotic covariance matrix and hence affects the estimates ${(\hat{k}_{1,\mathrm{1}\mathrm{st}},\hat{h}_{1,\mathrm{1}\mathrm{st}},\hat{k}_{2,\mathrm{1}\mathrm{st}},\hat{h}_{2,\mathrm{1}\mathrm{st}})}^{\top }$ themselves. However, the normalization does not affect the asymptotic covariance matrix of ${(\hat{k}_{1,\mathrm{1}\mathrm{st}},\hat{h}_{1,\mathrm{1}\mathrm{st}},\hat{k}_{2,\mathrm{1}\mathrm{st}},\hat{h}_{2,\mathrm{1}\mathrm{st}})}^{\top }$.
2.2 Orthogonal regression estimator
The sum of squared distances between each observed point and the closer of the two lines is equal to
(24)
\[Q(k_{1},h_{1},k_{2},h_{2})=\sum \limits_{i=1}^{n}\min \bigg(\frac{{(k_{1}x_{i}+h_{1}-y_{i})}^{2}}{1+{k_{1}^{2}}},\hspace{0.2222em}\frac{{(k_{2}x_{i}+h_{2}-y_{i})}^{2}}{1+{k_{2}^{2}}}\bigg).\]
The orthogonal regression estimator is a Borel-measurable function of the observations such that
\[(\hat{k}_{1},\hat{h}_{1},\hat{k}_{2},\hat{h}_{2})\in \underset{k_{1},h_{1},k_{2},h_{2}}{\operatorname{argmin}}\hspace{0.1667em}Q(k_{1},h_{1},k_{2},h_{2}).\]
In the functional model, the orthogonal regression estimator is the maximum likelihood estimator. However, because the dimension of the parameter space grows as the sample size increases, the orthogonal regression estimator may be inconsistent.
2.3 Parametric maximum likelihood estimator
The estimator is constructed in the structural model, so it should be called the structural maximum likelihood estimator.
If a Gaussian distribution of a random point $(\xi ,\eta )$ is concentrated on a straight line $\eta =k\xi +h$, then it is a singular normal distribution:
(25)
\[(\xi ,\eta )\sim N\left(\left(\begin{array}{c}\mu _{\xi }\\{} k\mu _{\xi }+h\end{array}\right),\hspace{0.1667em}\left(\begin{array}{c@{\hskip10.0pt}c}{\sigma _{\xi }^{2}}& k{\sigma _{\xi }^{2}}\\{} k{\sigma _{\xi }^{2}}& {k}^{2}{\sigma _{\xi }^{2}}\end{array}\right)\right),\]
where $\mu _{\xi }$ and ${\sigma _{\xi }^{2}}$ are the expectation and variance of the random variable ξ. Note that the covariance matrix ${\sigma _{\xi }^{2}}\big(\begin{array}{c@{\hskip3.33pt}c}1& k\\{} k& {k}^{2}\end{array}\big)$ is singular and positive semidefinite.
If the distribution of a random point $(\xi _{i},\eta _{i})$ is concentrated on two straight lines $\eta =k_{1}\xi +h_{1}$ and $\eta =k_{2}\xi +h_{2}$ and the distribution on each line is Gaussian, then, due to (25), the conditional distributions are
\[\big[\left(\xi _{i},\eta _{i}\right)\hspace{0.2778em}\mid \hspace{0.2778em}\hspace{-0.1667em}\nu (i)=j\big]\sim N\left(\left(\begin{array}{c}\mu _{j\xi }\\{} k_{j}\mu _{j\xi }+h_{j}\end{array}\right),\hspace{0.1667em}\left(\begin{array}{c@{\hskip10.0pt}c}{\sigma _{j\xi }^{2}}& k_{j}{\sigma _{j\xi }^{2}}\\{} k_{j}{\sigma _{j\xi }^{2}}& {k_{j}^{2}}{\sigma _{j\xi }^{2}}\end{array}\right)\right)=N(\boldsymbol{\mu }_{j},\varSigma _{0j})\]
for $j=1,\hspace{0.1667em}2$. The matrices $\varSigma _{0j}$ are positive semidefinite and singular, that is, $\lambda _{\min }(\varSigma _{01})=\lambda _{\min }(\varSigma _{02})=0$, and the points $\boldsymbol{\mu }_{j}$ are the centers of the Gaussian distributions of the points on each line.
The distribution of $(\xi _{i},\eta _{i})$ is a mixture of two singular normal distributions
(26)
\[\left(\begin{array}{c}\xi _{i}\\{} \eta _{i}\end{array}\right)\sim \text{mixture of}\left\{\begin{array}{l@{\hskip10.0pt}l}N(\boldsymbol{\mu }_{1},\varSigma _{01})\hspace{1em}& \text{with weight }p,\\{} N(\boldsymbol{\mu }_{2},\varSigma _{02})\hspace{1em}& \text{with weight }1-p,\end{array}\right.\]
where $p=\operatorname{\mathsf{P}}(\nu (i)=1)=\operatorname{\mathsf{P}}(\nu (1)=1)$ is the probability that the point $(\xi _{i},\eta _{i})$ lies on the first line.
The distribution of the observed points is also a mixture of two Gaussian distributions
(27)
\[\left(\begin{array}{c}x_{i}\\{} y_{i}\end{array}\right)\sim \text{mixture of}\left\{\begin{array}{l@{\hskip10.0pt}l}N(\boldsymbol{\mu }_{1},\varSigma _{1})\hspace{1em}& \text{with weight }p,\\{} N(\boldsymbol{\mu }_{2},\varSigma _{2})\hspace{1em}& \text{with weight }1-p,\end{array}\right.\]
with $\varSigma _{j}=\varSigma _{0j}+{\sigma }^{2}\mathbf{I}$, where ${\sigma }^{2}$ is the error variance; see (4). Note that $\lambda _{\min }(\varSigma _{1})=\lambda _{\min }(\varSigma _{2})={\sigma }^{2}$.
The likelihood function for the sample of points with a mixture of two normal distributions is denoted $L(p,\boldsymbol{\mu }_{1},\varSigma _{1},\boldsymbol{\mu }_{2},\varSigma _{2})$ in (28), where
\[\phi _{N(\boldsymbol{\mu },\varSigma )}(x,y)=\frac{1}{2\pi \sqrt{\det \varSigma }}\exp \bigg(-\frac{1}{2}{\big({(x,y)}^{\top }-\boldsymbol{\mu }\big)}^{\top }{\varSigma }^{-1}\big({(x,y)}^{\top }-\boldsymbol{\mu }\big)\bigg)\]
is the density of a bivariate normal distribution.
One method of evaluating the maximum likelihood estimator is as follows:
1. Find the point of conditional minimum
\[(\widehat{\boldsymbol{\mu }}_{1},\widehat{\varSigma }_{1},\widehat{\boldsymbol{\mu }}_{2},\widehat{\varSigma }_{2})=\underset{\begin{array}{c} \mu _{1},\varSigma _{1},\mu _{2},\varSigma _{2}\\{} \mathrm{such}\hspace{2.5pt}\mathrm{that}\hspace{2.5pt}\lambda _{\min }(\varSigma _{1})=\lambda _{\min }(\varSigma _{2})\end{array}}{\operatorname{argmin}}\underset{p\in [0,1]}{\min }L(p,\boldsymbol{\mu }_{1},\varSigma _{1},\boldsymbol{\mu }_{2},\varSigma _{2}).\]
2. Set
(29)
\[{\hat{\sigma }}^{2}=\lambda _{\min }(\widehat{\varSigma }_{1})=\frac{1}{2}\big(\hat{\sigma }_{1xx}+\hat{\sigma }_{1yy}-\sqrt{{(\hat{\sigma }_{1xx}-\hat{\sigma }_{1yy})}^{2}+4{\hat{\sigma }_{1xy}^{2}}}\hspace{0.1667em}\big).\]
3. Find the estimates $\hat{k}_{1}$, $\hat{h}_{1}$, $\hat{k}_{2}$, $\hat{h}_{2}$ from the equations
\[\widehat{\boldsymbol{\mu }}_{j}=\left(\begin{array}{c}\mu _{jx}\\{} \hat{k}_{j}\mu _{jx}+\hat{h}_{j}\end{array}\right),\hspace{2em}\widehat{\varSigma }_{j}=\left(\begin{array}{c@{\hskip10.0pt}c}{\hat{\sigma }_{j\xi }^{2}}+{\hat{\sigma }}^{2}& \hat{k}_{j}{\hat{\sigma }_{j\xi }^{2}}\\{} \hat{k}_{j}{\hat{\sigma }_{j\xi }^{2}}& {\hat{k}_{j}^{2}}{\hat{\sigma }_{j\xi }^{2}}+{\hat{\sigma }}^{2}\end{array}\right),\]
that is, set
\[\hat{k}_{j}=\frac{\hat{\sigma }_{jxy}}{\hat{\sigma }_{jxx}-{\hat{\sigma }}^{2}},\hspace{2em}\hat{h}_{j}=\hat{\mu }_{jy}-\hat{k}_{j}\hat{\mu }_{jx},\hspace{1em}j=1,2.\]
The denominator $\hat{\sigma }_{jxx}-{\hat{\sigma }}^{2}$ may be equal to 0 with positive probability. Occurrence of this event means that the estimated figure is a straight line and a single point outside the line rather than two straight lines.
In order to make the statement of consistency easier, assume that $k_{1}<k_{2}$ and choose the estimator such that $\hat{k}_{1}\le \hat{k}_{2}$.
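Steps 2 and 3 of the procedure reduce to elementary formulas once the fitted mixture parameters $\widehat{\boldsymbol{\mu }}_{j}$, $\widehat{\varSigma }_{j}$ are available. A minimal sketch, assuming the fitted covariance matrices are passed as $2\times 2$ NumPy arrays:

```python
import numpy as np

def lines_from_mixture(mu, Sigma):
    """Recover sigma^2 and (k_j, h_j), j = 1, 2, from fitted means mu[j] (2-vectors)
    and covariance matrices Sigma[j] (2x2), following steps 2-3 above."""
    # step 2: sigma^2 is the common smallest eigenvalue, eq. (29)
    sigma2 = np.linalg.eigvalsh(Sigma[0])[0]
    lines = []
    for j in range(2):
        denom = Sigma[j][0, 0] - sigma2        # sigma_{j xx} - sigma^2
        k = Sigma[j][0, 1] / denom             # fails (degenerate figure) if denom == 0
        h = mu[j][1] - k * mu[j][0]
        lines.append((k, h))
    return sigma2, lines
```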
2.4 RBAN moment estimator
The regular best asymptotically normal (RBAN) estimators were developed by Chiang [4]. Our RBAN moment estimator differs from the original RBAN estimator in that not only the observed points $(x_{i},y_{i})$, but also the monomials ${x_{i}^{p}}{y_{i}^{q}}$, $p+q\le 4$, are averaged.
Introduce the 14-dimensional vectors $\boldsymbol{m}_{i}=m(x_{i},y_{i})$ whose elements are the monomials ${x_{i}^{p}}{y_{i}^{q}}$, $1\le p+q\le 4$, of the coordinates of the observed points.
Evaluate the average $\overline{\boldsymbol{m}}$ and the sample covariance matrix $\varSigma _{m}$ of the vectors $\boldsymbol{m}_{i}$.
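A sketch of the empirical side of the construction. The ordering of the 14 monomials inside $\boldsymbol{m}_{i}$ and the $1/n$ normalization of the sample covariance matrix are our own choices for this illustration.

```python
import numpy as np

# exponent pairs (p, q) with 1 <= p + q <= 4, giving 14 monomials x^p * y^q
EXPONENTS = [(p, q) for s in range(1, 5) for p in range(s, -1, -1) for q in [s - p]]

def m_vec(x, y):
    """14-dimensional vector of monomials of an observed point."""
    return np.array([x**p * y**q for p, q in EXPONENTS])

def empirical_moments(xs, ys):
    """Average m-bar and sample covariance Sigma_m of the vectors m_i."""
    M = np.array([m_vec(x, y) for x, y in zip(xs, ys)])
    m_bar = M.mean(axis=0)
    Sigma_m = (M - m_bar).T @ (M - m_bar) / len(xs)   # 1/n normalization assumed
    return m_bar, Sigma_m
```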
Denote
\[f_{1}\big(k,h,{\sigma }^{2};{(\mu _{p})_{p=1}^{4}}\big)=\operatorname{\mathsf{E}}m(\xi +\delta ,k\xi +h+\varepsilon ),\]
where ξ, δ, and ε are independent random variables such that
\[\operatorname{\mathsf{E}}{\xi }^{q}=\mu _{q}\hspace{1em}\text{and}\hspace{1em}{(\delta ,\varepsilon )}^{\top }\sim N(0,{\sigma }^{2}\mathbf{I}).\]
Basically, the function $f_{1}$ is defined for all $\mu _{p}$, $p=1,\dots ,4$, that comprise possible 4-tuples of moments of a random variable, that is, satisfy
\[\begin{array}{c}\mu _{2}-{\mu _{1}^{2}}\ge 0,\\{} \big(\mu _{4}-4\mu _{3}\mu _{1}+6\mu _{2}{\mu _{1}^{2}}-3{\mu _{1}^{4}}\big)\big(\mu _{2}-{\mu _{1}^{2}}\big)-{\big(\mu _{3}-3\mu _{2}\mu _{1}+2{\mu _{1}^{3}}\big)}^{2}-{\big(\mu _{2}-{\mu _{1}^{2}}\big)}^{3}\ge 0;\end{array}\]
see [10]. However, since the elements of the vector-function $f_{1}$ are polynomials in its arguments, it can be extended to ${\mathbb{R}}^{7}$.
Denote
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle f_{2}\big(k_{1},h_{1},k_{2},h_{2},{\sigma }^{2};p,{\big({\mu _{q}^{(j)}}\big)_{j=1,}^{2}}{_{q=1}^{4}}\big)\\{} & \displaystyle \hspace{1em}=pf_{1}\big(k_{1},h_{1},{\sigma }^{2};{\big({\mu _{q}^{(1)}}\big)_{q=1}^{4}}\big)+(1-p)f_{1}\big(k_{2},h_{2},{\sigma }^{2};{\big({\mu _{q}^{(2)}}\big)_{q=1}^{4}}\big).\end{array}\]
In the structural model, $\operatorname{\mathsf{E}}\boldsymbol{m}_{1}=f_{2}\big(k_{1},h_{1},k_{2},h_{2},{\sigma }^{2};p,{({\mu _{q}^{(j)}})_{j=1,}^{2}}{_{q=1}^{4}}\big)$, where ${\mu _{q}^{(j)}}=\operatorname{\mathsf{E}}[{\xi _{1}^{q}}\mid \nu (1)=j]$.
Consider the equation
(33)
\[f_{2}\big(\hat{k}_{1},\hat{h}_{1},\hat{k}_{2},\hat{h}_{2},{\hat{\sigma }}^{2};\hat{p},{\big({\hat{\mu }_{q}^{(j)}}\big)_{j=1,}^{2}}{_{q=1}^{4}}\big)=\overline{\boldsymbol{m}}.\]
It is a system of 14 equations in 14 variables. If (33) has a solution, then the moment estimator can be defined as one of the solutions. However, (33) may have no solution.
In the rest of Section 2.4, ${\mu _{\bullet }^{(\bullet )}}={({\mu _{q}^{(j)}})_{j=1,}^{2}}{_{q=1}^{4}}$ is a $2\times 4$ matrix.
The estimator is defined as a point where ${(f_{2}(\dots )-\overline{\boldsymbol{m}})}^{\top }{\varSigma _{m}^{-1}}(f_{2}(\dots )-\overline{\boldsymbol{m}})$ attains its minimum:
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \big(& \displaystyle \hat{k}_{1},\hat{h}_{1},\hat{k}_{2},\hat{h}_{2},{\hat{\sigma }}^{2}\big)\\{} & \displaystyle =\underset{k_{1},\dots ,{\sigma }^{2}}{\operatorname{argmin}}\underset{p,{\mu _{\bullet }^{(\bullet )}}}{\min }{\big(f_{2}\big(k_{1},\dots ,{\sigma }^{2};p,{\mu _{\bullet }^{(\bullet )}}\big)\hspace{0.1667em}-\hspace{0.1667em}\overline{\boldsymbol{m}}\big)}^{\top }{\varSigma _{m}^{-1}}\big(f_{2}\big(k_{1},\dots ,{\sigma }^{2};p,{\mu _{\bullet }^{(\bullet )}}\big)\hspace{0.1667em}-\hspace{0.1667em}\overline{\boldsymbol{m}}\big).\end{array}\]
This minimization problem is similar to that in Theorem 6 in [4]. The minimum
\[\underset{p\in \mathbb{R}}{\min }\underset{{\mu _{\bullet }^{(\bullet )}}\in {\mathbb{R}}^{2\times 4}}{\min }{\big(f_{2}\big(k_{1},\dots ,{\sigma }^{2};p,{\mu _{\bullet }^{(\bullet )}}\big)-\overline{\boldsymbol{m}}\big)}^{\top }{\varSigma _{m}^{-1}}\big(f_{2}\big(k_{1},\dots ,{\sigma }^{2};p,{\mu _{\bullet }^{(\bullet )}}\big)-\overline{\boldsymbol{m}}\big)\]
can be evaluated explicitly, and this allows us to reduce the dimension of the minimization problem. The reduction of the dimension of the optimization problem was used, for example, in [8].
The routines evaluating the RBAN-moment estimator and the estimator of its covariance matrix are developed without a rigid theoretical basis; see Section 4.3.
3 Equivariance
3.1 Two definitions of equivariance
The similarity transformation of ${\mathbb{R}}^{2}$ is
(34)
\[g(\boldsymbol{z})=K\mathbf{U}\boldsymbol{z}+\Delta \boldsymbol{z},\hspace{1em}\boldsymbol{z}\in {\mathbb{R}}^{2},\]
where U is an orthogonal matrix, $K\ne 0$ is a scaling coefficient, and $\Delta \boldsymbol{z}\in {\mathbb{R}}^{2}$ is an intercept.
The transformation of a sample of points acting elementwise is also denoted $g(Z)$: if $Z=\{\boldsymbol{z}_{i},\hspace{0.2778em}i=1,\dots ,n\}$, then $g(Z)=\{g(\boldsymbol{z}_{i}),\hspace{0.2778em}i=1,\dots ,n\}$.
Hereafter, we use vector notation: the observed points are denoted $\boldsymbol{z}_{i}={(x_{i},y_{i})}^{\top }$, and the true points are denoted $\boldsymbol{\zeta }_{i}={(\xi _{i},\eta _{i})}^{\top }$.
The underlying statistical structure is $({\mathbb{R}}^{n\times 2},\mathcal{B}({\mathbb{R}}^{n\times 2}),P_{Z\mid \theta },\theta \in \varTheta )$, where $Z\in {\mathbb{R}}^{n\times 2}$ is the observed sample, $Z=\{\boldsymbol{z}_{i},\hspace{0.2778em}i=1,\dots ,n\}$, $\mathcal{B}({\mathbb{R}}^{n\times 2})$ is the Borel σ-field, and θ is a parameter that uniquely identifies the distribution of the observed points; $\theta =(\boldsymbol{\zeta }_{1},\dots ,\boldsymbol{\zeta }_{n};{\sigma }^{2})$ in the functional model, and $\theta =(P_{\zeta };{\sigma }^{2})$ in the structural model. Here $\boldsymbol{\zeta }_{1},\dots ,\boldsymbol{\zeta }_{n}$ are points located on two straight lines, and $P_{\zeta }$ is a probability measure concentrated on two straight lines.
The statistical structure is invariant with respect to the transformation g if the change of the probability measure induced by the transformation of the sample can be obtained by some transformation $\tilde{g}$ of the parameters, that is, if there exists a bijection $\tilde{g}:\varTheta \to \varTheta $ such that $P_{g(Z)\mid \theta }=P_{Z\mid \tilde{g}(\theta )}$ for all $\theta \in \varTheta $.
Here $P_{g(Z)\mid \theta }$ is the induced probability measure; it is sometimes denoted $P_{g(Z)\mid \theta }=P_{Z\mid \theta }{g}^{-1}$.
The statistical structure is similarity invariant if it is invariant with respect to all similarity transformations of the form (34).
In order to become similarity invariant, the underlying statistical structure needs some extension. We assume that the true points lie on two lines, which may be parallel to the y-axis. The following restrictions do not ruin the invariance:
• The true lines $\ell _{1}$ and $\ell _{2}$ intersect each other but do not coincide.
• The true points $\boldsymbol{\zeta }_{1},\dots ,\boldsymbol{\zeta }_{n}$ in the functional model or the set $\operatorname{supp}(P_{\zeta })$ where the true points are concentrated in the structural model can be covered with two lines uniquely. In the structural model, this means that the nonidentifiability conditions (13) and (14) do not hold.
With these restrictions, the statistical structure is invariant with
\[\tilde{g}(\theta )=\big(g(\boldsymbol{\zeta }_{1}),\dots ,g(\boldsymbol{\zeta }_{n});\hspace{0.1667em}{K}^{2}{\sigma }^{2}\big)\]
in the functional model and
\[\tilde{g}(\theta )=\big(P_{\zeta }{g}^{-1};\hspace{0.1667em}{K}^{2}{\sigma }^{2}\big)\]
in the structural model.
Let $\ell \ell (\theta )$ and $\operatorname{sigma2}(\theta )$ be functions that extract the parameters of interest. If, in the functional model, $\theta =(\boldsymbol{\zeta }_{1},\dots ,\boldsymbol{\zeta }_{n};{\sigma }^{2})$ and points $\boldsymbol{\zeta }_{1},\dots ,\boldsymbol{\zeta }_{n}$ lie on the lines $\ell _{1}$ and $\ell _{2}$ or if, in the structural model, $\theta =(P_{\zeta };{\sigma }^{2})$ and the probability measure is concentrated on the union of two lines $\ell _{1}\cup \ell _{2}$, then $\ell \ell (\theta )=\{\ell _{1},\ell _{2}\}$. If $\theta =(\dots ;{\sigma }^{2})$, then $\operatorname{sigma2}(\theta )={\sigma }^{2}$.
We treat $\{\ell _{1},\ell _{2}\}$ as an unordered couple, that is, $\{\ell _{1},\ell _{2}\}=\{\ell _{2},\ell _{1}\}$.
The transformation of the line parameters and the transformation of ${\sigma }^{2}$ do not interfere with each other, and neither is affected by the particular location or distribution of the true points on the lines; that is, the parameter transformation $\tilde{g}$ is such that there exist transformations $\tilde{g}_{\ell \ell }$ and $\tilde{g}_{{\sigma }^{2}}$ such that
\[\ell \ell \big(\tilde{g}(\theta )\big)=\tilde{g}_{\ell \ell }\big(\ell \ell (\theta )\big),\hspace{2em}\operatorname{sigma2}\big(\tilde{g}(\theta )\big)=\tilde{g}_{{\sigma }^{2}}\big(\operatorname{sigma2}(\theta )\big).\]
These transformations are
\[\tilde{g}_{\ell \ell }\big(\{\ell _{1},\ell _{2}\}\big)=\big\{ g(\ell _{1}),g(\ell _{2})\big\} \hspace{1em}\text{and}\hspace{1em}\tilde{g}_{{\sigma }^{2}}\big({\sigma }^{2}\big)={K}^{2}{\sigma }^{2}.\]
The estimator is called equivariant with respect to the transformation g if, when the data are transformed, the estimator follows the inducing transformation of the parameters. The estimator $\widehat{\ell \ell }(Z)$ for the two lines and the estimator ${\hat{\sigma }}^{2}(Z)$ for the error variance are equivariant with respect to a similarity transformation g if
(36)
\[\widehat{\ell \ell }\big(g(Z)\big)=\tilde{g}_{\ell \ell }\big(\widehat{\ell \ell }(Z)\big)\]
and
(37)
\[{\hat{\sigma }}^{2}\big(g(Z)\big)=\tilde{g}_{{\sigma }^{2}}\big({\hat{\sigma }}^{2}(Z)\big).\]
The estimator is called similarity equivariant if it is equivariant with respect to any similarity transformation g.
In a fitting problem, an estimator for a “true figure” is called fitting equivariant with respect to a transformation $g(\boldsymbol{z})$, $g:{\mathbb{R}}^{2}\to {\mathbb{R}}^{2}$, if, when the sample is transformed, the estimated “true figure” follows the same transformation g. An estimator is called similarity fitting equivariant if it is fitting equivariant with respect to any similarity transformation.
In the two-line fitting problem, denote by $\cup \{\ell _{1},\ell _{2}\}=\ell _{1}\cup \ell _{2}$ the union of a couple of lines. An estimator $\widehat{\ell \ell }(Z)$ is similarity fitting equivariant if and only if, for any similarity transformation $g(\boldsymbol{z})$,
(38)
\[\cup \widehat{\ell \ell }\big(g(Z)\big)=g\big(\cup \widehat{\ell \ell }(Z)\big).\]
A similarity fitting equivariant estimator depends on the geometry of the plane and does not depend on the Cartesian coordinate system used.
Because of (35), in the two-line fitting model, the estimator for two lines $\widehat{\ell \ell }(Z)$ is similarity equivariant if and only if it is similarity fitting equivariant.
3.2 Similarity equivariance of the five estimators
Some difficulties that may arise during estimation have not been addressed yet.
• The estimation may fail with small positive probability. For example, the conic section estimated with the ALS2 estimator is an ellipse with some positive probability, and if it is, then the “ignore-$\widehat{F}$” estimator fails. (If the estimator is consistent, then the failure probability tends to 0 as $n\to \infty $.)
• The estimation may fail, for example, because the estimated line should be parallel to the y-axis, but the estimating procedure does not handle such a case.
• The optimization problem may have multiple extremal points. For the ALS2 estimator, it may occur that $\dim \{\beta :\varPsi _{n}({\hat{\sigma }}^{2})\beta =0\}>1$.
In order to define the equivariance of an unreliable estimator, we allow the estimators to fail simultaneously on both sides of (36), (37), or (38). Also, we allow that, for a fixed similarity transformation $g(\boldsymbol{z})$, equation (36), (37), or (38) fails on an event of probability 0.
The equivariance of the ALS2 estimator in the conic section fitting problem is verified in [11, Section 5.5] (see Theorem 30 there for similarity fitting equivariance). That implies the equivariance of the “ignore-$\widehat{F}$” estimator.
In order to make the estimator with the one-step update before the ignore-$\widehat{F}$ step equivariant, we use the normalization (9) of the ALS2 estimator rather than $\| \hat{\boldsymbol{\beta }}\| =1$.
The orthogonal regression estimator and the parametric maximum likelihood estimator are maximum likelihood estimators, but in different models. Thus, they are equivariant.
The criterion function for the RBAN-moment estimator is similarity invariant. This means that the criterion function does not change when the data sample follows a similarity transformation and the parameters follow the inducing transformation. Thus, the RBAN-moment estimator is equivariant.
3.3 An example of equivariant but not fitting equivariant estimator
Consider a further restriction of the mixture-of-two-normal-distributions model from Section 2.3. Assume that the covariance matrices $\varSigma _{1}$ and $\varSigma _{2}$ have the same diagonal entries but opposite off-diagonal entries:
\[\varSigma _{1}=\left(\begin{array}{c@{\hskip10.0pt}c}{\sigma _{\xi }^{2}}+{\sigma }^{2}& -k{\sigma _{\xi }^{2}}\\{} -k{\sigma _{\xi }^{2}}& {k}^{2}{\sigma _{\xi }^{2}}+{\sigma }^{2}\end{array}\right)\hspace{1em}\text{and}\hspace{1em}\varSigma _{2}=\left(\begin{array}{c@{\hskip10.0pt}c}{\sigma _{\xi }^{2}}+{\sigma }^{2}& k{\sigma _{\xi }^{2}}\\{} k{\sigma _{\xi }^{2}}& {k}^{2}{\sigma _{\xi }^{2}}+{\sigma }^{2}\end{array}\right).\]
The statistical structure is invariant under scaling of the y-coordinate, $(x_{\mathrm{new}},y_{\mathrm{new}})=(x_{\mathrm{old}},\hspace{0.2222em}ry_{\mathrm{old}})$, $r>0$. This transformation maps the lines $y=-kx+h_{1}$ and $y=kx+h_{2}$ onto the lines $y=-rkx+rh_{1}$ and $y=rkx+rh_{2}$, respectively. The maximum likelihood estimator in this model is equivariant. However, this equivariance is somewhat strange. The transformation of the parameters that induces the scaling of the y-coordinate of the observed points induces neither the same transformation of the true points nor the same mapping of the true lines. The estimated lines follow the transformation of the parameters rather than the transformation of the observed points. This is illustrated in Fig. 1.
Fig. 1.
Two samples, one of points $(x,y_{\mathrm{old}})$ (the unmodified sample) and one of points $(x,y_{\mathrm{new}})=(x,\frac{1}{2}y_{\mathrm{old}})$ (the shrunken-in-y sample), are fitted with two lines (the estimated lines are the solid lines on the figures). The estimated lines for the unmodified sample (blue solid lines on the left figure) when scaled with the same transformation as the observed points are scaled (blue dashed line on the right figure) do not coincide with the actually estimated lines for the shrunken-in-y sample (red solid lines on the right figure). The ellipsoids are the 40% ellipsoids of the estimated normal distributions (the compound distributions of the estimated mixture distributions) (color figure online)
Let $k_{\mathrm{old}}$ be the true value of the parameter k before the transformation. Then after the transformation, the value of the parameter is
\[k_{\mathrm{new}}=\frac{t+\sqrt{{t}^{2}+4{r}^{2}{k_{\mathrm{old}}^{2}}{\sigma _{\xi \hspace{0.1667em}\mathrm{old}}^{4}}}}{2r{k_{\mathrm{old}}^{}}{\sigma _{\xi \hspace{0.1667em}\mathrm{old}}^{2}}}\]
with $t=({r}^{2}{k_{\mathrm{old}}^{2}}-1)\hspace{0.1667em}{\sigma _{\xi \hspace{0.1667em}\mathrm{old}}^{2}}+({r}^{2}-1)\hspace{0.1667em}{\sigma _{\mathrm{old}}^{2}}$. If $0<{r}^{2}\ne 1$, $k_{\mathrm{old}}\ne 0$, and ${\sigma }^{2}>0$, then $k_{\mathrm{new}}\ne rk_{\mathrm{old}}$. Hence, the maximum likelihood estimator is not fitting equivariant with respect to scaling of the y-coordinate here.
4 Simulations
4.1 Simulation setup
A sample of the true points $(\xi _{i},\eta _{i})$, $i=1,\dots ,n$, is generated from a distribution concentrated on (a subset of) two lines. Three distributions of the true points are used: a mixture of two singular normal distributions, a discrete distribution, and a uniform distribution on two line segments; see Fig. 2.
Fig. 2.
Three distributions of the true points: a mixture of two singular normal distributions, a discrete distribution, and a uniform distribution on two line segments. For the first case, a sample of 1000 points is plotted, whereas for the second and third cases, the support of the distribution of the true points is plotted. For the first case, the distribution of the observed points is a mixture of normal distributions, and 40% ellipsoids for its components are plotted
These three distributions of the true points are concentrated on the same two lines, which intersect one another at the point $(-0.08,0.31)$.
For the same sample of true points $\{(\xi _{i},\eta _{i}),\hspace{0.2778em}i=1,\dots ,n\}$, 100 samples of the measurement errors $\{(\delta _{i},\varepsilon _{i}),\hspace{0.2778em}i=1,\dots ,n\}$, ${(\delta _{i},\varepsilon _{i})}^{\top }\sim N(0,{\sigma }^{2}\mathbf{I})$, are simulated, and 100 samples of the observed points $(x_{i},y_{i})$ are obtained; see (2) and (3). For each sample of the observed points, the estimates of the parameters of the true lines are evaluated with the following five methods: two ALS2-based estimators (the ignore-$\widehat{F}$ estimator and the estimator with the one-step update of the ALS2 estimator before the ignore-$\widehat{F}$ step), the orthogonal regression estimator, the parametric maximum likelihood estimator, and the RBAN moment estimator.
For each estimated couple of lines, the point of their intersection is found. The 100 estimates of intersection points are averaged, and their sample standard deviations are evaluated. For the ALS2-based estimators and the RBAN moment estimator, the standard errors of the estimators are also evaluated.
4.2 Notes on computation of particular estimators
For computation of the orthogonal regression estimator, the k-means method is used. Initially, two lines are chosen randomly. Then classification and means steps are alternated. On the classification step, the observed points are split into two clusters based on which line is closer to the point. (The first cluster contains all the observed points that are closer to the first line than to the second line, and the second cluster contains the other observed points.) On the means step, each cluster is fitted with a straight line by the orthogonal regression method (the two lines are updated). The algorithm terminates when the classification step does not change the clusters. The obtained parameters of the two lines deliver a local minimum of the criterion function $Q(k_{1},h_{1},k_{2},h_{2})$ (24). In an attempt to reach the global minimum, the algorithm is restarted several times with different initial lines.
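A sketch of this alternating procedure. Here fit_line_tls performs the orthogonal (total least squares) fit of a single line to one cluster via the principal axis through the cluster centroid; lines are represented by $(k,h)$, so lines parallel to the y-axis are not handled, in line with Remark 1, and degenerate (empty) clusters are left to the random restarts mentioned above.

```python
import numpy as np

def fit_line_tls(xs, ys):
    """Orthogonal regression fit of a single line y = k*x + h (principal axis)."""
    cx, cy = xs.mean(), ys.mean()
    pts = np.column_stack([xs - cx, ys - cy])
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    dx, dy = vt[0]                       # direction of the largest principal axis
    k = dy / dx                          # fails for a vertical line (dx == 0)
    return k, cy - k * cx

def dist2_to_line(xs, ys, k, h):
    """Squared orthogonal distances from the points to the line y = k*x + h."""
    return (k * xs + h - ys)**2 / (1 + k**2)

def or_two_lines(xs, ys, k1, h1, k2, h2, max_iter=100):
    """Alternate classification and orthogonal line fits (k-means-style)."""
    labels = None
    for _ in range(max_iter):
        new_labels = dist2_to_line(xs, ys, k1, h1) > dist2_to_line(xs, ys, k2, h2)
        if labels is not None and np.array_equal(new_labels, labels):
            break                        # classification no longer changes the clusters
        labels = new_labels
        k1, h1 = fit_line_tls(xs[~labels], ys[~labels])
        k2, h2 = fit_line_tls(xs[labels], ys[labels])
    return (k1, h1), (k2, h2)
```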
For computation of the parametric maximum likelihood estimator, the expectation–maximization (EM) algorithm [2] is used. The equations for the optimization problem of finding a minimum of the likelihood function $L(p,\boldsymbol{\mu }_{1},\varSigma _{1},\boldsymbol{\mu }_{2},\varSigma _{2})$ (28) under the constraint $\lambda _{\min }(\varSigma _{1})=\lambda _{\min }(\varSigma _{2})$ are
\[\sum \limits_{i=1}^{n}\frac{\phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})-\phi _{N(\mu _{2},\varSigma _{2})}(x_{i},y_{i})}{p\phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})+(1-p)\phi _{N(\mu _{2},\varSigma _{2})}(x_{i},y_{i})}=0,\]
\[\sum \limits_{i=1}^{n}\frac{p\frac{\partial \phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})}{\partial \textit{par}}+(1-p)\frac{\partial \phi _{N(\mu _{2},\varSigma _{2})}(x_{i},y_{i})}{\partial \textit{par}}}{p\phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})+(1-p)\phi _{N(\mu _{2},\varSigma _{2})}(x_{i},y_{i})}=0,\]
where $\textit{par}$ is a vector parameterization of $(\boldsymbol{\mu }_{1},\varSigma _{1},\boldsymbol{\mu }_{2},\varSigma _{2})$ such that $\lambda _{\min }(\varSigma _{1})=\lambda _{\min }(\varSigma _{2})$. Hence, the maximum likelihood estimator is a stationary point of the function
\[\sum \limits_{i=1}^{n}\big(w_{i}\ln \big(p\phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})\big)+(1-w_{i})\ln \big((1-p)\phi _{N(\mu _{2},\varSigma _{2})}(x_{i},y_{i})\big)\big)\]
with fixed
\[w_{i}=\frac{p\phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})}{p\phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})+(1-p)\phi _{N(\mu _{2},\varSigma _{2})}(x_{i},y_{i})}.\]
The EM algorithm is iterative. Once the $(m-1)$th approximation $({p}^{(m-1)},{\mu _{1}^{(m-1)}},{\varSigma _{1}^{(m-1)}},{\mu _{2}^{(m-1)}},{\varSigma _{2}^{(m-1)}})$ is obtained, the weights are evaluated:
\[{w_{i}^{(m-1)}}=\frac{{p}^{(m-1)}\phi _{N({\mu _{1}^{(m-1)}},{\varSigma _{1}^{(m-1)}})}(x_{i},y_{i})}{{p}^{(m-1)}\phi _{N({\mu _{1}^{(m-1)}},{\varSigma _{1}^{(m-1)}})}(x_{i},y_{i})+(1-{p}^{(m-1)})\phi _{N({\mu _{2}^{(m-1)}},{\varSigma _{2}^{(m-1)}})}(x_{i},y_{i})}.\]
Then the mth approximation $({p}^{(m)},{\mu _{1}^{(m)}},{\varSigma _{1}^{(m)}},{\mu _{2}^{(m)}},{\varSigma _{2}^{(m)}})$ is obtained by minimizing
\[\sum \limits_{i=1}^{n}\big({w_{i}^{(m-1)}}\ln \big(p\phi _{N(\mu _{1},\varSigma _{1})}(x_{i},y_{i})\big)+\big(1-{w_{i}^{(m-1)}}\big)\ln \big((1-p)\phi _{N(\mu _{2},\varSigma _{2})}(x_{i},y_{i})\big)\big)\]
under the constraint $\lambda _{\min }(\varSigma _{1})=\lambda _{\min }(\varSigma _{2})$. The point where the minimum is attained can be explicitly expressed in terms of ${w_{i}^{(m-1)}}$, $x_{i}$, and $y_{i}$, $i=1,\dots ,n$.
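The weight computation (the E-step) above translates directly into code; the constrained M-step is not reproduced here. A minimal sketch, using scipy.stats.multivariate_normal for the bivariate normal density φ:

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step_weights(points, p, mu1, Sigma1, mu2, Sigma2):
    """Posterior weights w_i of the first mixture component for an (n, 2) array of points."""
    f1 = p * multivariate_normal.pdf(points, mean=mu1, cov=Sigma1)
    f2 = (1 - p) * multivariate_normal.pdf(points, mean=mu2, cov=Sigma2)
    return f1 / (f1 + f2)
```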
4.3 RBAN-moment estimator
In case the criterion function
\[\begin{array}{r@{\hskip0pt}l}\displaystyle Q(\theta )& \displaystyle =Q\big(k_{1},h_{1},k_{2},h_{2},{\sigma }^{2}\big)\\{} & \displaystyle =\underset{p\in \mathbb{R}}{\min }\underset{\mathbf{M}\in {\mathbb{R}}^{2\times 4}}{\min }{\big(f_{2}(\theta ;p,\mathbf{M})-\overline{\boldsymbol{m}}\big)}^{\top }{\varSigma _{m}^{-1}}\big(f_{2}(\theta ;p,\mathbf{M})-\overline{\boldsymbol{m}}\big)\end{array}\]
has multiple minima, a consistent estimator (namely, the “ignore-$\widehat{F}$” estimator) is used as the initial point, and the criterion function $Q(\theta )$ is searched for a local minimum nearby. Here $\theta ={(k_{1},h_{1},k_{2},h_{2},{\sigma }^{2})}^{\top }$ is the vector of parameters of interest.
The knowledge or misspecification of the parameter p does not affect the estimator of the parameters of interest $k_{1}$, …, ${\sigma }^{2}$. Thus, for estimation of the asymptotic covariance matrix, we assume $p=0.5$ to be known. The estimator of the asymptotic covariance matrix of $(\theta ,\mathbf{M})$ is
\[\widehat{\varSigma }_{\theta ,\mathbf{M}}=\big({f^{\prime }_{2}}{(\hat{\theta };\hspace{0.1667em}0.5,\widehat{\mathbf{M}})}^{\top }{\varSigma _{m}^{-1}}{f^{\prime }_{2}}(\hat{\theta };\hspace{0.1667em}0.5,\widehat{\mathbf{M}})\big){}^{-1},\]
where
\[{f^{\prime }_{2}}(\theta ;0.5,\mathbf{M})=\left(\begin{array}{c@{\hskip10.0pt}c}\frac{\partial f_{2}(\theta ;\hspace{0.1667em}0.5,\mathbf{M})}{\partial {\theta }^{\top }},& \frac{\partial f_{2}(\theta ;\hspace{0.1667em}0.5,\mathbf{M})}{\partial {(\operatorname{vec}\mathbf{M})}^{\top }}\end{array}\right).\]
The estimator of the asymptotic covariance matrix of θ is the corresponding principal submatrix of $\widehat{\varSigma }_{\theta ,\mathbf{M}}$.
4.4 Simulation results
Averages of the estimated intersection points over 100 simulations, standard deviations over 100 simulations, and medians of the estimated standard errors are presented in Tables 1–3.
Table 1.
Means, standard deviations, and median standard errors of the estimates of intersection points for true points having mixture of singular normal distributions
Method | Mean (x) | Mean (y) | St. dev. (x) | St. dev. (y) | St. error (x) | St. error (y)
True value | −0.08 | 0.31 | ||||
$n=1000,\hspace{2em}\sigma =0.1\hspace{1em}({\sigma }^{2}=0.01)$ | ||||||
Ignore-$\widehat{F}$ | −0.1098 | 0.2753 | 0.8611 | 0.3866 | 0.1918 | 0.1125 |
Update | −0.0820 | 0.2912 | 0.0706 | 0.0620 | 0.0437 | 0.0479 |
OR | 0.6533 | 3.4524 | 0.0877 | 0.6783 | ||
ML | −0.0795 | 0.3077 | 0.0326 | 0.0269 | ||
RBAN | 0.0647 | 0.3759 | 0.3563 | 0.2606 | 0.0350 | 0.0438 |
$n=10000,\hspace{2em}\sigma =0.1\hspace{1em}({\sigma }^{2}=0.01)$ | ||||||
Ignore-$\widehat{F}$ | −0.0909 | 0.3052 | 0.0646 | 0.0308 | 0.0601 | 0.0303 |
Update | −0.0796 | 0.3080 | 0.0127 | 0.0156 | 0.0124 | 0.0155 |
OR | 0.5492 | 3.1488 | 0.0175 | 0.1701 | ||
ML | −0.0776 | 0.3083 | 0.0103 | 0.0088 | ||
RBAN | −0.0789 | 0.3100 | 0.0126 | 0.0154 | 0.0127 | 0.0154 |
$n=100000,\hspace{2em}\sigma =0.1$ | ||||||
Ignore-$\widehat{F}$ | −0.0799 | 0.3101 | 0.0211 | 0.0095 | 0.0188 | 0.0093 |
Update | −0.0801 | 0.3098 | 0.0037 | 0.0042 | 0.0039 | 0.0047 |
OR | 0.5606 | 3.2041 | 0.0063 | 0.0484 | ||
ML | −0.0801 | 0.3101 | 0.0030 | 0.0025 | ||
RBAN | −0.0801 | 0.3101 | 0.0038 | 0.0042 | 0.0039 | 0.0048 |
$n=1000,\hspace{2em}\sigma =0.02$ | ||||||
Ignore-$\widehat{F}$ | −0.0799 | 0.3099 | 0.0151 | 0.0075 | 0.0147 | 0.0072 |
Update | −0.0795 | 0.3097 | 0.0052 | 0.0051 | 0.0052 | 0.0049 |
OR | −0.0792 | 0.3092 | 0.0050 | 0.0044 | ||
ML | −0.0794 | 0.3093 | 0.0048 | 0.0043 | ||
RBAN | −0.0797 | 0.3098 | 0.0063 | 0.0057 | 0.0052 | 0.0049 |
Table 2.
Means, standard deviations, and median standard errors of the estimates of intersection points for discrete distribution of the true points
Method | Mean (x) | Mean (y) | St. dev. (x) | St. dev. (y) | St. error (x) | St. error (y)
True value | −0.08 | 0.31 | ||||
$n=1000,\hspace{2em}\sigma =0.1$ | ||||||
Ignore-$\widehat{F}$ | −0.0699 | 0.3077 | 0.0263 | 0.0290 | 0.0241 | 0.0263 |
Update | −0.0722 | 0.3116 | 0.0197 | 0.0186 | 0.0203 | 0.0175 |
OR | −0.0755 | 0.3188 | 0.0148 | 0.0144 | ||
ML | −0.0958 | 0.3315 | 0.0131 | 0.0120 | ||
RBAN | −0.0717 | 0.3105 | 0.0209 | 0.0175 | 0.0205 | 0.0178 |
$n=10000,\hspace{2em}\sigma =0.1$ | ||||||
Ignore-$\widehat{F}$ | −0.0783 | 0.3109 | 0.0092 | 0.0078 | 0.0080 | 0.0083 |
Update | −0.0785 | 0.3114 | 0.0071 | 0.0054 | 0.0065 | 0.0061 |
OR | −0.0721 | 0.3157 | 0.0048 | 0.0046 | ||
ML | −0.0931 | 0.3278 | 0.0043 | 0.0035 | ||
RBAN | −0.0786 | 0.3113 | 0.0071 | 0.0053 | 0.0065 | 0.0061 |
$n=100000,\hspace{2em}\sigma =0.1$ | ||||||
Ignore-$\widehat{F}$ | −0.0798 | 0.3098 | 0.0031 | 0.0024 | 0.0026 | 0.0027 |
Update | −0.0799 | 0.3099 | 0.0025 | 0.0016 | 0.0021 | 0.0019 |
OR | −0.0715 | 0.3151 | 0.0017 | 0.0013 | ||
ML | −0.0932 | 0.3283 | 0.0013 | 0.0011 | ||
RBAN | −0.0799 | 0.3099 | 0.0024 | 0.0017 | 0.0021 | 0.0019 |
$n=1000,\hspace{2em}\sigma =0.02$ | ||||||
Ignore-$\widehat{F}$ | −0.0796 | 0.3094 | 0.0033 | 0.0032 | 0.0036 | 0.0033 |
Update | −0.0798 | 0.3097 | 0.0030 | 0.0024 | 0.0033 | 0.0023 |
OR | −0.0782 | 0.3086 | 0.0021 | 0.0018 | ||
ML | −0.0786 | 0.3087 | 0.0019 | 0.0018 | ||
RBAN | −0.0796 | 0.3092 | 0.0030 | 0.0028 | 0.0033 | 0.0024 |
Table 3.
Means, standard deviations, and median standard errors of the estimates of the intersection points for a uniform distribution of the true points on two line segments
Method | Mean ($x$) | Mean ($y$) | Std. dev. ($x$) | Std. dev. ($y$) | Std. err. ($x$) | Std. err. ($y$)
True value | −0.08 | 0.31 | ||||
$n=1000,\hspace{2em}\sigma =0.1$ | ||||||
Ignore-$\widehat{F}$ | −0.0785 | 0.3122 | 0.0363 | 0.0274 | 0.0318 | 0.0301 |
Update | −0.0794 | 0.3140 | 0.0216 | 0.0258 | 0.0205 | 0.0290 |
OR | −0.0616 | 0.3127 | 0.0185 | 0.0167 | ||
ML | −0.0934 | 0.3118 | 0.0116 | 0.0111 | ||
RBAN | −0.0807 | 0.3126 | 0.0219 | 0.0293 | 0.0193 | 0.0292 |
$n=10000,\hspace{2em}\sigma =0.1$ | ||||||
Ignore-$\widehat{F}$ | −0.0796 | 0.3107 | 0.0103 | 0.0103 | 0.0099 | 0.0095 |
Update | −0.0796 | 0.3110 | 0.0067 | 0.0103 | 0.0065 | 0.0094 |
OR | −0.0639 | 0.3087 | 0.0064 | 0.0049 | ||
ML | −0.0904 | 0.3106 | 0.0042 | 0.0033 | ||
RBAN | −0.0797 | 0.3107 | 0.0066 | 0.0104 | 0.0064 | 0.0095 |
$n=100000,\hspace{2em}\sigma =0.1$ | ||||||
Ignore-$\widehat{F}$ | −0.0798 | 0.3098 | 0.0035 | 0.0030 | 0.0032 | 0.0030 |
Update | −0.0798 | 0.3098 | 0.0021 | 0.0029 | 0.0020 | 0.0030 |
OR | −0.0625 | 0.3085 | 0.0015 | 0.0014 | ||
ML | −0.0891 | 0.3107 | 0.0012 | 0.0011 | ||
RBAN | −0.0796 | 0.3097 | 0.0023 | 0.0030 | 0.0020 | 0.0030 |
$n=1000,\hspace{2em}\sigma =0.02$ | ||||||
Ignore-$\widehat{F}$ | −0.0799 | 0.3100 | 0.0041 | 0.0032 | 0.0041 | 0.0035 |
Update | −0.0798 | 0.3101 | 0.0033 | 0.0032 | 0.0032 | 0.0034 |
OR | −0.0803 | 0.3103 | 0.0023 | 0.0021 | ||
ML | −0.0805 | 0.3101 | 0.0022 | 0.0020 | ||
RBAN | −0.0798 | 0.3100 | 0.0035 | 0.0033 | 0.0032 | 0.0034 |
Using the estimator $\widehat{F}$ (via the one-step update before the ignore-$\widehat{F}$ step) improves the precision of estimation. The precision of the RBAN-moment estimator is close to that of the estimator updated before the ignore-$\widehat{F}$ step.
The parametric maximum likelihood estimator performs best when the normality assumption used in its construction is satisfied; otherwise, it is biased.
The orthogonal regression and maximum likelihood estimators perform well for small error variance (${\sigma }^{2}={0.02}^{2}$). For ${\sigma }^{2}={0.1}^{2}$, the orthogonal regression estimator breaks down when the distribution of the true points is a mixture of two normal distributions and is biased for the two other distributions of the true points.
The mean-square distance of the intersection point of the estimated lines from the true intersection point is presented in Table 4.
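Here the mean-square distance is understood as the root-mean-square Euclidean distance over the $N=100$ simulation runs (our reading; the exact convention is not spelled out in the table):
\[d=\sqrt{\frac{1}{N}\sum \limits_{j=1}^{N}{\big\| {\widehat{\mathbf{z}}}^{(j)}-\mathbf{z}\big\| }^{2}},\hspace{1em}N=100,\]
where ${\widehat{\mathbf{z}}}^{(j)}$ is the intersection point of the lines estimated in the j-th run and $\mathbf{z}={(-0.08,0.31)}^{\top }$ is the true intersection point.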
Table 4.
Mean-square distances between estimated and true intersection points
n | σ | Ignore-$\widehat{F}$ | Update | OR | ML | RBAN |
Distribution of true points is a mixture of normals | ||||||
1000 | 0.1 | 0.9403 | 0.0954 | 3.2978 | 0.0421 | 0.6124 |
10000 | 0.1 | 0.0722 | 0.0201 | 2.9127 | 0.0138 | 0.0199 |
100000 | 0.1 | 0.0230 | 0.0056 | 2.9645 | 0.0038 | 0.0056 |
1000 | 0.02 | 0.0168 | 0.0073 | 0.0067 | 0.0065 | 0.0084 |
Discrete distribution of true points | ||||||
1000 | 0.1 | 0.0403 | 0.0281 | 0.0228 | 0.0320 | 0.0284 |
10000 | 0.1 | 0.0121 | 0.0091 | 0.0118 | 0.0228 | 0.0090 |
100000 | 0.1 | 0.0039 | 0.0029 | 0.0101 | 0.0226 | 0.0029 |
1000 | 0.02 | 0.0046 | 0.0038 | 0.0036 | 0.0032 | 0.0042 |
Uniform distribution of true points | ||||||
1000 | 0.1 | 0.0453 | 0.0367 | 0.0310 | 0.0209 | 0.0365 |
10000 | 0.1 | 0.0145 | 0.0123 | 0.0181 | 0.0117 | 0.0123 |
100000 | 0.1 | 0.0046 | 0.0036 | 0.0177 | 0.0093 | 0.0037 |
1000 | 0.02 | 0.0052 | 0.0046 | 0.0031 | 0.0030 | 0.0048 |
For small errors, the RBAN-moment estimator is slightly less accurate than the estimator updated before the ignore-$\widehat{F}$ step. For ${\sigma }^{2}={0.1}^{2}$, the difference is negligible.
The parametric maximum likelihood estimator has the smallest deviation from the true value in most cases; the exceptions in Table 4 are ${\sigma }^{2}={0.1}^{2}$ with the discrete distribution of the true points and, for the largest sample size, with the uniform distribution.
For small errors (${\sigma }^{2}={0.02}^{2}$), the orthogonal regression estimator outperforms the consistent estimators, and its deviation is almost as small as that of the parametric maximum likelihood estimator.
The normalization of the estimator of $\boldsymbol{\beta }$ affects the ALS2-based estimator of the two lines with a one-step update before the ignore-$\widehat{F}$ step. With the normalization $\| \hat{\boldsymbol{\beta }}\| =1$, the derived estimator of the two lines is not equivariant, whereas with the normalization ${\tilde{\boldsymbol{\beta }}}^{\top }{\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})\tilde{\boldsymbol{\beta }}=-n$ it is equivariant; the rescaling is sketched below. A comparison of the equivariant and nonequivariant versions of the estimator is presented in Table 5.
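A minimal sketch of the equivariant normalization step, assuming the matrix ${\boldsymbol{\varPsi }^{\prime }_{n}}({\hat{\sigma }}^{2})$ is available as an array Psi_prime and that the quadratic form at the current estimate is negative (a hypothetical interface, not the actual implementation):

    import numpy as np

    def rescale_beta(beta_hat, Psi_prime, n):
        # Rescale beta_hat so that beta^T Psi'_n(sigma_hat^2) beta = -n.
        # Requires beta_hat^T Psi_prime beta_hat < 0; otherwise the normalization
        # cannot be met by a real rescaling.
        q = float(beta_hat @ Psi_prime @ beta_hat)
        if q >= 0.0:
            raise ValueError("quadratic form is nonnegative; cannot normalize to -n")
        return np.sqrt(-float(n) / q) * beta_hat

Multiplying $\hat{\boldsymbol{\beta }}$ by a positive scalar does not change the fitted curve itself; the normalization matters because the subsequent steps of the procedure are not invariant to it, which is presumably why the two versions in Table 5 differ in finite samples.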
Table 5.
Comparison of two versions (equivariant (ev) and nonequivariant (ne)) of the updated before ignore-$\widehat{F}$ step estimator
n | σ | Ver. | Mean ($x$) | Mean ($y$) | Std. dev. ($x$) | Std. dev. ($y$) | Std. err. ($x$) | Std. err. ($y$)
True value: | −0.08 | 0.31 | ||||||
Distribution of true points is a mixture of normals | ||||||||
1000 | 0.1 | ev | −0.082046 | 0.291175 | 0.070617 | 0.062003 | 0.043713 | 0.047939 |
ne | −0.044038 | 0.247382 | 0.251372 | 0.514473 | 0.038853 | 0.050167 | ||
10000 | 0.1 | ev | −0.079623 | 0.308039 | 0.012652 | 0.015582 | 0.012403 | 0.015471 |
ne | −0.085055 | 0.304177 | 0.015924 | 0.018695 | 0.012506 | 0.015550 | ||
100000 | 0.1 | ev | −0.080137 | 0.309780 | 0.003710 | 0.004173 | 0.003925 | 0.004749 |
ne | −0.080991 | 0.309386 | 0.003880 | 0.004255 | 0.003926 | 0.004742 | ||
1000 | 0.02 | ev | −0.079548 | 0.309703 | 0.005156 | 0.005131 | 0.005174 | 0.004891 |
ne | −0.079918 | 0.309508 | 0.005247 | 0.005149 | 0.005179 | 0.004918 | ||
Discrete distribution of true points | ||||||||
1000 | 0.1 | ev | −0.072202 | 0.311648 | 0.019709 | 0.018553 | 0.020266 | 0.017500 |
ne | −0.071460 | 0.312230 | 0.020049 | 0.018740 | 0.020213 | 0.017457 | ||
10000 | 0.1 | ev | −0.078482 | 0.311377 | 0.007087 | 0.005371 | 0.006518 | 0.006066 |
ne | −0.078418 | 0.311436 | 0.007090 | 0.005387 | 0.006520 | 0.006054 | ||
100000 | 0.1 | ev | −0.079868 | 0.309929 | 0.002460 | 0.001647 | 0.002060 | 0.001900 |
ne | −0.079863 | 0.309934 | 0.002461 | 0.001647 | 0.002060 | 0.001901 | ||
1000 | 0.02 | ev | −0.079772 | 0.309728 | 0.002967 | 0.002376 | 0.003320 | 0.002344 |
ne | −0.079755 | 0.309732 | 0.002963 | 0.002375 | 0.003319 | 0.002339 | ||
Uniform distribution of true points | ||||||||
1000 | 0.1 | ev | −0.079405 | 0.313977 | 0.021551 | 0.025759 | 0.020451 | 0.028992 |
ne | −0.078507 | 0.315350 | 0.022219 | 0.026091 | 0.020512 | 0.029115 | ||
10000 | 0.1 | ev | −0.079604 | 0.311024 | 0.006673 | 0.010337 | 0.006467 | 0.009389 |
ne | −0.079576 | 0.311176 | 0.006685 | 0.010349 | 0.006456 | 0.009372 | ||
100000 | 0.1 | ev | −0.079795 | 0.309802 | 0.002075 | 0.002919 | 0.001974 | 0.002994 |
ne | −0.079794 | 0.309818 | 0.002076 | 0.002921 | 0.001972 | 0.002992 | ||
1000 | 0.02 | ev | −0.079833 | 0.310081 | 0.003252 | 0.003249 | 0.003172 | 0.003418 |
ne | −0.079825 | 0.310097 | 0.003250 | 0.003249 | 0.003169 | 0.003411 |
Table 6.
Coverage probability and area of confidence ellipsoids (c.e.) for centers by the ALS2 estimator
n | σ | $\widehat{\varSigma }_{\mathrm{true}}$-based estimator: coverage 80%, % | coverage 95%, % | area of 95% c.e., $\times {10}^{-4}$ | $\widehat{\varSigma }_{\mathrm{sample}}$-based estimator: coverage 80%, % | coverage 95%, % | area of 95% c.e., $\times {10}^{-4}$
Distribution of true points is a mixture of normals | |||||||
1000 | 0.1 | 70.6 | 80.2 | 1449. | 70.0 | 79.2 | 1562. |
10000 | 0.1 | 79.4 | 93.8 | 236.2 | 79.6 | 92.9 | 259.4 |
100000 | 0.1 | 80.7 | 94.9 | 15.38 | 80.6 | 94.9 | 15.17 |
1000 | 0.02 | 80.4 | 95.1 | 15.95 | 78.0 | 94.1 | 19.86 |
Discrete distribution of true points | |||||||
1000 | 0.1 | 78.1 | 93.9 | 81.80 | 77.4 | 93.4 | 78.94 |
10000 | 0.1 | 81.1 | 95.6 | 12.34 | 80.9 | 95.8 | 12.39 |
100000 | 0.1 | 80.1 | 94.6 | 1.205 | 79.9 | 94.7 | 1.204 |
1000 | 0.02 | 81.0 | 94.9 | 1.984 | 81.3 | 95.2 | 1.988 |
Uniform distribution of true points | |||||||
1000 | 0.1 | 82.1 | 94.3 | 152.9 | 81.5 | 94.3 | 138.4 |
10000 | 0.1 | 81.0 | 96.6 | 20.36 | 80.3 | 96.3 | 18.92 |
100000 | 0.1 | 78.4 | 95.0 | 1.823 | 78.6 | 95.0 | 1.842 |
1000 | 0.02 | 78.7 | 94.7 | 2.926 | 78.5 | 94.1 | 3.041 |
There is a tendency for the equivariant version of the estimator to be more accurate in small samples than the nonequivariant version. The two versions of the estimator are consistent and asymptotically equivalent. When the estimation is precise, the difference between the versions is negligible; when it is imprecise, no conclusion can be drawn as to which version is more accurate.
4.5 Comparison of two estimators for asymptotic covariance matrix in the conic section fitting model
In [13] a conic section fitting model is considered, and two estimators ($\widehat{\varSigma }_{\mathrm{true}}$ and $\widehat{\varSigma }_{\mathrm{sample}}$) for the asymptotic covariance matrix of the ALS2 estimator are constructed.
The software developed here can be used for a numerical comparison of the estimates of the asymptotic covariance matrices. The data are generated as described in Section 4.1, with 1000 simulations for each set of true points; hence the true conic is degenerate (a pair of lines). For each simulation, the parameters of the conic section were estimated, its center was found, and two confidence ellipsoids for the center were constructed using the two different estimators of the asymptotic covariance matrix.
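For reference, the center of a conic is the stationary point of the quadratic polynomial and solves a $2\times 2$ linear system; a sketch with generic coefficient names (not necessarily those of Eq. (5)) is given below. For a degenerate conic consisting of two intersecting lines, this center coincides with their intersection point.

    import numpy as np

    def conic_center(a, b, c, d, e):
        # Center of the conic a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0:
        # the stationary point of the polynomial, i.e. the solution of
        #   [2a  b] [x]   [-d]
        #   [ b 2c] [y] = [-e].
        # The 2x2 matrix must be nonsingular (it is singular, e.g., for a parabola).
        A = np.array([[2.0 * a, b], [b, 2.0 * c]])
        return np.linalg.solve(A, np.array([-d, -e], dtype=float))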
The sample coverage probabilities and the median (over 1000 ellipsoids) areas of the confidence ellipsoids are presented in Table 6. The ellipsoids were constructed for confidence levels 0.8 and 0.95. The areas of the 95% confidence ellipsoids are displayed in Table 6; the area of an 80% confidence ellipsoid is $\log _{20}(5)\approx 0.5372$ times the area of the corresponding 95% confidence ellipsoid.
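The factor $\log _{20}(5)$ comes from the quantiles of the ${\chi }^{2}$ distribution with two degrees of freedom: the p-quantile equals $-2\ln (1-p)$, and the area of an elliptical confidence region is proportional to the quantile used, so
\[\frac{S_{80\% }}{S_{95\% }}=\frac{-2\ln (0.20)}{-2\ln (0.05)}=\frac{\ln 5}{\ln 20}=\log _{20}(5)\approx 0.5372.\]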
Note that the standard errors of the coverage probabilities are $1.3\% $ for the 80% confidence ellipsoids and $0.7\% $ for the 95% confidence ellipsoids. The simulations do not allow us to infer which estimator is better. Therefore, because of its simpler explicit expression, the $\widehat{\varSigma }_{\mathrm{sample}}$-based estimator updated before the ignore-$\widehat{F}$ step is the one compared with the other estimators in the simulations of Section 4.4.
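The standard errors quoted above agree with the usual binomial standard error $\sqrt{p(1-p)/N}$ of an empirical proportion based on $N=1000$ simulation runs:
\[\sqrt{\frac{0.8\cdot 0.2}{1000}}\approx 0.013,\hspace{2em}\sqrt{\frac{0.95\cdot 0.05}{1000}}\approx 0.007.\]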