Asymptotic normality of total least squares estimator in a multivariate errors-in-variables model AX=B
Volume 3, Issue 1 (2016), pp. 47–57
Alexander Kukush   Yaroslav Tsaregorodtsev  

https://doi.org/10.15559/16-VMSTA50
Pub. online: 29 March 2016      Type: Research Article      Open Access

Received
11 February 2016
Revised
7 March 2016
Accepted
11 March 2016
Published
29 March 2016

Abstract

We consider a multivariate functional measurement error model $AX\approx B$. The errors in $[A,B]$ are uncorrelated, row-wise independent, and have equal (unknown) variances. We study the total least squares estimator of X, which, in the case of normal errors, coincides with the maximum likelihood one. We give conditions for asymptotic normality of the estimator when the number of rows in A is increasing. Under mild assumptions, the covariance structure of the limit Gaussian random matrix is nonsingular. For normal errors, the results can be used to construct an asymptotic confidence interval for a linear functional of X.

1 Introduction

We deal with an overdetermined system of linear equations $AX\approx B$, which is common in linear parameter estimation problems [9]. If the data matrix A and observation matrix B are contaminated with errors, and all the errors are uncorrelated and have equal variances, then the total least squares (TLS) technique is appropriate for solving this system [9]. Kukush and Van Huffel [5] showed the statistical consistency of the TLS estimator $\hat{X}_{\mathit{tls}}$ as the number m of rows in A grows, provided that the errors in $[A,B]$ are row-wise i.i.d. with zero mean and covariance matrix proportional to a unit matrix; the covariance matrix was assumed to be known up to a factor of proportionality; the true input matrix $A_{0}$ was supposed to be nonrandom. In fact, in [5] a more general, element-wise weighted TLS estimator was studied, where the errors in $[A,B]$ were row-wise independent, but within each row, the entries could be observed without errors, and, additionally, the error covariance matrix could differ from row to row. In [6], an iterative numerical procedure was developed to compute the elementwise-weighted TLS estimator, and the rate of convergence of the procedure was established.
In the univariate case, where B and X are column vectors, the asymptotic normality of $\hat{X}_{\mathit{tls}}$ as m grows was shown by Gallo [4]. In [7], that result was extended to mixing error sequences. Both [4] and [7] utilized an explicit form of the TLS solution.
In the present paper, we extend Gallo's asymptotic normality result to the multivariate case, where A, X, and B are matrices.
Now a closed-form solution is unavailable, and we work instead with the cost function. More precisely, we deal with the estimating function, which is a matrix derivative of the cost function. In fact, we show that under mild conditions, the normalized estimator converges in distribution to a Gaussian random matrix with nonsingular covariance structure. For normal errors, the latter structure can be estimated consistently based on the observed matrix $[A,B]$. The results can be used to construct the asymptotic confidence ellipsoid for a vector $Xu$, where u is a column vector of the corresponding dimension.
The paper is organized as follows. In Section 2, we describe the model, refer to the consistency result for the estimator, and present the objective function and corresponding matrix estimating function. In Section 3, we state the asymptotic normality of $\hat{X}_{\mathit{tls}}$ and provide a nonsingular covariance structure for a limit random matrix. The latter structure depends continuously on some nuisance parameters of the model, and we derive consistent estimators for those parameters. Section 4 concludes. The proofs are given in the Appendix. There we work with the estimating function and derive an expansion for the normalized estimator using Taylor’s formula. The expansion holds with probability tending to 1.
Throughout the paper, all vectors are column ones, $\operatorname{\mathbf{E}}$ stands for the expectation and applies to the entire product that follows it, $\operatorname{\mathbf{cov}}(x)$ denotes the covariance matrix of a random vector x, and for a sequence of random matrices $\{X_{m},m\ge 1\}$ of the same size, the notation $X_{m}=O_{p}(1)$ means that the sequence $\{\| X_{m}\| \}$ is stochastically bounded, and $X_{m}=o_{p}(1)$ means that $\| X_{m}\| \stackrel{\mathrm{P}}{\longrightarrow }0$. By $\operatorname{I}_{p}$ we denote the unit matrix of size p.

2 Model, objective, and estimating functions

2.1 The TLS problem

Consider the model $AX\approx B$. Here $A\in {\mathbb{R}}^{m\times n}$ and $B\in {\mathbb{R}}^{m\times d}$ are observations, and $X\in {\mathbb{R}}^{n\times d}$ is a parameter of interest. Assume that
(2.1)
\[A=A_{0}+\tilde{A},\hspace{2em}B=B_{0}+\tilde{B},\]
and that there exists $X_{0}\in {\mathbb{R}}^{n\times d}$ such that
(2.2)
\[A_{0}X_{0}=B_{0}.\]
Here $A_{0}$ is the nonrandom true input matrix, $B_{0}$ is the true output matrix, and $\tilde{A}$, $\tilde{B}$ are error matrices. The matrix $X_{0}$ is the true value of the parameter.
We can rewrite the model (2.1)–(2.2) as a classical functional errors-in-variables (EIV) model with vector regressor and vector response [3]. Denote by ${a_{i}^{\operatorname{\mathsf{T}}}}$, ${a_{0i}^{\operatorname{\mathsf{T}}}}$, ${\tilde{a}_{i}^{\operatorname{\mathsf{T}}}}$, ${b_{i}^{\operatorname{\mathsf{T}}}}$, ${b_{0i}^{\operatorname{\mathsf{T}}}}$, and ${\tilde{b}_{i}^{\operatorname{\mathsf{T}}}}$ the rows of A, $A_{0}$, $\tilde{A}$, B, $B_{0}$, and $\tilde{B}$, respectively, $i=1,\dots ,m$. Then the model considered is equivalent to the following EIV model:
\[a_{i}=a_{0i}+\tilde{a}_{i},\hspace{2em}b_{i}=b_{0i}+\tilde{b}_{i},\hspace{2em}b_{0i}={X_{0}^{\operatorname{\mathsf{T}}}}a_{0i},\hspace{1em}i=1,\dots ,m.\]
Based on observations $a_{i}$, $b_{i}$, $i=1,\dots ,m$, we have to estimate $X_{0}$. The vectors $a_{0i}$ are nonrandom and unknown, and the vectors $\tilde{a}_{i}$, $\tilde{b}_{i}$ are random errors.
We state a global assumption of the paper.
  • (i) The vectors $\tilde{z}_{i}$ with ${\tilde{z}_{i}^{\operatorname{\mathsf{T}}}}=[{\tilde{a}_{i}^{\operatorname{\mathsf{T}}}},{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}}]$, $i=1,2,\dots \hspace{0.1667em}$, are i.i.d., with zero mean and variance–covariance matrix
    (2.3)
    \[S_{\tilde{z}}:=\operatorname{\mathbf{cov}}(\tilde{z}_{1})={\sigma }^{2}\operatorname{I}_{n+d},\]
    where the factor of proportionality ${\sigma }^{2}$ is positive and unknown.
The TLS problem consists in finding disturbances $\Delta \hat{A}$ and $\Delta \hat{B}$ that minimize the sum of squared corrections
(2.4)
\[\underset{(X\in {\mathbb{R}}^{n\times d},\hspace{0.2222em}\Delta A,\hspace{0.2222em}\Delta B)}{\min }\big(\| \Delta A{\| _{F}^{2}}+\| \Delta B{\| _{F}^{2}}\big)\]
subject to the constraints
(2.5)
\[(A-\Delta A)X=B-\Delta B.\]
Here in (2.4), for a matrix $C=(c_{ij})$, $\| C\| _{F}$ denotes the Frobenius norm, $\| C{\| _{F}^{2}}=\sum _{i,j}{c_{ij}^{2}}$. Later on, we will also use the operator norm $\| C\| =\sup _{x\ne 0}\frac{\| Cx\| }{\| x\| }$.
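For numerical experimentation, the TLS estimate can be computed from the singular value decomposition of the compound matrix $[A,B]$ (see [9]); this is a computational recipe rather than the explicit algebraic solution exploited in [4, 7], and the analysis in the present paper works with the cost function instead. Below is a minimal sketch (Python with NumPy); the simulated data, the dimensions, and the noise level are arbitrary illustrations, and normal errors are assumed.

import numpy as np

def tls_estimator(A, B):
    """SVD-based TLS solution of (2.4)-(2.5): A is m x n, B is m x d, m >= n + d.

    Returns X_tls = -V12 V22^{-1}, where V12 (n x d) and V22 (d x d) are the
    upper-right and lower-right blocks of the right singular vectors of [A, B];
    returns None when V22 is numerically singular (X_tls = infinity, Definition 1).
    """
    n, d = A.shape[1], B.shape[1]
    _, _, Vt = np.linalg.svd(np.hstack([A, B]), full_matrices=False)
    V = Vt.T
    V12, V22 = V[:n, n:], V[n:, n:]
    if np.linalg.cond(V22) > 1e12:
        return None
    return -V12 @ np.linalg.inv(V22)

# Toy data from model (2.1)-(2.3) with normal errors.
rng = np.random.default_rng(0)
m, n, d, sigma = 500, 3, 2, 0.1
X0 = rng.standard_normal((n, d))
A0 = rng.standard_normal((m, n))                 # nonrandom in the model; drawn once here
A = A0 + sigma * rng.standard_normal((m, n))
B = A0 @ X0 + sigma * rng.standard_normal((m, d))
print(np.linalg.norm(tls_estimator(A, B) - X0))  # small for large m, cf. Theorem 2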

2.2 TLS estimator and its consistency

It may happen that, for some random realization, problem (2.4)–(2.5) has no solution. In such a case, put $\hat{X}_{\mathit{tls}}=\infty $. Now, we give a formal definition of the TLS estimator.
Definition 1.
The TLS estimator $\hat{X}_{\mathit{tls}}$ of $X_{0}$ in the model (2.1)–(2.2) is a measurable mapping of the underlying probability space into ${\mathbb{R}}^{n\times d}\cup \{\infty \}$, which solves problem (2.4)–(2.5) if there exists a solution, and $\hat{X}_{\mathit{tls}}=\infty $ otherwise.
We need the following conditions for the consistency of $\hat{X}_{\mathit{tls}}$.
  • (ii) $\operatorname{\mathbf{E}}\| \tilde{z}_{1}{\| }^{4}<\infty $, where $\tilde{z}_{1}$ satisfies condition (i).
  • (iii) $\frac{1}{m}{A_{0}^{\operatorname{\mathsf{T}}}}A_{0}\to V_{A}$ as $m\to \infty $, where $V_{A}$ is a nonsingular matrix.
The next consistency result is contained in Theorem 4(a) of [5].
Theorem 2.
Assume conditions (i) to (iii). Then $\hat{X}_{\mathit{tls}}$ is finite with probability tending to one, and $\hat{X}_{\mathit{tls}}$ tends to $X_{0}$ in probability as $m\to \infty $.

2.3 The objective and estimating functions

Denote
(2.6)
\[q(a,b;X)=\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big){\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({X}^{\operatorname{\mathsf{T}}}a-b\big),\]
(2.7)
\[Q(X)=\sum \limits_{i=1}^{m}q(a_{i},b_{i};X),\hspace{1em}X\in {\mathbb{R}}^{n\times d}.\]
The TLS estimator is known to minimize the objective function (2.7); see [8] or formula (24) in [5].
Lemma 3.
The TLS estimator $\hat{X}_{\mathit{tls}}$ is finite iff there exists an unconstrained minimum of the function (2.7), and then $\hat{X}_{\mathit{tls}}$ is a minimum point of that function.
Introduce an estimating function related to the loss function (2.6):
(2.8)
\[s(a,b;X):=a\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big)-X{\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({X}^{\operatorname{\mathsf{T}}}a-b\big)\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big).\]
Corollary 4.
  • (a) Under conditions (i) to (iii), with probability tending to one $\hat{X}_{\mathit{tls}}$ is a solution to the equation
    \[\sum \limits_{i=1}^{m}s(a_{i},b_{i};X)=0,\hspace{1em}X\in {\mathbb{R}}^{n\times d}.\]
  • (b) Under assumption (i), the function $s(a,b;X)$ is an unbiased estimating function, that is, for each $i\ge 1$, $\operatorname{\mathbf{E}}_{X_{0}}s(a_{i},b_{i};X_{0})=0$.
Expression (2.8) as a function of X is a mapping in ${\mathbb{R}}^{n\times d}$. Its derivative ${s^{\prime }_{X}}$ is a linear operator in this space.
Lemma 5.
Under condition (i), for each $H\in {\mathbb{R}}^{n\times d}$ and $i\ge 1$, we have
(2.9)
\[\operatorname{\mathbf{E}}_{X_{0}}\big[{s^{\prime }_{X}}(a_{i},b_{i};X_{0})\cdot H\big]=a_{0i}{a_{0i}^{\operatorname{\mathsf{T}}}}H.\]
Therefore, we can identify $\operatorname{\mathbf{E}}_{X_{0}}{s^{\prime }_{X}}(a_{i},b_{i};X_{0})$ with the matrix $a_{0i}{a_{0i}^{\operatorname{\mathsf{T}}}}$.
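To make the objective and estimating functions concrete, here is a small numerical sketch (Python with NumPy; all concrete values are arbitrary) of the loss (2.6) and the estimating function (2.8), together with a finite-difference check of the identity $\frac{1}{2}{q^{\prime }_{X}}=s(a,b;X){(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X)}^{-1}$ established in the proof of Corollary 4 in the Appendix.

import numpy as np

def q(a, b, X):
    """Loss (2.6) for vectors a (n,), b (d,) and a matrix X (n x d)."""
    r = X.T @ a - b                                    # X^T a - b
    M = np.linalg.inv(np.eye(X.shape[1]) + X.T @ X)    # (I_d + X^T X)^{-1}
    return r @ M @ r

def s(a, b, X):
    """Estimating function (2.8); returns an n x d matrix."""
    r = X.T @ a - b
    M = np.linalg.inv(np.eye(X.shape[1]) + X.T @ X)
    return np.outer(a, r) - X @ M @ np.outer(r, r)

# Central-difference gradient of q, compared with 2 s(a,b;X) (I_d + X^T X)^{-1}.
rng = np.random.default_rng(1)
n, d, eps = 3, 2, 1e-6
a, b, X = rng.standard_normal(n), rng.standard_normal(d), rng.standard_normal((n, d))
grad = np.zeros((n, d))
for i in range(n):
    for j in range(d):
        E = np.zeros((n, d))
        E[i, j] = eps
        grad[i, j] = (q(a, b, X + E) - q(a, b, X - E)) / (2 * eps)
M = np.linalg.inv(np.eye(d) + X.T @ X)
print(np.max(np.abs(0.5 * grad - s(a, b, X) @ M)))     # close to zero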

3 Main results

Introduce further assumptions to state the asymptotic normality of $\hat{X}_{\mathit{tls}}$. We need somewhat higher moments than in conditions (ii) and (iii) in order to use the Lyapunov CLT. Recall that $\tilde{z}_{i}$ satisfies condition (i).
  • (iv) For some $\delta >0$, $\operatorname{\mathbf{E}}\| \tilde{z}_{1}{\| }^{4+2\delta }<\infty $.
  • (v) For δ from condition (iv),
    \[\frac{1}{{m}^{1+\delta /2}}\sum \limits_{i=1}^{m}\| a_{0i}{\| }^{2+\delta }\to 0\hspace{1em}\text{ as }m\to \infty .\]
  • (vi) $\frac{1}{m}{\sum _{i=1}^{m}}a_{0i}\to \mu _{a}$ as $m\to \infty $, where $\mu _{a}\in {\mathbb{R}}^{n\times 1}$.
  • (vii) The distribution of $\tilde{z}_{1}$ is symmetric around the origin.
Introduce a random element in the space of systems consisting of five matrices:
(3.1)
\[W_{i}=\big(a_{0i}{\tilde{a}_{i}^{\operatorname{\mathsf{T}}}},a_{0i}{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}},\tilde{a}_{i}{\tilde{a}_{i}^{\operatorname{\mathsf{T}}}}-{\sigma }^{2}\operatorname{I}_{n},\tilde{a}_{i}{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}},\tilde{b}_{i}{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}}-{\sigma }^{2}\operatorname{I}_{d}\big).\]
Hereafter $\stackrel{\mathrm{d}}{\longrightarrow }$ stands for the convergence in distribution.
Lemma 6.
Assume conditions (i) and (iii)–(vi). Then
(3.2)
\[\frac{1}{\sqrt{m}}\sum \limits_{i=1}^{m}W_{i}\stackrel{\mathrm{d}}{\longrightarrow }\varGamma =(\varGamma _{1},\dots ,\varGamma _{5})\hspace{1em}\textit{as }m\to \infty ,\]
where Γ is a Gaussian centered random element with matrix components.
Lemma 7.
In the assumptions of Lemma 6, replace condition (vi) with condition (vii). Then the convergence (3.2) still holds, now with independent components $\varGamma _{1},\dots ,\varGamma _{5}$.
Now, we state the asymptotic normality of $\hat{X}_{\mathit{tls}}$.
Theorem 8.
  • (a) Assume conditions (i) and (iii)–(vi). Then
    (3.3)
    \[\frac{1}{\sqrt{m}}(\hat{X}_{\mathit{tls}}-X_{0})\stackrel{\mathrm{d}}{\longrightarrow }{V_{A}^{-1}}\varGamma (X_{0})\hspace{1em}\textit{as }m\to \infty ,\]
    (3.4)
    \[\varGamma (X):=\varGamma _{1}X-\varGamma _{2}+\varGamma _{3}X-\varGamma _{4}-X{\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({X}^{\operatorname{\mathsf{T}}}\varGamma _{3}X-{X}^{\operatorname{\mathsf{T}}}\varGamma _{4}-{\varGamma _{4}^{\operatorname{\mathsf{T}}}}X+\varGamma _{5}\big),\]
    where $V_{A}$ satisfies condition (iii), and $\varGamma _{i}$ satisfy relation (3.2).
  • (b) In the assumptions of part (a), replace condition (vi) with condition (vii). Then the convergence (3.3) still holds, and, moreover, the limit random matrix $X_{\infty }:={V_{A}^{-1}}\varGamma (X_{0})$ has a nonsingular covariance structure, that is, for each nonzero vector $u\in {\mathbb{R}}^{d\times 1}$, $\operatorname{\mathbf{cov}}(X_{\infty }u)$ is a nonsingular matrix.
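As an informal illustration of Theorem 8(a) (not part of the paper), the convergence (3.3) can be checked by simulation with normal errors: the normalized entries $\sqrt{m}(\hat{X}_{\mathit{tls}}-X_{0})_{kl}$ form an approximately centered Gaussian sample whose spread stabilizes as m grows. The SVD formula below is the standard closed-form TLS solution, and all concrete numbers are arbitrary.

import numpy as np

def tls(A, B):
    """Standard SVD-based TLS solution for A (m x n), B (m x d), m >= n + d."""
    n = A.shape[1]
    V = np.linalg.svd(np.hstack([A, B]), full_matrices=False)[2].T
    return -V[:n, n:] @ np.linalg.inv(V[n:, n:])

rng = np.random.default_rng(2)
m, n, d, sigma, reps = 2000, 2, 2, 0.2, 500
X0 = np.array([[1.0, -0.5], [0.3, 2.0]])
A0 = rng.standard_normal((m, n))     # true input, drawn once and kept fixed (functional model)
draws = np.empty(reps)
for r in range(reps):
    A = A0 + sigma * rng.standard_normal((m, n))
    B = A0 @ X0 + sigma * rng.standard_normal((m, d))
    draws[r] = np.sqrt(m) * (tls(A, B) - X0)[0, 0]     # one normalized entry
print(draws.mean(), draws.std())     # mean near 0; std essentially unchanged when m is increased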
Remark 9.
Conditions of Theorem 8(a) are similar to Gallo's conditions [4] for asymptotic normality in the univariate case; see also [9], pp. 240–243. Compared with Theorems 2.3 and 2.4 of [7], stated for the univariate case with mixing errors, we do not need the requirement that the entries of the true input $A_{0}$ be totally bounded.
In Section 2 of [7], one can find a discussion of the importance of the asymptotic normality result for $\hat{X}_{\mathit{tls}}$. It is claimed there that the formula for the asymptotic covariance structure of $\hat{X}_{\mathit{tls}}$ is computationally useless, but that in the case where the limit distribution is nonsingular, block-bootstrap techniques can be used for constructing confidence intervals and testing hypotheses.
However, in the case of normal errors $\tilde{z}_{i}$, we can apply Theorem 8(b) to construct the asymptotic confidence ellipsoid, say, for $X_{0}u$, $u\in {\mathbb{R}}^{d\times 1}$, $u\ne 0$. Indeed, relations (3.1)–(3.4) show that the nonsingular matrix
\[S_{u}:=\operatorname{\mathbf{cov}}\big({\mathbf{V}_{A}^{-1}}\varGamma (X_{0})u\big)\]
is a continuous function $S_{u}=S_{u}(X_{0},\mathbf{V}_{A},{\sigma }^{2})$ of unknown parameters $X_{0}$, $\mathbf{V}_{A}$, and ${\sigma }^{2}$. (It is important here that now the components $\varGamma _{j}$ of Γ are independent, and the covariance structure of each $\varGamma _{j}$ depends on ${\sigma }^{2}$ and $\mathbf{V}_{A}$, not on some other limit characteristics of $A_{0}$; see Lemma 6.) Once we possess consistent estimators $\hat{\mathbf{V}}_{A}$ and ${\hat{\sigma }}^{2}$ of $\mathbf{V}_{A}$ and ${\sigma }^{2}$, the matrix $\hat{S}_{u}:=S_{u}(\hat{X}_{\mathit{tls}},\hat{\mathbf{V}}_{A},{\hat{\sigma }}^{2})$ is a consistent estimator for the covariance matrix $S_{u}$.
Hereafter, a bar means averaging over the rows $i=1,\dots ,m$, for example, $\overline{a{b}^{\operatorname{\mathsf{T}}}}=\frac{1}{m}{\sum _{i=1}^{m}}a_{i}{b_{i}^{\operatorname{\mathsf{T}}}}$.
Lemma 10.
Assume the conditions of Theorem 2. Define
(3.5)
\[{\hat{\sigma }}^{2}=\frac{1}{d}\mathrm{tr}\big[\big(\overline{b{b}^{\operatorname{\mathsf{T}}}}-2{\hat{X}_{\mathit{tls}}^{\operatorname{\mathsf{T}}}}\overline{a{b}^{\operatorname{\mathsf{T}}}}+{\hat{X}_{\mathit{tls}}^{\operatorname{\mathsf{T}}}}\overline{a{a}^{\operatorname{\mathsf{T}}}}\hat{X}_{\mathit{tls}}\big){\big(\operatorname{I}_{d}+{\hat{X}_{\mathit{tls}}^{\operatorname{\mathsf{T}}}}\hat{X}_{\mathit{tls}}\big)}^{-1}\big],\]
\[\hat{V}_{A}=\overline{a{a}^{\operatorname{\mathsf{T}}}}-{\hat{\sigma }}^{2}\operatorname{I}_{n}.\]
Then
(3.6)
\[{\hat{\sigma }}^{2}\stackrel{\mathrm{P}}{\longrightarrow }{\sigma }^{2},\hspace{2em}\hat{V}_{A}\stackrel{\mathrm{P}}{\longrightarrow }V_{A}.\]
Remark 11.
Estimator (3.5) is a multivariate analogue of the maximum likelihood estimator (1.53) in [2] in the functional scalar EIV model.
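A minimal numerical sketch of the plug-in estimators (3.5) follows (Python with NumPy; the SVD-based TLS estimate and all concrete values are illustrative assumptions, and the bars are the row averages defined above).

import numpy as np

def tls(A, B):
    """Standard SVD-based TLS solution for A (m x n), B (m x d), m >= n + d."""
    n = A.shape[1]
    V = np.linalg.svd(np.hstack([A, B]), full_matrices=False)[2].T
    return -V[:n, n:] @ np.linalg.inv(V[n:, n:])

def nuisance_estimates(A, B, Xhat):
    """Estimators (3.5): sigma^2-hat and V_A-hat built from the row averages."""
    m, n = A.shape
    d = B.shape[1]
    aa, ab, bb = A.T @ A / m, A.T @ B / m, B.T @ B / m   # bars over a a^T, a b^T, b b^T
    M = np.linalg.inv(np.eye(d) + Xhat.T @ Xhat)
    sigma2 = np.trace((bb - 2 * Xhat.T @ ab + Xhat.T @ aa @ Xhat) @ M) / d
    return sigma2, aa - sigma2 * np.eye(n)

rng = np.random.default_rng(3)
m, n, d, sigma = 5000, 3, 2, 0.3
X0, A0 = rng.standard_normal((n, d)), rng.standard_normal((m, n))
A = A0 + sigma * rng.standard_normal((m, n))
B = A0 @ X0 + sigma * rng.standard_normal((m, d))
sigma2_hat, VA_hat = nuisance_estimates(A, B, tls(A, B))
print(sigma2_hat, sigma ** 2)                    # close for large m, cf. (3.6)
print(np.linalg.norm(VA_hat - A0.T @ A0 / m))    # small for large m, cf. (3.6)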
Finally, for the case $\tilde{z}_{1}\sim N(0,{\sigma }^{2}\operatorname{I}_{n+d})$, based on Lemma 10 and the relations
\[\frac{1}{\sqrt{m}}(\hat{X}_{\mathit{tls}}u-X_{0}u)\stackrel{\mathrm{d}}{\longrightarrow }N(0,S_{u}),\hspace{1em}S_{u}>0,\hspace{2.5pt}\hat{S}_{u}\stackrel{\mathrm{P}}{\longrightarrow }S_{u},\]
we can construct the asymptotic confidence ellipsoid for the vector $X_{0}u$ in a standard way.
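For concreteness, the standard construction can be spelled out as follows (our reading; the paper does not display it). By Slutsky's lemma, $m{(\hat{X}_{\mathit{tls}}u-X_{0}u)}^{\operatorname{\mathsf{T}}}{\hat{S}_{u}^{-1}}(\hat{X}_{\mathit{tls}}u-X_{0}u)\stackrel{\mathrm{d}}{\longrightarrow }{\chi _{n}^{2}}$, and therefore the set
\[\big\{v\in {\mathbb{R}}^{n}:m{(\hat{X}_{\mathit{tls}}u-v)}^{\operatorname{\mathsf{T}}}{\hat{S}_{u}^{-1}}(\hat{X}_{\mathit{tls}}u-v)\le {\chi _{n,1-\alpha }^{2}}\big\}\]
covers $X_{0}u$ with asymptotic probability $1-\alpha $, where ${\chi _{n,1-\alpha }^{2}}$ denotes the $(1-\alpha )$-quantile of the chi-squared distribution with n degrees of freedom.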
Remark 12.
In a similar way, a confidence ellipsoid can be constructed for any finite set of linear combinations of $X_{0}$ entries with fixed known coefficients.

4 Conclusion

We extended the result of Gallo [4] and proved the asymptotic normality of the TLS estimator in a multivariate model $AX\approx B$. The normalized estimator converges in distribution to a random matrix with quite complicated covariance structure. If the error distribution is symmetric around the origin, then the latter covariance structure is nonsingular. For the case of normal errors, this makes it possible to construct the asymptotic confidence region for a vector $X_{0}u$, $u\in {\mathbb{R}}^{d\times 1}$, where $X_{0}$ is the true value of X.
In future papers, we will extend the result to the elementwise-weighted TLS estimator [5] in the model $AX\approx B$, where some columns of the matrix $[A,B]$ may be observed without errors and, in addition, the error covariance matrix may differ from row to row.

Appendix

Proof of Corollary 4

(a) For any n and d, the space ${\mathbb{R}}^{n\times d}$ is endowed with the natural inner product $\langle A,B\rangle =\mathrm{tr}(A{B}^{\operatorname{\mathsf{T}}})$ and the Frobenius norm. The matrix derivative ${q^{\prime }_{X}}$ of the functional (2.6) is a linear functional on ${\mathbb{R}}^{n\times d}$, which can be identified with a certain matrix from ${\mathbb{R}}^{n\times d}$ via this inner product.
Using the rules of matrix calculus [1], we have for $H\in {\mathbb{R}}^{n\times d}$:
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \big\langle {q^{\prime }_{X}},H\big\rangle & \displaystyle ={a}^{\operatorname{\mathsf{T}}}H{\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({X}^{\operatorname{\mathsf{T}}}a-b\big)\\{} & \displaystyle \hspace{1em}-\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big){\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({H}^{\operatorname{\mathsf{T}}}X+{X}^{\operatorname{\mathsf{T}}}H\big){\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({X}^{\operatorname{\mathsf{T}}}a-b\big)\\{} & \displaystyle \hspace{1em}+\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big){\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}{H}^{\operatorname{\mathsf{T}}}a.\end{array}\]
Collecting similar terms, we obtain:
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \frac{1}{2}\big\langle {q^{\prime }_{X}},H\big\rangle & \displaystyle =\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big){\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}{H}^{\operatorname{\mathsf{T}}}a\\{} & \displaystyle \hspace{1em}-\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big){\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}{H}^{\operatorname{\mathsf{T}}}X{\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({X}^{\operatorname{\mathsf{T}}}a-b\big),\end{array}\]
and
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \frac{1}{2}\big\langle {q^{\prime }_{X}},H\big\rangle & \displaystyle =\mathrm{tr}\big[a\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big){\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}{H}^{\operatorname{\mathsf{T}}}\big]\\{} & \displaystyle \hspace{1em}-\mathrm{tr}\big[X{\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({X}^{\operatorname{\mathsf{T}}}a-b\big)\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big){\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}{H}^{\operatorname{\mathsf{T}}}\big].\end{array}\]
Using the inner product in ${\mathbb{R}}^{n\times d}$, we get $\frac{1}{2}{q^{\prime }_{X}}=s(a,b;X){\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}$, where $s(a,b;X)$ is defined in (2.8). In view of Theorem 2 and Lemma 3, this implies the statement of Corollary 4(a).
(b) Now, we set
(4.1)
\[a=a_{0}+\tilde{a},\hspace{2em}b=b_{0}+\tilde{b},\hspace{2em}b_{0}={X}^{\operatorname{\mathsf{T}}}a_{0},\]
where $a_{0}$ is a nonrandom vector and, as in (2.3),
(4.2)
\[\operatorname{\mathbf{cov}}\left(\left[\begin{array}{l}\tilde{a}\\{} \tilde{b}\end{array}\right]\right)={\sigma }^{2}\operatorname{I}_{n+d},\hspace{2em}\operatorname{\mathbf{E}}\left[\begin{array}{l}\tilde{a}\\{} \tilde{b}\end{array}\right]=0.\]
Then
(4.3)
\[\operatorname{\mathbf{E}}_{X}a\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big)=\operatorname{\mathbf{E}}a\big({\tilde{a}}^{\operatorname{\mathsf{T}}}X-{\tilde{b}}^{\operatorname{\mathsf{T}}}\big)={\sigma }^{2}X,\]
(4.4)
\[\operatorname{\mathbf{E}}_{X}\big({X}^{\operatorname{\mathsf{T}}}a-b\big)\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big)=\operatorname{\mathbf{E}}\big({X}^{\operatorname{\mathsf{T}}}\tilde{a}-\tilde{b}\big)\big({\tilde{a}}^{\operatorname{\mathsf{T}}}X-{\tilde{b}}^{\operatorname{\mathsf{T}}}\big)={\sigma }^{2}\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big).\]
Therefore (see (2.8)),
\[\operatorname{\mathbf{E}}_{X}s(a,b;X)={\sigma }^{2}X-{\sigma }^{2}X{\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)=0.\]
This implies the statement of Corollary 4(b).

Proof of Lemma 5

The derivative ${s^{\prime }_{X}}$ of the function (2.8) is a linear operator in ${\mathbb{R}}^{n\times d}$. For $H\in {\mathbb{R}}^{n\times d}$, we have:
(4.5)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {s^{\prime }_{X}}H& \displaystyle =a{a}^{\operatorname{\mathsf{T}}}H-H{\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({X}^{\operatorname{\mathsf{T}}}a-b\big)\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big)\\{} & \displaystyle \hspace{1em}+X{\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({H}^{\operatorname{\mathsf{T}}}X+{X}^{\operatorname{\mathsf{T}}}H\big){\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({X}^{\operatorname{\mathsf{T}}}a-b\big)\\{} & \displaystyle \hspace{1em}\times \big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big)-X{\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({H}^{\operatorname{\mathsf{T}}}a\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big)+\big({X}^{\operatorname{\mathsf{T}}}a-b\big){a}^{\operatorname{\mathsf{T}}}H\big).\end{array}\]
As before, we set (4.1), (4.2) and use relations (4.3), (4.4), and the relation $\operatorname{\mathbf{E}}a{a}^{\operatorname{\mathsf{T}}}=a_{0}{a_{0}^{\operatorname{\mathsf{T}}}}+{\sigma }^{2}\operatorname{I}_{n}$. We obtain:
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \operatorname{\mathbf{E}}_{X}{s^{\prime }_{X}}H& \displaystyle =\big(a_{0}{a_{0}^{\operatorname{\mathsf{T}}}}+{\sigma }^{2}\operatorname{I}_{n}\big)H-{\sigma }^{2}H+{\sigma }^{2}X{\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({H}^{\operatorname{\mathsf{T}}}X+{X}^{\operatorname{\mathsf{T}}}H\big)\\{} & \displaystyle \hspace{1em}-{\sigma }^{2}X{\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({H}^{\operatorname{\mathsf{T}}}X+{X}^{\operatorname{\mathsf{T}}}H\big)=a_{0}{a_{0}^{\operatorname{\mathsf{T}}}}H.\end{array}\]
This implies (2.9).
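A quick numerical sanity check of (2.9) (illustrative only; the dimensions, the noise level, and the direction H are arbitrary): averaging a central-difference approximation of ${s^{\prime }_{X}}(a,b;X_{0})\cdot H$ over simulated normal errors should reproduce $a_{0}{a_{0}^{\operatorname{\mathsf{T}}}}H$ up to Monte Carlo error.

import numpy as np

def s(a, b, X):
    """Estimating function (2.8); returns an n x d matrix."""
    r = X.T @ a - b
    M = np.linalg.inv(np.eye(X.shape[1]) + X.T @ X)
    return np.outer(a, r) - X @ M @ np.outer(r, r)

rng = np.random.default_rng(4)
n, d, sigma, reps, eps = 3, 2, 0.5, 20000, 1e-4
X0, a0, H = rng.standard_normal((n, d)), rng.standard_normal(n), rng.standard_normal((n, d))
acc = np.zeros((n, d))
for _ in range(reps):
    a = a0 + sigma * rng.standard_normal(n)
    b = X0.T @ a0 + sigma * rng.standard_normal(d)
    # directional derivative s'_X(a, b; X0) . H via central differences
    acc += (s(a, b, X0 + eps * H) - s(a, b, X0 - eps * H)) / (2 * eps)
print(np.max(np.abs(acc / reps - np.outer(a0, a0) @ H)))   # shrinks like 1/sqrt(reps)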

Proof of Lemma 6

The random elements $W_{i}$, $i\ge 1$, in (3.1) are independent and centered. We want to apply the Lyapunov CLT to the left-hand side of (3.2).
(a) All the second moments of ${m}^{-\frac{1}{2}}{\sum _{i=1}^{m}}W_{i}$ converge to finite limits. For example, for the first component, we have
\[\frac{1}{m}\sum \limits_{i=1}^{m}\operatorname{\mathbf{E}}{\big(\big\langle a_{0i}{\tilde{a}_{i}^{\operatorname{\mathsf{T}}}},H_{1}\big\rangle \big)}^{2}=\frac{1}{m}\sum \limits_{i=1}^{m}\operatorname{\mathbf{E}}{\big(\text{tr }a_{0i}{\tilde{a}_{1}^{\operatorname{\mathsf{T}}}}{H_{1}^{\operatorname{\mathsf{T}}}}\big)}^{2},\]
and this has a finite limit due to assumption (iii). Here $H_{1}\in {\mathbb{R}}^{n\times n}$, and we use the inner product introduced in the proof of Corollary 4.
For the fifth component,
\[\frac{1}{m}\sum \limits_{i=1}^{m}\operatorname{\mathbf{E}}{\big(\big\langle \tilde{b}_{i}{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}}-{\sigma }^{2}\operatorname{I}_{d},H_{2}\big\rangle \big)}^{2}=\operatorname{\mathbf{E}}{\big[\mathrm{tr}\big(\big(\tilde{b}_{1}{\tilde{b}_{1}^{\operatorname{\mathsf{T}}}}-{\sigma }^{2}\operatorname{I}_{d}\big)H_{2}\big)\big]}^{2}<\infty ,\]
because the fourth moments of $\tilde{b}_{i}$ are finite. Here $H_{2}\in {\mathbb{R}}^{d\times d}$.
For mixed moments of the first and fifth components, we have
(4.6)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \frac{1}{m}\sum \limits_{i=1}^{m}\operatorname{\mathbf{E}}\big\langle a_{0i}{\tilde{a}_{i}^{\operatorname{\mathsf{T}}}},H_{1}\big\rangle \cdot \big\langle \tilde{b}_{i}{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}}-{\sigma }^{2}\operatorname{I}_{d},H_{2}\big\rangle \\{} & \displaystyle \hspace{1em}=\operatorname{\mathbf{E}}\Bigg\langle \Bigg(\frac{1}{m}\sum \limits_{i=1}^{m}a_{0i}\Bigg){\tilde{a}_{1}^{\operatorname{\mathsf{T}}}},H_{1}\Bigg\rangle \cdot \big\langle \tilde{b}_{1}{\tilde{b}_{1}^{\operatorname{\mathsf{T}}}}-{\sigma }^{2}\operatorname{I}_{d},H_{2}\big\rangle ,\end{array}\]
and this, due to condition (vi), converges toward
\[\operatorname{\mathbf{E}}\big\langle \mu _{a}{\tilde{a}_{1}^{\operatorname{\mathsf{T}}}},H_{1}\big\rangle \cdot \big\langle \tilde{b}_{1}{\tilde{b}_{1}^{\operatorname{\mathsf{T}}}}-{\sigma }^{2}\operatorname{I}_{d},H_{2}\big\rangle .\]
Other second moments can be considered in a similar way.
(b) The Lyapunov condition holds for each component of (3.1). Let δ be the quantity from assumptions (iv), (v). Then
\[\frac{1}{{m}^{1+\delta /2}}\sum \limits_{i=1}^{m}\operatorname{\mathbf{E}}{\big\| a_{0i}{\tilde{a}_{i}^{\operatorname{\mathsf{T}}}}\big\| }^{2+\delta }\le \frac{\operatorname{\mathbf{E}}\| \tilde{a}_{1}{\| }^{2+\delta }}{{m}^{1+\delta /2}}\sum \limits_{i=1}^{m}\| a_{0i}{\| }^{2+\delta }\to 0\]
as $m\to \infty $ by condition (v). For the fifth component,
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \frac{1}{{m}^{1+\delta /2}}\sum \limits_{i=1}^{m}\operatorname{\mathbf{E}}{\big\| \tilde{b}_{i}{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}}-{\sigma }^{2}\operatorname{I}_{d}\big\| }^{2+\delta }& \displaystyle =\frac{1}{{m}^{\delta /2}}\operatorname{\mathbf{E}}{\big\| \tilde{b}_{1}{\tilde{b}_{1}^{\operatorname{\mathsf{T}}}}-{\sigma }^{2}\operatorname{I}_{d}\big\| }^{2+\delta }\\{} & \displaystyle \le \frac{\mathrm{const}}{{m}^{\delta /2}}\operatorname{\mathbf{E}}\| \tilde{b}_{1}{\| }^{4+2\delta }\to 0\hspace{1em}\text{ as }m\to \infty .\end{array}\]
The latter expectation is finite by condition (iv).
The Lyapunov condition for other components is considered similarly.
(c) Parts (a) and (b) of the present proof imply (3.2) by the Lyapunov CLT.

Proof of Lemma 7

Under conditions (vii) and (i), all five components of $W_{i}$, given in (3.1), are uncorrelated (e.g., a cross-correlation like (4.6) equals zero, and condition (vi) is not needed). As in the proof of Lemma 6, the convergence (3.2) still holds. The components $\varGamma _{1},\dots ,\varGamma _{5}$ of Γ are independent because Γ is jointly Gaussian and the components of $W_{i}$ are uncorrelated.

Proof of Theorem 8(a)

Our reasoning is typical for the theory of generalized estimating equations, with the specific feature that a matrix parameter rather than a vector one is estimated.
By Corollary 4(a), with probability tending to 1 we have
(4.7)
\[\sum \limits_{i=1}^{m}s(a_{i},b_{i};\hat{X}_{\mathit{tls}})=0.\]
Now, we use Taylor’s formula around $X_{0}$ with the remainder in the Lagrange form; see [1], Theorem 5.6.2. Denote
\[\hat{\Delta }=\sqrt{m}(\hat{X}_{\mathit{tls}}-X_{0}),\hspace{2em}y_{m}=\sum \limits_{i=1}^{m}s(a_{i},b_{i};X_{0}),\hspace{2em}U_{m}=\sum \limits_{i=1}^{m}{s^{\prime }_{X}}(a_{i},b_{i};X_{0}).\]
Then (4.7) implies the relation
(4.8)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \bigg(\frac{1}{m}U_{m}\bigg)\hat{\Delta }=-\frac{1}{\sqrt{m}}y_{m}+\mathit{rest}_{1},\\{} & \displaystyle \| \mathit{rest}_{1}\| \le \| \hat{\Delta }\| \cdot \| \hat{X}_{\mathit{tls}}-X_{0}\| \cdot O_{p}(1).\end{array}\]
Here $O_{p}(1)$ is a factor of the form
(4.9)
\[\frac{1}{m}\sum \limits_{i=1}^{m}\underset{(\| X\| \le \| X_{0}\| +1)}{\sup }\big\| {s^{\prime\prime }_{X}}(a_{i},b_{i};X)\big\| .\]
Relation (4.8) holds with probability tending to 1 because, due to Theorem 2, $\hat{X}_{\mathit{tls}}\stackrel{\mathrm{P}}{\longrightarrow }X_{0}$; expression (4.9) is indeed $O_{p}(1)$ because the derivative ${s^{\prime\prime }_{X}}$ is quadratic in $a_{i}$, $b_{i}$ (cf. (4.5)), and the averaged second moments of $[{a_{i}^{\operatorname{\mathsf{T}}}},{b_{i}^{\operatorname{\mathsf{T}}}}]$ are assumed to be bounded.
Now, $\| \mathit{rest}_{1}\| \le \| \hat{\Delta }\| \cdot o_{p}(1)$. Next, by Lemma 5 and condition (iii),
\[\frac{1}{m}U_{m}=\frac{1}{m}\operatorname{\mathbf{E}}U_{m}+o_{p}(1)=V_{A}+o_{p}(1).\]
Therefore, (4.8) implies that
(4.10)
\[V_{A}\hat{\Delta }=-\frac{1}{\sqrt{m}}y_{m}+\mathit{rest}_{2},\]
(4.11)
\[\| \mathit{rest}_{2}\| \le \| \hat{\Delta }\| \cdot o_{p}(1).\]
Now, we find the limit in distribution of $y_{m}/\sqrt{m}$. The summands in $y_{m}$ have zero expectation due to Corollary 4(b). Moreover (see (2.8)),
\[s(a_{i},b_{i};X_{0})=(a_{0i}+\tilde{a}_{i})\big({\tilde{a}_{i}^{\operatorname{\mathsf{T}}}}X_{0}-{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}}\big)-X_{0}{\big(\operatorname{I}_{d}+{X_{0}^{\operatorname{\mathsf{T}}}}X_{0}\big)}^{-1}\big({X_{0}^{\operatorname{\mathsf{T}}}}\tilde{a}_{i}-\tilde{b}_{i}\big)\big({\tilde{a}_{i}^{\operatorname{\mathsf{T}}}}X_{0}-{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}}\big),\]
\[\begin{array}{r@{\hskip0pt}l}\displaystyle s(a_{i},b_{i};X_{0})& \displaystyle =W_{i1}X_{0}-W_{i2}+W_{i3}X_{0}-W_{i4}-X_{0}{\big(\operatorname{I}_{d}+{X_{0}^{\operatorname{\mathsf{T}}}}X_{0}\big)}^{-1}\\{} & \displaystyle \hspace{1em}\times \big({X_{0}^{\operatorname{\mathsf{T}}}}W_{i3}X_{0}-{X_{0}^{\operatorname{\mathsf{T}}}}W_{i4}-{W_{i4}^{\operatorname{\mathsf{T}}}}X_{0}+W_{i5}\big).\end{array}\]
Here $W_{ij}$ are the components of (3.1); the ${\sigma }^{2}X_{0}$ terms arising from $\tilde{a}_{i}{\tilde{a}_{i}^{\operatorname{\mathsf{T}}}}=W_{i3}+{\sigma }^{2}\operatorname{I}_{n}$ and $\tilde{b}_{i}{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}}=W_{i5}+{\sigma }^{2}\operatorname{I}_{d}$ cancel each other. By Lemma 6 we have (see (3.4))
(4.12)
\[\frac{1}{\sqrt{m}}y_{m}\stackrel{\mathrm{d}}{\longrightarrow }\varGamma (X_{0})\hspace{1em}\text{as }m\to \infty .\]
Finally, relations (4.10), (4.11), (4.12) and the nonsingularity of $V_{A}$ imply that $\hat{\Delta }=O_{p}(1)$, and by Slutsky’s lemma we get
(4.13)
\[V_{A}\hat{\Delta }\stackrel{\mathrm{d}}{\longrightarrow }\varGamma (X_{0})\hspace{1em}\text{as }m\to \infty .\]
By condition (iii) the matrix $V_{A}$ is nonsingular. Thus, the desired relation (3.3) follows from (4.13).

Proof of Theorem 8(b)

The convergence (3.3) is justified as before, but using Lemma 7 instead of Lemma 6. It suffices to show that $\operatorname{\mathbf{cov}}(\varGamma (X_{0})u)$ is nonsingular for $u\in {\mathbb{R}}^{d\times 1}$, $u\ne 0$.
Now, the components $\varGamma _{1},\dots ,\varGamma _{5}$ are independent. Then (see (3.4))
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \operatorname{\mathbf{cov}}\big(\varGamma (X_{0})u\big)& \displaystyle \ge \operatorname{\mathbf{cov}}(\varGamma _{2}u)=\underset{m\to \infty }{\lim }\frac{1}{m}\sum \limits_{i=1}^{m}\operatorname{\mathbf{E}}\big({u}^{\operatorname{\mathsf{T}}}\tilde{b}_{i}{a_{0i}^{\operatorname{\mathsf{T}}}}a_{0i}{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}}u\big)\\{} & \displaystyle =\mathrm{tr}V_{A}\cdot \operatorname{\mathbf{E}}{\big\| {\tilde{b}_{1}^{\operatorname{\mathsf{T}}}}u\big\| }^{2}={\sigma }^{2}\mathrm{tr}V_{A}\cdot \| u{\| }^{2}>0.\end{array}\]

Proof of Lemma 10

By condition (i) we have
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \operatorname{\mathbf{E}}a_{i}{a_{i}^{\operatorname{\mathsf{T}}}}& \displaystyle =a_{0i}{a_{0i}^{\operatorname{\mathsf{T}}}}+{\sigma }^{2}\operatorname{I}_{n},\hspace{2em}\operatorname{\mathbf{E}}a_{i}{b_{i}^{\operatorname{\mathsf{T}}}}=a_{0i}{a_{0i}^{\operatorname{\mathsf{T}}}}X_{0},\\{} \displaystyle \operatorname{\mathbf{E}}b_{i}{b_{i}^{\operatorname{\mathsf{T}}}}& \displaystyle ={X_{0}^{\operatorname{\mathsf{T}}}}a_{0i}{a_{0i}^{\operatorname{\mathsf{T}}}}X_{0}+{\sigma }^{2}\operatorname{I}_{d},\end{array}\]
(4.14)
\[\operatorname{\mathbf{E}}b_{i}{b_{i}^{\operatorname{\mathsf{T}}}}-2{X_{0}^{\operatorname{\mathsf{T}}}}\operatorname{\mathbf{E}}a_{i}{b_{i}^{\operatorname{\mathsf{T}}}}+{X_{0}^{\operatorname{\mathsf{T}}}}\big(\operatorname{\mathbf{E}}a_{i}{a_{i}^{\operatorname{\mathsf{T}}}}\big)X_{0}={\sigma }^{2}\big(\operatorname{I}_{d}+{X_{0}^{\operatorname{\mathsf{T}}}}X_{0}\big).\]
Equality (4.14) implies the first relation in (3.6) because $\hat{X}_{\mathit{tls}}\stackrel{\mathrm{P}}{\longrightarrow }X_{0}$ and $\overline{a{a}^{\operatorname{\mathsf{T}}}}-\operatorname{\mathbf{E}}\overline{a{a}^{\operatorname{\mathsf{T}}}}\stackrel{\mathrm{P}}{\longrightarrow }0$, $\overline{a{b}^{\operatorname{\mathsf{T}}}}-\operatorname{\mathbf{E}}\overline{a{b}^{\operatorname{\mathsf{T}}}}\stackrel{\mathrm{P}}{\longrightarrow }0$, $\overline{b{b}^{\operatorname{\mathsf{T}}}}-\operatorname{\mathbf{E}}\overline{b{b}^{\operatorname{\mathsf{T}}}}\stackrel{\mathrm{P}}{\longrightarrow }0$.
Finally,
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \hat{V}_{A}=\operatorname{\mathbf{E}}\overline{a{a}^{\operatorname{\mathsf{T}}}}+o_{p}(1)-{\hat{\sigma }}^{2}\operatorname{I}_{n}=\overline{a_{0}{a_{0}^{\operatorname{\mathsf{T}}}}}+({\sigma }^{2}-{\hat{\sigma }}^{2})\operatorname{I}_{n}+o_{p}(1),\\{} & \displaystyle \hspace{1em}\hat{V}_{A}\stackrel{\mathrm{P}}{\longrightarrow }\underset{m\to \infty }{\lim }\overline{a_{0}{a_{0}^{\operatorname{\mathsf{T}}}}}=V_{A}.\end{array}\]

References

[1] 
Cartan, H.: Differential Calculus. Hermann/Houghton Mifflin Co., Paris/Boston, MA (1971). Translated from French. MR0344032
[2] 
Cheng, C.-L., Van Ness, J.W.: Statistical Regression with Measurement Error. Kendall’s Library of Statistics, vol. 6. Arnold, London (1999). Co-published by Oxford University Press, New York. MR1719513
[3] 
Fuller, W.A.: Measurement Error Models. John Wiley & Sons, Inc., New York (1987). MR0898653. doi:10.1002/9780470316665
[4] 
Gallo, P.P.: Properties of estimators in errors-in-variables models. PhD thesis, The University of North Carolina at Chapel Hill, NC (1982). MR2632121
[5] 
Kukush, A., Van Huffel, S.: Consistency of elementwise-weighted total least squares estimator in a multivariate errors-in-variables model $AX=B$. Metrika 59(1), 75–97 (2004). MR2043433. doi:10.1007/s001840300272
[6] 
Markovsky, I., Rastello, M.L., Premoli, A., Kukush, A., Van Huffel, S.: The element-wise weighted total least-squares problem. Comput. Stat. Data Anal. 50(1), 181–209 (2006). MR2196229. doi:10.1016/j.csda.2004.07.014
[7] 
Pešta, M.: Asymptotics for weakly dependent errors-in-variables. Kybernetika 49(5), 692–704 (2013). MR3182634
[8] 
Sprent, P.: A generalized least-squares approach to linear functional relationships. J. R. Stat. Soc. B 28, 278–297 (1966). MR0230432
[9] 
Van Huffel, S., Vandewalle, J.: The Total Least Squares Problem. Frontiers in Applied Mathematics, vol. 9. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (1991). MR1118607. doi:10.1137/1.9781611971002