1 Introduction
We deal with an overdetermined system of linear equations $AX\approx B$, which is common in linear parameter estimation problems [9]. If the data matrix A and observation matrix B are contaminated with errors, and all the errors are uncorrelated and have equal variances, then the total least squares (TLS) technique is appropriate for solving this system [9]. Kukush and Van Huffel [5] showed the statistical consistency of the TLS estimator $\hat{X}_{\mathit{tls}}$ as the number m of rows in A grows, provided that the errors in $[A,B]$ are row-wise i.i.d. with zero mean and covariance matrix proportional to a unit matrix; the covariance matrix was assumed known up to a factor of proportionality, and the true input matrix $A_{0}$ was supposed to be nonrandom. In fact, [5] studied a more general, elementwise-weighted TLS estimator, where the errors in $[A,B]$ are row-wise independent, but within each row some entries may be observed without errors, and, additionally, the error covariance matrix may differ from row to row. In [6], an iterative numerical procedure was developed to compute the elementwise-weighted TLS estimator, and the rate of convergence of the procedure was established.
In the univariate case, where B and X are column vectors, the asymptotic normality of $\hat{X}_{\mathit{tls}}$ as m grows was shown by Gallo [4]. In [7], that result was extended to mixing error sequences. Both [4] and [7] exploited an explicit form of the TLS solution.
In the present paper, we extend Gallo’s asymptotic normality result to the multivariate case, where A, X, and B are matrices.
In this case, a closed-form solution is unavailable, and we work instead with the cost function. More precisely, we deal with the estimating function, which is the matrix derivative of the cost function. We show that under mild conditions, the normalized estimator converges in distribution to a Gaussian random matrix with a nonsingular covariance structure. For normal errors, the latter structure can be estimated consistently from the observed matrix $[A,B]$. The results can be used to construct an asymptotic confidence ellipsoid for a vector $Xu$, where u is a column vector of the corresponding dimension.
The paper is organized as follows. In Section 2, we describe the model, refer to the consistency result for the estimator, and present the objective function and the corresponding matrix estimating function. In Section 3, we state the asymptotic normality of $\hat{X}_{\mathit{tls}}$ and provide a nonsingular covariance structure for the limit random matrix. The latter structure depends continuously on some nuisance parameters of the model, and we derive consistent estimators for those parameters. Section 4 concludes. The proofs are given in the Appendix, where we work with the estimating function and derive an expansion for the normalized estimator using Taylor’s formula; the expansion holds with probability tending to 1.
Throughout the paper, all vectors are column vectors, $\operatorname{\mathbf{E}}$ stands for the expectation and applies to the entire product that follows it, $\operatorname{\mathbf{cov}}(x)$ denotes the covariance matrix of a random vector x, and for a sequence of random matrices $\{X_{m},m\ge 1\}$ of the same size, the notation $X_{m}=O_{p}(1)$ means that the sequence $\{\| X_{m}\| \}$ is stochastically bounded, and $X_{m}=o_{p}(1)$ means that $\| X_{m}\| \stackrel{\mathrm{P}}{\longrightarrow }0$. By $\operatorname{I}_{p}$ we denote the unit matrix of size p.
2 Model, objective, and estimating functions
2.1 The TLS problem
Consider the model $AX\approx B$. Here $A\in {\mathbb{R}}^{m\times n}$ and $B\in {\mathbb{R}}^{m\times d}$ are observations, and $X\in {\mathbb{R}}^{n\times d}$ is a parameter of interest. Assume that
(2.1)
\[A=A_{0}+\tilde{A},\hspace{2em}B=B_{0}+\tilde{B},\]
and that there exists $X_{0}\in {\mathbb{R}}^{n\times d}$ such that
(2.2)
\[A_{0}X_{0}=B_{0}.\]
Here $A_{0}$ is the nonrandom true input matrix, $B_{0}$ is the true output matrix, and $\tilde{A}$, $\tilde{B}$ are error matrices. The matrix $X_{0}$ is the true value of the parameter.
We can rewrite the model (2.1)–(2.2) as a classical functional errors-in-variables (EIV) model with vector regressor and vector response [3]. Denote by ${a_{i}^{\operatorname{\mathsf{T}}}}$, ${a_{0i}^{\operatorname{\mathsf{T}}}}$, ${\tilde{a}_{i}^{\operatorname{\mathsf{T}}}}$, ${b_{i}^{\operatorname{\mathsf{T}}}}$, ${b_{0i}^{\operatorname{\mathsf{T}}}}$, and ${\tilde{b}_{i}^{\operatorname{\mathsf{T}}}}$ the rows of A, $A_{0}$, $\tilde{A}$, B, $B_{0}$, and $\tilde{B}$, respectively, $i=1,\dots ,m$. Then the model considered is equivalent to the following EIV model:
\[a_{i}=a_{0i}+\tilde{a}_{i},\hspace{2em}b_{i}=b_{0i}+\tilde{b}_{i},\hspace{2em}b_{0i}={X_{0}^{\operatorname{\mathsf{T}}}}a_{0i},\hspace{1em}i=1,\dots ,m.\]
Based on the observations $a_{i}$, $b_{i}$, $i=1,\dots ,m$, we have to estimate $X_{0}$. The vectors $a_{0i}$ are nonrandom and unknown, and the vectors $\tilde{a}_{i}$, $\tilde{b}_{i}$ are random errors. We state a global assumption of the paper.
-
(i) The vectors $\tilde{z}_{i}$ with ${\tilde{z}_{i}^{\operatorname{\mathsf{T}}}}=[{\tilde{a}_{i}^{\operatorname{\mathsf{T}}}},{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}}]$, $i=1,2,\dots \hspace{0.1667em}$, are i.i.d., with zero mean and variance–covariance matrix
(2.3)
\[S_{\tilde{z}}:=\operatorname{\mathbf{cov}}(\tilde{z}_{1})={\sigma }^{2}\operatorname{I}_{n+d},\]
where the error variance ${\sigma }^{2}>0$ is unknown.
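For intuition, here is a minimal data-generating sketch for the model (2.1)–(2.2) under condition (i); the sizes, the seed, and the value of σ are hypothetical, and numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 500, 3, 2        # hypothetical sizes
sigma = 0.1                # hypothetical error standard deviation

X0 = rng.standard_normal((n, d))    # true parameter X_0
A0 = rng.standard_normal((m, n))    # plays the role of the nonrandom true input
B0 = A0 @ X0                        # true output: b_{0i} = X_0^T a_{0i} row-wise

Ztil = sigma * rng.standard_normal((m, n + d))  # i.i.d. rows, cov sigma^2 I_{n+d}
A, B = A0 + Ztil[:, :n], B0 + Ztil[:, n:]       # observed matrices as in (2.1)
```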
The TLS problem consists in finding the values of disturbances $\Delta \hat{A}$ and $\Delta \hat{B}$ minimizing the sum of squared corrections
(2.4)
\[\underset{X\in {\mathbb{R}}^{n\times d},\hspace{0.2em}\Delta A,\hspace{0.2em}\Delta B}{\min }\big(\| \Delta A{\| _{F}^{2}}+\| \Delta B{\| _{F}^{2}}\big)\]
subject to the constraints
(2.5)
\[(A-\Delta A)X=B-\Delta B.\]
Here in (2.4), for a matrix $C=(c_{ij})$, $\| C\| _{F}$ denotes the Frobenius norm, $\| C{\| _{F}^{2}}=\sum _{i,j}{c_{ij}^{2}}$. Later on, we will also use the operator norm $\| C\| =\sup _{x\ne 0}\frac{\| Cx\| }{\| x\| }$.
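The TLS estimate itself is classically computed from the singular value decomposition of the compound matrix $[A,B]$; see [9]. The following sketch (numpy assumed; the tolerance for declaring the relevant block singular is an arbitrary choice) implements that classical computation and is reused in the simulation sketches below.

```python
import numpy as np

def tls_estimate(A, B):
    """TLS estimate of X in AX ~ B via the SVD of [A, B] (cf. [9]).

    Returns None when the lower-right block of right singular vectors
    is numerically singular, i.e. when problem (2.4)-(2.5) has no
    solution (the case X_tls = infinity in Section 2.2 below)."""
    n, d = A.shape[1], B.shape[1]
    _, _, Vt = np.linalg.svd(np.hstack([A, B]))
    V = Vt.T
    V12, V22 = V[:n, n:], V[n:, n:]   # n x d and d x d blocks
    if np.linalg.cond(V22) > 1e12:    # arbitrary numerical threshold
        return None
    return -V12 @ np.linalg.inv(V22)
```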
2.2 TLS estimator and its consistency
It may happen that, for some random realization, problem (2.4)–(2.5) has no solution. In such a case, put $\hat{X}_{\mathit{tls}}=\infty $. Now, we give a formal definition of the TLS estimator.
Definition 1.
The TLS estimator $\hat{X}_{\mathit{tls}}$ of $X_{0}$ is a Borel-measurable function of the observations A and B that solves problem (2.4)–(2.5) whenever a solution exists, and $\hat{X}_{\mathit{tls}}=\infty $ otherwise.
We need the following conditions for the consistency of $\hat{X}_{\mathit{tls}}$.
-
(ii) $\operatorname{\mathbf{E}}\| \tilde{z}_{1}{\| }^{4}<\infty $, where $\tilde{z}_{1}$ satisfies condition (i).
-
(iii) $\frac{1}{m}{A_{0}^{\operatorname{\mathsf{T}}}}A_{0}\to V_{A}$ as $m\to \infty $, where $V_{A}$ is a nonsingular matrix.
Theorem 2.
Assume conditions (i)–(iii). Then $\hat{X}_{\mathit{tls}}$ is finite with probability tending to one, and $\hat{X}_{\mathit{tls}}\stackrel{\mathrm{P}}{\longrightarrow }X_{0}$ as $m\to \infty $; see [5].
2.3 The objective and estimating functions
Denote
(2.6)
\[q(a,b;X):={\big(b-{X}^{\operatorname{\mathsf{T}}}a\big)}^{\operatorname{\mathsf{T}}}{\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big(b-{X}^{\operatorname{\mathsf{T}}}a\big),\hspace{1em}a\in {\mathbb{R}}^{n},\hspace{2.5pt}b\in {\mathbb{R}}^{d},\]
(2.7)
\[Q(X):={\sum _{i=1}^{m}}q(a_{i},b_{i};X),\hspace{1em}X\in {\mathbb{R}}^{n\times d}.\]
The TLS estimator is known to minimize the objective function (2.7); see [8] or formula (24) in [5].
Lemma 3.
The TLS estimator $\hat{X}_{\mathit{tls}}$ is finite if and only if the function (2.7) attains an unconstrained minimum, and in that case $\hat{X}_{\mathit{tls}}$ is a minimum point of that function.
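To illustrate Lemma 3 numerically, note that the objective (2.7) can be written in matrix form as $Q(X)=\operatorname{tr}\big[(B-AX){(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X)}^{-1}{(B-AX)}^{\operatorname{\mathsf{T}}}\big]$, since the rows of $B-AX$ are ${(b_{i}-{X}^{\operatorname{\mathsf{T}}}a_{i})}^{\operatorname{\mathsf{T}}}$. A sketch (numpy assumed) follows, with a commented check that perturbing the SVD-based estimate from Section 2.1 should not decrease Q.

```python
import numpy as np

def Q(X, A, B):
    """Objective (2.7) in matrix form:
    Q(X) = tr[(B - AX)(I_d + X^T X)^{-1}(B - AX)^T]."""
    R = B - A @ X
    d = X.shape[1]
    M = np.linalg.inv(np.eye(d) + X.T @ X)
    return np.trace(R @ M @ R.T)

# Numerical check of Lemma 3: small perturbations of the SVD-based
# estimate (sketch in Section 2.1) should not decrease the objective.
# X_hat = tls_estimate(A, B)
# assert Q(X_hat, A, B) <= Q(X_hat + 1e-4 * np.ones_like(X_hat), A, B)
```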
Introduce an estimating function related to the loss function (2.6):
(2.8)
\[s(a,b;X):=a\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big)-X{\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({X}^{\operatorname{\mathsf{T}}}a-b\big)\big({a}^{\operatorname{\mathsf{T}}}X-{b}^{\operatorname{\mathsf{T}}}\big).\]
Corollary 4.
-
(b) Under assumption (i), the function $s(a,b;X)$ is an unbiased estimating function; that is, for each $i\ge 1$, $\operatorname{\mathbf{E}}_{X_{0}}s(a_{i},b_{i};X_{0})=0$.
Expression (2.8) as a function of X is a mapping in ${\mathbb{R}}^{n\times d}$. Its derivative ${s^{\prime }_{X}}$ is a linear operator in this space.
At $X=X_{0}$, this operator acts in expectation by left multiplication, $\operatorname{\mathbf{E}}_{X_{0}}{s^{\prime }_{X}}(a_{i},b_{i};X_{0})H=a_{0i}{a_{0i}^{\operatorname{\mathsf{T}}}}H$, $H\in {\mathbb{R}}^{n\times d}$; therefore, we can identify $\operatorname{\mathbf{E}}_{X_{0}}{s^{\prime }_{X}}(a_{i},b_{i};X_{0})$ with the matrix $a_{0i}{a_{0i}^{\operatorname{\mathsf{T}}}}$.
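A quick Monte Carlo check of the unbiasedness in Corollary 4(b); this is a sketch with hypothetical sizes, numpy assumed, using (2.8) as stated above. The average of $s(a_{i},b_{i};X_{0})$ over many simulated rows should be close to the zero matrix.

```python
import numpy as np

def s(a, b, X):
    """Estimating function (2.8) for a single row (a, b)."""
    d = X.shape[1]
    r = X.T @ a - b                               # residual X^T a - b
    M = np.linalg.inv(np.eye(d) + X.T @ X)
    return np.outer(a, a) @ X - np.outer(a, b) - X @ M @ np.outer(r, r)

rng = np.random.default_rng(1)
n, d, sigma, N = 3, 2, 0.5, 100_000   # hypothetical values
X0 = rng.standard_normal((n, d))
a0 = rng.standard_normal(n)           # one fixed true row a_{0i}
acc = np.zeros((n, d))
for _ in range(N):
    z = sigma * rng.standard_normal(n + d)        # one error row under (i)
    a, b = a0 + z[:n], X0.T @ a0 + z[n:]
    acc += s(a, b, X0)
print(np.abs(acc / N).max())   # should be near zero: E s(a_i, b_i; X_0) = 0
```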
3 Main results
Introduce further assumptions in order to state the asymptotic normality of $\hat{X}_{\mathit{tls}}$. We need slightly higher moments than in conditions (ii) and (iii) in order to apply the Lyapunov CLT. Recall that $\tilde{z}_{i}$ satisfies condition (i).
-
(iv) For some $\delta >0$, $\operatorname{\mathbf{E}}\| \tilde{z}_{1}{\| }^{4+2\delta }<\infty $.
-
(v) For δ from condition (iv),
-
(vi) $\frac{1}{m}{\sum _{i=1}^{m}}a_{0i}\to \mu _{a}$ as $m\to \infty $, where $\mu _{a}\in {\mathbb{R}}^{n\times 1}$.
-
(vii) The distribution of $\tilde{z}_{1}$ is symmetric around the origin.
Introduce a random element in the space of quintuples of matrices:
(3.1)
\[W_{i}=\big(a_{0i}{\tilde{a}_{i}^{\operatorname{\mathsf{T}}}},a_{0i}{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}},\tilde{a}_{i}{\tilde{a}_{i}^{\operatorname{\mathsf{T}}}}-{\sigma }^{2}\operatorname{I}_{n},\tilde{a}_{i}{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}},\tilde{b}_{i}{\tilde{b}_{i}^{\operatorname{\mathsf{T}}}}-{\sigma }^{2}\operatorname{I}_{d}\big).\]
Hereafter $\stackrel{\mathrm{d}}{\longrightarrow }$ stands for convergence in distribution.
Now, we state the asymptotic normality of $\hat{X}_{\mathit{tls}}$.
Theorem 8.
-
(a) Assume conditions (i) and (iii)–(vi). Then
(3.3)
\[\sqrt{m}(\hat{X}_{\mathit{tls}}-X_{0})\stackrel{\mathrm{d}}{\longrightarrow }{V_{A}^{-1}}\varGamma (X_{0})\hspace{1em}\textit{as }m\to \infty ,\]
where $(\varGamma _{1},\dots ,\varGamma _{5})$ is the limit in distribution of $\frac{1}{\sqrt{m}}{\sum _{i=1}^{m}}W_{i}$, with $W_{i}$ given in (3.1), and
(3.4)
\[\varGamma (X):=\varGamma _{1}X-\varGamma _{2}+\varGamma _{3}X-\varGamma _{4}-X{\big(\operatorname{I}_{d}+{X}^{\operatorname{\mathsf{T}}}X\big)}^{-1}\big({X}^{\operatorname{\mathsf{T}}}\varGamma _{3}X-{X}^{\operatorname{\mathsf{T}}}\varGamma _{4}-{\varGamma _{4}^{\operatorname{\mathsf{T}}}}X+\varGamma _{5}\big).\]
 -
(b) In the assumptions of part (a), replace condition (vi) with condition (vii). Then the convergence (3.3) still holds, and, moreover, the limit random matrix $X_{\infty }:={V_{A}^{-1}}\varGamma (X_{0})$ has a nonsingular covariance structure; that is, for each nonzero vector $u\in {\mathbb{R}}^{d\times 1}$, $\operatorname{\mathbf{cov}}(X_{\infty }u)$ is a nonsingular matrix.
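A simulation sketch of the convergence (3.3), with numpy assumed, hypothetical sizes, and `tls_estimate` taken from the SVD-based sketch in Section 2.1: across replications, the entries of $\sqrt{m}(\hat{X}_{\mathit{tls}}-X_{0})$ should be centered near zero with stable entrywise spread.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, d, sigma, reps = 2000, 2, 2, 0.3, 300   # hypothetical values
X0 = rng.standard_normal((n, d))
A0 = rng.standard_normal((m, n))              # fixed across replications

samples = []
for _ in range(reps):
    Z = sigma * rng.standard_normal((m, n + d))
    A, B = A0 + Z[:, :n], A0 @ X0 + Z[:, n:]
    X_hat = tls_estimate(A, B)    # SVD-based sketch from Section 2.1
    if X_hat is not None:
        samples.append(np.sqrt(m) * (X_hat - X0))

stack = np.stack(samples)         # shape (reps, n, d)
print(stack.mean(axis=0))         # approximately the zero matrix
print(stack.std(axis=0))          # entrywise spread of V_A^{-1} Gamma(X_0)
```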
Remark 9.
The conditions of Theorem 8(a) are similar to Gallo’s conditions [4] for the asymptotic normality in the univariate case; see also [9], pp. 240–243. Compared with Theorems 2.3 and 2.4 of [7], stated for the univariate case with mixing errors, we do not require the entries of the true input $A_{0}$ to be uniformly bounded.
Section 2 of [7] discusses the importance of the asymptotic normality result for $\hat{X}_{\mathit{tls}}$. It is claimed there that the formula for the asymptotic covariance structure of $\hat{X}_{\mathit{tls}}$ is computationally useless, but in the case where the limit distribution is nonsingular, block-bootstrap techniques can be used to construct confidence intervals and test hypotheses.
However, in the case of normal errors $\tilde{z}_{i}$, we can apply Theorem 8(b) to construct the asymptotic confidence ellipsoid, say, for $X_{0}u$, $u\in {\mathbb{R}}^{d\times 1}$, $u\ne 0$. Indeed, relations (3.1)–(3.4) show that the nonsingular matrix
\[S_{u}:=\operatorname{\mathbf{cov}}(X_{\infty }u)=\operatorname{\mathbf{cov}}\big({V_{A}^{-1}}\varGamma (X_{0})u\big)\]
is a continuous function $S_{u}=S_{u}(X_{0},V_{A},{\sigma }^{2})$ of the unknown parameters $X_{0}$, $V_{A}$, and ${\sigma }^{2}$. (It is important here that, in the normal case, the components $\varGamma _{j}$ of Γ are independent and the covariance structure of each $\varGamma _{j}$ depends only on ${\sigma }^{2}$ and $V_{A}$, not on other limit characteristics of $A_{0}$; see Lemma 6.) Once we possess consistent estimators $\hat{V}_{A}$ and ${\hat{\sigma }}^{2}$ of $V_{A}$ and ${\sigma }^{2}$, the matrix $\hat{S}_{u}:=S_{u}(\hat{X}_{\mathit{tls}},\hat{V}_{A},{\hat{\sigma }}^{2})$ is a consistent estimator of the covariance matrix $S_{u}$.
Hereafter, a bar means averaging over rows $i=1,\dots ,m$; for example, $\overline{a{b}^{\operatorname{\mathsf{T}}}}=\frac{1}{m}{\sum _{i=1}^{m}}a_{i}{b_{i}^{\operatorname{\mathsf{T}}}}$.
Lemma 10.
Assume the conditions of Theorem 2. Define
(3.5)
\[{\hat{\sigma }}^{2}:=\frac{1}{d}\operatorname{tr}\big[\big(\overline{b{b}^{\operatorname{\mathsf{T}}}}-2{\hat{X}_{\mathit{tls}}^{\operatorname{\mathsf{T}}}}\overline{a{b}^{\operatorname{\mathsf{T}}}}+{\hat{X}_{\mathit{tls}}^{\operatorname{\mathsf{T}}}}\overline{a{a}^{\operatorname{\mathsf{T}}}}\hat{X}_{\mathit{tls}}\big){\big(\operatorname{I}_{d}+{\hat{X}_{\mathit{tls}}^{\operatorname{\mathsf{T}}}}\hat{X}_{\mathit{tls}}\big)}^{-1}\big]\]
and $\hat{V}_{A}:=\overline{a{a}^{\operatorname{\mathsf{T}}}}-{\hat{\sigma }}^{2}\operatorname{I}_{n}$. Then ${\hat{\sigma }}^{2}\stackrel{\mathrm{P}}{\longrightarrow }{\sigma }^{2}$ and $\hat{V}_{A}\stackrel{\mathrm{P}}{\longrightarrow }V_{A}$ as $m\to \infty $.
Finally, for the case $\tilde{z}_{1}\sim N(0,{\sigma }^{2}\operatorname{I}_{n+d})$, based on Lemma 10 and the relations
\[\sqrt{m}(\hat{X}_{\mathit{tls}}u-X_{0}u)\stackrel{\mathrm{d}}{\longrightarrow }N(0,S_{u}),\hspace{1em}S_{u}>0,\hspace{2.5pt}\hat{S}_{u}\stackrel{\mathrm{P}}{\longrightarrow }S_{u},\]
we can construct the asymptotic confidence ellipsoid for the vector $X_{0}u$ in a standard way.
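A sketch of the plug-in steps (numpy and scipy assumed): ${\hat{\sigma }}^{2}$ is computed from (3.5), $\hat{V}_{A}$ as in Lemma 10, and the ellipsoid is formed for a caller-supplied estimate `S_u_hat` of $S_{u}$. The closed form of $S_{u}(X_{0},V_{A},{\sigma }^{2})$ comes from Lemma 6 and is not reproduced here, so `S_u_hat` is left as a hypothetical input.

```python
import numpy as np
from scipy.stats import chi2   # scipy assumed, for the chi-square quantile

def sigma2_hat(A, B, X_hat):
    """Error-variance estimate (3.5), with bars as row averages."""
    m, d = A.shape[0], B.shape[1]
    bb, ab, aa = B.T @ B / m, A.T @ B / m, A.T @ A / m
    C = bb - 2 * X_hat.T @ ab + X_hat.T @ aa @ X_hat
    M = np.linalg.inv(np.eye(d) + X_hat.T @ X_hat)
    return np.trace(C @ M) / d

def V_A_hat(A, s2):
    """Bias-corrected estimate of V_A (cf. Lemma 10)."""
    m, n = A.shape
    return A.T @ A / m - s2 * np.eye(n)

def ellipsoid(X_hat, u, S_u_hat, m, alpha=0.05):
    """Asymptotic (1 - alpha) confidence ellipsoid for X_0 u:
    { x : m (X_hat u - x)^T S_u_hat^{-1} (X_hat u - x) <= chi2_{n,1-alpha} }.
    S_u_hat is a consistent estimate of S_u, e.g. from Lemma 6 (not shown).
    Returns the center, the inverse shape matrix, and the squared radius."""
    n = X_hat.shape[0]
    return X_hat @ u, np.linalg.inv(S_u_hat), chi2.ppf(1 - alpha, df=n) / m
```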
4 Conclusion
We extended the result of Gallo [4] and proved the asymptotic normality of the TLS estimator in the multivariate model $AX\approx B$. The normalized estimator converges in distribution to a random matrix with a rather complicated covariance structure. If the error distribution is symmetric around the origin, then this covariance structure is nonsingular. In the case of normal errors, this makes it possible to construct an asymptotic confidence region for a vector $X_{0}u$, $u\in {\mathbb{R}}^{d\times 1}$, where $X_{0}$ is the true value of X.
In future papers, we will extend the result to the elementwise-weighted TLS estimator [5] in the model $AX\approx B$, where some columns of the matrix $[A,B]$ may be observed without errors and, in addition, the error covariance matrix may differ from row to row.