1 Introduction
We study an overdetermined system of linear equations $AX\approx B$, which often arises in problems of dynamical system identification [10]. If the matrices A and B are observed with additive uncorrelated errors of equal variance, then the total least squares (TLS) method is used to solve the system [10].
In papers [3, 7, 9], under various conditions, the consistency of the TLS estimator $\hat{X}$ is proven as the number m of rows of the matrix A increases, assuming that the true value ${A}^{0}$ of the input matrix is nonrandom. The asymptotic normality of the estimator is studied in [3] and [6].
The model $AX\approx B$ with random measurement errors corresponds to the vector linear errors-in-variables model (EIVM). In [2], a goodness-of-fit test is constructed for a polynomial EIVM with a nonrandom latent variable (i.e., in the functional case); the test can also be used in the structural case, where the latent variable is random with an unknown probability distribution. A more powerful test for the polynomial EIVM is elaborated in [4].
In [5], a goodness-of-fit test is constructed for the functional model $AX\approx B$, assuming that the error matrices $\tilde{A}$ and $\tilde{B}$ are independent and the covariance structure of $\tilde{A}$ is known. In the present paper, we construct a goodness-of-fit test in a more general situation, where the total covariance structure of the matrices $\tilde{A}$ and $\tilde{B}$ is known up to a scalar factor. The test statistic is based on the TLS estimator $\hat{X}$. Under the null hypothesis, the asymptotic behavior of the test statistic is studied based on the results of [6] and, under local alternatives, based on [9].
The present paper is organized as follows. In Section 2, we describe the observation model, introduce the TLS estimator, and state known results on the strong consistency and asymptotic normality of the estimator. In the next section, we construct the goodness-of-fit test and show that the proposed test statistic has an asymptotic chi-squared distribution with the corresponding number of degrees of freedom. The power of the test with respect to local alternatives is studied in Section 4, and Section 5 concludes. The proofs are given in the Appendix.
We use the following notation: $\| C\| =\sqrt{\sum _{i,j}{c_{ij}^{2}}}$ is the Frobenius norm of a matrix $C=(c_{ij})$, and $\mathrm{I}_{p}$ is the identity matrix of size p. The symbol $\operatorname{\mathsf{E}}$ denotes the expectation and acts as an operator on the entire product of quantities that follows it, and $\operatorname{\mathbf{cov}}$ denotes the covariance matrix of a random vector. The superscript ⊤ denotes transposition. In this paper, all vectors are column vectors. The bar means averaging over $i=1,\dots ,m$, for example, $\bar{a}:={m}^{-1}{\sum _{i=1}^{m}}a_{i}$, $\overline{a{b}^{\top }}:={m}^{-1}{\sum _{i=1}^{m}}a_{i}{b_{i}^{\top }}$. Convergence with probability one, in probability, and in distribution are denoted by $\stackrel{\mathrm{P}\mathrm{1}}{\to }$, $\stackrel{\mathrm{P}}{\to }$, and $\stackrel{\mathrm{d}}{\to }$, respectively. A sequence of random matrices that converges to zero in probability is denoted by $o_{p}(1)$, and a sequence of stochastically bounded random matrices is denoted by $O_{p}(1)$. The notation $\varepsilon \stackrel{\mathrm{d}}{=}\varepsilon _{1}$ means that the random variables ε and $\varepsilon _{1}$ have the same probability distribution. Positive constants that do not depend on the sample size m are denoted by $\mathit{const}$, so that equalities like $2\cdot \mathit{const}=\mathit{const}$ are possible.
2 Observation model and total least squares estimator
2.1 The TLS problem
Consider the observation model
where ${A}^{0}\in {\mathbb{R}}^{m\times n}$, ${X}^{0}\in {\mathbb{R}}^{n\times d}$, and ${B}^{0}\in {\mathbb{R}}^{m\times d}$. The matrices A and B contain the data, ${A}^{0}$ and ${B}^{0}$ are unknown nonrandom matrices, and $\tilde{A}$, $\tilde{B}$ are the matrices of random errors.
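To make the setting concrete, here is a minimal Python sketch that simulates data from model (2.1); the dimensions m, n, d, the true matrix ${X}^{0}$, and the error scale σ are hypothetical illustration values, and Gaussian errors are used only for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 500, 3, 2          # hypothetical dimensions: m rows, n inputs, d responses
sigma = 0.1                  # assumed common error standard deviation

X0 = rng.standard_normal((n, d))       # true coefficient matrix X^0
A0 = rng.standard_normal((m, n))       # true (noise-free) input matrix A^0
B0 = A0 @ X0                           # exact linear relation between A^0 and B^0

A = A0 + sigma * rng.standard_normal((m, n))   # observed inputs:  A = A^0 + tilde A
B = B0 + sigma * rng.standard_normal((m, d))   # observed outputs: B = B^0 + tilde B
```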
We can rewrite model (2.1) in an implicit way. Introduce three matrices of size $m\times (n+d)$:
Then
Let ${A}^{\top }=[a_{1}\dots a_{m}]$, ${B}^{\top }=[b_{1}\dots b_{m}]$, and we use similar notation for the rows of the matrices C, ${A}^{0}$, ${B}^{0}$, $\tilde{A}$, $\tilde{B}$, and $\tilde{C}$. Rewrite model (2.1) as a multivariate linear one:
Throughout the paper, the following assumption holds for the errors $\tilde{c}_{i}={[{\tilde{a}_{i}^{\top }}\hspace{2.5pt}{\tilde{b}_{i}^{\top }}]}^{\top }$:
Thus, the total error covariance structure is assumed to be known up to a scalar factor ${\sigma }^{2}$, and the errors are uncorrelated with equal variances.
For model (2.1), the TLS problem consists in finding disturbances $\Delta \hat{A}$ and $\Delta \hat{B}$ that minimize the sum of squared corrections
provided that
2.2 The TLS estimator and its consistency
It can happen that, for a particular random realization, the optimization problem (2.6)–(2.7) has no solution. In that case, we set $\hat{X}=\infty $.
Definition 1.
The TLS estimator $\hat{X}$ of the matrix parameter ${X}^{0}$ in the model (2.1) is a Borel-measurable function of the observed matrices A and B, taking values in ${\mathbb{R}}^{n\times d}\cup \{\infty \}$, that provides a solution to problem (2.6)–(2.7) whenever a solution exists and equals $\infty $ otherwise.
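A minimal sketch of the classical SVD-based construction of the TLS solution; it covers only the regular case in which the relevant block of right singular vectors is invertible, so that a finite solution to (2.6)–(2.7) exists.

```python
import numpy as np

def tls_estimate(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Classical SVD-based TLS estimate of X in A X ~ B.

    Returns the minimizer of the sum of squared corrections when the
    lower-right block of the right singular vectors is invertible;
    otherwise no finite TLS solution exists (numpy raises on an exactly
    singular block, corresponding to the case X_hat = infinity).
    """
    n = A.shape[1]
    d = B.shape[1]
    C = np.hstack([A, B])                    # C = [A B], size m x (n + d)
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    V = Vt.T
    V12 = V[:n, n:n + d]                     # upper-right block of V
    V22 = V[n:n + d, n:n + d]                # lower-right block of V
    return -V12 @ np.linalg.inv(V22)         # X_hat = -V12 V22^{-1}
```

With the simulated data above, `tls_estimate(A, B)` approaches ${X}^{0}$ as m grows, in line with the strong consistency result stated below.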
We need the following conditions to provide the consistency of the estimator:
The next result on the strong consistency of the estimator follows, for example, from Theorem 4.3 in [9].
Define the loss function $Q(X)$ as follows:
It is known that the TLS estimator minimizes the loss function (2.9); see formula (24) in [7].
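As a numerical illustration, the sketch below evaluates the TLS loss written in the standard form $Q(X)=\operatorname{tr}\big[(AX-B)^{\top }(AX-B)(\mathrm{I}_{d}+{X}^{\top }X)^{-1}\big]$; that this form agrees with the displayed loss (2.9) is an assumption of the sketch. It can be used to check that the SVD-based estimate from the previous sketch attains a smaller loss than nearby perturbed arguments.

```python
import numpy as np

def tls_loss(A: np.ndarray, B: np.ndarray, X: np.ndarray) -> float:
    """Standard TLS loss Q(X) = tr[(A X - B)^T (A X - B) (I_d + X^T X)^{-1}],
    assumed here to coincide with the loss function (2.9)."""
    d = B.shape[1]
    R = A @ X - B                                   # residual matrix A X - B
    M = np.linalg.inv(np.eye(d) + X.T @ X)          # (I_d + X^T X)^{-1}
    return float(np.trace(R.T @ R @ M))
```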
Introduce the following unbiased estimating function related to the elementary loss function (2.8):
2.3 Asymptotic normality of the estimator
We need further restrictions on the model. Recall that the augmented errors $\tilde{c}_{i}$ were introduced in Section 2.1, and the vectors ${a_{i}^{0}}$, $\tilde{b}_{i}$, and so on are those from model (2.3)–(2.4).
-
(iv) $\operatorname{\mathsf{E}}\| \tilde{c}_{1}{\| }^{4+2\delta }<\infty $ for some $\delta >0$;
-
(v) For δ from condition (iv), $\frac{1}{{m}^{1+\delta /2}}{\sum _{i=1}^{m}}\| {a_{i}^{0}}{\| }^{2+\delta }\to 0$ as $m\to \infty $.
Denote by ${\tilde{c}_{1}^{(p)}}$ the pth coordinate of the vector $\tilde{c}_{1}$.
Under assumptions (i) and (iv), condition (vi) holds, for example, in two cases: (a) when the random vector $\tilde{c}_{1}$ is symmetrically distributed, or (b) when the components of the vector $\tilde{c}_{1}$ are independent and, moreover, for each $p=1,\dots ,n+d$, the skewness coefficient of the random variable ${\tilde{c}_{1}^{(p)}}$ equals 0.
Introduce the following random element in the space of collections of five matrices:
The next statement on the asymptotic normality of the estimator follows from the proof of Theorem 8(b) in [6], where condition (vi) was replaced by the stronger assumption that $\tilde{c}_{1}$ is symmetrically distributed; the proof of Theorem 8(b) in [6] still works under the weaker condition (vi).
Theorem 4.
Assume conditions (i) and (iii)–(vi). Then:
-
(b)
(2.13)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \sqrt{m}\big(\hat{X}-{X}^{0}\big)& \displaystyle \stackrel{\mathrm{d}}{\to }{V_{A}^{-1}}\varGamma \big({X}^{0}\big)\hspace{1em}\textit{as }\hspace{2.5pt}m\to \infty ,\end{array}\]
where
(2.14)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \varGamma (X)& \displaystyle :=\varGamma _{1}X-\varGamma _{2}+\varGamma _{3}X-\varGamma _{4}\\{} & \displaystyle \hspace{1em}-X{\big(\mathrm{I}_{d}+{X}^{\top }X\big)}^{-1}\big({X}^{\top }\varGamma _{3}X-{X}^{\top }\varGamma _{4}-{\varGamma _{4}^{\top }}X+\varGamma _{5}\big),\end{array}\]
Let $f\in {\mathbb{R}}^{n\times 1}$. Under the conditions of Theorem 4, the convergence (2.13) implies that
Let a consistent estimator $\hat{f}=\hat{f}_{m}$ of the vector f be given. We want to construct a consistent estimator of the matrix (2.16). The matrix $S({X}^{0},f)$ is expressed, for instance, in terms of the fourth moments of the errors $\tilde{c}_{i}$, and those moments cannot be consistently estimated without additional assumptions on the error probability distribution. Therefore, an explicit expression for the latter matrix does not help to construct the desired estimator. Nevertheless, we can construct a sandwich-type estimator [1, pp. 368–369].
The next statement on the consistency of the nuisance parameter estimators follows from the proof of Lemma 10 in [6]. Recall that the bar means averaging over the observations; see Section 1.
Lemma 6.
Assume the conditions of Theorem 4. Define the estimators:
Then
(2.17)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {\hat{\sigma }}^{2}& \displaystyle =\frac{1}{d}\mathrm{tr}\big[\big(\overline{b{b}^{\top }}-2{\hat{X}}^{\top }\overline{a{b}^{\top }}+{\hat{X}}^{\top }\overline{a{a}^{\top }}\hat{X}\big){\big(\mathrm{I}_{d}+{\hat{X}}^{\top }\hat{X}\big)}^{-1}\big],\end{array}\]
The next asymptotic expansion of the TLS estimator is presented in [6], formulas (4.10) and (4.11).
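A direct rendering of (2.17) in code, using the bar averages defined in Section 1 (a sketch; `Xhat` stands for the TLS estimate $\hat{X}$):

```python
import numpy as np

def sigma2_hat(A: np.ndarray, B: np.ndarray, Xhat: np.ndarray) -> float:
    """Estimator (2.17) of sigma^2 built from the averaged cross-products."""
    m, d = B.shape
    bb = B.T @ B / m                 # \overline{b b^T}
    ab = A.T @ B / m                 # \overline{a b^T}
    aa = A.T @ A / m                 # \overline{a a^T}
    inner = bb - 2 * Xhat.T @ ab + Xhat.T @ aa @ Xhat
    return float(np.trace(inner @ np.linalg.inv(np.eye(d) + Xhat.T @ Xhat)) / d)
```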
In view of Lemma 7, introduce the sandwich estimator $\hat{S}(\hat{f})$ of the matrix (2.16):
(2.21)
\[\hat{S}(\hat{f})=\frac{1}{m}\sum \limits_{i=1}^{m}{s}^{\top }(a_{i},b_{i};\hat{X})\hspace{2.5pt}{\hat{V}_{A}^{-1}}\hat{f}{\hat{f}}^{\top }{\hat{V}_{A}^{-1}}\hspace{2.5pt}s(a_{i},b_{i};\hat{X}),\]
where the estimator $\hat{V}_{A}$ is given in (2.18).
Theorem 8.
Let $f\in {\mathbb{R}}^{n\times 1}$, and let $\hat{f}$ be a consistent estimator of this vector. Under the conditions of Theorem 4, the statistic $\hat{S}(\hat{f})$ is a consistent estimator of the matrix $S({X}^{0},f)$, that is, $\hat{S}(\hat{f})\stackrel{\mathrm{P}}{\to }S({X}^{0},f)$.
The Appendix contains the proofs of this theorem and of all further statements.
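A sketch of the sandwich estimator (2.21). The estimating function $s(a,b;X)$ from (2.11) and the estimator $\hat{V}_{A}$ from (2.18) are passed in as inputs, since their concrete forms are fixed by the corresponding displays; the code only implements the outer sandwich structure.

```python
import numpy as np

def sandwich_S(A, B, Xhat, VA_hat, f_hat, s_fn):
    """Sandwich estimator (2.21) of S(X^0, f).

    `s_fn(a, b, X)` must return the n x d value of the estimating function
    s(a, b; X) from display (2.11); it is supplied by the user of this sketch.
    """
    m, d = B.shape
    VA_inv = np.linalg.inv(VA_hat)
    w = VA_inv @ f_hat                         # n-vector V_A^{-1} f
    S = np.zeros((d, d))
    for a_i, b_i in zip(A, B):
        s_i = s_fn(a_i, b_i, Xhat)             # n x d value of s(a_i, b_i; Xhat)
        u = s_i.T @ w                          # d-vector s^T V_A^{-1} f
        S += np.outer(u, u)                    # s^T V_A^{-1} f f^T V_A^{-1} s
    return S / m
```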
3 Construction of goodness-of-fit test
For the observation model (2.4), we test the following hypotheses concerning the response b and the latent variable ${a}^{0}$:
$\textbf{H}_{0}$ There exists a matrix $X\in {\mathbb{R}}^{n\times d}$ such that
$\textbf{H}_{1}$ For each matrix $X\in {\mathbb{R}}^{n\times d}$,
In fact, the null hypothesis means that the observation model (2.3)–(2.4) holds. Based on the observations $a_{i}$, $b_{i}$, $i=1,\dots ,m$, we want to construct a test statistic to check this hypothesis. Let
We need the following stabilization condition on the latent variable:
To ensure the nonsingularity of the matrix $\varSigma _{T}$, we impose a final restriction on the observation model:
and, moreover, the matrix $S_{a}$ is nonsingular.
Lemma 10.
Assume conditions (i) and (iii)–(vii). Then
(3.5)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \sqrt{m}{T_{m}^{0}}\stackrel{\mathrm{d}}{\to }N(0,\varSigma _{T}),\\{} & \displaystyle \hspace{1em}\varSigma _{T}={\sigma }^{2}\big(1-2{\mu _{a}^{\top }}{V_{A}^{-1}}\mu _{a}\big)\big(\mathrm{I}_{d}+{X}^{0\top }{X}^{0}\big)+S\big({X}^{0},\mu _{a}\big).\end{array}\]
Lemma 11.
Assume the conditions of Lemma 10. Then:
-
(a) A strongly consistent estimator of the vector $\mu _{a}$ from condition (vii) is given by the statistic
-
(b) A consistent estimator of matrix (3.5) is given by the matrix statistic
(3.6)
\[\hat{\varSigma }_{T}:={\hat{\sigma }}^{2}\big(1-2{\hat{\mu }_{a}^{\top }}{\hat{V}_{A}^{-1}}\hat{\mu }_{a}\big)\big(\mathrm{I}_{d}+{\hat{X}}^{\top }\hat{X}\big)+\hat{S}(\hat{\mu }_{a}),\]
Remark 12.
Assume conditions (vii) and (viii). Then
\[\frac{1}{m}{A}^{0\top }{A}^{0}=\frac{1}{m}\sum \limits_{i=1}^{m}{a_{i}^{0}}{a_{i}^{0\top }}\to V_{A}=S_{a}+\mu _{a}{\mu _{a}^{\top }}\hspace{1em}\text{as}\hspace{2.5pt}m\to \infty ,\]
and $V_{A}$ is nonsingular as the sum of a positive definite and a positive semidefinite matrix. Thus, condition (iii) is a consequence of assumptions (vii) and (viii).

For $m\ge 1$ and ω from the underlying probability space Ω such that $\hat{\varSigma }_{T}$ is positive definite, we define the test statistic
Given a significance level α, $0<\alpha <1/2$, let ${\chi _{d\alpha }^{2}}$ be the upper α-quantile of the ${\chi _{d}^{2}}\hspace{2.5pt}$ probability law, that is, $\operatorname{\mathsf{P}}\{{\chi _{d}^{2}}>{\chi _{d\alpha }^{2}}\}=\alpha $. Based on Theorem 14, we construct the following goodness-of-fit test with asymptotic confidence probability $1-\alpha $:
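The following sketch renders the decision rule of this test together with the estimator (3.6). The statistic ${T_{m}^{0}}$ is taken as an input, and the quadratic-form shape ${T_{m}^{2}}=m\,{T_{m}^{0\top }}{\hat{\varSigma }_{T}^{-1}}{T_{m}^{0}}$ used below is an assumption of the sketch, consistent with the asymptotic ${\chi _{d}^{2}}$ behavior of the test statistic.

```python
import numpy as np
from scipy.stats import chi2

def sigma_T_hat(sigma2, mu_a_hat, VA_hat, Xhat, S_hat):
    """Estimator (3.6) of the asymptotic covariance matrix Sigma_T."""
    d = Xhat.shape[1]
    VA_inv = np.linalg.inv(VA_hat)
    scale = sigma2 * (1.0 - 2.0 * mu_a_hat @ VA_inv @ mu_a_hat)
    return scale * (np.eye(d) + Xhat.T @ Xhat) + S_hat

def gof_test(T0, Sigma_T_hat, m, alpha=0.05):
    """Reject H0 when m * T0^T Sigma_T_hat^{-1} T0 exceeds the upper
    alpha-quantile of chi^2_d; T0 is the statistic T_m^0 defined in the text,
    and the quadratic form is an assumed rendering of T_m^2."""
    d = len(T0)
    T2 = m * T0 @ np.linalg.solve(Sigma_T_hat, T0)
    return T2, T2 > chi2.ppf(1.0 - alpha, d)
```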
4 Power of the test
Consider a sequence of models
(4.1)
\[\textbf{H}_{1,m}:\hspace{1em}b_{i}={X}^{\top }{a_{i}^{0}}+\frac{g({a_{i}^{0}})}{\sqrt{m}}+\tilde{b}_{i},\hspace{2em}a_{i}={a_{i}^{0}}+\tilde{a}_{i},\hspace{1em}i=1,\dots ,m.\]
Here $g:{\mathbb{R}}^{n}\to {\mathbb{R}}^{d}$ is a given nonlinear perturbation of the linear regression function.
For an arbitrary function $f({a}^{0})$, denote the limit of averages
provided that the limit exists and is finite.
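For simulation purposes, data under the local alternatives $\textbf{H}_{1,m}$ of (4.1) can be generated as follows; the particular perturbation g, the dimensions, and the Gaussian errors are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def g(a0: np.ndarray) -> np.ndarray:
    """Hypothetical nonlinear perturbation g: R^n -> R^d (here d = 2)."""
    return np.array([np.sum(a0**2), np.sum(a0)**2])

def sample_local_alternative(A0, X, sigma, rng):
    """Draw (a_i, b_i) under H_{1,m} as in (4.1): the nonlinear term is
    damped by 1/sqrt(m), so the alternative approaches H_0 as m grows."""
    m, n = A0.shape
    d = X.shape[1]
    drift = np.apply_along_axis(g, 1, A0) / np.sqrt(m)     # g(a_i^0)/sqrt(m)
    B = A0 @ X + drift + sigma * rng.standard_normal((m, d))
    A = A0 + sigma * rng.standard_normal((m, n))
    return A, B

# Example usage with hypothetical dimensions m = 400, n = 3, d = 2:
m, n = 400, 3
A0 = rng.standard_normal((m, n))
X = rng.standard_normal((n, 2))
A, B = sample_local_alternative(A0, X, sigma=0.1, rng=rng)
```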
In order to study the behavior of the test statistic under local alternatives $\textbf{H}_{1,m}$, we impose two restrictions on the perturbation function g:
Under the local alternatives $\textbf{H}_{1,m}$, we establish the weak consistency and asymptotic normality of the TLS estimator $\hat{X}$.
Lemma 15.
Lemma 16.
Assume the conditions of Lemma 15. Then under local alternatives $\textbf{\textit{H}}_{1,m}$, we have:
Now, we define the noncentral chi-squared distribution ${\chi _{d}^{2}}(\tau )$ with d degrees of freedom and the noncentrality parameter τ.
Definition 17.
For $d\ge 1$ and $\tau \ge 0$, let ${\chi _{d}^{2}}(\tau )\stackrel{\mathrm{d}}{=}\| N(\tau e,\mathrm{I}_{d}){\| }^{2}$, where $e\in {\mathbb{R}}^{d}$, $\| e\| =1$, or, equivalently, ${\chi _{d}^{2}}(\tau )\stackrel{\mathrm{d}}{=}{(\gamma _{1}+\tau )}^{2}+{\sum _{i=2}^{d}}{\gamma _{i}^{2}}$, where $\{\gamma _{i}\}$ are i.i.d. standard normal random variables.
Lemma 16 implies directly the following convergence.
Theorem 18 makes it possible to find the asymptotic power of the test under local alternatives $\textbf{H}_{1,m}$. It is evident that the asymptotic power is an increasing function of $\tau =\| {\varSigma _{T}^{-1/2}}C_{T}\| $. In other words, the larger τ, the more powerful the test.
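Given τ, the asymptotic power can be evaluated numerically; the sketch below uses Definition 17, under which the noncentrality parameter in the usual sum-of-squared-means parametrization equals ${\tau }^{2}$.

```python
from scipy.stats import chi2, ncx2

def asymptotic_power(tau: float, d: int, alpha: float = 0.05) -> float:
    """Asymptotic power P{chi^2_d(tau) > chi^2_{d,alpha}} under H_{1,m}.

    By Definition 17, chi^2_d(tau) = ||N(tau*e, I_d)||^2, so in scipy's
    parametrization the noncentrality is tau**2; at tau = 0 the power
    reduces to the level alpha.
    """
    crit = chi2.ppf(1.0 - alpha, d)           # upper alpha-quantile of chi^2_d
    return float(ncx2.sf(crit, d, tau**2))    # survival function of chi^2_d(tau)

# E.g., for d = 2 and alpha = 0.05, asymptotic_power(t, 2) increases in t.
```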
5 Conclusion
We constructed a goodness-of-fit test for a multivariate linear errors-in-variables model, provided that the errors are uncorrelated with equal (unknown) variances and vanishing third moments. The latter moment assumption makes it possible to estimate consistently the asymptotic covariance matrix $\varSigma _{T}$ of the statistic ${T_{m}^{0}}$ and construct the test statistic ${T_{m}^{2}}$, which has the asymptotic ${\chi _{d}^{2}}$ distribution under the null hypothesis. The local alternatives $\textbf{H}_{1,m}$ are presented, under which the test statistic has the noncentral ${\chi _{d}^{2}}(\tau )$ asymptotic distribution. The larger τ, the larger the asymptotic power of the test.
In the future, we will try to construct, as in [5], a more powerful test by using within the test statistic the exponential weight function
\[\omega _{\lambda }(a)={e}^{{\lambda }^{\top }a},\hspace{1em}\lambda \in {\mathbb{R}}^{n\times 1}.\]
To this end, it is necessary to require the independence of the errors $\tilde{b}_{i}$ and $\tilde{a}_{i}$ and also the existence of exponential moments of the errors $\tilde{a}_{i}$. This is the price for the greater power of the test.