1 Introduction
We study an overdetermined system of linear equations $AX \approx B$, which often occurs in problems of dynamical system identification [10]. If the matrices $A$ and $B$ are observed with additive uncorrelated errors of equal variance, then the total least squares (TLS) method is used to solve the system [10].
In the papers [3, 7, 9], under various conditions, the consistency of the TLS estimator $\hat X$ is proven as the number $m$ of rows of the matrix $A$ grows, assuming that the true value $A_0$ of the input matrix is nonrandom. The asymptotic normality of the estimator is studied in [3] and [6].
The model $AX \approx B$ with random measurement errors corresponds to the vector linear errors-in-variables model (EIVM). In [2], a goodness-of-fit test is constructed for a polynomial EIVM with a nonrandom latent variable (i.e., in the functional case); the test can also be used in the structural case, where the latent variable is random with an unknown probability distribution. A more powerful test for the polynomial EIVM is elaborated in [4].
In the paper [5], a goodness-of-fit test is constructed for the functional model $AX \approx B$, assuming that the error matrices $\tilde A$ and $\tilde B$ are independent and the covariance structure of $\tilde A$ is known. In the present paper, we construct a goodness-of-fit test in a more general situation, where the total covariance structure of the matrices $\tilde A$ and $\tilde B$ is known up to a scalar factor. The test statistic is based on the TLS estimator $\hat X$. Under the null hypothesis, the asymptotic behavior of the test statistic is studied based on results of [6]; under local alternatives, it is studied based on [9].
The present paper is organized as follows. In Section 2, we describe the observation model, introduce the TLS estimator, and formulate known results on the strong consistency and asymptotic normality of the estimator. In the next section, we construct the goodness-of-fit test and show that the proposed test statistic has an asymptotic chi-squared distribution with the corresponding number of degrees of freedom. The power of the test with respect to local alternatives is studied in Section 4, and Section 5 concludes. The proofs are given in the Appendix.
We use the following notation: $\|C\| = \sqrt{\sum_{i,j} c_{ij}^2}$ is the Frobenius norm of a matrix $C = (c_{ij})$, and $I_p$ is the identity matrix of size $p$. The symbol $\mathsf{E}$ denotes the expectation and acts as an operator on the total product of quantities, and $\operatorname{cov}$ means the covariance matrix of a random vector. The superscript $\top$ denotes transposition. In the paper, all vectors are column vectors. The bar means averaging over $i = 1, \dots, m$; for example, $\bar a := m^{-1} \sum_{i=1}^m a_i$ and $\overline{ab^\top} := m^{-1} \sum_{i=1}^m a_i b_i^\top$. Convergence with probability one, in probability, and in distribution are denoted by $\xrightarrow{\mathrm{P1}}$, $\xrightarrow{\mathrm{P}}$, and $\xrightarrow{\mathrm{d}}$, respectively. A sequence of random matrices that converges to zero in probability is denoted by $o_p(1)$, and a sequence of stochastically bounded random matrices by $O_p(1)$. The notation $\varepsilon \stackrel{\mathrm{d}}{=} \varepsilon_1$ means that the random variables $\varepsilon$ and $\varepsilon_1$ have the same probability distribution. Positive constants that do not depend on the sample size $m$ are denoted by $\mathrm{const}$, so that equalities like $2 \cdot \mathrm{const} = \mathrm{const}$ are possible.
2 Observation model and total least squares estimator
2.1 The TLS problem
Consider the observation model
\[ A = A_0 + \tilde A, \qquad B = B_0 + \tilde B, \qquad A_0 X_0 = B_0, \qquad (2.1) \]
where $A_0 \in \mathbb{R}^{m \times n}$, $X_0 \in \mathbb{R}^{n \times d}$, and $B_0 \in \mathbb{R}^{m \times d}$. The matrices $A$ and $B$ contain the data, $A_0$ and $B_0$ are unknown nonrandom matrices, and $\tilde A$, $\tilde B$ are the matrices of random errors.
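For concreteness, this observation scheme can be simulated in a few lines of Python; the dimensions, the noise level, and the generator seed below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 200, 3, 2      # illustrative sizes: m rows, n inputs, d responses
sigma = 0.1              # error scale (unknown in the paper, fixed here)

A0 = rng.standard_normal((m, n))   # true (nonrandom) input matrix
X0 = rng.standard_normal((n, d))   # true parameter matrix
B0 = A0 @ X0                       # true response matrix
A = A0 + sigma * rng.standard_normal((m, n))   # observed inputs
B = B0 + sigma * rng.standard_normal((m, d))   # observed responses
print(A.shape, B.shape)   # (200, 3) (200, 2)
```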
We can rewrite model (2.1) in an implicit way. Introduce three matrices of size $m \times (n+d)$:
\[ C = [A \;\; B], \qquad C_0 = [A_0 \;\; B_0], \qquad \tilde C = [\tilde A \;\; \tilde B]. \qquad (2.2) \]
Then
\[ C = C_0 + \tilde C. \qquad (2.3) \]
Let $A^\top = [a_1 \dots a_m]$, $B^\top = [b_1 \dots b_m]$, and we use similar notation for the rows of the matrices $C$, $A_0$, $B_0$, $\tilde A$, $\tilde B$, and $\tilde C$. Rewrite model (2.1) as a multivariate linear one:
\[ a_i = a_{0i} + \tilde a_i, \qquad b_i = X_0^\top a_{0i} + \tilde b_i, \qquad i = 1, \dots, m. \qquad (2.4) \]
Throughout the paper, the following assumption holds for the errors $\tilde c_i = [\tilde a_i^\top \; \tilde b_i^\top]^\top$:
(i) the vectors $\tilde c_i$, $i \ge 1$, are i.i.d., with $\mathsf{E}\,\tilde c_1 = 0$ and $\operatorname{cov}(\tilde c_1) = \sigma^2 I_{n+d}$, where the factor $\sigma^2 > 0$ is unknown.
Thus, the total error covariance structure is assumed to be known up to the scalar factor $\sigma^2$, and the errors are uncorrelated with equal variances.
For model (2.1), the TLS problem consists in finding disturbances $\Delta \hat A$ and $\Delta \hat B$ that minimize the sum of squared corrections
\[ \|\Delta A\|^2 + \|\Delta B\|^2 \to \min, \qquad (2.6) \]
provided that
\[ (A + \Delta A) X = B + \Delta B \quad \text{for some } X \in \mathbb{R}^{n \times d}. \qquad (2.7) \]
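The optimization problem (2.6)–(2.7) admits the classical closed-form solution via the singular value decomposition of the compound matrix $[A \; B]$. A minimal sketch, with synthetic data used only for the sanity check:

```python
import numpy as np

def tls_estimate(A, B):
    """Classical SVD-based TLS solution of A X ~ B.

    Returns X_hat minimizing ||[dA dB]||_F subject to (A + dA) X = B + dB,
    assuming the relevant d x d submatrix of V is invertible.
    """
    n = A.shape[1]
    d = B.shape[1]
    C = np.hstack([A, B])                  # compound data matrix [A B]
    V = np.linalg.svd(C)[2].T              # right singular vectors, C = U S V^T
    V12 = V[:n, n:]                        # top-right n x d block
    V22 = V[n:, n:]                        # bottom-right d x d block
    return -V12 @ np.linalg.inv(V22)       # X_hat = -V12 V22^{-1}

# Sanity check on noiseless synthetic data: the estimate recovers X0 exactly.
rng = np.random.default_rng(0)
A0 = rng.standard_normal((100, 3))
X0 = rng.standard_normal((3, 2))
X_hat = tls_estimate(A0, A0 @ X0)
print(np.allclose(X_hat, X0))  # True
```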
2.2 The TLS estimator and its consistency
It can happen that, for certain random realizations, the optimization problem (2.6)–(2.7) has no solution. In that case, we set $\hat X = \infty$.
We need the following conditions to ensure the consistency of the estimator:
The next result on the strong consistency of the estimator follows, for example, from Theorem 4.3 in [9].
Define the elementary loss function
\[ q(a, b; X) := (X^\top a - b)^\top (I_d + X^\top X)^{-1} (X^\top a - b) \qquad (2.8) \]
and the loss function $Q(X)$ as follows:
\[ Q(X) := \sum_{i=1}^m q(a_i, b_i; X). \qquad (2.9) \]
It is known that the TLS estimator minimizes the loss function (2.9); see formula (24) in [7].
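In the classical TLS literature, the elementary loss is the squared orthogonal distance, $q(a, b; X) = (X^\top a - b)^\top (I_d + X^\top X)^{-1} (X^\top a - b)$. A quick numerical sanity check, with synthetic data and an SVD-based TLS solver, that the TLS solution attains a smaller loss than nearby points:

```python
import numpy as np

def Q(X, A, B):
    """TLS loss: sum over rows of (X^T a_i - b_i)^T (I + X^T X)^{-1} (X^T a_i - b_i)."""
    R = A @ X - B                                        # m x d residual matrix
    W = np.linalg.inv(np.eye(X.shape[1]) + X.T @ X)      # (I_d + X^T X)^{-1}
    return np.sum((R @ W) * R)                           # trace(R W R^T)

def tls(A, B):
    """SVD-based TLS estimate of X in A X ~ B."""
    n = A.shape[1]
    V = np.linalg.svd(np.hstack([A, B]))[2].T
    return -V[:n, n:] @ np.linalg.inv(V[n:, n:])

rng = np.random.default_rng(2)
A0 = rng.standard_normal((50, 2))
X0 = np.array([[1.0], [2.0]])
A = A0 + 0.1 * rng.standard_normal(A0.shape)
B = A0 @ X0 + 0.1 * rng.standard_normal((50, 1))
X_hat = tls(A, B)

# The TLS estimate attains a loss no larger than random perturbations of itself:
print(all(Q(X_hat, A, B) <= Q(X_hat + 0.1 * rng.standard_normal(X_hat.shape), A, B)
          for _ in range(100)))  # True
```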
Introduce the following unbiased estimating function related to the elementary loss function (2.8):
2.3 Asymptotic normality of the estimator
We need further restrictions on the model. Recall that the augmented errors $\tilde c_i$ were introduced in Section 2.1, and the vectors $a_{0i}$, $\tilde b_i$, and so on are those from model (2.3)–(2.4).
Denote by $\tilde c_1^{(p)}$ the $p$th coordinate of the vector $\tilde c_1$.
Under assumptions (i) and (iv), condition (vi) holds, for example, in two cases: (a) when the random vector $\tilde c_1$ is symmetrically distributed, or (b) when the components of the vector $\tilde c_1$ are independent and, moreover, for each $p = 1, \dots, n+d$, the skewness coefficient of the random variable $\tilde c_1^{(p)}$ equals 0.
Introduce the following random element in the space of collections of five matrices:
The next statement on the asymptotic normality of the estimator follows from the proof of Theorem 8(b) in [6], where, instead of condition (vi), the stronger assumption was made that $\tilde c_1$ is symmetrically distributed; the proof of Theorem 8(b) in [6] still works under the weaker condition (vi).
Let a consistent estimator $\hat f = \hat f_m$ of the vector $f$ be given. We want to construct a consistent estimator of matrix (2.16). The matrix $S(X_0, f)$ can be expressed, for example, via the fourth moments of the errors $\tilde c_i$, and those moments cannot be consistently estimated without additional assumptions on the error probability distribution. Therefore, an explicit expression for the latter matrix does not help to construct the desired estimator. Nevertheless, we can construct an analogue of the sandwich estimator [1, pp. 368–369].
The next statement on the consistency of the nuisance parameter estimators follows from the proof of Lemma 10 in [6]. Recall that the bar means averaging over the observations; see Section 1.
The next asymptotic expansion of the TLS estimator is presented in [6], formulas (4.10) and (4.11).
In view of Lemma 7, introduce the sandwich estimator $\hat S(\hat f)$ of the matrix (2.16):
where the estimator $\hat V_A$ is given in (2.18).
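The matrices in (2.16)–(2.18) are specific to the present model, but the generic shape of a sandwich estimator, an inverted "bread" matrix on each side of an empirical "meat" matrix, can be sketched as follows; the score array and bread matrix here are hypothetical placeholders, not the paper's $\hat S$ and $\hat V_A$:

```python
import numpy as np

def sandwich(scores, bread):
    """Generic sandwich covariance estimate bread^{-1} * meat * bread^{-T},
    where meat is the empirical second-moment matrix of the scores.

    scores : (m, k) array, rows are estimating-function values at the estimate
    bread  : (k, k) average derivative matrix of the estimating function
    """
    m = scores.shape[0]
    meat = scores.T @ scores / m           # empirical second moment of scores
    bread_inv = np.linalg.inv(bread)
    return bread_inv @ meat @ bread_inv.T

# Illustrative call: i.i.d. standard normal scores with identity bread
# give an estimate close to the identity matrix.
rng = np.random.default_rng(3)
S_hat = sandwich(rng.standard_normal((5000, 2)), np.eye(2))
print(np.round(S_hat, 1))
```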
Theorem 8.
Let $f \in \mathbb{R}^{n \times 1}$, and let $\hat f$ be a consistent estimator of this vector. Under the conditions of Theorem 4, the statistic $\hat S(\hat f)$ is a consistent estimator of the matrix $S(X_0, f)$, that is, $\hat S(\hat f) \xrightarrow{\mathrm{P}} S(X_0, f)$.
The Appendix contains the proofs of this theorem and of all further statements.
3 Construction of the goodness-of-fit test
For the observation model (2.4), we test the following hypotheses concerning the response $b$ and the latent variable $a_0$:
In fact, the null hypothesis means that the observation model (2.3)–(2.4) holds. Based on the observations $a_i$, $b_i$, $i = 1, \dots, m$, we want to construct a test statistic to check this hypothesis. Let
We need the following stabilization condition on the latent variable:
To ensure the nonsingularity of the matrix $\Sigma_T$, we impose a final restriction on the observation model:
and, moreover, the matrix $S_a$ is nonsingular.
Lemma 11.
For $m \ge 1$ and $\omega$ from the underlying probability space $\Omega$ such that $\hat\Sigma_T$ is positive definite, we define the test statistic
Given a significance level $\alpha$, $0 < \alpha < 1/2$, let $\chi^2_{d,\alpha}$ be the upper $\alpha$-quantile of the $\chi^2_d$ probability law, that is, $P\{\chi^2_d > \chi^2_{d,\alpha}\} = \alpha$. Based on Theorem 14, we construct the following goodness-of-fit test with the asymptotic confidence probability $1 - \alpha$:
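In code, the resulting decision rule is a one-liner once the statistic has been computed; the function below takes the already computed statistic as input (its construction follows the text and is not reproduced here) and uses SciPy's chi-squared quantile:

```python
from scipy.stats import chi2

def gof_decision(T2m, d, alpha=0.05):
    """Reject H0 iff the statistic exceeds the upper alpha-quantile chi^2_{d,alpha}.

    T2m is the already computed test statistic (a placeholder name here).
    """
    return T2m > chi2.isf(alpha, df=d)   # chi2.isf gives the upper quantile

# chi^2_{2, 0.05} is about 5.99, so:
print(gof_decision(1.0, d=2))    # False: do not reject H0
print(gof_decision(10.0, d=2))   # True: reject H0
```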
4 Power of the test
Consider a sequence of models
\[ H_{1,m}: \quad b_i = X_0^\top a_{0i} + \frac{1}{\sqrt{m}}\, g(a_{0i}) + \tilde b_i, \qquad a_i = a_{0i} + \tilde a_i, \qquad i = 1, \dots, m. \]
Here $g : \mathbb{R}^n \to \mathbb{R}^d$ is a given nonlinear perturbation of the linear regression function.
For an arbitrary function $f(a_0)$, denote the limit of averages
provided that the limit exists and is finite.
In order to study the behavior of the test statistic under the local alternatives $H_{1,m}$, we impose two restrictions on the perturbation function $g$:
Under the local alternatives $H_{1,m}$, the weak consistency and asymptotic normality of the TLS estimator $\hat X$ are ensured.
Now, we define the noncentral chi-squared distribution $\chi^2_d(\tau)$ with $d$ degrees of freedom and the noncentrality parameter $\tau$.
Definition 17.
For $d \ge 1$ and $\tau \ge 0$, let $\chi^2_d(\tau) \stackrel{\mathrm{d}}{=} \|N(\tau e, I_d)\|^2$, where $e \in \mathbb{R}^d$, $\|e\| = 1$, or, equivalently, $\chi^2_d(\tau) \stackrel{\mathrm{d}}{=} (\gamma_1 + \tau)^2 + \sum_{i=2}^d \gamma_i^2$, where $\{\gamma_i\}$ are i.i.d. standard normal random variables.
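Definition 17 parameterizes the distribution by $\tau$, whereas common software (e.g., SciPy's `ncx2`) uses the noncentrality $\lambda = \tau^2$, a frequent source of off-by-a-square mistakes. A short Monte Carlo check of the correspondence, with illustrative values of $d$ and $\tau$:

```python
import numpy as np
from scipy.stats import ncx2

rng = np.random.default_rng(1)
d, tau, n = 3, 2.0, 200_000   # illustrative values

# Sample (gamma_1 + tau)^2 + gamma_2^2 + ... + gamma_d^2 as in Definition 17.
gammas = rng.standard_normal((n, d))
samples = (gammas[:, 0] + tau) ** 2 + (gammas[:, 1:] ** 2).sum(axis=1)

# The mean matches SciPy's ncx2 with noncentrality nc = tau**2.
print(samples.mean())               # close to d + tau**2 = 7
print(ncx2.mean(df=d, nc=tau**2))   # 7.0
```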
Lemma 16 implies directly the following convergence.
Theorem 18 makes it possible to find the asymptotic power of the test under the local alternatives $H_{1,m}$. Evidently, the asymptotic power is an increasing function of $\tau = \|\Sigma_T^{-1/2} C_T\|$: the larger $\tau$, the more powerful the test.
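Under this asymptotic, the limiting power is $P\{\chi^2_d(\tau) > \chi^2_{d,\alpha}\}$, which is straightforward to tabulate numerically (illustrative $d$ and $\alpha$; recall that SciPy's noncentrality parameter is $\tau^2$):

```python
from scipy.stats import chi2, ncx2

d, alpha = 2, 0.05
critical = chi2.isf(alpha, df=d)   # upper quantile chi^2_{d,alpha}

# Asymptotic power P{chi^2_d(tau) > chi^2_{d,alpha}} grows with tau:
powers = {tau: ncx2.sf(critical, df=d, nc=tau**2) for tau in (0.5, 1.0, 2.0, 3.0)}
for tau, power in powers.items():
    print(f"tau = {tau:.1f}  power = {power:.3f}")
```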
5 Conclusion
We constructed a goodness-of-fit test for a multivariate linear errors-in-variables model, provided that the errors are uncorrelated with equal (unknown) variances and vanishing third moments. The latter moment assumption makes it possible to estimate consistently the asymptotic covariance matrix $\Sigma_T$ of the statistic $T_m^0$ and to construct the test statistic $T_m^2$, which has an asymptotic $\chi^2_d$ distribution under the null hypothesis. Local alternatives $H_{1,m}$ are presented under which the test statistic has an asymptotic noncentral $\chi^2_d(\tau)$ distribution; the larger $\tau$, the larger the asymptotic power of the test.
In the future, we will try to construct, as in [5], a more powerful test that uses an exponential weight function within the test statistic.
To this end, it is necessary to require the independence of the errors $\tilde b_i$ and $\tilde a_i$, as well as the existence of exponential moments of the errors $\tilde a_i$. This is the price for the greater power of the test.