1 Introduction
We consider a functional linear error-in-variables model. Let $\{{a_{i}^{0}},\hspace{0.2778em}i\ge 1\}$ be a sequence of unobserved nonrandom n-dimensional vectors. The elements of these vectors are the true explanatory variables or (in other terminology) the true regressors. We observe m n-dimensional random vectors ${a_{1}},\dots ,{a_{m}}$ and m d-dimensional random vectors ${b_{1}},\dots ,{b_{m}}$. They are the true vectors ${a_{i}^{0}}$ and ${X_{0}^{\top }}{a_{i}^{0}}$, respectively, plus additive errors:
(1)
\[ \left\{\begin{array}{l}{b_{i}}={X_{0}^{\top }}{a_{i}^{0}}+{\tilde{b}_{i}},\hspace{1em}\\{} {a_{i}}={a_{i}^{0}}+{\tilde{a}_{i}},\hspace{1em}\end{array}\right.\]
where ${\tilde{a}_{i}}$ and ${\tilde{b}_{i}}$ are random measurement errors in the regressor and in the response, respectively. The nonrandom matrix ${X_{0}}$ is estimated from the observations ${a_{i}}$, ${b_{i}}$, $i=1,\dots ,m$.
This problem is related to finding an approximate solution to an incompatible system of linear equations (an “overdetermined” system, because the number of equations exceeds the number of unknowns)
\[ AX\approx B,\]
where $A={[{a_{1}},\dots ,{a_{m}}]}^{\top }$ is an $m\times n$ matrix and $B={[{b_{1}},\dots ,{b_{m}}]}^{\top }$ is an $m\times d$ matrix. Here X is an unknown $n\times d$ matrix.
In the linear error-in-variables regression model (1), the Total Least Squares (TLS) estimator is widely used. It is a multivariate analogue of the orthogonal regression estimator. We are looking for conditions that provide consistency or strong consistency of the estimator. It is taken for granted that the measurement errors ${\tilde{c}_{i}}=(\begin{array}{c}{\tilde{a}_{i}}\\{} {\tilde{b}_{i}}\end{array})$, $i=1,2,\dots $, are independent and have the same covariance matrix Σ. The matrix Σ may be singular; in particular, some of the regressors may be observed without errors. (If the matrix Σ is nonsingular, the proofs can be simplified.) An intercept can be introduced into (1) by augmenting the model with a constant error-free regressor.
Sufficient conditions for consistency of the estimator are presented in Gleser [5], Gallo [4], Kukush and Van Huffel [10]. In [18], the consistency results are obtained under less restrictive conditions than in [10]. In particular, there is no requirement that
\[ \frac{{\lambda _{\min }^{2}}({A_{0}^{\top }}{A_{0}})}{{\lambda _{\max }}({A_{0}^{\top }}{A_{0}})}\to \infty \hspace{1em}\text{as}\hspace{1em}m\to \infty ,\]
where ${A_{0}}={[{a_{1}^{0}},\dots ,{a_{m}^{0}}]}^{\top }$ is the matrix A without measurement errors. Hereafter, ${\lambda _{\min }}$ and ${\lambda _{\max }}$ denote the minimum and maximum eigenvalues of a matrix whose eigenvalues are all real. The matrix ${A_{0}^{\top }}{A_{0}}$ is symmetric (and positive semidefinite); hence its eigenvalues are real (and nonnegative).
The model where some variables are explanatory and the others are responses is called explicit. The alternative is the implicit model, where all the variables are treated equally. In the implicit model, an n-dimensional linear subspace of ${\mathbb{R}}^{n+d}$ is fitted to an observed set of points. Some n-dimensional subspaces can be represented in the form $\{(a,b)\in {\mathbb{R}}^{n+d}:b={X}^{\top }a\}$ for some $n\times d$ matrix X; such subspaces are called generic. The other subspaces are called non-generic. The true points lie on a generic subspace $\{(a,b):b={X_{0}^{\top }}a\}$. A consistently estimated subspace must be generic with high probability. We state our results for the explicit model, but use the ideas of the implicit model in the definition of the estimator, as well as in the proofs.
We allow errors in different variables to be correlated. Our problem is a minor generalization of the mixed LS-TLS problem, which is studied in [20, Section 3.5]. In the latter problem, some explanatory variables are observed without errors; the other explanatory variables and all the response variables are observed with errors. The errors have the same variance and are uncorrelated. The basic LS model (where the explanatory variables are error-free, and the response variables are error-ridden) and the basic TLS model (where all the variables are observed with error, and the errors are uncorrelated) are marginal cases of the mixed LS-TLS problem. By a linear transformation of variables, our model can be transformed into either a mixed LS-TLS, a basic LS, or a basic TLS problem. (We do not handle the case where there are more error-free variables than explanatory variables.) Such a transformation does not always preserve the sets of generic and non-generic subspaces. The mixed LS-TLS problem can be transformed into the basic TLS problem, as shown in [6].
The Weighted TLS and Structured TLS estimators are generalizations of the TLS estimator for the cases where the error covariance matrices differ across observations or where the errors of different observations are dependent; more precisely, the independence condition is replaced with a condition on the “structure of the errors”. The consistency of these estimators is proved in Kukush and Van Huffel [10] and Kukush et al. [9]. Relaxing the consistency conditions for the Weighted TLS and Structured TLS estimators is an interesting topic for future research. For generalizations of the TLS problem, see the monograph [13] and the review [12].
In the present paper, for a multivariate regression model with multiple response variables we consider two versions of the TLS estimator. In these estimators, different norms of the weighted residual matrix are minimized. (These estimators coincide for the univariate regression model.) The common way to construct the estimator is to minimize the Frobenius norm. The estimator that minimizes the Frobenius norm also minimizes the spectral norm. Any estimator that minimizes the spectral norm is consistent under conditions of our consistency theorems (see Theorems 3.5–3.7 in Section 3). We also provide a sufficient condition for uniqueness of the estimator that minimizes the Frobenius norm.
In this paper, we provide complete and comprehensive proofs of the consistency results for the TLS estimator that were stated in [18], and we present all the necessary auxiliary and complementary results. For the reader's convenience, we first present a sketch of the proof; detailed proofs are postponed to the appendix. Moreover, the paper contains new results on the relation between the TLS estimator and the generalized eigenvalue problem.
The structure of the paper is as follows. In Section 2 we introduce the model and define the TLS estimator. The consistency theorems for different moment conditions on the errors and for different senses of consistency are stated in Section 3, and their proofs are sketched in Section 5. Section 4 states the existence and uniqueness of the TLS estimator. Auxiliary theoretical constructions and theorems are presented in Section 6. Section 7 explains the relationship between the TLS estimator and the generalized eigenvalue problem. The results in Section 7 are used in construction of the TLS estimator and in the proof of its uniqueness. Detailed proofs are moved to the appendix (Section 8).
Notations
First, we list the general notation. For a vector $v={({x_{k}})_{k=1}^{n}}$, $\| v\| =\sqrt{{\sum _{k=1}^{n}}{x_{k}^{2}}}$ is the 2-norm of v.
For an $m\times n$ matrix $M={({x_{i,j}})_{i=1,\hspace{0.1667em}j=1}^{m,\hspace{0.1667em}n}}$, $\| M\| ={\max _{v\ne 0}}\frac{\| Mv\| }{\| v\| }={\sigma _{\max }}(M)$ is the spectral norm of M; $\| M{\| _{F}}=\sqrt{{\sum _{i=1}^{m}}{\sum _{j=1}^{n}}{x_{i,j}^{2}}}$ is the Frobenius norm of M; ${\sigma _{\max }}(M)={\sigma _{1}}(M)\ge {\sigma _{2}}(M)\ge \cdots \ge {\sigma _{\min (m,n)}}(M)\ge 0$ are the singular values of M, arranged in descending order; $\operatorname{span}\langle M\rangle $ is the column space of M; $\operatorname{rk}M$ is the rank of M. For a square $n\times n$ matrix M, $\operatorname{def}M=n-\operatorname{rk}M$ is the rank deficiency of M; $\operatorname{tr}M={\sum _{i=1}^{n}}{x_{i,i}}$ is the trace of M; ${\chi _{M}}(\lambda )=\det (M-\lambda I)$ is the characteristic polynomial of M. If M is an $n\times n$ matrix with real eigenvalues (e.g., if M is Hermitian or if M admits a decomposition $M=AB$, where A and B are Hermitian matrices, and either A or B is positive semidefinite), then ${\lambda _{\min }}(M)={\lambda _{1}}(M)\le {\lambda _{2}}(M)\le \cdots \le {\lambda _{n}}(M)={\lambda _{\max }}(M)$ are the eigenvalues of M, arranged in ascending order.
For ${V_{1}}$ and ${V_{2}}$ being linear subspaces of ${\mathbb{R}}^{n}$ of equal dimension $\dim {V_{1}}=\dim {V_{2}}$, $\| \sin \angle ({V_{1}},{V_{2}})\| =\| {P_{{V_{1}}}}-{P_{{V_{2}}}}\| =\| {P_{{V_{1}}}}(I-{P_{{V_{2}}}})\| $ is the greatest sine of the canonical angles between ${V_{1}}$ and ${V_{2}}$. See Section 6.2 for more general definitions.
Now we list the model-specific notations. The notations (except for the matrix Σ) come from [9]. The notations are listed here only for reference; they are introduced elsewhere in this paper – in Sections 1 and 2.
n is the number of regressors, i.e., the number of explanatory variables for each observation; d is the number of response variables for each observation; m is the number of observations, i.e., the sample size.
While m tends to ∞ in the consistency theorems, all matrices in this list except Σ, ${X_{0}}$, and ${X_{\mathrm{ext}}^{0}}$ implicitly depend on m. For example, in the formulas “${\lim _{m\to \infty }}{\lambda _{\min }}({A_{0}^{\top }}{A_{0}})=+\infty $” and “$\widehat{X}\to {X_{0}}$ almost surely” the matrices ${A_{0}}$ and $\widehat{X}$ depend on m.
${C_{0}}=[{A_{0}}\hspace{2.5pt}{B_{0}}]$
is the matrix of true variables. It is an $m\times (n+d)$ nonrandom matrix. The left-hand block ${A_{0}}$ of size $m\times n$ consists of the true explanatory variables, and the right-hand block ${B_{0}}$ of size $m\times d$ consists of the true response variables.
$\widetilde{C}=[\tilde{A}\hspace{2.5pt}\widetilde{B}]$
is the matrix of errors. It is an $m\times (n+d)$ random matrix.
$C=[A\hspace{2.5pt}B]={C_{0}}+\widetilde{C}$
is the matrix of observations. It is an $m\times (n+d)$ random matrix.
Σ
is a covariance matrix of errors for one observation. For every i, it is assumed that $\mathbb{E}{\tilde{c}_{i}}=0$ and $\mathbb{E}{\tilde{c}_{i}}{\tilde{c}_{i}^{\top }}=\varSigma $. The matrix Σ is symmetric, positive semidefinite, nonrandom, and of size $(n+d)\times (n+d)$. It is assumed known when we construct the TLS estimator.
${X_{0}}$
is the matrix of true regression parameters. It is a nonrandom $n\times d$ matrix and is a parameter of interest.
${X_{\mathrm{ext}}^{0}}=\left(\genfrac{}{}{0.0pt}{}{{X_{0}}}{-I}\right)$
is an augmented matrix of regression coefficients. It is a nonrandom $(n+d)\times d$ matrix.
$\widehat{X}$
is the TLS estimator of the matrix ${X_{0}}$.
${\widehat{X}_{\mathrm{ext}}}$
is a matrix whose column space $\operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle $ is considered an estimator of the subspace $\operatorname{span}\langle {X_{\mathrm{ext}}^{0}}\rangle $. The matrix ${\widehat{X}_{\mathrm{ext}}}$ is of size $(n+d)\times d$. For fixed m and Σ, ${\widehat{X}_{\mathrm{ext}}}$ is a Borel measurable function of the matrix C.
2 The model and the estimator
2.1 Statistical model
It is assumed that the matrices ${A_{0}}$ and ${B_{0}}$ satisfy the relation
(2)
\[ \underset{m\times n}{{A_{0}}}\cdot \underset{n\times d}{{X_{0}}}=\underset{m\times d}{{B_{0}}}.\]
They are observed with measurement errors $\tilde{A}$ and $\widetilde{B}$, that is,
\[ A={A_{0}}+\tilde{A},\hspace{2em}B={B_{0}}+\widetilde{B}.\]
The matrix ${X_{0}}$ is a parameter of interest.
Rewrite the relation in an implicit form. Let the $m\times (n+d)$ block matrices ${C_{0}},\widetilde{C},C\in {\mathbb{R}}^{m\times (n+d)}$ be constructed by binding the “respective versions” of the matrices A and B:
\[ {C_{0}}=[{A_{0}}\hspace{2.5pt}{B_{0}}],\hspace{2em}\widetilde{C}=[\tilde{A}\hspace{2.5pt}\widetilde{B}],\hspace{2em}C=[A\hspace{2.5pt}B].\]
Denote ${X_{\mathrm{ext}}^{0}}=(\begin{array}{c}{X_{0}}\\{} -{I_{d}}\end{array})$. Then
\[ {C_{0}}{X_{\mathrm{ext}}^{0}}={A_{0}}{X_{0}}-{B_{0}}=0.\]
The entries of the matrix $\widetilde{C}$ are denoted ${\delta _{ij}}$; the rows are ${\tilde{c}_{i}}$:
\[ \widetilde{C}={({\delta _{ij}})_{i=1,\hspace{0.1667em}j=1}^{m,\hspace{0.1667em}n+d}}={[{\tilde{c}_{1}},\dots ,{\tilde{c}_{m}}]}^{\top }.\]
Throughout the paper the following three conditions are assumed to be true:
(4)
\[\begin{aligned}{}& \text{The rows}\hspace{2.5pt}{\tilde{c}_{i}}\hspace{2.5pt}\text{of the matrix}\hspace{2.5pt}\widetilde{C}\hspace{2.5pt}\text{are mutually independent random vectors.}\end{aligned}\](5)
\[\begin{aligned}{}& \mathbb{E}\widetilde{C}=0\text{, and}\hspace{2.5pt}\mathbb{E}{\tilde{c}_{i}^{}}{\tilde{c}_{i}^{\top }}:={(\mathbb{E}{\delta _{ij}}{\delta _{ik}})_{j=1,\hspace{0.1667em}\hspace{0.1667em}k=1}^{n+d\hspace{0.1667em}\hspace{0.1667em}n+d}}=\varSigma \hspace{2.5pt}\text{for all}\hspace{2.5pt}i=1,\dots ,m\text{.}\end{aligned}\]
(6)
\[\begin{aligned}{}& \operatorname{rk}\big(\varSigma {X_{\mathrm{ext}}^{0}}\big)=d,\hspace{2.5pt}\text{i.e., the columns of the matrix}\hspace{2.5pt}\varSigma {X_{\mathrm{ext}}^{0}}\hspace{2.5pt}\text{are linearly independent.}\end{aligned}\]
Example 2.1 (simple univariate linear regression with intercept).
For $i=1,\dots ,m$
\[ \left\{\begin{array}{l}{x_{i}}={\xi _{i}}+{\delta _{i}};\hspace{1em}\\{} {y_{i}}={\beta _{0}}+{\beta _{1}}{\xi _{i}}+{\varepsilon _{i}},\hspace{1em}\end{array}\right.\]
where the measurement errors ${\delta _{i}}$, ${\varepsilon _{i}}$, $i=1,\dots ,m$, – all the $2m$ variables – are uncorrelated, $\mathbb{E}{\delta _{i}}=0$, $\mathbb{E}{\delta _{i}^{2}}={\sigma _{\delta }^{2}}$, $\mathbb{E}{\varepsilon _{i}}=0$, and $\mathbb{E}{\varepsilon _{i}^{2}}={\sigma _{\varepsilon }^{2}}$. A sequence $\{({x_{i}},{y_{i}}),\hspace{2.5pt}i=1,\dots ,m\}$ is observed. The parameters ${\beta _{0}}$ and ${\beta _{1}}$ are to be estimated.
This example is taken from [1, Section 1.1], but the notation in Example 2.1 and elsewhere in the paper is different. In our notation, ${a_{i}^{0}}={(1,{\xi _{i}})}^{\top }$, ${b_{i}^{0}}={\beta _{0}}+{\beta _{1}}{\xi _{i}}$, ${a_{i}}={(1,{x_{i}})}^{\top }$, ${b_{i}}={y_{i}}$, ${\delta _{i,1}}=0$, ${\delta _{i,2}}={\delta _{i}}$, ${\delta _{i,3}}={\varepsilon _{i}}$, $\varSigma =\operatorname{diag}(0,{\sigma _{\delta }^{2}},{\sigma _{\varepsilon }^{2}})$, and ${X_{0}}={({\beta _{0}},{\beta _{1}})}^{\top }$.
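As a numerical illustration (added here, not part of the original text), the following Python sketch builds the matrices A, B, C and the error covariance Σ of the general model from simulated data of Example 2.1; the sample size m, the true parameters and the error variances are arbitrary choices of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200
beta0, beta1 = 1.0, 2.0
sigma_delta, sigma_eps = 0.3, 0.4

xi = rng.uniform(-2.0, 2.0, size=m)                           # true regressor values
x = xi + sigma_delta * rng.standard_normal(m)                 # observed regressor
y = beta0 + beta1 * xi + sigma_eps * rng.standard_normal(m)   # observed response

# Mapping to the general model: a_i = (1, x_i)', b_i = y_i, n = 2, d = 1
A = np.column_stack([np.ones(m), x])                  # m x n matrix of observed regressors
B = y.reshape(m, 1)                                   # m x d matrix of observed responses
C = np.hstack([A, B])                                 # m x (n + d) matrix of observations
Sigma = np.diag([0.0, sigma_delta**2, sigma_eps**2])  # singular error covariance
X0 = np.array([[beta0], [beta1]])                     # true parameter of interest
```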
Remark 2.1.
For some matrices Σ, condition (6) is satisfied for any $n\times d$ matrix ${X_{0}}$. If the matrix Σ is nonsingular, then condition (6) is satisfied. If the errors in the explanatory variables and in the response are uncorrelated, i.e., if the matrix Σ has the block-diagonal form
\[ \varSigma =\left(\begin{array}{c@{\hskip10.0pt}c}{\varSigma _{aa}}& 0\\{} 0& {\varSigma _{bb}}\end{array}\right)\]
(where ${\varSigma _{aa}}=\mathbb{E}{\tilde{a}_{i}}{\tilde{a}_{i}^{\top }}$ and ${\varSigma _{bb}}=\mathbb{E}{\tilde{b}_{i}}{\tilde{b}_{i}^{\top }}$) with a nonsingular matrix ${\varSigma _{bb}}$, then condition (6) is satisfied. For example, in the basic mixed LS-TLS problem Σ is diagonal and ${\varSigma _{bb}}$ is nonsingular, so (6) holds true. If the null-space of the matrix Σ (which equals $\operatorname{span}{\langle \varSigma \rangle }^{\perp }$ because Σ is symmetric) lies inside the subspace spanned by the first n (of $n+d$) standard basis vectors, then condition (6) is also satisfied. On the other hand, if $\operatorname{rk}\varSigma <d$, then condition (6) is not satisfied.
2.2 Total least squares (TLS) estimator
First, find the $m\times (n+d)$ matrix Δ for which the constrained minimum is attained:
(7)
\[ \left\{\begin{array}{l}\| \Delta \hspace{0.1667em}{({\varSigma }^{1/2})}^{\dagger }{\| _{F}}\to \min ;\hspace{1em}\\{} \Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0;\hspace{1em}\\{} \operatorname{rk}(C-\Delta )\le n.\hspace{1em}\end{array}\right.\]
Hereafter ${\varSigma }^{\dagger }$ is the Moore–Penrose pseudoinverse of the matrix Σ, and ${P_{\varSigma }}=\varSigma {\varSigma }^{\dagger }$ is the orthogonal projector onto the column space of Σ.
Now, show that the minimum in (7) is attained. The constraint $\operatorname{rk}(C-\Delta )\le n$ is satisfied if and only if all the minors of $C-\Delta $ of order $n+1$ vanish. Thus the set of all Δ that satisfy the constraints (the constraint set) is defined by $\frac{m!(n+d)!}{(n+1){!}^{2}(m-n-1)!(d-1)!}+1$ algebraic equations, and so it is closed. The constraint set is nonempty almost surely because it contains $\widetilde{C}$. The functional $\| \Delta \hspace{0.1667em}{({\varSigma }^{1/2})}^{\dagger }{\| _{F}}$ is a pseudonorm on ${\mathbb{R}}^{m\times (n+d)}$, but it is a norm on the linear subspace $\{\Delta :\Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0\}$, where it induces the natural subspace topology. The constraint set is closed in this subspace (equipped with the norm), and whenever it is nonempty (i.e., almost surely), it contains an element of minimal norm.
Notice that under condition (6) the constraint set is always non-empty, not just almost surely. This follows from Proposition 7.9.
For the matrix Δ that is a solution to minimization problem (7), consider the row space $\operatorname{span}\langle {(C-\Delta )}^{\top }\rangle $ of the matrix $C-\Delta $. Its dimension does not exceed n. An orthogonal basis of it can be completed to an orthogonal basis of ${\mathbb{R}}^{n+d}$, and the complement consists of $n+d-\operatorname{rk}(C-\Delta )\ge d$ vectors. Choose d vectors from the complement (they are linearly independent) and bind them (as column vectors) into an $(n+d)\times d$ matrix ${\widehat{X}_{\mathrm{ext}}}$. The matrix ${\widehat{X}_{\mathrm{ext}}}$ satisfies the equation
(8)
\[ (C-\Delta )\hspace{0.1667em}{\widehat{X}_{\mathrm{ext}}}=0.\]
If the lower $d\times d$ block of the matrix ${\widehat{X}_{\mathrm{ext}}}$ is nonsingular, then by a linear transformation of columns (i.e., by right-multiplying by some nonsingular matrix) the matrix ${\widehat{X}_{\mathrm{ext}}}$ can be transformed to the form
\[ \left(\begin{array}{c}\widehat{X}\\{} -I\end{array}\right),\]
where I is the $d\times d$ identity matrix. The matrix $\widehat{X}$ satisfies the equation
(9)
\[ (C-\Delta )\left(\begin{array}{c}\widehat{X}\\{} -I\end{array}\right)=0.\]
(Otherwise, if the lower block of the matrix ${\widehat{X}_{\mathrm{ext}}}$ is singular, then our estimation fails. Note that whether the lower block of the matrix ${\widehat{X}_{\mathrm{ext}}}$ is singular might depend not only on the observations C, but also on the choice of the matrix Δ where the minimum in (7) is attained and of the d vectors that form the matrix ${\widehat{X}_{\mathrm{ext}}}$. We will show that the lower block of the matrix ${\widehat{X}_{\mathrm{ext}}}$ is nonsingular with high probability regardless of the choice of Δ and ${\widehat{X}_{\mathrm{ext}}}$.)
The columns of the matrix ${\widehat{X}_{\mathrm{ext}}}$ should span the eigenspace (generalized invariant subspace) of the matrix pencil $\langle {C}^{\top }C,\varSigma \rangle $ which corresponds to the d smallest generalized eigenvalues; in particular, the columns of the matrix ${\widehat{X}_{\mathrm{ext}}}$ span a generalized invariant subspace corresponding to finite generalized eigenvalues (see Section 7 for the matrix formulation).
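To make the two-step construction concrete, here is a minimal numerical sketch (not taken from the paper). It assumes a nonsingular error covariance Σ, so that the pencil $\langle {C}^{\top }C,\varSigma \rangle $ can be handled by a standard generalized symmetric eigensolver, and it returns None when the lower block of ${\widehat{X}_{\mathrm{ext}}}$ is (numerically) singular.

```python
import numpy as np
from scipy.linalg import eigh

def tls_estimate(C, n, d, Sigma):
    """Sketch of the TLS estimator for a nonsingular Sigma (a simplifying
    assumption; the paper also covers singular Sigma).

    C     : (m, n + d) matrix of observations [A B]
    Sigma : (n + d, n + d) error covariance of one row of C
    Returns the (n, d) estimate of X_0, or None if estimation fails.
    """
    # Generalized symmetric eigenproblem C'C v = lambda * Sigma v;
    # eigh returns the eigenvalues in ascending order.
    _, V = eigh(C.T @ C, Sigma)
    X_ext = V[:, :d]                 # eigenvectors of the d smallest eigenvalues
    lower = X_ext[n:, :]             # lower d x d block of X_ext
    if np.linalg.cond(lower) > 1e12:
        return None                  # lower block numerically singular: estimation fails
    # Right-multiply by a nonsingular matrix so that the lower block becomes -I, cf. (9)
    X_ext = X_ext @ (-np.linalg.inv(lower))
    return X_ext[:n, :]              # the estimate of X_0
```

Note that in Example 2.1 the matrix Σ is singular (the intercept column is error-free), so this simplified sketch is not directly applicable there.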
Possible problems that may arise in the course of solving the minimization problem (7) are discussed in [18]. We should mention that our two-step definition $\text{(7)}$ & $\text{(9)}$ of the TLS estimator is slightly different from the conventional definition in [20, Sections 2.3.2 and 3.2] or in [10]. In these papers, the problem from which the estimator $\widehat{X}$ is found is equivalent to the following:
(10)
\[ \left\{\begin{array}{l}\| \Delta \hspace{0.1667em}{({\varSigma }^{1/2})}^{\dagger }{\| _{F}}\to \min ;\hspace{1em}\\{} \Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0;\hspace{1em}\\{} (C-\Delta )\left(\begin{array}{c}\widehat{X}\\{} -I\end{array}\right)=0,\hspace{1em}\end{array}\right.\]
where the optimization is performed over Δ and $\widehat{X}$ that satisfy the constraints in (10). If our estimation defined with (7) and (9) succeeds, then the minimum values in (7) and (10) coincide, and the minimum in (10) is attained for $(\Delta ,\widehat{X})$ that is the solution to (7) & (9). Conversely, if our estimation succeeds for at least one choice of Δ and ${\widehat{X}_{\mathrm{ext}}}$, then all the solutions to (10) can be obtained with different choices of Δ and ${\widehat{X}_{\mathrm{ext}}}$. However, strange things may happen if our estimation always fails.
Besides (7), consider the optimization problem
(11)
\[ \left\{\begin{array}{l}{\lambda _{\max }}(\Delta {\varSigma }^{\dagger }{\Delta }^{\top })\to \min ;\hspace{1em}\\{} \Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0;\hspace{1em}\\{} \operatorname{rk}(C-\Delta )\le n.\hspace{1em}\end{array}\right.\]
It will be shown that every Δ that minimizes (7) also minimizes (11).
We can construct an optimization problem that generalizes both (7) and (11). Let $\| M{\| _{\mathrm{U}}}$ be a unitarily invariant norm on $m\times (n+d)$ matrices. Consider the optimization problem
(12)
\[ \left\{\begin{array}{l}\| \Delta \hspace{0.1667em}{({\varSigma }^{1/2})}^{\dagger }{\| _{\mathrm{U}}}\to \min ;\hspace{1em}\\{} \Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0;\hspace{1em}\\{} \operatorname{rk}(C-\Delta )\le n.\hspace{1em}\end{array}\right.\]
Then every Δ that minimizes (7) also minimizes (12), and every Δ that minimizes (12) also minimizes (11). If $\| M{\| _{\mathrm{U}}}$ is the Frobenius norm, then optimization problems (7) and (12) coincide, and if $\| M{\| _{\mathrm{U}}}$ is the spectral norm, then optimization problems (11) and (12) coincide.
3 Known consistency results
In this section we briefly review known consistency results. One of the conditions for the consistency of the TLS estimator is the convergence of $\frac{1}{m}{A_{0}^{\top }}{A_{0}}$ to a nonsingular matrix. It is required, for example, in [5]. The condition is relaxed in the paper by Gallo [4].
Theorem 3.1 (Gallo [4], Theorem 2).
Let $d=1$,
\[\begin{aligned}{}{m}^{-1/2}{\lambda _{\min }}\big({A_{0}^{\top }}{A_{0}}\big)& \to \infty \hspace{1em}\textit{as}\hspace{1em}m\to \infty ,\\{} \frac{{\lambda _{\min }^{2}}({A_{0}^{\top }}{A_{0}})}{{\lambda _{\max }}({A_{0}^{\top }}{A_{0}})}& \to \infty \hspace{1em}\textit{as}\hspace{1em}m\to \infty ,\end{aligned}\]
and the measurement errors ${\tilde{c}_{i}}$ are identically distributed, with finite fourth moment $\mathbb{E}\| {\tilde{c}_{i}}{\| }^{4}<\infty $. Then $\widehat{X}\stackrel{\mathrm{P}}{\longrightarrow }{X_{0}}$, $m\to \infty $.
The theorem can be generalized for the multivariate regression. The condition that the errors on different observations have the same distribution can be dropped. Instead, Kukush and Van Huffel [10] assume that the fourth moments of the error distributions are bounded.
Theorem 3.2 (Kukush and Van Huffel [10], Theorem 4a).
Let
\[\begin{aligned}{}\underset{\begin{array}{c}i\ge 1\\{} j=1,\dots ,n+d\end{array}}{\sup }\mathbb{E}|{\delta _{ij}}{|}^{4}& <\infty ,\\{} {m}^{-1/2}{\lambda _{\min }}\big({A_{0}^{\top }}{A_{0}}\big)& \to \infty \hspace{1em}\textit{as}\hspace{1em}m\to \infty ,\\{} \frac{{\lambda _{\min }^{2}}({A_{0}^{\top }}{A_{0}})}{{\lambda _{\max }}({A_{0}^{\top }}{A_{0}})}& \to \infty \hspace{1em}\textit{as}\hspace{1em}m\to \infty .\end{aligned}\]
Then $\widehat{X}\stackrel{\mathrm{P}}{\longrightarrow }{X_{0}}$ as $m\to \infty $.
Here is the strong consistency theorem:
Theorem 3.3 (Kukush and Van Huffel [10], Theorem 4b).
Let for some $r\ge 2$ and ${m_{0}}\ge 1$,
\[\begin{aligned}{}\underset{\begin{array}{c}i\ge 1\\{} j=1,\dots ,n+d\end{array}}{\sup }\mathbb{E}|{\delta _{ij}}{|}^{2r}& <\infty ,\\{} {\sum \limits_{m={m_{0}}}^{\infty }}{\bigg(\frac{\sqrt{m}}{{\lambda _{\min }}({A_{0}^{\top }}{A_{0}})}\bigg)}^{r}& <\infty ,\\{} {\sum \limits_{m={m_{0}}}^{\infty }}{\bigg(\frac{{\lambda _{\max }}({A_{0}^{\top }}{A_{0}})}{{\lambda _{\min }^{2}}({A_{0}^{\top }}{A_{0}})}\bigg)}^{r}& <\infty .\end{aligned}\]
Then $\widehat{X}\to {X_{0}}$ as $m\to \infty $, almost surely.
In the following consistency theorem the moment condition imposed on the errors is relaxed.
Theorem 3.4 (Kukush and Van Huffel [10], Theorem 5b).
Let for some r, $1\le r<2$,
\[\begin{aligned}{}\underset{\begin{array}{c}i\ge 1\\{} j=1,\dots ,n+d\end{array}}{\sup }\mathbb{E}|{\delta _{ij}}{|}^{2r}& <\infty ,\\{} {m}^{-1/r}{\lambda _{\min }}\big({A_{0}^{\top }}{A_{0}}\big)& \to \infty \hspace{1em}\textit{as}\hspace{1em}m\to \infty ,\\{} \frac{{\lambda _{\min }^{2}}({A_{0}^{\top }}{A_{0}})}{{\lambda _{\max }}({A_{0}^{\top }}{A_{0}})}& \to \infty \hspace{1em}\textit{as}\hspace{1em}m\to \infty .\end{aligned}\]
Then $\widehat{X}\stackrel{\mathrm{P}}{\longrightarrow }{X_{0}}$ as $m\to \infty $.
Generalizations of Theorems 3.2, 3.3, and 3.4 are obtained in [18]. An essential improvement is achieved. Namely, it is not required that ${\lambda _{\min }^{-2}}({A_{0}^{\top }}{A_{0}}){\lambda _{\max }}({A_{0}^{\top }}{A_{0}})$ converge to 0.
Theorem 3.5 (Shklyar [18], Theorem 4.1, generalization of Theorems 3.2 and 3.4).
Let for some r, $1\le r\le 2$,
\[\begin{aligned}{}\underset{\begin{array}{c}i\ge 1\\{} j=1,\dots ,n+d\end{array}}{\sup }\mathbb{E}|{\delta _{ij}}{|}^{2r}& <\infty ,\\{} {m}^{-1/r}{\lambda _{\min }}\big({A_{0}^{\top }}{A_{0}}\big)& \to \infty \hspace{1em}\textit{as}\hspace{1em}m\to \infty .\end{aligned}\]
Then $\widehat{X}\stackrel{\mathrm{P}}{\longrightarrow }{X_{0}}$ as $m\to \infty $.
Theorem 3.6 (Shklyar [18], Theorem 4.2, generalization of Theorem 3.3).
Let for some $r\ge 2$ and ${m_{0}}\ge 1$,
\[\begin{aligned}{}\underset{\begin{array}{c}i\ge 1\\{} j=1,\dots ,n+d\end{array}}{\sup }\mathbb{E}|{\delta _{ij}}{|}^{2r}& <\infty ,\\{} {\sum \limits_{m={m_{0}}}^{\infty }}{\bigg(\frac{\sqrt{m}}{{\lambda _{\min }}({A_{0}^{\top }}{A_{0}})}\bigg)}^{r}& <\infty .\end{aligned}\]
Then $\widehat{X}\to {X_{0}}$ as $m\to \infty $, almost surely.
In the next theorem strong consistency is obtained for $r<2$.
Theorem 3.7 (Shklyar [18], Theorem 4.3).
Let for some r ($1\le r\le 2$) and ${m_{0}}\ge 1$,
\[ \underset{\begin{array}{c}i\ge 1\\{} j=1,\dots ,n+d\end{array}}{\sup }\mathbb{E}|{\delta _{ij}}{|}^{2r}<\infty ,\hspace{2em}{\sum \limits_{m={m_{0}}}^{\infty }}\frac{1}{{\lambda _{\min }^{r}}({A_{0}^{\top }}{A_{0}})}<\infty .\]
Then $\widehat{X}\to {X_{0}}$ as $m\to \infty $, almost surely.
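The following small simulation is an illustration added here (not part of the paper). Its setting is consistent with the conditions of Theorem 3.5 with $r=2$: the errors are i.i.d. Gaussian with a nonsingular Σ (a simplifying assumption of the sketch), and ${\lambda _{\min }}({A_{0}^{\top }}{A_{0}})$ grows proportionally to m, so the estimation error of $\widehat{X}$ should shrink as m grows.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, d = 2, 1
X0 = np.array([[1.5], [-0.7]])                    # true n x d parameter
Sigma = np.diag([0.3, 0.3, 0.5])                  # nonsingular error covariance (assumption)

for m in (10**2, 10**3, 10**4):
    A0 = rng.uniform(-3.0, 3.0, size=(m, n))      # true regressors; lambda_min(A0'A0) grows like m
    C0 = np.hstack([A0, A0 @ X0])                 # true matrix [A0 B0]
    C = C0 + rng.multivariate_normal(np.zeros(n + d), Sigma, size=m)
    _, V = eigh(C.T @ C, Sigma)                   # generalized eigenproblem of the pencil
    X_ext = V[:, :d]
    X_hat = X_ext[:n, :] @ (-np.linalg.inv(X_ext[n:, :]))
    print(m, np.linalg.norm(X_hat - X0))          # the error should decrease with m
```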
4 Existence and uniqueness of the estimator
When we speak of a sequence $\{{A_{m}},\hspace{0.2778em}m\ge 1\}$ of random events parametrized by the sample size m, we say that a random event occurs with high probability if the probability of the event tends to 1 as $m\to \infty $, and we say that a random event occurs eventually if almost surely there exists ${m_{0}}$ such that the random event occurs whenever $m>{m_{0}}$, that is, $\mathbb{P}(\underset{m\to \infty }{\liminf }{A_{m}})=1$. (In this definition, ${A_{m}}$ are random events. Elsewhere in this paper, ${A_{m}}$ are matrices.)
Theorem 4.1.
Theorem 4.2.
2. Under the conditions of Theorem 3.5, the following random event occurs with high probability: for any Δ that is a solution to (11), equation (9) has a solution $\widehat{X}$. (Equation (9) might have multiple solutions.) The solution is a consistent estimator of ${X_{0}}$, i.e., $\widehat{X}\to {X_{0}}$ in probability.
5 Sketch of the proof of Theorems 3.5–3.7
Denote
\[ N={C_{0}^{\top }}{C_{0}}+{\lambda _{\min }}\big({A_{0}^{\top }}{A_{0}}\big)I.\]
Under the conditions of any of the consistency theorems in Section 3, ${\lambda _{\min }}({A_{0}^{\top }}{A_{0}})\to \infty $. Hence the matrix N is nonsingular for m large enough. The matrix N is used as the denominator in the law of large numbers. Also, it is used for rescaling the problem: the condition number of ${N}^{-1/2}{C_{0}^{\top }}{C_{0}^{}}{N}^{-1/2}$ equals 2 at most.
The proofs of the consistency theorems differ from one another, but they have the same structure and common parts. First, the law of large numbers
(13)
\[ \big\| {N}^{-1/2}\big({C}^{\top }C-{C_{0}^{\top }}{C_{0}}-m\varSigma \big){N}^{-1/2}\big\| \to 0\]
holds either in probability or almost surely, depending on the theorem being proved. The proof of this convergence varies for different theorems.
The inequalities (54) and (57) imply that whenever convergence (13) occurs, the sine of the angle between the vectors ${\widehat{X}_{\mathrm{ext}}}$ and ${X_{\mathrm{ext}}^{0}}$ (in the univariate regression) or the largest of the sines of the canonical angles between the column spans of the matrices ${\widehat{X}_{\mathrm{ext}}}$ and ${X_{\mathrm{ext}}^{0}}$ tends to 0 as the sample size m increases:
(14)
\[ \big\| \sin \angle ({\widehat{X}_{\mathrm{ext}}},{X_{\mathrm{ext}}^{0}})\big\| \le \big\| \sin \angle \big({N}^{1/2}{\widehat{X}_{\mathrm{ext}}},{N}^{1/2}{X_{\mathrm{ext}}^{0}}\big)\big\| \to 0.\]
To prove (14), we use some algebra, the fact that ${X_{\mathrm{ext}}^{0}}$ (in the univariate model) or the columns of ${X_{\mathrm{ext}}^{0}}$ (in the multivariate model) are the minimum-eigenvalue eigenvectors of the matrix N (see ineq. (52)), and eigenvector perturbation theorems – Lemma 6.5 or Lemma 6.6. Then, by Theorem 8.3, we conclude that $\widehat{X}\to {X_{0}}$ (in probability or almost surely, respectively).
6 Relevant classical results
We use some classical results. However, we state them in a form convenient for our study and provide the proof for some of them.
6.1 Generalized eigenvectors and eigenvalues
In this paper we deal with real matrices. Most theorems in this section can be generalized for matrices with complex entries by requiring that matrices be Hermitian rather than symmetric, and by complex conjugating where it is necessary.
Theorem 6.1 (Simultaneous diagonalization of a definite matrix pair).
Let A and B be $n\times n$ symmetric matrices such that for some α and β the matrix $\alpha A+\beta B$ is positive definite. Then there exist a nonsingular matrix T and diagonal matrices Λ and M such that
\[ A={\big({T}^{-1}\big)}^{\top }\varLambda {T}^{-1},\hspace{2em}B={\big({T}^{-1}\big)}^{\top }\mathrm{M}{T}^{-1}.\]
Write the decomposition as $T=[{u_{1}},{u_{2}},\dots ,{u_{n}}]$, $\varLambda =\operatorname{diag}({\lambda _{1}},\dots ,{\lambda _{n}})$, $\mathrm{M}=\operatorname{diag}({\mu _{1}},\dots ,{\mu _{n}})$. Then the numbers ${\lambda _{i}}/{\mu _{i}}\in \mathbb{R}\cup \{\infty \}$ are called generalized eigenvalues, and the columns ${u_{i}}$ of the matrix T are called the right generalized eigenvectors of the matrix pencil $\langle A,B\rangle $, because the following relation holds true:
\[ {\mu _{i}}A{u_{i}}={\lambda _{i}}B{u_{i}},\hspace{1em}i=1,\dots ,n.\]
Theorem 6.1 is well known; see Theorem IV.3.5 in [19, page 318]. The conditions of Theorem 6.1 can be changed as follows:
Theorem 6.2.
Let A and B be $n\times n$ symmetric positive semidefinite matrices. Then there exist a nonsingular matrix T and diagonal matrices Λ and M with nonnegative diagonal entries such that
(16)
\[ A={\big({T}^{-1}\big)}^{\top }\varLambda {T}^{-1},\hspace{2em}B={\big({T}^{-1}\big)}^{\top }\mathrm{M}{T}^{-1}.\]
In Theorem 6.1, ${\lambda _{i}}$ and ${\mu _{i}}$ cannot both be equal to 0 for the same i, while in Theorem 6.2 they can. On the other hand, in Theorem 6.1 ${\lambda _{i}}$ and ${\mu _{i}}$ can be any real numbers, while in Theorem 6.2 ${\lambda _{i}}\ge 0$ and ${\mu _{i}}\ge 0$. Theorem 6.2 is proved in [15].
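The following sketch (added for illustration) checks the decomposition of Theorem 6.1 numerically in the special case $\alpha =0$, $\beta =1$, i.e., for a positive definite B, using a standard generalized symmetric eigensolver.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)); A = A + A.T                  # symmetric
B = rng.standard_normal((n, n)); B = B @ B.T + n * np.eye(n)  # positive definite

# eigh(A, B) returns w, T with A T = B T diag(w) and T' B T = I, so
# A = (T^{-1})' diag(w) T^{-1} and B = (T^{-1})' T^{-1}: Lambda = diag(w), M = I.
w, T = eigh(A, B)
Tinv = np.linalg.inv(T)
assert np.allclose(A, Tinv.T @ np.diag(w) @ Tinv)
assert np.allclose(B, Tinv.T @ Tinv)
```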
Remark 6.2-1.
If the matrices A and B are symmetric and positive semidefinite, then
(17)
\[ \operatorname{rk}\langle A,B\rangle =\operatorname{rk}(A+B),\]
where
\[ \operatorname{rk}\langle A,B\rangle =\underset{k\in \mathbb{R}}{\max }\operatorname{rk}(A+kB)\]
is the determinantal rank of the matrix pencil $\langle A,B\rangle $. (For square $n\times n$ matrices A and B, the determinantal rank characterizes whether the matrix pencil is regular or singular. The matrix pencil $\langle A,B\rangle $ is regular if $\operatorname{rk}\langle A,B\rangle =n$, and singular if $\operatorname{rk}\langle A,B\rangle <n$.)
The inequality $\operatorname{rk}\langle A,B\rangle \ge \operatorname{rk}(A+B)$ follows from the definition of the determinantal rank. For all $k\in \mathbb{R}$ and for all such vectors x that $(A+B)x=0$ we have ${x}^{\top }Ax+{x}^{\top }Bx=0$, and because of positive semidefiniteness of matrices A and B, ${x}^{\top }Ax\ge 0$ and ${x}^{\top }Bx\ge 0$. Thus, ${x}^{\top }Ax={x}^{\top }Bx=0$. Again, due to positive semidefiniteness of A and B, $Ax=Bx=0$ and $(A+kB)x=0$. Thus, for all $k\in \mathbb{R}$
\[\begin{aligned}{}\big\{x:(A+B)x=0\big\}& \subset \big\{x:(A+kB)x=0\big\},\\{} \operatorname{rk}(A+B)& \ge \operatorname{rk}(A+kB),\\{} \operatorname{rk}\langle A,B\rangle =\underset{k}{\max }\operatorname{rk}(A+kB)& \le \operatorname{rk}(A+B),\end{aligned}\]
and (17) is proved.
Remark 6.2-2.
Let A and B be positive semidefinite matrices of the same size such that $\operatorname{rk}(A+B)=\operatorname{rk}(B)$. The representation (16) might be not unique. But there exists a representation (16) such that
\[\begin{aligned}{}{\lambda _{i}}& ={\mu _{i}}=0\hspace{1em}\text{if}\hspace{1em}i=1,\dots ,\operatorname{def}(B),\\{} {\mu _{i}}& >0\hspace{1em}\text{if}\hspace{1em}i=\operatorname{def}(B)+1,\dots ,n,\\{} T& =\big[\hspace{-0.1667em}\underset{n\times \operatorname{def}(B)}{{T_{1}}}\hspace{0.1667em}\hspace{0.1667em}\underset{n\times \operatorname{rk}(B)}{{T_{2}}}\hspace{-0.1667em}\big],\\{} {T_{1}^{\top }}{T_{2}^{}}& =0.\end{aligned}\]
(Here, if the matrix B is nonsingular, then ${T_{1}}$ is an $n\times 0$ empty matrix; if $B=0$, then ${T_{2}}$ is an $n\times 0$ matrix. In these marginal cases, ${T_{1}^{\top }}{T_{2}}$ is an empty matrix and is considered to be a zero matrix.) The desired representation can be obtained from [2] for $S=0$ (in de Leeuw’s notation). This representation is constructed as follows. Let the columns of the matrix ${T_{1}}$ form an orthonormal basis of $\operatorname{Ker}(B)=\{v:Bv=0\}$. There exists an $n\times \operatorname{rk}(B)$ matrix F such that $B=F{F}^{\top }$. Let the columns of the matrix L be orthonormal eigenvectors of the matrix ${F}^{\dagger }A{({F}^{\dagger })}^{\top }$. Then set ${T_{2}}={({F}^{\dagger })}^{\top }L$. Note that the notation S, F and L is borrowed from [2] and is used only once. Elsewhere in the paper, the matrix F will have a different meaning.
Proposition 6.3.
In the notation of Remark 6.2-2, the matrix $T{\mathrm{M}}^{\dagger }{T}^{\top }$ is the Moore–Penrose pseudoinverse of the matrix $B={({T}^{-1})}^{\top }\mathrm{M}{T}^{-1}$.
Proof.
Let us verify the Moore–Penrose conditions: the equalities
(18)
\[ {\big({T}^{-1}\big)}^{\top }\mathrm{M}{T}^{-1}\hspace{0.1667em}T{\mathrm{M}}^{\dagger }{T}^{\top }\hspace{0.1667em}{\big({T}^{-1}\big)}^{\top }\mathrm{M}{T}^{-1}={\big({T}^{-1}\big)}^{\top }\mathrm{M}{T}^{-1},\]
(19)
\[ T{\mathrm{M}}^{\dagger }{T}^{\top }\hspace{0.1667em}{\big({T}^{-1}\big)}^{\top }\mathrm{M}{T}^{-1}\hspace{0.1667em}T{\mathrm{M}}^{\dagger }{T}^{\top }=T{\mathrm{M}}^{\dagger }{T}^{\top },\]
and the fact that the matrices ${({T}^{-1})}^{\top }\mathrm{M}{T}^{-1}\hspace{0.1667em}T{\mathrm{M}}^{\dagger }{T}^{\top }$ and $T{\mathrm{M}}^{\dagger }{T}^{\top }\hspace{0.1667em}{({T}^{-1})}^{\top }\mathrm{M}{T}^{-1}$ are symmetric. The equalities $\text{(18)}$ and $\text{(19)}$ can be verified directly, and the symmetry properties can be reduced to the equality
(20)
\[ {\big({T}^{-1}\big)}^{\top }{P_{\mathrm{M}}}{T}^{\top }=T{P_{\mathrm{M}}}{T}^{-1},\]
with ${P_{\mathrm{M}}}=\mathrm{M}{\mathrm{M}}^{\dagger }=\operatorname{diag}(\underset{\operatorname{def}(B)}{\underbrace{0,\dots ,0}},\underset{\operatorname{rk}(B)}{\underbrace{1,\dots ,1}})$.
Since ${T_{1}^{\top }}{T_{2}^{}}=0$, ${T}^{\top }{T}^{}$ is a block diagonal matrix. Hence ${P_{\mathrm{M}}}{T}^{\top }T={T}^{\top }{T}^{}{P_{\mathrm{M}}}$, whence (20) follows. □
6.2 Angle between two linear subspaces
Let ${V_{1}}$ and ${V_{2}}$ be linear subspaces of ${\mathbb{R}}^{n}$, with $\dim {V_{1}}={k_{1}}\le \dim {V_{2}}={k_{2}}$. Then there exists an orthogonal $n\times n$ matrix U such that
(21)
\[\begin{aligned}{}{V_{1}}& =\operatorname{span}\left\langle U\left(\begin{array}{c}{\operatorname{diag}_{{k_{2}}\times {k_{1}}}}(\cos {\theta _{i}},\hspace{0.2778em}i=1,\dots ,{k_{1}})\\{} {\operatorname{diag}_{(n-{k_{2}})\times {k_{1}}}}(\sin {\theta _{i}},\hspace{0.2778em}i=1,\dots ,\min (n-{k_{2}},\hspace{0.2222em}{k_{1}}))\end{array}\right)\right\rangle ,\end{aligned}\]
(22)
\[\begin{aligned}{}{V_{2}}& =\operatorname{span}\left\langle U\left(\begin{array}{c}{\operatorname{diag}_{{k_{2}}\times {k_{2}}}}(1,\dots ,1)\\{} {0_{(n-{k_{2}})\times {k_{2}}}}\end{array}\right)\right\rangle .\end{aligned}\]
Here rectangular diagonal matrices are allowed. If in (21) there are more cosines than sines (i.e., if ${k_{2}}+{k_{1}}>n$), then the excessive cosines should be equal to 1, so that the columns of the bidiagonal matrix in (21) are unit vectors (which are orthogonal to each other). Here the columns of U are the vectors of some convenient “new” basis in ${\mathbb{R}}^{n}$, so U is a transitional matrix from the standard basis to the “new” basis; the columns of the matrix products in $\operatorname{span}\langle \cdots \hspace{0.1667em}\rangle $ in (21) and (22) are the vectors of the bases of the subspaces ${V_{1}}$ and ${V_{2}}$; the bidiagonal matrix in (21) and the diagonal matrix in (22) are the transitional matrices from the “new” basis in ${\mathbb{R}}^{n}$ to the bases in ${V_{1}}$ and ${V_{2}}$, respectively.
The angles ${\theta _{k}}$ are called the canonical angles between ${V_{1}}$ and ${V_{2}}$. They can be selected so that $0\le {\theta _{k}}\le \frac{1}{2}\pi $ (to achieve this, we might have to reverse some vectors of the bases).
Denote ${P_{{V_{1}}}}$ the matrix of the orthogonal projector onto ${V_{1}}$. The singular values of the matrix ${P_{{V_{1}}}}(I-{P_{{V_{2}}}})$ are equal to $\sin {\theta _{k}}$ ($k=1,\dots ,{k_{1}}$); besides them, there is a singular value 0 of multiplicity $n-{k_{1}}$.
Denote by $\| \sin \angle ({V_{1}},{V_{2}})\| $ the greatest of the sines of the canonical angles:
(23)
\[ \big\| \sin \angle ({V_{1}},{V_{2}})\big\| =\big\| {P_{{V_{1}}}}(I-{P_{{V_{2}}}})\big\| .\]
If $\dim {V_{1}}=1$, ${V_{1}}=\operatorname{span}\langle v\rangle $, then
\[ \sin \angle (v,{V_{2}})=\bigg\| (I-{P_{{V_{2}}}})\frac{v}{\| v\| }\bigg\| =\operatorname{dist}\bigg(\frac{1}{\| v\| }v,{V_{2}}\bigg).\]
This can be generalized for $\dim {V_{1}}\ge 1$:
\[ \big\| \sin \angle ({V_{1}},{V_{2}})\big\| =\underset{v\in {V_{1}}\setminus \{0\}}{\max }\bigg\| (I-{P_{{V_{2}}}})\frac{v}{\| v\| }\bigg\| ,\]
whence
(24)
\[\begin{aligned}{}{\big\| \sin \angle ({V_{1}},{V_{2}})\big\| }^{2}& =\underset{v\in {V_{1}}\setminus \{0\}}{\max }\frac{{v}^{\top }(I-{P_{{V_{2}}}})v}{\| v{\| }^{2}},\\{} 1-{\big\| \sin \angle ({V_{1}},{V_{2}})\big\| }^{2}& =\underset{v\in {V_{1}}\setminus \{0\}}{\min }\frac{{v}^{\top }{P_{{V_{2}}}}v}{\| v{\| }^{2}}.\end{aligned}\]If $\dim {V_{1}}=\dim {V_{2}}$, then $\| \sin \angle ({V_{1}},{V_{2}})\| =\| {P_{{V_{1}}}}-{P_{{V_{2}}}}\| $, and therefore $\| \sin \angle ({V_{1}},{V_{2}})\| =\| \sin \angle ({V_{2}},{V_{1}})\| $. Otherwise the right-hand side of (23) may change if ${V_{1}}$ and ${V_{2}}$ are swapped (particularly, if $\dim {V_{1}}<\dim {V_{2}}$, then $\| {P_{{V_{1}}}}(I-{P_{{V_{2}}}})\| $ may or may not be equal to 1, but always $\| {P_{{V_{2}}}}(I-{P_{{V_{1}}}})\| =1$; see the proof of Lemma 8.2 in the appendix).
We will often omit “span” in arguments of sine. Thus, for n-row matrices ${X_{1}}$ and ${X_{2}}$, $\| \sin \angle ({X_{1}},{V_{2}})\| =\| \sin \angle (\operatorname{span}\langle {X_{1}}\rangle ,{V_{2}})\| $ and $\| \sin \angle ({X_{1}},{X_{2}})\| =\| \sin \angle (\operatorname{span}\langle {X_{1}}\rangle ,\operatorname{span}\langle {X_{2}}\rangle )\| $.
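As a numerical illustration (not part of the paper), the quantity $\| \sin \angle ({V_{1}},{V_{2}})\| $ defined in (23) can be computed from orthonormal bases of the two column spaces. The helper `max_sin_angle` below is a hypothetical sketch and assumes the input matrices have full column rank.

```python
import numpy as np

def max_sin_angle(X1, X2):
    """||sin(angle(span<X1>, span<X2>))|| = ||P_{V1} (I - P_{V2})|| (spectral norm)."""
    Q1, _ = np.linalg.qr(X1)                 # orthonormal basis of span<X1>
    Q2, _ = np.linalg.qr(X2)                 # orthonormal basis of span<X2>
    n = X1.shape[0]
    P1 = Q1 @ Q1.T                           # orthogonal projector onto span<X1>
    P2 = Q2 @ Q2.T                           # orthogonal projector onto span<X2>
    return np.linalg.norm(P1 @ (np.eye(n) - P2), 2)
```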
Lemma 6.4.
Let ${V_{11}}$, ${V_{2}}$ and ${V_{13}}$ be three linear subspaces in ${\mathbb{R}}^{n}$, with $\dim {V_{11}}={d_{1}}<\dim {V_{2}}={d_{2}}<\dim {V_{13}}={d_{3}}$ and ${V_{11}}\subset {V_{13}}$. Then there exists such a linear subspace ${V_{12}}\subset {\mathbb{R}}^{n}$ that ${V_{11}}\subset {V_{12}}\subset {V_{13}}$, $\dim {V_{12}}={d_{2}}$, and $\| \sin \angle ({V_{12}},{V_{2}})\| =1$.
Proof.
Since $\dim {V_{13}}+\dim {V_{2}^{\perp }}={d_{3}}+n-{d_{2}}>n$, there exists a vector $v\ne 0$, $v\in {V_{13}}\cap {V_{2}^{\perp }}$. Since $\max ({d_{1}},1)\le \dim \operatorname{span}\langle {V_{11}},v\rangle \le {d_{1}}+1$, it holds that
\[ \dim \operatorname{span}\langle {V_{11}},v\rangle \le {d_{1}}+1\le {d_{2}}.\]
Therefore, there exists a ${d_{2}}$-dimensional subspace ${V_{12}}$ such that $\operatorname{span}\langle {V_{11}},v\rangle \hspace{0.1667em}\subset \hspace{0.1667em}{V_{12}}\subset {V_{13}}$. Then ${V_{11}}\subset {V_{12}}\subset {V_{13}}$ and $v\in {V_{12}}\cap {V_{2}^{\perp }}$. Hence ${P_{{V_{12}}}}(I-{P_{{V_{2}}}})v=v$, $\| {P_{{V_{12}}}}(I-{P_{{V_{2}}}})\| \ge 1$, and due to equation (23), $\| \sin \angle ({V_{12}},\hspace{0.1667em}{V_{2}})\| =1$. Thus, the subspace ${V_{12}}$ has the desired properties. □
6.3 Perturbation of eigenvectors and invariant spaces
Lemma 6.5.
Let A, B, $\tilde{A}$ be symmetric matrices, ${\lambda _{\min }}(A)=0$, ${\lambda _{2}}(A)>0$ and ${\lambda _{\min }}(B)\ge 0$. Let $A{x_{0}}=0$ and $B{x_{0}}\ne 0$ (so ${x_{0}}$ is an eigenvector of the matrix A that corresponds to the minimum eigenvalue). Let the minimum of the function
\[ f(x)=\frac{{x}^{\top }(A+\tilde{A})x}{{x}^{\top }Bx},\hspace{1em}{x}^{\top }Bx>0,\]
be attained at a point ${x_{\ast }}$. Then
\[ {\sin }^{2}\angle ({x_{\ast }},{x_{0}})\le \frac{\| \tilde{A}\| }{{\lambda _{2}}(A)}\bigg(1+\frac{{x_{0}^{\top }}{x_{0}}}{{x_{0}^{\top }}B{x_{0}}}\cdot \frac{{x_{\ast }^{\top }}B{x_{\ast }}}{{x_{\ast }^{\top }}{x_{\ast }}}\bigg).\]
Remark 6.5-1.
The function $f(x)$ may or may not attain the minimum. Thus the condition $f({x_{\ast }})={\min _{{x}^{\top }Bx>0}}f(x)$ sometimes cannot be satisfied. But the theorem is still true if
(25)
\[ \underset{x\to {x_{\ast }}}{\liminf }f(x)=\underset{x:\hspace{0.2778em}{x}^{\top }\hspace{-0.1667em}Bx>0}{\inf }f(x)\]
and ${x_{\ast }}\ne 0$.
Now we state the multivariate generalization of Lemma 6.5. We will not generalize Remark 6.5-1. Instead, we will check that the minimum is attained when we use Lemma 6.6 (see Proposition 7.10).
Lemma 6.6.
Let A, B, $\tilde{A}$ be $n\times n$ symmetric matrices, ${\lambda _{i}}(A)=0$ for all $i=1,\dots ,d$, ${\lambda _{d+1}}(A)>0$, ${\lambda _{\min }}(B)\ge 0$. Let ${X_{0}}$ be an $n\times d$ matrix such that $A{X_{0}}=0$ and the matrix ${X_{0}^{\top }}B{X_{0}^{}}$ is nonsingular. Let the functional
(26)
\[\begin{aligned}{}f(X)& ={\lambda _{\max }}\big({\big({X}^{\top }BX\big)}^{-1}{X}^{\top }(A+\tilde{A})X\big)\hspace{1em}\textit{if}\hspace{2.5pt}X\in {\mathbb{R}}^{n\times d}\hspace{2.5pt}\textit{and}\hspace{2.5pt}{X}^{\top }BX>0\textit{,}\\{} f(X)& \hspace{0.2778em}\textit{is not defined otherwise,}\end{aligned}\]
attain its minimum. Then for any point X where the minimum is attained,
6.4 Rosenthal inequality
In the following theorems, a random variable ξ is called centered if $\mathbb{E}\xi =0$.
Theorem 6.7.
Let $\nu \ge 2$ be a nonrandom real number. Then there exist $\alpha \ge 0$ and $\beta \ge 0$ such that for any set of centered mutually independent random variables $\{{\xi _{i}},i=1,\dots ,m\}$, $m\ge 1$, the following inequality holds true:
\[ \mathbb{E}\Bigg[{\Bigg|{\sum \limits_{i=1}^{m}}{\xi _{i}}\Bigg|}^{\nu }\Bigg]\le \alpha {\sum \limits_{i=1}^{m}}\mathbb{E}|{\xi _{i}}{|}^{\nu }+\beta {\Bigg({\sum \limits_{i=1}^{m}}\mathbb{E}{\xi _{i}^{2}}\Bigg)}^{\nu /2}.\]
Theorem 6.7 is well known; see [16, Theorem 2.9, page 59].
Theorem 6.8.
Let ν be a nonrandom real number, $1\le \nu \le 2$. Then there exists $\alpha \ge 0$ such that for any set of centered mutually independent random variables $\{{\xi _{i}},i=1,\dots ,m\}$, $m\ge 1$, the following inequality holds true:
\[ \mathbb{E}\Bigg[{\Bigg|{\sum \limits_{i=1}^{m}}{\xi _{i}}\Bigg|}^{\nu }\Bigg]\le \alpha {\sum \limits_{i=1}^{m}}\mathbb{E}|{\xi _{i}}{|}^{\nu }.\]
Proof.
The desired inequality is trivial for $\nu =1$. For all $1<\nu \le 2$ it is a consequence of the Marcinkiewicz–Zygmund inequality
\[ \mathbb{E}\Bigg[{\Bigg|{\sum \limits_{i=1}^{m}}{\xi _{i}}\Bigg|}^{\nu }\Bigg]\le \alpha \mathbb{E}\Bigg[{\Bigg({\sum \limits_{i=1}^{m}}{\xi _{i}^{2}}\Bigg)}^{\nu /2}\Bigg]\le \alpha \mathbb{E}{\sum \limits_{i=1}^{m}}|{\xi _{i}}{|}^{\nu }=\alpha {\sum \limits_{i=1}^{m}}\mathbb{E}|{\xi _{i}}{|}^{\nu }.\]
Here the first inequality is due to Marcinkiewicz and Zygmund [11, Theorem 13]. The second inequality follows from the fact that for $\nu \le 2$,
\[ {\Bigg({\sum \limits_{i=1}^{m}}{\xi _{i}^{2}}\Bigg)}^{\nu /2}\le {\sum \limits_{i=1}^{m}}|{\xi _{i}}{|}^{\nu }.\]
 □
7 Generalized eigenvalue problem for positive semidefinite matrices
In this section we explain the relationship between the TLS estimator and the generalized eigenvalue problem. The results of this section are important for constructing the TLS estimator. Proposition 7.9 is used to state the uniqueness of the TLS estimator.
Lemma 7.1.
Let A and B be $n\times n$ symmetric positive semidefinite matrices, with simultaneous diagonalization
\[ A={\big({T}^{-1}\big)}^{\top }\varLambda {T}^{-1},\hspace{2em}B={\big({T}^{-1}\big)}^{\top }\mathrm{M}{T}^{-1},\]
with
\[ \varLambda =\operatorname{diag}({\lambda _{1}},\dots ,{\lambda _{n}}),\hspace{2em}\mathrm{M}=\operatorname{diag}({\mu _{1}},\dots ,{\mu _{n}})\]
(see Theorem 6.2 for its existence). For $i=1,\dots ,n$ denote
\[ {\nu _{i}}=\left\{\begin{array}{l@{\hskip10.0pt}l}{\lambda _{i}}/{\mu _{i}}\hspace{1em}& \textit{if}\hspace{2.5pt}{\mu _{i}}>0\textit{,}\\{} 0\hspace{1em}& \textit{if}\hspace{2.5pt}{\lambda _{i}}=0\textit{,}\\{} +\infty \hspace{1em}& \textit{if}\hspace{2.5pt}{\lambda _{i}}>0\textit{,}\hspace{2.5pt}{\mu _{i}}=0\textit{.}\end{array}\right.\]
Assume that ${\nu _{1}}\le {\nu _{2}}\le \cdots \le {\nu _{n}}$. Then
(27)
\[ {\nu _{i}}=\min \big\{\lambda \ge 0|\textit{``}\exists V,\hspace{2.5pt}\dim V=i:(A-\lambda B){|_{V}}\le 0\textit{''}\big\},\]
i.e., ${\nu _{i}}$ is the smallest number $\lambda \ge 0$ such that there exists an i-dimensional subspace $V\subset {\mathbb{R}}^{n}$ on which the quadratic form $A-\lambda B$ is negative semidefinite.
Remark 7.1-2.
Let ${\nu _{i}}<\infty $. The minimum in (27) is attained for V being the linear span of the first i columns of the matrix T (i.e., the linear span of the eigenvectors of the matrix pencil $\langle A,B\rangle $ that correspond to the i smallest generalized eigenvalues). That is,
\[ (A-{\nu _{i}}B){|_{\operatorname{span}\langle {u_{1}},\dots ,{u_{i}}\rangle }}\le 0.\]
In Propositions 7.2–7.5 the following optimization problem is considered. For a fixed $(n+d)\times d$ matrix X find an $m\times (n+d)$ matrix Δ where the constrained minimum is attained:
Here the matrix X is assumed to be of full rank:
(29)
\[ \operatorname{rk}X=d.\]
Proposition 7.2.
1. The constraints in (28) are compatible if and only if
(30)
\[ \operatorname{span}\big\langle {X}^{\top }{C}^{\top }\big\rangle \subset \operatorname{span}\big\langle {X}^{\top }\varSigma \big\rangle .\]
Here $\operatorname{span}\langle M\rangle $ is the column space of the matrix M.
2. Let the constraints in (28) be compatible. Then the least element of the partially ordered set (in the Loewner order) $\{\Delta {\varSigma }^{\dagger }{\Delta }^{\top }:\Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0\hspace{0.2778em}\textit{and}\hspace{0.2778em}(C-\Delta )X=0\}$ is attained for $\Delta =CX{({X}^{\top }\varSigma X)}^{\dagger }{X}^{\top }\varSigma $ and is equal to $CX{({X}^{\top }\varSigma X)}^{\dagger }{X}^{\top }{C}^{\top }$. This means the following:
2a. For $\Delta =CX{({X}^{\top }\varSigma X)}^{\dagger }{X}^{\top }\varSigma $, it holds that
Remark 7.2-1.
If the constraints are compatible, the least element (and the unique minimum) is attained at a single point. Namely, the equalities
\[\begin{aligned}{}\Delta \hspace{0.1667em}(I-{P_{\varSigma }})& =0,\hspace{2em}(C-\Delta )X=0,\\{} \Delta {\varSigma }^{\dagger }{\Delta }^{\top }& =CX{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top }\end{aligned}\]
imply $\Delta =CX{({X}^{\top }\varSigma X)}^{\dagger }{X}^{\top }\varSigma $.
Proposition 7.3.
Let the matrix pencil $\langle {C}^{\top }C,\varSigma \rangle $ be definite and (29) hold. The constraints in (28) are compatible if and only if the matrix ${X}^{\top }\varSigma X$ is nonsingular. Then Proposition 7.2 still holds true if ${({X}^{\top }\varSigma X)}^{-1}$ is substituted for ${({X}^{\top }\varSigma X)}^{\dagger }$.
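As an illustration of Propositions 7.2 and 7.3 (a sketch added here, not the paper's code), the closed-form minimizer can be computed and its constraint checked numerically; the helper `correction` is hypothetical and assumes ${X}^{\top }\varSigma X$ is nonsingular, and the quick check uses a nonsingular Σ so that ${P_{\varSigma }}=I$.

```python
import numpy as np

def correction(C, X, Sigma):
    """Delta = C X (X' Sigma X)^{-1} X' Sigma, the minimizer from Proposition 7.3
    (assuming X' Sigma X is nonsingular)."""
    G = np.linalg.inv(X.T @ Sigma @ X)
    return C @ X @ G @ X.T @ Sigma

# Quick check of the constraint (C - Delta) X = 0 on random data (illustrative only).
rng = np.random.default_rng(2)
m, n, d = 30, 3, 2
C = rng.standard_normal((m, n + d))
X = rng.standard_normal((n + d, d))
S = rng.standard_normal((n + d, n + d)); Sigma = S @ S.T   # nonsingular, so P_Sigma = I
Delta = correction(C, X, Sigma)
assert np.allclose((C - Delta) @ X, 0)
```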
Proposition 7.4.
Let X be an $(n+d)\times d$ matrix which satisfies (29) and makes the constraints in (28) compatible. Then for $k=1,2,\dots ,d$,
(34)
\[\begin{aligned}{}& \underset{\begin{array}{c}\Delta (I-{P_{\varSigma }})=0\\{} (C-\Delta )X=0\end{array}}{\min }{\lambda _{k+m-d}}\big(\Delta {\varSigma }^{\dagger }{\Delta }^{\top }\big)\\{} & \hspace{1em}=\min \big\{\lambda \ge 0:\textit{``}\exists V\subset \operatorname{span}\langle X\rangle ,\hspace{0.2778em}\dim V=k:\big({C}^{\top }C-\lambda \varSigma \big){|_{V}}\le 0\textit{''}\big\}.\end{aligned}\]Remark 7.4-1.
In the left-hand side of (34) the minima are attained at the same $\Delta =CX{({X}^{\top }\varSigma X)}^{\dagger }{X}^{\top }\varSigma $ for all k (the k sets where the minima are attained have a non-empty intersection; we will show that the intersection consists of a single element).
One can choose a stack of subspaces
\[ {V_{1}}\subset {V_{2}}\subset \cdots \subset {V_{d}}\subset \operatorname{span}\langle X\rangle \]
such that ${V_{k}}$ is the element where the minimum in the right-hand side of (34) is attained, i.e., for all $k=1,\dots ,d$,
\[ \dim {V_{k}}=k,\hspace{2em}{V_{k}}\subset \operatorname{span}\langle X\rangle ,\hspace{2em}\big({C}^{\top }C-{\nu _{k}}\varSigma \big){|_{{V_{k}}}}\le 0,\]
with ${\nu _{k}}={\min _{\begin{array}{c}\Delta (I-{P_{\varSigma }})=0\\{} (C-\Delta )X=0\end{array}}}{\lambda _{k+m-d}}(\Delta {\varSigma }^{\dagger }{\Delta }^{\top })$.In Propositions 7.5 to 7.9, we will use notation from simultaneous diagonalization of matrices ${C}^{\top }C$ and Σ:
where
(35)
\[ {C}^{\top }C={\big({T}^{-1}\big)}^{\top }\varLambda {T}^{-1},\hspace{2em}\varSigma ={\big({T}^{-1}\big)}^{\top }\mathrm{M}{T}^{-1},\]
\[\begin{aligned}{}\varLambda & =\operatorname{diag}({\lambda _{1}},\dots ,{\lambda _{n+d}}),\hspace{2em}\mathrm{M}=\operatorname{diag}({\mu _{1}},\dots ,{\mu _{n+d}}),\\{} T& =[{u_{1}},{u_{2}},\dots ,{u_{d}},\dots ,{u_{n+d}}].\end{aligned}\]
If Remark 6.2-2 is applicable, let the simultaneous diagonalization be constructed accordingly. For $k=1,\dots ,n+d$ denote
\[ {\nu _{k}}=\left\{\begin{array}{l@{\hskip10.0pt}l}{\lambda _{k}}/{\mu _{k}}\hspace{1em}& \text{if}\hspace{2.5pt}{\mu _{k}}>0\text{,}\\{} 0\hspace{1em}& \text{if}\hspace{2.5pt}{\lambda _{k}}=0\text{,}\\{} +\infty \hspace{1em}& \text{if}\hspace{2.5pt}{\lambda _{k}}>0\text{,}\hspace{2.5pt}{\mu _{k}}=0\text{.}\end{array}\right.\]
Let ${\nu _{k}}$ be arranged in ascending order.
Proposition 7.5.
Let X be an $(n+d)\times d$ matrix which satisfies (29) and makes the constraints in (28) compatible. Then
(36)
\[ \underset{\begin{array}{c}\Delta (I-{P_{\varSigma }})=0\\{} (C-\Delta )X=0\end{array}}{\min }{\lambda _{\max }}\big(\Delta {\varSigma }^{\dagger }{\Delta }^{\top }\big)\ge {\nu _{d}}.\]
If ${\nu _{d}}<\infty $, then for $X=[{u_{1}},{u_{2}},\dots ,{u_{d}}]$ the inequality in (36) becomes an equality.
Proposition 7.7.
Let $\| M{\| _{\mathrm{U}}}$ be an arbitrary unitarily invariant norm on $m\times n$ matrices. The singular values of the matrix M are arranged in descending order and denoted ${\sigma _{i}}(M)$:
\[ {\sigma _{1}}(M)\ge {\sigma _{2}}(M)\ge \cdots \ge {\sigma _{\min (m,n)}}(M)\ge 0.\]
Let ${M_{1}}$ and ${M_{2}}$ be $m\times n$ matrices. Then
1. If ${\sigma _{i}}({M_{1}})\le {\sigma _{i}}({M_{2}})$ for all $i=1,\dots ,\min (m,n)$, then $\| {M_{1}}{\| _{\mathrm{U}}}\le \| {M_{2}}{\| _{\mathrm{U}}}$.
2. If ${\sigma _{1}}({M_{1}})<{\sigma _{1}}({M_{2}})$ and ${\sigma _{i}}({M_{1}})\le {\sigma _{i}}({M_{2}})$ for all $i=2,\dots ,\min (m,n)$, then $\| {M_{1}}{\| _{\mathrm{U}}}<\| {M_{2}}{\| _{\mathrm{U}}}$.
Proposition 7.9.
As a consequence, if ${\nu _{d}}<{\nu _{d+1}}$, then (7) and (8) unambiguously determine $\operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle $ of rank d.
Proposition 7.10.
Let $\langle {C}^{\top }C,\varSigma \rangle $ be a definite matrix pencil. Then for any Δ where the minimum in (11) is attained, the corresponding solution ${\widehat{X}_{\mathrm{ext}}}$ of the linear equations (8) (such that $\operatorname{rk}{\widehat{X}_{\mathrm{ext}}}=d$) is a point where the minimum of the functional
is attained. It is also a point where the minimum of
is attained.
8 Appendix: Proofs
8.1 Bounds for eigenvalues of some matrices used in the proof
8.1.1 Eigenvalues of the matrix ${C_{0}^{\top }}{C_{0}^{}}$
The $(n+d)\times (n+d)$ matrix ${C_{0}^{\top }}{C_{0}}$ is symmetric and positive semidefinite. Since ${C_{0}}{X_{\mathrm{ext}}^{0}}={A_{0}}{X_{0}}-{B_{0}}=0$, the matrix ${C_{0}^{\top }}{C_{0}}$ is rank deficient, with eigenvalue 0 of multiplicity at least d. As ${A_{0}^{\top }}{A_{0}}$ is an $n\times n$ principal submatrix of ${C_{0}^{\top }}{C_{0}}$,
(40)
\[ {\lambda _{d+1}}\big({C_{0}^{\top }}{C_{0}}\big)\ge {\lambda _{\min }}\big({A_{0}^{\top }}{A_{0}}\big)\]
by the Cauchy interlacing theorem (Theorem IV.4.2 from [19] used d times).
Due to inequality (40), if the matrix ${A_{0}^{\top }}{A_{0}}$ is nonsingular, then ${\lambda _{d+1}}({C_{0}^{\top }}{C_{0}})>0$, whence $\operatorname{def}({C_{0}^{\top }}{C_{0}})=d$. If the conditions of Theorem 3.5, 3.6 or 3.7 hold true, then ${\lambda _{\min }}({A_{0}^{\top }}{A_{0}})\to \infty $, and thus
\[ {\lambda _{d+1}}\big({C_{0}^{\top }}{C_{0}}\big)\ge {\lambda _{\min }}\big({A_{0}^{\top }}{A_{0}}\big)>0\]
for m large enough.
Proposition 8.1.
Assume conditions (4)–(6). If the matrix ${A_{0}^{\top }}{A_{0}}$ is nonsingular, then, almost surely, the matrix ${C}^{\top }C+\varSigma $ is positive definite (i.e., the matrix pencil $\langle {C}^{\top }C,\varSigma \rangle $ is definite).
Proof.
1. If the matrix Σ is nonsingular, then Proposition 8.1 is obvious. Due to condition (6), $\operatorname{rk}\varSigma \ge d$ (see Remark 2.1), whence $\varSigma \ne 0$. In what follows, assume that Σ is a singular but non-zero matrix. Let $F=(\begin{array}{c}{F_{1}}\\{} {F_{2}}\end{array})$ be a $(n+d)\times (n+d-\operatorname{rk}(\varSigma ))$ matrix whose columns make the basis of the null-space $\operatorname{Ker}(\varSigma )=\{x:\varSigma x=0\}$ of the matrix Σ.
2. Now prove that the columns of the matrix $[{I_{n}}\hspace{0.2778em}{X_{0}}]\hspace{0.2222em}F$ are linearly independent. Assume the contrary. Then for some $v\in {\mathbb{R}}^{n+d-\operatorname{rk}(\varSigma )}\setminus \{0\}$,
(41)
\[ [{I_{n}}\hspace{1em}{X_{0}}]\hspace{0.2222em}Fv={F_{1}}v+{X_{0}}{F_{2}}v=0.\]
Furthermore, $Fv\ne 0$ because $v\ne 0$ and the columns of F are linearly independent. Hence, by (41), ${F_{2}}v\ne 0$. Since the columns of F lie in the null-space of Σ,
(42)
\[ \varSigma {X_{\mathrm{ext}}^{0}}{F_{2}}v=\varSigma \left(\begin{array}{c}{X_{0}}{F_{2}}v\\{} -{F_{2}}v\end{array}\right)=-\varSigma Fv=0.\]
Equality (42) implies that the columns of the matrix $\varSigma {X_{\mathrm{ext}}^{0}}$ are linearly dependent, and this contradicts condition (6). The contradiction means that the columns of the matrix $[{I_{n}}\hspace{0.2778em}{X_{0}}]\hspace{0.2222em}F$ are linearly independent.
3. If the conditions of either Theorem 3.5, 3.6, or 3.7 hold true, then the matrix ${A_{0}^{\top }}{A_{0}}$ is positive definite for m large enough.
4. Under conditions (4) and (5), $\tilde{C}F=0$ almost surely. Indeed, $\mathbb{E}{\tilde{c}_{i}}=0$ and $\operatorname{var}[{\tilde{c}_{i}}F]={F}^{\top }\varSigma F=0$, $i=1,2,\dots ,m$.
5. It remains to prove the implication:
\[ \text{if}\hspace{1em}{A_{0}^{\top }}{A_{0}^{}}>0\hspace{1em}\text{and}\hspace{1em}\tilde{C}F=0,\hspace{1em}\text{then}\hspace{1em}{C}^{\top }C+\varSigma >0.\]
The matrices ${C}^{\top }C$ and Σ are positive semidefinite. Suppose that ${x}^{\top }({C}^{\top }C+\varSigma )x=0$ and prove that $x=0$. Since ${x}^{\top }({C}^{\top }C+\varSigma )x=0$, $Cx=0$ and $\varSigma x=0$. The vector x belongs to the null-space of the matrix Σ. Therefore, $x=Fv$ for some vector $v\in {\mathbb{R}}^{n+d-\operatorname{rk}\varSigma }$. Then
(43)
\[\begin{aligned}{}0={A_{0}^{\top }}Cx& ={A_{0}^{\top }}({C_{0}}+\tilde{C})x\\{} & ={A_{0}^{\top }}{C_{0}}Fv+{A_{0}^{\top }}\tilde{C}Fv\\{} & ={A_{0}^{\top }}{A_{0}^{}}\hspace{0.2222em}[{I_{n}}\hspace{1em}{X_{0}}]\hspace{0.2222em}Fv+0.\end{aligned}\]
As the matrix ${A_{0}^{\top }}{A_{0}^{}}$ is nonsingular and the columns of the matrix $[{I_{n}}\hspace{0.2778em}{X_{0}}]\hspace{0.2222em}F$ are linearly independent, the columns of the matrix ${A_{0}^{\top }}{A_{0}^{}}\hspace{0.2222em}[{I_{n}}\hspace{0.2778em}{X_{0}}]\hspace{0.2222em}F$ are linearly independent as well. Hence, (43) implies $v=0$, and so $x=Fv=0$.
We have proved that the equality ${x}^{\top }({C}^{\top }C+\varSigma )x=0$ implies $x=0$. Thus, the positive semidefinite matrix ${C}^{\top }C+\varSigma $ is nonsingular, and so positive definite. □
8.1.2 Eigenvalues and common eigenvectors of N and ${N}^{-\frac{1}{2}}{C_{0}^{\top }}{C_{0}^{}}{N}^{-\frac{1}{2}}$
The rank-deficient positive semidefinite symmetric matrix ${C_{0}^{\top }}{C_{0}}$ can be factorized as:
\[\begin{aligned}{}{C_{0}^{\top }}{C_{0}^{}}& =U\operatorname{diag}\big({\lambda _{\min }}\big({C_{0}^{\top }}{C_{0}}\big),{\lambda _{2}}\big({C_{0}^{\top }}{C_{0}}\big),\dots ,{\lambda _{n+d}}\big({C_{0}^{\top }}{C_{0}}\big)\big){U}^{\top }\\{} & =U\operatorname{diag}\big({\lambda _{j}}\big({C_{0}^{\top }}{C_{0}}\big);\hspace{0.2778em}j=1,\dots ,n+d\big){U}^{\top },\end{aligned}\]
with an orthogonal matrix U and
Then the eigendecomposition of the matrix $N={C_{0}^{\top }}{C_{0}}+{\lambda _{\min }}({A_{0}^{\top }}{A_{0}})I$ is
\[ N=U\operatorname{diag}\big({\lambda _{j}}\big({C_{0}^{\top }}{C_{0}}\big)+{\lambda _{\min }}\big({A_{0}^{\top }}{A_{0}}\big);\hspace{0.2778em}j=1,\dots ,n+d\big){U}^{\top }.\]
The matrix N is nonsingular as soon as ${A_{0}^{\top }}{A_{0}}$ is nonsingular. Hence, under the conditions of Theorem 3.5, 3.6, or 3.7, the matrix N is nonsingular for m large enough.
Notice that
(44)
\[ {\lambda _{\min }}(N)=\cdots ={\lambda _{d}}(N)={\lambda _{\min }}\big({A_{0}^{\top }}{A_{0}}\big).\]
Since ${C_{0}}{X_{\mathrm{ext}}^{0}}=0$, it holds that
(45)
\[ N{X_{\mathrm{ext}}^{0}}={\lambda _{\min }}\big({A_{0}^{\top }}{A_{0}}\big){X_{\mathrm{ext}}^{0}}.\]
As soon as N is nonsingular, the matrices ${N}^{-1/2}$ and ${N}^{-1/2}{C_{0}^{\top }}{C_{0}}{N}^{-1/2}$ have the eigendecomposition
\[\begin{aligned}{}{N}^{-1/2}& =U\operatorname{diag}\bigg(\frac{1}{\sqrt{{\lambda _{j}}({C_{0}^{\top }}{C_{0}})\hspace{0.1667em}+\hspace{0.1667em}{\lambda _{\min }}({A_{0}^{\top }}{A_{0}})}};\hspace{0.2778em}j\hspace{0.1667em}=\hspace{0.1667em}1,\dots ,n\hspace{0.1667em}+\hspace{0.1667em}d\bigg){U}^{\top },\\{} {N}^{-1/2}{C_{0}^{\top }}{C_{0}}{N}^{-1/2}& =U\operatorname{diag}\bigg(\frac{{\lambda _{j}}({C_{0}^{\top }}{C_{0}})}{{\lambda _{j}}({C_{0}^{\top }}{C_{0}})+{\lambda _{\min }}({A_{0}^{\top }}{A_{0}})};\hspace{0.2778em}j=1,\dots ,n+d\bigg){U}^{\top }.\end{aligned}\]
Thus, the eigenvalues of ${N}^{-1/2}$ and ${N}^{-1/2}{C_{0}^{\top }}{C_{0}^{}}{N}^{-1/2}$ satisfy the following:
(46)
\[\begin{aligned}{}\big\| {N}^{-1/2}\big\| ={\lambda _{\max }}\big({N}^{-1/2}\big)& =\frac{1}{\sqrt{{\lambda _{\min }}({A_{0}^{\top }}{A_{0}})}}.\end{aligned}\]
As a result, because $\operatorname{tr}({C_{0}^{}}{N}^{-1}{C_{0}^{\top }})=\operatorname{tr}({C_{0}^{}}{N}^{-1/2}{N}^{-1/2}{C_{0}^{\top }})=\operatorname{tr}({N}^{-1/2}{C_{0}^{\top }}{C_{0}^{}}{N}^{-1/2})$,
(49)
\[ \frac{1}{2}n\le \operatorname{tr}\big({N}^{-1/2}{C_{0}^{\top }}{C_{0}^{}}{N}^{-1/2}\big)\le n.\]
These properties will be used in Sections 8.2 and 8.3.
8.2.1 Univariate regression ($d=1$)
Remember inequalities (44) (whence (51) follows) and (45):
Then
(51)
\[\begin{array}{l}\displaystyle {\widehat{X}_{\mathrm{ext}}^{\top }}N{\widehat{X}_{\mathrm{ext}}}\ge {\lambda _{\min }}\big({A_{0}^{\top }}{A_{0}}\big){\widehat{X}_{\mathrm{ext}}^{\top }}{\widehat{X}_{\mathrm{ext}}};\\{} \displaystyle N{X_{\mathrm{ext}}^{0}}={\lambda _{\min }}\big({A_{0}^{\top }}{A_{0}}\big){X_{\mathrm{ext}}^{0}}.\end{array}\](52)
\[\begin{aligned}{}\frac{{({\widehat{X}_{\mathrm{ext}}^{\top }}{X_{\mathrm{ext}}^{0}})}^{2}}{{\widehat{X}_{\mathrm{ext}}^{\top }}{\widehat{X}_{\mathrm{ext}}}\cdot {X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}{X_{\mathrm{ext}}^{0}}}& \ge \frac{{({\widehat{X}_{\mathrm{ext}}^{\top }}N{X_{\mathrm{ext}}^{0}})}^{2}}{{\widehat{X}_{\mathrm{ext}}^{\top }}N{\widehat{X}_{\mathrm{ext}}}\cdot {X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}N{X_{\mathrm{ext}}^{0}}},\\{} {\cos }^{2}\angle \big({\widehat{X}_{\mathrm{ext}}},{X_{\mathrm{ext}}^{0}}\big)& \ge {\cos }^{2}\angle \big({N}^{1/2}{\widehat{X}_{\mathrm{ext}}},{N}^{1/2}{X_{\mathrm{ext}}^{0}}\big),\\{} {\sin }^{2}\angle \big({\widehat{X}_{\mathrm{ext}}},{X_{\mathrm{ext}}^{0}}\big)& \le {\sin }^{2}\angle \big({N}^{1/2}{\widehat{X}_{\mathrm{ext}}},{N}^{1/2}{X_{\mathrm{ext}}^{0}}\big).\end{aligned}\]Now, apply Lemma 6.5 on the perturbation bound for the minimum-eigenvalue eigenvector. The unperturbed symmetric matrix is ${N}^{-1/2}{C_{0}^{\top }}{C_{0}}{N}^{-1/2}$, satisfying
\[\begin{aligned}{}{\lambda _{\min }}\big({N}^{-1/2}{C_{0}^{\top }}{C_{0}}{N}^{-1/2}\big)& =0,\\{} {N}^{-1/2}{C_{0}^{\top }}{C_{0}}{N}^{-1/2}{N}^{1/2}{X_{\mathrm{ext}}^{0}}& =0,\\{} {\lambda _{2}}\big({N}^{-1/2}{C_{0}^{\top }}{C_{0}}{N}^{-1/2}\big)& \ge \frac{1}{2}.\end{aligned}\]
The null-vector of the unperturbed matrix is ${N}^{1/2}{X_{\mathrm{ext}}^{0}}$. The column vector ${\widehat{X}_{\mathrm{ext}}}$ is a generalized eigenvector of the matrix pencil $\langle {C}^{\top }C,\varSigma \rangle $. Denote the corresponding eigenvalue by ${\lambda _{\min }}$. Thus,
\[ {C}^{\top }C{\widehat{X}_{\mathrm{ext}}}={\lambda _{\min }}\cdot \varSigma {\widehat{X}_{\mathrm{ext}}}.\]
The perturbed matrix is ${N}^{-1/2}({C}^{\top }C-m\varSigma ){N}^{-1/2}$; the minimum eigenvalue of the matrix pencil $\langle {N}^{-1/2}({C}^{\top }C-m\varSigma ){N}^{-1/2},\hspace{0.2778em}{N}^{-1/2}\varSigma {N}^{-1/2}\rangle $ is equal to ${\lambda _{\min }}-m$, and the eigenvector is ${N}^{1/2}{\widehat{X}_{\mathrm{ext}}}$:
\[ {N}^{-1/2}\big({C}^{\top }C-m\varSigma \big){N}^{-1/2}\hspace{0.1667em}{N}^{1/2}{\widehat{X}_{\mathrm{ext}}}=({\lambda _{\min }}-m)\hspace{0.1667em}{N}^{-1/2}\varSigma {N}^{-1/2}\hspace{0.1667em}{N}^{1/2}{\widehat{X}_{\mathrm{ext}}}.\]
We have to verify that ${N}^{-1/2}\varSigma {N}^{-1/2}{N}^{1/2}{X_{\mathrm{ext}}^{0}}\ne 0$; this follows from condition (6). Obviously, the matrix ${N}^{-1/2}\varSigma {N}^{-1/2}$ is positive semidefinite.
Denote by ϵ the norm of the perturbation,
\[ \epsilon =\big\| {N}^{-1/2}\big({C}^{\top }C-m\varSigma \big){N}^{-1/2}-{N}^{-1/2}{C_{0}^{\top }}{C_{0}}{N}^{-1/2}\big\| .\]
By Lemma 6.5,
\[ {\sin }^{2}\angle \big({N}^{1/2}{\widehat{X}_{\mathrm{ext}}},{N}^{1/2}{X_{\mathrm{ext}}^{0}}\big)\le \frac{\epsilon }{0.5}\bigg(1+\frac{{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}N{X_{\mathrm{ext}}^{0}}}{{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}\varSigma {X_{\mathrm{ext}}^{0}}}\cdot \frac{{\widehat{X}_{\mathrm{ext}}^{\top }}\varSigma {\widehat{X}_{\mathrm{ext}}}}{{\widehat{X}_{\mathrm{ext}}^{\top }}N{\widehat{X}_{\mathrm{ext}}}}\bigg).\]
Use (45) and (51) again, and also use (52):
(54)
\[\begin{aligned}{}{\sin }^{2}\angle \big({\widehat{X}_{\mathrm{ext}}},{X_{\mathrm{ext}}^{0}}\big)& \le {\sin }^{2}\angle \big({N}^{1/2}{\widehat{X}_{\mathrm{ext}}},{N}^{1/2}{X_{\mathrm{ext}}^{0}}\big)\\{} & \le 2\epsilon \bigg(1+\frac{{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}{X_{\mathrm{ext}}^{0}}}{{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}\varSigma {X_{\mathrm{ext}}^{0}}}\cdot \frac{{\widehat{X}_{\mathrm{ext}}^{\top }}\varSigma {\widehat{X}_{\mathrm{ext}}}}{{\widehat{X}_{\mathrm{ext}}^{\top }}{\widehat{X}_{\mathrm{ext}}}}\bigg)\\{} & \le 2\epsilon \bigg(1+\frac{{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}{X_{\mathrm{ext}}^{0}}\cdot \| \varSigma \| }{{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}\varSigma {X_{\mathrm{ext}}^{0}}}\bigg).\end{aligned}\]
8.2.2 Multivariate regression ($d\ge 1$)
What follows is valid for both univariate ($d=1$) and multivariate ($d>1$) regression.
Due to (44), $N\ge {\lambda _{\min }}({A_{0}^{\top }}{A_{0}})I$ in the Loewner order; thus inequality (51) holds in the Loewner order. Hence
\[\begin{aligned}{}\forall v\in {\mathbb{R}}^{d}\setminus \{0\}:\hspace{0.1667em}& \frac{{v}^{\top }{\widehat{X}_{\mathrm{ext}}^{\top }}{X_{\mathrm{ext}}^{0}}{({X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}{X_{\mathrm{ext}}^{0}})}^{-1}{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}{\widehat{X}_{\mathrm{ext}}}v}{{v}^{\top }{\widehat{X}_{\mathrm{ext}}^{\top }}{\widehat{X}_{\mathrm{ext}}}v}\\{} & \hspace{1em}\ge {\lambda _{\min }}\big({A_{0}^{\top }}{A_{0}}\big)\frac{{v}^{\top }{\widehat{X}_{\mathrm{ext}}^{\top }}{X_{\mathrm{ext}}^{0}}{({X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}{X_{\mathrm{ext}}^{0}})}^{-1}{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}{\widehat{X}_{\mathrm{ext}}}v}{{v}^{\top }{\widehat{X}_{\mathrm{ext}}^{\top }}N{\widehat{X}_{\mathrm{ext}}}v}.\end{aligned}\]
Together with relation (45), we get
\[\begin{aligned}{}& \frac{{v}^{\top }{\widehat{X}_{\mathrm{ext}}^{\top }}{X_{\mathrm{ext}}^{0}}{({X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}{X_{\mathrm{ext}}^{0}})}^{-1}{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}{\widehat{X}_{\mathrm{ext}}}v}{{v}^{\top }{\widehat{X}_{\mathrm{ext}}^{\top }}{\widehat{X}_{\mathrm{ext}}}v}\\{} & \hspace{1em}\ge \frac{{v}^{\top }{\widehat{X}_{\mathrm{ext}}^{\top }}N{X_{\mathrm{ext}}^{0}}{({X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}N{X_{\mathrm{ext}}^{0}})}^{-1}{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}N{\widehat{X}_{\mathrm{ext}}}v}{{v}^{\top }{\widehat{X}_{\mathrm{ext}}^{\top }}N{\widehat{X}_{\mathrm{ext}}}v}.\end{aligned}\]
Using equation (24) to determine the sine and noticing that
\[\begin{aligned}{}{P_{{X_{\mathrm{ext}}^{0}}}}& ={X_{\mathrm{ext}}^{0}}{\big({X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}{X_{\mathrm{ext}}^{0}}\big)}^{-1}{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }},\\{} {P_{{N}^{1/2}{X_{\mathrm{ext}}^{0}}}}& ={N}^{1/2}{X_{\mathrm{ext}}^{0}}{\big({X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}N{X_{\mathrm{ext}}^{0}}\big)}^{-1}{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}{N}^{1/2},\end{aligned}\]
we get
(55)
\[\begin{array}{l}\displaystyle 1-{\big\| \sin \angle \big({\widehat{X}_{\mathrm{ext}}},{X_{\mathrm{ext}}^{0}}\big)\big\| }^{2}\ge 1-{\big\| \sin \angle \big({N}^{1/2}{\widehat{X}_{\mathrm{ext}}},{N}^{1/2}{X_{\mathrm{ext}}^{0}}\big)\big\| }^{2},\\{} \displaystyle \big\| \sin \angle \big({\widehat{X}_{\mathrm{ext}}},{X_{\mathrm{ext}}^{0}}\big)\big\| \le \big\| \sin \angle \big({N}^{1/2}{\widehat{X}_{\mathrm{ext}}},{N}^{1/2}{X_{\mathrm{ext}}^{0}}\big)\big\| .\end{array}\]
The TLS estimator ${\widehat{X}_{\mathrm{ext}}}$ is defined as a solution to the linear equations (8) for the Δ that attains the minimum in (7). By Proposition 7.6, the same Δ attains the minimum in (11). By Proposition 7.10, the functions (38) and (39) attain their minima at the point ${\widehat{X}_{\mathrm{ext}}}$. Therefore, the minimum of the function
(56)
\[ M\mapsto {\lambda _{\max }}\big({\big({M}^{\top }{N}^{-1/2}\varSigma {N}^{-1/2}M\big)}^{-1}{M}^{\top }{N}^{-1/2}\big({C}^{\top }C-m\varSigma \big){N}^{-1/2}M\big)\]
is attained for $M={N}^{1/2}{\widehat{X}_{\mathrm{ext}}}$.
Now, apply Lemma 6.6 on perturbation bounds for a generalized invariant subspace. The unperturbed matrix (denoted A in Lemma 6.6) is ${N}^{-1/2}{C_{0}^{\top }}{C_{0}}{N}^{-1/2}$; its nullspace is the column space of the matrix ${N}^{1/2}{X_{\mathrm{ext}}^{0}}$ (which is denoted ${X_{0}}$ in Lemma 6.6). The perturbed matrix ($A+\tilde{A}$ in Lemma 6.6) is ${N}^{-1/2}({C}^{\top }C-m\varSigma ){N}^{-1/2}$. The matrix B in Lemma 6.6 equals ${N}^{-1/2}\varSigma {N}^{-1/2}$. The norm of the perturbation is denoted ϵ (it is $\| \tilde{A}\| $ in Lemma 6.6). The $(n+d)\times d$ matrix which attains the minimum in (56) is ${N}^{1/2}{\widehat{X}_{\mathrm{ext}}}$. The other conditions of Lemma 6.6 are (47), (48), and (53). We have
\[\begin{aligned}{}& {\big\| \sin \angle \big({N}^{1/2}{\widehat{X}_{\mathrm{ext}}},{N}^{1/2}{X_{\mathrm{ext}}^{0}}\big)\big\| }^{2}\\{} & \hspace{1em}\le \frac{\epsilon }{0.5}\big(1+\big\| {N}^{-1/2}\varSigma {N}^{-1/2}\big\| \hspace{0.1667em}{\lambda _{\max }}\big({\big({X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}\varSigma {X_{\mathrm{ext}}^{0}}\big)}^{-1}{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}N{X_{\mathrm{ext}}^{0}}\big)\big).\end{aligned}\]
Again, with (55), (45) and (46), we have
(57)
\[\begin{aligned}{}& {\big\| \sin \angle \big({\widehat{X}_{\mathrm{ext}}},{X_{\mathrm{ext}}^{0}}\big)\big\| }^{2}\\{} & \hspace{1em}\le {\big\| \sin \angle \big({N}^{1/2}{\widehat{X}_{\mathrm{ext}}},{N}^{1/2}{X_{\mathrm{ext}}^{0}}\big)\big\| }^{2}\\{} & \hspace{1em}\le 2\epsilon \bigg(1+\frac{\| \varSigma \| }{{\lambda _{\min }}({A_{0}^{\top }}{A_{0}})}\hspace{0.1667em}{\lambda _{\max }}\big({\lambda _{\min }}\big({A_{0}^{\top }}{A_{0}}\big){\big({X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}\varSigma {X_{\mathrm{ext}}^{0}}\big)}^{-1}{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}{X_{\mathrm{ext}}^{0}}\big)\bigg)\\{} & \hspace{1em}=2\epsilon \big(1+\| \varSigma \| \hspace{0.1667em}{\lambda _{\max }}\big({\big({X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}\varSigma {X_{\mathrm{ext}}^{0}}\big)}^{-1}{X_{\mathrm{ext}}^{0\hspace{0.1667em}\top }}{X_{\mathrm{ext}}^{0}}\big)\big).\end{aligned}\]
8.3 Proof of the convergence $\epsilon \to 0$
In this section, we prove the convergences
\[\begin{aligned}{}{M_{1}}& ={N}^{-1/2}{C_{0}^{\top }}\widetilde{C}{N}^{-1/2}\to 0,\\{} {M_{2}}& ={N}^{-1/2}\big({\widetilde{C}}^{\top }\widetilde{C}-m\varSigma \big){N}^{-1/2}\to 0\end{aligned}\]
in probability for Theorem 3.5, and almost surely for Theorems 3.6 and 3.7. As $\epsilon =\| {M_{1}^{}}+{M_{1}^{\top }}+{M_{2}}\| $, the convergences ${M_{1}}\to 0$ and ${M_{2}}\to 0$ imply $\epsilon \to 0$.
End of the proof of Theorem 3.5.
It holds that
\[\begin{aligned}{}\| {M_{1}}{\| _{F}^{2}}& =\big\| {N}^{-1/2}{C_{0}^{\top }}\tilde{C}{N}^{-1/2}{\big\| _{F}^{2}}=\operatorname{tr}\big({N}^{-1/2}{C_{0}^{\top }}\tilde{C}{N}^{-1}{C_{0}}{\tilde{C}}^{\top }{N}^{-1/2}\big)\\{} & =\operatorname{tr}\big({C_{0}^{}}{N}^{-1}{C_{0}^{\top }}\tilde{C}{N}^{-1}{\tilde{C}}^{\top }\big)={\sum \limits_{i=1}^{m}}{\sum \limits_{j=1}^{m}}{c_{i}^{0}}{N}^{-1}{\big({c_{j}^{0}}\big)}^{\top }{\tilde{c}_{j}}{N}^{-1}{\tilde{c}_{i}^{\top }}.\end{aligned}\]
The right-hand side can be simplified since $\mathbb{E}{\tilde{c}_{j}}{N}^{-1}{\tilde{c}_{i}^{\top }}=0$ for $i\ne j$ and $\mathbb{E}{\tilde{c}_{i}}{N}^{-1}{\tilde{c}_{i}^{\top }}=\operatorname{tr}(\varSigma {N}^{-1})$:
\[ \mathbb{E}\| {M_{1}}{\| _{F}^{2}}={\sum \limits_{i=1}^{m}}{c_{0i}}{N}^{-1}{c_{0i}^{\top }}\operatorname{tr}\big(\varSigma {N}^{-1}\big)=\operatorname{tr}\big({C_{0}}{N}^{-1}{C_{0}^{\top }}\big)\operatorname{tr}\big(\varSigma {N}^{-1}\big).\]
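The two expectations used in this simplification can be verified directly, assuming (as the argument here requires) that the errors ${\tilde{c}_{i}}$ are independent and centered, with $\mathbb{E}\hspace{0.1667em}{\tilde{c}_{i}^{\top }}{\tilde{c}_{i}^{}}=\varSigma $:
\[ \mathbb{E}\hspace{0.1667em}{\tilde{c}_{j}}{N}^{-1}{\tilde{c}_{i}^{\top }}=\mathbb{E}{\tilde{c}_{j}}\hspace{0.1667em}{N}^{-1}\hspace{0.1667em}\mathbb{E}{\tilde{c}_{i}^{\top }}=0\hspace{1em}(i\ne j),\hspace{2em}\mathbb{E}\hspace{0.1667em}{\tilde{c}_{i}}{N}^{-1}{\tilde{c}_{i}^{\top }}=\mathbb{E}\operatorname{tr}\big({N}^{-1}{\tilde{c}_{i}^{\top }}{\tilde{c}_{i}}\big)=\operatorname{tr}\big({N}^{-1}\varSigma \big)=\operatorname{tr}\big(\varSigma {N}^{-1}\big).\]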
The first multiplier in the right-hand side is bounded due to (50) as $\operatorname{tr}({C_{0}}{N}^{-1}{C_{0}^{\top }})\le n$, for m large enough. Now, construct an upper bound for the second multiplier:
\[\begin{aligned}{}\operatorname{tr}\big(\varSigma {N}^{-1}\big)& =\big\| {N}^{-1/2}{\varSigma }^{1/2}{\big\| _{F}^{2}}\le {\big\| {N}^{-1/2}\big\| }^{2}\big\| {\varSigma }^{1/2}{\big\| _{F}^{2}}={\lambda _{\max }}\big({N}^{-1}\big)\operatorname{tr}\varSigma \\{} & =\frac{\operatorname{tr}\varSigma }{{\lambda _{\min }}(N)}=\frac{\operatorname{tr}\varSigma }{{\lambda _{\min }}({A_{0}^{\top }}{A_{0}^{}})}.\end{aligned}\]
Finally,
\[ \mathbb{E}\| {M_{1}}{\| _{F}^{2}}\le \frac{n\operatorname{tr}\varSigma }{{\lambda _{\min }}({A_{0}^{\top }}{A_{0}^{}})}\hspace{1em}\text{for}\hspace{2.5pt}m\hspace{2.5pt}\text{large enough}.\]
The conditions of Theorem 3.5 imply that ${\lambda _{\min }}({A_{0}^{\top }}{A_{0}})\to \infty $; therefore, ${M_{1}}\stackrel{\mathrm{P}}{\longrightarrow }0$ as $m\to \infty $.
Now, we prove that ${M_{2}}\stackrel{\mathrm{P}}{\longrightarrow }0$ as $m\to \infty $. We have
(58)
\[\begin{aligned}{}{M_{2}}& ={N}^{-1/2}\big({\tilde{C}}^{\top }\tilde{C}-m\varSigma \big){N}^{-1/2},\\{} \| {M_{2}}\| & \le \big\| {N}^{-1/2}\big\| \hspace{0.1667em}\big\| {\tilde{C}}^{\top }\tilde{C}-m\varSigma \big\| \hspace{0.1667em}\big\| {N}^{-1/2}\big\| =\frac{\| {\textstyle\sum _{i=1}^{m}}({\tilde{c}_{i}^{\top }}{\tilde{c}_{i}^{}}-\varSigma )\| }{{\lambda _{\min }}({A_{0}^{\top }}{A_{0}^{}})}.\end{aligned}\]
Now apply the Rosenthal inequality (case $1\le \nu \le 2$; Theorem 6.8) to construct a bound for $\mathbb{E}\| {M_{2}}{\| }^{r}$:
\[ \mathbb{E}\| {M_{2}}{\| }^{r}\le \frac{\mathrm{const}{\textstyle\sum _{i=1}^{m}}\mathbb{E}\| {\tilde{c}_{i}^{\top }}{\tilde{c}_{i}^{}}-\varSigma {\| }^{r}}{{\lambda _{\min }^{r}}({A_{0}^{\top }}{A_{0}^{}})}.\]
By the conditions of Theorem 3.5, the sequence $\{\mathbb{E}\| {\tilde{c}_{i}^{\top }}{\tilde{c}_{i}^{}}-\varSigma {\| }^{r},\hspace{2.5pt}i=1,2,\dots \}$ is bounded. Hence
\[\begin{aligned}{}\mathbb{E}\| {M_{2}}{\| }^{r}& \le \frac{O(m)}{{\lambda _{\min }^{r}}({A_{0}^{\top }}{A_{0}^{}})}\hspace{1em}\text{as}\hspace{2.5pt}m\to \infty ,\\{} \mathbb{E}\| {M_{2}}{\| }^{r}& \to 0\hspace{1em}\text{and}\hspace{1em}{M_{2}}\stackrel{\mathrm{P}}{\longrightarrow }0\hspace{1em}\text{as}\hspace{2.5pt}m\to \infty .\end{aligned}\]
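The last implication can be seen via the Markov inequality: for every $\delta >0$,
\[ \mathrm{P}\big(\| {M_{2}}\| \ge \delta \big)\le \frac{\mathbb{E}\| {M_{2}}{\| }^{r}}{{\delta }^{r}}\to 0\hspace{1em}\text{as}\hspace{2.5pt}m\to \infty .\]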
□
End of the proof of Theorem 3.6.
By the Rosenthal inequality (case $\nu \ge 2$; Theorem 6.7)
\[\begin{aligned}{}\mathbb{E}\| {M_{1}}{\| }^{2r}& \le \mathrm{const}{\sum \limits_{i=1}^{m}}\mathbb{E}{\big\| {N}^{-1/2}{c_{0i}^{\top }}{\tilde{c}_{i}}{N}^{-1/2}\big\| }^{2r}+\\{} & \hspace{1em}+\mathrm{const}{\Bigg({\sum \limits_{i=1}^{m}}\mathbb{E}{\big\| {N}^{-1/2}{c_{0i}^{\top }}{\tilde{c}_{i}}{N}^{-1/2}\big\| }^{2}\Bigg)}^{r}.\end{aligned}\]
Construct an upper bound for the first summand:
\[\begin{aligned}{}{\sum \limits_{i=1}^{m}}\mathbb{E}{\big\| {N}^{-1/2}{c_{0i}^{\top }}{\tilde{c}_{i}}{N}^{-1/2}\big\| }^{2r}& \le {\sum \limits_{i=1}^{m}}{\big\| {N}^{-1/2}{c_{0i}^{\top }}\big\| }^{2r}\underset{i=1,\dots ,m}{\max }\mathbb{E}\| {\tilde{c}_{i}}{\| }^{2r}{\big\| {N}^{-1/2}\big\| }^{2r},\\{} {\sum \limits_{i=1}^{m}}{\big\| {N}^{-1/2}{c_{0i}^{\top }}\big\| }^{2r}& \le {\Bigg({\sum \limits_{i=1}^{m}}{\big\| {N}^{-1/2}{c_{0i}^{\top }}\big\| }^{2}\Bigg)}^{r}\\{} & ={\Bigg({\sum \limits_{i=1}^{m}}{c_{0i}}{N}^{-1}{c_{0i}^{\top }}\Bigg)}^{r}={\big(\operatorname{tr}\big({C_{0}}{N}^{-1}{C_{0}^{\top }}\big)\big)}^{r}\le {n}^{r}\end{aligned}\]
by inequality (50). By the conditions of Theorem 3.6, the sequence $\{\underset{i=1,\dots ,m}{\max }\mathbb{E}\| {\tilde{c}_{i}}{\| }^{2r},\hspace{2.5pt}m=1,2,\dots \}$ is bounded. Remember that $\| {N}^{-1/2}\| ={\lambda _{\min }^{-1/2}}({A_{0}^{\top }}{A_{0}})$. Thus,
\[ {\sum \limits_{i=1}^{m}}\mathbb{E}{\big\| {N}^{-1/2}{c_{0i}^{\top }}{\tilde{c}_{i}}{N}^{-1/2}\big\| }^{2r}=\frac{O(1)}{{\lambda _{\min }^{r}}({A_{0}^{\top }}{A_{0}})}\hspace{1em}\text{as}\hspace{2.5pt}m\to \infty .\]
The asymptotic relation
\[ {\sum \limits_{i=1}^{m}}\mathbb{E}{\big\| {N}^{-1/2}{c_{0i}^{\top }}{\tilde{c}_{i}}{N}^{-1/2}\big\| }^{2}=\frac{O(1)}{{\lambda _{\min }}({A_{0}^{\top }}{A_{0}})}\]
can be proved similarly; in order to prove it, we use the boundedness of the sequence $\{\underset{i=1,\dots ,m}{\max }\mathbb{E}\| {\tilde{c}_{i}}{\| }^{2},\hspace{2.5pt}m=1,2,\dots \}$. Finally,
\[ \mathbb{E}\| {M_{1}}{\| }^{2r}=\frac{O(1)}{{\lambda _{\min }^{r}}({A_{0}^{\top }}{A_{0}^{}})}\hspace{1em}\text{as}\hspace{2.5pt}m\to \infty .\]
The conditions of Theorem 3.6 imply that ${\sum _{m={m_{0}}}^{\infty }}\mathbb{E}\| {M_{1}}{\| }^{2r}<\infty $, whence ${M_{1}}\to 0$ as $m\to \infty $, almost surely.
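This step combines the Markov inequality with the Borel–Cantelli lemma: for every $\delta >0$,
\[ {\sum \limits_{m={m_{0}}}^{\infty }}\mathrm{P}\big(\| {M_{1}}\| \ge \delta \big)\le \frac{1}{{\delta }^{2r}}{\sum \limits_{m={m_{0}}}^{\infty }}\mathbb{E}\| {M_{1}}{\| }^{2r}<\infty ,\]
so that $\| {M_{1}}\| <\delta $ for all m large enough, almost surely.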
Now, prove that ${M_{2}}\to 0$ almost surely. In order to construct a bound for $\mathbb{E}\| {M_{2}}{\| }^{r}$, use the Rosenthal inequality (case $\nu \ge 2$; Theorem 6.7) as well as (58):
\[\begin{aligned}{}\mathbb{E}\| {M_{2}}{\| }^{r}& \le \frac{\mathbb{E}\| {\textstyle\sum _{i=1}^{m}}({\tilde{c}_{i}^{\top }}{\tilde{c}_{i}^{}}-\varSigma ){\| }^{r}}{{\lambda _{\min }^{r}}({A_{0}^{\top }}{A_{0}^{}})}\\{} & \le \frac{\mathrm{const}{\textstyle\sum _{i=1}^{m}}\mathbb{E}\| {\tilde{c}_{i}^{\top }}{\tilde{c}_{i}^{}}-\varSigma {\| }^{r}}{{\lambda _{\min }^{r}}({A_{0}^{\top }}{A_{0}^{}})}+\frac{\mathrm{const}{({\textstyle\sum _{i=1}^{m}}\mathbb{E}\| {\tilde{c}_{i}^{\top }}{\tilde{c}_{i}^{}}-\varSigma {\| }^{2})}^{r/2}}{{\lambda _{\min }^{r}}({A_{0}^{\top }}{A_{0}^{}})}.\end{aligned}\]
Under the conditions of Theorem 3.6, the sequences $\{\mathbb{E}\| {\tilde{c}_{i}^{\top }}{\tilde{c}_{i}^{}}-\varSigma {\| }^{r},\hspace{2.5pt}i=1,2,\dots \}$ and $\{\mathbb{E}\| {\tilde{c}_{i}^{\top }}{\tilde{c}_{i}^{}}-\varSigma {\| }^{2},\hspace{2.5pt}i=1,2,\dots \}$ are bounded. Thus,
\[\begin{aligned}{}\mathbb{E}\| {M_{2}}{\| }^{r}& =\frac{O({m}^{r/2})}{{\lambda _{\min }^{r}}({A_{0}^{\top }}{A_{0}^{}})}\hspace{1em}\text{as}\hspace{2.5pt}m\to \infty \text{;}\\{} {\sum \limits_{m={m_{0}}}^{\infty }}\mathbb{E}\| {M_{2}}{\| }^{r}& <\infty ,\end{aligned}\]
whence ${M_{2}}\to 0$ as $m\to \infty $, almost surely. □
End of the proof of Theorem 3.7.
The proof of the asymptotic relation
\[ \mathbb{E}\| {M_{1}}{\| }^{2r}=\frac{O(1)}{{\lambda _{\min }^{r}}({A_{0}^{\top }}{A_{0}^{}})}\hspace{1em}\text{as}\hspace{2.5pt}m\to \infty \]
from Theorem 3.6 is still valid. The almost sure convergence ${M_{1}}\to 0$ as $m\to \infty $ is proved in the same way as in Theorem 3.6.
Now, show that ${M_{2}}\to 0$ as $m\to \infty $, almost surely. Under the condition of Theorem 3.7,
\[ \mathbb{E}{\big\| {\tilde{c}_{m}^{\top }}{\tilde{c}_{m}^{}}-\varSigma \big\| }^{r}=O(1),\hspace{2em}{\sum \limits_{m={m_{0}}}^{\infty }}\frac{\mathbb{E}\| {\tilde{c}_{m}^{\top }}{\tilde{c}_{m}^{}}-\varSigma {\| }^{r}}{{\lambda _{\mathrm{min}}^{r}}({A_{0}^{\top }}{A_{0}^{}})}<\infty ,\]
and $\mathbb{E}{\tilde{c}_{i}^{\top }}{\tilde{c}_{i}^{}}-\varSigma =0$. The sequence of nonnegative numbers $\{{\lambda _{\min }}({A_{0}^{\top }}{A_{0}}),\hspace{2.5pt}m=1,2,\dots \}$ never decreases and tends to $+\infty $. Then, by the Law of large numbers in [16, Theorem 6.6, page 209]
\[ \frac{1}{{\lambda _{\mathrm{min}}}({A_{0}^{\top }}{A_{0}^{}})}{\sum \limits_{i=1}^{m}}\big({\tilde{c}_{i}^{\top }}{\tilde{c}_{i}}-\varSigma \big)\to 0\hspace{1em}\text{as}\hspace{2.5pt}m\to \infty \text{,}\hspace{1em}\text{a.s.,}\]
whence, with (58),
\[\begin{aligned}{}\| {M_{2}}\| & \le \frac{\| {\textstyle\sum _{i=1}^{m}}({\tilde{c}_{i}^{\top }}{\tilde{c}_{i}}-\varSigma )\| }{{\lambda _{\min }}({A_{0}^{\top }}{A_{0}^{}})}\to 0\hspace{1em}\text{as}\hspace{2.5pt}m\to \infty \text{, a.s.;}\\{} {M_{2}}& \to 0\hspace{1em}\text{as}\hspace{2.5pt}m\to \infty ,\hspace{1em}\text{a.s.}\end{aligned}\]
□
8.4 Proof of the uniqueness theorems
Proof of Theorem 4.1.
The random events 1, 2 and 3 are defined in the statement of this theorem. The random event 1 always occurs; this was proved in Section 2.2, where the estimator ${\widehat{X}_{\mathrm{ext}}}$ is defined. In order to prove the rest, we first construct the random event (59), which occurs either with high probability or eventually. Then we prove that, whenever (59) occurs, the existence and (more than) the uniqueness stated in the random event 3 hold, and then prove that the random event 2 occurs.
Now, we construct a modified version ${\widehat{X}_{\mathrm{ext}}^{\mathrm{mod}}}$ of the estimator ${\widehat{X}_{\mathrm{ext}}}$ in the following way. If there exist solutions $(\Delta ,{\widehat{X}_{\mathrm{ext}}})$ to (7) & (8) with $\| \sin \angle ({\widehat{X}_{\mathrm{ext}}},{X_{\mathrm{ext}}^{0}})\| \ge {(1+\| {X_{0}}{\| }^{2})}^{-1/2}$, let ${\widehat{X}_{\mathrm{ext}}^{\mathrm{mod}}}$ come from one such solution. Otherwise, if every solution $(\Delta ,{\widehat{X}_{\mathrm{ext}}})$ to (7) & (8) satisfies $\| \sin \angle ({\widehat{X}_{\mathrm{ext}}},{X_{\mathrm{ext}}^{0}})\| <{(1+\| {X_{0}}{\| }^{2})}^{-1/2}$, let ${\widehat{X}_{\mathrm{ext}}^{\mathrm{mod}}}$ come from one of these solutions. In either case, we construct ${\widehat{X}_{\mathrm{ext}}^{\mathrm{mod}}}$ in such a way that it is a random matrix; this is possible, as follows from [17].
Thus we construct a matrix ${\widehat{X}_{\mathrm{ext}}^{\mathrm{mod}}}$ such that:
-
1. ${\widehat{X}_{\mathrm{ext}}^{\mathrm{mod}}}$ is an $(n+d)\times d$ random matrix;
-
3. if $\| \sin \angle ({\widehat{X}_{\mathrm{ext}}^{\mathrm{mod}}},{X_{\mathrm{ext}}^{0}})\| <{(1+\| {X_{0}}{\| }^{2})}^{-1/2}$, then $\| \sin \angle ({\widehat{X}_{\mathrm{ext}}},{X_{\mathrm{ext}}^{0}})\| <{(1+\| {X_{0}}{\| }^{2})}^{-1/2}$ for any solution $(\Delta ,{\widehat{X}_{\mathrm{ext}}})$ to (7) & (8).
From the proof of Theorem 3.5 it follows that $\| \sin \angle ({\widehat{X}_{\mathrm{ext}}^{\mathrm{mod}}},{X_{\mathrm{ext}}^{0}})\| \to 0$ in probability as $m\to \infty $. From the proof of Theorem 3.6 or 3.7 it follows that $\| \sin \angle ({\widehat{X}_{\mathrm{ext}}^{\mathrm{mod}}},{X_{\mathrm{ext}}^{0}})\| \to 0$ almost surely. Then
(59)
\[ \big\| \sin \angle \big({\widehat{X}_{\mathrm{ext}}^{\mathrm{mod}}},{X_{\mathrm{ext}}^{0}}\big)\big\| <\frac{1}{\sqrt{1+\| {X_{0}}{\| }^{2}}}\]
either with high probability or almost surely.
Whenever the random event (59) occurs, for any solution Δ to (7) and the corresponding full-rank solution ${\widehat{X}_{\mathrm{ext}}}$ to (8) (which always exists) it holds that $\| \sin \angle ({\widehat{X}_{\mathrm{ext}}},{X_{\mathrm{ext}}^{0}})\| <{(1+\| {X_{0}}{\| }^{2})}^{-1/2}$, whence, due to Theorem 8.3, the bottom $d\times d$ block of the matrix ${\widehat{X}_{\mathrm{ext}}}$ is nonsingular. Right-multiplying ${\widehat{X}_{\mathrm{ext}}}$ by a nonsingular matrix, we can transform it into a form $(\begin{array}{c}\widehat{X}\\{} -I\end{array})$. The constructed matrix $\widehat{X}$ is a solution to equation (9) for given Δ. Thus, we have just proved that if the random event (59) occurs, then for any Δ which is a solution to (7), equation (9) has a solution.
Now, prove the uniqueness of $\widehat{X}$. Let $({\Delta _{1}},{\widehat{X}_{1}})$ and $({\Delta _{2}},{\widehat{X}_{2}})$ be two solutions to (7) & (9). Show that ${\widehat{X}_{1}}={\widehat{X}_{2}}$. (If we can show this for ${\Delta _{1}}={\Delta _{2}}$, then the random event 3 occurs.) Denote ${\widehat{X}_{1}^{\mathrm{ext}}}=(\begin{array}{c}{\widehat{X}_{1}}\\{} -I\end{array})$ and ${\widehat{X}_{2}^{\mathrm{ext}}}=(\begin{array}{c}{\widehat{X}_{2}}\\{} -I\end{array})$. By Proposition 7.9, $\operatorname{span}\langle {\widehat{X}_{1}^{\mathrm{ext}}}\rangle \subset \operatorname{span}\langle {u_{k}},\hspace{0.2778em}{\nu _{k}}\le d\rangle $ and $\operatorname{span}\langle {\widehat{X}_{2}^{\mathrm{ext}}}\rangle \subset \operatorname{span}\langle {u_{k}},\hspace{0.2778em}{\nu _{k}}\le d\rangle $, where ${\nu _{k}}$ and ${u_{k}}$ are generalized eigenvalues (arranged in ascending order) and respective eigenvectors of the matrix pencil $\langle {C}^{\top }C,\hspace{0.1667em}\varSigma \rangle $.
Assume by contradiction that ${\widehat{X}_{1}}\ne {\widehat{X}_{2}}$. Then $\operatorname{rk}[{\widehat{X}_{1}^{\mathrm{ext}}},\hspace{0.2778em}{\widehat{X}_{2}^{\mathrm{ext}}}]\ge d+1$, where $[{\widehat{X}_{1}^{\mathrm{ext}}},\hspace{0.2778em}{\widehat{X}_{2}^{\mathrm{ext}}}]$ is an $(n+d)\times 2d$ matrix constructed of ${\widehat{X}_{1}^{\mathrm{ext}}}$ and ${\widehat{X}_{2}^{\mathrm{ext}}}$. Then
\[ {d}^{\ast }=\operatorname{rk}\langle {u_{k}},\hspace{0.2778em}{\nu _{k}}\le d\rangle \ge \operatorname{rk}\left[\begin{array}{c@{\hskip10.0pt}c}{\widehat{X}_{1}^{\mathrm{ext}}},& {\widehat{X}_{2}^{\mathrm{ext}}}\end{array}\right]\ge d+1\]
(which means ${\nu _{d}}={\nu _{d+1}}$). Then ${d_{\ast }}-1<d<{d}^{\ast }$, where ${d_{\ast }}-1=\dim \operatorname{span}\langle {u_{k}},\hspace{0.2778em}{\nu _{k}}<d\rangle $, $d=\dim \operatorname{span}\langle {X_{\mathrm{ext}}^{0}}\rangle $ and ${d}^{\ast }=\dim \operatorname{span}\langle {u_{k}},\hspace{0.2778em}{\nu _{k}}\le d\rangle $ (the notation ${d_{\ast }}$ and ${d}^{\ast }$ comes from the proof of Proposition 7.9). By Lemma 6.4, there exists a d-dimensional subspace ${V_{12}}$ for which $\operatorname{span}\langle {u_{k}},\hspace{0.1667em}{\nu _{k}}<d\rangle \subset {V_{12}}\subset \operatorname{span}\langle {u_{k}},\hspace{0.1667em}{\nu _{k}}\le d\rangle $ and $\| \sin \angle ({V_{12}},{X_{\mathrm{ext}}^{0}})\| =1$. Collect a basis of the d-dimensional subspace ${V_{12}}\subset {\mathbb{R}}^{(n+d)}$ into the $(n+d)\times d$ matrix ${\widehat{X}_{3}^{\mathrm{ext}}}$, so $\operatorname{span}\langle {\widehat{X}_{3}^{\mathrm{ext}}}\rangle ={V_{12}}$. Again by Proposition 7.9, for some matrix Δ, $(\Delta ,{\widehat{X}_{3}^{\mathrm{ext}}})$ is a solution to (7) & (9). Then $\| \sin \angle ({\widehat{X}_{3}^{\mathrm{ext}}},{X_{\mathrm{ext}}^{0}})\| =1\ge {(1+\| {X_{0}}{\| }^{2})}^{-1/2}$. Then $\| \sin \angle ({\widehat{X}_{\mathrm{ext}}^{\mathrm{mod}}},{X_{\mathrm{ext}}^{0}})\| \ge {(1+\| {X_{0}}{\| }^{2})}^{-1/2}$, which contradicts (59). Thus, the random event 3 occurs.
Now prove that the random event 2 occurs. Let ${\Delta _{1}}$ and ${\Delta _{2}}$ be two solutions to the optimization problem (7). Whenever the random event (59) occurs, the respective solutions ${\widehat{X}_{1}}$ and ${\widehat{X}_{2}}$ to equation (9) exist. By the already proved uniqueness, they are equal, i.e., ${\widehat{X}_{1}}={\widehat{X}_{2}}$. Then both ${\Delta _{1}}$ and ${\Delta _{2}}$ are solutions to the optimization problem
(60)
\[ \left\{\begin{array}{l}\| \Delta \hspace{0.1667em}{({\varSigma }^{1/2})}^{\dagger }{\| _{F}}\to \min ;\hspace{1em}\\{} \Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0;\hspace{1em}\\{} (C-\Delta ){\widehat{X}_{1}^{\mathrm{ext}}}=0\hspace{1em}\end{array}\right.\]
for the fixed ${\widehat{X}_{1}^{\mathrm{ext}}}=(\begin{array}{c}{\widehat{X}_{1}}\\{} -I\end{array})=(\begin{array}{c}{\widehat{X}_{2}}\\{} -I\end{array})$. By Proposition 7.2 and Remark 7.2-1, the least element in the optimization problem (28) for $X={\widehat{X}_{1}^{\mathrm{ext}}}$ is attained for the unique matrix $\Delta =C{\widehat{X}_{1}^{\mathrm{ext}}}{({\widehat{X}_{1}^{\mathrm{ext}\hspace{0.1667em}\top }}\varSigma {\widehat{X}_{1}^{\mathrm{ext}}})}^{\dagger }{\widehat{X}_{1}^{\mathrm{ext}\hspace{0.1667em}\top }}\varSigma $. Since the least element is attained at this unique matrix, and both ${\Delta _{1}}$ and ${\Delta _{2}}$ attain it, ${\Delta _{1}}={\Delta _{2}}$. Thus, the random event 2 occurs.
Proof of Theorem 4.2.
1. In Theorem 4.1, the event 1 always occurs, not just with high probability or eventually. The solution Δ to (7) exists and also solves (11) due to Proposition 7.6. Thus, the first sentence of Theorem 4.2 is true. The second sentence of Theorem 4.2 has already been proved, since the constraints in the optimization problems (7) and (11) are the same.
2 & 3. The proof of consistency of the estimator defined with (11) & (9) and of the existence of the solution is similar to the proof for the estimator defined with (7) & (9) in Theorems 3.5–3.7 and 4.1. The only difference is skipping the use of Proposition 7.6. Notice that we do not prove the uniqueness of the solution because we cannot use Proposition 7.9. □
8.5 Proof of lemmas on perturbation bounds for invariant subspaces
Proof of Lemma 6.5 and Remark 6.5-1.
For the proof of Lemma 6.5 itself, see parts 2 and 3 of the proof below. For the proof of Remark 6.5-1, see parts 2, 3 and 4 below. Part 1 is a mere discussion of why the conditions of Remark 6.5-1 are more general than those of Lemma 6.5.
In the proof, we assume that $\{x:{x}^{\top }Bx>0\}$ is the domain of the function $f(x)$. The assumption affects the definition of ${\lim _{x\to {x_{\ast }}}}f(x)$, and $\inf f$ is the infimum of $f(x)$ over the domain.
1. At first, clarify the conditions of Remark 6.5-1. As it is, the existence of a point x such that
(61)
\[ \underset{\vec{t}\to x}{\liminf }f(\vec{t})=\underset{{\vec{t}}^{\top }\hspace{-0.1667em}B\vec{t}>0}{\inf }f(\vec{t})\]
is assumed in Remark 6.5-1. Now, prove that, under the preceding condition of Remark 6.5-1, there exists a vector $x\ne 0$ that satisfies (61).
The function $f(x)$ is homogeneous of degree 0, i.e.,
\[ f(kx)=f(x)\hspace{1em}\text{if}\hspace{2.5pt}k\in \mathbb{R}\setminus \{0\}\hspace{2.5pt}\text{and}\hspace{2.5pt}{x}^{\top }Bx>0.\]
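Indeed, writing $f(x)={x}^{\top }(A+\tilde{A})x/({x}^{\top }Bx)$ on this domain, for every $k\ne 0$
\[ f(kx)=\frac{{(kx)}^{\top }(A+\tilde{A})(kx)}{{(kx)}^{\top }B(kx)}=\frac{{k}^{2}\hspace{0.1667em}{x}^{\top }(A+\tilde{A})x}{{k}^{2}\hspace{0.1667em}{x}^{\top }Bx}=f(x).\]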
Hence, all values which are attained by $f(x)$ on its domain $\{x:{x}^{\top }Bx>0\}$ are also attained on the bounded set $\{x:\| x\| =1,\hspace{0.1667em}{x}^{\top }Bx>0\}$:
\[ \big\{f(x):{x}^{\top }Bx>0\big\}=\big\{f(x):\| x\| =1,\hspace{0.1667em}{x}^{\top }Bx>0\big\}.\]
Then
\[ \underset{{x}^{\top }Bx>0}{\inf }f(x)=\underset{\| x\| =1,\hspace{0.1667em}{x}^{\top }Bx>0}{\inf }f(x).\]
Let F be the closure of $\{x:\| x\| =1,\hspace{0.1667em}{x}^{\top }Bx>0\}$. There is a sequence $\{{x_{k}},k=1,2,\dots \}$ such that $\| {x_{k}}\| =1$ and ${x_{k}^{\top }}B{x_{k}}>0$ for all k, and ${\lim _{k\to \infty }}f({x_{k}})={\inf _{{x}^{\top }Bx>0}}f(x)$. Since F is a compact set, there exists ${x_{\ast }}\in F$ which is a limit of some subsequence $\{{x_{{k_{i}}}},\hspace{0.2222em}i=1,2,\dots \}$ of $\{{x_{k}},\hspace{0.2222em}k=1,2,\dots \}$. Then either
(62)
\[ \underset{x\to {x_{\ast }}}{\liminf }f(x)\le \underset{{x}^{\top }Bx>0}{\inf }f(x),\]
or, if ${x_{{k_{i}}}}={x_{\ast }}$ for i large enough,
(63)
\[ f({x_{\ast }})=\underset{{x}^{\top }Bx>0}{\inf }f(x).\]
(In equations (62) and (63), we assume that $\{x:{x}^{\top }Bx>0\}$ is a domain of $f(x)$, so (63) implies ${x_{\ast }^{\top }}B{x_{\ast }^{}}>0$.) Again, due to the homogeneity, $\underset{x\to {x_{\ast }}}{\liminf }f(x)\le f({x_{\ast }})$ if $f({x_{\ast }})$ makes sense. Hence (62) follows from (63) and thus holds true either way.
Taking the limit in the relation $f(x)\ge \inf f$, we obtain the opposite inequality
\[ \underset{x\to {x_{\ast }}}{\liminf }f(x)\ge \underset{{x}^{\top }Bx>0}{\inf }f(x).\]
Thus, the equality (25) holds true for some ${x_{\ast }}\in F$. Note that $\| {x_{\ast }}\| =1$, so ${x_{\ast }}\ne 0$.
2. Because the matrix B is symmetric and positive semidefinite, ${x}^{\top }Bx=0$ if and only if $Bx=0$, and ${x}^{\top }Bx>0$ if and only if $Bx\ne 0$. As $B{x_{0}}\ne 0$, ${x_{0}^{\top }}B{x_{0}}>0$ and the function $f(x)$ is well-defined at ${x_{0}}$.
Under the conditions of Lemma 6.5 the function $f(x)$ is well-defined at ${x_{0}}$ and attains its minimum at ${x_{\ast }}$, so $f({x_{\ast }})\le f({x_{0}})$.
Under the conditions of Remark 6.5-1 we consider 3 cases concerning the value of ${x_{\ast }^{\top }}B{x_{\ast }}$.
Case 1. ${x_{\ast }^{\top }}B{x_{\ast }}<0$. But on the domain of $f(x)$ the inequality ${x}^{\top }Bx>0$ holds true. Since ${x_{\ast }}$ is a limit point of the domain of $f(x)$, the inequality ${x_{\ast }^{\top }}B{x_{\ast }}\ge 0$ holds true, and Case 1 is impossible.
Case 2. ${x_{\ast }^{\top }}B{x_{\ast }}=0$. Prove that ${x_{\ast }^{\top }}(A+\tilde{A}){x_{\ast }}\le 0$. On the contrary, let ${x_{\ast }^{\top }}(A+\tilde{A}){x_{\ast }}>0$. Remember once again that ${x}^{\top }Bx>0$ on the domain of $f(x)$. Then
\[ \underset{x\to {x_{\ast }}}{\lim }f(x)=\underset{x\to {x_{\ast }}}{\lim }\frac{{x}^{\top }(A+\tilde{A})x}{{x}^{\top }Bx}=+\infty ,\]
which cannot be $\inf f(x)$. The contradiction obtained implies that ${x_{\ast }^{\top }}(A+\tilde{A}){x_{\ast }}\le 0$.
Case 3. ${x_{\ast }^{\top }}B{x_{\ast }}>0$. Then the function $f(x)$ is well-defined at ${x_{\ast }}$, and
\[ f({x_{\ast }})=\underset{x\to {x_{\ast }}}{\liminf }f(x)=\underset{{x}^{\top }Bx>0}{\inf }f(x)\le f({x_{0}}).\]
So, $f({x_{\ast }})\le f({x_{0}})$ in Case 3.
3. Proof of Lemma 6.5 and proof of Remark 6.5-1 when $f({x_{\ast }})\le f({x_{0}})$. Then
\[ \frac{{x}^{\top }(A+\tilde{A})x}{{x}^{\top }Bx}\le \frac{{x_{0}^{\top }}(A+\tilde{A}){x_{0}}}{{x_{0}^{\top }}B{x_{0}}}\hspace{0.1667em}.\]
As $A{x_{0}}=0$,
With use of eigendecomposition of A, the inequality ${x}^{\top }Ax\ge {\lambda _{2}}(A)\hspace{0.1667em}\| x{\| }^{2}\times {\sin }^{2}\angle (x,{x_{0}})$ can be proved. Hence the desired inequality follows:
4. Proof of Remark 6.5-1 when ${x_{\ast }^{\top }}(A+\tilde{A}){x_{\ast }}\le 0$. Then
whence the desired inequality follows. □
Notation.
If A and B are symmetric matrices of the same size, and furthermore the matrix B is positive definite, denote
\[ \max \frac{A}{B}:=\underset{v\ne 0}{\max }\frac{{v}^{\top }Av}{{v}^{\top }Bv}.\]
The notation is used in the proof of Lemma 6.6.
Lemma 8.2.
Let $1\le {d_{1}}\le n$, $0\le {d_{2}}\le n$. Let $X\in {\mathbb{R}}^{n\times {d_{1}}}$ be a matrix of full rank, and V be a ${d_{2}}$-dimensional subspace in ${\mathbb{R}}^{n}$. Then
\[ \max \frac{{X}^{\top }(I-{P_{V}})X}{{X}^{\top }X}=\left\{\begin{array}{l@{\hskip10.0pt}l}{\big\| \sin \angle (X,V)\big\| }^{2}\hspace{1em}& \text{if}\hspace{2.5pt}{d_{1}}\le {d_{2}},\\{} 1\hspace{1em}& \text{if}\hspace{2.5pt}{d_{1}}>{d_{2}}.\end{array}\right.\]
Proof.
Using the min-max theorem, the relation $\operatorname{span}\langle X\rangle =\operatorname{span}\langle {P_{\operatorname{span}\langle X\rangle }}\rangle $ and simple properties of orthogonal projectors, construct the inequality
\[\begin{aligned}{}& \max \frac{{X}^{\top }(I-{P_{V}})X}{{X}^{\top }X}\\{} & \hspace{1em}=\underset{v\in {\mathbb{R}}^{{d_{1}}}\setminus \{0\}}{\max }\frac{{v}^{\top }{X}^{\top }(I-{P_{V}})Xv}{{v}^{\top }{X}^{\top }Xv}\\{} & \hspace{1em}=\underset{w\in \operatorname{span}\langle X\rangle \setminus \{0\}}{\max }\frac{{w}^{\top }(I-{P_{V}})w}{{w}^{\top }w}=\underset{v\in {\mathbb{R}}^{n}\setminus \{0\}}{\max }\frac{{v}^{\top }{P_{\operatorname{span}\langle X\rangle }}(I-{P_{V}}){P_{\operatorname{span}\langle X\rangle }}v}{{v}^{\top }{P_{\operatorname{span}\langle X\rangle }}{P_{\operatorname{span}\langle X\rangle }}v}\\{} & \hspace{1em}\ge \underset{v\in {\mathbb{R}}^{n}\setminus \{0\}}{\max }\frac{{v}^{\top }{P_{\operatorname{span}\langle X\rangle }}(I-{P_{V}}){P_{\operatorname{span}\langle X\rangle }}v}{{v}^{\top }v}={\lambda _{\max }}\big({P_{\operatorname{span}\langle X\rangle }}(I-{P_{V}}){P_{\operatorname{span}\langle X\rangle }}\big)\\{} & \hspace{1em}={\lambda _{\max }}\big({P_{\operatorname{span}\langle X\rangle }}(I-{P_{V}})(I-{P_{V}}){P_{\operatorname{span}\langle X\rangle }}\big)={\big\| {P_{\operatorname{span}\langle X\rangle }}(I-{P_{V}})\big\| }^{2}.\end{aligned}\]
On the other hand,
\[\begin{aligned}{}\underset{w\in \operatorname{span}\langle X\rangle \setminus \{0\}}{\max }\frac{{w}^{\top }(I-{P_{V}})w}{{w}^{\top }w}& =\underset{w\in \operatorname{span}\langle X\rangle \setminus \{0\}}{\max }\frac{{w}^{\top }{P_{\operatorname{span}\langle X\rangle }}(I-{P_{V}}){P_{\operatorname{span}\langle X\rangle }}w}{{w}^{\top }w}\\{} & \le \underset{v\in {\mathbb{R}}^{n}\setminus \{0\}}{\max }\frac{{v}^{\top }{P_{\operatorname{span}\langle X\rangle }}(I-{P_{V}}){P_{\operatorname{span}\langle X\rangle }}v}{{v}^{\top }v}.\end{aligned}\]
Thus,
\[ \max \frac{{X}^{\top }(I-{P_{V}})X}{{X}^{\top }X}={\big\| {P_{\operatorname{span}\langle X\rangle }}(I-{P_{V}})\big\| }^{2}.\]
If ${d_{1}}\le {d_{2}}$, then $\| {P_{\operatorname{span}\langle X\rangle }}(I-{P_{V}})\| =\| \sin \angle (X,V)\| $ due to (23). Otherwise, if ${d_{1}}>{d_{2}}$, then
\[ \dim \operatorname{span}\langle X\rangle +\dim {V}^{\perp }=\operatorname{rk}X+n-\dim V={d_{1}}+n-{d_{2}}>n.\]
Hence the subspaces $\operatorname{span}\langle X\rangle $ and ${V}^{\perp }$ have nontrivial intersection, i.e., there exists $w\ne 0$, $w\in \operatorname{span}\langle X\rangle \cap {V}^{\perp }$. Then ${P_{\operatorname{span}\langle X\rangle }}(I-{P_{V}})w=w$, whence $\| {P_{\operatorname{span}\langle X\rangle }}(I-{P_{V}})\| \ge 1$. On the other hand, $\| {P_{\operatorname{span}\langle X\rangle }}(I-{P_{V}})\| \le \| {P_{\operatorname{span}\langle X\rangle }}\| \times \| (I-{P_{V}})\| \le 1$. Thus, $\| {P_{\operatorname{span}\langle X\rangle }}(I-{P_{V}})\| =1$. This completes the proof. □
Proof of Lemma 6.6.
The matrix B is positive semidefinite, the matrix ${X_{0}^{\top }}B{X_{0}}$ is positive definite, and the matrix ${X_{0}}$ is of full rank d (hence, $n\ge d$). The matrix A satisfies inequality $A\ge {\lambda _{d+1}}(A)(I-{P_{\operatorname{span}\langle {X_{0}}\rangle }})$ in the Loewner order.
Let X be a point where the functional $f(x)$ defined in (26) attains its minimum. Since ${X_{0}^{\top }}B{X_{0}}$ is positive definite, $f({X_{0}})$ makes sense. Thus, $f(X)\le f({X_{0}})$,
\[ \max \frac{{X}^{\top }(A+\tilde{A})X}{{X}^{\top }BX}\le \max \frac{{X_{0}^{\top }}(A+\tilde{A}){X_{0}}}{{X_{0}^{\top }}B{X_{0}^{}}}.\]
Using the relations
\[\begin{aligned}{}{X}^{\top }\tilde{A}X& \ge -\| \tilde{A}\| \hspace{0.1667em}{X}^{\top }X,\hspace{2em}{X_{0}^{\top }}\tilde{A}{X_{0}}\le \| \tilde{A}\| \hspace{0.1667em}{X_{0}^{\top }}{X_{0}},\\{} {X}^{\top }BX& \le \| B\| \hspace{0.1667em}{X}^{\top }X,\hspace{2em}A{X_{0}}=0,\end{aligned}\]
we have
(64)
\[\begin{aligned}{}\max \frac{{X}^{\top }AX-\| \tilde{A}\| {X}^{\top }X}{\| B\| \hspace{0.1667em}{X}^{\top }X}& \le \max \frac{\| \tilde{A}\| \hspace{0.1667em}{X_{0}^{\top }}{X_{0}^{}}}{{X_{0}^{\top }}B{X_{0}^{}}},\\{} \frac{1}{\| B\| }\cdot \bigg(\max \frac{{X}^{\top }AX}{{X}^{\top }X}-\| \tilde{A}\| \bigg)& \le \| \tilde{A}\| \max \frac{{X_{0}^{\top }}{X_{0}^{}}}{{X_{0}^{\top }}B{X_{0}^{}}}.\end{aligned}\]
Since $A\ge {\lambda _{d+1}}(A)(I-{P_{\operatorname{span}\langle {X_{0}}\rangle }})$, by Lemma 8.2
\[ {\lambda _{d+1}}(A)\hspace{0.1667em}{\big\| \sin \angle (X,{X_{0}})\big\| }^{2}\le {\lambda _{d+1}}(A)\max \frac{{X}^{\top }(I-{P_{\operatorname{span}\langle {X_{0}}\rangle }})X}{{X}^{\top }X}\le \max \frac{{X}^{\top }AX}{{X}^{\top }X}.\]
Then the desired inequality follows from (64):
\[ {\big\| \sin \angle (X,{X_{0}})\big\| }^{2}\le \frac{\| \tilde{A}\| }{{\lambda _{d+1}}(A)}\Big(1+\| B\| \hspace{0.1667em}{\lambda _{\max }}\big({\big({X_{0}^{\top }}B{X_{0}}\big)}^{-1}{X_{0}^{\top }}{X_{0}}\big)\Big).\]
□
8.6 Comparison of $\| \sin \angle ({\widehat{X}_{\mathrm{ext}}},{X_{\mathrm{ext}}^{0}})\| $ and $\| \widehat{X}-{X_{0}}\| $
In the next theorem and in its proof, the matrices A, B and Σ have a different meaning than elsewhere in the paper.
Theorem 8.3.
Let $(\begin{array}{c}A\\{} B\end{array})$ and $(\begin{array}{c}{X_{0}}\\{} -I\end{array})$ be full-rank $(n+d)\times d$ matrices. If
(65)
\[ \left\| \sin \angle \left(\left(\begin{array}{c}A\\{} B\end{array}\right),\hspace{0.2222em}\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)\right)\right\| <\frac{1}{\sqrt{1+\| {X_{0}}{\| }^{2}}},\]
then:
-
1) the matrix B is nonsingular;
-
2) $\| A{B}^{-1}+{X_{0}}\| \le \frac{(1+\| {X_{0}}{\| }^{2})\hspace{0.2222em}(\| {X_{0}}\| {s}^{2}+s\sqrt{1-{s}^{2}})}{1-(1+\| {X_{0}}{\| }^{2})\hspace{0.2222em}{s}^{2}}$ with $s=\| \sin \angle ((\begin{array}{c}A\\{} B\end{array}),\hspace{0.2222em}(\begin{array}{c}{X_{0}}\\{} -I\end{array}))\| $.
Proof.
1. Split the matrix ${P_{\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)}^{\perp }}$, which is an orthogonal projector along the column space of the matrix $(\begin{array}{c}{X_{0}}\\{} -I\end{array})$, into four blocks:
\[ I-{P_{\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)}}={P_{\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)}^{\perp }}=\left(\begin{array}{c@{\hskip10.0pt}c}{\mathbf{P}_{1}}& {\mathbf{P}_{2}}\\{} {\mathbf{P}_{2}^{\top }}& {\mathbf{P}_{4}}\end{array}\right).\]
Up to the end of the proof, ${\mathbf{P}_{1}}$ means the upper-left $n\times n$ block of the $(n+d)\times (n+d)$ matrix ${P_{\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)}^{\perp }}$. Prove that ${\lambda _{\min }}({\mathbf{P}_{1}})=\frac{1}{1+\| {X_{0}}{\| }^{2}}$.
Let ${X_{0}}=U\varSigma {V}^{\top }$ be a singular value decomposition of the matrix ${X_{0}}$ (here Σ is a diagonal $n\times d$ matrix, U and V are orthogonal matrices). Then
\[\begin{aligned}{}{P_{\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)}^{\perp }}& =I-\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right){\left({\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)}^{\hspace{-0.1667em}\top }\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)\right)}^{\hspace{-0.1667em}-1}{\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)}^{\hspace{-0.1667em}\top }\\{} & =\left(\begin{array}{c@{\hskip10.0pt}c}U(I-\varSigma {({\varSigma }^{\top }\varSigma +I)}^{-1}{\varSigma }^{\top }){U}^{\top }& U\varSigma {({\varSigma }^{\top }\varSigma +I)}^{-1}{V}^{\top }\\{} V{({\varSigma }^{\top }\varSigma +I)}^{-1}{\varSigma }^{\top }{U}^{\top }& V(I-{({\varSigma }^{\top }\varSigma +I)}^{-1}){V}^{\top }\end{array}\right).\end{aligned}\]
The $n\times n$ matrix $I-\varSigma {({\varSigma }^{\top }\varSigma +I)}^{-1}{\varSigma }^{\top }$ is diagonal; its diagonal entries are $\frac{1}{1+{\sigma _{i}^{2}}({X_{0}})}$, $i=1,\dots ,n$, where ${\sigma _{i}}({X_{0}})$ denotes the i-th singular value of ${X_{0}}$ (with ${\sigma _{i}}({X_{0}})=0$ for $i>d$).
Those diagonal entries comprise all the eigenvalues of ${\mathbf{P}_{1}}$; hence
\[ {\lambda _{\min }}({\mathbf{P}_{1}})=\underset{i=1,\dots ,n}{\min }\frac{1}{1+{\sigma _{i}^{2}}({X_{0}})}=\frac{1}{1+{\sigma _{\max }^{2}}({X_{0}})}=\frac{1}{1+\| {X_{0}}{\| }^{2}}.\]
2. Due to equation (23), the square of the largest of the sines of the canonical angles between the subspaces ${V_{1}}$ and ${V_{2}}$ is equal to
\[ {\big\| \sin \angle ({V_{1}},{V_{2}})\big\| }^{2}=\underset{v\in {V_{1}}\setminus \{0\}}{\max }\frac{{v}^{\top }(I-{P_{{V_{2}}}})v}{\| v{\| }^{2}}.\]
Hence for $v\in {V_{1}}$, $v\ne 0$,
(66)
\[ {\big\| \sin \angle ({V_{1}},{V_{2}})\big\| }^{2}\ge \frac{{v}^{\top }(I-{P_{{V_{2}}}})v}{\| v{\| }^{2}}.\]
3. Prove the first statement of Theorem 8.3 by contradiction. Suppose that the matrix B is singular. Then there exist $f\in {\mathbb{R}}^{d}\setminus \{0\}$ and $u=Af\in {\mathbb{R}}^{n}$ such that $Bf=0$ and
\[ \left(\begin{array}{c}u\\{} {0_{d\times 1}}\end{array}\right)=\left(\begin{array}{c}Af\\{} Bf\end{array}\right)\in {V_{1}},\]
where ${V_{1}}\subset {\mathbb{R}}^{n+d}$ is the column space of the matrix $(\begin{array}{c}A\\{} B\end{array})$. As the columns of the matrix $(\begin{array}{c}A\\{} B\end{array})$ are linearly independent, $(\begin{array}{c}u\\{} 0\end{array})\ne 0$. Then, by (66),
\[\begin{aligned}{}{\left\| \sin \angle \left(\left(\begin{array}{c}A\\{} B\end{array}\right),\hspace{0.2222em}\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)\right)\right\| }^{2}& \ge \frac{{\left(\begin{array}{c}u\\{} 0\end{array}\right)}^{\top }{P_{\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)}^{\perp }}\left(\begin{array}{c}u\\{} 0\end{array}\right)}{\| (\begin{array}{c}u\\{} 0\end{array}){\| }^{2}}=\frac{{u}^{\top }{\mathbf{P}_{1}}u}{\| u{\| }^{2}}\ge \\{} & \ge {\lambda _{\min }}({\mathbf{P}_{1}})=\frac{1}{1+\| {X_{0}}{\| }^{2}},\end{aligned}\]
which contradicts condition (65).
4. Prove inequality (67). (Later on we will show that the second statement of Theorem 8.3 follows from (67).) There exists a vector $f\in {\mathbb{R}}^{d}\setminus \{0\}$ such that $\| (A{B}^{-1}+{X_{0}})\hspace{0.2222em}f\| =\| A{B}^{-1}+{X_{0}}\| \hspace{0.2222em}\| f\| $. Denote
\[\begin{aligned}{}u& =\big(A{B}^{-1}+{X_{0}}\big)f,\\{} z& =\left(\begin{array}{c}A\\{} B\end{array}\right){B}^{-1}f=\left(\begin{array}{c}A{B}^{-1}f\\{} f\end{array}\right)=\left(\begin{array}{c}u\\{} 0\end{array}\right)-\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)f\in {V_{1}}.\end{aligned}\]
Since $({X_{0}^{\top }},-I){P_{\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)}^{\perp }}=0$ and ${P_{\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)}^{\perp }}(\begin{array}{c}{X_{0}}\\{} -I\end{array})=0$,
\[\begin{aligned}{}{z}^{\top }{P_{\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)}^{\perp }}z& ={\left(\left(\begin{array}{c}u\\{} 0\end{array}\right)-\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)f\right)}^{\hspace{-0.1667em}\top }{P_{\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)}^{\perp }}\left(\left(\begin{array}{c}u\\{} 0\end{array}\right)-\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)f\right)\\{} & ={\left(\begin{array}{c}u\\{} 0\end{array}\right)}^{\hspace{-0.1667em}\top }{P_{\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)}^{\perp }}\left(\begin{array}{c}u\\{} 0\end{array}\right)={u}^{\top }{\mathbf{P}_{1}}u\\{} & \ge \| u{\| }^{2}{\lambda _{\min }}({\mathbf{P}_{1}})=\frac{\| A{B}^{-1}+{X_{0}}{\| }^{2}\hspace{0.2222em}\| f{\| }^{2}}{1+\| {X_{0}}{\| }^{2}}.\end{aligned}\]
Notice that $z\ne 0$ because ${B}^{-1}f\ne 0$ and the columns of the matrix $(\begin{array}{c}A\\{} B\end{array})$ are linearly independent. Thus,
\[ 0<\| z{\| }^{2}={\big\| A{B}^{-1}f\big\| }^{2}+\| f{\| }^{2}\le \big(1+{\big\| A{B}^{-1}\big\| }^{2}\big)\hspace{0.1667em}\| f{\| }^{2}.\]
By (66),
(67)
\[\begin{aligned}{}{\left\| \sin \angle \left(\left(\begin{array}{c}A\\{} B\end{array}\right),\hspace{0.1667em}\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)\right)\right\| }^{2}& \ge \frac{{z}^{\top }{P_{(\begin{array}{c}{X_{0}}\\{} -I\end{array})}^{\perp }}z}{\| z{\| }^{2}}\ge \frac{\| A{B}^{-1}+{X_{0}}{\| }^{2}}{(1+\| {X_{0}}{\| }^{2})\hspace{0.1667em}(1+\| A{B}^{-1}{\| }^{2})},\\{} \left\| \sin \angle \left(\left(\begin{array}{c}A\\{} B\end{array}\right),\hspace{0.1667em}\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)\right)\right\| & \ge \frac{\| A{B}^{-1}+{X_{0}}\| }{\sqrt{1+\| {X_{0}}{\| }^{2}}\hspace{0.1667em}\sqrt{1+{(\| {X_{0}}\| +\| A{B}^{-1}+{X_{0}}\| )}^{2}}}.\end{aligned}\]
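The passage to the second line of (67) uses the triangle inequality
\[ \big\| A{B}^{-1}\big\| =\big\| \big(A{B}^{-1}+{X_{0}}\big)-{X_{0}}\big\| \le \| {X_{0}}\| +\big\| A{B}^{-1}+{X_{0}}\big\| .\]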
5. Prove that the second statement of Theorem 8.3 follows from (67). The function
(68)
\[ s(\delta ):=\frac{\delta }{\sqrt{1+\| {X_{0}}{\| }^{2}}\hspace{0.1667em}\sqrt{1+{(\| {X_{0}}\| +\delta )}^{2}}}\]
is strictly increasing on $[0,+\infty )$, with $s(0)=0$ and ${\lim _{\delta \to +\infty }}s(\delta )=\frac{1}{\sqrt{1+\| {X_{0}}{\| }^{2}}}$. Therefore, inequality (67) implies the implication:
\[\begin{aligned}{}\text{if}\hspace{2.5pt}\big\| A{B}^{-1}+{X_{0}}\big\| & >\delta ,\\{} \text{then}\hspace{2.5pt}\left\| \sin \angle \left(\left(\begin{array}{c}A\\{} B\end{array}\right),\hspace{0.1667em}\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)\right)\right\| & >\frac{\delta }{\sqrt{1+\| {X_{0}}{\| }^{2}}\hspace{0.1667em}\sqrt{1+{(\| {X_{0}}\| +\delta )}^{2}}}.\end{aligned}\]
The equivalent contrapositive implication is as follows:
(69)
\[\begin{aligned}{}\text{if}\hspace{2.5pt}\left\| \sin \angle \left(\left(\begin{array}{c}A\\{} B\end{array}\right),\hspace{0.1667em}\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)\right)\right\| & \le \frac{\delta }{\sqrt{1+\| {X_{0}}{\| }^{2}}\hspace{0.1667em}\sqrt{1+{(\| {X_{0}}\| +\delta )}^{2}}},\\{} \text{then}\hspace{2.5pt}\big\| A{B}^{-1}+{X_{0}}\big\| & \le \delta .\end{aligned}\]
The inverse function to $s(\delta )$ in (68) is
\[ \delta (s):=\frac{(1+\| {X_{0}}{\| }^{2})\hspace{0.2222em}({s}^{2}\hspace{0.1667em}\| {X_{0}}\| +s\sqrt{1-{s}^{2}})}{1-(1+\| {X_{0}}{\| }^{2}){s}^{2}}.\]
Substitute $\delta =\delta (\| \sin \angle ((\begin{array}{c}A\\{} B\end{array}),(\begin{array}{c}{X_{0}}\\{} -I\end{array}))\| )$ into (69) and obtain the following statement:
\[\begin{aligned}{}\text{if}\hspace{2.5pt}\left\| \sin \angle \left(\left(\begin{array}{c}A\\{} B\end{array}\right),\hspace{0.1667em}\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)\right)\right\| & \le \left\| \sin \angle \left(\left(\begin{array}{c}A\\{} B\end{array}\right),\left(\begin{array}{c}{X_{0}}\\{} -I\end{array}\right)\right)\right\| ,\\{} \text{then}\hspace{2.5pt}\big\| A{B}^{-1}+{X_{0}}\big\| & \le \delta \big(\big\| \sin \angle \big((\begin{array}{c}A\\{} B\end{array}),(\begin{array}{c}{X_{0}}\\{} -I\end{array})\big)\big\| \big),\end{aligned}\]
whence the second statement of Theorem 8.3 follows.
In part 5 of the proof, condition (65) is used twice. First, it is one of the conditions of the first statement of the theorem: without it, the matrix B might be singular. Second, the function $\delta (s)$ is defined on the interval $[0,\frac{1}{\sqrt{1+\| {X_{0}}{\| }^{2}}})$. □
Corollary.
Let $(\begin{array}{c}{X_{0}}\\{} -I\end{array})$ be an $(n+d)\times d$ matrix, and let $\{(\begin{array}{c}{A_{m}}\\{} {B_{m}}\end{array}),\hspace{2.5pt}m=1,2,\dots \}$ be a sequence of $(n+d)\times d$ matrices of rank d. If $\| \sin \angle ((\begin{array}{c}{A_{m}}\\{} {B_{m}}\end{array}),\hspace{0.1667em}(\begin{array}{c}{X_{0}}\\{} -I\end{array}))\| \to 0$ as $m\to \infty $, then:
-
1) the matrices ${B_{m}}$ are nonsingular for m large enough;
-
2) $\| {A_{m}}{B_{m}^{-1}}+{X_{0}}\| \to 0$ as $m\to \infty $.
8.7 Generalized eigenvalue problem for positive semidefinite matrices: proofs
Proof of Lemma 7.1.
For fixed i, split the matrix T in two blocks. Let $T=[{T_{i1}},{T_{i2}}]$, where ${T_{i1}}$ is the matrix constructed of the first i columns of T, and ${T_{i2}}$ is the matrix constructed of the last $n-i+1$ columns of T. Denote by ${V_{1}}$ and ${V_{2}}$ the column spaces of the matrices ${T_{i1}}$ and ${T_{i2}}$, respectively. Then $\dim {V_{1}}=i$ and $\dim {V_{2}}=n-i+1$.
1. The proof of the fact that ${\nu _{i}}\in \{\lambda \ge 0|\textit{``}\exists V,\hspace{2.5pt}\dim V=i:(A-\lambda B){|_{V}}\le 0\textit{''}\}$ if ${\nu _{i}}<\infty $. In other words, if ${\nu _{i}}<\infty $, then the relations
(70)
\[ \lambda \ge 0,\hspace{2em}\dim V=i,\hspace{2em}(A-\lambda B){|_{V}}\le 0\]
hold true for $\lambda ={\nu _{i}}$ and $V={V_{1}}$.
If $v\in {V_{1}}$, then $v={T_{i1}}x$ for some $x\in {\mathbb{R}}^{i}$. Hence
\[\begin{aligned}{}{v}^{\top }(A-{\nu _{i}}B)v& ={x}^{\top }{T_{i1}^{\top }}(A-{\nu _{i}}B){T_{i1}^{}}x\\{} & ={x}^{\top }\operatorname{diag}({\lambda _{1}}-{\nu _{i}}{\mu _{1}},\hspace{0.1667em}\dots ,\hspace{0.1667em}{\lambda _{i}}-{\nu _{i}}{\mu _{i}})x={\sum \limits_{j=1}^{i}}{x_{j}^{2}}({\lambda _{j}}-{\nu _{i}}{\mu _{j}}).\end{aligned}\]
The inequality ${\lambda _{j}}-{\nu _{i}}{\mu _{j}}\le 0$ holds true for all j such that either ${\lambda _{j}}={\mu _{j}}=0$ or ${\lambda _{j}}/{\mu _{j}}\le {\nu _{i}}$; particularly, it holds true for $j=1,\dots ,i$. Hence ${v}^{\top }(A-{\nu _{i}}B)v\le 0$.
2. The proof of the fact that ${\nu _{i}}$ is a lower bound of the set $\{\lambda \ge 0|\textit{``}\exists V,\hspace{2.5pt}\dim V=i:(A-\lambda B){|_{V}}\le 0\textit{''}\}$. In other words, if there exists a subspace $V\subset {\mathbb{R}}^{n}$ such that the relations (70) hold true, then ${\nu _{i}}\le \lambda $.
By contradiction, suppose that $\dim V=i$, $(A-\lambda B){|_{V}}\le 0$, ${\nu _{i}}>\lambda \ge 0$. Then ${\nu _{i}}>0$.
Now prove that $(A-\lambda B){|_{{V_{2}}}}>0$. If $v\in {V_{2}}\setminus \{0\}$, then $v={T_{i2}}x$ for some $x\in {\mathbb{R}}^{n-i+1}\setminus \{0\}$. Then
\[ {v}^{\top }(A-\lambda B)v={\sum \limits_{j=i}^{n}}{x_{j+1-i}^{2}}({\lambda _{j}}-\lambda {\mu _{j}}).\]
For $j\ge i$, due to the inequality ${\nu _{j}}\ge {\nu _{i}}>0$ and the conditions of the lemma, the case ${\lambda _{j}}=0$ is impossible; thus ${\lambda _{j}}>0$. Prove the inequality ${\lambda _{j}}-\lambda {\mu _{j}}>0$. If ${\mu _{j}}>0$, then ${\lambda _{j}}-\lambda {\mu _{j}}=({\nu _{j}}-\lambda ){\mu _{j}}$. Since ${\nu _{j}}\ge {\nu _{i}}>\lambda $, the first factor ${\nu _{j}}-\lambda $ is a positive number. Hence, ${\lambda _{j}}-\lambda {\mu _{j}}>0$. Otherwise, if ${\mu _{j}}=0$, then ${\lambda _{j}}-\lambda {\mu _{j}}={\lambda _{j}}>0$. Thus the inequality ${\lambda _{j}}-\lambda {\mu _{j}}>0$ holds true in both cases. Hence ${v}^{\top }(A-\lambda B)v>0$. Since this holds for all $v\in {V_{2}}\setminus \{0\}$, the restriction of the quadratic form $A-\lambda B$ onto the linear subspace ${V_{2}}$ is positive definite.
On the one hand, since $(A-\lambda B){|_{V}}\le 0$ and $(A-\lambda B){|_{{V_{2}}}}>0$, the subspaces V and ${V_{2}}$ have a trivial intersection. On the other hand, since $\dim V+\dim {V_{2}}=n+1>n$, the subspaces V and ${V_{2}}$ cannot have a trivial intersection. We arrive at a contradiction.
Hence ${\nu _{i}}\le \lambda $, and ${\nu _{i}}$ is a lower bound of $\{\lambda \ge 0|\text{``}\exists V,\hspace{2.5pt}\dim V=i:(A-\lambda B){|_{V}}\le 0\text{''}\}$. That completes the proof of Lemma 7.1. □
Remember that ${M}^{\dagger }$ is the Moore–Penrose pseudoinverse matrix to M; $\operatorname{span}\langle M\rangle $ is the column span of the matrix M. If matrices M and N are compatible for multiplication, then $\operatorname{span}\langle MN\rangle \subset \operatorname{span}\langle M\rangle $. (Furthermore, $\operatorname{span}\langle {M_{1}}\rangle \subset \operatorname{span}\langle {M_{2}}\rangle $ if and only if ${M_{1}}={M_{2}}N$ for some matrix N). Hence, $\operatorname{span}\langle M{M}^{\top }\rangle =\operatorname{span}\langle M\rangle $ (to prove it, we can use the identity $M=M{M}^{\top }{({M}^{\top })}^{\dagger }$).
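A short check of the identity $M=M{M}^{\top }{({M}^{\top })}^{\dagger }$, using only standard properties of the Moore–Penrose pseudoinverse (${({M}^{\top })}^{\dagger }={({M}^{\dagger })}^{\top }$, symmetry of ${M}^{\dagger }M$, and $M{M}^{\dagger }M=M$):
\[ M{M}^{\top }{\big({M}^{\top }\big)}^{\dagger }=M{M}^{\top }{\big({M}^{\dagger }\big)}^{\top }=M{\big({M}^{\dagger }M\big)}^{\top }=M{M}^{\dagger }M=M.\]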
Since the $n\times n$ covariance matrix Σ is positive semidefinite, for every $k\times n$ matrix M the equality $\operatorname{span}\langle M\varSigma {M}^{\top }\rangle =\operatorname{span}\langle M\varSigma \rangle $ holds true. This can be proved with use of the matrix square root.
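One possible argument along these lines (a sketch, with ${\varSigma }^{1/2}$ the positive semidefinite square root of Σ):
\[ \operatorname{span}\big\langle M\varSigma {M}^{\top }\big\rangle =\operatorname{span}\big\langle \big(M{\varSigma }^{1/2}\big){\big(M{\varSigma }^{1/2}\big)}^{\top }\big\rangle =\operatorname{span}\big\langle M{\varSigma }^{1/2}\big\rangle ,\]
while $M\varSigma =(M{\varSigma }^{1/2}){\varSigma }^{1/2}$ and $M{\varSigma }^{1/2}=(M\varSigma ){({\varSigma }^{1/2})}^{\dagger }$, whence $\operatorname{span}\langle M\varSigma \rangle =\operatorname{span}\langle M{\varSigma }^{1/2}\rangle $.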
In what follows, for a fixed $(n+d)\times d$ matrix X denote
\[ {\Delta _{\mathrm{pm}}}=CX{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }\varSigma ,\]
where C is an $m\times (n+d)$ matrix and Σ is an $(n+d)\times (n+d)$ positive semidefinite matrix.
Proof of Proposition 7.2.
1, necessity. Relation (30) is a necessary condition for compatibility of the constraints in (28). Let $\Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0$ and $(C-\Delta )X=0$ for some $m\times (n+d)$ matrix Δ. Due to $\Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0$, $\Delta =M\varSigma $ for some matrix M. Then $CX=\Delta X=M\varSigma X$, ${X}^{\top }{C}^{\top }={X}^{\top }\varSigma {M}^{\top }$, whence $\operatorname{span}({X}^{\top }{C}^{\top })\subset \operatorname{span}({X}^{\top }\varSigma )$.
1, sufficiency. Relation (30) is a sufficient condition for compatibility of the constraints in (28). Let $\operatorname{span}({X}^{\top }{C}^{\top })\subset \operatorname{span}({X}^{\top }\varSigma )$. Then ${X}^{\top }{C}^{\top }={X}^{\top }\varSigma M$ for some matrix M. The constraints $\Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0$, $(C-\Delta )X=0$ are satisfied for $\Delta ={M}^{\top }\varSigma $, so they are compatible.
2a, eqns. (31). If the constraints are compatible, they are satisfied for $\Delta ={\Delta _{\mathrm{pm}}}$. Indeed,
\[ {\Delta _{\mathrm{pm}}}\hspace{0.1667em}(I-{P_{\varSigma }})=CX{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }\varSigma \hspace{0.1667em}(I-{P_{\varSigma }})=0,\]
since $\varSigma \hspace{0.1667em}(I-{P_{\varSigma }})=0$. If the constraints are compatible, then
\[ \operatorname{span}\big({X}^{\top }\varSigma X\big)=\operatorname{span}\big({X}^{\top }\varSigma \big)\supset \operatorname{span}\big({X}^{\top }{C}^{\top }\big),\]
whence
\[\begin{aligned}{}{X}^{\top }\varSigma X{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top }& ={P_{{X}^{\top }\varSigma X}}{X}^{\top }{C}^{\top }={X}^{\top }{C}^{\top },\\{} {\Delta _{\mathrm{pm}}}X& =CX{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }\varSigma X=CX,\\{} (C-{\Delta _{\mathrm{pm}}})X& =0.\end{aligned}\]
2a, eqn. (32) and 2b. If the constraints are compatible, then the constrained least element of $\Delta {\varSigma }^{\dagger }{\Delta }^{\top }$ is attained for $\Delta ={\Delta _{\mathrm{pm}}}$. The least element is equal to $CX{({X}^{\top }\varSigma X)}^{\dagger }{X}^{\top }{C}^{\top }$. Let Δ satisfy the constraints, which imply $\Delta {P_{\varSigma }}=\Delta $ and $\Delta X=CX$. Expand the product
(71)
\[ (\Delta -{\Delta _{\mathrm{pm}}}){\varSigma }^{\dagger }{(\Delta -{\Delta _{\mathrm{pm}}})}^{\top }=\Delta {\varSigma }^{\dagger }{\Delta }^{\top }-{\Delta _{\mathrm{pm}}}{\varSigma }^{\dagger }{\Delta }^{\top }-\Delta {\varSigma }^{\dagger }{\Delta _{\mathrm{pm}}^{\top }}+{\Delta _{\mathrm{pm}}}{\varSigma }^{\dagger }{\Delta _{\mathrm{pm}}^{\top }}.\]
Simplify the expressions for three (of four) summands:
\[\begin{aligned}{}\Delta {\varSigma }^{\dagger }{\Delta _{\mathrm{pm}}^{\top }}& =\Delta {\varSigma }^{\dagger }\varSigma X{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top }\\{} & =\Delta {P_{\varSigma }}X{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top }\\{} & =\Delta X{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top }=CX{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top }.\end{aligned}\]
Applying matrix transposition to both sides of the last chain of equalities, we get
\[ {\Delta _{\mathrm{pm}}}{\varSigma }^{\dagger }{\Delta }^{\top }=CX{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top }.\]
For the last summand,
\[\begin{aligned}{}{\Delta _{\mathrm{pm}}}{\varSigma }^{\dagger }{\Delta _{\mathrm{pm}}^{\top }}& =CX{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }\varSigma {\varSigma }^{\dagger }\varSigma X{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top }\\{} & =CX{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }\varSigma X{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top }\\{} & =CX{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top }.\end{aligned}\]
Thus, (71) implies that
(72)
\[ \Delta {\varSigma }^{\dagger }{\Delta }^{\top }=(\Delta -{\Delta _{\mathrm{pm}}}){\varSigma }^{\dagger }{(\Delta -{\Delta _{\mathrm{pm}}})}^{\top }+CX{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top }.\]
Hence
\[ \Delta {\varSigma }^{\dagger }{\Delta }^{\top }\ge CX{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top },\]
and statement 2b of the theorem is proved. For $\Delta ={\Delta _{\mathrm{pm}}}$, equality is attained, which coincides with (32).
Remark 7.2-1. The least element is attained for a unique Δ. It is enough to show that if Δ satisfies the constraints and $\Delta {\varSigma }^{\dagger }{\Delta }^{\top }=CX{({X}^{\top }\varSigma X)}^{\dagger }{X}^{\top }{C}^{\top }$, then $\Delta ={\Delta _{\mathrm{pm}}}$.
Indeed, if Δ satisfies the constraints $\Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0$ and $(C-\Delta )X=0$, and $\Delta {\varSigma }^{\dagger }{\Delta }^{\top }=CX{({X}^{\top }\varSigma X)}^{\dagger }{X}^{\top }{C}^{\top }$, then due to (72)
\[ (\Delta -{\Delta _{\mathrm{pm}}}){\varSigma }^{\dagger }{(\Delta -{\Delta _{\mathrm{pm}}})}^{\top }=0.\]
As ${\varSigma }^{\dagger }$ is a positive semidefinite matrix, $(\Delta -{\Delta _{\mathrm{pm}}}){\varSigma }^{\dagger }=0$ and $(\Delta -{\Delta _{\mathrm{pm}}}){P_{\varSigma }}=(\Delta -{\Delta _{\mathrm{pm}}}){\varSigma }^{\dagger }\varSigma =0$. Add the equality $\Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0$ (which is one of the constraints) and subtract the equality ${\Delta _{\mathrm{pm}}}\hspace{0.1667em}(I-{P_{\varSigma }})=0$ (which is one of equalities (31) and holds true due to part 2a of the theorem). We obtain
\[ \Delta -{\Delta _{\mathrm{pm}}}=(\Delta -{\Delta _{\mathrm{pm}}}){P_{\varSigma }}+\Delta \hspace{0.1667em}(I-{P_{\varSigma }})-{\Delta _{\mathrm{pm}}}\hspace{0.1667em}(I-{P_{\varSigma }})=0,\]
whence $\Delta ={\Delta _{\mathrm{pm}}}$. □
Proof of Proposition 7.3.
1. Necessity. Since the matrices ${C}^{\top }C$ and Σ are positive semidefinite, the matrix pencil $\langle {C}^{\top }C,\varSigma \rangle $ is definite if and only if the matrix ${C}^{\top }C+\varSigma $ is positive definite. Thus, if the matrix pencil $\langle {C}^{\top }C,\varSigma \rangle $ is definite, then the matrix ${C}^{\top }C+\varSigma $ is positive definite. As the columns of the matrix X are linearly independent, the matrix ${X}^{\top }({C}^{\top }C+\varSigma )X={X}^{\top }{C}^{\top }CX+{X}^{\top }\varSigma X$ is positive definite as well, whence $\operatorname{span}({X}^{\top }{C}^{\top }CX+{X}^{\top }\varSigma X)={\mathbb{R}}^{d}$.
If the constraints are compatible, then the condition (30) holds true, whence
\[\begin{aligned}{}{\mathbb{R}}^{d}& =\operatorname{span}\big\langle {X}^{\top }{C}^{\top }CX+{X}^{\top }\varSigma X\big\rangle \\{} & \subset \operatorname{span}\big\langle {X}^{\top }{C}^{\top }CX\big\rangle +\operatorname{span}\big\langle {X}^{\top }\varSigma X\big\rangle \\{} & =\operatorname{span}\big\langle {X}^{\top }{C}^{\top }\big\rangle +\operatorname{span}\big\langle {X}^{\top }\varSigma \big\rangle \\{} & =\operatorname{span}\big\langle {X}^{\top }\varSigma \big\rangle =\operatorname{span}\big\langle {X}^{\top }\varSigma X\big\rangle .\end{aligned}\]
Since $\operatorname{span}\langle {X}^{\top }\varSigma X\rangle ={\mathbb{R}}^{d}$, the matrix ${X}^{\top }\varSigma X$ is nonsingular.
2. Sufficiency. If the matrix ${X}^{\top }\varSigma X$ is nonsingular, then
\[ \operatorname{span}\big\langle {X}^{\top }\varSigma \big\rangle =\operatorname{span}\big\langle {X}^{\top }\varSigma X\big\rangle ={\mathbb{R}}^{d}\supset \operatorname{span}\big\langle {X}^{\top }{C}^{\top }\big\rangle .\]
Thus the condition (30), which is the necessary and sufficient condition for compatibility of the constraints, holds true. □
Proof of Proposition 7.4.
Construct a simultaneous diagonalization of the matrices ${X}^{\top }{C}^{\top }CX$ and ${X}^{\top }\varSigma X$ (according to Theorem 6.2) that satisfies Remark 6.2-2:
\[ {X}^{\top }{C}^{\top }CX={\big({T}^{-1}\big)}^{\top }\varLambda {T}^{-1},\hspace{2em}{X}^{\top }\varSigma X={\big({T}^{-1}\big)}^{\top }\mathrm{M}{T}^{-1}.\]
The notations Λ, M, $T=\left[\begin{array}{c@{\hskip10.0pt}c}{T_{1}}& {T_{2}}\end{array}\right]$, ${\mu _{i}}$, ${\lambda _{i}}$, ${\nu _{i}}$ are taken from Theorem 6.2, Remark 6.2-2, and Lemma 7.1.
The subspace
\[ \operatorname{span}\big\langle {X}^{\top }{C}^{\top }\big\rangle =\operatorname{span}\big\langle {X}^{\top }{C}^{\top }CX\big\rangle =\operatorname{span}\big\langle {\big({T}^{-1}\big)}^{\top }\varLambda {T}^{-1}\big\rangle =\operatorname{span}\big\langle {\big({T}^{-1}\big)}^{\top }\varLambda \big\rangle \]
is spanned by columns of the matrix ${({T}^{-1})}^{\top }$ that correspond to nonzero ${\lambda _{i}}$’s. Similarly, the subspace $\operatorname{span}\langle {X}^{\top }\varSigma \rangle =\operatorname{span}\langle {({T}^{-1})}^{\top }\mathrm{M}\rangle $ is spanned by columns of the matrix ${({T}^{-1})}^{\top }$ that correspond to nonzero ${\mu _{i}}$’s. Note that the columns of the matrix ${({T}^{-1})}^{\top }$ are linearly independent. The condition $\operatorname{span}\langle {X}^{\top }{C}^{\top }\rangle \subset \operatorname{span}\langle {X}^{\top }\varSigma \rangle $ is satisfied if and only if ${\mu _{i}}\ne 0$ for all i such that ${\lambda _{i}}\ne 0$ (that is, ${\nu _{i}}<\infty $, $i=1,\dots ,d$, where the notation ${\nu _{i}}={\lambda _{i}}/{\mu _{i}}$ comes from Theorem 6.2). Thus, due to Proposition 6.3,
Construct the chain of equalities:
\[\begin{aligned}{}& \underset{\begin{array}{c}\Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0\\{} (C-\Delta )X=0\end{array}}{\min }{\lambda _{k+m-d}}\big(\Delta {\varSigma }^{\dagger }{\Delta }^{\top }\big)\\{} & \hspace{1em}\stackrel{(\mathrm{a})}{=}{\lambda _{k+m-d}}\big(CX{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top }\big)={\lambda _{k+m-d}}\big(CX\hspace{0.1667em}{T}^{}{\mathrm{M}}^{\dagger }{T}^{\top }\hspace{0.1667em}{X}^{\top }{C}^{\top }\big)\\{} & \hspace{1em}\stackrel{(\mathrm{b})}{=}{\lambda _{k}}\big({\text{M}}^{\dagger }{T}^{\top }{X}^{\top }{C}^{\top }CX{T}^{}\big)={\lambda _{k}}\big({\text{M}}^{\dagger }\varLambda \big)={\nu _{k}}\\{} & \hspace{1em}\stackrel{(\mathrm{c})}{=}\min \big\{\lambda \ge 0:\text{``}\exists {V_{1}}\subset {\mathbb{R}}^{d},\hspace{0.2778em}\dim {V_{1}}=k:\big({X}^{\top }{C}^{\top }CX-\lambda {X}^{\top }\varSigma X\big){|_{{V_{1}}}}\le 0\text{''}\big\}\\{} & \hspace{1em}\stackrel{(\mathrm{d})}{=}\min \big\{\lambda \ge 0:\text{``}\exists V\subset \operatorname{span}\langle X\rangle ,\hspace{0.2778em}\dim V=k:\big({C}^{\top }C-\lambda \varSigma \big){|_{V}}\le 0\text{''}\big\}.\end{aligned}\]
Equality (a) follows from Proposition 7.2 because the matrix $CX{({X}^{\top }\varSigma X)}^{\dagger }{X}^{\top }{C}^{\top }$ is the least value of the expression $\Delta {\varSigma }^{\dagger }{\Delta }^{\top }$ under the constraints $(I-{P_{\varSigma }}){\Delta }^{\top }=0$ and $(C-\Delta )X=0$.
Equality (b) follows from the relation between the characteristic polynomials of the two products of two rectangular matrices:
\[ {\chi _{CXT\hspace{0.1667em}{\mathrm{M}}^{\dagger }{T}^{\top }{X}^{\top }{C}^{\top }}}(\lambda )={(-\lambda )}^{m-d}{\chi _{{\mathrm{M}}^{\dagger }{T}^{\top }{X}^{\top }{C}^{\top }\hspace{0.1667em}CXT}}(\lambda )\]
because $CXT$ is an $m\times d$ matrix and ${\mathrm{M}}^{\dagger }{T}^{\top }{X}^{\top }{C}^{\top }$ is a $d\times m$ matrix. Thus, the matrix $CXT\hspace{0.1667em}{\mathrm{M}}^{\dagger }{T}^{\top }{X}^{\top }{C}^{\top }$ has all the eigenvalues of the matrix ${\mathrm{M}}^{\dagger }{T}^{\top }{X}^{\top }{C}^{\top }CXT={\mathrm{M}}^{\dagger }\varLambda $ and, besides them, the eigenvalue 0 of multiplicity $m-d$. All these eigenvalues are nonnegative.
Equality (c) holds true due to Lemma 7.1.
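The eigenvalue relation used in equality (b) above can be checked numerically. The sketch below (NumPy; the sizes are illustrative assumptions, and np.poly uses the convention $\det (\lambda I-M)$ for the characteristic polynomial) verifies that AB has the eigenvalues of BA together with the eigenvalue 0 of multiplicity m − d.

import numpy as np

rng = np.random.default_rng(1)
m, d = 6, 2                                   # illustrative sizes (assumption)
A = rng.standard_normal((m, d))               # plays the role of CXT
B = rng.standard_normal((d, m))               # plays the role of M^+ T^T X^T C^T

# det(lambda I_m - AB) = lambda^(m - d) det(lambda I_d - BA)
chi_AB = np.poly(A @ B)                       # m + 1 coefficients
chi_BA_padded = np.concatenate([np.poly(B @ A), np.zeros(m - d)])
print(np.allclose(chi_AB, chi_BA_padded, atol=1e-8))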
Since the columns of the matrix X are linearly independent, there is a one-to-one correspondence between subspaces of $\operatorname{span}\langle X\rangle $ and of ${\mathbb{R}}^{d}$: if V is a subspace of $\operatorname{span}\langle X\rangle $, then there exists a unique subspace ${V_{1}}\subset {\mathbb{R}}^{d}$ such that $V=\{Xv:v\in {V_{1}}\}$, and for those V and ${V_{1}}$,
• $\dim V=\dim {V_{1}}$;
• the restriction of the quadratic form ${C}^{\top }C-\lambda \varSigma $ to the subspace V is negative semidefinite if and only if the restriction of the quadratic form ${X}^{\top }{C}^{\top }CX-\lambda {X}^{\top }\varSigma X$ to the subspace ${V_{1}}$ is negative semidefinite.
Hence, equality (d) holds true.
Equation (34) is proved. As to Remark 7.4-1, the minimum in the left-hand side of (34) is attained for $\Delta ={\Delta _{\mathrm{pm}}}$. The minimum in the right-hand side of (34) is attained if the subspace V is a linear span of k columns of the matrix $XT$ that correspond to the k least ${\nu _{i}}$’s. □
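The following small sketch (Python with NumPy/SciPy; all sizes and matrices are illustrative assumptions, and Σ is taken nonsingular) illustrates equality (34): the d largest eigenvalues of ${\Delta _{\mathrm{pm}}}{\varSigma }^{\dagger }{\Delta _{\mathrm{pm}}^{\top }}$ coincide with the generalized eigenvalues ${\nu _{k}}$ of the pencil $\langle {X}^{\top }{C}^{\top }CX,{X}^{\top }\varSigma X\rangle $.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
m, n, d = 8, 3, 2                            # illustrative sizes (assumption)
C = rng.standard_normal((m, n + d))
S = rng.standard_normal((n + d, n + d))
Sigma = S @ S.T + np.eye(n + d)              # nonsingular covariance (assumption)
X = rng.standard_normal((n + d, d))

A = X.T @ C.T @ C @ X
B = X.T @ Sigma @ X
nu = eigh(A, B, eigvals_only=True)           # nu_1 <= ... <= nu_d

Delta_pm = C @ X @ np.linalg.inv(B) @ X.T @ Sigma
lam = np.linalg.eigvalsh(Delta_pm @ np.linalg.inv(Sigma) @ Delta_pm.T)   # ascending, m values
print(np.allclose(lam[m - d:], nu))          # lambda_{k+m-d} = nu_k for k = 1,...,d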
Proof of Proposition 7.5.
By Lemma 7.1 and Proposition 7.4, the inequality (37) is equivalent to the obvious inequality
\[\begin{aligned}{}& \min \big\{\lambda \ge 0:\text{``}\exists V\subset \operatorname{span}\langle X\rangle ,\hspace{0.2778em}\dim V=k:\big({C}^{\top }C-\lambda \varSigma \big){|_{V}}\le 0\text{''}\big\}\\{} & \hspace{1em}\ge \min \big\{\lambda \ge 0|\text{``}\exists V,\hspace{2.5pt}\dim V=k:(A-\lambda B){|_{V}}\le 0\text{''}\big\}.\end{aligned}\]
From the proof it follows that if ${\nu _{d}}=\infty $, then for any $(n+d)\times d$ matrix X of rank d the constraints in (28) are not compatible.
Now prove that if ${\nu _{d}}<\infty $ and $X=[{u_{1}},{u_{2}},\dots ,{u_{d}}]$, then the inequality in Proposition 7.5 becomes an equality. Indeed, then the constraints in (28) are compatible because they are satisfied for $\Delta =CTD{T}^{-1}$, where
\[\begin{aligned}{}D& =\operatorname{diag}({d_{1}},{d_{2}},\dots ,{d_{d+n}}),\\{} {d_{k}}& =\left\{\begin{array}{l@{\hskip10.0pt}l}1\hspace{1em}& \text{if}\hspace{2.5pt}{\mu _{k}}>0\hspace{2.5pt}\text{and}\hspace{2.5pt}k\le d,\\{} 0\hspace{1em}& \text{if}\hspace{2.5pt}{\mu _{k}}=0\hspace{2.5pt}\text{or}\hspace{2.5pt}k>d.\end{array}\right.\end{aligned}\]
By Proposition 7.2
\[\begin{aligned}{}\underset{\begin{array}{c}\Delta (I-{P_{\varSigma }})=0\\{} (C-\Delta )X=0\end{array}}{\min }{\lambda _{k+m-d}}\big(\Delta {\varSigma }^{\dagger }{\Delta }^{\top }\big)& ={\lambda _{k+m-d}}\big(CX{\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top }\big)\\{} & ={\lambda _{k}}\big({\big({X}^{\top }\varSigma X\big)}^{\dagger }{X}^{\top }{C}^{\top }CX\big)\\{} & ={\lambda _{k}}\big({\mathrm{M}_{d}^{\dagger }}{\varLambda _{d}}\big)={\nu _{k}},\end{aligned}\]
where ${\mathrm{M}_{d}}=\operatorname{diag}({\mu _{1}},\dots ,{\mu _{d}})$ and ${\varLambda _{d}}=\operatorname{diag}({\lambda _{1}},\dots ,{\lambda _{d}})$ are principal submatrices of the matrices M and Λ, respectively. □
Proof of Proposition 7.6.
For every matrix Δ that satisfies the constraints $\Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0$ and $\operatorname{rk}(C-\Delta )\le n$, there exists an $(n+d)\times d$ matrix X of rank d such that $(C-\Delta )X=0$. Assuming that such Δ exists, we get ${\nu _{d}}<+\infty $ because the equalities ${\nu _{d}}=+\infty $, $\Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0$, $\operatorname{rk}X=d$, and $(C-\Delta )X=0$ cannot hold simultaneously.
We have
(73)
\[\begin{aligned}{}\big\| \Delta \hspace{0.1667em}{\big({\varSigma }^{1/2}\big)}^{\dagger }{\big\| _{F}^{2}}& =\operatorname{tr}\big(\Delta {\varSigma }^{\dagger }{\Delta }^{\top }\big)={\sum \limits_{i=1}^{m}}{\lambda _{i}}\big(\Delta {\varSigma }^{\dagger }{\Delta }^{\top }\big)\\{} & ={\sum \limits_{i=1}^{m-d}}{\lambda _{i}}\big(\Delta {\varSigma }^{\dagger }{\Delta }^{\top }\big)+{\sum \limits_{k=1}^{d}}{\lambda _{k+m-d}}\big(\Delta {\varSigma }^{\dagger }{\Delta }^{\top }\big)\\{} & \ge 0+{\sum \limits_{k=1}^{d}}{\nu _{k}},\end{aligned}\]
where the inequalities hold true due to positive semidefiniteness of Σ and due to Proposition 7.5.
If ${\nu _{d}}=\infty $, then the constraints $\Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0$ and $\operatorname{rk}(C-\Delta )\le n$ are not compatible. Otherwise, the equality in (73) is attained for $\Delta ={\Delta _{\mathrm{em}}}:=CX{({X}^{\top }\varSigma X)}^{\dagger }{X}^{\top }\varSigma $, where the matrix X consists of the first d columns of the matrix T, where T comes from decomposition (35).
Thus, if the constraints in (7) are compatible, then the minimum is equal to ${({\sum _{k=1}^{d}}{\nu _{k}})}^{1/2}$ and is attained at ${\Delta _{\mathrm{em}}}$. Otherwise, if the constraints are incompatible, then by contraposition to the second statement of Proposition 7.5 ${\nu _{d}}=+\infty $ and ${({\sum _{k=1}^{d}}{\nu _{k}})}^{1/2}=+\infty $.
If the minimum in (7) is attained at Δ, then the inequality (73) becomes an equality, whence
(74)
\[ {\lambda _{i}}\big(\Delta {\varSigma }^{\dagger }{\Delta }^{\top }\big)=0,\hspace{1em}i=1,\dots ,m-d,\]
(75)
\[ {\lambda _{k+m-d}}\big(\Delta {\varSigma }^{\dagger }{\Delta }^{\top }\big)={\nu _{k}},\hspace{1em}k=1,\dots ,d;\]
in particular,
\[ {\lambda _{\max }}\big(\Delta {\varSigma }^{\dagger }{\Delta }^{\top }\big)={\nu _{d}}.\]
Remember that ${\nu _{d}}$ is the minimum value in (11). Thus, the minimum in (11) is attained at Δ, although it may also be attained elsewhere. □
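The conclusion of Proposition 7.6 can be illustrated numerically. The sketch below (NumPy/SciPy; illustrative sizes, nonsingular Σ assumed) builds X from the first d generalized eigenvectors of the pencil $\langle {C}^{\top }C,\varSigma \rangle $ and checks that $\| {\Delta _{\mathrm{em}}}{({\varSigma }^{1/2})}^{\dagger }{\| _{F}}={({\sum _{k=1}^{d}}{\nu _{k}})}^{1/2}$, which can also be obtained from the d smallest singular values of $C{\varSigma }^{-1/2}$.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
m, n, d = 8, 3, 2
C = rng.standard_normal((m, n + d))
S = rng.standard_normal((n + d, n + d))
Sigma = S @ S.T + np.eye(n + d)                  # nonsingular covariance (assumption)

nu, T = eigh(C.T @ C, Sigma)                     # ascending nu_i; T.T @ Sigma @ T = I
X = T[:, :d]                                     # first d generalized eigenvectors
Delta_em = C @ X @ np.linalg.inv(X.T @ Sigma @ X) @ X.T @ Sigma

w, V = np.linalg.eigh(Sigma)
Sigma_inv_half = V @ np.diag(w ** -0.5) @ V.T    # Sigma^{-1/2}
val = np.linalg.norm(Delta_em @ Sigma_inv_half, 'fro')
print(np.isclose(val, np.sqrt(nu[:d].sum())))    # minimum value (sum of nu_k)^{1/2}

sv = np.linalg.svd(C @ Sigma_inv_half, compute_uv=False)
print(np.isclose(val, np.sqrt((sv[-d:] ** 2).sum())))   # same value via the d smallest singular values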
Proof of Proposition 7.7.
1. The monotonicity follows from results of [14]. A unitarily invariant norm is a symmetric gauge function of the singular values, and a symmetric gauge function is monotone in its nonnegative arguments (see [14, ineq. (2.5)]).
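A quick numerical illustration of this monotonicity (Python/NumPy; the construction of ${M_{1}}$ and ${M_{2}}$ is purely illustrative): if every singular value of ${M_{1}}$ is dominated by the corresponding singular value of ${M_{2}}$, the ordering holds for several unitarily invariant norms.

import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 4                                           # illustrative sizes (assumption)
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s2 = np.sort(rng.uniform(0.5, 2.0, size=n))[::-1]     # singular values of M2 (descending)
s1 = s2 * rng.uniform(0.3, 1.0, size=n)               # entrywise dominated values for M1
M2 = U[:, :n] @ np.diag(s2) @ V.T
M1 = U[:, :n] @ np.diag(s1) @ V.T
for ord_ in ('fro', 2, 'nuc'):                        # Frobenius, spectral, nuclear norms
    print(np.linalg.norm(M1, ord_) <= np.linalg.norm(M2, ord_) + 1e-12)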
2. Let ${\sigma _{1}}({M_{1}})<{\sigma _{1}}({M_{2}})$ and ${\sigma _{i}}({M_{1}})\le {\sigma _{i}}({M_{2}})$ for all $i=2,\dots ,\min (m,n)$. Then for all $k=1,\dots ,\min (m,n)$
\[ {\sum \limits_{i=1}^{k}}{\sigma _{i}}({M_{1}})\le \frac{{\sigma _{1}}({M_{1}})+{\sigma _{2}}({M_{1}})+\cdots +{\sigma _{\min (m,n)}}({M_{1}})}{{\sigma _{1}}({M_{2}})+{\sigma _{2}}({M_{1}})+\cdots +{\sigma _{\min (m,n)}}({M_{1}})}{\sum \limits_{i=1}^{k}}{\sigma _{i}}({M_{2}}).\]
Due to Ky Fan [3, Theorem 4] or [14, Theorem 1], this implies that
\[ \| {M_{1}}{\| _{\mathrm{U}}}\le \frac{{\sigma _{1}}({M_{1}})+{\sigma _{2}}({M_{1}})+\cdots +{\sigma _{\min (m,n)}}({M_{1}})}{{\sigma _{1}}({M_{2}})+{\sigma _{2}}({M_{1}})+\cdots +{\sigma _{\min (m,n)}}({M_{1}})}\| {M_{2}}{\| _{\mathrm{U}}}.\]
Since
\[ 0\le \frac{{\sigma _{1}}({M_{1}})+{\sigma _{2}}({M_{1}})+\cdots +{\sigma _{\min (m,n)}}({M_{1}})}{{\sigma _{1}}({M_{2}})+{\sigma _{2}}({M_{1}})+\cdots +{\sigma _{\min (m,n)}}({M_{1}})}<1\hspace{1em}\text{and}\hspace{1em}\| {M_{2}}{\| _{\mathrm{U}}}>0,\]
$\| {M_{1}}{\| _{\mathrm{U}}}<\| {M_{2}}{\| _{\mathrm{U}}}$. □
Proof of Proposition 7.8.
Notice that the optimization problems (7), (11), and (12) have the same constraints. If the constraints are compatible, then the minimum in (7) is attained for $\Delta ={\Delta _{\mathrm{em}}}:=CX{({X}^{\top }\varSigma X)}^{\dagger }{X}^{\top }\varSigma $.
1. Let ${\Delta _{\min \text{(7)}}}$ minimize (7), and let ${\Delta _{\mathrm{feas}}}$ satisfy the constraints. Then, by Proposition 7.5 and eqn. (75),
\[\begin{aligned}{}{\lambda _{k+m-d}}\big({\Delta _{\min \text{(7)}}}{\varSigma }^{\dagger }{\Delta _{\min \text{(7)}}^{\top }}\big)& ={\nu _{k}}\le {\lambda _{k+m-d}}\big({\Delta _{\mathrm{feas}}}{\varSigma }^{\dagger }{\Delta _{\mathrm{feas}}^{\top }}\big),\hspace{1em}k=1,\dots ,d;\\{} {\sigma _{d+1-k}}\big({\Delta _{\min \text{(7)}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big)& \le {\sigma _{d+1-k}}\big({\Delta _{\mathrm{feas}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big),\\{} k& =\max (1,d+1-m),\dots ,d;\\{} {\sigma _{j}}\big({\Delta _{\min \text{(7)}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big)& \le {\sigma _{j}}\big({\Delta _{\mathrm{feas}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big),\hspace{1em}j=1,\dots ,\min (d,m);\end{aligned}\]
by eqn. (74)
\[\begin{aligned}{}{\lambda _{i}}\big({\Delta _{\min \text{(7)}}}{\varSigma }^{\dagger }{\Delta _{\min \text{(7)}}^{\top }}\big)& =0,\hspace{1em}i=1,\dots ,m-d,\\{} {\sigma _{m+1-i}}\big({\Delta _{\min \text{(7)}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big)& =0\le {\sigma _{m+1-i}}\big({\Delta _{\mathrm{feas}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big),\hspace{1em}i\le m-d;\\{} {\sigma _{j}}\big({\Delta _{\min \text{(7)}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big)& =0\le {\sigma _{j}}\big({\Delta _{\mathrm{feas}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big),\hspace{1em}d+1\le j\le \min (m,\hspace{0.2222em}n+d).\end{aligned}\]
Thus
(76)
\[ {\sigma _{j}}\big({\Delta _{\min \text{(7)}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big)\le {\sigma _{j}}\big({\Delta _{\mathrm{feas}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big)\hspace{1em}\text{for all}\hspace{2.5pt}j\le \min (m,n+d),\]
whence by Proposition 7.7 $\| {\Delta _{\min \text{(7)}}}{({\varSigma }^{1/2})}^{\dagger }{\| _{\mathrm{U}}}\le \| {\Delta _{\mathrm{feas}}}{({\varSigma }^{1/2})}^{\dagger }{\| _{\mathrm{U}}}$. Thus ${\Delta _{\min \text{(7)}}}$ indeed minimizes (12).
2. Let ${\Delta _{\min \text{(12)}}}$ minimize (12), so the constraints are compatible. Then ${\Delta _{\mathrm{em}}}$ minimizes both (7) and (11), see Proposition 7.6. Thus,
\[ {\big\| {\Delta _{\min \text{(12)}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big\| }_{\mathrm{U}}\le {\big\| {\Delta _{\mathrm{em}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big\| }_{\mathrm{U}},\]
and by (76)
\[ {\sigma _{j}}\big({\Delta _{\mathrm{em}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big)\le {\sigma _{j}}\big({\Delta _{\min \text{(12)}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big)\hspace{1em}\text{for all}\hspace{2.5pt}j\le \min (m,n+d).\]
Then by Proposition 7.7 (contraposition to part 2)
\[\begin{aligned}{}{\sigma _{1}}\big({\Delta _{\mathrm{em}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big)& ={\sigma _{1}}\big({\Delta _{\min \text{(12)}}}{\big({\varSigma }^{1/2}\big)}^{\dagger }\big),\\{} \underset{\begin{array}{c}\Delta (I-{P_{\varSigma }})=0\\{} \operatorname{rk}(C-\Delta )\le n\end{array}}{\min }{\lambda _{\max }}\big(\Delta {\varSigma }^{\dagger }{\Delta }^{\top }\big)& ={\lambda _{\max }}\big({\Delta _{\mathrm{em}}}{\varSigma }^{\dagger }{\Delta _{\mathrm{em}}^{\top }}\big)={\lambda _{\max }}\big({\Delta _{\min \text{(12)}}}{\varSigma }^{\dagger }{\Delta _{\min \text{(12)}}^{\top }}\big).\end{aligned}\]
Thus ${\Delta _{\min \text{(12)}}}$ indeed minimizes (11). □
Proof of Proposition 7.9.
We can assume that ${\mu _{i}}\in \{0,1\}$ in (35).
The set of matrices Δ that satisfy (8) depends only on $\operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle $ and does not change after linear transformations of columns of ${\widehat{X}_{\mathrm{ext}}}$.
By linear transformations of the columns, the matrix ${T}^{-1}{\widehat{X}_{\mathrm{ext}}}$ can be transformed to the reduced column echelon form. Thus, there exists such an $(n+d)\times d$ matrix ${T_{5}}$ in the column echelon form that
\[ \operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle =\operatorname{span}\langle T{T_{5}}\rangle .\]
Notice that $\operatorname{rk}{T_{5}}=\operatorname{rk}{\widehat{X}_{\mathrm{ext}}}=d$.
Denote by ${d_{\ast }}$ and ${d}^{\ast }$ the first and the last of the indices i such that ${\nu _{i}}={\nu _{d}}$. Then
\[\begin{aligned}{}{\nu _{{d_{\ast }}-1}}& <{\nu _{{d_{\ast }}}}\hspace{1em}\text{if}\hspace{2.5pt}{d_{\ast }}\ge 2\text{;}\\{} {\nu _{{d_{\ast }}}}& =\cdots ={\nu _{d}}=\cdots ={\nu _{{d}^{\ast }}};\\{} {\nu _{{d}^{\ast }}}& <{\nu _{{d}^{\ast }+1}}\hspace{1em}\text{if}\hspace{2.5pt}{d}^{\ast }<n+d\text{.}\end{aligned}\]
Necessity. Let Δ be a point where the constrained minimum in (7) is attained. Then equalities (74)–(75) from the proof of Proposition 7.6 hold true. Thus, due to Propositions 7.4 and 7.5, for all $k=1,\dots ,d$
\[ {\nu _{k}}=\min \big\{\lambda \ge 0:\text{``}\exists V\subset \operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle ,\hspace{0.2778em}\dim V=k:\big({C}^{\top }C-\lambda \varSigma \big){|_{V}}\le 0\text{''}\big\}.\]
According to Remark 7.4-1, we can construct a stack of subspaces
\[ {V_{1}}\subset {V_{2}}\subset \cdots \subset {V_{d}}=\operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle ,\]
such that $\dim {V_{k}}=k$ and the restriction of the quadratic form ${C}^{\top }C-{\nu _{k}}\varSigma $ to the subspace ${V_{k}}$ is negative semidefinite, for all $k\le d$.
Now, prove that
(77)
\[ \operatorname{span}\langle {u_{i}}:{\nu _{i}}<{\nu _{d}}\rangle \subset \operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle .\]
Suppose the contrary: $\operatorname{span}\langle {u_{i}}:{\nu _{i}}<{\nu _{d}}\rangle \not\subset \operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle $. Then there exists $i<{d_{\ast }}$ such that ${u_{i}}\notin \operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle $, and, as a consequence, ${u_{i}}\notin {V_{\max \{j\hspace{0.2778em}:\hspace{0.2778em}{\nu _{j}}\le {\nu _{i}}\}}}$. Find the least k such that ${u_{k}}\notin {V_{\max \{j\hspace{0.2778em}:\hspace{0.2778em}{\nu _{j}}\le {\nu _{k}}\}}}$. Let ${k_{\ast }}$ and ${k}^{\ast }$ denote the first and the last indices i such that ${\nu _{i}}={\nu _{k}}$. Then $1\le {k_{\ast }}\le k\le {k}^{\ast }<{d_{\ast }}\le d\le {d}^{\ast }$ and ${u_{k}}\notin {V_{{k}^{\ast }}}$.
Since $\operatorname{span}\langle {u_{1}},\dots ,{u_{{k_{\ast }}-1}}\rangle \subset {V_{{k_{\ast }}-1}}\subset {V_{{k}^{\ast }}}$,
\[\begin{aligned}{}\dim \big({V_{{k}^{\ast }}}\cap \operatorname{span}\langle {u_{{k_{\ast }}}},\dots ,{u_{n+d}}\rangle \big)& =\dim \big({V_{{k}^{\ast }}}/\operatorname{span}\langle {u_{1}},\dots ,{u_{{k_{\ast }}-1}}\rangle \big)\\{} & =\dim {V_{{k}^{\ast }}}-({k_{\ast }}-1)={k}^{\ast }-{k_{\ast }}+1.\end{aligned}\]
Since ${u_{k}}\notin {V_{{k}^{\ast }}}$, we have ${u_{k}}\notin {V_{{k}^{\ast }}}\cap \operatorname{span}\langle {u_{{k_{\ast }}}},\dots ,{u_{n+d}}\rangle $, whence
\[ \dim \operatorname{span}\big\langle {V_{{k}^{\ast }}}\cap \operatorname{span}\langle {u_{{k_{\ast }}}},\dots ,{u_{n+d}}\rangle ,\hspace{0.2222em}{u_{k}}\big\rangle ={k}^{\ast }-{k_{\ast }}+2.\]
Now, consider the $(n+d-{k_{\ast }}+1)\times (n+d-{k_{\ast }}+1)$ diagonal matrix
\[\begin{aligned}{}D(\lambda )& :={[{u_{{k_{\ast }}}},\dots ,{u_{n+d}}]}^{\top }\big({C}^{\top }C-\lambda \varSigma \big)[{u_{{k_{\ast }}}},\dots ,{u_{n+d}}]\\{} & =\operatorname{diag}({\lambda _{j}}-\lambda {\mu _{j}},\hspace{0.2778em}j={k_{\ast }},\dots ,n+d)\end{aligned}\]
for various λ. For $\lambda ={\nu _{k}}={\nu _{{k_{\ast }}}}$, the inequality ${\lambda _{j}}-{\nu _{k}}{\mu _{j}}\ge 0$ holds true for all $j\ge {k_{\ast }}$, so the matrix $D({\nu _{k}})$ is positive semidefinite. For $\lambda ={\nu _{{k}^{\ast }+1}}$, the inequality ${\lambda _{j}}-{\nu _{{k}^{\ast }+1}}{\mu _{j}}\le 0$ holds true for all ${k_{\ast }}\le j\le {k}^{\ast }+1$, so there exists a $({k}^{\ast }-{k_{\ast }}+2)$-dimensional subspace of ${\mathbb{R}}^{n+d-{k_{\ast }}+1}$ where the quadratic form $D({\nu _{{k}^{\ast }+1}})$ is negative semidefinite. For $\lambda <{\nu _{{k}^{\ast }+1}}$, the inequality ${\lambda _{j}}-\lambda {\mu _{j}}>0$ holds true for all ${k}^{\ast }+1\le j\le n+d$. Therefore, there exists an $(n+d-{k}^{\ast })$-dimensional subspace of ${\mathbb{R}}^{n+d-{k_{\ast }}+1}$ where the quadratic form $D(\lambda )$ is positive definite. According to the proof of Sylvester’s law of inertia, there is no subspace of dimension ${k}^{\ast }-{k_{\ast }}+2=(n+d-{k_{\ast }}+1)-(n+d-{k}^{\ast })+1$ where the quadratic form $D(\lambda )$ is negative semidefinite. Thus, ${\nu _{{k}^{\ast }+1}}$ is the least $\lambda \ge 0$ such that there exists a $({k}^{\ast }-{k_{\ast }}+2)$-dimensional subspace where the quadratic form $D(\lambda )$ is negative semidefinite.
Similarly to the chain of equalities in the proof of Proposition 7.4,
(78)
\[\begin{aligned}{}{\nu _{{k}^{\ast }+1}}& =\min \big\{\lambda \ge 0:\text{``}\exists {V_{1}},\hspace{0.2778em}\dim {V_{1}}={k}^{\ast }-{k_{\ast }}+2\hspace{0.2778em}:\hspace{0.2778em}D(\lambda ){|_{{V_{1}}}}\le 0\text{''}\big\}\\{} & =\min \big\{\lambda \ge 0:\text{``}\exists {V_{1}},\hspace{0.2778em}\dim {V_{1}}={k}^{\ast }-{k_{\ast }}+2\hspace{0.2778em}:\\{} & \hspace{2em}{[{u_{{k_{\ast }}}},\dots ,{u_{n+d}}]}^{\top }\big({C}^{\top }C-\lambda \varSigma \big)[{u_{{k_{\ast }}}},\dots ,{u_{n+d}}]{|_{{V_{1}}}}\le 0\text{''}\big\}\\{} & =\min \big\{\lambda \ge 0:\text{``}\exists V\subset \operatorname{span}\langle {u_{{k_{\ast }}}},\dots ,{u_{n+d}}\rangle ,\hspace{0.2778em}\dim V={k}^{\ast }-{k_{\ast }}+2\hspace{0.2778em}:\\{} & \hspace{2em}\big({C}^{\top }C-\lambda \varSigma \big){|_{V}}\le 0\text{''}\big\}.\end{aligned}\]
The restriction of the quadratic form ${C}^{\top }C-{\nu _{k}}\varSigma $ to the subspace $\operatorname{span}\langle {u_{{k_{\ast }}}},\dots ,{u_{n+d}}\rangle $ is positive semidefinite because ${[{u_{{k_{\ast }}}},\dots ,{u_{n+d}}]}^{\top }({C}^{\top }C-{\nu _{k}}\varSigma )\times [{u_{{k_{\ast }}}},\dots ,{u_{n+d}}]=D({\nu _{k}})$ is a positive semidefinite diagonal matrix. Then
(79)
\[\begin{aligned}{}& \big\{v\in \operatorname{span}\langle {u_{{k_{\ast }}}},\dots ,{u_{n+d}}\rangle :{v}^{\top }\big({C}^{\top }C-{\nu _{k}}\varSigma \big)v\le 0\big\}\\{} & \hspace{1em}=\big\{v\in \operatorname{span}\langle {u_{{k_{\ast }}}},\dots ,{u_{n+d}}\rangle :\big({C}^{\top }C-{\nu _{k}}\varSigma \big)v=0\big\}\end{aligned}\]
is a linear subspace. Since this subspace contains the subspace ${V_{{k}^{\ast }}}\cap \operatorname{span}\langle {u_{{k_{\ast }}}},\dots ,{u_{n+d}}\rangle $ (as the quadratic form ${C}^{\top }C-{\nu _{k}}\varSigma ={C}^{\top }C-{\nu _{{k}^{\ast }}}\varSigma $ is negative semidefinite on ${V_{{k}^{\ast }}}$) and the vector ${u_{k}}$ (as ${u_{k}}\in \operatorname{span}\langle {u_{{k_{\ast }}}},\dots ,{u_{n+d}}\rangle $ and ${u_{k}^{\top }}({C}^{\top }C-{\nu _{k}}\varSigma ){u_{k}^{}}={\lambda _{k}}-{\nu _{k}}{\mu _{k}}=0$), it contains $\operatorname{span}\langle {V_{{k}^{\ast }}}\cap \operatorname{span}\langle {u_{{k_{\ast }}}},\dots ,{u_{n+d}}\rangle ,\hspace{0.2222em}{u_{k}}\rangle $. But, as ${\nu _{k}}<{\nu _{{k}^{\ast }+1}}$, this contradicts (78).
Now, prove that
(80)
\[ \operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle \subset \operatorname{span}\langle {u_{1}},\dots ,{u_{{d}^{\ast }}}\rangle .\]
Due to (77),
\[ \operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle =\operatorname{span}\big\langle \operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle \cap \operatorname{span}\langle {u_{{d_{\ast }}}},\dots ,{u_{n+d}}\rangle ,\hspace{0.2222em}{u_{1}},\dots ,{u_{{d_{\ast }}-1}}\big\rangle .\]
Hence, to prove (80), it is enough to show that
(81)
\[ \operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle \cap \operatorname{span}\langle {u_{{d_{\ast }}}},\dots ,{u_{n+d}}\rangle \subset \operatorname{span}\langle {u_{{d_{\ast }}}},\dots ,{u_{{d}^{\ast }}}\rangle .\]
The restriction of the quadratic form ${C}^{\top }C-{\nu _{d}}\varSigma $ to the subspace $\operatorname{span}\langle {u_{{d_{\ast }}}},\dots ,{u_{n+d}}\rangle $ is positive semidefinite. Hence
(82)
\[\begin{aligned}{}& \big\{v\in \operatorname{span}\langle {u_{{d_{\ast }}}},\dots ,{u_{n+d}}\rangle :{v}^{\top }\big({C}^{\top }C-{\nu _{d}}\varSigma \big)v\le 0\big\}\\{} & \hspace{1em}=\big\{v\in \operatorname{span}\langle {u_{{d_{\ast }}}},\dots ,{u_{n+d}}\rangle :{v}^{\top }\big({C}^{\top }C-{\nu _{d}}\varSigma \big)v=0\big\}\end{aligned}\]
is a linear subspace (see equation (79)). This subspace contains the subspaces $\operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle \cap \operatorname{span}\langle {u_{{d_{\ast }}}},\dots ,{u_{n+d}}\rangle $ and $\operatorname{span}\langle {u_{{d_{\ast }}}},\dots ,{u_{{d}^{\ast }}}\rangle $. Denote the dimension of the subspace (82):
\[ {d_{2}}=\dim \big\{v\in \operatorname{span}\langle {u_{{d_{\ast }}}},\dots ,{u_{n+d}}\rangle :{v}^{\top }\big({C}^{\top }C-{\nu _{d}}\varSigma \big)v=0\big\}.\]
If (81) does not hold, then ${d_{2}}>{d}^{\ast }-{d_{\ast }}+1$, and hence ${d_{2}}\ge {d}^{\ast }-{d_{\ast }}+2$. Then
\[ \exists V\subset \operatorname{span}\langle {u_{{d_{\ast }}}},\dots {u_{n+d}}\rangle ,\hspace{0.2778em}\dim V={d_{2}}\hspace{0.2778em}:\hspace{0.2778em}\big({C}^{\top }C-{\nu _{d}}\varSigma \big){|_{V}}\le 0\]
(as an instance of such a subspace V, we can take the one defined in (82)). Then, taking a $({d}^{\ast }-{d_{\ast }}+2)$-dimensional subspace of V, we get
\[ \exists V\subset \operatorname{span}\langle {u_{{d_{\ast }}}},\dots {u_{n+d}}\rangle ,\hspace{0.2778em}\dim V={d}^{\ast }-{d_{\ast }}+2\hspace{0.2778em}:\hspace{0.2778em}\big({C}^{\top }C-{\nu _{d}}\varSigma \big){|_{V}}\le 0.\]
Due to (78) (applied with $k=d$), ${\nu _{{d}^{\ast }+1}}\le {\nu _{d}}$, which does not hold true. This contradiction proves (81) and, therefore, (80).
Sufficiency. Remember that $T=[{u_{1}},\dots ,{u_{n+d}}]$ is an $(n+d)\times (n+d)$ matrix of generalized eigenvectors of the matrix pencil $\langle {C}^{\top }C,\hspace{0.1667em}\varSigma \rangle $, and the respective generalized eigenvalues are arranged in ascending order. By means of linear operations on the columns, the matrix ${T}^{-1}{\widehat{X}_{\mathrm{ext}}}$ can be transformed into the reduced column echelon form. In other words, there exists such a $d\times d$ nonsingular matrix ${T_{8}}$ that the $(n+d)\times d$ matrix
(83)
\[ {T_{5}}={T}^{-1}{\widehat{X}_{\mathrm{ext}}}{T_{8}}\]
is in the reduced column echelon form. The equality (83) implies that
(84)
\[ \operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle =\operatorname{span}\langle T{T_{5}}\rangle .\]
If condition (37) holds, then in representation (84) the matrix ${T_{5}}$ has the following block structure:
\[ {T_{5}}=\left[\begin{array}{c@{\hskip10.0pt}c}{I_{{d_{\ast }}-1}}& 0\\{} 0& {T_{61}}\\{} 0& 0\end{array}\right],\]
where ${T_{61}}$ is a $({d}^{\ast }-{d_{\ast }}+1)\times (d-{d_{\ast }}+1)$ reduced column echelon matrix. (Any of the blocks except ${T_{61}}$ may be an “empty matrix”.)
Since the columns of ${T_{5}}$ are linearly independent, the columns of ${T_{61}}$ are linearly independent as well. Hence the matrix ${T_{61}}$ may be appended with columns such that the resulting matrix ${T_{6}}=[{T_{61}},{T_{62}}]$ is nonsingular. Perform the Gram–Schmidt orthogonalization of the columns of the matrix ${T_{6}}$ by constructing such an upper-triangular matrix ${T_{7}}$ (whose upper-left $(d-{d_{\ast }}+1)\times (d-{d_{\ast }}+1)$ block is denoted ${T_{71}}$) that ${T_{7}^{\top }}{T_{6}^{\top }}{T_{6}^{}}{T_{7}^{}}={I_{{d}^{\ast }-{d_{\ast }}+1}}$.
Change the basis in the simultaneous diagonalization of the matrices ${C}^{\top }C$ and Σ. Denote
\[ {T_{\mathrm{new}}}=\big[{u_{1}},\dots {u_{{d_{\ast }}-1}},[{u_{{d_{\ast }}}},\dots {u_{{d}^{\ast }}}]{T_{6}}{T_{7}},{u_{{d}^{\ast }+1}},\dots {u_{n+d}}\big].\]
If ${\nu _{d}}>0$, the equation (35) with ${T_{\mathrm{new}}}$ substituted for T holds true, since
\[ {T_{\mathrm{new}}^{\top }}{C}^{\top }C{T_{\mathrm{new}}^{}}=\varLambda ,\hspace{2em}{T_{\mathrm{new}}^{\top }}\varSigma {T_{\mathrm{new}}^{}}=\mathrm{M}.\]
(Here we use that ${\lambda _{{d_{\ast }}}}=\cdots ={\lambda _{{d}^{\ast }}}$, ${\mu _{{d_{\ast }}}}=\cdots ={\mu _{{d}^{\ast }}}$. If ${\nu _{d}}=0$, then the latter equation may or may not hold true.) The subspace
\[\begin{aligned}{}\operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle =\operatorname{span}\langle T{T_{5}}\rangle & =\operatorname{span}\big\langle {u_{1}},\dots {u_{{d_{\ast }}-1}},[{u_{{d_{\ast }}}},\dots {u_{{d}^{\ast }}}]{T_{61}}\big\rangle \\{} & =\operatorname{span}\big\langle {u_{1}},\dots {u_{{d_{\ast }}-1}},[{u_{{d_{\ast }}}},\dots {u_{{d}^{\ast }}}]{T_{61}}{T_{71}}\big\rangle \end{aligned}\]
is spanned by the first d columns of the matrix ${T_{\mathrm{new}}}$.
It can be easily verified that $\operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}^{\top }}{C}^{\top }\rangle =\operatorname{span}\langle {T_{8}^{\top }}{T_{5}^{\top }}\varLambda \rangle $ and $\operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}^{\top }}\varSigma \rangle =\operatorname{span}\langle {T_{8}^{\top }}{T_{5}^{\top }}\mathrm{M}\rangle $. The condition $\operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}^{\top }}{C}^{\top }\rangle \subset \operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}^{\top }}\varSigma \rangle $ holds true if (and only if) ${\nu _{d}}<\infty $. Thus, due to Proposition 7.2, if the condition ${\nu _{d}}<\infty $ holds true, then the constraints $\Delta \hspace{0.1667em}(I-{P_{\varSigma }})=0$ and $(C-\Delta ){\widehat{X}_{\mathrm{ext}}}=0$ are compatible.
Let ${\Delta _{\mathrm{pm}}}$ be a common point of minimum in
\[ {\lambda _{k+m-d}}\big({\Delta _{\mathrm{pm}}}{\varSigma }^{\dagger }{\Delta _{\mathrm{pm}}^{\top }}\big)=\underset{\begin{array}{c}\Delta (I-{P_{\varSigma }})=0\\{} (C-\Delta ){\widehat{X}_{\mathrm{ext}}}=0\end{array}}{\min }{\lambda _{k+m-d}}\big(\Delta {\varSigma }^{\dagger }{\Delta }^{\top }\big)\]
for all $k=1,\dots ,d$, such that ${\Delta _{\mathrm{pm}}}\hspace{0.1667em}(I-{P_{\varSigma }})=0$ and $(C-{\Delta _{\mathrm{pm}}}){\widehat{X}_{\mathrm{ext}}}=0$; such ${\Delta _{\mathrm{pm}}}$ exists due to Remark 7.4-1. By Proposition 7.5,
\[ {\lambda _{k+m-d}}\big({\Delta _{\mathrm{pm}}}{\varSigma }^{\dagger }{\Delta _{\mathrm{pm}}^{\top }}\big)={\nu _{k}},\hspace{1em}k=1,\dots ,d,\]
and, from the proof of Proposition 7.6,
\[ {\lambda _{i}}\big({\Delta _{\mathrm{pm}}}{\varSigma }^{\dagger }{\Delta _{\mathrm{pm}}^{\top }}\big)=0,\hspace{1em}i=1,\dots ,m-d.\]
The minimum in (7) is attained at $\Delta ={\Delta _{\mathrm{pm}}}$. □
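In the generic case ${\nu _{1}}<\cdots <{\nu _{d+1}}$ the characterization reduces to $\operatorname{span}\langle {\widehat{X}_{\mathrm{ext}}}\rangle =\operatorname{span}\langle {u_{1}},\dots ,{u_{d}}\rangle $. The sketch below (NumPy/SciPy; illustrative data, nonsingular Σ assumed) checks that the attained Frobenius value depends only on the span of ${\widehat{X}_{\mathrm{ext}}}$ and increases when the span is moved off the first d generalized eigenvectors.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(5)
m, n, d = 8, 3, 2                                # illustrative sizes (assumption)
C = rng.standard_normal((m, n + d))
S = rng.standard_normal((n + d, n + d))
Sigma = S @ S.T + np.eye(n + d)
w, V = np.linalg.eigh(Sigma)
Sigma_inv_half = V @ np.diag(w ** -0.5) @ V.T

nu, T = eigh(C.T @ C, Sigma)                     # generalized eigenpairs, nu ascending

def frobenius_value(X):
    Delta = C @ X @ np.linalg.inv(X.T @ Sigma @ X) @ X.T @ Sigma
    return np.linalg.norm(Delta @ Sigma_inv_half, 'fro')

X_opt = T[:, :d]                                 # span<u_1,...,u_d>
R = rng.standard_normal((d, d))                  # nonsingular a.s.; same column span
print(np.isclose(frobenius_value(X_opt), frobenius_value(X_opt @ R)))   # only the span matters
X_other = T[:, 1:d + 1]                          # replaces u_1 by u_{d+1}
print(frobenius_value(X_other) > frobenius_value(X_opt))                # strictly larger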
Proof of Proposition 7.10.
Remember that if ${\nu _{d}}<\infty $, then the constraints in (11) are compatible, and the minimum is attained and is equal to ${\nu _{d}}$; see Proposition 7.5. Otherwise, if ${\nu _{d}}=\infty $, then the constraints in (11) are incompatible.
Transform the expression for the functional (38):
(85)
\[\begin{aligned}{}{Q_{1}}(X)& :={\lambda _{\max }}\big({\big({X}^{\top }\varSigma X\big)}^{-1}{X}^{\top }{C}^{\top }CX\big)\\{} & ={\lambda _{\max }}\big(CX{\big({X}^{\top }\varSigma X\big)}^{-1}{X}^{\top }{C}^{\top }\big)\\{} & =\underset{{\Delta _{1}}\in {\mathbb{R}}^{m\times (n+d)}\hspace{0.1667em}:\hspace{0.1667em}{\Delta _{1}}(I-{P_{\varSigma }})=0,\hspace{0.1667em}(C-{\Delta _{1}})X=0}{\min }{\lambda _{\max }}\big({\Delta _{1}^{}}{\varSigma }^{\dagger }{\Delta _{1}^{\top }}\big).\end{aligned}\]
Here we used the fact that the nonzero eigenvalues of a product of two matrices do not change when the factors are swapped, and we also used Propositions 7.2 and 7.3. By Proposition 7.5, ${Q_{1}}(X)\ge {\nu _{d}}$.
If the minimum in (11) and (8) is attained (say at some point $(\Delta ,{\widehat{X}_{\mathrm{ext}}})$), then the constraints in the right-hand side of (85) are compatible for $X={\widehat{X}_{\mathrm{ext}}}$ (in particular, Δ is a matrix that satisfies them). Then by Proposition 7.3 the matrix ${\widehat{X}_{\mathrm{ext}}^{\top }}\varSigma {\widehat{X}_{\mathrm{ext}}^{}}$ is nonsingular. Thus, for $X={\widehat{X}_{\mathrm{ext}}}$, the minimum in the right-hand side of (85) is attained at ${\Delta _{1}}=\Delta $ (because Δ satisfies the stronger constraints of (85) and minimizes the same functional under the weaker constraints of (11)).
Hence,
\[\begin{aligned}{}{Q_{1}}({\widehat{X}_{\mathrm{ext}}})& =\underset{{\Delta _{1}}\in {\mathbb{R}}^{m\times (n+d)}\hspace{0.1667em}:\hspace{0.1667em}{\Delta _{1}}(I-{P_{\varSigma }})=0,\hspace{0.1667em}(C-{\Delta _{1}}){\widehat{X}_{\mathrm{ext}}}=0}{\min }{\lambda _{\max }}\big({\Delta _{1}^{}}{\varSigma }^{\dagger }{\Delta _{1}^{\top }}\big)\\{} & ={\lambda _{\max }}\big(\Delta {\varSigma }^{\dagger }{\Delta }^{\top }\big)={\nu _{d}},\end{aligned}\]
which is the minimum value of ${Q_{1}}$.
Transform the expression for the functional (39):
\[\begin{aligned}{}& {\lambda _{\max }}\big({\big({X}^{\top }\varSigma X\big)}^{-1}{X}^{\top }\big({C}^{\top }C-m\varSigma \big)X\big)\\{} & \hspace{1em}={\lambda _{\max }}\big({\big({X}^{\top }\varSigma X\big)}^{-1}{X}^{\top }\big({C}^{\top }C\big)X-m{I_{n+d}}\big)={Q_{1}}(X)-m.\end{aligned}\]
Hence, the functionals (38) and (39) attain their minimal values at the same points. □
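A numerical sketch of the last argument (NumPy/SciPy; illustrative data, nonsingular Σ assumed): the functional (38) is bounded below by ${\nu _{d}}$ and attains this bound at the first d generalized eigenvectors of the pencil $\langle {C}^{\top }C,\varSigma \rangle $.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(6)
m, n, d = 8, 3, 2                                # illustrative sizes (assumption)
C = rng.standard_normal((m, n + d))
S = rng.standard_normal((n + d, n + d))
Sigma = S @ S.T + np.eye(n + d)

nu, T = eigh(C.T @ C, Sigma)                     # ascending generalized eigenvalues

def Q1(X):
    # largest generalized eigenvalue of the pencil <X'C'CX, X'Sigma X>
    return eigh(X.T @ C.T @ C @ X, X.T @ Sigma @ X, eigvals_only=True)[-1]

X_rand = rng.standard_normal((n + d, d))
print(Q1(X_rand) >= nu[d - 1] - 1e-9)            # Q1(X) >= nu_d for any full-rank X
print(np.isclose(Q1(T[:, :d]), nu[d - 1]))       # equality at the first d generalized eigenvectors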
9 Conclusion
The linear errors-in-variables model is considered. The errors are assumed to have the same covariance matrix for each observation and to be independent between different observations; however, some variables may be observed without errors. Detailed proofs of the consistency theorems for the TLS estimator, which were first stated in [18], are presented.
It is proved that the final estimator $\widehat{X}$ for explicit-notation regression coefficients (i.e., for ${X_{0}}$ in (1) or (2), and not the estimator ${\widehat{X}_{\mathrm{ext}}}$ for ${X_{\mathrm{ext}}^{0}}$ in equation (3), which sets the relationship between the regressors and response variables implicitly) is unique, either with high probability or eventually. This means that, in the classification used in [8], the TLS problem is of the 1st class, set ${\mathcal{F}_{1}}$ (the solution is unique and “generic”), with high probability or eventually.
As a by-product, we get that if in the definition of the estimator the Frobenius norm is replaced by the spectral norm, then the consistency theorems still hold true. The disadvantage of using the spectral norm is that the estimator $\widehat{X}$ is then not unique. (The set of solutions to the minimal spectral norm problem contains the set of solutions to the TLS problem. On the other hand, it is possible that the minimal spectral norm problem has solutions while the TLS problem has none – this is the TLS problem of the 1st class, set ${\mathcal{F}_{3}}$; the probability of this random event tends to 0.)
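The simultaneous-minimization property behind this remark can be illustrated numerically (NumPy/SciPy; illustrative data, nonsingular Σ so that the weighted problem reduces to an Eckart–Young–Mirsky-type approximation): the same ${\Delta _{\mathrm{em}}}$ is optimal for the Frobenius, spectral, and nuclear norms among perturbations satisfying the rank constraint.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(7)
m, n, d = 8, 3, 2                                # illustrative sizes (assumption)
C = rng.standard_normal((m, n + d))
S = rng.standard_normal((n + d, n + d))
Sigma = S @ S.T + np.eye(n + d)
w, V = np.linalg.eigh(Sigma)
Sigma_inv_half = V @ np.diag(w ** -0.5) @ V.T

nu, T = eigh(C.T @ C, Sigma)
X = T[:, :d]
Delta_em = C @ X @ np.linalg.inv(X.T @ Sigma @ X) @ X.T @ Sigma

# a random perturbation satisfying the rank constraint rk(C - Delta) <= n
Delta = C - rng.standard_normal((m, n)) @ rng.standard_normal((n, n + d))

for ord_ in ('fro', 2, 'nuc'):                   # three unitarily invariant norms
    print(np.linalg.norm(Delta_em @ Sigma_inv_half, ord_)
          <= np.linalg.norm(Delta @ Sigma_inv_half, ord_) + 1e-9)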
The results can be generalized to any unitarily invariant matrix norm. I do not know whether they hold true for non-invariant norms such as the maximum absolute entry, which is studied in [7].
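For completeness, a hedged simulation sketch of the estimator itself (Python/NumPy; the model sizes, the noise level, and the classical SVD formula for equal error variances Σ proportional to the identity are illustrative assumptions, not the general estimator studied above): the TLS estimate approaches the true coefficient matrix as the number of observations grows.

import numpy as np

rng = np.random.default_rng(8)
n, d, m = 3, 2, 5000                              # illustrative sizes (assumption)
X0 = rng.standard_normal((n, d))                  # true coefficients
A0 = rng.standard_normal((m, n))                  # true regressors
A = A0 + 0.1 * rng.standard_normal((m, n))        # regressors observed with error
B = A0 @ X0 + 0.1 * rng.standard_normal((m, d))   # responses observed with error

# classical TLS solution for equal error variances, via the SVD of [A B]
_, _, Vt = np.linalg.svd(np.hstack([A, B]))
V = Vt.T
V12, V22 = V[:n, n:], V[n:, n:]                   # blocks of the d trailing right singular vectors
X_tls = -V12 @ np.linalg.inv(V22)
print(np.max(np.abs(X_tls - X0)))                 # should be small for large m (consistency)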