Modern Stochastics: Theory and Applications


Modeling temporally uncorrelated components of complex-valued stationary processes
Volume 8, Issue 4 (2021), pp. 475–508
Niko Lietzén, Lauri Viitasaari, Pauliina Ilmonen

https://doi.org/10.15559/21-VMSTA190
Pub. online: 10 November 2021      Type: Research Article      Open Access

Received: 4 June 2021
Revised: 6 October 2021
Accepted: 6 October 2021
Published: 10 November 2021

Abstract

A complex-valued linear mixture model is considered for discrete weakly stationary processes. The aim is to recover latent components of interest that have undergone a linear mixing. The asymptotic properties of a classical unmixing estimator, based on the simultaneous diagonalization of the covariance matrix and an autocovariance matrix with lag τ, are studied. The main contributions are asymptotic results that can be applied to a large class of processes. In the related literature, the processes are typically assumed to have weak correlations; this class is extended here, and the unmixing estimator is considered under stronger dependency structures. In particular, the asymptotic behavior of the unmixing estimator is derived for both long- and short-range dependent complex-valued processes. Consequently, the theory covers unmixing estimators that converge slower than the usual rate $\sqrt{T}$ and unmixing estimators with non-Gaussian asymptotic distributions. The presented methodology is a powerful preprocessing tool and highly applicable in several fields of statistics.

1 Introduction

In modern statistics, there is an increasing demand for theoretically solid methodology that can be applied beyond standard real-valued data, in the realm of more exotic data structures. In this paper, we consider a complex-valued linear mixture model based on temporally uncorrelated components, in the setting of discrete weakly stationary processes. We aim to find latent processes of interest when only linear mixtures of them are observable. The recovery of the latent processes is referred to as the unmixing procedure. The properties of the recovered processes can themselves be of interest, or alternatively, the approach can be used to reduce multivariate models into several univariate models, which can ease the modeling burden.
In the context of linear mixture models, the assumption of uncorrelated components, or of stronger conditions that imply uncorrelated components, is considered natural in several applications, for example, in finance and signal processing, see the article [14] and the book [10]. Applications where the observed time series are naturally complex-valued are frequent in, e.g., signal processing and biomedical applications, see [19, 41]. In such applications, the interest can lie in the shapes of the unobservable signals, such as the shapes of three-dimensional image-valued signals, which correspond to different parts of the brain under different activities. In the signal processing literature, the problem is often referred to as the blind source separation (BSS) problem. In the BSS literature, it has been argued that discarding the special complex-valued structure can lead to loss of information, see [1]. Thus, it is often beneficial to work natively with complex-valued signals in such applications. We wish to emphasize that our ${\mathbb{C}^{d}}$-valued model does not directly correspond to existing ${\mathbb{R}^{2d}}$-valued models, e.g., the one in [23]. In ${\mathbb{R}^{2d}}$-valued BSS, it is assumed that all $2d$ components are uncorrelated. In our model, the real part and the corresponding imaginary part of a component are allowed to be correlated. For a collection of BSS applications, see [10].
In parallel to the signal processing community, linear mixture models, with similar model assumptions as in BSS literature, have recently received notable attention in finance, see for example [14, 22]. In financial applications, the term blind source separation is rarely used, and usually more descriptive model naming conventions are utilized. Note that our complex-valued approach is highly applicable in many real-valued financial applications. In our approach, we assume little concerning the relationship between the real and imaginary parts of a single component. Thus, from the real-valued perspective, the problem can be equivalently considered as modeling real-valued temporally uncorrelated pairs, where the elements contained in a single pair are not necessarily uncorrelated. However, transforming this complex-valued problem into a real-valued counterpart is often unfeasible, as it usually introduces additional constraints that would be otherwise embedded within the complex-valued formulation. Once the temporally uncorrelated components are recovered, one can then, for example, model volatilities bivariately (or univariately) or assess risk by modeling tail behavior with the tools of bivariate (or univariate) extreme value theory. Moreover, our approach is natural in applications where a single observation is vector valued, that is, the observations have both a magnitude and a direction, e.g., modeling the latent components of wind at an airport.
The main focus of this paper is on the asymptotic behavior of a classical unmixing estimator. We consider an algorithm that is identical to the so-called Algorithm for Multiple Unknown Signals Extraction (AMUSE), [35]. On the financial side, asymptotic properties of an alternative approach, with slightly differing model assumptions compared to ours, are given in, e.g., [14]. In the context of real-valued BSS, asymptotics have been considered in, e.g., [23–25]. Compared to the above-mentioned approaches, we consider a substantially wider class of processes. In particular, we analyze the asymptotic behavior of the corresponding estimators under both long- and short-range dependent complex-valued processes. Furthermore, we take a semiparametric approach, that is, our theory is not limited to a specific parametric family of distributions. As a pinnacle of the novel theory presented in this paper, we consider square-integrable processes without summable autocovariance structures, which causes the limiting distribution of the corresponding unmixing estimator to be non-Gaussian. Instead of using the classical central limit theorem, we take a modern approach and utilize general central limit theorem type results that are also applicable to processes with strong temporal dependencies. Moreover, we consider convergence rates that differ from the usual $\sqrt{T}$. We wish to emphasize that modeling long-range dependent processes, that is, processes without summable autocovariance structures, is of paramount importance in many financial applications.
The paper is structured as follows. In Section 2, we recall some basic theory regarding complex-valued random variables. In Section 3, we present the temporally uncorrelated components mixing model. In Section 4, we formulate the estimation procedure and study the asymptotic properties of the estimators. In Section 5, we consider a data example involving photographs, which can be presented as complex-valued time series. The proofs for the theoretical results are presented in the supplementary Appendix.

2 Random variables in the complex plane

Throughout, let $(\Omega ,\mathcal{F},\mathbb{P})$ be a common probability space and let ${\textbf{z}_{\text{•}}}:={({\textbf{z}_{t}})_{t\in \mathbb{N}}}$ be a collection of random variables ${\textbf{z}_{t}}:\Omega \to {\mathbb{C}^{d}}$, $t\in \mathbb{N}=\{1,2,\dots \}$, where the dimension parameter d is a finite constant. Furthermore, let ${z_{t}^{(k)}}={\text{pr}_{k}}\circ {\textbf{z}_{t}}$, where ${\text{pr}_{k}}$ is a projection to the kth complex coordinate. We refer to the process ${z_{\text{•}}^{(k)}}:={({z_{t}^{(k)}})_{t\in \mathbb{N}}}$ as the kth component of the ${\mathbb{C}^{d}}$-valued stochastic process ${\textbf{z}_{\text{•}}}$. Note that the components can be expressed in the form ${z_{t}^{(k)}}={a_{t}^{(k)}}+i{b_{t}^{(k)}}$, where ${a_{t}^{(k)}}$, ${b_{t}^{(k)}}$ are real-valued $\forall k\in \left\{1,2,\dots ,d\right\}$ and i is the imaginary unit. We denote the complex conjugate of ${\textbf{z}_{t}}$ as ${\textbf{z}_{t}^{\ast }}$ and we denote the conjugate transpose of ${\textbf{z}_{t}}$ as ${\textbf{z}_{t}^{\text{H}}}:={({\textbf{z}_{t}^{\ast }})^{\top }}$. Furthermore, we use $\mathcal{B}({\mathbb{C}^{d}})$ and $\mathcal{B}({\mathbb{R}^{2d}})$ to denote the Borel-sigma algebras on ${\mathbb{C}^{d}}$ and ${\mathbb{R}^{2d}}$, respectively.
We use the following notation for the multivariate mean and the (unsymmetrized) autocovariance matrix with lag $\tau \in \left\{0\right\}\cup \mathbb{N}$:
\[\begin{aligned}{}& {\boldsymbol{\mu }_{{\textbf{z}_{t}}}}=\mathbb{E}\left[{\textbf{z}_{t}}\right]\in {\mathbb{C}^{d}},\hspace{1em}{\ddot{\textbf{S}}_{\tau }}\left[{\textbf{z}_{t}}\right]=\mathbb{E}\left[\left({\textbf{z}_{t}}-{\boldsymbol{\mu }_{{\textbf{z}_{t}}}}\right){\left({\textbf{z}_{t+\tau }}-{\boldsymbol{\mu }_{{\textbf{z}_{t+\tau }}}}\right)^{\text{H}}}\right]\in {\mathbb{C}^{d\times d}}.\end{aligned}\]
Furthermore, we use the notation ${\textbf{S}_{\tau }}\left[{\textbf{z}_{t}}\right]=\frac{1}{2}[{\ddot{\textbf{S}}_{\tau }}[{\textbf{z}_{t}}]+{({\ddot{\textbf{S}}_{\tau }}[{\textbf{z}_{t}}])^{\text{H}}}]\in {\mathbb{C}^{d\times d}}$ for the symmetrized autocovariance matrix with lag $\tau \in \left\{0\right\}\cup \mathbb{N}$.
In the case of univariate stochastic processes, we use ${\ddot{\text{S}}_{\tau }}[{x_{t}},{y_{s}}]$ to denote ${\ddot{\text{S}}_{\tau }}[{x_{t}},{y_{s}}]=\mathbb{E}\left[({x_{t}}-{\mu _{{x_{t}}}}){({y_{s+\tau }}-{\mu _{{y_{s+\tau }}}})^{\ast }}\right]$. Note that the unsymmetrized autocovariance matrix ${\ddot{\textbf{S}}_{\tau }}$ is not necessarily conjugate symmetric, i.e., Hermite symmetric or Hermitian, and it can have complex-valued diagonal entries. In contrast, the symmetrized autocovariance matrix ${\textbf{S}_{\tau }}$ is always conjugate symmetric, that is, ${\textbf{S}_{\tau }}={\textbf{S}_{\tau }^{\text{H}}}$, and the diagonal entries of the symmetrized autocovariance matrix are by definition real-valued.
In the literature, the definition of stationarity for complex-valued stochastic processes varies. In this paper, we use the following definition.
Definition 1.
The ${\mathbb{C}^{d}}$-valued process ${\textbf{z}_{\text{•}}}:={\left({\textbf{z}_{t}}\right)_{t\in \mathbb{N}}}$ is weakly stationary if the components of ${\textbf{z}_{\text{•}}}$ are square-integrable and if for any $\tau \in \left\{0\right\}\cup \mathbb{N}$ and for any pair $\left(u,v\right)\in \mathbb{N}\times \mathbb{N}$, we have that $\mathbb{E}[{\textbf{z}_{u}}]=\mathbb{E}[{\textbf{z}_{v}}]$ and ${\ddot{\textbf{S}}_{\tau }}[{\textbf{z}_{u}}]={\ddot{\textbf{S}}_{\tau }}[{\textbf{z}_{v}}]$.
Let $\textbf{Z}:={\left({\textbf{Z}_{j}}\right)_{j\in \mathcal{T}}}$, $\mathcal{T}=\left\{1,2,\dots ,T\right\}$, be a sampled process generated by ${\textbf{z}_{\text{•}}}$, where ${\textbf{Z}_{j}}$ is ${\mathbb{C}^{d}}$-valued for every $j\in \mathcal{T}$. We use the following classical finite sample estimators for the mean vector and the autocovariance matrix with lag $\tau \in \left\{0,1,\dots ,T-1\right\}$:
\[\begin{array}{r@{\hskip0pt}l@{\hskip0pt}r}& \displaystyle \hat{\boldsymbol{\mu }}[\textbf{Z}]=\frac{1}{T}{\sum \limits_{j=1}^{T}}{\textbf{Z}_{j}},\hspace{1em}\hspace{2em}& \displaystyle {\tilde{\textbf{S}}_{\tau }}\left[\textbf{Z}\right]=\frac{1}{T-\tau }{\sum \limits_{j=1}^{T-\tau }}\left({\textbf{Z}_{j}}-\hat{\boldsymbol{\mu }}[\textbf{Z}]\right){\left({\textbf{Z}_{j+\tau }}-\hat{\boldsymbol{\mu }}[\textbf{Z}]\right)^{\text{H}}},\end{array}\]
where for $\tau =0$, the scaling used in ${\tilde{\textbf{S}}_{\tau }}$ is $1/(T-1)$. Furthermore, the symmetrized autocovariance matrix estimator with lag $\tau \in \left\{0,1,\dots ,T-1\right\}$ is defined as ${\hat{\textbf{S}}_{\tau }}\left[\textbf{Z}\right]=\frac{1}{2}[{\tilde{\textbf{S}}_{\tau }}[\textbf{Z}]+{({\tilde{\textbf{S}}_{\tau }}[\textbf{Z}])^{\text{H}}}]$. Note that the symmetrized autocovariance matrix estimator is always conjugate symmetric.
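For concreteness, these estimators can be written in a few lines of code. The following is a minimal sketch in Python with NumPy (not part of the original paper; the array X, with rows as observations, and the toy data are purely illustrative):

```python
import numpy as np

def sample_mean(X):
    """Sample mean of a T x d complex array (rows are observations)."""
    return X.mean(axis=0)

def autocov(X, tau):
    """Unsymmetrized lag-tau autocovariance estimator (tilde S_tau)."""
    T = X.shape[0]
    Xc = X - sample_mean(X)
    denom = (T - 1) if tau == 0 else (T - tau)
    # sum over j of (X_j - mu)(X_{j+tau} - mu)^H, with rows as observations
    return Xc[:T - tau].T @ Xc[tau:].conj() / denom

def sym_autocov(X, tau):
    """Symmetrized lag-tau autocovariance estimator (hat S_tau)."""
    S = autocov(X, tau)
    return 0.5 * (S + S.conj().T)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3)) + 1j * rng.standard_normal((500, 3))
S1 = sym_autocov(X, 1)
print(np.allclose(S1, S1.conj().T))   # True: the symmetrized estimator is Hermitian
```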
Let ${\textbf{y}_{1}}$ and ${\textbf{y}_{2}}$ be ${\mathbb{R}^{d}}$-valued random vectors such that the concatenated random vector, that is, ${\left(\begin{array}{c@{\hskip10.0pt}c}{\textbf{y}_{1}^{\top }}& {\textbf{y}_{2}^{\top }}\end{array}\right)^{\top }}$, follows a ${\mathbb{R}^{2d}}$-valued Gaussian distribution. Then, the ${\mathbb{C}^{d}}$-valued random vector $\textbf{y}={\textbf{y}_{1}}+i{\textbf{y}_{2}}$ follows the ${\mathbb{C}^{d}}$-valued Gaussian distribution with the location parameter ${\boldsymbol{\mu }_{\textbf{y}}}$, the covariance matrix ${\boldsymbol{\Sigma }_{\textbf{y}}}$ and the relation matrix ${\textbf{P}_{\textbf{y}}}$, defined as
\[\begin{aligned}{}& {\boldsymbol{\mu }_{\textbf{y}}}=\mathbb{E}\left[\textbf{y}\right]=\mathbb{E}\left[{\textbf{y}_{1}}\right]+i\mathbb{E}\left[{\textbf{y}_{2}}\right]\in {\mathbb{C}^{d}},\hspace{1em}{\boldsymbol{\Sigma }_{\textbf{y}}}=\mathbb{E}\left[\left(\textbf{y}-{\boldsymbol{\mu }_{\textbf{y}}}\right){\left(\textbf{y}-{\boldsymbol{\mu }_{\textbf{y}}}\right)^{\text{H}}}\right]\in {\mathbb{C}^{d\times d}},\\ {} & {\textbf{P}_{\textbf{y}}}=\mathbb{E}\left[\left(\textbf{y}-{\boldsymbol{\mu }_{\textbf{y}}}\right){\left(\textbf{y}-{\boldsymbol{\mu }_{\textbf{y}}}\right)^{\top }}\right]\in {\mathbb{C}^{d\times d}}.\end{aligned}\]
We distinguish between complex- and real-valued Gaussian distributions by the number of given parameters: ${\mathcal{N}_{d}}({\boldsymbol{\mu }_{y}},\boldsymbol{\Sigma })$ denotes a d-variate real-valued Gaussian distribution and ${\mathcal{N}_{d}}({\boldsymbol{\mu }_{y}},\boldsymbol{\Sigma },\textbf{P})$ denotes a d-variate complex-valued Gaussian distribution.
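As a small simulation sketch (the real covariance matrix below is an arbitrary illustration, not taken from the paper), a ${\mathbb{C}^{d}}$-valued Gaussian sample can be produced through its ${\mathbb{R}^{2d}}$ representation, and the covariance and relation matrices can then be estimated directly from their definitions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 2, 100_000
M = rng.standard_normal((2 * d, 2 * d))
cov_real = M @ M.T                              # covariance of the stacked real vector (y1, y2)
v = rng.multivariate_normal(np.zeros(2 * d), cov_real, size=n)
y = v[:, :d] + 1j * v[:, d:]                    # C^d-valued Gaussian sample

yc = y - y.mean(axis=0)
Sigma = yc.T @ yc.conj() / n                    # covariance matrix E[(y - mu)(y - mu)^H]
P = yc.T @ yc / n                               # relation matrix   E[(y - mu)(y - mu)^T]
print(np.allclose(Sigma, Sigma.conj().T, atol=1e-2))   # Sigma is conjugate symmetric
print(np.allclose(P, P.T, atol=1e-2))                   # P is (complex) symmetric
```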
The classical central limit theorem (CLT) for independent and identically distributed complex-valued random variables is given as follows. Let $\{{\textbf{x}_{1}},{\textbf{x}_{2}},\dots ,{\textbf{x}_{n}}\}$ be a collection of i.i.d. ${\mathbb{C}^{d}}$-vectors with square-integrable components, mean ${\boldsymbol{\mu }_{\textbf{x}}}$, covariance matrix Σ and relation matrix $\textbf{P}$. Then,
\[ \frac{1}{\sqrt{n}}{\sum \limits_{j=1}^{n}}\left({\textbf{x}_{j}}-{\boldsymbol{\mu }_{\textbf{x}}}\right){\xrightarrow[n\to \infty ]{\mathcal{D}}}\textbf{x}\sim {\mathcal{N}_{d}}\left(\textbf{0},\boldsymbol{\Sigma },\textbf{P}\right),\]
where ${\xrightarrow[]{\mathcal{D}}}$ denotes convergence in distribution. We can harness the rich literature on real-valued limiting theorems via the following lemma.
Lemma 1.
Let ${\{{\boldsymbol{v}_{n}}\}_{n\in \mathbb{N}}}$ be a collection of ${\mathbb{R}^{2d}}$-valued random vectors ${\boldsymbol{v}_{j}^{\top }}=\left(\begin{array}{c@{\hskip10.0pt}c}{\boldsymbol{x}_{j}^{\top }}& {\boldsymbol{y}_{j}^{\top }}\end{array}\right)$, where ${\boldsymbol{x}_{j}}$, ${\boldsymbol{y}_{j}}$ are ${\mathbb{R}^{d}}$-valued for every $j\in \mathbb{N}$, and let ${\boldsymbol{v}^{\top }}=\left(\begin{array}{c@{\hskip10.0pt}c}{\boldsymbol{x}^{\top }}& {\boldsymbol{y}^{\top }}\end{array}\right)$. Then, ${\boldsymbol{v}_{n}}{\xrightarrow[n\to \infty ]{\mathcal{D}}}\boldsymbol{v}$ if and only if ${\boldsymbol{x}_{n}}+i{\boldsymbol{y}_{n}}{\xrightarrow[n\to \infty ]{\mathcal{D}}}\boldsymbol{x}+i\boldsymbol{y}$.
Lemma 1 can be, for example, applied to Gaussian vectors in a straightforward manner, see Corollary 1 in the supplementary Appendix.
In our work, we utilize Lemma 1 in order to apply real-valued results for complex-valued scenarios, but we wish to emphasize that it also works in the other way — the complex-valued asymptotic distribution automatically grants the corresponding real-valued limiting distribution. For additional details regarding complex-valued statistics, see, e.g., [13, 15, 31].

3 Temporally uncorrelated components model

In this section, we consider a linear temporally uncorrelated components model for discrete time complex-valued processes. We define the following version of the model.
Definition 2.
The ${\mathbb{C}^{d}}$-process ${\textbf{x}_{\text{•}}}:={\left({\textbf{x}_{t}}\right)_{t\in \mathbb{N}}}$ follows the temporally uncorrelated components mixing model, if
\[ {\textbf{x}_{t}}=\textbf{A}{\textbf{z}_{t}}+{\boldsymbol{\mu }_{\textbf{x}}},\hspace{1em}\forall t\in \mathbb{N},\]
where $\textbf{A}$ is a constant nonsingular ${\mathbb{C}^{d\times d}}$-matrix, ${\boldsymbol{\mu }_{\textbf{x}}}\in {\mathbb{C}^{d}}$ is a constant location parameter and ${\textbf{z}_{\text{•}}}$ is a ${\mathbb{C}^{d}}$-process with components having continuous marginal distributions. In addition, the process ${\textbf{z}_{\text{•}}}$ satisfies the following four conditions for every $t,\kappa \in \mathbb{N}$,
\[\begin{aligned}{}& (1)\hspace{2.5pt}{\boldsymbol{\mu }_{{\textbf{z}_{t}}}}=\textbf{0},\hspace{2em}(2)\hspace{2.5pt}{\textbf{S}_{0}}\left[{\textbf{z}_{t}}\right]={\textbf{I}_{d}},\\ {} & (3)\hspace{2.5pt}{\ddot{\textbf{S}}_{\kappa }}\left[{\textbf{z}_{t}}\right]\hspace{2.5pt}\text{is finite and depends only on}\hspace{2.5pt}\kappa ,\\ {} & (4)\hspace{2.5pt}\exists \tau \in \mathbb{N}:{\textbf{S}_{\tau }}\left[{\textbf{z}_{t}}\right]={\boldsymbol{\Lambda }_{\tau }}=\text{diag}\left({\lambda _{\tau }^{(1)}},\dots ,{\lambda _{\tau }^{(d)}}\right),\end{aligned}\]
such that $+\infty >{\lambda _{\tau }^{(1)}}>{\lambda _{\tau }^{(2)}}>\cdots >{\lambda _{\tau }^{(d)}}>-\infty $.
To improve the fluency of the paper, from here on, the term mixing model is used to refer to the temporally uncorrelated components mixing model. The conditions (1)–(4) of Definition 2 imply that the latent process ${\textbf{z}_{\text{•}}}$ is weakly stationary in the sense of Definition 1. Note that under these model assumptions the concatenated ${\mathbb{R}^{2d}}$-process ${\left(\begin{array}{c@{\hskip10.0pt}c}\text{Re}({\textbf{z}_{\text{•}}^{\top }})& \text{Im}({\textbf{z}_{\text{•}}^{\top }})\end{array}\right)^{\top }}$ is not necessarily weakly stationary in the classical real-valued sense. One could also require that the concatenated vector is stationary, which can be done by adding to the model a condition involving the complex-valued relation matrix, defined in Section 2. However, adding the extra condition would also fix the relationship, up to heterogeneous sign-changes, between the real and imaginary parts inside a single component. In this paper, we do not wish to make assumptions regarding the complex phase of a single component. Consequently, many of the following results involve a so-called phase-shift matrix $\textbf{J}$, that is, a complex-valued diagonal matrix with diagonal entries of the form $\exp (i{\theta _{1}}),\dots ,\exp (i{\theta _{d}})$. A phase-shift matrix $\textbf{J}$ is by definition unitary, i.e., $\textbf{J}{\textbf{J}^{\text{H}}}={\textbf{J}^{\text{H}}}\textbf{J}={\textbf{I}_{d}}$. Note that all possible phase-shift matrices can be constructed by considering phases on some suitable $2\pi $-length interval, e.g., the usual $[0,2\pi )$.
Condition (3) is included in Definition 2 to ensure that we can apply limiting results that require the traditional real-valued weak stationarity. Since Condition (3) is solely used to guarantee stationarity, the only requirements for ${\ddot{\textbf{S}}_{\kappa }}\left[{\textbf{z}_{t}}\right]$ are that it has finite elements and that it is invariant with respect to the time parameter t.
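To make the model concrete, the following sketch in Python with NumPy (not from the paper; the coefficients, the mixing matrix and the location are illustrative choices) generates a toy process obeying Definition 2: independent complex AR(1) components with distinct real coefficients are standardized to unit variance, so that conditions (1)–(4) hold approximately after the burn-in, with ${\lambda _{\tau }^{(k)}}={\phi _{k}^{\tau }}$, and the components are then mixed with a nonsingular matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
T, d, burn = 5000, 3, 500
phi = np.array([0.8, 0.5, 0.2])           # distinct coefficients -> distinct lambda_tau^(k)

# independent complex AR(1) latent components, driven by circular white noise
eps = (rng.standard_normal((T + burn, d))
       + 1j * rng.standard_normal((T + burn, d))) / np.sqrt(2)
z = np.zeros((T + burn, d), dtype=complex)
for t in range(1, T + burn):
    z[t] = phi * z[t - 1] + eps[t]
z = z[burn:] * np.sqrt(1 - phi**2)        # unit-variance components: S_0[z_t] ~ I_d

A = np.array([[2.0, 0.5, 0.3],            # nonsingular mixing matrix (illustrative)
              [0.1, 1.5, 0.4],
              [0.7, 0.2, 1.0]]) + 0.2j
mu_x = np.array([1.0, -2.0, 0.0 + 0.5j])
X = z @ A.T + mu_x                         # observed process x_t = A z_t + mu_x
```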
For a process ${\textbf{x}_{\text{•}}}$ that follows the mixing model, the corresponding unmixing problem is to uncover the latent process ${\textbf{z}_{\text{•}}}$ by using only the information contained in the observable process ${\textbf{x}_{\text{•}}}$.
Definition 3.
Let ${\textbf{x}_{\text{•}}}:={\left({\textbf{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2, such that condition (4) holds for some fixed τ. The affine functional $g:\textbf{a}\mapsto \boldsymbol{\Gamma }(\textbf{a}-\boldsymbol{\mu }):{\mathbb{C}^{d}}\to {\mathbb{C}^{d}}$ is a solution to the corresponding unmixing problem if the process $g\circ {\textbf{x}_{\text{•}}}$ satisfies conditions (1)–(3) and condition (4) is satisfied with the fixed τ.
Given a process ${\textbf{x}_{\text{•}}}$ that follows the mixing model and a corresponding solution $g:\textbf{a}\mapsto \boldsymbol{\Gamma }(\textbf{a}-\boldsymbol{\mu })$, we refer to $\boldsymbol{A}$, see Definition 2, as the mixing matrix and we refer to Γ as the unmixing matrix. The objective of the unmixing matrix is to reverse the effects of the mixing matrix. However, under our model assumptions, we cannot uniquely determine the relationship between the mixing and the unmixing matrix. The relationship between the two matrices is $\boldsymbol{\Gamma }\textbf{A}=\textbf{J}$, where $\textbf{J}\in {\mathbb{C}^{d\times d}}$ is some phase-shift matrix, see Lemma 3 in the supplementary Appendix.
Under our assumptions, we cannot distinguish between solutions that contain unmixing matrices that are a phase-shift away from each other. Thus, we say that two solutions ${g_{1}}:\textbf{a}\mapsto {\boldsymbol{\Gamma }_{1}}(\textbf{a}-{\boldsymbol{\mu }_{1}})$ and ${g_{2}}:\textbf{a}\mapsto {\boldsymbol{\Gamma }_{2}}(\textbf{a}-{\boldsymbol{\mu }_{2}})$ are equivalent, if there exists a phase-shift matrix $\textbf{J}$ such that ${\boldsymbol{\Gamma }_{1}}={\boldsymbol{\Gamma }_{2}}\textbf{J}$.
We next provide a solution procedure for the unmixing problem. Recall that in the mixing model we assume that the mixing matrix $\textbf{A}$ and the latent process ${\textbf{z}_{\text{•}}}$ are unobservable. Therefore, the solution procedure relies solely on statistics of the observable process ${\textbf{x}_{\text{•}}}$.
Theorem 1.
Let ${\boldsymbol{x}_{\text{•}}}:={\left({\boldsymbol{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2. The functional $g:\boldsymbol{a}\mapsto \boldsymbol{\Gamma }(\boldsymbol{a}-\boldsymbol{\mu }):{\mathbb{C}^{d}}\to {\mathbb{C}^{d}}$ is a solution to the corresponding unmixing problem if and only if the eigenvector equation
(1)
\[\begin{aligned}{}& {\left({\boldsymbol{S}_{0}}\left[{\boldsymbol{x}_{t}}\right]\right)^{-1}}{\boldsymbol{S}_{\tau }}\left[{\boldsymbol{x}_{t}}\right]{\boldsymbol{\Gamma }^{\text{H}}}={\boldsymbol{\Gamma }^{\text{H}}}{\boldsymbol{\Lambda }_{\tau }},\hspace{1em}\forall t\in \mathbb{N},\end{aligned}\]
and the scaling equation
(2)
\[\begin{aligned}{}& \boldsymbol{\Gamma }{\boldsymbol{S}_{0}}\left[{\boldsymbol{x}_{t}}\right]{\boldsymbol{\Gamma }^{\text{H}}}={\boldsymbol{I}_{d}},\hspace{1em}\forall t\in \mathbb{N},\end{aligned}\]
are satisfied.
One can apply Theorem 1 to find solutions by first estimating the eigenvalues and eigenvectors of ${\left({\textbf{S}_{0}}\left[{\textbf{x}_{t}}\right]\right)^{-1}}{\textbf{S}_{\tau }}\left[{\textbf{x}_{t}}\right]$ and then scaling the eigenvectors so that the scaling equation is satisfied. In addition, see Corollary 2 in the supplementary Appendix.
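As a numerical illustration of Theorem 1 (a sketch in Python with NumPy; the mixing matrix and the eigenvalues are made-up population quantities, not data from the paper), one can construct ${\textbf{S}_{0}}[{\textbf{x}_{t}}]=\textbf{A}{\textbf{A}^{\text{H}}}$ and ${\textbf{S}_{\tau }}[{\textbf{x}_{t}}]=\textbf{A}{\boldsymbol{\Lambda }_{\tau }}{\textbf{A}^{\text{H}}}$, solve the eigenvector equation (1) and rescale the eigenvectors so that the scaling equation (2) holds:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))  # nonsingular (a.s.)
Lam = np.diag([0.8, 0.5, 0.2])                                      # distinct lambda_tau^(k)

S0 = A @ A.conj().T           # S_0[x_t] = A S_0[z_t] A^H with S_0[z_t] = I_d
Stau = A @ Lam @ A.conj().T   # S_tau[x_t] = A Lambda_tau A^H

# eigenvector equation (1): the columns of Gamma^H are eigenvectors of S_0^{-1} S_tau
evals, V = np.linalg.eig(np.linalg.solve(S0, Stau))
order = np.argsort(evals.real)[::-1]          # decreasing eigenvalue order
evals, V = evals[order], V[:, order]

# scaling equation (2): rescale each eigenvector so that Gamma S_0 Gamma^H = I_d
scale = np.einsum('ij,jk,ki->i', V.conj().T, S0, V).real
Gamma = (V / np.sqrt(scale)).conj().T

print(np.allclose(Gamma @ S0 @ Gamma.conj().T, np.eye(d)))        # True
print(np.round(np.diag(Gamma @ Stau @ Gamma.conj().T).real, 3))   # ~ [0.8, 0.5, 0.2]
```

One can further verify that the product of the recovered Γ and the matrix A is, up to numerical error, a diagonal phase-shift matrix, in line with the relationship $\boldsymbol{\Gamma }\textbf{A}=\textbf{J}$ discussed above.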

4 Estimation and asymptotic properties

The algorithm for multiple unknown signals extraction (AMUSE), [35], is a widely applied blind source separation unmixing procedure. AMUSE and the corresponding asymptotic properties have been previously studied in the real-valued case, see, e.g., [23]. Here, we adopt the estimation part of the AMUSE algorithm. However, our underlying model assumptions are not identical to the ones in [23] and [35]. In this section, we consider the consistency and the limiting distribution of the corresponding unmixing estimator under complex-valued long-range and short-range dependent processes, significantly extending the results given in [23].
Definition 4.
Let ${\textbf{x}_{\text{•}}}:={\left({\textbf{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2 and let $\textbf{X}$ be a ${\mathbb{C}^{T\times d}}$-valued, $1\le d<T<\infty $, sampled stochastic process generated by ${\textbf{x}_{\text{•}}}$. Let $\hat{\boldsymbol{\mu }}:=\hat{\boldsymbol{\mu }}[\textbf{X}]$ and let ${\textbf{1}_{T}}$ be a ${\mathbb{R}^{T}}$-vector full of ones. The mapping $\hat{g}:\textbf{C}\mapsto (\textbf{C}-{\textbf{1}_{T}}{\hat{\boldsymbol{\mu }}^{\top }}){\hat{\boldsymbol{\Gamma }}^{\top }}:{\mathbb{C}^{T\times d}}\to {\mathbb{C}^{T\times d}}$ is a solution to the finite sample unmixing problem, if ${\hat{\lambda }_{\tau }^{(1)}}\ge \cdots \ge {\hat{\lambda }_{\tau }^{(d)}}$, such that
\[ \hat{\boldsymbol{\Gamma }}{\hat{\textbf{S}}_{0}}[\textbf{X}]{\hat{\boldsymbol{\Gamma }}^{\text{H}}}={\textbf{I}_{d}}\hspace{2em}\hspace{2.5pt}\text{and}\hspace{2.5pt}\hspace{2em}\hat{\boldsymbol{\Gamma }}{\hat{\textbf{S}}_{\tau }}[\textbf{X}]{\hat{\boldsymbol{\Gamma }}^{\text{H}}}={\hat{\boldsymbol{\Lambda }}_{\tau }}=\text{diag}\left({\hat{\lambda }_{\tau }^{(1)}},\dots ,{\hat{\lambda }_{\tau }^{(d)}}\right).\]
Recall that ${\hat{\textbf{S}}_{\tau }}$ is the symmetrized autocovariance matrix, and it is, by definition, conjugate symmetric. In addition, recall that the diagonal entries and the eigenvalues of a conjugate symmetric matrix are, again by definition, real-valued. Hereby, the ordering of ${\hat{\lambda }_{\tau }^{(j)}}$ is always well-defined. The diagonal elements (eigenvalues) ${\hat{\lambda }_{\tau }^{(j)}}$ can be seen as finite sample estimates of ${\lambda _{\tau }^{(j)}}$. Even though the population eigenvalues ${\lambda _{\tau }^{(j)}}$ are assumed to be distinct, it is possible, due to, e.g., limited computational accuracy, that ties occur among the estimates ${\hat{\lambda }_{\tau }^{(j)}}$. Hereby, in the finite sample case, strict inequalities are not imposed on the estimates ${\hat{\lambda }_{\tau }^{(j)}}$ in Definition 4.
Using Lemma 4 of the supplementary Appendix, we can straightforwardly implement the unmixing procedure. The first step is to calculate ${\hat{\textbf{S}}_{0}}:={\hat{\textbf{S}}_{0}}[\textbf{X}]$, from a realization $\textbf{X}$, and the corresponding conjugate symmetric inverse square root ${\hat{\boldsymbol{\Sigma }}_{0}}:={\hat{\textbf{S}}_{0}^{-1/2}}$. Recall that, by assumption, the components of ${\textbf{z}_{\text{•}}}$ have continuous marginal distributions and the mixing matrix is nonsingular. Thus, the eigenvalues of covariance matrix estimates are always real-valued and almost surely positive. Hereby, the matrix ${\hat{\boldsymbol{\Sigma }}_{0}}$ can be obtained by estimating the eigendecomposition of ${\hat{\textbf{S}}_{0}}$. The next step is to choose a lag-parameter τ and estimate the eigendecomposition ${\hat{\textbf{S}}_{\tau }}[\textbf{X}{\hat{\boldsymbol{\Sigma }}_{0}^{\top }}]={\hat{\boldsymbol{\Sigma }}_{0}}{\hat{\textbf{S}}_{\tau }}[\textbf{X}]{\hat{\boldsymbol{\Sigma }}_{0}}=\hat{\textbf{V}}{\hat{\boldsymbol{\Lambda }}_{\tau }}{\hat{\textbf{V}}^{\text{H}}}$. The covariance matrix and autocovariance matrix estimators are affine equivariant in the sense that ${\hat{\textbf{S}}_{j}}[\textbf{X}{\textbf{C}^{\top }}]=\textbf{C}{\hat{\textbf{S}}_{j}}[\textbf{X}]{\textbf{C}^{\text{H}}}$, $j\in \{0,\tau \}$, for all nonsingular ${\mathbb{C}^{d\times d}}$-matrices $\textbf{C}$. An estimate for the latent process can then be found via the mapping $(\textbf{X}-{\textbf{1}_{T}}{\hat{\boldsymbol{\mu }}^{\top }}){({\hat{\textbf{V}}^{\text{H}}}{\hat{\boldsymbol{\Sigma }}_{0}})^{\top }}$, where $\hat{\boldsymbol{\mu }}$ is the sample mean vector of $\textbf{X}$.
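The procedure described above can be collected into a short function. The following Python/NumPy sketch mirrors these steps (it is not the authors' implementation, and the function and variable names are ours):

```python
import numpy as np

def sym_autocov(X, tau):
    """Symmetrized lag-tau autocovariance estimator of a T x d complex sample."""
    T = X.shape[0]
    Xc = X - X.mean(axis=0)
    denom = (T - 1) if tau == 0 else (T - tau)
    S = Xc[:T - tau].T @ Xc[tau:].conj() / denom
    return 0.5 * (S + S.conj().T)

def amuse(X, tau=1):
    """AMUSE-type unmixing sketch: returns (Gamma_hat, lambda_hat, Z_hat)."""
    S0 = sym_autocov(X, 0)
    # conjugate symmetric inverse square root of S0 via its eigendecomposition
    w, U = np.linalg.eigh(S0)
    Sig0 = U @ np.diag(1.0 / np.sqrt(w)) @ U.conj().T
    # eigendecomposition of the whitened symmetrized lag-tau autocovariance
    lam, V = np.linalg.eigh(Sig0 @ sym_autocov(X, tau) @ Sig0)
    order = np.argsort(lam)[::-1]                 # decreasing eigenvalues
    lam, V = lam[order], V[:, order]
    Gamma = V.conj().T @ Sig0                     # unmixing matrix estimate
    Z_hat = (X - X.mean(axis=0)) @ Gamma.T        # estimated latent process
    return Gamma, lam, Z_hat
```

Applied to data generated as in the sketch of Section 3, the product of the estimate $\hat{\boldsymbol{\Gamma }}$ and the mixing matrix $\textbf{A}$ should be close to a phase-shift matrix, reflecting the identifiability up to phases discussed above.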
In practice, one should choose the lag parameter $\tau \ne 0$ such that the diagonal elements of ${\hat{\boldsymbol{\Lambda }}_{\tau }}$ are as distinct as possible. Strategies for choosing τ are discussed, e.g., in [9]. From a computational perspective, the estimation procedure is relatively fast. Hereby, in practice, one should try several different values of τ and study ${\hat{\boldsymbol{\Lambda }}_{\tau }}$.
We emphasize that one can apply the theory of this paper to other estimators as well. That is, under minor model assumptions, the estimators ${\hat{\textbf{S}}_{0}}$, ${\hat{\textbf{S}}_{\tau }}$ can be replaced with any matrix-valued estimators that have the so-called complex-valued affine equivariance property, see [16].
The conditions given in Definition 4 remain true if we replace $\hat{\boldsymbol{\Gamma }}$ with $\textbf{J}\hat{\boldsymbol{\Gamma }}$, where $\textbf{J}\in {\mathbb{C}^{d\times d}}$ can be any phase-shift matrix. Thus, as in the population level, we say that the estimates ${\hat{\boldsymbol{\Gamma }}_{1}}$ and ${\hat{\boldsymbol{\Gamma }}_{2}}$ are equivalent if ${\hat{\boldsymbol{\Gamma }}_{1}}=\textbf{J}{\hat{\boldsymbol{\Gamma }}_{2}}$ for some phase-shift matrix $\textbf{J}$.
Justified by the affine invariance property given in supplementary Appendix Lemmas 5 and 6, we can, without loss of generality, derive the rest of the theory under the assumption of trivial mixing, that is, in the case when the mixing matrix is $\textbf{A}={\textbf{I}_{d}}$. See also the beginning of the Proof of Lemma 2.

4.1 Limiting behavior of the AMUSE estimator

We next consider limiting properties of the finite sample solutions. Note that the finite sample statistics and the sampled stochastic process $\textbf{X}$ depend on the sample size T. In the framework of this paper, our usage of the notation is equivalent to the one presented in [38]. We use the short expression ${\textbf{X}_{T}}={o_{p}}(1)$ to denote that a sequence of ${\mathbb{C}^{d\times d}}$-matrices ${\textbf{X}_{1}},{\textbf{X}_{2}},\dots $ converges in probability to the zero matrix. In addition, we use ${\textbf{Y}_{\beta }}={\mathcal{O}_{p}}(1)$ to denote that a collection of ${\mathbb{C}^{d\times d}}$-matrices $\{{\textbf{Y}_{\beta }}:\beta \in B\}$ is uniformly tight under some nonempty indexing set B. By saying that a matrix is uniformly tight, we mean that the real and imaginary parts of every element of the matrix are uniformly tight.
Lemma 2.
Let ${\boldsymbol{x}_{\text{•}}}:={\left({\boldsymbol{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2, let $\boldsymbol{X}$ be a ${\mathbb{C}^{T\times d}}$-valued, $1\le d<T<\infty $, sampled stochastic process generated by ${\boldsymbol{x}_{\text{•}}}$ and let $\hat{g}:\boldsymbol{C}\mapsto (\boldsymbol{C}-{\boldsymbol{1}_{T}}{\hat{\boldsymbol{\mu }}^{\top }}){\hat{\boldsymbol{\Gamma }}^{\top }}$ be a T-indexed sequence of corresponding finite sample solutions. Furthermore, let ${\alpha _{T}}(\hat{{\boldsymbol{S}_{0}}}[\boldsymbol{X}]-{\boldsymbol{S}_{0}}[{\boldsymbol{x}_{t}}])={\mathcal{O}_{p}}(1)$ and ${\beta _{T}}(\hat{{\boldsymbol{S}_{\tau }}}[\boldsymbol{X}]-{\boldsymbol{S}_{\tau }}[{\boldsymbol{x}_{t}}])={\mathcal{O}_{p}}(1)$, for some real-valued sequences ${\alpha _{T}}$ and ${\beta _{T}}$, which satisfy ${\alpha _{T}}\uparrow \infty $, ${\beta _{T}}\uparrow \infty $ as $T\to \infty $. Then, for ${\gamma _{T}}=\min ({\alpha _{T}},{\beta _{T}})$, there exists a sequence of T-indexed phase-shift matrices $\hat{\boldsymbol{J}}$, such that
\[ {\gamma _{T}}\left(\hat{\boldsymbol{J}}\hat{\boldsymbol{\Gamma }}-\boldsymbol{\Gamma }\right)={\mathcal{O}_{p}}(1)\]
holds asymptotically.
Hereby, under the assumptions of Lemma 2, there exists a sequence of T-indexed matrices $\hat{\textbf{J}}$ such that $\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}$ converges in probability to Γ.
Theorem 2.
Let $\boldsymbol{X}$, g and $\hat{g}$ be defined as in Lemma 2 and denote the element $(j,k)$ of $\hat{{\boldsymbol{S}_{\tau }}}[\boldsymbol{X}]$ as ${[\hat{{\boldsymbol{S}_{\tau }}}]_{jk}}$. Then, under the assumptions of Lemma 2 and the trivial mixing scenario $\boldsymbol{A}={\boldsymbol{I}_{d}}$, there exists a sequence of T-indexed phase-shift matrices $\hat{\boldsymbol{J}}$, such that
\[\begin{aligned}{}& {\gamma _{T}}\left({\hat{\boldsymbol{J}}_{jj}}{\hat{\boldsymbol{\Gamma }}_{jj}}-1\right)=\frac{{\gamma _{T}}}{2}\left({\left[\hat{{\boldsymbol{S}_{0}}}\right]_{jj}}-1\right)+{\mathcal{O}_{p}}(1/{\gamma _{T}}),\hspace{1em}\forall j\in \{1,\dots ,d\},\hspace{1em}\textit{and}\\ {} & {\gamma _{T}}\left({\lambda _{\tau }^{(k)}}-{\lambda _{\tau }^{(j)}}\right){\hat{\boldsymbol{J}}_{jj}}{\hat{\boldsymbol{\Gamma }}_{jk}}={\gamma _{T}}\left({\lambda _{\tau }^{(j)}}{\left[\hat{{\boldsymbol{S}_{0}}}\right]_{jk}}-{\left[\hat{{\boldsymbol{S}_{\tau }}}\right]_{jk}}\right)+{\mathcal{O}_{p}}(1/{\gamma _{T}}),\hspace{1em}j\ne k.\end{aligned}\]
By Theorem 2, we can directly find the asymptotic distribution of the unmixing matrix estimator $\hat{\boldsymbol{\Gamma }}$, if we have the asymptotic distributions and the convergence rates of the estimators ${\hat{\textbf{S}}_{0}}$ and ${\hat{\textbf{S}}_{\tau }}$. Note that, if the rates ${\alpha _{T}}$ and ${\beta _{T}}$ differ, the term involving the estimator with the faster convergence rate tends to zero in probability in Theorem 2. Furthermore, note that the asymptotic distributions of the diagonal elements of ${\gamma _{T}}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}})$ only depend on the asymptotic distribution of the covariance matrix estimator ${\hat{\textbf{S}}_{0}}$. However, the rate of convergence of the off-diagonal elements depends on both ${\hat{\textbf{S}}_{0}}$ and ${\hat{\textbf{S}}_{\tau }}$, and consequently, if the covariance estimator ${\hat{\textbf{S}}_{0}}$ converges faster than the autocovariance estimator ${\hat{\textbf{S}}_{\tau }}$, the diagonal elements of ${\gamma _{T}}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}})$ converge to zero in distribution and in probability. Conversely, it may happen that the autocovariance estimator ${\hat{\textbf{S}}_{\tau }}$ converges faster. In that case, the off-diagonal elements of ${\gamma _{T}}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}})$ converge to zero in distribution and in probability.
Theorem 2 can alternatively be presented in matrix form as
\[\begin{aligned}{}& {\gamma _{T}}\left(\text{diag}\left[\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}}\right]\right)={\gamma _{T}}\left(\frac{1}{2}\text{diag}\left[{\hat{\textbf{S}}_{0}}-{\textbf{I}_{d}}\right]\right)+{\mathcal{O}_{p}}(1/{\gamma _{T}}),\\ {} & {\gamma _{T}}\left(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-\text{diag}\left[\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}\right]\right)={\gamma _{T}}\left(\textbf{H}\odot \left[{\boldsymbol{\Lambda }_{\tau }}{\hat{\textbf{S}}_{0}}-{\hat{\textbf{S}}_{\tau }}\right]\right)+{\mathcal{O}_{p}}(1/{\gamma _{T}}),\end{aligned}\]
where ${\textbf{H}_{jj}}=0$ and ${\textbf{H}_{jk}}={({\lambda _{\tau }^{(k)}}-{\lambda _{\tau }^{(j)}})^{-1}}$, $j\ne k$, ⊙ denotes the Hadamard (i.e., entrywise) product and $\hat{\textbf{J}}$ is the T-indexed sequence of phase-shift matrices that set the diagonal elements of $\hat{\boldsymbol{\Gamma }}$ to be on the positive real-axis.

4.2 Limiting behavior under summable covariance structures

In this section, we consider a class of stochastic processes that satisfy the Breuer–Major theorem for weakly stationary processes. The Breuer–Major theorem is considered in the context of Gaussian subordinated processes, see, e.g., [2, 8, 30]. A univariate real-valued Gaussian subordinated process ${z_{\text{•}}}$, defined on a probability space $(\Omega ,\mathcal{F},\mathbb{P})$, is a weakly stationary process that can be expressed in the form ${z_{\text{•}}}=f\circ {\textbf{y}_{\text{•}}}$, where ${\textbf{y}_{\text{•}}}$ is a ${\mathbb{R}^{\ell }}$-variate Gaussian process and $f:{\mathbb{R}^{\ell }}\to \mathbb{R}$.
We emphasize that Gaussian subordinated processes form a very rich model class. For example, recently in [39], the authors showed that arbitrary one-dimensional marginal distributions and a rich class of covariance structures can be modeled by $f\circ {\textbf{y}_{\text{•}}}$ with a simple univariate stationary Gaussian process ${\textbf{y}_{\text{•}}}$. While the model class is very rich, the underlying driving Gaussian processes still provide a large toolbox for analyzing the limiting behavior of various estimators. Such limit theorems have been a topic of active research for decades. For recent relevant papers on the topic, we refer to [27–30] and the references therein.
We next give the definitions of univariate real-valued Hermite polynomials and Hermite ranks. We define the kth, $k\in \{0\}\cup \mathbb{N}$, (probabilistic) Hermite polynomial ${H_{k}}$, using Rodrigues’ formula
\[ {H_{k}}(x)={(-1)^{k}}\exp \left({x^{2}}/2\right)\frac{{d^{k}}}{d{x^{k}}}\exp \left(-{x^{2}}/2\right).\]
The first four Hermite polynomials are hereby ${H_{0}}(x)=1$, ${H_{1}}(x)=x$, ${H_{2}}(x)={x^{2}}-1$ and ${H_{3}}(x)={x^{3}}-3x$. The set $\{{H_{k}}/\sqrt{k!}:k\in \{0\}\cup \mathbb{N}\}$ forms an orthonormal basis on the Hilbert space ${\mathcal{L}^{2}}(\mathbb{R},{P_{y}})$, where ${P_{y}}$ denotes the law of a univariate standard Gaussian random variable. Consequently, every function $f\in {\mathcal{L}^{2}}(\mathbb{R},{P_{y}})$ can be decomposed as
(3)
\[ f(x)={\sum \limits_{k=0}^{\infty }}{a_{k}}{H_{k}}(x),\]
where ${a_{k}}\in \mathbb{R}$ for every $k\in \{0\}\cup \mathbb{N}$. If x and y follow the univariate standard Gaussian distribution, the orthogonality of the Hermite polynomials and the decomposition of Equation (3) give
\[ \mathbb{E}\left[f(x)f(y)\right]={\sum \limits_{k=0}^{\infty }}k!{a_{k}^{2}}{\left(\mathbb{E}\left[(x-\mathbb{E}[x])(y-\mathbb{E}[y])\right]\right)^{k}}={\sum \limits_{k=0}^{\infty }}k!{a_{k}^{2}}{\left(\text{Cov}[x,y]\right)^{k}}.\]
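A quick Monte Carlo sketch (in Python with NumPy; the test function $f(x)={x^{2}}$ and the correlation 0.6 are arbitrary illustration choices) recovers the coefficients via ${a_{k}}=\mathbb{E}[f(x){H_{k}}(x)]/k!$, which follows from the orthogonality of the Hermite polynomials, and checks the covariance identity above:

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He   # probabilists' Hermite polynomials He_k = H_k

rng = np.random.default_rng(4)
f = lambda x: x**2                             # x^2 = H_0(x) + H_2(x), so a_0 = a_2 = 1

# Hermite coefficients a_k = E[f(x) H_k(x)] / k!  for x ~ N(0, 1), by Monte Carlo
x = rng.standard_normal(1_000_000)
a = np.array([np.mean(f(x) * He.hermeval(x, np.eye(6)[k])) / math.factorial(k)
              for k in range(6)])
print(np.round(a, 2))                          # ~ [1, 0, 1, 0, 0, 0]

# check E[f(x) f(y)] = sum_k k! a_k^2 Cov[x, y]^k for correlated standard Gaussians
rho = 0.6
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(x.size)
lhs = np.mean(f(x) * f(y))
rhs = sum(math.factorial(k) * a[k]**2 * rho**k for k in range(6))
print(round(lhs, 2), round(rhs, 2))            # both ~ 1 + 2 * rho**2 = 1.72
```

For this f, the smallest $k\ge 1$ with ${a_{k}}\ne 0$ is 2, which is its Hermite rank in the sense of the definition given next.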
In the following, we consider real-valued functions of several variables and apply some known results that are based on multivariate Hermite polynomials. However, our derivations (in Section 4.3) require the use of univariate Hermite polynomials only. Thus, we only define univariate Hermite polynomials here.
The Hermite rank for a function f is defined as follows.
Definition 5 (Hermite rank).
Let $\textbf{y}$ be a ${\mathbb{R}^{\ell }}$-valued Gaussian random vector, $\ell \in \mathbb{N}$, and let $f:{\mathbb{R}^{\ell }}\to \mathbb{R}$ be a function such that $f\circ \textbf{y}$ is square-integrable. The function f has Hermite rank q, with respect to $\textbf{y}$, if
\[ \mathbb{E}\left[\left(f\left(\textbf{y}\right)-\mathbb{E}\left[f\left(\textbf{y}\right)\right]\right){p_{m}}\left(\textbf{y}\right)\right]=0,\]
for all Hermite polynomials ${p_{m}}:{\mathbb{R}^{\ell }}\to \mathbb{R}$ that are of degree $m\le q-1$ and there exists a polynomial ${p_{q}}$ of degree q such that
\[ \mathbb{E}\left[\left(f\left(\textbf{y}\right)-\mathbb{E}\left[f\left(\textbf{y}\right)\right]\right){p_{q}}\left(\textbf{y}\right)\right]\ne 0.\]
Note that in the one-dimensional setting, the Hermite rank q of a function f is the smallest integer $q\ge 1$ in Equation (3) such that ${a_{q}}\ne 0$. We next present the multivariate version of the well-known Breuer–Major theorem from [8].
Theorem 3.
Let ${\boldsymbol{y}_{\text{•}}}:={({\boldsymbol{y}_{t}})_{t\in \mathbb{N}}}$ be a ${\mathbb{R}^{\ell }}$-valued centered stationary Gaussian process. Let ${f_{1}},{f_{2}},\dots ,{f_{d}}:{\mathbb{R}^{\ell }}\to \mathbb{R}$ be measurable functions that have Hermite rank at least $q\in \mathbb{N}$, with respect to ${\boldsymbol{y}_{\text{•}}}$, and let ${f_{j}}\circ {\boldsymbol{y}_{\text{•}}}$ be square-integrable for every $j\in \{1,\dots ,d\}$. Additionally, let the series of covariances satisfy
\[ {\sum \limits_{\tau =0}^{\infty }}{\left|{\ddot{\text{S}}_{\tau }}\left[{y_{1}^{(j)}},{y_{1}^{(k)}}\right]\right|^{q}}={\sum \limits_{\tau =0}^{\infty }}{\left|\mathbb{E}\left[{y_{1}^{(j)}}{y_{1+\tau }^{(k)}}\right]\right|^{q}}<+\infty ,\hspace{1em}\forall j,k\in \left\{1,\dots ,\ell \right\}.\]
Then, the ${\mathbb{R}^{d}}$-sequence
\[ \frac{1}{\sqrt{T}}{\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{\textstyle\textstyle\sum _{t=1}^{T}}\left({f_{1}}({\boldsymbol{y}_{t}})-\mathbb{E}\left[{f_{1}}({\boldsymbol{y}_{t}})\right]\right)& \cdots & {\textstyle\textstyle\sum _{t=1}^{T}}\left({f_{d}}({\boldsymbol{y}_{t}})-\mathbb{E}\left[{f_{d}}({\boldsymbol{y}_{t}})\right]\right)\end{array}\right)^{\top }}\]
converges in distribution, as $T\to \infty $, to a centered Gaussian vector $\boldsymbol{\rho }=\left({\rho _{1}},{\rho _{2}},\dots ,{\rho _{d}}\right)$ with finite covariances $\mathbb{E}\left[{\rho _{j}}{\rho _{k}}\right]$ equal to
\[ {\text{S}_{0}}\left[{f_{j}}({\boldsymbol{y}_{1}}),{f_{k}}({\boldsymbol{y}_{1}})\right]+{\sum \limits_{\tau =1}^{\infty }}\left({\ddot{\text{S}}_{\tau }}\left[{f_{j}}({\boldsymbol{y}_{1}}),{f_{k}}({\boldsymbol{y}_{1}})\right]+{\ddot{\text{S}}_{\tau }}\left[{f_{k}}({\boldsymbol{y}_{1}}),{f_{j}}({\boldsymbol{y}_{1}})\right]\right).\]
The proof of Theorem 3 is omitted here. Theorem 3 follows directly from the univariate Breuer–Major theorem, given in [8], and by using the usual Cramér–Wold device, see, e.g., [7, Theorem 29.4]. A similar version of Theorem 3 can also be found in Section 5 of [2].
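To illustrate Theorem 3 numerically (a univariate Monte Carlo sketch with $\ell =d=1$; the AR(1) coefficient, the sample size and the choice $f={H_{2}}$ are illustrative), one can compare the empirical variance of the normalized sums with the limiting variance, which for a standardized AR(1) Gaussian process with correlation ${\phi ^{\tau }}$ and $f(y)={y^{2}}-1$ equals $2+4{\phi ^{2}}/(1-{\phi ^{2}})$:

```python
import numpy as np

rng = np.random.default_rng(5)
phi, T, reps, burn = 0.5, 4000, 500, 200
f = lambda y: y**2 - 1                       # f = H_2, Hermite rank 2

sums = np.empty(reps)
for r in range(reps):
    y = np.zeros(T + burn)
    eps = rng.standard_normal(T + burn)
    for t in range(1, T + burn):
        y[t] = phi * y[t - 1] + np.sqrt(1 - phi**2) * eps[t]   # stationary variance 1
    sums[r] = f(y[burn:]).sum() / np.sqrt(T)

print(round(sums.var(), 2))                      # empirical variance of the normalized sums
print(round(2 + 4 * phi**2 / (1 - phi**2), 2))   # theoretical limit, 3.33
```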
We next present the following assumption which enables us to find the asymptotic behavior of the unmixing matrix estimator using the Breuer–Major theorem.
Assumption 1.
Let ${\boldsymbol{x}_{\text{•}}}:={\left({\boldsymbol{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2 with the trivial mixing matrix $\boldsymbol{A}={\boldsymbol{I}_{d}}$ and let $\boldsymbol{X}$ be a ${\mathbb{C}^{T\times d}}$-valued, $1\le d<T<\infty $, sampled stochastic process generated by ${\boldsymbol{x}_{\text{•}}}$. Denote ${\boldsymbol{z}_{\text{•}}}={({\boldsymbol{b}_{t}}+i{\boldsymbol{c}_{t}})_{t\in \mathbb{N}}}={\boldsymbol{b}_{\text{•}}}+i{\boldsymbol{c}_{\text{•}}}$. We assume that there exists $\ell \in \mathbb{N}$ and a centered ${\mathbb{R}^{\ell }}$-valued stationary Gaussian process ${\boldsymbol{\eta }_{\text{•}}}$, such that, for all $k\in \{1,\dots ,d\}$, the component ${\boldsymbol{b}_{\text{•}}^{(k)}}$ has the same finite-dimensional distributions and, asymptotically, the same autocovariance function as ${\tilde{f}_{k}}({\boldsymbol{\eta }_{\text{•}}})$, for some function ${\tilde{f}_{k}}:{\mathbb{R}^{\ell }}\to \mathbb{R}$. Similarly, we assume that, for all $k\in \{1,\dots ,d\}$, the component ${\boldsymbol{c}_{\text{•}}^{(k)}}$ has the same finite-dimensional distributions and, asymptotically, the same autocovariance function as ${\tilde{f}_{k+d}}({\boldsymbol{\eta }_{\text{•}}})$, for some function ${\tilde{f}_{k+d}}:{\mathbb{R}^{\ell }}\to \mathbb{R}$. Furthermore, we assume that $\mathbb{E}[|{\tilde{f}_{k}}({\boldsymbol{\eta }_{1}}){|^{4}}]<+\infty $, $\forall k\in \{1,\dots ,2d\}$, and that ${r_{\boldsymbol{\eta }}^{(j,k)}}(t)=\mathbb{E}\left[{\boldsymbol{\eta }_{t+1}^{(j)}}{\boldsymbol{\eta }_{1}^{(k)}}\right]$ satisfies ${r_{\boldsymbol{\eta }}^{(j,j)}}(0)=1$, $\forall j\in \{1,\dots ,\ell \}$, and
\[ {\sum \limits_{t=1}^{\infty }}\left|{r_{\boldsymbol{\eta }}^{(j,k)}}(t)\right|<+\infty ,\hspace{1em}\forall j,k\in \left\{1,\dots ,\ell \right\}.\]
Note that the finite-dimensional distributions of an $\mathbb{R}$-valued stochastic process ${c_{\text{•}}}$ are the collection of probability measures
\[ \left\{\mathbb{P}\left[\left\{{c_{{t_{1}}}}\in {B_{1}},\dots ,{c_{{t_{m}}}}\in {B_{m}}\right\}\right]\mid m\in \mathbb{N},\hspace{2.5pt}{B_{j}}\in \mathcal{B}(\mathbb{R})\hspace{2.5pt}\text{and}\hspace{2.5pt}\{{t_{1}},\dots ,{t_{m}}\}\subset \mathbb{N}\right\}.\]
We again want to emphasize that a wide class of stochastic processes satisfy Assumption 1. Indeed, we allow an arbitrary dimension ℓ for the driving Gaussian process ${\boldsymbol{\eta }_{\text{•}}}$ and arbitrary (summable) covariance structures. In comparison, it was shown in [39] that in many cases it would suffice to relate only one stationary Gaussian process ${\boldsymbol{\eta }_{\text{•}}^{(j)}}$ to each function ${\tilde{f}_{j}}$.
We are now ready to consider the asymptotic distribution for the unmixing matrix estimator under Assumption 1.
Theorem 4.
Let Assumption 1 be satisfied and let $\hat{g}:\boldsymbol{C}\mapsto (\boldsymbol{C}-{\boldsymbol{1}_{T}}{\hat{\boldsymbol{\mu }}^{\top }}){\hat{\boldsymbol{\Gamma }}^{\top }}$ be a T-indexed sequence of finite sample solutions. Then, under the trivial mixing scenario $\boldsymbol{A}={\boldsymbol{I}_{d}}$, there exists a T-indexed sequence $\hat{\boldsymbol{J}}$ of phase-shift matrices such that
\[ \sqrt{T}\cdot \text{vec}\left(\hat{\boldsymbol{J}}\hat{\boldsymbol{\Gamma }}-{\boldsymbol{I}_{d}}\right){\xrightarrow[T\to \infty ]{\mathcal{D}}}\boldsymbol{\nu }\sim {\mathcal{N}_{{d^{2}}}}(\boldsymbol{0},{\boldsymbol{\Sigma }_{\boldsymbol{\nu }}},{\boldsymbol{P}_{\boldsymbol{\nu }}}).\]
The exact forms for ${\boldsymbol{\Sigma }_{\boldsymbol{\nu }}}$ and ${\boldsymbol{P}_{\boldsymbol{\nu }}}$ are given in the supplementary Appendix.
Note that it is possible to present Theorem 4 with even weaker assumptions. In the current formulation, we require that the Gaussian process ${\textbf{y}_{\text{•}}}$ (equivalently, ${\boldsymbol{\eta }_{\text{•}}}$) has summable covariances and cross-covariances. By studying the exact Hermite ranks of the underlying functions, weaker summability conditions would suffice (cf. Theorem 3). However, given the Hermite ranks of the functions ${f_{j}}$, it is in general impossible to say anything about the Hermite ranks of transformations such as ${f_{j}^{2}}$ arising from ${\hat{\textbf{S}}_{0}}$, see the proof of Theorem 4. On the other hand, assuming Hermite rank equal to one is very natural in many situations. For example, the modeling approach given in [39] sets the rank of ${f_{j}}$ to equal 1. Hermite rank 1 is stable in the sense that even small perturbations of a function with a higher Hermite rank lead to rank 1 again. For further discussion on the stability of Hermite rank 1, we refer to [4].

4.3 Notes on noncentral limit theorems

In this section, we provide examples where the convergence rate of the unmixing estimator differs from the standard $\sqrt{T}$ and where the limiting distribution is non-Gaussian. Such situations arise especially when the covariance summability condition of Theorem 3 does not hold.
Assumption 2.
Let ${\boldsymbol{x}_{\text{•}}}:={\left({\boldsymbol{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2 with the trivial mixing matrix $\boldsymbol{A}={\boldsymbol{I}_{d}}$, let $\boldsymbol{X}$ be a ${\mathbb{C}^{T\times d}}$-valued, $1\le d<T<+\infty $, sampled stochastic process generated by ${\boldsymbol{x}_{\text{•}}}$ and let ${\boldsymbol{z}_{\text{•}}}={\boldsymbol{b}_{\text{•}}}+i{\boldsymbol{c}_{\text{•}}}$. We assume that there exist centered $\mathbb{R}$-valued stationary Gaussian processes ${\boldsymbol{\eta }_{\text{•}}^{(k)}}$, $k\in \{1,2,\dots ,2d\}$, such that, for all $k\in \{1,\dots ,d\}$, the component ${\boldsymbol{b}_{\text{•}}^{(k)}}$ has the same finite-dimensional distributions and, asymptotically, the same autocovariance function as ${\tilde{f}_{k}}({\boldsymbol{\eta }_{\text{•}}^{(k)}})$, for some function ${\tilde{f}_{k}}:\mathbb{R}\to \mathbb{R}$. Similarly, we assume that, for all $k\in \{1,\dots ,d\}$, the component ${\boldsymbol{c}_{\text{•}}^{(k)}}$ has the same finite-dimensional distributions and, asymptotically, the same autocovariance function as ${\tilde{f}_{k+d}}({\boldsymbol{\eta }_{\text{•}}^{(k+d)}})$, for some function ${\tilde{f}_{k+d}}:\mathbb{R}\to \mathbb{R}$. Furthermore, we assume that $\mathbb{E}[|{\tilde{f}_{k}}({\boldsymbol{\eta }_{1}^{(k)}}){|^{4}}]<+\infty $, $\forall k\in \{1,\dots ,2d\}$, and that the Gaussian processes ${\boldsymbol{\eta }_{\text{•}}^{(1)}},{\boldsymbol{\eta }_{\text{•}}^{(2)}},\dots ,{\boldsymbol{\eta }_{\text{•}}^{(2d)}}$ are mutually independent.
In comparison to Assumption 1, here the real part and the imaginary part of each component are each driven by a single univariate Gaussian process. As discussed in Section 4.2, this is not a huge restriction. In addition, we assume independent components and that the real and imaginary parts of each component are independent. Although independence is a common assumption in the blind source separation literature, we note that our results can be extended to cover dependencies as long as, for every $j,k\in \{1,2,\dots ,2d\}$, the cross-covariances between ${\boldsymbol{\eta }^{(k)}}$ and ${\boldsymbol{\eta }^{(j)}}$ are negligible compared to the decay of the autocovariance functions ${r_{{\boldsymbol{\eta }^{(j)}}}}(t)$.
If the autocovariance functions ${r_{{\boldsymbol{\eta }^{(j)}}}}$ satisfy
(4)
\[ {\sum \limits_{t=1}^{\infty }}|{r_{{\boldsymbol{\eta }^{(j)}}}}(t)|<+\infty ,\hspace{1em}\forall j\in \{1,\dots ,2d\},\]
then we are in the situation of Theorem 4. More generally, a weakly stationary series is called short-range dependent if the corresponding autocovariance function r satisfies Equation (4). Hence, we assume that at least one of the functions ${r_{{\boldsymbol{\eta }^{(j)}}}}$ is not summable. For such components, we assume the following long-range dependence condition.
Definition 6.
We say that a $\mathbb{R}$-valued weakly stationary process ${\eta _{\text{•}}}$ is long-range dependent if the autocovariance function ${r_{\eta }}$ satisfies
(5)
\[ \underset{k\to \infty }{\lim }{k^{2-2H}}{r_{\eta }}(k)=C\]
for some $H\in \left[1/2,1\right)$ and $C\in (0,\infty )$.
Note that the definition of long-range dependence varies in the literature. For details on long-range dependent processes and their different definitions, we refer to [5, 6, 33].
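As a concrete example (the value $H=0.8$ is an arbitrary choice), the autocovariance function of fractional Gaussian noise, $r(k)=\frac{1}{2}(|k+1{|^{2H}}-2|k{|^{2H}}+|k-1{|^{2H}})$, satisfies Equation (5) with $C=H(2H-1)$ whenever $H\in (1/2,1)$; a small numeric check:

```python
import numpy as np

H = 0.8
r = lambda k: 0.5 * (np.abs(k + 1)**(2 * H) - 2 * np.abs(k)**(2 * H) + np.abs(k - 1)**(2 * H))

k = np.array([10.0, 100.0, 1000.0, 10000.0])
print(np.round(k**(2 - 2 * H) * r(k), 4))   # approaches H * (2 * H - 1) = 0.48
```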
There is a large literature on limit theorems under long-range dependence. See, e.g., [3, 11, 34] for a few central works on the topic. For an insightful and detailed presentation we refer to [18], and for recent advances on the topic, see, e.g., [21, 40]. In the case of long-range dependent processes, the rate of convergence of the normalized mean is slower than the usual $\sqrt{T}$ and the limiting distribution depends on the corresponding Hermite rank. More precisely, the limiting distribution follows a so-called q-Hermite distribution, where q is the Hermite rank of the underlying function. Note that q-Hermite distributions can be fully characterized by the corresponding Hermite rank q and a self-similarity parameter H, i.e., the Hurst index. In the stable case $q=1$, we obtain a Gaussian limit, and in the case $q=2$ we obtain the so-called Rosenblatt distribution that is not Gaussian. Similarly, for $q\ge 3$ the limiting distribution is not Gaussian. For details on Hermite distributions and processes, we refer to [3, 12, 36]. In particular, we apply the following known result (see [11, Theorem 1]).
Proposition 1.
Let ${\eta _{\text{•}}}$ be a $\mathbb{R}$-valued stationary Gaussian process with autocovariance function ${r_{\eta }}$ such that ${r_{\eta }}(0)=1$ and ${r_{\eta }}$ satisfies Equation (5) for some $H\in \left[1/2,1\right)$. Let f be a function such that $\mathbb{E}[{(f({\eta _{1}}))^{2}}]<+\infty $ and the Hermite rank of f equals q. If $q(2H-2)>-1$, then, for some constant $C={C_{f,\eta }}>0$, we have
\[ {T^{q(1-H)-1}}{\sum \limits_{t=1}^{T}}\left(f({\eta _{t}})-\mathbb{E}[f({\eta _{t}})]\right){\xrightarrow[T\to \infty ]{\mathcal{D}}}C{Z_{q}},\]
where ${Z_{q}}$ follows a q-Hermite distribution.
In view of Proposition 1, in the long-range dependent case the Hermite rank plays a crucial role. Thus, one cannot pose general results without any a priori knowledge on the ranks. Indeed, it can be shown — see, e.g., [6, Equation 4.26] with $d=H-1/2$ — that if f has Hermite rank q and r satisfies Equation (5) with some H such that $q(2H-2)>-1$, then
(6)
\[ \underset{T\to \infty }{\lim }{T^{q(2-2H)}}\mathbb{E}\left[{\left(\frac{1}{T}{\sum \limits_{t=1}^{T}}\left(f({\boldsymbol{\eta }_{t}})-\mathbb{E}[f({\boldsymbol{\eta }_{t}})]\right)\right)^{2}}\right]=C.\]
This justifies the normalization in Proposition 1 and also gives the constant explicitly, as $\mathbb{E}[{Z_{q}}]=0$ and $\mathbb{E}[{({Z_{q}})^{2}}]=1$. We pose the following assumption on the autocovariances ${r_{{\boldsymbol{\eta }^{(j)}}}}$ and Hermite ranks.
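The normalization can also be checked by simulation. The sketch below (with fractional Gaussian noise, $H=0.8$ and $f={H_{2}}$, so $q=2$ and $q(2H-2)=-0.8>-1$; all parameter choices are illustrative) generates exact Gaussian paths from a Cholesky factor of the fGn covariance matrix and shows that the variance of ${T^{q(1-H)-1}}{\textstyle\sum _{t}}(f({\eta _{t}})-\mathbb{E}[f({\eta _{t}})])$ stays roughly constant as T grows, whereas under the usual $\sqrt{T}$ normalization it would diverge:

```python
import numpy as np

rng = np.random.default_rng(6)
H, q = 0.8, 2
f = lambda x: x**2 - 1                          # f = H_2, Hermite rank q = 2

def fgn_cov(T):
    """Exact covariance matrix of fractional Gaussian noise of length T."""
    k = np.arange(T, dtype=float)
    r = 0.5 * (np.abs(k + 1)**(2 * H) - 2 * k**(2 * H) + np.abs(k - 1)**(2 * H))
    return r[np.abs(np.arange(T)[:, None] - np.arange(T)[None, :])]

for T in (256, 512, 1024):
    L = np.linalg.cholesky(fgn_cov(T))
    eta = L @ rng.standard_normal((T, 400))     # 400 independent fGn paths of length T
    S = T**(q * (1 - H) - 1) * f(eta).sum(axis=0)
    print(T, round(S.var(), 3))                 # roughly constant in T, cf. Equation (6)
```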
Assumption 3.
We assume that for every $j\in \{1,2,\dots ,2d\}$ the autocovariance functions ${r_{{\boldsymbol{\eta }^{(j)}}}}$ with ${r_{{\boldsymbol{\eta }^{(j)}}}}(0)=1$ satisfy either Equation (4) or Equation (5), and that Equation (5) is satisfied for at least one index. Let $I\subset \{1,2,\dots ,2d\}$ denote the set of indices for which Equation (5) holds, and for every $i\in I$, denote the Hermite rank of $x\mapsto {\tilde{f}_{i}}(x)$ by ${q_{1,i}}$ and the Hermite rank of $x\mapsto {\left[{\tilde{f}_{i}}(x)-\mathbb{E}({\tilde{f}_{i}}({\boldsymbol{\eta }^{(i)}}))\right]^{2}}$ by ${q_{2,i}}$. We assume that, for any indices $j,k\in I$, $j\ne k$,
(7)
\[ \underset{i\in I}{\max }{q_{2,i}}(2{H_{i}}-2)>\underset{j,k\in I}{\max }\left\{{q_{1,k}}(2{H_{k}}-2)+{q_{1,j}}(2{H_{j}}-2),-1\right\},\]
and, for any $k\in I$,
(8)
\[ \underset{i\in I}{\max }{q_{2,i}}(2{H_{i}}-2)\ge {q_{1,k}}(4{H_{k}}-4).\]
We stress that Assumption 3 is not very restrictive. We allow a combination of short- and long-range dependent processes ${\boldsymbol{\eta }_{\text{•}}^{(j)}}$ and we assume that at least one of the processes is long-range dependent, since otherwise we may apply Theorem 4. In this respect, Condition (7) guarantees that at least one of the ${[{\tilde{f}_{i}}(x)-\mathbb{E}({\tilde{f}_{i}}({\boldsymbol{\eta }^{(i)}}))]^{2}}$ is long-range dependent. Indeed, if ${\max _{i\in I}}{q_{2,i}}(2{H_{i}}-2)<-1$, then Theorem 4 applies again. Moreover, one could also study the limiting case ${\max _{i\in I}}{q_{2,i}}(2{H_{i}}-2)=-1$. Then, one usually obtains a Gaussian limit, but with a rate given by $\sqrt{T/\log (T)}$. Our more restrictive Assumptions (7) and (8) are posed in order to apply known results on limit theorems in the long-range dependent case. Indeed, if these conditions were violated, then we would need to study the convergence of complicated dependent random vectors towards some combinations of Hermite distributions. Unfortunately, to the best of our knowledge, there are no studies on this topic that would fit into our general setting (for some results in special cases, see, e.g., [3]).
In view of Proposition 1, the rate of convergence, ${T^{\gamma }}$, of our unmixing estimator arises from Equation (7). We set
(9)
\[ \gamma :=-\frac{1}{2}\underset{i\in I}{\max }{q_{2,i}}(2{H_{i}}-2).\]
By Equation (7), we have that $\gamma <\frac{1}{2}$, meaning that our rate is slower than the usual $\sqrt{T}$. The following results show that cross-terms do not contribute to the limit.
Proposition 2.
Let Assumptions 2 and 3 be satisfied. Then for every $j\ne k$, we have that
(10)
\[ {T^{\gamma -1}}{\sum \limits_{t=1}^{T}}\left({\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})-\mathbb{E}[{\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})]\right)\left({\tilde{f}_{j}}({\boldsymbol{\eta }_{t}^{(j)}})-\mathbb{E}[{\tilde{f}_{j}}({\boldsymbol{\eta }_{t}^{(j)}})]\right){\xrightarrow[T\to \infty ]{{\mathcal{L}_{2}}}}0\]
and
(11)
\[ {T^{\gamma -1}}{\sum \limits_{t=1}^{T}}\left({\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})-\mathbb{E}[{\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})]\right)\left({\tilde{f}_{j}}({\boldsymbol{\eta }_{t+\tau }^{(j)}})-\mathbb{E}[{\tilde{f}_{j}}({\boldsymbol{\eta }_{t}^{(j)}})]\right){\xrightarrow[T\to \infty ]{{\mathcal{L}_{2}}}}0\]
where ${\xrightarrow[]{{\mathcal{L}_{2}}}}$ denotes convergence in the space ${\mathcal{L}_{2}}(\Omega ,\mathbb{P})$. Consequently, the convergence also holds in probability.
In the presence of long-range dependency, the mean estimator $\hat{\boldsymbol{\mu }}[\textbf{X}]$ in the definition of ${\tilde{\textbf{S}}_{\tau }}\left[\textbf{X}\right]$ has to be studied separately as it may contribute to the limit.
In order to obtain the limiting distribution for ${T^{\gamma }}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}})$, we introduce some notation. For each $k\in \{1,2,\dots ,2d\}$ we define Hermite processes ${Z_{{q_{1,k}}}}$ and ${Z_{{q_{2,k}}}}$, such that for a given k, we allow ${Z_{{q_{1,k}}}}$ and ${Z_{{q_{2,k}}}}$ to be dependent on each other, while both are independent of ${Z_{{q_{1,j}}}}$, ${Z_{{q_{2,j}}}}$ for $j\ne k$. We also introduce constants ${C_{1,k}}$, ${C_{2,k}}$ in the following way: if $k\in I$ is such that equality holds in (8), i.e., ${\max _{i\in I}}{q_{2,i}}(2{H_{i}}-2)=2[{q_{1,k}}(2{H_{k}}-2)]$, then we let ${C_{1,k}}$ be the constant, given in Proposition 1, associated with the limit
\[ {T^{{q_{1,k}}(1-{H_{k}})-1}}{\sum \limits_{t=1}^{T}}\left({\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})-\mathbb{E}[{\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})]\right){\xrightarrow[T\to \infty ]{\mathcal{D}}}{C_{1,k}}{Z_{{q_{1,k}}}}.\]
Otherwise we set ${C_{1,k}}=0$. Similarly, if $k\in I$ is such that
(12)
\[ {q_{2,k}}(2{H_{k}}-2)=\underset{i\in I}{\max }{q_{2,i}}(2{H_{i}}-2),\]
we let ${C_{2,k}}$ be the constant given in Proposition 1 associated with the limit
\[ {T^{{q_{2,k}}(1-{H_{k}})-1}}{\sum \limits_{t=1}^{T}}\left({h_{k}}({\boldsymbol{\eta }_{t}^{(k)}})-\mathbb{E}\left[{h_{k}}({\boldsymbol{\eta }_{t}^{(k)}})\right]\right){\xrightarrow[T\to \infty ]{\mathcal{D}}}{C_{2,k}}{Z_{{q_{2,k}}}},\]
where
\[ {h_{k}}({\boldsymbol{\eta }_{t}^{(k)}})={\left({\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})-\mathbb{E}[{\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})]\right)^{2}}.\]
Otherwise we set ${C_{2,k}}=0$. The following theorem is the main result of this subsection.
Theorem 5.
Let Assumptions 2 and 3 hold. Then,
\[ {T^{\gamma }}\left(\hat{\boldsymbol{J}}\hat{\boldsymbol{\Gamma }}-{\boldsymbol{I}_{d}}\right){\xrightarrow[T\to \infty ]{\mathcal{D}}}\frac{1}{2}\boldsymbol{\Upsilon },\]
where Υ is a ${\mathbb{R}^{d\times d}}$-valued diagonal matrix with elements given by
\[ {(\boldsymbol{\Upsilon })_{jj}}\sim {C_{2,j}}{Z_{{q_{2,j}}}}+{C_{1,j}^{2}}{Z_{{q_{1,j}}}^{2}}+{C_{2,j+d}}{Z_{{q_{2,j+d}}}}+{C_{1,j+d}^{2}}{Z_{{q_{1,j+d}}}^{2}}.\]
We remark that most of the elements may vanish. Indeed, the coefficient ${C_{2,j}}$ (${C_{2,j+d}}$, respectively) is nonzero only if the real part (imaginary part, respectively) of ${\left[{f_{j}}(x)-\mathbb{E}({f_{j}}(\boldsymbol{\eta }))\right]^{2}}$ converges with the correct rate ${T^{\gamma }}$. In this case, ${C_{1,j}}$ (${C_{1,j+d}}$, respectively) is nonzero only if the corresponding element in $\tilde{\boldsymbol{\mu }}$ converges with the rate ${T^{\gamma /2}}$.

5 Image source separation

The restoration of images that have been mixed together by some transformation is a classical example in blind source separation (BSS); see, e.g., [26]. Image source separation is usually considered only for black-and-white images, where the colors can be represented using a single color parameter. Since only a single color parameter is associated with each pixel, the black-and-white image source problem can be straightforwardly formulated as a real-valued BSS problem. This no longer holds for colored photographs.
In our example, we consider colored photographs and conduct the example using the statistical software R ([32]). We start with three photographs of sizes $1536\times 2048$, $1200\times 1600$ and $960\times 720$ pixels; the figures are imported to R using the package JPEG, [37]. As a first step, we resize all of the photographs to the common size $960\times 720$. The resized images are stored in a $960\times 720\times 3$ array format, such that a red-green-blue color parameter triplet, denoted as $(R,G,B)$, is assigned to every pixel.
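The import and resizing step can be sketched in R as follows. This is a minimal sketch only: the file names are hypothetical placeholders, the package is loaded under its CRAN name jpeg, and the nearest-neighbor subsampling used for resizing is our own illustrative choice rather than necessarily the method used for the figures of this paper.

library(jpeg)                                          # readJPEG returns an array with values in [0, 1]
files <- c("photo1.jpg", "photo2.jpg", "photo3.jpg")   # hypothetical file names
resize_nn <- function(img, nrow_out = 960, ncol_out = 720) {
  # nearest-neighbor subsampling to a common size
  rows <- round(seq(1, dim(img)[1], length.out = nrow_out))
  cols <- round(seq(1, dim(img)[2], length.out = ncol_out))
  img[rows, cols, , drop = FALSE]                      # 960 x 720 x 3 RGB array
}
imgs <- lapply(files, function(f) resize_nn(readJPEG(f)))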
Fig. 1. Original images
We apply a bijective cube-to-unit-sphere transformation to every color triplet. Then, we use the well-known stereographic projection, which is an almost bijective transformation between the unit sphere and the complex plane. The stereographic projection is bijective everywhere except for the north pole of the unit sphere. For almost all photographs, we can choose the color coordinate system such that no pixel has a color triplet located on the north pole. This holds for our example, and we can apply the inverse mappings in order to get back to the color cube surface.
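A minimal R sketch of the stereographic projection and its inverse is given below. The bijective cube-to-sphere map is not reproduced here; we simply assume that every color triplet has already been mapped to a point $(x,y,z)$ on the unit sphere away from the north pole $(0,0,1)$, so only the sphere-to-plane step and its inverse are illustrated.

# xyz: matrix with columns x, y, z containing unit-sphere points (none at the north pole)
sphere_to_plane <- function(xyz) {
  (xyz[, 1] + 1i * xyz[, 2]) / (1 - xyz[, 3])          # projection from the north pole
}
plane_to_sphere <- function(w) {
  s <- Mod(w)^2
  cbind(x = 2 * Re(w) / (s + 1),
        y = 2 * Im(w) / (s + 1),
        z = (s - 1) / (s + 1))                         # inverse stereographic projection
}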
We then have a single complex number corresponding to every pixel of the color-corrected images, and we can present the images in ${\mathbb{C}^{960\times 720}}$-matrix form. We denote the images with complex-valued elements as ${\textbf{Z}_{1}},{\textbf{Z}_{2}}$ and ${\textbf{Z}_{3}}$ and apply the following transformation,
\[ {\textbf{X}^{\top }}={\left(\begin{array}{c}\text{vec}{({\textbf{X}_{1}})^{\top }}\\ {} \text{vec}{({\textbf{X}_{2}})^{\top }}\\ {} \text{vec}{({\textbf{X}_{3}})^{\top }}\end{array}\right)^{\top }}={\left(\begin{array}{c}\text{vec}{({\textbf{Z}_{1}})^{\top }}\\ {} \text{vec}{({\textbf{Z}_{2}})^{\top }}\\ {} \text{vec}{({\textbf{Z}_{3}})^{\top }}\end{array}\right)^{\top }}\left({\textbf{I}_{3}}+\textbf{E}+i\textbf{F}\right),\]
where ${\textbf{I}_{3}}$ is the identity matrix and the elements of $\textbf{E}$ and $\textbf{F}$ were generated independently from the univariate uniform distribution $\texttt{Unif(-1/2, 1/2)}$. The complex-valued mixed images ${\textbf{X}_{1}},{\textbf{X}_{2}}$ and ${\textbf{X}_{3}}$ can then be plotted by performing the chain of inverse mappings in reverse order. The corresponding mixed images are presented in Figure 2.
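The mixing step can be sketched in R as follows. This is a hedged illustration: ${\textbf{Z}_{1}},{\textbf{Z}_{2}},{\textbf{Z}_{3}}$ are assumed to be available as complex $960\times 720$ matrices Z1, Z2, Z3, the seed is an arbitrary illustrative choice, and the matrix $\textbf{F}$ of the text is renamed Fmat to avoid masking R's shorthand for FALSE.

set.seed(1)                                            # illustrative seed
Z    <- cbind(as.vector(Z1), as.vector(Z2), as.vector(Z3))   # 691200 x 3 complex matrix
E    <- matrix(runif(9, -1/2, 1/2), 3, 3)
Fmat <- matrix(runif(9, -1/2, 1/2), 3, 3)
A    <- diag(3) + E + 1i * Fmat                        # mixing matrix I_3 + E + iF
X    <- Z %*% A                                        # mixed data, one pixel per row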
Fig. 2. Mixed images
We then apply the unmixing procedure presented in Section 4 to the matrix $\textbf{X}$, using several different lag parameters τ. In this example, $\textbf{X}$ has three columns and 691200 rows, such that every element of the matrix is complex-valued. In controlled settings, where the true mixing matrix is known, the minimum distance (MD) index is a straightforward way to compare the performance of different estimators; see [20] for the complex-valued formulation. The MD index is a performance index taking values between zero and one, where zero corresponds to perfect separation. In addition, the MD index ensures that the comparison between different estimators is fair; this is especially important in this paper since, under our model assumptions, different estimators do not necessarily estimate the same population quantities. The minimum distance index is based on the minimum distance between the known mixing matrix and the unmixing matrix estimate with respect to permutations, scaling and phase-shifts. While applying the AMUSE procedure, if we can find a parameter τ such that the diagonal elements of ${\hat{\boldsymbol{\Lambda }}_{\tau }}$ are sufficiently distinct (from a computational perspective), we can recover the unmixing matrix up to phase-shifts. Hereby, the choice of the mixing matrix in this example is inconsequential from the viewpoint of the minimum distance index: identical values are obtained for a fixed τ with any finite mixing matrix. Consequently, in this example, the minimum distance index would produce the same value if the mixing matrix were replaced with, e.g., the identity matrix.
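For illustration, a minimal R sketch of the AMUSE-type unmixing estimator, written directly from Corollary 2 and Lemma 4, is given below. It is a schematic implementation under our own simplifying choices: no phase fixing or reordering of the components is performed beyond the eigendecomposition, and X is assumed to be the complex $691200\times 3$ data matrix formed above. It is not the exact code used for the experiments.

amuse <- function(X, tau = 1) {
  n  <- nrow(X)                                        # sample size T
  Xc <- sweep(X, 2, colMeans(X))                       # center the columns
  S0 <- t(Xc) %*% Conj(Xc) / n                         # sample covariance S_0
  Ct <- t(Xc[1:(n - tau), , drop = FALSE]) %*% Conj(Xc[(1 + tau):n, , drop = FALSE])
  St <- (Ct + Conj(t(Ct))) / (2 * (n - tau))           # symmetrized lag-tau autocovariance
  e0 <- eigen(S0, symmetric = TRUE)                    # Hermitian eigendecomposition of S_0
  W  <- e0$vectors %*% diag(1 / sqrt(e0$values)) %*% Conj(t(e0$vectors))  # S_0^{-1/2}
  M  <- W %*% St %*% W
  ev <- eigen((M + Conj(t(M))) / 2, symmetric = TRUE)  # enforce Hermitian symmetry numerically
  Gamma_hat <- Conj(t(ev$vectors)) %*% W               # unmixing estimate V^H S_0^{-1/2}
  list(Gamma = Gamma_hat, Lambda = ev$values, sources = Xc %*% t(Gamma_hat))
}
fit <- amuse(X, tau = 1)                               # unmixed pixels in fit$sources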
Fig. 3. Unmixed images using the AMUSE procedure with $\tau =1$
We tried several different lag parameters, and the best performance, in light of the MD index and visual inspection, was attained with $\tau =1$; the corresponding MD index value was approximately 0.174. The unmixed images obtained using the unmixing procedure with $\tau =1$ are presented in Figure 3. In addition, for example, the lag parameter choices $\tau =2,3,961,962,963$ provided only slightly worse results; the corresponding MD index values were between 0.177 and 0.183. Note that the autocovariances are calculated from the vectorized images, and, for example, the first and the 961st entries are neighboring pixels in the unvectorized images.
Fig. 4. Unmixed images produced by equivalent solutions
Comparing the original color-corrected images in Figure 1 and the unmixed images in Figure 3, the shapes seem to match almost perfectly. The color schemes vary since the complex phase is not uniquely fixed in our model. In addition, recall that under our model, solutions that are a phase-shift away from each other are considered to be equivalent. In Figure 4, we present the first unmixed image, produced by the unmixing procedure with $\tau =1$, under three equivalent solutions. The images in Figure 4 are obtained by first finding an unmixing estimate $\hat{\boldsymbol{\Gamma }}$ and then right-multiplying the obtained estimate by a diagonal matrix with diagonal entries $\exp (i\theta )$. In Figure 4, the left image is obtained with $\theta =\pi /4$, the middle one with $\theta =3\pi /4$ and the right image with $\theta =5\pi /4$.
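In R, such an equivalent solution can be produced by a single rotation; here fit$Gamma refers to the hypothetical unmixing estimate returned by the sketch above and theta to one of the angles used in Figure 4.

theta     <- pi / 4
Gamma_rot <- fit$Gamma %*% (exp(1i * theta) * diag(3)) # equivalent unmixing estimate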

A Appendix

Proof of Lemma 1.
Both directions of the claim follow directly from the multivariate version of the continuous mapping theorem. Note that the mapping $f:{({r_{1}},{r_{2}},\dots ,{r_{2d}})^{\top }}\mapsto {({r_{1}},{r_{2}},\dots ,{r_{d}})^{\top }}+i{({r_{d+1}},{r_{d+2}},\dots ,{r_{2d}})^{\top }}:{\mathbb{R}^{2d}}\to {\mathbb{C}^{d}}$ is a homeomorphism, that is, f is continuous and bijective, and the inverse ${f^{-1}}$ is also continuous.  □
Corollary 1.
Let ${\{{\boldsymbol{v}_{n}}\}_{n\in \mathbb{N}}}$ be a collection of ${\mathbb{R}^{2d}}$-valued random vectors ${\boldsymbol{v}_{j}^{\top }}=\left(\begin{array}{c@{\hskip10.0pt}c}{\boldsymbol{x}_{j}^{\top }}& {\boldsymbol{y}_{j}^{\top }}\end{array}\right)$, where ${\boldsymbol{x}_{j}}$, ${\boldsymbol{y}_{j}}$ are ${\mathbb{R}^{d}}$-valued for every $j\in \mathbb{N}$. Let
\[ {\alpha _{n}}{\sum \limits_{j=1}^{n}}\left({\boldsymbol{v}_{j}}-{\boldsymbol{\mu }_{\boldsymbol{v}}}\right){\xrightarrow[n\to \infty ]{\mathcal{D}}}\boldsymbol{v}\sim {\mathcal{N}_{2d}}\left(\boldsymbol{0},{\boldsymbol{\Sigma }_{\boldsymbol{v}}}\right),\]
where ${\alpha _{n}}\uparrow \infty $ as $n\to \infty $ and
\[ {\boldsymbol{\mu }_{\boldsymbol{v}}}=\left(\begin{array}{c}{\boldsymbol{\mu }_{\boldsymbol{x}}}\\ {} {\boldsymbol{\mu }_{\boldsymbol{y}}}\end{array}\right)\hspace{1em}\textit{and}\hspace{2.5pt}\hspace{1em}{\boldsymbol{\Sigma }_{\boldsymbol{v}}}=\left(\begin{array}{c@{\hskip10.0pt}c}{\boldsymbol{\Sigma }_{\boldsymbol{x}}}& {\boldsymbol{\Sigma }_{\boldsymbol{xy}}}\\ {} {\boldsymbol{\Sigma }_{\boldsymbol{xy}}^{\top }}& {\boldsymbol{\Sigma }_{\boldsymbol{y}}}\end{array}\right).\]
Then, for the sequence of ${\mathbb{C}^{d}}$-valued random vectors ${\boldsymbol{z}_{j}}={\boldsymbol{x}_{j}}+i{\boldsymbol{y}_{j}}$, we have that
\[ {\alpha _{n}}{\sum \limits_{j=1}^{n}}\left({\boldsymbol{z}_{j}}-{\boldsymbol{\mu }_{\boldsymbol{z}}}\right){\xrightarrow[n\to \infty ]{\mathcal{D}}}\boldsymbol{z}\sim {\mathcal{N}_{d}}\left(\boldsymbol{0},{\boldsymbol{\Sigma }_{\boldsymbol{z}}},{\boldsymbol{P}_{\boldsymbol{z}}}\right),\]
where ${\boldsymbol{\mu }_{\boldsymbol{z}}}={\boldsymbol{\mu }_{\boldsymbol{x}}}+i{\boldsymbol{\mu }_{\boldsymbol{y}}}$, ${\boldsymbol{\Sigma }_{\boldsymbol{z}}}={\boldsymbol{\Sigma }_{\boldsymbol{x}}}+{\boldsymbol{\Sigma }_{\boldsymbol{y}}}+i({\boldsymbol{\Sigma }_{\boldsymbol{xy}}^{\top }}-{\boldsymbol{\Sigma }_{\boldsymbol{xy}}})$ and ${\boldsymbol{P}_{\boldsymbol{z}}}={\boldsymbol{\Sigma }_{\boldsymbol{x}}}-{\boldsymbol{\Sigma }_{\boldsymbol{y}}}+i({\boldsymbol{\Sigma }_{\boldsymbol{xy}}^{\top }}+{\boldsymbol{\Sigma }_{\boldsymbol{xy}}})$.
Proof of Corollary 1.
The corollary follows directly by applying Lemma 1 and by calculating the mean, the covariance and the relation matrix of $\textbf{z}$.  □
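As a small computational illustration of Corollary 1, the complex covariance and relation matrices can be assembled from the real blocks as follows; the function and argument names are our own, and Sx, Sy, Sxy are assumed to hold the blocks ${\boldsymbol{\Sigma }_{\boldsymbol{x}}}$, ${\boldsymbol{\Sigma }_{\boldsymbol{y}}}$ and ${\boldsymbol{\Sigma }_{\boldsymbol{xy}}}$.

real_to_complex_cov <- function(Sx, Sy, Sxy) {
  list(Sigma_z = Sx + Sy + 1i * (t(Sxy) - Sxy),        # covariance matrix of z
       P_z     = Sx - Sy + 1i * (t(Sxy) + Sxy))        # relation matrix of z
}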
Lemma 3.
Let ${\boldsymbol{x}_{\text{•}}}:={\left({\boldsymbol{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2. The functional $g:\boldsymbol{a}\mapsto \boldsymbol{\Gamma }(\boldsymbol{a}-\boldsymbol{\mu }):{\mathbb{C}^{d}}\to {\mathbb{C}^{d}}$ is a solution to the corresponding unmixing problem if and only if
\[ \boldsymbol{\mu }={\boldsymbol{\mu }_{\boldsymbol{x}}}\hspace{1em}\hspace{2.5pt}\textit{and}\hspace{2.5pt}\hspace{1em}\boldsymbol{\Gamma }\boldsymbol{A}=\boldsymbol{J},\]
where $\boldsymbol{J}\in {\mathbb{C}^{d\times d}}$ is a phase-shift matrix.
Proof of Lemma 3.
Let ${\textbf{y}_{\text{•}}}:=g\circ {\textbf{x}_{\text{•}}}$. First, let ${\textbf{y}_{\text{•}}}$ be a solution, that is, ${\textbf{y}_{\text{•}}}$ satisfies conditions (1)–(4) of Definition 2. Recall that Γ is assumed to be nonsingular. Using condition (1), we get that
\[ \mathbb{E}\left[{\textbf{y}_{t}}\right]=\mathbb{E}\left[\boldsymbol{\Gamma }\left({\textbf{x}_{t}}-\boldsymbol{\mu }\right)\right]=\mathbb{E}\left[\boldsymbol{\Gamma }\left(\textbf{A}{\textbf{z}_{t}}+{\boldsymbol{\mu }_{\textbf{x}}}-\boldsymbol{\mu }\right)\right]=\boldsymbol{\Gamma }\mathbb{E}\left[{\boldsymbol{\mu }_{\textbf{x}}}-\boldsymbol{\mu }\right]=\textbf{0},\]
and thus ${\boldsymbol{\mu }_{\textbf{x}}}=\boldsymbol{\mu }$, since Γ is nonsingular. Next, we can rewrite condition (2) as
\[ {\textbf{S}_{0}}[{\textbf{y}_{t}}]=\mathbb{E}\left[{\textbf{y}_{t}}{\textbf{y}_{t}^{\text{H}}}\right]=\mathbb{E}\left[\boldsymbol{\Gamma }\textbf{A}{\textbf{z}_{t}}{\left(\boldsymbol{\Gamma }\textbf{A}{\textbf{z}_{t}}\right)^{\text{H}}}\right]=\boldsymbol{\Gamma }\textbf{A}{\textbf{A}^{\text{H}}}{\boldsymbol{\Gamma }^{\text{H}}}={\textbf{I}_{d}},\]
which implies that $\boldsymbol{\Gamma }\textbf{A}$ is a unitary matrix. Similarly, we can rewrite condition (4) as
\[ {\textbf{S}_{\tau }}[{\textbf{y}_{t}}]=\frac{1}{2}\left(\mathbb{E}\left[{\textbf{y}_{t}}{\textbf{y}_{t+\tau }^{\text{H}}}\right]+\mathbb{E}\left[{\textbf{y}_{t+\tau }}{\textbf{y}_{t}^{\text{H}}}\right]\right)=\boldsymbol{\Gamma }\textbf{A}{\boldsymbol{\Lambda }_{\tau }}{\textbf{A}^{\text{H}}}{\boldsymbol{\Gamma }^{\text{H}}}={\boldsymbol{\Lambda }_{\tau }},\]
which is equivalent to
\[ {\boldsymbol{\Lambda }_{\tau }}{\textbf{A}^{\text{H}}}{\boldsymbol{\Gamma }^{\text{H}}}={\textbf{A}^{\text{H}}}{\boldsymbol{\Gamma }^{\text{H}}}{\boldsymbol{\Lambda }_{\tau }}.\]
Since ${\boldsymbol{\Lambda }_{\tau }}$ has real-valued distinct diagonal entries and since ${\boldsymbol{\Lambda }_{\tau }}$ and ${\textbf{A}^{\text{H}}}{\boldsymbol{\Gamma }^{\text{H}}}$ commute, we get that $\boldsymbol{\Gamma }\textbf{A}$ is also a diagonal matrix. Consequently, $\boldsymbol{\Gamma }\textbf{A}$ is a unitary diagonal matrix, which implies that the diagonal elements of the matrix product are of the form $\exp (i{\theta _{1}}),\dots ,\exp (i{\theta _{d}})$.
For the second part of the proof, let $\boldsymbol{\mu }={\boldsymbol{\mu }_{\textbf{x}}}$ and $\boldsymbol{\Gamma }\textbf{A}=\textbf{J}$, where $\textbf{J}$ is some phase-shift matrix. We next verify that ${\textbf{y}_{\text{•}}}:=g\circ {\textbf{x}_{\text{•}}}$ is a solution, that is, we verify that ${\textbf{y}_{\text{•}}}$ satisfies conditions (1)–(4) of Definition 2. Condition (1) is clearly satisfied, since $\mathbb{E}[{\textbf{x}_{t}}]={\boldsymbol{\mu }_{\textbf{x}}}$. Furthermore, we have that, for every $\tau ,t\in \mathbb{N}$,
\[ {\ddot{\textbf{S}}_{\tau }}[{\textbf{y}_{t}}]=\mathbb{E}\left[\boldsymbol{\Gamma }\textbf{A}{\textbf{z}_{t}}{\left(\boldsymbol{\Gamma }\textbf{A}{\textbf{z}_{t+\tau }}\right)^{\text{H}}}\right]=\textbf{J}{\ddot{\boldsymbol{\Lambda }}_{\tau }}{\textbf{J}^{\text{H}}},\]
where ${\ddot{\boldsymbol{\Lambda }}_{0}}={\textbf{I}_{d}}$. Thus, conditions (2)–(3) are satisfied. Finally, we have for a fixed τ and for every $t\in \mathbb{N}$ that
\[ {\textbf{S}_{\tau }}[{\textbf{y}_{t}}]=\textbf{J}{\textbf{S}_{\tau }}[{\textbf{z}_{t}}]{\textbf{J}^{\text{H}}}=\textbf{J}{\boldsymbol{\Lambda }_{\tau }}{\textbf{J}^{\text{H}}}={\boldsymbol{\Lambda }_{\tau }},\]
since diagonal matrices commute. Thus, condition (4) is satisfied and the proof is complete.  □
Proof of Theorem 1.
In order to simplify the notation, we denote ${\textbf{S}_{0}}:={\textbf{S}_{0}}\left[{\textbf{x}_{t}}\right]$ and ${\textbf{S}_{\tau }}:={\textbf{S}_{\tau }}\left[{\textbf{x}_{t}}\right]$. First, let g be a solution. Then, since ${\textbf{x}_{\text{•}}}$ follows the mixing model, Lemma 3 gives
\[ {\textbf{S}_{0}}{\boldsymbol{\Gamma }^{\text{H}}}=\textbf{A}{\textbf{A}^{\text{H}}}{\boldsymbol{\Gamma }^{\text{H}}}=\textbf{A}{\textbf{J}^{\text{H}}}\hspace{1em}\Longrightarrow \hspace{1em}\textbf{A}={\textbf{S}_{0}}{\boldsymbol{\Gamma }^{\text{H}}}\textbf{J}.\]
Using the above expressions, we get that
\[ {\textbf{S}_{\tau }}{\boldsymbol{\Gamma }^{\text{H}}}=\textbf{A}{\boldsymbol{\Lambda }_{\tau }}{\textbf{A}^{\text{H}}}{\boldsymbol{\Gamma }^{\text{H}}}={\textbf{S}_{0}}{\boldsymbol{\Gamma }^{\text{H}}}\textbf{J}{\boldsymbol{\Lambda }_{\tau }}{\textbf{J}^{\text{H}}}={\textbf{S}_{0}}{\boldsymbol{\Gamma }^{\text{H}}}{\boldsymbol{\Lambda }_{\tau }},\]
and left-multiplying by ${\textbf{S}_{0}^{-1}}$ yields Equation (1). In addition, Lemma 3 directly gives Equation (2). Since the process ${\textbf{x}_{\text{•}}}$ is weakly stationary, the previous is true for every $t\in \mathbb{N}$.
Next, let Equations (1) and (2) be satisfied for every $t\in \mathbb{N}$. Left-multiplying by ${\textbf{S}_{0}}$, Equation (1) can be reformulated as
\[ \textbf{A}{\boldsymbol{\Lambda }_{\tau }}{\textbf{A}^{\text{H}}}{\boldsymbol{\Gamma }^{\text{H}}}=\textbf{A}{\textbf{A}^{\text{H}}}{\boldsymbol{\Gamma }^{\text{H}}}{\boldsymbol{\Lambda }_{\tau }},\]
which, after left-multiplication by ${\textbf{A}^{-1}}$, gives that ${\textbf{A}^{\text{H}}}{\boldsymbol{\Gamma }^{\text{H}}}$ and ${\boldsymbol{\Lambda }_{\tau }}$ commute. Since the diagonal matrix ${\boldsymbol{\Lambda }_{\tau }}$ has real-valued distinct diagonal elements, we get that ${\textbf{A}^{\text{H}}}{\boldsymbol{\Gamma }^{\text{H}}}$ is a diagonal matrix. Then the scaling equation gives that $\boldsymbol{\Gamma }\textbf{A}{\textbf{A}^{\text{H}}}{\boldsymbol{\Gamma }^{\text{H}}}={\textbf{I}_{d}}$, i.e., $\boldsymbol{\Gamma }\textbf{A}$ is a unitary matrix. Consequently, $\boldsymbol{\Gamma }\textbf{A}=\textbf{J}$, where $\textbf{J}$ is some phase-shift matrix. By Lemma 3, the functional g is hereby a solution to the corresponding unmixing problem, and the proof is complete.  □
Corollary 2.
Let ${\boldsymbol{x}_{\text{•}}}:={\left({\boldsymbol{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2 and denote ${\boldsymbol{S}_{j}}:={\boldsymbol{S}_{j}}[{\boldsymbol{x}_{t}}]$. Let ${\boldsymbol{S}_{0}^{-1/2}}$ be a conjugate symmetric matrix, such that ${\boldsymbol{S}_{0}^{-1/2}}{\boldsymbol{S}_{0}}{\boldsymbol{S}_{0}^{-1/2}}={\boldsymbol{I}_{d}}$. Then the eigendecomposition ${\boldsymbol{S}_{0}^{-1/2}}{\boldsymbol{S}_{\tau }}{\boldsymbol{S}_{0}^{-1/2}}=\boldsymbol{v}{\boldsymbol{\Lambda }_{\tau }}{\boldsymbol{v}^{\text{H}}}$ is satisfied for some unitary $\boldsymbol{v}$, and the functional
\[ g:\boldsymbol{a}\mapsto {\boldsymbol{v}^{\text{H}}}{\boldsymbol{S}_{0}^{-\frac{1}{2}}}(\boldsymbol{a}-{\boldsymbol{\mu }_{\boldsymbol{x}}}):{\mathbb{C}^{d}}\to {\mathbb{C}^{d}}\]
is a solution to the corresponding unmixing problem.
Proof of Corollary 2.
Under the assumptions of Definition 2, we have that ${\textbf{S}_{0}}$ is a positive-definite matrix and that ${\textbf{S}_{\tau }}$ is a conjugate symmetric matrix. Thus, the matrix-square root of ${\textbf{S}_{0}}$ exists and the conjugate symmetric matrix-square root is unique, and consequently similar arguments as in the real-valued counterpart presented in [17] can be applied here. Hereby, ${\textbf{S}_{0}^{-1/2}}{\textbf{S}_{\tau }}{\textbf{S}_{0}^{-1/2}}$ can always be eigendecomposed.
The second part of the proof follows directly from Theorem 1 by verifying that $\boldsymbol{\Gamma }={\textbf{V}^{\text{H}}}{\textbf{S}_{0}^{-1/2}}$ satisfies Equations (1) and (2).  □
Lemma 4.
Let ${\boldsymbol{x}_{\text{•}}}:={\left({\boldsymbol{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2 and let $\boldsymbol{x}$ be a ${\mathbb{C}^{T\times d}}$-valued, $1\le d<T<\infty $, sampled stochastic process generated by ${\boldsymbol{x}_{\text{•}}}$. Denote $\hat{\boldsymbol{\mu }}:=\hat{\boldsymbol{\mu }}[\boldsymbol{x}]$ and $\hat{{\boldsymbol{S}_{j}}}:=\hat{{\boldsymbol{S}_{j}}}[\boldsymbol{x}]$, $j\in \{0,\tau \}$. Let ${\hat{\boldsymbol{\Sigma }}_{0}}$ be a conjugate symmetric matrix that satisfies ${\hat{\boldsymbol{\Sigma }}_{0}}\hat{{\boldsymbol{S}_{0}}}{\hat{\boldsymbol{\Sigma }}_{0}}={\boldsymbol{I}_{d}}$ and let ${\hat{\boldsymbol{\Sigma }}_{0}}\hat{{\boldsymbol{S}_{\tau }}}{\hat{\boldsymbol{\Sigma }}_{0}}=\hat{\boldsymbol{v}}{\hat{\boldsymbol{\Lambda }}_{\tau }}{\hat{\boldsymbol{v}}^{\text{H}}}$, where $\hat{\boldsymbol{v}}$ is unitary. Such matrices ${\hat{\boldsymbol{\Sigma }}_{0}}$, $\hat{\boldsymbol{v}}$ exist, and the mapping
\[ \hat{g}:\boldsymbol{C}\mapsto (\boldsymbol{C}-{\boldsymbol{1}_{T}}{\hat{\boldsymbol{\mu }}^{\top }}){\left({\hat{\boldsymbol{v}}^{\text{H}}}{\hat{\boldsymbol{\Sigma }}_{0}}\right)^{\top }}:{\mathbb{C}^{T\times d}}\to {\mathbb{C}^{T\times d}}\]
is a solution to the finite sample unmixing problem.
Proof of Lemma 4.
Note that Lemma 4 can be seen as a finite sample version of Corollary 2. By assumption, the components of ${\textbf{z}_{\text{•}}}$ have continuous marginals, which implies that covariance matrix estimates are almost surely nonsingular. Thus, we can almost surely find a unique matrix ${\hat{\boldsymbol{\Sigma }}_{0}}$ which is the conjugate symmetric inverse square root of ${\hat{\textbf{S}}_{0}}$. We proceed to verify that $\hat{\boldsymbol{\Gamma }}={\hat{\textbf{V}}^{\text{H}}}{\hat{\boldsymbol{\Sigma }}_{0}}$ satisfies the conditions of Definition 4. First,
\[ {\hat{\textbf{V}}^{\text{H}}}{\hat{\boldsymbol{\Sigma }}_{0}}{\hat{\textbf{S}}_{0}}{\left({\hat{\textbf{V}}^{\text{H}}}{\hat{\boldsymbol{\Sigma }}_{0}}\right)^{\text{H}}}={\hat{\textbf{V}}^{\text{H}}}\hat{\textbf{V}}={\textbf{I}_{d}},\]
and similarly,
\[ {\hat{\textbf{V}}^{\text{H}}}{\hat{\boldsymbol{\Sigma }}_{0}}{\hat{\textbf{S}}_{\tau }}{\left({\hat{\textbf{V}}^{\text{H}}}{\hat{\boldsymbol{\Sigma }}_{0}}\right)^{\text{H}}}={\hat{\textbf{V}}^{\text{H}}}\hat{\textbf{V}}{\hat{\boldsymbol{\Lambda }}_{\tau }}{\hat{\textbf{V}}^{\text{H}}}\hat{\textbf{V}}={\hat{\boldsymbol{\Lambda }}_{\tau }}.\]
 □
Lemma 5.
Let ${\boldsymbol{x}_{\text{•}}}:={\left({\boldsymbol{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2 and let $g:\boldsymbol{a}\mapsto \boldsymbol{\Gamma }(\boldsymbol{a}-\boldsymbol{\mu })$ be a corresponding solution. Furthermore, let $\tilde{{\boldsymbol{x}_{\text{•}}}}:={\left(\boldsymbol{C}{\boldsymbol{x}_{t}}+\boldsymbol{b}\right)_{t\in \mathbb{N}}}$, where $\boldsymbol{C}\in {\mathbb{C}^{d\times d}}$ is nonsingular, $\boldsymbol{b}\in {\mathbb{C}^{d}}$ and let $\tilde{g}:\boldsymbol{a}\mapsto \tilde{\boldsymbol{\Gamma }}(\boldsymbol{a}-\tilde{\boldsymbol{\mu }})$ be a solution for the process $\tilde{{\boldsymbol{x}_{\text{•}}}}$. Then,
\[ \tilde{\boldsymbol{\Gamma }}=\boldsymbol{J}\boldsymbol{\Gamma }{\boldsymbol{C}^{-1}}\]
for some phase-shift matrix $\boldsymbol{J}$.
Proof of Lemma 5.
By Lemma 3, we have that $\boldsymbol{\Gamma }\textbf{A}={\textbf{J}_{1}}$ and $\tilde{\boldsymbol{\Gamma }}\textbf{CA}={\textbf{J}_{2}}$, where ${\textbf{J}_{1}}$ and ${\textbf{J}_{2}}$ are some phase-shift matrices. Recall that the mixing matrix $\textbf{A}$ is nonsingular. Hereby, we get that $\boldsymbol{\Gamma }={\textbf{J}_{1}}{\textbf{A}^{-1}}$ and $\tilde{\boldsymbol{\Gamma }}={\textbf{J}_{2}}{(\textbf{CA})^{-1}}$. By using the obtained expressions for Γ and $\tilde{\boldsymbol{\Gamma }}$, we get that
\[ \tilde{\boldsymbol{\Gamma }}={\textbf{J}_{2}}{\textbf{I}_{d}}{\textbf{A}^{-1}}{\textbf{C}^{-1}}={\textbf{J}_{2}}{\textbf{J}_{1}^{\text{H}}}{\textbf{J}_{1}}{\textbf{A}^{-1}}{\textbf{C}^{-1}}={\textbf{J}_{3}}\boldsymbol{\Gamma }{\textbf{C}^{-1}},\]
where ${\textbf{J}_{3}}={\textbf{J}_{2}}{\textbf{J}_{1}^{\text{H}}}$ is a phase-shift matrix as the set of phase-shift matrices is stable under matrix multiplication.  □
Lemma 6.
Let ${\boldsymbol{x}_{\text{•}}}:={\left({\boldsymbol{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2, let $\boldsymbol{x}$ be a ${\mathbb{C}^{T\times d}}$-valued, $1\le d<T<\infty $, sampled stochastic process generated by ${\boldsymbol{x}_{\text{•}}}$ and let $\hat{g}:\boldsymbol{C}\mapsto (\boldsymbol{C}-{\boldsymbol{1}_{T}}{\hat{\boldsymbol{\mu }}^{\top }}){\hat{\boldsymbol{\Gamma }}^{\top }}$ be a corresponding finite sample solution defined in Lemma 4. Let $\boldsymbol{B}\in {\mathbb{C}^{d\times d}}$ be nonsingular, let $\boldsymbol{b}\in {\mathbb{C}^{d}}$, let $\tilde{\boldsymbol{x}}=\boldsymbol{x}{\boldsymbol{B}^{\top }}+{\boldsymbol{1}_{T}}{\boldsymbol{b}^{\top }}$ and let $\tilde{g}:\boldsymbol{C}\mapsto (\boldsymbol{C}-{\boldsymbol{1}_{T}}{\tilde{\boldsymbol{\mu }}^{\top }}){\tilde{\boldsymbol{\Gamma }}^{\top }}$ be a corresponding finite sample solution for $\tilde{\boldsymbol{x}}$. Then, $\tilde{\boldsymbol{\Gamma }}=\boldsymbol{J}\hat{\boldsymbol{\Gamma }}{\boldsymbol{B}^{-1}}$ is a finite sample solution for $\tilde{\boldsymbol{x}}$, where $\boldsymbol{J}$ is some phase-shift matrix.
Proof of Lemma 6.
By Lemma 4, we can write $\hat{\boldsymbol{\Gamma }}={\hat{\textbf{V}}^{\text{H}}}{\hat{\boldsymbol{\Sigma }}_{0}}$, where ${\hat{\boldsymbol{\Sigma }}_{0}}$ denotes the conjugate symmetric inverse square root of ${\hat{\textbf{S}}_{0}}$. Note that the matrix-valued estimators are affine equivariant in the sense that
\[ {\hat{\textbf{S}}_{j}}[\textbf{X}{\textbf{B}^{\top }}+{\textbf{1}_{T}}{\textbf{b}^{\top }}]=\textbf{B}{\hat{\textbf{S}}_{j}}[\textbf{X}]{\textbf{B}^{\text{H}}},\]
$j\in \{0,\tau \}$, for all nonsingular ${\mathbb{C}^{d\times d}}$-matrices $\textbf{B}$ and all ${\mathbb{C}^{d}}$-vectors $\textbf{b}$. We proceed to verify that $\textbf{J}\hat{\boldsymbol{\Gamma }}{\textbf{B}^{-1}}$ satisfies the criteria of a finite sample solution, given in Definition 4. Using the affine equivariance property, we get that
\[ \textbf{J}\hat{\boldsymbol{\Gamma }}{\textbf{B}^{-1}}{\hat{\textbf{S}}_{0}}[\tilde{\textbf{X}}]{\left(\textbf{J}\hat{\boldsymbol{\Gamma }}{\textbf{B}^{-1}}\right)^{\text{H}}}=\textbf{J}{\hat{\textbf{V}}^{\text{H}}}{\hat{\boldsymbol{\Sigma }}_{0}}{\hat{\textbf{S}}_{0}}[\textbf{X}]{\hat{\boldsymbol{\Sigma }}_{0}}\hat{\textbf{V}}{\textbf{J}^{\text{H}}}={\textbf{JJ}^{\text{H}}}={\textbf{I}_{d}}\]
and
\[ \textbf{J}\hat{\boldsymbol{\Gamma }}{\textbf{B}^{-1}}{\hat{\textbf{S}}_{\tau }}[\tilde{\textbf{X}}]{\left(\textbf{J}\hat{\boldsymbol{\Gamma }}{\textbf{B}^{-1}}\right)^{\text{H}}}=\textbf{J}{\hat{\textbf{V}}^{\text{H}}}{\hat{\boldsymbol{\Sigma }}_{0}}{\hat{\textbf{S}}_{\tau }}[\textbf{X}]{\hat{\boldsymbol{\Sigma }}_{0}}\hat{\textbf{V}}{\textbf{J}^{\text{H}}}=\textbf{J}{\hat{\boldsymbol{\Lambda }}_{\tau }}{\textbf{J}^{\text{H}}}={\hat{\boldsymbol{\Lambda }}_{\tau }}.\]
Hereby, $\textbf{J}\hat{\boldsymbol{\Gamma }}{\textbf{B}^{-1}}$ is a finite sample solution for $\tilde{\textbf{X}}$ and the proof is complete.  □
Lemma 7.
Let $\hat{\boldsymbol{\Sigma }}$ be a T-indexed sequence of ${\mathbb{C}^{d\times d}}$-valued nonsingular estimates and let ${\alpha _{T}}$ be a real-valued sequence that satisfies ${\alpha _{T}}\uparrow \infty $ as $T\to \infty $. Furthermore, let ${\alpha _{T}}(\hat{\boldsymbol{\Sigma }}-{\boldsymbol{I}_{d}})={\mathcal{O}_{p}}(1)$. Then the following two statements hold.
  • (i) ${\hat{\boldsymbol{\Sigma }}^{a}}{\xrightarrow[T\to \infty ]{\mathbb{P}}}{\boldsymbol{I}_{d}}$, where $a\in \left\{-1,-\frac{1}{2},1\right\}$.
  • (ii) ${\alpha _{T}}({\hat{\boldsymbol{\Sigma }}^{a}}-{\boldsymbol{I}_{d}})={\mathcal{O}_{p}}(1)$, where $a\in \left\{-1,-\frac{1}{2}\right\}$.
Proof of Lemma 7.
The assumption ${\alpha _{T}}(\hat{\boldsymbol{\Sigma }}-{\textbf{I}_{d}})={\mathcal{O}_{p}}(1)$ implies that $\hat{\boldsymbol{\Sigma }}-{\textbf{I}_{d}}=(1/{\alpha _{T}}){\mathcal{O}_{p}}(1)={o_{p}}(1){\mathcal{O}_{p}}(1)={o_{p}}(1)$, that is, $\hat{\boldsymbol{\Sigma }}\xrightarrow{\mathbb{P}}{\textbf{I}_{d}}$, as $T\to \infty $. Note that the matrix inversion and the conjugate symmetric square root of the inversion are continuous transformations in the neighborhood of ${\boldsymbol{I}_{d}}$. Thus, we can apply the continuous mapping theorem which gives us that ${\hat{\boldsymbol{\Sigma }}^{a}}\xrightarrow{\mathbb{P}}{\boldsymbol{I}_{d}}$, as $T\to \infty $, for $a\in \{-1,-\frac{1}{2}\}$.
Using part (i) and Slutsky’s lemma, we get that the inverse is uniformly tight, since
\[\begin{aligned}{}{\alpha _{T}}({\hat{\boldsymbol{\Sigma }}^{-1}}-{\textbf{I}_{d}})& =({\hat{\boldsymbol{\Sigma }}^{-1}}-{\textbf{I}_{d}}){\alpha _{T}}({\textbf{I}_{d}}-\hat{\boldsymbol{\Sigma }})+{\alpha _{T}}({\textbf{I}_{d}}-\hat{\boldsymbol{\Sigma }})\\ {} & ={o_{p}}(1){\mathcal{O}_{p}}(1)+{\mathcal{O}_{p}}(1)={\mathcal{O}_{p}}(1).\end{aligned}\]
For the final part, first note that
\[ ({\hat{\boldsymbol{\Sigma }}^{-1}}-{\textbf{I}_{d}})=({\hat{\boldsymbol{\Sigma }}^{-\frac{1}{2}}}-{\textbf{I}_{d}})({\hat{\boldsymbol{\Sigma }}^{-\frac{1}{2}}}+{\textbf{I}_{d}}).\]
Since ${\hat{\boldsymbol{\Sigma }}^{-\frac{1}{2}}}+{\textbf{I}_{d}}$ converges in probability to $2{\textbf{I}_{d}}$ and the matrix inversion is a continuous transformation in the neighborhood of $2{\textbf{I}_{d}}$, the continuous mapping theorem gives us that ${({\hat{\boldsymbol{\Sigma }}^{-\frac{1}{2}}}+{\textbf{I}_{d}})^{-1}}{\xrightarrow[T\to \infty ]{\mathbb{P}}}\frac{1}{2}{\boldsymbol{I}_{d}}$. Thus by Slutsky’s lemma,
\[ {\alpha _{T}}({\hat{\boldsymbol{\Sigma }}^{-\frac{1}{2}}}-{\textbf{I}_{d}})={\alpha _{T}}({\hat{\boldsymbol{\Sigma }}^{-1}}-{\textbf{I}_{d}}){({\hat{\boldsymbol{\Sigma }}^{-\frac{1}{2}}}+{\textbf{I}_{d}})^{-1}}={\mathcal{O}_{p}}(1).\]
 □
Proof of Lemma 2.
Denote ${\hat{\textbf{S}}_{j}}:={\hat{\textbf{S}}_{j}}[\textbf{X}]$ and ${\textbf{S}_{j}}:={\textbf{S}_{j}}[{\textbf{x}_{t}}]$, where $j\in \{0,\tau \}$. By Lemmas 5 and 6, it is sufficient to consider the trivial mixing scenario $\textbf{A}={\textbf{I}_{d}}$. Note that ${\gamma _{T}}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-\boldsymbol{\Gamma })={\gamma _{T}}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}{\boldsymbol{\Gamma }^{-1}}-{\textbf{I}_{d}})\boldsymbol{\Gamma }$ and by Lemma 6 we have that $\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}{\boldsymbol{\Gamma }^{-1}}$ is a finite sample solution for $\textbf{X}{\boldsymbol{\Gamma }^{\top }}$. Hereby, the trivial mixing can be generalized to any full-rank mixing matrix $\textbf{A}$.
Under trivial mixing, we have that ${\alpha _{T}}({\hat{\textbf{S}}_{0}}-{\textbf{I}_{d}})={\mathcal{O}_{p}}(1)$ and ${\beta _{T}}({\hat{\textbf{S}}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})={\mathcal{O}_{p}}(1)$. Furthermore, by Corollary 2 and Lemma 4, we have $\hat{\boldsymbol{\Gamma }}={\hat{\boldsymbol{V}}^{\text{H}}}{\hat{\boldsymbol{\Sigma }}_{0}}$ and $\boldsymbol{\Gamma }={\boldsymbol{V}^{\text{H}}}{\boldsymbol{\Sigma }_{0}}$, where ${\hat{\boldsymbol{\Sigma }}_{0}}={\hat{\textbf{S}}_{0}^{-1/2}}$ and ${\boldsymbol{\Sigma }_{0}}={\textbf{S}_{0}^{-1/2}}$ are conjugate symmetric. Note that under trivial mixing we have ${\textbf{S}_{0}^{-1/2}}={\textbf{I}_{d}}$ and $\textbf{V}=\textbf{J}$, where $\textbf{J}$ is some phase-shift matrix.
We next denote ${\hat{\boldsymbol{\Sigma }}_{\tau }}={\hat{\boldsymbol{\Sigma }}_{0}}{\hat{\textbf{S}}_{\tau }}{\hat{\boldsymbol{\Sigma }}_{0}}$ and show that ${\gamma _{T}}({\hat{\boldsymbol{\Sigma }}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})={\mathcal{O}_{p}}(1)$. The uniform tightness follows from Lemma 7, Slutsky’s lemma and the factorization
\[\begin{aligned}{}{\gamma _{T}}({\hat{\boldsymbol{\Sigma }}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})& ={\gamma _{T}}\left[({\hat{\boldsymbol{\Sigma }}_{0}}-{\textbf{I}_{d}})({\hat{\textbf{S}}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})({\hat{\boldsymbol{\Sigma }}_{0}}-{\textbf{I}_{d}})+({\hat{\boldsymbol{\Sigma }}_{0}}-{\textbf{I}_{d}})({\hat{\textbf{S}}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})\right.\\ {} & +({\hat{\boldsymbol{\Sigma }}_{0}}-{\textbf{I}_{d}}){\boldsymbol{\Lambda }_{\tau }}({\hat{\boldsymbol{\Sigma }}_{0}}-{\textbf{I}_{d}})+({\hat{\boldsymbol{\Sigma }}_{0}}-{\textbf{I}_{d}}){\boldsymbol{\Lambda }_{\tau }}+({\hat{\textbf{S}}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})\\ {} & +\left.({\hat{\textbf{S}}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})({\hat{\boldsymbol{\Sigma }}_{0}}-{\textbf{I}_{d}})+{\boldsymbol{\Lambda }_{\tau }}({\hat{\boldsymbol{\Sigma }}_{0}}-{\textbf{I}_{d}})\right]\\ {} & ={o_{p}}(1)+{\gamma _{T}}\left[({\hat{\boldsymbol{\Sigma }}_{0}}-{\textbf{I}_{d}}){\boldsymbol{\Lambda }_{\tau }}+({\hat{\textbf{S}}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})+{\boldsymbol{\Lambda }_{\tau }}({\hat{\boldsymbol{\Sigma }}_{0}}-{\textbf{I}_{d}})\right]\\ {} & ={\mathcal{O}_{p}}(1).\end{aligned}\]
Next, recall the eigendecomposition ${\hat{\boldsymbol{\Sigma }}_{\tau }}=\hat{\textbf{V}}{\hat{\boldsymbol{\Lambda }}_{\tau }}{\hat{\textbf{V}}^{\text{H}}}$ and that the space of unitary matrices is compact. Consequently, $\textbf{U}={\mathcal{O}_{p}}(1)$ for any unitary $\textbf{U}$. By right-multiplying both sides of the eigendecomposition with $\hat{\boldsymbol{V}}$ and by subtracting ${\boldsymbol{\Lambda }_{\tau }}$ from both sides, we get
\[ {\hat{\boldsymbol{\Sigma }}_{\tau }}\hat{\textbf{V}}-{\boldsymbol{\Lambda }_{\tau }}=\hat{\textbf{V}}{\hat{\boldsymbol{\Lambda }}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }},\]
where both sides of the equation can be further factorized as
\[ ({\hat{\boldsymbol{\Sigma }}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})\hat{\textbf{V}}+{\boldsymbol{\Lambda }_{\tau }}(\hat{\textbf{V}}-{\textbf{I}_{d}})=(\hat{\textbf{V}}-{\textbf{I}_{d}}){\hat{\boldsymbol{\Lambda }}_{\tau }}+({\hat{\boldsymbol{\Lambda }}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }}),\]
and multiplying by ${\gamma _{T}}$ and rearranging the terms yield
(13)
\[ {\gamma _{T}}\left[{\boldsymbol{\Lambda }_{\tau }}(\hat{\textbf{V}}-{\textbf{I}_{d}})-(\hat{\textbf{V}}-{\textbf{I}_{d}}){\hat{\boldsymbol{\Lambda }}_{\tau }}-({\hat{\boldsymbol{\Lambda }}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})\right]={\mathcal{O}_{p}}(1).\]
By assumption, the diagonal matrix ${\boldsymbol{\Lambda }_{\tau }}$ has distinct real-valued diagonal elements. Furthermore, since the matrix ${\hat{\boldsymbol{\Lambda }}_{\tau }}$ is obtained by estimating an eigendecomposition, it is also diagonal, with diagonal elements denoted as ${\hat{\lambda }_{\tau }^{(1)}},\dots ,{\hat{\lambda }_{\tau }^{(d)}}$. Then, consider the element $(j,k)$, $j\ne k$, of Equation (13):
(14)
\[ {\gamma _{T}}{\hat{\textbf{V}}_{jk}}({\lambda _{\tau }^{(j)}}-{\hat{\lambda }_{\tau }^{(k)}})={\mathcal{O}_{p}}(1).\]
Since ${\gamma _{T}}({\hat{\boldsymbol{\Sigma }}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})={\mathcal{O}_{p}}(1)$, we get that ${\hat{\lambda }_{\tau }^{(k)}}$ converges in probability to ${\lambda _{\tau }^{(k)}}$. Furthermore, since the diagonal elements of ${\boldsymbol{\Lambda }_{\tau }}$ are distinct, we can divide both sides of Equation (14) by ${\lambda _{\tau }^{(j)}}-{\hat{\lambda }_{\tau }^{(k)}}$, which gives us that ${\gamma _{T}}{[\hat{\textbf{V}}]_{jk}}={\mathcal{O}_{p}}(1)$ holds asymptotically when $j\ne k$.
Next, let $\hat{\textbf{J}}$ be a T-indexed sequence of phase-shift matrices, such that the diagonal entries of the product $\hat{\textbf{V}}{\hat{\textbf{J}}^{\text{H}}}$ are on the positive real axis for every $T\in \mathbb{N}$. Note that any complex number can be expressed in the form $r\exp (i\theta )$, where θ is the phase and r is the modulus, i.e., the length of the complex number. In the matrix case, we can similarly express $\hat{\textbf{V}}$ so that $\hat{\textbf{V}}={\hat{\textbf{V}}_{r}}{\hat{\textbf{V}}_{\theta }}$, where ${\hat{\textbf{V}}_{\theta }}$ is a phase-shift matrix that contains the phases of the diagonal elements of $\hat{\textbf{V}}$ and consequently ${\hat{\textbf{V}}_{r}}$ is a matrix with real-valued diagonal entries. Then, given a phase-shift matrix ${\hat{\textbf{V}}_{\theta }}=\text{diag}(\exp (i{\theta _{1}}),\dots ,\exp (i{\theta _{d}}))$, we can always construct a matrix ${\hat{\textbf{J}}^{\text{H}}}=\text{diag}(\exp (-i{\theta _{1}}),\dots ,\exp (-i{\theta _{d}}))$ such that the product $\hat{\textbf{V}}{\hat{\textbf{J}}^{\text{H}}}$ has real-valued diagonal entries.
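A one-line R illustration of this construction (V_hat denotes a generic unitary estimate and the helper name is our own):

phase_fix <- function(V_hat) {
  # returns J_hat^H = diag(exp(-i*theta_1), ..., exp(-i*theta_d)), so that the diagonal
  # of V_hat %*% phase_fix(V_hat) lies on the nonnegative real axis
  diag(exp(-1i * Arg(diag(V_hat))), nrow = nrow(V_hat))
}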
After the diagonal entries have been rotated to the positive real axis, the rotated diagonal elements are equal to the corresponding moduli, that is, ${\hat{\textbf{V}}_{kk}}{\hat{\textbf{J}}_{kk}^{\text{H}}}=|{\hat{\textbf{V}}_{kk}}|$. Then, since $\hat{\textbf{V}}$ is unitary, each of its columns has length one and the absolute value of a single element is at most one, which gives us
\[ \left|1-{\hat{\textbf{V}}_{kk}}{\hat{\textbf{J}}_{kk}^{\text{H}}}\right|=1-{\hat{\textbf{V}}_{kk}}{\hat{\textbf{J}}_{kk}^{\text{H}}}=1-\left|{\hat{\textbf{V}}_{kk}}\right|=1-\sqrt{1-{\sum \limits_{\begin{array}{c}h=1\\ {} h\ne k\end{array}}^{d}}{\left|{\hat{\textbf{V}}_{hk}}\right|^{2}}}.\]
By the unitarity of $\hat{\textbf{V}}$, the quantity under the above square root lies between zero and one, and for such a quantity the square root is greater than or equal to the quantity itself. Hereby, using the (asymptotic) uniform tightness of the off-diagonal elements, we get that asymptotically
\[ \left|1-{\hat{\textbf{V}}_{kk}}{\hat{\textbf{J}}_{kk}^{\text{H}}}\right|\le 1-1+{\sum \limits_{\begin{array}{c}h=1\\ {} h\ne k\end{array}}^{d}}{\left|{\hat{\textbf{V}}_{hk}}\right|^{2}}={\sum \limits_{\begin{array}{c}h=1\\ {} h\ne k\end{array}}^{d}}{\left|{\hat{\textbf{V}}_{hk}}\right|^{2}}={\mathcal{O}_{p}}(1/{\gamma _{T}^{2}}).\]
Hereby, by combining the results for the diagonal and off-diagonal elements, we get that there exists a sequence of phase-shift matrices $\hat{\textbf{J}}$ such that ${\gamma _{T}}(\hat{\textbf{V}}{\hat{\textbf{J}}^{\text{H}}}-{\textbf{I}_{d}})={\mathcal{O}_{p}}(1)$ holds asymptotically.
The claim of the lemma can then be written as
\[\begin{aligned}{}{\gamma _{T}}\left[\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}}\right]& ={\gamma _{T}}\left[{\left(\hat{\textbf{V}}{\hat{\textbf{J}}^{\text{H}}}\right)^{\text{H}}}{\hat{\boldsymbol{\Sigma }}_{0}}-{\textbf{I}_{d}}\right]\\ {} & ={\left(\hat{\textbf{V}}{\hat{\textbf{J}}^{\text{H}}}\right)^{\text{H}}}{\gamma _{T}}({\hat{\boldsymbol{\Sigma }}_{0}}-{\textbf{I}_{d}})+{\gamma _{T}}{\left(\hat{\textbf{V}}{\hat{\textbf{J}}^{\text{H}}}-{\textbf{I}_{d}}\right)^{\text{H}}}\\ {} & ={\mathcal{O}_{p}}(1){\mathcal{O}_{p}}(1)+{\mathcal{O}_{p}}(1)={\mathcal{O}_{p}}(1).\end{aligned}\]
 □
Proof of Theorem 2.
Let ${\hat{\textbf{S}}_{j}}:={\hat{\textbf{S}}_{j}}[\textbf{X}]$, $j\in \{0,\tau \}$, and recall the equations from Definition 4
(15)
\[ \hat{\boldsymbol{\Gamma }}{\hat{\textbf{S}}_{0}}{\hat{\boldsymbol{\Gamma }}^{\text{H}}}={\textbf{I}_{d}}\hspace{2em}\hspace{2.5pt}\text{and}\hspace{2em}\hat{\boldsymbol{\Gamma }}{\hat{\textbf{S}}_{\tau }}{\hat{\boldsymbol{\Gamma }}^{\text{H}}}={\hat{\boldsymbol{\Lambda }}_{\tau }}.\]
In order to simplify the notation, we denote $\hat{\textbf{G}}:=\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}$, where $\hat{\textbf{J}}$ is a T-indexed sequence of phase-shift matrices such that the diagonal elements of $\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}$ are on the positive real axis. Under Lemma 2, we have that ${\gamma _{T}}(\hat{\textbf{G}}-{\textbf{I}_{d}})={\mathcal{O}_{p}}(1)$ and note that both parts of Equation (15) also hold for $\hat{\textbf{G}}$.
The left part of Equation (15) can then be expanded as
\[\begin{aligned}{}& (\hat{\textbf{G}}-{\textbf{I}_{d}}){\hat{\textbf{S}}_{0}}{\hat{\textbf{G}}^{\text{H}}}+({\hat{\textbf{S}}_{0}}-{\textbf{I}_{d}}){\hat{\textbf{G}}^{\text{H}}}+({\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}})=0,\end{aligned}\]
where the left-hand side can be further expanded as
\[\begin{aligned}{}& (\hat{\textbf{G}}-{\textbf{I}_{d}})({\hat{\textbf{S}}_{0}}-{\textbf{I}_{d}})({\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}})+(\hat{\textbf{G}}-{\textbf{I}_{d}})({\hat{\textbf{S}}_{0}}-{\textbf{I}_{d}})+(\hat{\textbf{G}}-{\textbf{I}_{d}})({\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}})\\ {} & +(\hat{\textbf{G}}-{\textbf{I}_{d}})+({\hat{\textbf{S}}_{0}}-{\textbf{I}_{d}})({\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}})+({\hat{\textbf{S}}_{0}}-{\textbf{I}_{d}})+({\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}})\\ {} & =(\hat{\textbf{G}}-{\textbf{I}_{d}})+({\hat{\textbf{S}}_{0}}-{\textbf{I}_{d}})+({\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}})+{\mathcal{O}_{p}}(1/{\gamma _{T}^{2}}).\end{aligned}\]
By rearranging the terms, we get the form
(16)
\[ ({\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}})=({\textbf{I}_{d}}-\hat{\textbf{G}})+({\textbf{I}_{d}}-{\hat{\textbf{S}}_{0}})+{\mathcal{O}_{p}}(1/{\gamma _{T}^{2}}).\]
Similarly, the right part of Equation (15) can be expanded as
\[ (\hat{\textbf{G}}-{\textbf{I}_{d}}){\hat{\textbf{S}}_{\tau }}{\hat{\textbf{G}}^{\text{H}}}+({\hat{\textbf{S}}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }}){\hat{\textbf{G}}^{\text{H}}}+{\boldsymbol{\Lambda }_{\tau }}({\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}})+({\boldsymbol{\Lambda }_{\tau }}-{\hat{\boldsymbol{\Lambda }}_{\tau }})=0,\]
where the left-hand side can be further expanded as
\[\begin{aligned}{}& (\hat{\textbf{G}}-{\textbf{I}_{d}})({\hat{\textbf{S}}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})({\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}})+(\hat{\textbf{G}}-{\textbf{I}_{d}})({\hat{\textbf{S}}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})\\ {} & +(\hat{\textbf{G}}-{\textbf{I}_{d}}){\boldsymbol{\Lambda }_{\tau }}({\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}})+(\hat{\textbf{G}}-{\textbf{I}_{d}}){\boldsymbol{\Lambda }_{\tau }}+({\hat{\textbf{S}}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})({\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}})\\ {} & +({\hat{\textbf{S}}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})+{\boldsymbol{\Lambda }_{\tau }}({\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}})+({\boldsymbol{\Lambda }_{\tau }}-{\hat{\boldsymbol{\Lambda }}_{\tau }})\\ {} & =(\hat{\textbf{G}}-{\textbf{I}_{d}}){\boldsymbol{\Lambda }_{\tau }}+({\hat{\textbf{S}}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})+{\boldsymbol{\Lambda }_{\tau }}({\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}})+({\boldsymbol{\Lambda }_{\tau }}-{\hat{\boldsymbol{\Lambda }}_{\tau }})+{\mathcal{O}_{p}}(1/{\gamma _{T}^{2}}).\end{aligned}\]
Hereby we obtain
\[ (\hat{\textbf{G}}-{\textbf{I}_{d}}){\boldsymbol{\Lambda }_{\tau }}+({\hat{\textbf{S}}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})+{\boldsymbol{\Lambda }_{\tau }}({\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}})=({\hat{\boldsymbol{\Lambda }}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})+{\mathcal{O}_{p}}(1/{\gamma _{T}^{2}}),\]
which is, by using the expression for ${\hat{\textbf{G}}^{\text{H}}}-{\textbf{I}_{d}}$ given by Equation (16), equivalent to
(17)
\[\begin{aligned}{}\hat{\textbf{G}}{\boldsymbol{\Lambda }_{\tau }}-{\boldsymbol{\Lambda }_{\tau }}\hat{\textbf{G}}& =({\boldsymbol{\Lambda }_{\tau }}-{\hat{\textbf{S}}_{\tau }})+{\boldsymbol{\Lambda }_{\tau }}({\hat{\textbf{S}}_{0}}-{\textbf{I}_{d}})+({\hat{\boldsymbol{\Lambda }}_{\tau }}-{\boldsymbol{\Lambda }_{\tau }})+{\mathcal{O}_{p}}(1/{\gamma _{T}^{2}}).\end{aligned}\]
Recall that $\hat{\textbf{G}}=\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}$ and that ${\boldsymbol{\Lambda }_{\tau }}$, ${\hat{\boldsymbol{\Lambda }}_{\tau }}$ are both diagonal matrices with real-valued diagonal elements. Then, by rearranging the terms and by considering the element $(j,j)$ of Equation (16), we get that
\[ {\hat{\textbf{J}}_{jj}}{\hat{\boldsymbol{\Gamma }}_{jj}}-1=\frac{1}{2}\left(1-{\left[{\hat{\textbf{S}}_{0}}\right]_{jj}}\right)+{\mathcal{O}_{p}}(1/{\gamma _{T}^{2}}).\]
Similarly, by considering the element $(j,k)$, $j\ne k$, of Equation (17), we get that
\[ ({\lambda _{\tau }^{(k)}}-{\lambda _{\tau }^{(j)}}){\hat{\textbf{J}}_{jj}}{\hat{\boldsymbol{\Gamma }}_{jk}}={\lambda _{\tau }^{(j)}}{\left[{\hat{\textbf{S}}_{0}}\right]_{jk}}-{\left[{\hat{\textbf{S}}_{\tau }}\right]_{jk}}+{\mathcal{O}_{p}}(1/{\gamma _{T}^{2}}).\]
The theorem then follows by multiplying both sides by ${\gamma _{T}}$.  □
Proof of Theorem 4.
In order to incorporate the shift τ given in Definition 2, we first introduce some notation. For ${\boldsymbol{\eta }_{\text{•}}}$ given in Assumption 1, we set ${\textbf{y}_{\text{•}}^{\top }}=\left(\begin{array}{c@{\hskip10.0pt}c}{\boldsymbol{\eta }_{\text{•}}^{\top }}& {\boldsymbol{\eta }_{\text{•}+\tau }^{\top }}\end{array}\right)$. We also introduce functions ${g_{j}}({\textbf{y}_{\text{•}}}):{\mathbb{R}^{2\ell }}\to \mathbb{R},j\in \{1,2,\dots ,4d\}$, through relations ${g_{j}}({\textbf{y}_{\text{•}}})={\tilde{f}_{j}}({\boldsymbol{\eta }_{\text{•}}}),j\in \{1,2,\dots ,2d\}$ and ${g_{j+2d}}({\textbf{y}_{\text{•}}})={\tilde{f}_{j}}({\boldsymbol{\eta }_{\text{•}+\tau }}),j\in \{1,2,\dots ,2d\}$. That is, for $j\ge 2d+1$ the function ${g_{j}}$ corresponds to ${\tilde{f}_{j}}$ evaluated at shift ${\boldsymbol{\eta }_{\text{•}+\tau }}$.
The unsymmetrized autocovariance matrix estimator with lag τ is defined as
\[ {\tilde{\textbf{S}}_{\tau }}[\textbf{X}]=\frac{1}{T-\tau }{\sum \limits_{t=1}^{T-\tau }}\left({\textbf{X}_{t}}-\hat{\boldsymbol{\mu }}\right){\left({\textbf{X}_{t+\tau }}-\hat{\boldsymbol{\mu }}\right)^{\text{H}}},\]
where $\hat{\boldsymbol{\mu }}:=\hat{\boldsymbol{\mu }}[\textbf{X}]$. Let $\tilde{\textbf{X}}=\textbf{X}-{\textbf{1}_{T}}{\boldsymbol{\mu }^{\top }}$ and $\tilde{\boldsymbol{\mu }}=\hat{\boldsymbol{\mu }}-\boldsymbol{\mu }$. Under Assumption 1, we have that the kth component ${\tilde{\boldsymbol{X}}_{t}^{(k)}}$ has the same asymptotic autocovariance function as ${g_{k}}({\textbf{y}_{t}})+i{g_{k+d}}({\textbf{y}_{t}})-\mathbb{E}[{g_{k}}({\textbf{y}_{t}})+i{g_{k+d}}({\textbf{y}_{t}})]$. To improve the fluency of the proof, we denote ${f_{k}}({\textbf{y}_{t}}):={g_{k}}({\textbf{y}_{t}})-\mathbb{E}[{g_{k}}({\textbf{y}_{t}})]$.
We can reformulate the estimator as
\[\begin{aligned}{}{\tilde{\textbf{S}}_{\tau }}[\textbf{X}]=\frac{1}{T-\tau }{\sum \limits_{t=1}^{T-\tau }}& \left[{\tilde{\textbf{X}}_{t}}{\tilde{\textbf{X}}_{t+\tau }^{\text{H}}}-{\tilde{\textbf{X}}_{t}}{\tilde{\boldsymbol{\mu }}^{\text{H}}}-\tilde{\boldsymbol{\mu }}{\tilde{\textbf{X}}_{t+\tau }^{\text{H}}}+\tilde{\boldsymbol{\mu }}{\tilde{\boldsymbol{\mu }}^{\text{H}}}\right],\end{aligned}\]
where the last three terms of the sum are equal to
(18)
\[ \frac{1}{T-\tau }\left[{\sum \limits_{t=T-\tau +1}^{T}}{\tilde{\textbf{X}}_{t}}{\tilde{\boldsymbol{\mu }}^{\text{H}}}+{\sum \limits_{t=1}^{\tau }}\tilde{\boldsymbol{\mu }}{\tilde{\textbf{X}}_{t}^{\text{H}}}-(T+\tau )\tilde{\boldsymbol{\mu }}{\tilde{\boldsymbol{\mu }}^{\text{H}}}\right]={\mathcal{O}_{p}}(1/T).\]
Equation (18) holds, since the first two terms are finite sums. Furthermore, since $\mathbb{E}[\tilde{\boldsymbol{\mu }}]=\textbf{0}$, we get that under Assumption 1 the kth component ${\tilde{\boldsymbol{\mu }}^{(k)}}$ is asymptotically equivalent to $(1/T){\textstyle\sum _{t=1}^{T}}[{f_{k}}({\textbf{y}_{t}})-\mathbb{E}[{f_{k}}({\textbf{y}_{t}})]+i({f_{(k+d)}}({\textbf{y}_{t}})-\mathbb{E}[{f_{k+d}}({\textbf{y}_{t}})])]$. We can then directly apply Theorem 3, which in combination with Prohorov’s theorem and Corollary 1 gives that $\tilde{\boldsymbol{\mu }}={\mathcal{O}_{p}}(1/\sqrt{T})$, and consequently the last term of Equation (18) is ${\mathcal{O}_{p}}(1/T)$.
The symmetrized autocovariance estimator can then be expressed as
\[ {\hat{\textbf{S}}_{\tau }}[\textbf{X}]=\frac{1}{2(T-\tau )}{\sum \limits_{t=1}^{T-\tau }}\left({\tilde{\textbf{X}}_{t}}{\tilde{\textbf{X}}_{t+\tau }^{\text{H}}}+{\tilde{\textbf{X}}_{t+\tau }}{\tilde{\textbf{X}}_{t}^{\text{H}}}\right)+{\mathcal{O}_{p}}(1/T),\]
and the existence of fourth moments and the model assumptions given in Definition 2 give us that
\[ \mathbb{E}\left[{\hat{\textbf{S}}_{\tau }}[\textbf{X}]\right]=\frac{1}{2}\left({\ddot{\boldsymbol{\Lambda }}_{\tau }}+{\ddot{\boldsymbol{\Lambda }}_{\tau }^{\text{H}}}\right)+\mathcal{O}(1/T),\]
where ${\ddot{\boldsymbol{\Lambda }}_{0}}={\textbf{I}_{d}}$. Hereby, we have that
\[\begin{aligned}{}\sqrt{T}\left({\hat{\textbf{S}}_{0}}[\textbf{X}]-{\textbf{I}_{d}}\right)& =\frac{1}{\sqrt{T}}{\sum \limits_{t=1}^{T}}\left({\tilde{\textbf{X}}_{t}}{\tilde{\textbf{X}}_{t}^{\text{H}}}-\mathbb{E}\left[{\tilde{\textbf{X}}_{t}}{\tilde{\textbf{X}}_{t}^{\text{H}}}\right]\right)+{o_{p}}(1).\end{aligned}\]
Note that, asymptotically, it makes no difference whether we scale the autocovariance estimators with $1/T$ or $1/(T-\tau )$. Thus, for the fixed τ that satisfies Definition 2, we have that $\mathbb{E}[{\boldsymbol{\Lambda }_{\tau }}{\hat{\textbf{S}}_{0}}[\textbf{X}]-{\hat{\textbf{S}}_{\tau }}[\textbf{X}]]=\mathcal{O}(1/T)$, which gives that
\[ \sqrt{T}\left({\boldsymbol{\Lambda }_{\tau }}{\hat{\textbf{S}}_{0}}[\textbf{X}]-{\hat{\textbf{S}}_{\tau }}[\textbf{X}]\right)\]
is equal to
\[\begin{aligned}{}\frac{1}{\sqrt{T}}{\sum \limits_{t=1}^{T}}& \frac{1}{2}\left(2{\boldsymbol{\Lambda }_{\tau }}{\tilde{\textbf{X}}_{t}}{\tilde{\textbf{X}}_{t}^{\text{H}}}-{\tilde{\textbf{X}}_{t}}{\tilde{\textbf{X}}_{t+\tau }^{\text{H}}}-{\tilde{\textbf{X}}_{t+\tau }}{\tilde{\textbf{X}}_{t}^{\text{H}}}\right.\\ {} & \left.-\mathbb{E}\left[2{\boldsymbol{\Lambda }_{\tau }}{\tilde{\textbf{X}}_{t}}{\tilde{\textbf{X}}_{t}^{\text{H}}}-{\tilde{\textbf{X}}_{t}}{\tilde{\textbf{X}}_{t+\tau }^{\text{H}}}-{\tilde{\textbf{X}}_{t+\tau }}{\tilde{\textbf{X}}_{t}^{\text{H}}}\right]\right)+{o_{p}}(1),\end{aligned}\]
where, in the above expression, we set ${\tilde{\textbf{X}}_{t+\tau }}=\textbf{0}$ when $t+\tau >T$.
Let $\hat{\textbf{J}}$ be the T-indexed sequence of phase-shift matrices that rotates the diagonal elements of $\hat{\boldsymbol{\Gamma }}$ onto the positive real axis. By Theorem 2 and Assumption 1, we have that the diagonal element $(j,j)$ of $\sqrt{T}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}})$ is asymptotically equivalent to ${\textbf{H}_{jj}}+{o_{p}}(1)$, where ${\textbf{H}_{jj}}$ is defined as
\[ {\textbf{H}_{jj}}=\frac{1}{2\sqrt{T}}{\sum \limits_{t=1}^{T}}\left({h_{j,j}}({\textbf{y}_{t}})-\mathbb{E}\left[{h_{j,j}}({\textbf{y}_{t}})\right]+i\left({\tilde{h}_{j,j}}({\textbf{y}_{t}})-\mathbb{E}\left[{\tilde{h}_{j,j}}({\textbf{y}_{t}})\right]\right)\right),\]
such that for $j\in \{1,\dots d\}$ we have ${h_{j,j}}({\textbf{y}_{t}})={\left({f_{j}}({\textbf{y}_{t}})\right)^{2}}+{\left({f_{j+d}}({\textbf{y}_{t}})\right)^{2}}$ and ${\tilde{h}_{j,j}}({\textbf{y}_{t}})=0$.
The off-diagonal element $(j,k)$, $j\ne k$, of $\sqrt{T}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}})$ is asymptotically equivalent to ${\textbf{H}_{jk}}+{o_{p}}(1)$, where ${\textbf{H}_{jk}}$ is equal to
\[ \frac{1}{\sqrt{T}\left({\lambda _{\tau }^{(k)}}-{\lambda _{\tau }^{(j)}}\right)}{\sum \limits_{t=1}^{T}}\left({h_{j,k}}({\textbf{y}_{t}})-\mathbb{E}\left[{h_{j,k}}({\textbf{y}_{t}})\right]+i\left({\tilde{h}_{j,k}}({\textbf{y}_{t}})-\mathbb{E}\left[{\tilde{h}_{j,k}}({\textbf{y}_{t}})\right]\right)\right),\]
where
\[\begin{aligned}{}{h_{j,k}}({\textbf{y}_{t}})=& {\lambda _{\tau }^{(j)}}\left[{f_{j}}({\textbf{y}_{t}}){f_{k}}({\textbf{y}_{t}})+{f_{j+d}}({\textbf{y}_{t}}){f_{k+d}}({\textbf{y}_{t}})\right]-\frac{1}{2}\left[{f_{j}}({\textbf{y}_{t}}){f_{k+2d}}({\textbf{y}_{t}})\right.\\ {} & \left.+{f_{j+d}}({\textbf{y}_{t}}){f_{k+3d}}({\textbf{y}_{t}})+{f_{j+2d}}({\textbf{y}_{t}}){f_{k}}({\textbf{y}_{t}})+{f_{j+3d}}({\textbf{y}_{t}}){f_{k+d}}({\textbf{y}_{t}})\right]\end{aligned}\]
and
\[\begin{aligned}{}{\tilde{h}_{j,k}}({\textbf{y}_{t}})=& {\lambda _{\tau }^{(j)}}\left[{f_{j+d}}({\textbf{y}_{t}}){f_{k}}({\textbf{y}_{t}})-{f_{j}}({\textbf{y}_{t}}){f_{k+d}}({\textbf{y}_{t}})\right]-\frac{1}{2}\left[{f_{j+d}}({\textbf{y}_{t}}){f_{k+2d}}({\textbf{y}_{t}})\right.\\ {} & \left.+{f_{j+3d}}({\textbf{y}_{t}}){f_{k}}({\textbf{y}_{t}})-{f_{j}}({\textbf{y}_{t}}){f_{k+3d}}({\textbf{y}_{t}})-{f_{j+2d}}({\textbf{y}_{t}}){f_{k+d}}({\textbf{y}_{t}})\right].\end{aligned}\]
Let $\mathfrak{Re}[\textbf{H}]$ be the real part and $\mathfrak{Im}[\textbf{H}]$ be the imaginary part of the matrix $\textbf{H}$, and let $\textbf{v}=\text{vec}\left(\begin{array}{c@{\hskip10.0pt}c}\mathfrak{Re}[\textbf{H}]& \mathfrak{Im}[\textbf{H}]\end{array}\right)$, such that the ${\mathbb{R}^{2{d^{2}}}}$-valued vector $\textbf{v}$ first contains the columns of the real part and then the columns of the imaginary part.
Under Assumption 1, we have that ${f_{j}}({\textbf{y}_{t}})$ has finite fourth moments for every $j\in \{1,\dots 4d\}$ and every $t\in \mathbb{N}$. Hereby, the Cauchy–Schwarz inequality gives that every ${h_{j,k}}({\textbf{y}_{t}})$ and ${\tilde{h}_{j,k}}({\textbf{y}_{t}})$ are square-integrable for every $t\in \mathbb{N}$. Furthermore, covariances and cross-covariances of the Gaussian process ${\textbf{y}_{\text{•}}}$ are summable. Hereby, we can apply Theorem 3 for $\textbf{v}$, which gives that
\[ \textbf{v}{\xrightarrow[T\to \infty ]{\mathcal{D}}}\boldsymbol{\rho }\sim {\mathcal{N}_{2{d^{2}}}}(\textbf{0},{\boldsymbol{\Sigma }_{\boldsymbol{\rho }}}),\hspace{1em}\hspace{2.5pt}\text{where}\hspace{2.5pt}\hspace{1em}{\boldsymbol{\Sigma }_{\boldsymbol{\rho }}}=\left(\begin{array}{c@{\hskip10.0pt}c}{\boldsymbol{\Sigma }_{{\boldsymbol{\rho }_{1}}}}& {\boldsymbol{\Sigma }_{{\boldsymbol{\rho }_{12}}}}\\ {} {\boldsymbol{\Sigma }_{{\boldsymbol{\rho }_{12}}}^{\top }}& {\boldsymbol{\Sigma }_{{\boldsymbol{\rho }_{2}}}}\end{array}\right).\]
Corollary 1 then gives us
\[ \sqrt{T}\cdot \text{vec}\left(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}}\right){\xrightarrow[T\to \infty ]{\mathcal{D}}}\boldsymbol{\nu }\sim {\mathcal{N}_{{d^{2}}}}(\textbf{0},{\boldsymbol{\Sigma }_{\boldsymbol{\nu }}},{\textbf{P}_{\boldsymbol{\nu }}}),\]
where ${\boldsymbol{\Sigma }_{\boldsymbol{\nu }}}={\boldsymbol{\Sigma }_{{\boldsymbol{\rho }_{1}}}}+{\boldsymbol{\Sigma }_{{\boldsymbol{\rho }_{2}}}}+i({\boldsymbol{\Sigma }_{{\boldsymbol{\rho }_{12}}}^{\top }}-{\boldsymbol{\Sigma }_{{\boldsymbol{\rho }_{12}}}})$ and ${\textbf{P}_{\boldsymbol{\nu }}}={\boldsymbol{\Sigma }_{{\boldsymbol{\rho }_{1}}}}-{\boldsymbol{\Sigma }_{{\boldsymbol{\rho }_{2}}}}+i({\boldsymbol{\Sigma }_{{\boldsymbol{\rho }_{12}}}^{\top }}+{\boldsymbol{\Sigma }_{{\boldsymbol{\rho }_{12}}}})$. Denoting ${\boldsymbol{\nu }^{\top }}=\left({\nu _{1,1}},{\nu _{2,1}},\dots ,{\nu _{d,d}}\right)$, we get that ${\boldsymbol{\Sigma }_{\boldsymbol{\nu }}}$ has entries of the form
\[\begin{aligned}{}\mathbb{E}\left[{\nu _{j,k}}{\nu _{l,m}^{\ast }}\right]& ={\text{S}_{0}}\left[{h_{j,k}}({\textbf{y}_{1}}),{h_{l,m}}({\textbf{y}_{1}})\right]+{\text{S}_{0}}\left[{\tilde{h}_{j,k}}({\textbf{y}_{1}}),{\tilde{h}_{l,m}}({\textbf{y}_{1}})\right]\\ {} & +{\sum \limits_{\tau =1}^{\infty }}\left({\text{R}_{\tau }}\left[{h_{j,k}}({\textbf{y}_{1}}),{h_{l,m}}({\textbf{y}_{1}})\right]+{\text{R}_{\tau }}\left[{\tilde{h}_{j,k}}({\textbf{y}_{1}}),{\tilde{h}_{l,m}}({\textbf{y}_{1}})\right]\right)\\ {} & +i\left({\text{S}_{0}}\left[{\tilde{h}_{j,k}}({\textbf{y}_{1}}),{h_{l,m}}({\textbf{y}_{1}})\right]-{\text{S}_{0}}\left[{h_{j,k}}({\textbf{y}_{1}}),{\tilde{h}_{l,m}}({\textbf{y}_{1}})\right]\right)\\ {} & +i{\sum \limits_{\tau =1}^{\infty }}\left({\text{R}_{\tau }}\left[{\tilde{h}_{j,k}}({\textbf{y}_{1}}),{h_{l,m}}({\textbf{y}_{1}})\right]-{\text{R}_{\tau }}\left[{h_{j,k}}({\textbf{y}_{1}}),{\tilde{h}_{l,m}}({\textbf{y}_{1}})\right]\right),\end{aligned}\]
where ${\text{S}_{0}}:={\ddot{\text{S}}_{0}}$, and ${\ddot{\text{S}}_{\tau }}$ and ${\text{R}_{\tau }}$ are defined as
\[\begin{aligned}{}& {\ddot{\text{S}}_{\tau }}[{x_{t}},{y_{s}}]=\mathbb{E}\left[({x_{t}}-{\mu _{{x_{t}}}}){({y_{s+\tau }}-{\mu _{{y_{s+\tau }}}})^{\ast }}\right]\hspace{1em}\hspace{2.5pt}\text{and}\hspace{2.5pt}\\ {} & {\text{R}_{\tau }}[{x_{t}},{y_{s}}]={\ddot{\text{S}}_{\tau }}[{x_{t}},{y_{s}}]+{\ddot{\text{S}}_{\tau }}[{y_{s}},{x_{t}}].\end{aligned}\]
The elements of the relation matrix ${\textbf{P}_{\boldsymbol{\nu }}}$ have the form
\[\begin{aligned}{}\mathbb{E}\left[{\nu _{j,k}}{\nu _{l,m}}\right]& ={\text{S}_{0}}\left[{h_{j,k}}({\textbf{y}_{1}}),{h_{l,m}}({\textbf{y}_{1}})\right]-{\text{S}_{0}}\left[{\tilde{h}_{j,k}}({\textbf{y}_{1}}),{\tilde{h}_{l,m}}({\textbf{y}_{1}})\right]\\ {} & +{\sum \limits_{\tau =1}^{\infty }}\left({\text{R}_{\tau }}\left[{h_{j,k}}({\textbf{y}_{1}}),{h_{l,m}}({\textbf{y}_{1}})\right]-{\text{R}_{\tau }}\left[{\tilde{h}_{j,k}}({\textbf{y}_{1}}),{\tilde{h}_{l,m}}({\textbf{y}_{1}})\right]\right)\\ {} & +i\left({\text{S}_{0}}\left[{\tilde{h}_{j,k}}({\textbf{y}_{1}}),{h_{l,m}}({\textbf{y}_{1}})\right]+{\text{S}_{0}}\left[{h_{j,k}}({\textbf{y}_{1}}),{\tilde{h}_{l,m}}({\textbf{y}_{1}})\right]\right)\\ {} & +i{\sum \limits_{\tau =1}^{\infty }}\left({\text{R}_{\tau }}\left[{\tilde{h}_{j,k}}({\textbf{y}_{1}}),{h_{l,m}}({\textbf{y}_{1}})\right]+{\text{R}_{\tau }}\left[{h_{j,k}}({\textbf{y}_{1}}),{\tilde{h}_{l,m}}({\textbf{y}_{1}})\right]\right).\end{aligned}\]
Note that the diagonal elements of ${\boldsymbol{\Sigma }_{\boldsymbol{\nu }}}$ are real-valued and recall that ${\tilde{h}_{j,j}}=0$ for every $j\in \{1,\dots d\}$.  □
Lemma 8.
Let Assumptions 2 and 3 be satisfied. Let $\hat{\boldsymbol{\mu }}:=\hat{\boldsymbol{\mu }}[\boldsymbol{x}]$ and $\tilde{\boldsymbol{\mu }}=\hat{\boldsymbol{\mu }}-\boldsymbol{\mu }$. Then for every element $(j,k)$, such that $j,k\in \{1,2,\dots ,d\}$ and $j\ne k$, we have ${[{T^{\gamma }}\tilde{\boldsymbol{\mu }}{\tilde{\boldsymbol{\mu }}^{\text{H}}}]_{jk}}{\xrightarrow[T\to \infty ]{{\mathcal{L}_{1}}}}0$, where ${\xrightarrow[]{{\mathcal{L}_{1}}}}$ denotes convergence in the space ${\mathcal{L}_{1}}(\Omega ,\mathbb{P})$. Consequently, the convergence also holds in probability.
Proof of Lemma 8.
Recall that the real part of the $k$th component ${\tilde{\boldsymbol{\mu }}^{(k)}}$ is given by
\[ \frac{1}{T}{\sum \limits_{t=1}^{T}}\left[{\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})-\mathbb{E}\left({\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})\right)\right].\]
As in the proof of Proposition 2, we may assume $k\in I$ and that ${q_{1,k}}(2{H_{k}}-2)>-1$, in which case we have
\[\begin{aligned}{}\mathbb{E}\left[{\left(\mathfrak{Re}({\tilde{\boldsymbol{\mu }}^{(k)}})\right)^{2}}\right]& \le \frac{{c_{1}}}{T}{\sum \limits_{t=1}^{T}}\left|{\text{S}_{0}}\left[{\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}}),{\tilde{f}_{k}}({\boldsymbol{\eta }_{1}^{(k)}})\right]\right|\\ {} & \le \frac{{c_{2}}}{T}{\sum \limits_{t=1}^{T}}{t^{{q_{1,k}}(2{H_{k}}-2)}}\le {c_{3}}{T^{{q_{1,k}}(2{H_{k}}-2)}},\end{aligned}\]
where ${c_{1}},{c_{2}},{c_{3}}\in (0,\infty )$. Similarly, for $j\in I$ and ${c_{4}}\in (0,\infty )$, we obtain
\[ \mathbb{E}\left[{\left(\mathfrak{Re}({\tilde{\boldsymbol{\mu }}^{(j)}})\right)^{2}}\right]\le {c_{4}}{T^{{q_{1,j}}(2{H_{j}}-2)}}.\]
Hence, the Cauchy–Schwarz inequality yields
\[\begin{aligned}{}\mathbb{E}\left[\left|{T^{\gamma }}\mathfrak{Re}({\tilde{\boldsymbol{\mu }}^{(k)}})\mathfrak{Re}({\tilde{\boldsymbol{\mu }}^{(j)}})\right|\right]& \le {T^{\gamma }}\sqrt{\mathbb{E}\left[{\left(\mathfrak{Re}({\tilde{\boldsymbol{\mu }}^{(k)}})\right)^{2}}\right]\mathbb{E}\left[{\left(\mathfrak{Re}({\tilde{\boldsymbol{\mu }}^{(j)}})\right)^{2}}\right]}\\ {} & \le {c_{5}}{T^{\gamma }}{T^{{q_{1,k}}({H_{k}}-1)+{q_{1,j}}({H_{j}}-1)}},\end{aligned}\]
where ${c_{5}}\in (0,\infty )$; by Equation (19), the bound converges to zero. Identical arguments remain valid for the imaginary parts. Finally, the short-range dependent cases and the boundary cases ${q_{1,k}}(2{H_{k}}-2)=-1$ can be treated as in the proof of Proposition 2.  □
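As a quick sanity check of the exponents, consider the illustrative values ${q_{1,k}}={q_{1,j}}=1$ and ${H_{k}}={H_{j}}=3/4$ (these particular values are not imposed by the assumptions). Then the bound above becomes
\[ {c_{5}}{T^{\gamma }}{T^{{q_{1,k}}({H_{k}}-1)+{q_{1,j}}({H_{j}}-1)}}={c_{5}}{T^{\gamma -1/2}},\]
while Equation (19) requires $\gamma <\min \{1/4+1/4,1/2\}=1/2$, so the bound indeed tends to zero.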
Proof of Proposition 2.
With γ given by Equation (9), Condition (7) translates into
(19)
\[ \gamma <\underset{j,k\in I}{\min }\left\{{q_{1,k}}(1-{H_{k}})+{q_{1,j}}(1-{H_{j}}),1/2\right\}.\]
Using independence of the processes ${\boldsymbol{\eta }_{\text{•}}^{(k)}}$ and ${\boldsymbol{\eta }_{\text{•}}^{(j)}}$ together with straightforward computations, we obtain
\[\begin{aligned}{}& \mathbb{E}{\left[{T^{\gamma -1}}{\sum \limits_{t=1}^{T}}\left[\left({\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})-\mathbb{E}\left({\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})\right)\right)\left({\tilde{f}_{j}}({\boldsymbol{\eta }_{t}^{(j)}})-\mathbb{E}\left({\tilde{f}_{j}}({\boldsymbol{\eta }_{t}^{(j)}})\right)\right)\right]\right]^{2}}\\ {} & ={T^{2\gamma -2}}{\sum \limits_{t,s=1}^{T}}{\text{S}_{0}}\left[{\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}}),{\tilde{f}_{k}}({\boldsymbol{\eta }_{s}^{(k)}})\right]{\text{S}_{0}}\left[{\tilde{f}_{j}}({\boldsymbol{\eta }_{t}^{(j)}}),{\tilde{f}_{j}}({\boldsymbol{\eta }_{s}^{(j)}})\right]\\ {} & \le {c_{1}}{T^{2\gamma -1}}{\sum \limits_{t=1}^{T}}\left|{\text{S}_{0}}\left[{\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}}),{\tilde{f}_{k}}({\boldsymbol{\eta }_{1}^{(k)}})\right]{\text{S}_{0}}\left[{\tilde{f}_{j}}({\boldsymbol{\eta }_{t}^{(j)}}),{\tilde{f}_{j}}({\boldsymbol{\eta }_{1}^{(j)}})\right]\right|,\end{aligned}\]
where ${c_{1}}\in (0,\infty )$, and the inequality follows from a change of variables and the Fubini–Tonelli theorem. Note that if
\[ {\sum \limits_{t=1}^{\infty }}\left|{\text{S}_{0}}\left({\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}}),{\tilde{f}_{k}}({\boldsymbol{\eta }_{1}^{(k)}})\right){\text{S}_{0}}\left({\tilde{f}_{j}}({\boldsymbol{\eta }_{t}^{(j)}}),{\tilde{f}_{j}}({\boldsymbol{\eta }_{1}^{(j)}})\right)\right|<+\infty ,\]
the claim follows, since by Equation (19) we have that $\gamma <1/2$. In particular, this is the case if ${f_{k}}({\boldsymbol{\eta }^{(k)}})$ or ${f_{j}}({\boldsymbol{\eta }^{(j)}})$ is short-range dependent. Hence we may assume that $j,k\in I$ and that ${q_{1,k}}(2{H_{k}}-2)+{q_{1,j}}(2{H_{j}}-2)\ge -1$. Suppose first that ${q_{1,k}}(2{H_{k}}-2)+{q_{1,j}}(2{H_{j}}-2)>-1$. In this case we get
\[\begin{aligned}{}& {T^{2\gamma -1}}{\sum \limits_{t=1}^{T}}\left|{\text{S}_{0}}\left({\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}}),{\tilde{f}_{k}}({\boldsymbol{\eta }_{1}^{(k)}})\right){\text{S}_{0}}\left({\tilde{f}_{j}}({\boldsymbol{\eta }_{t}^{(j)}}),{\tilde{f}_{j}}({\boldsymbol{\eta }_{1}^{(j)}})\right)\right|\\ {} & \le {c_{2}}{T^{2\gamma -1}}{\sum \limits_{t=1}^{T}}{t^{{q_{1,k}}(2{H_{k}}-2)+{q_{1,j}}(2{H_{j}}-2)}}\\ {} & \le {c_{2}}{T^{2\gamma +{q_{1,k}}(2{H_{k}}-2)+{q_{1,j}}(2{H_{j}}-2)}},\end{aligned}\]
where ${c_{2}}\in (0,\infty )$; by Equation (19), the bound converges to zero. Similarly, if ${q_{1,k}}(2{H_{k}}-2)+{q_{1,j}}(2{H_{j}}-2)=-1$, we obtain
\[\begin{aligned}{}& {T^{2\gamma -1}}{\sum \limits_{t=1}^{T}}\left|{\text{S}_{0}}\left({\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}}),{\tilde{f}_{k}}({\boldsymbol{\eta }_{1}^{(k)}})\right){\text{S}_{0}}\left({\tilde{f}_{j}}({\boldsymbol{\eta }_{t}^{(j)}}),{\tilde{f}_{j}}({\boldsymbol{\eta }_{1}^{(j)}})\right)\right|\\ {} & \le {c_{3}}{T^{2\gamma -1}}{\sum \limits_{t=1}^{T}}{t^{-1}}\le {c_{4}}{T^{2\gamma -1}}\log T,\end{aligned}\]
where ${c_{3}},{c_{4}}\in (0,\infty )$; the bound converges to zero since $\gamma <\frac{1}{2}$. This verifies Equation (10). Treating Equation (11) similarly concludes the proof.  □
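For the reader's convenience, the elementary estimates used repeatedly above are the following: for every $a>-1$ there exists $c\in (0,\infty )$ such that
\[ {\sum \limits_{t=1}^{T}}{t^{a}}\le c\hspace{0.1667em}{T^{a+1}}\hspace{1em}\text{for all}\hspace{2.5pt}T\ge 1,\hspace{2em}\text{while}\hspace{2em}{\sum \limits_{t=1}^{T}}{t^{-1}}\le 1+\log T;\]
the remaining constants in the displays above simply absorb these bounds.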
Proof of Theorem 5.
Similarly as in the proof of Theorem 4, we set
\[ {\textbf{y}_{\text{•}}^{\top }}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c}{\boldsymbol{\eta }_{\text{•}}^{(1)}}& {\boldsymbol{\eta }_{\text{•}}^{(2)}}& \dots & {\boldsymbol{\eta }_{\text{•}}^{(2d)}}& {\boldsymbol{\eta }_{\text{•}+\tau }^{(1)}}& {\boldsymbol{\eta }_{\text{•}+\tau }^{(2)}}& \dots & {\boldsymbol{\eta }_{\text{•}+\tau }^{(2d)}}\end{array}\right).\]
We also introduce functions ${g_{j}}:{\mathbb{R}^{4d}}\to \mathbb{R}$, $j\in \{1,2,\dots ,4d\}$, through the relations ${g_{j}}({\textbf{y}_{\text{•}}})={\tilde{f}_{j}}({\boldsymbol{\eta }_{\text{•}}^{(j)}})$, $j\in \{1,2,\dots ,2d\}$, and ${g_{j+2d}}({\textbf{y}_{\text{•}}})={\tilde{f}_{j}}({\boldsymbol{\eta }_{\text{•}+\tau }^{(j)}})$, $j\in \{1,2,\dots ,2d\}$. Moreover, we denote ${f_{k}}({\textbf{y}_{t}}):={g_{k}}({\textbf{y}_{t}})-\mathbb{E}[{g_{k}}({\textbf{y}_{t}})]$, so that $\mathbb{E}\left[{f_{k}}({\textbf{y}_{t}})\right]=0$ for all $k\in \{1,2,\dots ,4d\}$.
Following the proof of Theorem 4, we can express the symmetrized autocovariance matrix estimator as
\[ {\hat{\textbf{S}}_{\tau }}[\textbf{X}]=\frac{1}{2(T-\tau )}{\sum \limits_{t=1}^{T-\tau }}\left({\tilde{\textbf{X}}_{t}}{\tilde{\textbf{X}}_{t+\tau }^{\text{H}}}+{\tilde{\textbf{X}}_{t+\tau }}{\tilde{\textbf{X}}_{t}^{\text{H}}}\right)+\tilde{\boldsymbol{\mu }}{\tilde{\boldsymbol{\mu }}^{\text{H}}}+{\mathcal{O}_{p}}(1/T).\]
We begin by showing that the off-diagonal terms vanish. Following the proof of Theorem 4, we obtain
\[ {T^{\gamma }}\left({\boldsymbol{\Lambda }_{\tau }}{\hat{\textbf{S}}_{0}}[\textbf{X}]-{\hat{\textbf{S}}_{\tau }}[\textbf{X}]\right),\]
which is equal to
\[\begin{aligned}{}{T^{\gamma -1}}{\sum \limits_{t=1}^{T}}& \frac{1}{2}\left(2{\boldsymbol{\Lambda }_{\tau }}{\tilde{\textbf{X}}_{t}}{\tilde{\textbf{X}}_{t}^{\text{H}}}-{\tilde{\textbf{X}}_{t}}{\tilde{\textbf{X}}_{t+\tau }^{\text{H}}}-{\tilde{\textbf{X}}_{t+\tau }}{\tilde{\textbf{X}}_{t}^{\text{H}}}\right)+{T^{\gamma }}\left({\boldsymbol{\Lambda }_{\tau }}-{\textbf{I}_{d}}\right)\tilde{\boldsymbol{\mu }}{\tilde{\boldsymbol{\mu }}^{\text{H}}}+{o_{p}}(1),\end{aligned}\]
with the convention ${\tilde{\textbf{X}}_{t+\tau }}=\textbf{0}$ when $t+\tau >T$. By Lemma 8, the off-diagonal elements of ${T^{\gamma }}\tilde{\boldsymbol{\mu }}{\tilde{\boldsymbol{\mu }}^{\text{H}}}$ vanish. Thus, the off-diagonal element $(j,k)$, $j\ne k$, of ${T^{\gamma }}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}})$ is asymptotically equivalent to ${\textbf{H}_{jk}}+{o_{p}}(1)$, where independence and the proof of Theorem 4 give that
\[ {\textbf{H}_{jk}}=\frac{{T^{\gamma -1}}}{{\lambda _{\tau }^{(k)}}-{\lambda _{\tau }^{(j)}}}{\sum \limits_{t=1}^{T}}\left({h_{j,k}}({\textbf{y}_{t}})+i{\tilde{h}_{j,k}}({\textbf{y}_{t}})\right),\]
with
\[\begin{aligned}{}{h_{j,k}}({\textbf{y}_{t}})=& {\lambda _{\tau }^{(j)}}\left[{f_{j}}({\textbf{y}_{t}}){f_{k}}({\textbf{y}_{t}})+{f_{j+d}}({\textbf{y}_{t}}){f_{k+d}}({\textbf{y}_{t}})\right]-\frac{1}{2}\left[{f_{j}}({\textbf{y}_{t}}){f_{k+2d}}({\textbf{y}_{t}})\right.\\ {} & \left.+{f_{j+d}}({\textbf{y}_{t}}){f_{k+3d}}({\textbf{y}_{t}})+{f_{j+2d}}({\textbf{y}_{t}}){f_{k}}({\textbf{y}_{t}})+{f_{j+3d}}({\textbf{y}_{t}}){f_{k+d}}({\textbf{y}_{t}})\right]\end{aligned}\]
and
\[\begin{aligned}{}{\tilde{h}_{j,k}}({\textbf{y}_{t}})=& {\lambda _{\tau }^{(j)}}\left[{f_{j+d}}({\textbf{y}_{t}}){f_{k}}({\textbf{y}_{t}})-{f_{j}}({\textbf{y}_{t}}){f_{k+d}}({\textbf{y}_{t}})\right]-\frac{1}{2}\left[{f_{j+d}}({\textbf{y}_{t}}){f_{k+2d}}({\textbf{y}_{t}})\right.\\ {} & \left.+{f_{j+3d}}({\textbf{y}_{t}}){f_{k}}({\textbf{y}_{t}})-{f_{j}}({\textbf{y}_{t}}){f_{k+3d}}({\textbf{y}_{t}})-{f_{j+2d}}({\textbf{y}_{t}}){f_{k+d}}({\textbf{y}_{t}})\right].\end{aligned}\]
Thus, by Proposition 2, ${\textbf{H}_{jk}}$ converges to zero in probability. Hence it remains to prove the convergence of the diagonal elements. We can write
\[\begin{aligned}{}{T^{\gamma }}\left({\hat{\textbf{S}}_{0}}[\textbf{X}]-{\textbf{I}_{d}}\right)& ={T^{\gamma -1}}{\sum \limits_{t=1}^{T}}\left({\tilde{\textbf{X}}_{t}}{\tilde{\textbf{X}}_{t}^{\text{H}}}-\mathbb{E}\left[{\tilde{\textbf{X}}_{t}}{\tilde{\textbf{X}}_{t}^{\text{H}}}\right]\right)+{T^{\gamma }}\tilde{\boldsymbol{\mu }}{\tilde{\boldsymbol{\mu }}^{\text{H}}}+{o_{p}}(1).\end{aligned}\]
Thus, the diagonal element $(j,j)$ of ${T^{\gamma }}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}})$ is asymptotically equivalent to $\frac{1}{2}{\textbf{H}_{jj}}+\frac{1}{2}{T^{\gamma }}{\left(\tilde{\boldsymbol{\mu }}{\tilde{\boldsymbol{\mu }}^{\text{H}}}\right)_{jj}}+{o_{p}}(1)$, where
\[ {\textbf{H}_{jj}}={T^{\gamma -1}}{\sum \limits_{t=1}^{T}}\left({h_{j,j}}({\textbf{y}_{t}})-\mathbb{E}\left[{h_{j,j}}({\textbf{y}_{t}})\right]\right)\]
with ${h_{j,j}}({\textbf{y}_{t}})={\left({f_{j}}({\textbf{y}_{t}})\right)^{2}}+{\left({f_{j+d}}({\textbf{y}_{t}})\right)^{2}}$, and
\[ {\left[\tilde{\boldsymbol{\mu }}{\tilde{\boldsymbol{\mu }}^{\text{H}}}\right]_{jj}}={\left(\mathfrak{Re}[{\tilde{\boldsymbol{\mu }}^{(j)}}]\right)^{2}}+{\left(\mathfrak{Im}[{\tilde{\boldsymbol{\mu }}^{(j)}}]\right)^{2}}.\]
Recall that $\mathbb{E}[{f_{j}}({\textbf{y}_{t}})]=0$,
\[ \mathfrak{Re}\left({\tilde{\boldsymbol{\mu }}^{(j)}}\right)=\frac{1}{T}{\sum \limits_{t=1}^{T}}{f_{j}}({\textbf{y}_{t}})\hspace{2em}\hspace{2.5pt}\text{and}\hspace{2.5pt}\hspace{2em}\mathfrak{Im}\left({\tilde{\boldsymbol{\mu }}^{(j)}}\right)=\frac{1}{T}{\sum \limits_{t=1}^{T}}{f_{j+d}}({\textbf{y}_{t}}).\]
Hence, due to independence, it suffices to prove that, for every $j\in \{1,\dots ,d\}$,
\[\begin{aligned}{}& {T^{\gamma -1}}\hspace{-0.1667em}{\sum \limits_{t=1}^{T}}\left({h_{j,j}}({\textbf{y}_{t}})-\mathbb{E}\left[{h_{j,j}}({\textbf{y}_{t}})\right]\right)+{T^{\gamma -2}}\left({\left[{\sum \limits_{t=1}^{T}}{f_{j}}({\textbf{y}_{t}})\right]^{2}}+{\left[{\sum \limits_{t=1}^{T}}{f_{j+d}}({\textbf{y}_{t}})\right]^{2}}\right)\\ {} & {\xrightarrow[T\to \infty ]{\mathcal{D}}}{C_{2,j}}{Z_{{q_{2,j}}}}+{C_{1,j}^{2}}{Z_{{q_{1,j}}}^{2}}+{C_{2,j+d}}{Z_{{q_{2,j+d}}}}+{C_{1,j+d}^{2}}{Z_{{q_{1,j+d}}}^{2}}.\end{aligned}\]
The convergence follows from independence and the continuous mapping theorem if, for every $j\in \{1,\dots ,2d\}$, the following two-dimensional vector converges:
(20)
\[ \left(\begin{array}{c}{T^{\gamma -1}}{\textstyle\textstyle\sum _{t=1}^{T}}\left({({f_{j}}({\textbf{y}_{t}}))^{2}}-\mathbb{E}\left[{({f_{j}}({\textbf{y}_{t}}))^{2}}\right]\right)\\ {} {T^{\frac{\gamma }{2}-1}}{\textstyle\textstyle\sum _{t=1}^{T}}{f_{j}}({\textbf{y}_{t}})\end{array}\right){\xrightarrow[T\to \infty ]{\mathcal{D}}}\left(\begin{array}{c}{C_{2,j}}{Z_{{q_{2,j}}}}\\ {} {C_{1,j}}{Z_{{q_{1,j}}}}\end{array}\right).\]
Using Equation (6) for the asymptotic variance, we observe first that
\[ {T^{\gamma -1}}{\sum \limits_{t=1}^{T}}\left({({f_{j}}({\textbf{y}_{t}}))^{2}}-\mathbb{E}\left[{({f_{j}}({\textbf{y}_{t}}))^{2}}\right]\right)\]
converges to a nontrivial limit only if ${\max _{i\in I}}{q_{2,i}}(2{H_{i}}-2)={q_{2,j}}(2{H_{j}}-2)$. Similarly,
\[ {T^{\frac{\gamma }{2}-1}}{\sum \limits_{t=1}^{T}}{f_{j}}({\textbf{y}_{t}})\]
converges to a nontrivial limit only if ${\max _{i\in I}}{q_{2,i}}(2{H_{i}}-2)={q_{1,j}}(4{H_{j}}-4)$. Finally, if both of these conditions are satisfied, the convergence in Equation (20) follows from [3, Theorem 4].  □
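Although this section is purely theoretical, the estimators appearing in the proofs are straightforward to compute. Below is a minimal illustrative sketch in R (cf. [32]); the function names sym_autocov and amuse are ours, and the code is a sketch of the symmetrized autocovariance estimator ${\hat{\textbf{S}}_{\tau }}$ and of an AMUSE-type unmixing estimate based on the simultaneous diagonalization of Theorem 1, not the implementation used for Section 5.

```r
# A minimal sketch, not the implementation used in the paper; the function names
# sym_autocov and amuse are ours. X is a complex-valued T x d matrix whose rows
# are the observations x_1, ..., x_T.
sym_autocov <- function(X, tau = 0) {
  T  <- nrow(X)
  Xc <- X - matrix(colMeans(X), T, ncol(X), byrow = TRUE)   # center each column
  if (tau == 0) {
    M <- t(Xc) %*% Conj(Xc) / T                             # sample covariance S_0
  } else {
    A <- Xc[1:(T - tau), , drop = FALSE]
    B <- Xc[(1 + tau):T, , drop = FALSE]
    M <- t(A) %*% Conj(B) / (T - tau)                       # average of x_t x_{t+tau}^H
  }
  (M + t(Conj(M))) / 2                                      # Hermitian (symmetrized) part
}

amuse <- function(X, tau = 1) {
  Xc    <- X - matrix(colMeans(X), nrow(X), ncol(X), byrow = TRUE)
  S0    <- sym_autocov(X, 0)
  Stau  <- sym_autocov(X, tau)
  E0    <- eigen(S0, symmetric = TRUE)
  W     <- E0$vectors %*% diag(1 / sqrt(E0$values), ncol(X)) %*% t(Conj(E0$vectors))  # S_0^{-1/2}
  Et    <- eigen(W %*% Stau %*% t(Conj(W)), symmetric = TRUE)
  Gamma <- t(Conj(Et$vectors)) %*% W    # Gamma S_0 Gamma^H = I_d, Gamma S_tau Gamma^H diagonal
  list(Gamma = Gamma, lambda = Et$values, Z = Xc %*% t(Gamma))  # rows of Z: unmixed series
}
```

As in the proofs above, such an estimate is identifiable only up to the phases of the rows of Gamma (the phase-shift matrices $\hat{\textbf{J}}$) and requires the eigenvalues to be distinct; eigen() merely fixes one arbitrary phase convention, which is why the asymptotic statements concern ${T^{\gamma }}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}})$ rather than $\hat{\boldsymbol{\Gamma }}$ itself.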

Acknowledgments

The authors would like to thank Katariina Kilpinen for providing the photographs for Section 5, and the two anonymous referees for their insightful comments that helped to improve this paper greatly.

References

[1] 
Adali, T., Schreier, P.J.: Optimization and estimation of complex-valued signals: Theory and applications in filtering and blind source separation. IEEE Signal Process. Mag. 31(5), 112–128 (2014). MR1824645. https://doi.org/10.1109/78.847796
[2] 
Arcones, M.A.: Limit theorems for nonlinear functionals of a stationary Gaussian sequence of vectors. Ann. Probab. 22(4), 2242–2274 (1994). MR1331224
[3] 
Bai, S., Taqqu, M.: Multivariate limit theorems in the context of long-range dependence. J. Time Ser. Anal. 34(6), 717–743 (2013). MR3127215. https://doi.org/10.1111/jtsa.12046
[4] 
Bai, S., Taqqu, M.: Generalized Hermite processes, discrete chaos and limit theorems. Stoch. Process. Appl. 124(4), 1710–1739 (2014). MR3163219. https://doi.org/10.1016/j.spa.2013.12.011
[5] 
Beran, J.: Statistics for Long-Memory Processes. Chapman & Hall, New York (1994). MR1304490
[6] 
Beran, J., Feng, Y., Ghosh, S., Kulik, R.: Long-Memory Processes: Probabilistic Properties and Statistical Methods. Springer (2013). MR3075595. https://doi.org/10.1007/978-3-642-35512-7
[7] 
Billingsley, P.: Probability and Measure, Anniversary Edition. Wiley, Hoboken (2012). MR2893652
[8] 
Breuer, P., Major, P.: Central limit theorems for non-linear functionals of Gaussian fields. J. Multivar. Anal. 13(3), 425–441 (1983). MR0716933. https://doi.org/10.1016/0047-259X(83)90019-2
[9] 
Cichocki, A., Amari, S.-i.: Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications, vol. 1. John Wiley & Sons (2002)
[10] 
Comon, P., Jutten, C.: Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press (2010)
[11] 
Dobrushin, R.L., Major, P.: Non-central limit theorems for nonlinear functionals of Gaussian fields. Z. Wahrscheinlichkeitstheor. Verw. Geb. 50, 27–52 (1979). MR0550122. https://doi.org/10.1007/BF00535673
[12] 
Embrechts, P., Maejima, M.: Selfsimilar Processes. Princeton Series in Applied Mathematics, p. 111. Princeton University Press, Princeton, New Jersey (2002). MR1920153
[13] 
Eriksson, J., Ollila, E., Koivunen, V.: Essential statistics and tools for complex random variables. IEEE Trans. Signal Process. 58(10), 5400–5408 (2010). MR2722676. https://doi.org/10.1109/TSP.2010.2054085
[14] 
Fan, J., Wang, M., Yao, Q.: Modelling multivariate volatilities via conditionally uncorrelated components. J. R. Stat. Soc., Ser. B, Stat. Methodol. 70(4), 679–702 (2008). MR2523899. https://doi.org/10.1111/j.1467-9868.2008.00654.x
[15] 
Goodman, N.: Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction). Ann. Math. Stat. 34(1), 152–177 (1963). MR0145618. https://doi.org/10.1214/aoms/1177704250
[16] 
Ilmonen, P.: On asymptotic properties of the scatter matrix based estimates for complex valued independent component analysis. Stat. Probab. Lett. 83(4), 1219–1226 (2013). MR3041396. https://doi.org/10.1016/j.spl.2013.01.020
[17] 
Ilmonen, P., Oja, H., Serfling, R.: On invariant coordinate system (ICS) functionals. Int. Stat. Rev. 80(1), 93–110 (2012). MR2990347. https://doi.org/10.1111/j.1751-5823.2011.00163.x
[18] 
Leonenko, N.: Limit Theorems for Random Fields with Singular Spectrum, vol. 465. Springer (1999). MR1687092. https://doi.org/10.1007/978-94-011-4607-4
[19] 
Li, H., Correa, N.M., Rodriguez, P.A., Calhoun, V.D., Adali, T.: Application of independent component analysis with adaptive density model to complex-valued fMRI data. IEEE Trans. Biomed. Eng. 58(10), 2794–2803 (2011)
[20] 
Lietzén, N., Virta, J., Nordhausen, K., Ilmonen, P.: Minimum distance index for BSS, generalization, interpretation and asymptotics. Aust. J. Stat. 49(4), 57–68 (2020)
[21] 
Makogin, V., Spodarev, E.: Limit theorems for excursion sets of subordinated Gaussian random fields with long-range dependence. Stochastics, 1–32 (2021)
[22] 
Matteson, D.S., Tsay, R.S.: Dynamic orthogonal components for multivariate time series. J. Am. Stat. Assoc. 106(496), 1450–1463 (2011). MR2896848. https://doi.org/10.1198/jasa.2011.tm10616
[23] 
Miettinen, J., Nordhausen, K., Oja, H., Taskinen, S.: Statistical properties of a blind source separation estimator for stationary time series. Stat. Probab. Lett. 82(11), 1865–1873 (2012). MR2970285. https://doi.org/10.1016/j.spl.2012.06.025
[24] 
Miettinen, J., Illner, K., Nordhausen, K., Oja, H., Taskinen, S., Theis, F.J.: Separation of uncorrelated stationary time series using autocovariance matrices. J. Time Ser. Anal. 37(3), 337–354 (2016). MR3512961. https://doi.org/10.1111/jtsa.12159
[25] 
Miettinen, J., Matilainen, M., Nordhausen, K., Taskinen, S.: Extracting conditionally heteroskedastic components using independent component analysis. J. Time Ser. Anal. 41(2), 293–311 (2020). MR4086188. https://doi.org/10.1111/jtsa.12505
[26] 
Nordhausen, K., Oja, H., Tyler, D.E., et al.: Tools for exploring multivariate data: The package ICS. J. Stat. Softw. 28(6), 1–31 (2008)
[27] 
Nourdin, I., Peccati, G.: Stein’s method on Wiener chaos. Probab. Theory Relat. Fields 145(1), 75–118 (2009). MR2520122. https://doi.org/10.1007/s00440-008-0162-x
[28] 
Nourdin, I., Peccati, G.: Stein’s method and exact Berry-Esseen asymptotics for functionals of Gaussian fields. Ann. Probab. 37(6), 2231–2261 (2010). MR2573557. https://doi.org/10.1214/09-AOP461
[29] 
Nourdin, I., Peccati, G.: Normal Approximations Using Malliavin Calculus: From Stein’s Method to Universality. Cambridge University Press (2012). MR2962301. https://doi.org/10.1017/CBO9781139084659
[30] 
Nourdin, I., Peccati, G., Podolskij, M.: Quantitative Breuer–Major theorems. Stoch. Process. Appl. 121(4), 793–812 (2011). MR2770907. https://doi.org/10.1016/j.spa.2010.12.006
[31] 
Picinbono, B.: Second-order complex random vectors and normal distributions. IEEE Trans. Signal Process. 44(10), 2637–2640 (1996)
[32] 
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2018). https://www.R-project.org/
[33] 
Samorodnitsky, G.: Long range dependence. Found. Trends Stoch. Syst. 1(3), 163–257 (2007). MR2379935. https://doi.org/10.1561/0900000004
[34] 
Taqqu, M.: Weak convergence to fractional Brownian motion and to the Rosenblatt process. Probab. Theory Relat. Fields 31(4), 287–302 (1975). MR0400329. https://doi.org/10.1007/BF00532868
[35] 
Tong, L., Soon, V., Huang, Y.-F., Liu, R.: AMUSE: a new blind identification algorithm. In: IEEE International Symposium on Circuits and Systems, pp. 1784–1787. IEEE (1990).
[36] 
Tudor, C.A.: Analysis of Variations for Self-similar Processes: A Stochastic Calculus Approach. Springer (2013). MR3112799. https://doi.org/10.1007/978-3-319-00936-0
[37] 
Urbanek, S.: jpeg: Read and Write JPEG Images (2014). R package version 0.1-8. https://CRAN.R-project.org/package=jpeg
[38] 
Van der Vaart, A.W.: Asymptotic Statistics, vol. 3. Cambridge University Press (2000). MR1652247. https://doi.org/10.1017/CBO9780511802256
[39] 
Viitasaari, L., Ilmonen, P.: On modeling a class of weakly stationary processes. Front. Appl. Math. Stat. 15 (2020).
[40] 
Wendler, M., Wu, W.B.: Central limit theorems for nearly long range dependent subordinated linear processes. J. Appl. Probab. 57(2), 637–656 (2020). MR4125469. https://doi.org/10.1017/jpr.2020.10
[41] 
Zarzoso, V., Comon, P.: Robust independent component analysis by iterative maximization of the kurtosis contrast with algebraic optimal step size. IEEE Trans. Neural Netw. 21(2), 248–261 (2009)

Copyright
© 2021 The Author(s). Published by VTeX
Open access article under the CC BY license.

Keywords
Asymptotic theory, blind source separation, long-range dependency, multivariate analysis, noncentral limit theorems

MSC2010
62H12 60F05 60G15 60G10 94A12 94A08

Funding
N. Lietzén gratefully acknowledges financial support from the Emil Aaltonen Foundation (Grant 180144 N) and from the Academy of Finland (Grant 321968).
