1 Introduction
In modern statistics, there is an increasing demand for theoretically solid methodology that can be applied beyond standard real-valued data, in the realm of more exotic data structures. In this paper, we consider a complex-valued linear mixture model based on temporally uncorrelated components. We study the problem for discrete-time weakly stationary processes. We aim to find latent processes of interest when only linear mixtures of them are observable. The recovery of the latent processes is referred to as the unmixing procedure. The properties of the recovered processes can themselves be of interest, or alternatively, the approach can be used to reduce a multivariate model into several univariate models, which can ease the modeling burden.
In the context of linear mixture models, the assumption of uncorrelated components, or stronger conditions that imply uncorrelated components, is considered natural in several applications, for example, in finance and signal processing, see the article [14] and the book [10]. Applications where the observed time series are naturally complex-valued are frequent in, e.g., signal processing and biomedical applications, see [19, 41]. In such applications, the interest can lie in the shapes of the unobservable signals, such as the shapes of three-dimensional image-valued signals, which correspond to different parts of the brain under different activities. In the signal processing literature, the problem is often referred to as the blind source separation (BSS) problem. In the BSS literature, it has been argued that discarding the special complex-valued structure can lead to loss of information, see [1]. Thus, it is often beneficial to work natively with complex-valued signals in such applications. We wish to emphasize that our ${\mathbb{C}^{d}}$-valued model does not directly correspond to existing ${\mathbb{R}^{2d}}$-valued models, e.g., the one in [23]. In ${\mathbb{R}^{2d}}$-valued BSS, it is assumed that all $2d$ components are uncorrelated. In our model, the real part and the corresponding imaginary part of a component are allowed to be correlated. For a collection of BSS applications, see [10].
In parallel to the signal processing community, linear mixture models, with similar model assumptions as in the BSS literature, have recently received notable attention in finance, see for example [14, 22]. In financial applications, the term blind source separation is rarely used, and usually more descriptive model naming conventions are utilized. Note that our complex-valued approach is highly applicable in many real-valued financial applications. In our approach, we assume little concerning the relationship between the real and imaginary parts of a single component. Thus, from the real-valued perspective, the problem can be equivalently considered as modeling real-valued temporally uncorrelated pairs, where the elements contained in a single pair are not necessarily uncorrelated. However, transforming this complex-valued problem into a real-valued counterpart is often infeasible, as it usually introduces additional constraints that would otherwise be embedded within the complex-valued formulation. Once the temporally uncorrelated components are recovered, one can then, for example, model volatilities bivariately (or univariately) or assess risk by modeling tail behavior with the tools of bivariate (or univariate) extreme value theory. Moreover, our approach is natural in applications where a single observation is vector-valued, that is, the observations have both a magnitude and a direction, e.g., modeling the latent components of wind at an airport.
The main focus of this paper is on the asymptotic behavior of a classical unmixing estimator. We consider an algorithm that is identical to the so-called Algorithm for Multiple Unknown Signals Extraction (AMUSE), [35]. On the financial side, asymptotic properties of an alternative approach, with slightly differing model assumptions compared to ours, are given in, e.g., [14]. In the context of real-valued BSS, asymptotics have been considered in, e.g., [23–25]. Compared to the above mentioned approaches, we consider a substantially wider class of processes. In particular, we analyze the asymptotic behavior of the corresponding estimators under both long- and short-range dependent complex-valued processes. Furthermore, we take a semiparametric approach, that is, our theory is not limited to a specific parametric family of distributions. As a pinnacle of the novel theory presented in this paper, we consider square-integrable processes without summable autocovariance structures, which causes the limiting distribution of the corresponding unmixing estimator to be non-Gaussian. Instead of using the classical central limit theorem, we take a modern approach and utilize general central limit theorem type results that are also applicable to processes with strong temporal dependencies. Moreover, we consider convergence rates that differ from the usual $\sqrt{T}$. We wish to emphasize that modeling long-range dependent processes, that is, processes without summable autocovariance structures, is of paramount importance in many financial applications.
The paper is structured as follows. In Section 2, we recall some basic theory regarding complex-valued random variables. In Section 3, we present the temporally uncorrelated components mixing model. In Section 4, we formulate the estimation procedure and study the asymptotic properties of the estimators. In Section 5, we consider a data example involving photographs, which can be presented as complex-valued time series. The proofs for the theoretical results are presented in the supplementary Appendix.
2 Random variables in the complex plane
Throughout, let $(\Omega ,\mathcal{F},\mathbb{P})$ be a common probability space and let ${\textbf{z}_{\text{•}}}:={({\textbf{z}_{t}})_{t\in \mathbb{N}}}$ be a collection of random variables ${\textbf{z}_{t}}:\Omega \to {\mathbb{C}^{d}}$, $t\in \mathbb{N}=\{1,2,\dots \}$, where the dimension parameter d is a finite constant. Furthermore, let ${z_{t}^{(k)}}={\text{pr}_{k}}\circ {\textbf{z}_{t}}$, where ${\text{pr}_{k}}$ is a projection to the kth complex coordinate. We refer to the process ${z_{\text{•}}^{(k)}}:={({z_{t}^{(k)}})_{t\in \mathbb{N}}}$ as the kth component of the ${\mathbb{C}^{d}}$-valued stochastic process ${\textbf{z}_{\text{•}}}$. Note that the components can be expressed in the form ${z_{t}^{(k)}}={a_{t}^{(k)}}+i{b_{t}^{(k)}}$, where ${a_{t}^{(k)}}$, ${b_{t}^{(k)}}$ are real-valued $\forall k\in \left\{1,2,\dots ,d\right\}$ and i is the imaginary unit. We denote the complex conjugate of ${\textbf{z}_{t}}$ as ${\textbf{z}_{t}^{\ast }}$ and we denote the conjugate transpose of ${\textbf{z}_{t}}$ as ${\textbf{z}_{t}^{\text{H}}}:={({\textbf{z}_{t}^{\ast }})^{\top }}$. Furthermore, we use $\mathcal{B}({\mathbb{C}^{d}})$ and $\mathcal{B}({\mathbb{R}^{2d}})$ to denote the Borel-sigma algebras on ${\mathbb{C}^{d}}$ and ${\mathbb{R}^{2d}}$, respectively.
We use the following notation for the multivariate mean and the (unsymmetrized) autocovariance matrix with lag $\tau \in \left\{0\right\}\cup \mathbb{N}$:
\[\begin{aligned}{}& {\boldsymbol{\mu }_{{\textbf{z}_{t}}}}=\mathbb{E}\left[{\textbf{z}_{t}}\right]\in {\mathbb{C}^{d}},\hspace{1em}{\ddot{\textbf{S}}_{\tau }}\left[{\textbf{z}_{t}}\right]=\mathbb{E}\left[\left({\textbf{z}_{t}}-{\boldsymbol{\mu }_{{\textbf{z}_{t}}}}\right){\left({\textbf{z}_{t+\tau }}-{\boldsymbol{\mu }_{{\textbf{z}_{t+\tau }}}}\right)^{\text{H}}}\right]\in {\mathbb{C}^{d\times d}}.\end{aligned}\]
Furthermore, we use the notation ${\textbf{S}_{\tau }}\left[{\textbf{z}_{t}}\right]=\frac{1}{2}[{\ddot{\textbf{S}}_{\tau }}[{\textbf{z}_{t}}]+{({\ddot{\textbf{S}}_{\tau }}[{\textbf{z}_{t}}])^{\text{H}}}]\in {\mathbb{C}^{d\times d}}$ for the symmetrized autocovariance matrix with lag $\tau \in \left\{0\right\}\cup \mathbb{N}$. In the case of univariate stochastic processes, we use ${\ddot{\text{S}}_{\tau }}[{x_{t}},{y_{s}}]=\mathbb{E}\left[({x_{t}}-{\mu _{{x_{t}}}}){({y_{s+\tau }}-{\mu _{{y_{s+\tau }}}})^{\ast }}\right]$. Note that the unsymmetrized autocovariance matrix ${\ddot{\textbf{S}}_{\tau }}$ is not necessarily conjugate symmetric, i.e., Hermitian, and it can have complex-valued diagonal entries. In contrast, the symmetrized autocovariance matrix ${\textbf{S}_{\tau }}$ is always conjugate symmetric, that is, ${\textbf{S}_{\tau }}={\textbf{S}_{\tau }^{\text{H}}}$, and the diagonal entries of the symmetrized autocovariance matrix are by definition real-valued.
In the literature, the definition of stationarity for complex-valued stochastic processes varies. In this paper, we use the following definition.
Definition 1.
The ${\mathbb{C}^{d}}$-valued process ${\textbf{z}_{\text{•}}}:={\left({\textbf{z}_{t}}\right)_{t\in \mathbb{N}}}$ is weakly stationary if the components of ${\textbf{z}_{\text{•}}}$ are square-integrable and if for any $\tau \in \left\{0\right\}\cup \mathbb{N}$ and for any pair $\left(u,v\right)\in \mathbb{N}\times \mathbb{N}$, we have that $\mathbb{E}[{\textbf{z}_{u}}]=\mathbb{E}[{\textbf{z}_{v}}]$ and ${\ddot{\textbf{S}}_{\tau }}[{\textbf{z}_{u}}]={\ddot{\textbf{S}}_{\tau }}[{\textbf{z}_{v}}]$.
Let $\textbf{Z}:={\left({\textbf{Z}_{j}}\right)_{j\in \mathcal{T}}}$, $\mathcal{T}=\left\{1,2,\dots ,T\right\}$, be a sampled process generated by ${\textbf{z}_{\text{•}}}$, where ${\textbf{Z}_{j}}$ is ${\mathbb{C}^{d}}$-valued for every $j\in \mathcal{T}$. We use the following classical finite sample estimators for the mean vector and the autocovariance matrix with lag $\tau \in \left\{0,1,\dots ,T-1\right\}$:
\[\begin{array}{r@{\hskip0pt}l@{\hskip0pt}r}& \displaystyle \hat{\boldsymbol{\mu }}[\textbf{Z}]=\frac{1}{T}{\sum \limits_{j=1}^{T}}{\textbf{Z}_{j}},\hspace{1em}\hspace{2em}& \displaystyle {\tilde{\textbf{S}}_{\tau }}\left[\textbf{Z}\right]=\frac{1}{T-\tau }{\sum \limits_{j=1}^{T-\tau }}\left({\textbf{Z}_{j}}-\hat{\boldsymbol{\mu }}[\textbf{Z}]\right){\left({\textbf{Z}_{j+\tau }}-\hat{\boldsymbol{\mu }}[\textbf{Z}]\right)^{\text{H}}},\end{array}\]
where for $\tau =0$, the scaling used in ${\tilde{\textbf{S}}_{\tau }}$ is $1/(T-1)$. Furthermore, the symmetrized autocovariance matrix estimator with lag $\tau \in \left\{0,1,\dots ,T-1\right\}$ is defined as ${\hat{\textbf{S}}_{\tau }}\left[\textbf{Z}\right]=\frac{1}{2}[{\tilde{\textbf{S}}_{\tau }}[\textbf{Z}]+{({\tilde{\textbf{S}}_{\tau }}[\textbf{Z}])^{\text{H}}}]$. Note that the symmetrized autocovariance matrix estimator is always conjugate symmetric.
Let ${\textbf{y}_{1}}$ and ${\textbf{y}_{2}}$ be ${\mathbb{R}^{d}}$-valued random vectors such that the concatenated random vector ${\left(\begin{array}{c@{\hskip10.0pt}c}{\textbf{y}_{1}^{\top }}& {\textbf{y}_{2}^{\top }}\end{array}\right)^{\top }}$ follows an ${\mathbb{R}^{2d}}$-valued Gaussian distribution. Then, the ${\mathbb{C}^{d}}$-valued random vector $\textbf{y}={\textbf{y}_{1}}+i{\textbf{y}_{2}}$ follows the ${\mathbb{C}^{d}}$-valued Gaussian distribution with the location parameter ${\boldsymbol{\mu }_{\textbf{y}}}$, the covariance matrix ${\boldsymbol{\Sigma }_{\textbf{y}}}$ and the relation matrix ${\textbf{P}_{\textbf{y}}}$, defined as
\[\begin{aligned}{}& {\boldsymbol{\mu }_{\textbf{y}}}=\mathbb{E}\left[\textbf{y}\right]=\mathbb{E}\left[{\textbf{y}_{1}}\right]+i\mathbb{E}\left[{\textbf{y}_{2}}\right]\in {\mathbb{C}^{d}},\hspace{1em}{\boldsymbol{\Sigma }_{\textbf{y}}}=\mathbb{E}\left[\left(\textbf{y}-{\boldsymbol{\mu }_{\textbf{y}}}\right){\left(\textbf{y}-{\boldsymbol{\mu }_{\textbf{y}}}\right)^{\text{H}}}\right]\in {\mathbb{C}^{d\times d}},\\ {} & {\textbf{P}_{\textbf{y}}}=\mathbb{E}\left[\left(\textbf{y}-{\boldsymbol{\mu }_{\textbf{y}}}\right){\left(\textbf{y}-{\boldsymbol{\mu }_{\textbf{y}}}\right)^{\top }}\right]\in {\mathbb{C}^{d\times d}}.\end{aligned}\]
We distinguish between complex- and real-valued Gaussian distributions by the number of parameters given: ${\mathcal{N}_{d}}({\boldsymbol{\mu }_{y}},\boldsymbol{\Sigma })$ denotes a d-variate real-valued Gaussian distribution and ${\mathcal{N}_{d}}({\boldsymbol{\mu }_{y}},\boldsymbol{\Sigma },\textbf{P})$ denotes a d-variate complex-valued Gaussian distribution.
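Returning to the finite sample estimators defined above, they are straightforward to compute for complex-valued data. The following R sketch is only an illustration; the function names mu_hat, S_tilde and S_hat are ours and do not refer to an existing package. Note that t() alone does not conjugate, so the conjugate transpose is formed as Conj(t(.)).

mu_hat <- function(Z) colMeans(Z)                      # sample mean of a complex T x d matrix
S_tilde <- function(Z, tau) {                          # unsymmetrized autocovariance estimator
  Tn <- nrow(Z)
  Zc <- sweep(Z, 2, colMeans(Z))                       # center every column
  denom <- if (tau == 0) Tn - 1 else Tn - tau
  (t(Zc[1:(Tn - tau), , drop = FALSE]) %*% Conj(Zc[(1 + tau):Tn, , drop = FALSE])) / denom
}
S_hat <- function(Z, tau) {                            # symmetrized, always conjugate symmetric
  S <- S_tilde(Z, tau)
  (S + Conj(t(S))) / 2
}

One can verify numerically that S_hat(Z, tau) equals its own conjugate transpose and has real-valued diagonal entries, as discussed above.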
The classical central limit theorem (CLT) for independent and identically distributed complex-valued random variables is given as follows. Let $\{{\textbf{x}_{1}},{\textbf{x}_{2}},\dots ,{\textbf{x}_{n}}\}$ be a collection of i.i.d. ${\mathbb{C}^{d}}$-vectors with square-integrable components, mean ${\boldsymbol{\mu }_{\textbf{x}}}$, covariance matrix Σ and relation matrix $\textbf{P}$. Then,
\[ \frac{1}{\sqrt{n}}{\sum \limits_{j=1}^{n}}\left({\textbf{x}_{j}}-{\boldsymbol{\mu }_{\textbf{x}}}\right){\xrightarrow[n\to \infty ]{\mathcal{D}}}\textbf{x}\sim {\mathcal{N}_{d}}\left(\textbf{0},\boldsymbol{\Sigma },\textbf{P}\right),\]
where ${\xrightarrow[]{\mathcal{D}}}$ denotes convergence in distribution. We can harness the rich literature on real-valued limiting theorems via the following lemma.
Lemma 1.
Let ${\{{\boldsymbol{v}_{n}}\}_{n\in \mathbb{N}}}$ be a collection of ${\mathbb{R}^{2d}}$-valued random vectors ${\boldsymbol{v}_{j}^{\top }}=\left(\begin{array}{c@{\hskip10.0pt}c}{\boldsymbol{x}_{j}^{\top }}& {\boldsymbol{y}_{j}^{\top }}\end{array}\right)$, where ${\boldsymbol{x}_{j}}$, ${\boldsymbol{y}_{j}}$ are ${\mathbb{R}^{d}}$-valued for every $j\in \mathbb{N}$, and let ${\boldsymbol{v}^{\top }}=\left(\begin{array}{c@{\hskip10.0pt}c}{\boldsymbol{x}^{\top }}& {\boldsymbol{y}^{\top }}\end{array}\right)$. Then, ${\boldsymbol{v}_{n}}{\xrightarrow[n\to \infty ]{\mathcal{D}}}\boldsymbol{v}$ if and only if ${\boldsymbol{x}_{n}}+i{\boldsymbol{y}_{n}}{\xrightarrow[n\to \infty ]{\mathcal{D}}}\boldsymbol{x}+i\boldsymbol{y}$.
Lemma 1 can be, for example, applied to Gaussian vectors in a straightforward manner, see Corollary 1 in the supplementary Appendix.
In our work, we utilize Lemma 1 in order to apply real-valued results in complex-valued scenarios, but we wish to emphasize that it also works the other way around: a complex-valued asymptotic distribution automatically yields the corresponding real-valued limiting distribution. For additional details regarding complex-valued statistics, see, e.g., [13, 15, 31].
3 Temporally uncorrelated components model
In this section, we consider a linear temporally uncorrelated components model for discrete time complex-valued processes. We define the following version of the model.
Definition 2.
The ${\mathbb{C}^{d}}$-process ${\textbf{x}_{\text{•}}}:={\left({\textbf{x}_{t}}\right)_{t\in \mathbb{N}}}$ follows the temporally uncorrelated components mixing model, if
\[ {\textbf{x}_{t}}=\textbf{A}{\textbf{z}_{t}}+{\boldsymbol{\mu }_{\textbf{x}}},\hspace{1em}\forall t\in \mathbb{N},\]
where $\textbf{A}$ is a constant nonsingular ${\mathbb{C}^{d\times d}}$-matrix, ${\boldsymbol{\mu }_{\textbf{x}}}\in {\mathbb{C}^{d}}$ is a constant location parameter and ${\textbf{z}_{\text{•}}}$ is a ${\mathbb{C}^{d}}$-process with components having continuous marginal distributions. In addition, the process ${\textbf{z}_{\text{•}}}$ satisfies the following four conditions for every $t,\kappa \in \mathbb{N}$,
\[\begin{aligned}{}& (1)\hspace{2.5pt}{\boldsymbol{\mu }_{{\textbf{z}_{t}}}}=\textbf{0},\hspace{2em}(2)\hspace{2.5pt}{\textbf{S}_{0}}\left[{\textbf{z}_{t}}\right]={\textbf{I}_{d}},\\ {} & (3)\hspace{2.5pt}{\ddot{\textbf{S}}_{\kappa }}\left[{\textbf{z}_{t}}\right]\hspace{2.5pt}\text{is finite and depends only on}\hspace{2.5pt}\kappa ,\\ {} & (4)\hspace{2.5pt}\exists \tau \in \mathbb{N}:{\textbf{S}_{\tau }}\left[{\textbf{z}_{t}}\right]={\boldsymbol{\Lambda }_{\tau }}=\text{diag}\left({\lambda _{\tau }^{(1)}},\dots ,{\lambda _{\tau }^{(d)}}\right),\end{aligned}\]
such that $+\infty >{\lambda _{\tau }^{(1)}}>{\lambda _{\tau }^{(2)}}>\cdots >{\lambda _{\tau }^{(d)}}>-\infty $.
To improve the fluency of the paper, from here on, the term mixing model is used to refer to the temporally uncorrelated components mixing model. The conditions (1)–(4) of Definition 2 imply that the latent process ${\textbf{z}_{\text{•}}}$ is weakly stationary in the sense of Definition 1. Note that under these model assumptions the concatenated ${\mathbb{R}^{2d}}$-process ${\left(\begin{array}{c@{\hskip10.0pt}c}\text{Re}({\textbf{z}_{\text{•}}^{\top }})& \text{Im}({\textbf{z}_{\text{•}}^{\top }})\end{array}\right)^{\top }}$ is not necessarily weakly stationary in the classical real-valued sense. One could also require that the concatenated vector is stationary, which can be done by adding to the model a condition involving the complex-valued relation matrix, defined in Section 2. However, adding the extra condition would also fix the relationship, up to heterogeneous sign-changes, between the real and imaginary parts inside a single component. In this paper, we do not wish to make assumptions regarding the complex phase of a single component. Consequently, many of the following results involve a so-called phase-shift matrix $\textbf{J}$, that is, a complex-valued diagonal matrix with diagonal entries of the form $\exp (i{\theta _{1}}),\dots ,\exp (i{\theta _{d}})$. A phase-shift matrix $\textbf{J}$ is by definition unitary, i.e., $\textbf{J}{\textbf{J}^{\text{H}}}={\textbf{J}^{\text{H}}}\textbf{J}={\textbf{I}_{d}}$. Note that all possible phase-shift matrices can be constructed by considering phases on some suitable $2\pi $-length interval, e.g., the usual $[0,2\pi )$.
Condition (3) is included in Definition 2 to ensure that we can apply limiting results that require the traditional real-valued weak stationarity. Since Condition (3) is solely used to guarantee stationarity, the only requirements for ${\ddot{\textbf{S}}_{\kappa }}\left[{\textbf{z}_{t}}\right]$ are that it has finite elements and that it is invariant with respect to the time parameter t.
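To fix ideas, the following R sketch simulates a process satisfying conditions (1)–(4); the construction is only one illustrative choice and is not taken from the paper. The latent components are independent complex AR(1) processes with distinct lag-one autocovariances, the real and imaginary parts within a component are correlated, and the mixing matrix is drawn at random.

set.seed(1)
Tn <- 2000; d <- 3; burn <- 200
phi <- c(0.9, 0.5, 0.1)                                # distinct lag-one autocovariances
z <- matrix(0 + 0i, Tn, d)
for (k in 1:d) {
  re <- rnorm(Tn + burn)
  im <- 0.5 * re + sqrt(0.75) * rnorm(Tn + burn)       # correlated real and imaginary parts
  eps <- (re + 1i * im) / sqrt(2)                      # E[eps * Conj(eps)] = 1
  x <- 0 + 0i
  path <- complex(Tn + burn)
  for (t in 1:(Tn + burn)) {
    x <- phi[k] * x + sqrt(1 - phi[k]^2) * eps[t]      # unit-variance complex AR(1)
    path[t] <- x
  }
  z[, k] <- path[(burn + 1):(Tn + burn)]
}
A <- matrix(complex(real = rnorm(d^2), imaginary = rnorm(d^2)), d, d)   # nonsingular a.s.
X <- z %*% t(A)                                        # rows are the observations x_t = A z_t

For this construction, ${\textbf{S}_{0}}[{\textbf{z}_{t}}]={\textbf{I}_{3}}$ and ${\textbf{S}_{1}}[{\textbf{z}_{t}}]=\text{diag}(0.9,0.5,0.1)$, so condition (4) holds with $\tau =1$.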
For a process ${\textbf{x}_{\text{•}}}$ that follows the mixing model, the corresponding unmixing problem is to uncover the latent process ${\textbf{z}_{\text{•}}}$ by using only the information contained in the observable process ${\textbf{x}_{\text{•}}}$.
Definition 3.
Let ${\textbf{x}_{\text{•}}}:={\left({\textbf{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2, such that condition (4) holds for some fixed τ. The affine functional $g:\textbf{a}\mapsto \boldsymbol{\Gamma }(\textbf{a}-\boldsymbol{\mu }):{\mathbb{C}^{d}}\to {\mathbb{C}^{d}}$ is a solution to the corresponding unmixing problem if the process $g\circ {\textbf{x}_{\text{•}}}$ satisfies conditions (1)–(3) and condition (4) is satisfied with the fixed τ.
Given a process ${\textbf{x}_{\text{•}}}$ that follows the mixing model and a corresponding solution $g:\textbf{a}\mapsto \boldsymbol{\Gamma }(\textbf{a}-\boldsymbol{\mu })$, we refer to $\boldsymbol{A}$, see Definition 2, as the mixing matrix and we refer to Γ as the unmixing matrix. The objective of the unmixing matrix is to reverse the effects of the mixing matrix. However, under our model assumptions, we cannot uniquely determine the relationship between the mixing and the unmixing matrix. The relationship between the two matrices is $\boldsymbol{\Gamma }\textbf{A}=\textbf{J}$, where $\textbf{J}\in {\mathbb{C}^{d\times d}}$ is some phase-shift matrix, see Lemma 3 in the supplementary Appendix.
Under our assumptions, we cannot distinguish between solutions that contain unmixing matrices that are a phase-shift away from each other. Thus, we say that two solutions ${g_{1}}:\textbf{a}\mapsto {\boldsymbol{\Gamma }_{1}}(\textbf{a}-{\boldsymbol{\mu }_{1}})$ and ${g_{2}}:\textbf{a}\mapsto {\boldsymbol{\Gamma }_{2}}(\textbf{a}-{\boldsymbol{\mu }_{2}})$ are equivalent, if there exists a phase-shift matrix $\textbf{J}$ such that ${\boldsymbol{\Gamma }_{1}}={\boldsymbol{\Gamma }_{2}}\textbf{J}$.
We next provide a solution procedure for the unmixing problem. Recall that in the mixing model we assume that the mixing matrix $\textbf{A}$ and the latent process ${\textbf{z}_{\text{•}}}$ are unobservable. Therefore, the solution procedure relies solely on statistics of the observable process ${\textbf{x}_{\text{•}}}$.
Theorem 1.
Let ${\boldsymbol{x}_{\text{•}}}:={\left({\boldsymbol{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2. The functional $g:\boldsymbol{a}\mapsto \boldsymbol{\Gamma }(\boldsymbol{a}-\boldsymbol{\mu }):{\mathbb{C}^{d}}\to {\mathbb{C}^{d}}$ is a solution to the corresponding unmixing problem if and only if the eigenvector equation
(1)
\[\begin{aligned}{}& {\left({\boldsymbol{S}_{0}}\left[{\boldsymbol{x}_{t}}\right]\right)^{-1}}{\boldsymbol{S}_{\tau }}\left[{\boldsymbol{x}_{t}}\right]{\boldsymbol{\Gamma }^{\text{H}}}={\boldsymbol{\Gamma }^{\text{H}}}{\boldsymbol{\Lambda }_{\tau }},\hspace{1em}\forall t\in \mathbb{N},\end{aligned}\]
and the scaling equation
(2)
\[\begin{aligned}{}& \boldsymbol{\Gamma }{\boldsymbol{S}_{0}}\left[{\boldsymbol{x}_{t}}\right]{\boldsymbol{\Gamma }^{\text{H}}}={\boldsymbol{I}_{d}},\hspace{1em}\forall t\in \mathbb{N},\end{aligned}\]
are satisfied.
One can apply Theorem 1 to find solutions by first estimating the eigenvalues and eigenvectors of ${\left({\textbf{S}_{0}}\left[{\textbf{x}_{t}}\right]\right)^{-1}}{\textbf{S}_{\tau }}\left[{\textbf{x}_{t}}\right]$ and then scaling the eigenvectors so that the scaling equation is satisfied. In addition, see Corollary 2 in the supplementary Appendix.
4 Estimation and asymptotic properties
The algorithm for multiple unknown signals extraction (AMUSE), [35], is a widely applied blind source separation unmixing procedure. AMUSE and the corresponding asymptotic properties have been previously studied in the real-valued case, see, e.g., [23]. Here, we adopt the estimation part of the AMUSE algorithm. However, our underlying model assumptions are not identical to the ones in [23] and [35]. In this section, we consider the consistency and the limiting distribution of the corresponding unmixing estimator under complex-valued long-range and short-range dependent processes, significantly extending the results given in [23].
Definition 4.
Let ${\textbf{x}_{\text{•}}}:={\left({\textbf{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2 and let $\textbf{X}$ be a ${\mathbb{C}^{T\times d}}$-valued, $1\le d<T<\infty $, sampled stochastic process generated by ${\textbf{x}_{\text{•}}}$. Let $\hat{\boldsymbol{\mu }}:=\hat{\boldsymbol{\mu }}[\textbf{X}]$ and let ${\textbf{1}_{T}}$ be an ${\mathbb{R}^{T}}$-vector full of ones. The mapping $\hat{g}:\textbf{C}\mapsto (\textbf{C}-{\textbf{1}_{T}}{\hat{\boldsymbol{\mu }}^{\top }}){\hat{\boldsymbol{\Gamma }}^{\top }}:{\mathbb{C}^{T\times d}}\to {\mathbb{C}^{T\times d}}$ is a solution to the finite sample unmixing problem if ${\hat{\lambda }_{\tau }^{(1)}}\ge \cdots \ge {\hat{\lambda }_{\tau }^{(d)}}$ and
\[ \hat{\boldsymbol{\Gamma }}{\hat{\textbf{S}}_{0}}[\textbf{X}]{\hat{\boldsymbol{\Gamma }}^{\text{H}}}={\textbf{I}_{d}}\hspace{2em}\hspace{2.5pt}\text{and}\hspace{2.5pt}\hspace{2em}\hat{\boldsymbol{\Gamma }}{\hat{\textbf{S}}_{\tau }}[\textbf{X}]{\hat{\boldsymbol{\Gamma }}^{\text{H}}}={\hat{\boldsymbol{\Lambda }}_{\tau }}=\text{diag}\left({\hat{\lambda }_{\tau }^{(1)}},\dots ,{\hat{\lambda }_{\tau }^{(d)}}\right).\]
Recall that ${\hat{\textbf{S}}_{\tau }}$ is the symmetrized autocovariance matrix, and it is, by definition, conjugate symmetric. In addition, recall that the diagonal entries and the eigenvalues of a conjugate symmetric matrix are, again by definition, real-valued. Hence, the ordering of ${\hat{\lambda }_{\tau }^{(j)}}$ is always well defined. The diagonal elements (eigenvalues) ${\hat{\lambda }_{\tau }^{(j)}}$ can be seen as finite sample estimates of ${\lambda _{\tau }^{(j)}}$. Even though the population eigenvalues ${\lambda _{\tau }^{(j)}}$ are assumed to be distinct, it is possible, due to, e.g., limited computational accuracy, that ties occur among the estimates ${\hat{\lambda }_{\tau }^{(j)}}$. Hence, in the finite sample case, strict inequalities are not imposed on the estimates ${\hat{\lambda }_{\tau }^{(j)}}$ in Definition 4.
Using Lemma 4 of the supplementary Appendix, we can straightforwardly implement the unmixing procedure. The first step is to calculate ${\hat{\textbf{S}}_{0}}:={\hat{\textbf{S}}_{0}}[\textbf{X}]$ from a realization $\textbf{X}$, and the corresponding conjugate symmetric inverse square root ${\hat{\boldsymbol{\Sigma }}_{0}}:={\hat{\textbf{S}}_{0}^{-1/2}}$. Recall that, by assumption, the components of ${\textbf{z}_{\text{•}}}$ have continuous marginal distributions and the mixing matrix is nonsingular. Thus, the eigenvalues of the covariance matrix estimator are always real-valued and almost surely positive. Hence, the matrix ${\hat{\boldsymbol{\Sigma }}_{0}}$ can be obtained via the eigendecomposition of ${\hat{\textbf{S}}_{0}}$. The next step is to choose a lag parameter τ and estimate the eigendecomposition ${\hat{\textbf{S}}_{\tau }}[\textbf{X}{\hat{\boldsymbol{\Sigma }}_{0}^{\top }}]={\hat{\boldsymbol{\Sigma }}_{0}}{\hat{\textbf{S}}_{\tau }}[\textbf{X}]{\hat{\boldsymbol{\Sigma }}_{0}}=\hat{\textbf{V}}{\hat{\boldsymbol{\Lambda }}_{\tau }}{\hat{\textbf{V}}^{\text{H}}}$; here we use the fact that the covariance matrix and autocovariance matrix estimators are affine equivariant in the sense that ${\hat{\textbf{S}}_{j}}[\textbf{X}{\textbf{C}^{\top }}]=\textbf{C}{\hat{\textbf{S}}_{j}}[\textbf{X}]{\textbf{C}^{\text{H}}}$, $j\in \{0,\tau \}$, for all nonsingular ${\mathbb{C}^{d\times d}}$-matrices $\textbf{C}$. An estimate for the latent process can then be found via the mapping $(\textbf{X}-{\textbf{1}_{T}}{\hat{\boldsymbol{\mu }}^{\top }}){({\hat{\textbf{V}}^{\text{H}}}{\hat{\boldsymbol{\Sigma }}_{0}})^{\top }}$, where $\hat{\boldsymbol{\mu }}$ is the sample mean vector of $\textbf{X}$.
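The steps above can be collected into a short R function. The sketch below is ours and is meant only as an illustration of the procedure; it assumes that the data matrix has the observations ${\textbf{x}_{t}}$ as its rows and it returns the unmixing matrix estimate, the estimated eigenvalues and the recovered latent process.

amuse_complex <- function(X, tau = 1) {
  Xc <- sweep(X, 2, colMeans(X))                       # remove the sample mean
  acov <- function(Z, lag) {                           # symmetrized autocovariance estimator
    denom <- if (lag == 0) nrow(Z) - 1 else nrow(Z) - lag
    S <- (t(Z[1:(nrow(Z) - lag), , drop = FALSE]) %*%
            Conj(Z[(1 + lag):nrow(Z), , drop = FALSE])) / denom
    (S + Conj(t(S))) / 2
  }
  e0 <- eigen(acov(Xc, 0), symmetric = TRUE)           # Hermitian, eigenvalues real and a.s. positive
  Sigma0 <- e0$vectors %*% diag(1 / sqrt(e0$values)) %*% Conj(t(e0$vectors))   # inverse square root
  et <- eigen(Sigma0 %*% acov(Xc, tau) %*% Sigma0, symmetric = TRUE)
  Gamma <- Conj(t(et$vectors)) %*% Sigma0              # unmixing matrix estimate
  list(Gamma = Gamma, lambda = et$values, Z = Xc %*% t(Gamma))
}

For the simulated process of Section 3, amuse_complex(X, tau = 1)$lambda is close to (0.9, 0.5, 0.1) and the columns of the recovered matrix match the latent components up to phase-shifts. As discussed below, in practice one would run the function for several values of τ and inspect the gaps between the consecutive eigenvalues.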
In practice, one should choose the lag parameter $\tau \ne 0$ such that the diagonal elements of ${\hat{\boldsymbol{\Lambda }}_{\tau }}$ are as distinct as possible. Strategies for choosing τ are discussed, e.g., in [9]. From a computational perspective, the estimation procedure is relatively fast. Hence, in practice, one should try several different values of τ and study the resulting ${\hat{\boldsymbol{\Lambda }}_{\tau }}$.
We emphasize that one can apply the theory of this paper to other estimators as well. That is, under minor model assumptions, the estimators ${\hat{\textbf{S}}_{0}}$, ${\hat{\textbf{S}}_{\tau }}$ can be replaced with any matrix-valued estimators that have the so-called complex-valued affine equivariance property, see [16].
The conditions given in Definition 4 remain true if we replace $\hat{\boldsymbol{\Gamma }}$ with $\textbf{J}\hat{\boldsymbol{\Gamma }}$, where $\textbf{J}\in {\mathbb{C}^{d\times d}}$ can be any phase-shift matrix. Thus, as in the population level, we say that the estimates ${\hat{\boldsymbol{\Gamma }}_{1}}$ and ${\hat{\boldsymbol{\Gamma }}_{2}}$ are equivalent if ${\hat{\boldsymbol{\Gamma }}_{1}}=\textbf{J}{\hat{\boldsymbol{\Gamma }}_{2}}$ for some phase-shift matrix $\textbf{J}$.
Justified by the affine invariance property given in supplementary Appendix Lemmas 5 and 6, we can, without loss of generality, derive the rest of the theory under the assumption of trivial mixing, that is, in the case when the mixing matrix is $\textbf{A}={\textbf{I}_{d}}$. See also the beginning of the Proof of Lemma 2.
4.1 Limiting behavior of the AMUSE estimator
We next consider limiting properties of the finite sample solutions. Note that the finite sample statistics and the sampled stochastic process $\textbf{X}$ depend on the sample size T. In the framework of this paper, our usage of the notation is equivalent to that of [38]. We use the short expression ${\textbf{X}_{T}}={o_{p}}(1)$ to denote that a sequence of ${\mathbb{C}^{d\times d}}$-matrices ${\textbf{X}_{1}},{\textbf{X}_{2}},\dots $ converges in probability to the zero matrix. In addition, we use ${\textbf{Y}_{\beta }}={\mathcal{O}_{p}}(1)$ to denote that a collection of ${\mathbb{C}^{d\times d}}$-matrices $\{{\textbf{Y}_{\beta }}:\beta \in B\}$ is uniformly tight under some nonempty indexing set B. By saying that a collection of matrices is uniformly tight, we mean that the real and imaginary parts of every element of the matrices are uniformly tight.
Lemma 2.
Let ${\boldsymbol{x}_{\text{•}}}:={\left({\boldsymbol{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2, let $\boldsymbol{X}$ be a ${\mathbb{C}^{T\times d}}$-valued, $1\le d<T<\infty $, sampled stochastic process generated by ${\boldsymbol{x}_{\text{•}}}$ and let $\hat{g}:\boldsymbol{C}\mapsto (\boldsymbol{C}-{\boldsymbol{1}_{T}}{\hat{\boldsymbol{\mu }}^{\top }}){\hat{\boldsymbol{\Gamma }}^{\top }}$ be a T-indexed sequence of corresponding finite sample solutions. Furthermore, let ${\alpha _{T}}(\hat{{\boldsymbol{S}_{0}}}[\boldsymbol{X}]-{\boldsymbol{S}_{0}}[{\boldsymbol{x}_{t}}])={\mathcal{O}_{p}}(1)$ and ${\beta _{T}}(\hat{{\boldsymbol{S}_{\tau }}}[\boldsymbol{X}]-{\boldsymbol{S}_{\tau }}[{\boldsymbol{x}_{t}}])={\mathcal{O}_{p}}(1)$, for some real-valued sequences ${\alpha _{T}}$ and ${\beta _{T}}$, which satisfy ${\alpha _{T}}\uparrow \infty $, ${\beta _{T}}\uparrow \infty $ as $T\to \infty $. Then, for ${\gamma _{T}}=\min ({\alpha _{T}},{\beta _{T}})$, there exists a sequence of T-indexed phase-shift matrices $\hat{\boldsymbol{J}}$, such that
\[ {\gamma _{T}}\left(\hat{\boldsymbol{J}}\hat{\boldsymbol{\Gamma }}-\boldsymbol{\Gamma }\right)={\mathcal{O}_{p}}(1)\]
holds asymptotically.
Hence, under the assumptions of Lemma 2, there exists a sequence of T-indexed phase-shift matrices $\hat{\textbf{J}}$ such that $\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}$ converges in probability to Γ.
Theorem 2.
Let $\boldsymbol{X}$, g and $\hat{g}$ be defined as in Lemma 2 and denote the element $(j,k)$ of $\hat{{\boldsymbol{S}_{\tau }}}[\boldsymbol{X}]$ as ${[\hat{{\boldsymbol{S}_{\tau }}}]_{jk}}$. Then, under the assumptions of Lemma 2 and the trivial mixing scenario $\boldsymbol{A}={\boldsymbol{I}_{d}}$, there exists a sequence of T-indexed phase-shift matrices $\hat{\boldsymbol{J}}$, such that
\[\begin{aligned}{}& {\gamma _{T}}\left({\hat{\boldsymbol{J}}_{jj}}{\hat{\boldsymbol{\Gamma }}_{jj}}-1\right)=\frac{{\gamma _{T}}}{2}\left({\left[\hat{{\boldsymbol{S}_{0}}}\right]_{jj}}-1\right)+{\mathcal{O}_{p}}(1/{\gamma _{T}}),\hspace{1em}\forall j\in \{1,\dots ,d\},\hspace{1em}\textit{and}\\ {} & {\gamma _{T}}\left({\lambda _{\tau }^{(k)}}-{\lambda _{\tau }^{(j)}}\right){\hat{\boldsymbol{J}}_{jj}}{\hat{\boldsymbol{\Gamma }}_{jk}}={\gamma _{T}}\left({\lambda _{\tau }^{(j)}}{\left[\hat{{\boldsymbol{S}_{0}}}\right]_{jk}}-{\left[\hat{{\boldsymbol{S}_{\tau }}}\right]_{jk}}\right)+{\mathcal{O}_{p}}(1/{\gamma _{T}}),\hspace{1em}j\ne k.\end{aligned}\]
By Theorem 2, we can directly find the asymptotic distribution of the unmixing matrix estimator $\hat{\boldsymbol{\Gamma }}$, if we have the asymptotic distributions and the convergence rates of the estimators ${\hat{\textbf{S}}_{0}}$ and ${\hat{\textbf{S}}_{\tau }}$. Note that, if the rates ${\alpha _{T}}$ and ${\beta _{T}}$ differ, the term corresponding to the estimator with the faster convergence rate tends to zero in probability in Theorem 2. Furthermore, note that the asymptotic distributions of the diagonal elements of ${\gamma _{T}}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}})$ only depend on the asymptotic distribution of the covariance matrix estimator ${\hat{\textbf{S}}_{0}}$. Consequently, if the covariance estimator ${\hat{\textbf{S}}_{0}}$ converges faster than the autocovariance estimator ${\hat{\textbf{S}}_{\tau }}$, the diagonal elements of ${\gamma _{T}}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}})$ converge to zero in distribution and in probability. In contrast, the off-diagonal elements depend on both ${\hat{\textbf{S}}_{0}}$ and ${\hat{\textbf{S}}_{\tau }}$; if the autocovariance estimator ${\hat{\textbf{S}}_{\tau }}$ converges faster, its contribution to the off-diagonal elements of ${\gamma _{T}}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}})$ vanishes, and the limiting distribution of the off-diagonal elements is determined by ${\hat{\textbf{S}}_{0}}$ alone.
Theorem 2 can alternatively be presented in matrix form as
\[\begin{aligned}{}& {\gamma _{T}}\left(\text{diag}\left[\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}}\right]\right)={\gamma _{T}}\left(\frac{1}{2}\text{diag}\left[{\hat{\textbf{S}}_{0}}-{\textbf{I}_{d}}\right]\right)+{\mathcal{O}_{p}}(1/{\gamma _{T}}),\\ {} & {\gamma _{T}}\left(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-\text{diag}\left[\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}\right]\right)={\gamma _{T}}\left(\textbf{H}\odot \left[{\boldsymbol{\Lambda }_{\tau }}{\hat{\textbf{S}}_{0}}-{\hat{\textbf{S}}_{\tau }}\right]\right)+{\mathcal{O}_{p}}(1/{\gamma _{T}}),\end{aligned}\]
where ${\textbf{H}_{jj}}=0$ and ${\textbf{H}_{jk}}={({\lambda _{\tau }^{(k)}}-{\lambda _{\tau }^{(j)}})^{-1}}$, $j\ne k$, ⊙ denotes the Hadamard (i.e., entrywise) product and $\hat{\textbf{J}}$ is the T-indexed sequence of phase-shift matrices that set the diagonal elements of $\hat{\boldsymbol{\Gamma }}$ to be on the positive real-axis.
4.2 Limiting behavior under summable covariance structures
In this section, we consider a class of stochastic processes that satisfy the Breuer–Major theorem for weakly stationary processes. The Breuer–Major theorem is considered in the context of Gaussian subordinated processes, see, e.g., [2, 8, 30]. A univariate real-valued Gaussian subordinated process ${z_{\text{•}}}$, defined on a probability space $(\Omega ,\mathcal{F},\mathbb{P})$, is a weakly stationary process that can be expressed in the form ${z_{\text{•}}}=f\circ {\textbf{y}_{\text{•}}}$, where ${\textbf{y}_{\text{•}}}$ is a ${\mathbb{R}^{\ell }}$-variate Gaussian process and $f:{\mathbb{R}^{\ell }}\to \mathbb{R}$.
We emphasize that Gaussian subordinated processes form a very rich model class. For example, it was recently shown in [39] that arbitrary one-dimensional marginal distributions and a rich class of covariance structures can be modeled by $f\circ {\textbf{y}_{\text{•}}}$ with a simple univariate stationary Gaussian process ${\textbf{y}_{\text{•}}}$. While the model class is very rich, the underlying driving Gaussian processes still provide a large toolbox for analyzing the limiting behavior of various estimators. Such limit theorems have been a topic of active research for decades. For recent relevant papers on the topic, we refer to [27–30] and the references therein.
We next give the definitions of univariate real-valued Hermite polynomials and Hermite ranks. We define the kth, $k\in \{0\}\cup \mathbb{N}$, (probabilistic) Hermite polynomial ${H_{k}}$, using Rodrigues’ formula
\[ {H_{k}}(x)={(-1)^{k}}\exp \left({x^{2}}/2\right)\frac{{d^{k}}}{d{x^{k}}}\exp \left(-{x^{2}}/2\right).\]
The first four Hermite polynomials are thus ${H_{0}}(x)=1$, ${H_{1}}(x)=x$, ${H_{2}}(x)={x^{2}}-1$ and ${H_{3}}(x)={x^{3}}-3x$. The set $\{{H_{k}}/\sqrt{k!}:k\in \{0\}\cup \mathbb{N}\}$ forms an orthonormal basis of the Hilbert space ${\mathcal{L}^{2}}(\mathbb{R},{P_{y}})$, where ${P_{y}}$ denotes the law of a univariate standard Gaussian random variable. Consequently, every function $f\in {\mathcal{L}^{2}}(\mathbb{R},{P_{y}})$ can be decomposed as
(3)
\[ f(x)={\sum \limits_{k=0}^{\infty }}{\alpha _{k}}{H_{k}}(x),\]
where ${\alpha _{k}}\in \mathbb{R}$ for every $k\in \{0\}\cup \mathbb{N}$. If x and y are jointly Gaussian with standard univariate Gaussian marginals, the orthogonality of the Hermite polynomials and the decomposition of Equation (3) give
\[ \mathbb{E}\left[f(x)f(y)\right]={\sum \limits_{k=0}^{\infty }}k!{\alpha _{k}^{2}}{\left(\mathbb{E}\left[(x-\mathbb{E}[x])(y-\mathbb{E}[y])\right]\right)^{k}}={\sum \limits_{k=0}^{\infty }}k!{\alpha _{k}^{2}}{\left(\text{Cov}[x,y]\right)^{k}}.\]
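As a small numerical check, which is not part of the original derivations, the following R code evaluates the Hermite polynomials through the three-term recurrence ${H_{k+1}}(x)=x{H_{k}}(x)-k{H_{k-1}}(x)$ and verifies the covariance expansion above for $f={H_{2}}$: in that case ${\alpha _{2}}=1$, all other coefficients vanish, and $\mathbb{E}[f(x)f(y)]=2{\left(\text{Cov}[x,y]\right)^{2}}$.

hermite <- function(x, k) {                            # probabilistic Hermite polynomial H_k
  if (k == 0) return(rep(1, length(x)))
  Hm1 <- rep(1, length(x)); H <- x
  if (k >= 2) for (j in 1:(k - 1)) { tmp <- x * H - j * Hm1; Hm1 <- H; H <- tmp }
  H
}
set.seed(1)
rho <- 0.6; n <- 1e6
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)              # jointly Gaussian with Cov[x, y] = rho
mean(hermite(x, 2) * hermite(y, 2))                    # Monte Carlo estimate
2 * rho^2                                              # value predicted by the expansion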
In the following, we consider real-valued functions of several variables and apply some known results that are based on multivariate Hermite polynomials. However, our derivations (in Section 4.3) require the use of univariate Hermite polynomials only. Thus, we only define univariate Hermite polynomials here.
The Hermite rank for a function f is defined as follows.
Definition 5 (Hermite rank).
Let $\textbf{y}$ be a ${\mathbb{R}^{\ell }}$-valued Gaussian random vector, $\ell \in \mathbb{N}$, and let $f:{\mathbb{R}^{\ell }}\to \mathbb{R}$ be a function such that $f\circ \textbf{y}$ is square-integrable. The function f has Hermite rank q, with respect to $\textbf{y}$, if
\[ \mathbb{E}\left[\left(f\left(\textbf{y}\right)-\mathbb{E}\left[f\left(\textbf{y}\right)\right]\right){p_{m}}\left(\textbf{y}\right)\right]=0,\]
for all Hermite polynomials ${p_{m}}:{\mathbb{R}^{\ell }}\to \mathbb{R}$ that are of degree $m\le q-1$, and there exists a polynomial ${p_{q}}$ of degree q such that
\[ \mathbb{E}\left[\left(f\left(\textbf{y}\right)-\mathbb{E}\left[f\left(\textbf{y}\right)\right]\right){p_{q}}\left(\textbf{y}\right)\right]\ne 0.\]
Note that in the one-dimensional setting, the Hermite rank q of a function f is the smallest positive integer q in Equation (3) such that ${\alpha _{q}}\ne 0$. We next present the multivariate version of the well-known Breuer–Major theorem from [8].
Theorem 3.
Let ${\boldsymbol{y}_{\text{•}}}:={({\boldsymbol{y}_{t}})_{t\in \mathbb{N}}}$ be an ${\mathbb{R}^{\ell }}$-valued centered stationary Gaussian process. Let ${f_{1}},{f_{2}},\dots ,{f_{d}}:{\mathbb{R}^{\ell }}\to \mathbb{R}$ be measurable functions that have Hermite rank at least $q\in \mathbb{N}$ with respect to ${\boldsymbol{y}_{\text{•}}}$, and let ${f_{j}}\circ {\boldsymbol{y}_{\text{•}}}$ be square-integrable for every $j\in \{1,\dots ,d\}$. Additionally, let the series of covariances satisfy
\[ {\sum \limits_{\tau =0}^{\infty }}{\left|{\ddot{\text{S}}_{\tau }}\left[{y_{1}^{(j)}},{y_{1}^{(k)}}\right]\right|^{q}}={\sum \limits_{\tau =0}^{\infty }}{\left|\mathbb{E}\left[{y_{1}^{(j)}}{y_{1+\tau }^{(k)}}\right]\right|^{q}}<+\infty ,\hspace{1em}\forall j,k\in \left\{1,\dots ,\ell \right\}.\]
Then, the ${\mathbb{R}^{d}}$-sequence
\[ \frac{1}{\sqrt{T}}{\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{\textstyle\textstyle\sum _{t=1}^{T}}\left({f_{1}}({\boldsymbol{y}_{t}})-\mathbb{E}\left[{f_{1}}({\boldsymbol{y}_{t}})\right]\right)& \cdots & {\textstyle\textstyle\sum _{t=1}^{T}}\left({f_{d}}({\boldsymbol{y}_{t}})-\mathbb{E}\left[{f_{d}}({\boldsymbol{y}_{t}})\right]\right)\end{array}\right)^{\top }}\]
converges in distribution, as $T\to \infty $, to a centered Gaussian vector $\boldsymbol{\rho }=\left({\rho _{1}},{\rho _{2}},\dots ,{\rho _{d}}\right)$ with finite covariances $\mathbb{E}\left[{\rho _{j}}{\rho _{k}}\right]$ equal to
\[ {\text{S}_{0}}\left[{f_{j}}({\boldsymbol{y}_{1}}),{f_{k}}({\boldsymbol{y}_{1}})\right]+{\sum \limits_{\tau =1}^{\infty }}\left({\ddot{\text{S}}_{\tau }}\left[{f_{j}}({\boldsymbol{y}_{1}}),{f_{k}}({\boldsymbol{y}_{1}})\right]+{\ddot{\text{S}}_{\tau }}\left[{f_{k}}({\boldsymbol{y}_{1}}),{f_{j}}({\boldsymbol{y}_{1}})\right]\right).\]
The proof of Theorem 3 is omitted here. Theorem 3 follows directly from the univariate Breuer–Major theorem, given in [8], and by using the usual Cramér–Wold device, see, e.g., [7, Theorem 29.4]. A similar version of Theorem 3 can also be found in Section 5 of [2].
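Theorem 3 can be illustrated numerically. In the following R sketch, which is ours and serves only as an example, we take $\ell =d=1$, a Gaussian AR(1) process with autocovariance function $r(\tau )={\phi ^{\tau }}$ and ${f_{1}}={H_{2}}$. Then ${\ddot{\text{S}}_{\tau }}\left[{f_{1}}({\boldsymbol{y}_{1}}),{f_{1}}({\boldsymbol{y}_{1}})\right]=2{\phi ^{2\tau }}$ and the limiting variance in Theorem 3 equals $2+4{\phi ^{2}}/(1-{\phi ^{2}})$.

set.seed(2)
phi <- 0.5; Tn <- 5000; nrep <- 2000
stat <- replicate(nrep, {
  eta <- as.vector(arima.sim(list(ar = phi), n = Tn, sd = sqrt(1 - phi^2)))   # unit-variance AR(1)
  sum(eta^2 - 1) / sqrt(Tn)                            # normalized sum of H_2(eta_t)
})
var(stat)                                              # Monte Carlo variance, approx. 3.33 for phi = 0.5
2 + 4 * phi^2 / (1 - phi^2)                            # limiting variance given by Theorem 3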
We next present the following assumption which enables us to find the asymptotic behavior of the unmixing matrix estimator using the Breuer–Major theorem.
Assumption 1.
Let ${\boldsymbol{x}_{\text{•}}}:={\left({\boldsymbol{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2 with the trivial mixing matrix $\boldsymbol{A}={\boldsymbol{I}_{d}}$ and let $\boldsymbol{X}$ be a ${\mathbb{C}^{T\times d}}$-valued, $1\le d<T<\infty $, sampled stochastic process generated by ${\boldsymbol{x}_{\text{•}}}$. Denote ${\boldsymbol{z}_{\text{•}}}={({\boldsymbol{b}_{t}}+i{\boldsymbol{C}_{t}})_{t\in \mathbb{N}}}={\boldsymbol{b}_{\text{•}}}+i{\boldsymbol{C}_{\text{•}}}$. We assume that there exist $\ell \in \mathbb{N}$ and a centered ${\mathbb{R}^{\ell }}$-valued stationary Gaussian process ${\boldsymbol{\eta }_{\text{•}}}$, such that, for all $k\in \{1,\dots ,d\}$, the component ${\boldsymbol{b}_{\text{•}}^{(k)}}$ has the same finite-dimensional distributions and, asymptotically, the same autocovariance function as ${\tilde{f}_{k}}({\boldsymbol{\eta }_{\text{•}}})$, for some function ${\tilde{f}_{k}}:{\mathbb{R}^{\ell }}\to \mathbb{R}$. Similarly, we assume that, for all $k\in \{1,\dots ,d\}$, the component ${\boldsymbol{C}_{\text{•}}^{(k)}}$ has the same finite-dimensional distributions and, asymptotically, the same autocovariance function as ${\tilde{f}_{k+d}}({\boldsymbol{\eta }_{\text{•}}})$, for some function ${\tilde{f}_{k+d}}:{\mathbb{R}^{\ell }}\to \mathbb{R}$. Furthermore, we assume that $\mathbb{E}[|{\tilde{f}_{k}}({\boldsymbol{\eta }_{1}}){|^{4}}]<+\infty $, $\forall k\in \{1,\dots ,2d\}$, and that ${r_{\boldsymbol{\eta }}^{(j,k)}}(t)=\mathbb{E}\left[{\boldsymbol{\eta }_{t+1}^{(j)}}{\boldsymbol{\eta }_{1}^{(k)}}\right]$ satisfies ${r_{\boldsymbol{\eta }}^{(j,j)}}(0)=1$, $\forall j\in \{1,\dots ,\ell \}$, and
\[ {\sum \limits_{t=0}^{\infty }}\left|{r_{\boldsymbol{\eta }}^{(j,k)}}(t)\right|<+\infty ,\hspace{1em}\forall j,k\in \left\{1,\dots ,\ell \right\}.\]
Recall that the finite-dimensional distributions of an $\mathbb{R}$-valued stochastic process ${c_{\text{•}}}$ are given by the collection of probability measures
\[ \left\{\mathbb{P}\left[\left\{{c_{{t_{1}}}}\in {B_{1}},\dots ,{c_{{t_{m}}}}\in {B_{m}}\right\}\right]\mid m\in \mathbb{N},\hspace{2.5pt}{B_{j}}\in \mathcal{B}(\mathbb{R})\hspace{2.5pt}\text{and}\hspace{2.5pt}\{{t_{1}},\dots ,{t_{m}}\}\subset \mathbb{N}\right\}.\]
We again want to emphasize that a wide class of stochastic processes satisfy Assumption 1. Indeed, we allow an arbitrary dimension ℓ for the driving Gaussian process ${\boldsymbol{\eta }_{\text{•}}}$ and arbitrary (summable) covariance structures. In comparison, it was shown in [39] that in many cases it would suffice to relate only one stationary Gaussian process ${\boldsymbol{\eta }_{\text{•}}^{(j)}}$ to each function ${\tilde{f}_{j}}$.
We are now ready to consider the asymptotic distribution for the unmixing matrix estimator under Assumption 1.
Theorem 4.
Let Assumption 1 be satisfied and let $\hat{g}:\boldsymbol{C}\mapsto (\boldsymbol{C}-{\boldsymbol{1}_{T}}{\hat{\boldsymbol{\mu }}^{\top }}){\hat{\boldsymbol{\Gamma }}^{\top }}$ be a T-indexed sequence of finite sample solutions. Then, under the trivial mixing scenario $\boldsymbol{A}={\boldsymbol{I}_{d}}$, there exists a T-indexed sequence $\hat{\boldsymbol{J}}$ of phase-shift matrices such that
\[ \sqrt{T}\cdot \text{vec}\left(\hat{\boldsymbol{J}}\hat{\boldsymbol{\Gamma }}-{\boldsymbol{I}_{d}}\right){\xrightarrow[T\to \infty ]{\mathcal{D}}}\boldsymbol{\nu }\sim {\mathcal{N}_{{d^{2}}}}(\boldsymbol{0},{\boldsymbol{\Sigma }_{\boldsymbol{\nu }}},{\boldsymbol{P}_{\boldsymbol{\nu }}}).\]
The exact forms for ${\boldsymbol{\Sigma }_{\boldsymbol{\nu }}}$ and ${\boldsymbol{P}_{\boldsymbol{\nu }}}$ are given in the supplementary Appendix.
Note that it is possible to present Theorem 4 under even weaker assumptions. In the current formulation, we require that the Gaussian process ${\textbf{y}_{\text{•}}}$ (equivalently, ${\boldsymbol{\eta }_{\text{•}}}$) has summable covariances and cross-covariances. By studying the exact Hermite ranks of the underlying functions, weaker summability conditions would suffice (cf. Theorem 3). However, given the Hermite ranks of the functions ${f_{j}}$, it is in general impossible to say anything about the Hermite ranks of transformations such as ${f_{j}^{2}}$ arising from ${\hat{\textbf{S}}_{0}}$, see the proof of Theorem 4. On the other hand, assuming Hermite rank equal to one is very natural on many occasions. For example, the modeling approach given in [39] sets the rank of ${f_{j}}$ equal to 1. Hermite rank 1 is stable in the sense that even small perturbations of a function with a higher Hermite rank lead to rank 1 again. For further discussion on the stability of Hermite rank 1, we refer to [4].
4.3 Notes on noncentral limit theorems
In this section, we provide examples where the convergence rate of the unmixing estimator differs from the standard $\sqrt{T}$ and where the limiting distribution is non-Gaussian. Such situations arise especially when the summability condition of Theorem 3 does not hold.
Assumption 2.
Let ${\boldsymbol{x}_{\text{•}}}:={\left({\boldsymbol{x}_{t}}\right)_{t\in \mathbb{N}}}$ be a process that satisfies Definition 2 with the trivial mixing matrix $\boldsymbol{A}={\boldsymbol{I}_{d}}$, let $\boldsymbol{X}$ be a ${\mathbb{C}^{T\times d}}$-valued, $1\le d<T<+\infty $, sampled stochastic process generated by ${\boldsymbol{x}_{\text{•}}}$ and denote ${\boldsymbol{z}_{\text{•}}}={\boldsymbol{b}_{\text{•}}}+i{\boldsymbol{C}_{\text{•}}}$. We assume that there exist centered $\mathbb{R}$-valued stationary Gaussian processes ${\boldsymbol{\eta }_{\text{•}}^{(k)}}$, $k\in \{1,2,\dots ,2d\}$, such that, for all $k\in \{1,\dots ,d\}$, the component ${\boldsymbol{b}_{\text{•}}^{(k)}}$ has the same finite-dimensional distributions and, asymptotically, the same autocovariance function as ${\tilde{f}_{k}}({\boldsymbol{\eta }_{\text{•}}^{(k)}})$, for some function ${\tilde{f}_{k}}:\mathbb{R}\to \mathbb{R}$. Similarly, we assume that, for all $k\in \{1,\dots ,d\}$, the component ${\boldsymbol{C}_{\text{•}}^{(k)}}$ has the same finite-dimensional distributions and, asymptotically, the same autocovariance function as ${\tilde{f}_{k+d}}({\boldsymbol{\eta }_{\text{•}}^{(k+d)}})$, for some function ${\tilde{f}_{k+d}}:\mathbb{R}\to \mathbb{R}$. Furthermore, we assume that $\mathbb{E}[|{\tilde{f}_{k}}({\boldsymbol{\eta }_{1}^{(k)}}){|^{4}}]<+\infty $, $\forall k\in \{1,\dots ,2d\}$, and that the Gaussian processes ${\boldsymbol{\eta }_{\text{•}}^{(1)}},{\boldsymbol{\eta }_{\text{•}}^{(2)}},\dots ,{\boldsymbol{\eta }_{\text{•}}^{(2d)}}$ are mutually independent.
In comparison to Assumption 1, we here assume that the real part and the imaginary part of each component are each driven by a single univariate Gaussian process. As discussed in Section 4.2, this is not a severe restriction. In addition, we assume independent components and that the real and imaginary parts of each component are independent. Although independence is a common assumption in the blind source separation literature, we note that our results can be extended to cover dependencies as long as, for every $j,k\in \{1,2,\dots ,2d\}$, $j\ne k$, the cross-covariances between ${\boldsymbol{\eta }^{(k)}}$ and ${\boldsymbol{\eta }^{(j)}}$ are negligible compared to the decay of the autocovariance functions ${r_{{\boldsymbol{\eta }^{(j)}}}}(t)$.
If the autocovariance functions ${r_{{\boldsymbol{\eta }^{(j)}}}}$ satisfy
(4)
\[ {\sum \limits_{t=1}^{\infty }}|{r_{{\boldsymbol{\eta }^{(j)}}}}(t)|<+\infty ,\hspace{1em}\forall j\in \{1,\dots ,2d\},\]
then we are in the situation of Theorem 4. More generally, a weakly stationary series is called short-range dependent if the corresponding autocovariance function r satisfies Equation (4). Hence, we assume that at least one of the functions ${r_{{\boldsymbol{\eta }^{(j)}}}}$ is not summable. For such components, we assume the following long-range dependence condition:
(5)
\[ {r_{{\boldsymbol{\eta }^{(j)}}}}(t)={t^{2{H_{j}}-2}}{\ell _{j}}(t),\hspace{1em}t\in \mathbb{N},\]
where ${H_{j}}\in \left[1/2,1\right)$ and ${\ell _{j}}$ is a function that is slowly varying at infinity.
Note that the definition of long-range dependence varies in the literature. For details on long-range dependent processes and their different definitions, we refer to [5, 6, 33].
There is a large literature on limit theorems under long-range dependence. See, e.g., [3, 11, 34] for a few central works on the topic. For an insightful and detailed presentation we refer to [18], and for recent advances on the topic, see, e.g., [21, 40]. In the case of long-range dependent processes, the rate of convergence of the normalized mean is slower than the usual $\sqrt{T}$ and the limiting distribution depends on the corresponding Hermite rank. More precisely, the limiting distribution follows a so-called q-Hermite distribution, where q is the Hermite rank of the underlying function. Note that q-Hermite distributions can be fully characterized by the corresponding Hermite rank q and a self-similarity parameter H, i.e., the Hurst index. In the stable case $q=1$, we obtain a Gaussian limit, and in the case $q=2$ we obtain the so-called Rosenblatt distribution that is not Gaussian. Similarly, for $q\ge 3$ the limiting distribution is not Gaussian. For details on Hermite distributions and processes, we refer to [3, 12, 36]. In particular, we apply the following known result (see [11, Theorem 1]).
Proposition 1.
Let ${\eta _{\text{•}}}$ be a $\mathbb{R}$-valued stationary Gaussian process with autocovariance function ${r_{\eta }}$ such that ${r_{\eta }}(0)=1$ and ${r_{\eta }}$ satisfies Equation (5) for some $H\in \left[1/2,1\right)$. Let f be a function such that $\mathbb{E}[{(f({\eta _{1}}))^{2}}]<+\infty $ and the Hermite rank of f equals q. If $q(2H-2)>-1$, then, for some constant $C={C_{f,\eta }}>0$, we have
\[ {T^{q(1-H)-1}}{\sum \limits_{t=1}^{T}}\left(f({\eta _{t}})-\mathbb{E}[f({\eta _{t}})]\right){\xrightarrow[T\to \infty ]{\mathcal{D}}}C{Z_{q}},\]
where ${Z_{q}}$ follows a q-Hermite distribution.
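The nonstandard rate of Proposition 1 can also be observed numerically. The R sketch below is ours and only illustrative: it simulates fractional Gaussian noise with Hurst index $H=0.9$, whose autocovariance function satisfies ${r_{\eta }}(t)\sim H(2H-1){t^{2H-2}}$, and uses $f={H_{2}}$, which has Hermite rank $q=2$; here $q(2H-2)=-0.4>-1$, so the proper normalization is ${T^{q(1-H)-1}}={T^{-0.8}}$ instead of ${T^{-1/2}}$.

set.seed(3)
H <- 0.9; Tn <- 3000; q <- 2
r <- function(k) 0.5 * (abs(k + 1)^(2 * H) - 2 * abs(k)^(2 * H) + abs(k - 1)^(2 * H))
L <- t(chol(toeplitz(r(0:(Tn - 1)))))                  # Cholesky factor of the fGn covariance matrix
stat <- replicate(200, {
  eta <- as.vector(L %*% rnorm(Tn))                    # one path of fractional Gaussian noise
  Tn^(q * (1 - H) - 1) * sum(eta^2 - 1)                # normalization from Proposition 1
})
sd(stat)                                               # stays of constant order as Tn grows

With the usual normalization $1/\sqrt{T}$ the same statistic diverges as T grows, and the empirical distribution of stat is visibly asymmetric, in line with the non-Gaussian Rosenblatt limit for $q=2$.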
In view of Proposition 1, in the long-range dependent case the Hermite rank plays a crucial role. Thus, one cannot pose general results without any a priori knowledge on the ranks. Indeed, it can be shown (see, e.g., [6, Equation 4.26] with $d=H-1/2$) that if f has Hermite rank q and r satisfies Equation (5) with some H such that $q(2H-2)>-1$, then
(6)
\[ \underset{T\to \infty }{\lim }{T^{q(2-2H)}}\mathbb{E}\left[{\left(\frac{1}{T}{\sum \limits_{t=1}^{T}}\left(f({\boldsymbol{\eta }_{t}})-\mathbb{E}[f({\boldsymbol{\eta }_{t}})]\right)\right)^{2}}\right]=C.\]
This justifies the normalization in Proposition 1 and also gives the constant explicitly, as $\mathbb{E}[{Z_{q}}]=0$ and $\mathbb{E}[{({Z_{q}})^{2}}]=1$. We pose the following assumption on the autocovariances ${r_{{\boldsymbol{\eta }^{(j)}}}}$ and Hermite ranks.
Assumption 3.
We assume that for every $j\in \{1,2,\dots ,2d\}$ the autocovariance functions ${r_{{\boldsymbol{\eta }^{(j)}}}}$ with ${r_{{\boldsymbol{\eta }^{(j)}}}}(0)=1$ satisfy either Equation (4) or Equation (5), and that Equation (5) is satisfied for at least one index. Let $I\subset \{1,2,\dots ,2d\}$ denote the set of indices for which Equation (5) holds, and for every $i\in I$, denote the Hermite rank of $x\mapsto {\tilde{f}_{i}}(x)$ by ${q_{1,i}}$ and the Hermite rank of $x\mapsto {\left[{\tilde{f}_{i}}(x)-\mathbb{E}({\tilde{f}_{i}}({\boldsymbol{\eta }^{(i)}}))\right]^{2}}$ by ${q_{2,i}}$. We assume that, for any indices $j,k\in I$, $j\ne k$,
(7)
\[ \underset{i\in I}{\max }{q_{2,i}}(2{H_{i}}-2)>\underset{j,k\in I}{\max }\left\{{q_{1,k}}(2{H_{k}}-2)+{q_{1,j}}(2{H_{j}}-2),-1\right\},\]
and, for any $k\in I$,
(8)
\[ \underset{i\in I}{\max }{q_{2,i}}(2{H_{i}}-2)\ge 2{q_{1,k}}(2{H_{k}}-2).\]
We stress that Assumption 3 is not very restrictive. We allow a combination of short- and long-range dependent processes ${\boldsymbol{\eta }_{\text{•}}^{(j)}}$ and we assume that at least one of the processes is long-range dependent, since otherwise we may apply Theorem 4. In this respect, Condition (7) guarantees that at least one of the ${[{\tilde{f}_{i}}(x)-\mathbb{E}({\tilde{f}_{i}}({\boldsymbol{\eta }^{(i)}}))]^{2}}$ is long-range dependent. Indeed, if ${\max _{i\in I}}{q_{2,i}}(2{H_{i}}-2)<-1$, then Theorem 4 applies again. Moreover, one could also study the limiting case ${\max _{i\in I}}{q_{2,i}}(2{H_{i}}-2)=-1$. Then, one usually obtains a Gaussian limit, but with a rate given by $\sqrt{T/\log (T)}$. Our more restrictive Assumptions (7) and (8) are posed in order to apply known results on limit theorems in the long-range dependent case. Indeed, if these conditions were violated, then we would need to study the convergence of complicated dependent random vectors towards some combinations of Hermite distributions. Unfortunately, to the best of our knowledge, there are no studies on this topic that would fit into our general setting (on some results in special cases, see, e.g., [3]).
In view of Proposition 1, the rate of convergence, ${T^{\gamma }}$, of our unmixing estimator arises from Equation (7). We set
(9)
\[ \gamma =-\underset{i\in I}{\max }{q_{2,i}}({H_{i}}-1)=\underset{i\in I}{\min }{q_{2,i}}(1-{H_{i}}).\]
By Equation (7), we have that $\gamma <\frac{1}{2}$, meaning that our rate is slower than the usual $\sqrt{T}$. The following results show that cross-terms do not contribute to the limit.
Proposition 2.
Let Assumptions 2 and 3 be satisfied. Then for every $j\ne k$, we have that
(10)
\[ {T^{\gamma -1}}{\sum \limits_{t=1}^{T}}\left({\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})-\mathbb{E}[{\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})]\right)\left({\tilde{f}_{j}}({\boldsymbol{\eta }_{t}^{(j)}})-\mathbb{E}[{\tilde{f}_{j}}({\boldsymbol{\eta }_{t}^{(j)}})]\right){\xrightarrow[T\to \infty ]{{\mathcal{L}_{2}}}}0\]
and
(11)
\[ {T^{\gamma -1}}{\sum \limits_{t=1}^{T}}\left({\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})-\mathbb{E}[{\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})]\right)\left({\tilde{f}_{j}}({\boldsymbol{\eta }_{t+\tau }^{(j)}})-\mathbb{E}[{\tilde{f}_{j}}({\boldsymbol{\eta }_{t}^{(j)}})]\right){\xrightarrow[T\to \infty ]{{\mathcal{L}_{2}}}}0,\]
where ${\xrightarrow[]{{\mathcal{L}_{2}}}}$ denotes convergence in the space ${\mathcal{L}_{2}}(\Omega ,\mathbb{P})$. Consequently, the convergence also holds in probability.
In the presence of long-range dependency, the mean estimator $\hat{\boldsymbol{\mu }}[\textbf{X}]$ in the definition of ${\tilde{\textbf{S}}_{\tau }}\left[\textbf{X}\right]$ has to be studied separately as it may contribute to the limit.
In order to obtain the limiting distribution for ${T^{\gamma }}(\hat{\textbf{J}}\hat{\boldsymbol{\Gamma }}-{\textbf{I}_{d}})$, we introduce some notation. For each $k\in \{1,2,\dots ,2d\}$ we define Hermite processes ${Z_{{q_{1,k}}}}$ and ${Z_{{q_{2,k}}}}$, such that for a given k, we allow ${Z_{{q_{1,k}}}}$ and ${Z_{{q_{2,k}}}}$ to be dependent on each other, while both are independent of ${Z_{{q_{1,j}}}}$, ${Z_{{q_{2,j}}}}$ for $j\ne k$. We also introduce constants ${C_{1,k}}$, ${C_{2,k}}$ in the following way: if $k\in I$ is such that equality holds in (8), i.e., ${\max _{i\in I}}{q_{2,i}}(2{H_{i}}-2)=2[{q_{1,k}}(2{H_{k}}-2)]$, then we let ${C_{1,k}}$ be the constant given in Proposition 1 associated with the limit
\[ {T^{{q_{1,k}}(1-{H_{k}})-1}}{\sum \limits_{t=1}^{T}}\left({\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})-\mathbb{E}[{\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})]\right){\xrightarrow[T\to \infty ]{\mathcal{D}}}{C_{1,k}}{Z_{{q_{1,k}}}}.\]
Otherwise we set ${C_{1,k}}=0$. Similarly, if $k\in I$ is such that
\[ {q_{2,k}}(2{H_{k}}-2)=\underset{i\in I}{\max }{q_{2,i}}(2{H_{i}}-2),\]
we let ${C_{2,k}}$ be the constant given in Proposition 1 associated with the limit
\[ {T^{{q_{2,k}}(1-{H_{k}})-1}}{\sum \limits_{t=1}^{T}}\left({h_{k}}({\boldsymbol{\eta }_{t}^{(k)}})-\mathbb{E}\left[{h_{k}}({\boldsymbol{\eta }_{t}^{(k)}})\right]\right){\xrightarrow[T\to \infty ]{\mathcal{D}}}{C_{2,k}}{Z_{{q_{2,k}}}},\]
where
\[ {h_{k}}({\boldsymbol{\eta }_{t}^{(k)}})={\left({\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})-\mathbb{E}[{\tilde{f}_{k}}({\boldsymbol{\eta }_{t}^{(k)}})]\right)^{2}}.\]
Otherwise we set ${C_{2,k}}=0$. The following theorem is our main result of this subsection.
Theorem 5.
We remark that it might be that most of the elements vanish. Indeed, the coefficient ${C_{2,j}}$ (${C_{2,j+d}}$, respectively) is nonzero only if the real part (imaginary part, respectively) of ${\left[{f_{j}}(x)-\mathbb{E}({f_{j}}(\boldsymbol{\eta }))\right]^{2}}$ converges with the correct rate ${T^{\gamma }}$. In this case, ${C_{1,j}}$ (${C_{1,j+d}}$, respectively) is nonzero only if the corresponding element in $\tilde{\boldsymbol{\mu }}$ converges with the rate ${T^{\gamma /2}}$.
5 Image source separation
The restoration of images that have been mixed together by some transformation is a classical example in blind source separation (BSS), see, e.g., [26]. Image source separation is usually considered only for black and white images, where the colors can be presented using a single color parameter. Since only a single color parameter is associated with each pixel, the black and white image source separation problem can be straightforwardly formulated as a real-valued BSS problem. This does not remain true for colored photographs.
In our example, we consider colored photographs and conduct the analysis using the statistical software R ([32]). We start with three photographs of sizes $1536\times 2048$, $1200\times 1600$ and $960\times 720$; the images are imported into R using the package JPEG [37]. As a first step, we resize all of the photographs to the common size $960\times 720$. The resized images are stored in a $960\times 720\times 3$ array format, such that a red-green-blue color parameter triplet, denoted as $(R,G,B)$, is assigned to every pixel.
We apply a bijective cube-to-unit-sphere transformation to every color triplet. Then, we use the well-known stereographic projection, which is an almost bijective transformation between the unit sphere and the complex plane. The stereographic projection is bijective everywhere except at the north pole of the unit sphere. For almost all photographs, we can choose the color coordinate system such that no pixel has a color triplet located on the north pole. This holds for our example, and we can apply the inverse mappings in order to get back to the color cube surface.
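The stereographic projection step can be written out explicitly. The following R sketch is ours and only illustrative; it projects from the north pole $(0,0,1)$, and the cube-to-sphere color transformation itself is not reproduced here.

sphere_to_plane <- function(x, y, z) (x + 1i * y) / (1 - z)   # undefined only at the north pole
plane_to_sphere <- function(w) {
  d <- 1 + Mod(w)^2
  cbind(x = 2 * Re(w) / d, y = 2 * Im(w) / d, z = (Mod(w)^2 - 1) / d)
}
p <- c(1, 2, -2) / 3                                   # a point on the unit sphere
w <- sphere_to_plane(p[1], p[2], p[3])                 # 0.2 + 0.4i
plane_to_sphere(w)                                     # recovers p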
We then have a single complex number corresponding to every pixel of the color corrected images, and we can present the images in a ${\mathbb{C}^{960\times 720}}$-matrix form. We denote the images with complex-valued elements as ${\textbf{Z}_{1}},{\textbf{Z}_{2}}$ and ${\textbf{Z}_{3}}$ and apply the following transformation,
\[ {\textbf{X}^{\top }}={\left(\begin{array}{c}\text{vec}{({\textbf{X}_{1}})^{\top }}\\ {} \text{vec}{({\textbf{X}_{2}})^{\top }}\\ {} \text{vec}{({\textbf{X}_{3}})^{\top }}\end{array}\right)^{\top }}={\left(\begin{array}{c}\text{vec}{({\textbf{Z}_{1}})^{\top }}\\ {} \text{vec}{({\textbf{Z}_{2}})^{\top }}\\ {} \text{vec}{({\textbf{Z}_{3}})^{\top }}\end{array}\right)^{\top }}\left({\textbf{I}_{3}}+\textbf{E}+i\textbf{F}\right),\]
where ${\textbf{I}_{3}}$ is the identity matrix and the elements of $\textbf{E}$ and $\textbf{F}$ were generated independently from the univariate uniform distribution $\texttt{Unif(-1/2, 1/2)}$. The complex-valued mixed images ${\textbf{X}_{1}},{\textbf{X}_{2}}$ and ${\textbf{X}_{3}}$ can then be plotted by performing the chain of inverse mappings in reverse order. The corresponding mixed images are presented in Figure 2.
We then apply the unmixing procedure presented in Section 4 to the matrix $\textbf{X}$, using several different lag parameters τ. In this example, $\textbf{X}$ has three columns and 691200 rows, such that every element of the matrix is complex-valued. In controlled settings, where the true mixing matrix is known, the minimum distance (MD) index is a straightforward way to compare the performance of different estimators, see [20] for the complex-valued formulation. The MD index is a performance index taking values between zero and one, where zero corresponds to perfect separation. In addition, the MD index ensures that the comparison between different estimators is fair; this is especially important in this paper since, under our model assumptions, different estimators do not necessarily estimate the same population quantities. The minimum distance index is based on the minimum distance between the known mixing matrix and the unmixing matrix estimate with respect to permutations, scalings and phase-shifts. While applying the AMUSE procedure, if we can find a parameter τ such that the diagonal elements of ${\hat{\boldsymbol{\Lambda }}_{\tau }}$ are sufficiently distinct (from a computational perspective), we can recover the unmixing matrix up to phase-shifts. Hence, the choice of the mixing matrix in this example is inconsequential from the viewpoint of the minimum distance index; identical values are obtained for a fixed τ with any finite mixing matrix. Consequently, in this example, the minimum distance index would produce the same value if the mixing matrix were replaced with, e.g., the identity matrix.
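The mixing step above and the subsequent unmixing can be sketched in R as follows; here Z1, Z2 and Z3 are assumed to hold the complex-valued images, amuse_complex refers to the illustrative sketch of Section 4, and the helper names are ours.

set.seed(4)
Zmat <- cbind(as.vector(Z1), as.vector(Z2), as.vector(Z3))    # 691200 x 3 complex matrix, rows are pixels
M <- diag(3) + matrix(runif(9, -0.5, 0.5), 3, 3) +
  1i * matrix(runif(9, -0.5, 0.5), 3, 3)                      # I_3 + E + iF
Xmat <- Zmat %*% M                                            # columns hold the mixed images
fit <- amuse_complex(Xmat, tau = 1)
Zhat <- fit$Z                                                 # unmixed images, up to phase-shifts

Since the rows of Zmat are the observations, the mixing matrix in the sense of Definition 2 is the transpose of $({\textbf{I}_{3}}+\textbf{E}+i\textbf{F})$.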
We tried several different lag parameters, and the best performance, in the light of the MD index and visual inspection, was attained with $\tau =1$; the corresponding MD index value was approximately 0.174. The unmixed images obtained using the unmixing procedure with $\tau =1$ are presented in Figure 3. In addition, for example, the lag parameter choices $\tau =2,3,961,962,963$ provided only slightly worse results; the corresponding MD index values were between 0.177 and 0.183. Note that the autocovariances are calculated from the vectorized images, and for example the first and the 961st entries are neighboring pixels in the unvectorized images.
Comparing the original color corrected images in Figure 1 and the unmixed images in Figure 3, the shapes seem to match almost perfectly. The color schemes vary since the complex phase is not uniquely fixed in our model. In addition, recall that under our model, solutions that are a phase-shift away from each other are considered to be equivalent. In Figure 4, we present the first unmixed image, produced by the unmixing procedure with $\tau =1$, under three equivalent solutions. The images in Figure 4 are obtained such that we first find an unmixing estimate $\hat{\boldsymbol{\Gamma }}$ and then right-multiply the obtained estimate with a diagonal matrix with diagonal entries $\exp (i\theta )$. In Figure 4, the left image is obtained with $\theta =\pi /4$, the middle one is obtained with $\theta =3\pi /4$ and the right image is obtained with $\theta =5\pi /4$.