Detecting independence of random vectors: generalized distance covariance and Gaussian covariance

Böttcher, Björn; Keller-Ressel, Martin; Schilling, René L.

doi:10.15559/18-VMSTA116

Volume 5, Issue 3 (2018), pp. 353–383

Björn Böttcher Martin Keller-Ressel René L. Schilling

https://doi.org/10.15559/18-VMSTA116

Pub. online: 19 September 2018 Type: Research Article

Open Access

Received
29 June 2018

Revised
23 August 2018

Accepted
30 August 2018

Published
19 September 2018

Abstract

Distance covariance is a quantity to measure the dependence of two random vectors. We show that the original concept introduced and developed by Székely, Rizzo and Bakirov can be embedded into a more general framework based on symmetric Lévy measures and the corresponding real-valued continuous negative definite functions. The Lévy measures replace the weight functions used in the original definition of distance covariance. All essential properties of distance covariance are preserved in this new framework.

From a practical point of view this allows less restrictive moment conditions on the underlying random variables and one can use other distance functions than Euclidean distance, e.g. Minkowski distance. Most importantly, it serves as the basic building block for distance multivariance, a quantity to measure and estimate dependence of multiple random vectors, which is introduced in a follow-up paper [Distance Multivariance: New dependence measures for random vectors (submitted). Revised version of arXiv: 1711.07775v1] to the present article.

1 Introduction

The concept of distance covariance was introduced by Székely, Rizzo and Bakirov [37] as a measure of dependence between two random vectors of arbitrary dimensions. Their starting point is to consider a weighted ${L}^{2}$-integral of the difference of the (joint) characteristic functions ${f_{X}},{f_{Y}}$ and ${f_{(X,Y)}}$ of the (${\mathbb{R}}^{m}$- and ${\mathbb{R}}^{n}$-valued) random variables X, Y and $(X,Y)$,

(1)

\[ {\mathcal{V}}^{2}(X,Y;w)={\iint _{{\mathbb{R}}^{m+n}}}|{f_{(X,Y)}}(s,t)-{f_{X}}(s){f_{Y}}(t){|}^{2}\hspace{0.1667em}w(s,t)\hspace{0.1667em}\mathrm{d}s\hspace{0.1667em}\mathrm{d}t.\]

The weight w is given by $w(s,t):={c_{\alpha ,m}}|s{|}^{-m-\alpha }{c_{\alpha ,n}}|t{|}^{-n-\alpha }$ for $\alpha \in (0,2)$.

We are going to embed this into a more general framework. In order to illustrate the new features of our approach we need to recall some results on distance covariance. Among several other interesting properties, [37] shows that distance covariance characterizes independence, in the sense that ${V}^{2}(X,Y;w)=0$ if, and only if, X and Y are independent. Moreover, they show that in the case $\alpha =1$ the distance covariance ${}^{N}{V}^{2}(X,Y;w)$ of the empirical distributions of two samples $({x_{1}},{x_{2}},\dots ,{x_{N}})$ and $({y_{1}},{y_{2}},\dots ,{y_{N}})$ takes a surprisingly simple form. It can be represented as

(2)

\[ {}^{N}{V}^{2}(X,Y;w)=\frac{1}{{N}^{2}}{\sum \limits_{k,l=1}^{N}}{A_{kl}}{B_{kl}},\]

where A and B are double centrings (cf. Lemma 4.2) of the Euclidean distance matrices of the samples, i.e. of ${(|{x_{k}}-{x_{l}}|)}_{k,l=1,\dots ,N}$ and ${(|{y_{k}}-{y_{l}}|)}_{k,l=1,\dots ,N}$. If $\alpha \ne 1$ then Euclidean distance has to be replaced by its power with exponent α. The connection between the weight function w in (1) and the (centred) Euclidean distance matrices in (2) is given by the Lévy–Khintchine representation of negative definite functions, i.e.

\[ |x{|}^{\alpha }={c_{p}}{\int _{{\mathbb{R}}^{m}\setminus \{0\}}}(1-\cos s\cdot x)\hspace{0.1667em}\frac{\mathrm{d}s}{|s{|}^{m+\alpha }},\hspace{1em}x\in {\mathbb{R}}^{m},\]

where ${c_{p}}$ is a suitable constant, cf. Section 2.1, Table 1. Finally, the representation (2) of ${}^{N}{V}^{2}(X,Y;w)$ and its asymptotic properties as $N\to \infty $ are used by Székely, Rizzo and Bakirov to develop a statistical test for independence in [37].

Yet another interesting representation of distance covariance is given in the follow-up paper [34]: Let $({X_{\text{cop}}},{Y_{\text{cop}}})$ be an independent copy of $(X,Y)$ and let W and ${W^{\prime }}$ be Brownian random fields on ${\mathbb{R}}^{m}$ and ${\mathbb{R}}^{n}$, independent from each other and from $X,Y,{X_{\text{cop}}},{Y_{\text{cop}}}$. The paper [34] defines the Brownian covariance

(3)

\[ {\mathcal{W}}^{2}(X,Y)=\mathbb{E}\big[{X}^{W}{X_{\text{cop}}^{W}}{Y}^{{W^{\prime }}}{Y_{\text{cop}}^{{W^{\prime }}}}\big],\]

where ${X}^{W}:=W(X)-\mathbb{E}[W(X)\mid W]$ for any random variable X and random field W with matching dimensions. Surprisingly, as shown in [34], Brownian covariance coincides with distance covariance, i.e. ${\mathcal{W}}^{2}(X,Y)={\mathcal{V}}^{2}(X,Y;w)$ when $\alpha =1$ is chosen for the kernel w.

The paper [34] was accompanied by a series of discussion papers [25, 6, 22, 11, 15, 18, 26, 17, 35] where various extensions, applications and open questions were suggested. Let us highlight the three problems which we are going to address:

a) Can the weight function w in (1) be replaced by other weight functions? (Cf. [15, 18])
b) Can the Euclidean distance (or its α-power) in (2) be replaced by other distances? (Cf. [22, 18, 23])
c) Can the Brownian random fields $W,{W^{\prime }}$ in (3) be replaced by other random fields? (Cf. [22, 26])

While insights and partial results on these questions can be found in all of the mentioned discussion papers, a definitive and unifying answer was missing for a long time. In the present paper we propose a generalization of distance covariance which resolves these closely related questions. In a follow-up paper [9] we extend our results to the detection of independence of d random variables $({X}^{1},{X}^{2},\dots ,{X}^{d})$, answering a question of [15, 1].

More precisely, we introduce in Definition 3.1 the generalized distance covariance

\[ {V}^{2}(X,Y)={\int _{{\mathbb{R}}^{n}}}{\int _{{\mathbb{R}}^{m}}}|{f_{(X,Y)}}(s,t)-{f_{X}}(s){f_{Y}}(t){|}^{2}\hspace{0.1667em}\mu (\mathrm{d}s)\hspace{0.1667em}\nu (\mathrm{d}t),\]

where μ and ν are symmetric Lévy measures, as a natural extension of distance covariance of Székely et al. [37]. The Lévy measures μ and ν are linked to negative definite functions Φ and Ψ by the well-known Lévy–Khintchine representation, cf. Section 2 where examples and important properties of negative definite functions are discussed. In Section 3 we show that several different representations (related to [23]) of ${V}^{2}(X,Y)$ in terms of the functions Φ and Ψ can be given. In Section 4 we turn to the finite-sample properties of generalized distance covariance and show that the representation (2) of ${}^{N}{V}^{2}(X,Y)$ remains valid, with the Euclidean distance matrices replaced by the matrices

\[ {\big(\varPhi ({x_{k}}-{x_{l}})\big)}_{k,l=1,\dots ,N}\hspace{1em}\text{and}\hspace{1em}{\big(\varPsi ({y_{k}}-{y_{l}})\big)}_{k,l=1,\dots ,N}.\]

We also show asymptotic properties of ${}^{N}{V}^{2}(X,Y)$ as N tends to infinity, paralleling those of [34, 36] for Euclidean distance covariance. After some remarks on uniqueness and normalization, we show in Section 7 that the representation (3) remains also valid, when the Brownian random fields W and ${W^{\prime }}$ are replaced by centered Gaussian random fields ${G_{\varPhi }}$ and ${G_{\varPsi }}$ with covariance kernel

\[ \mathbb{E}\big[{G_{\varPhi }}(x){G_{\varPhi }}\big({x^{\prime }}\big)\big]=\varPhi (x)+\varPhi \big({x^{\prime }}\big)-\varPhi \big(x-{x^{\prime }}\big)\]

and analogously for ${G_{\varPsi }}$.

To use generalized distance covariance (and distance multivariance) in applications all necessary functions and tests are provided in the R package multivariance [8]. Extensive examples and simulations can be found in [7], therefore we concentrate in the current paper on the theoretical foundations.

Notation.

Most of our notation is standard or self-explanatory. Throughout we use positive (and negative) in the non-strict sense, i.e. $x\ge 0$ (resp. $x\le 0$) and we write $a\vee b=\max \{a,b\}$ and $a\wedge b=\min \{a,b\}$ for the maximum and minimum. For a vector $x\in {\mathbb{R}}^{d}$ the Euclidean norm is denoted by $|x|$.

2 Fundamental results

In this section we collect some tools and concepts which will be needed in the sequel.

2.1 Negative definite functions

A function $\varTheta :{\mathbb{R}}^{d}\to \mathbb{C}$ is called negative definite (in the sense of Schoenberg) if the matrix ${(\varTheta ({x_{i}})+\overline{\varTheta ({x_{j}})}-\varTheta ({x_{i}}-{x_{j}}))}_{i,j}\in {\mathbb{C}}^{m\times m}$ is positive semidefinite hermitian for every $m\in \mathbb{N}$ and ${x_{1}},\dots ,{x_{m}}\in {\mathbb{R}}^{d}$. It is not hard to see, cf. Berg & Forst [3] or Jacob [19], that this is equivalent to saying that $\varTheta (0)\ge 0$, $\varTheta (-x)=\overline{\varTheta (x)}$ and the matrix ${(-\varTheta ({x_{i}}-{x_{j}}))}_{i,j}\in {\mathbb{C}}^{m\times m}$ is conditionally positive definite, i.e.

\[ {\sum \limits_{i,j=1}^{m}}\big[-\varTheta ({x_{i}}-{x_{j}})\big]{\lambda _{i}}{\bar{\lambda }_{j}}\ge 0\hspace{1em}\forall {\lambda _{1}},\dots ,{\lambda _{m}}\in \mathbb{C}\hspace{2.5pt}\text{such that}\hspace{2.5pt}{\sum \limits_{k=1}^{m}}{\lambda _{k}}=0.\]

Because of this equivalence, the function $-\varTheta $ is also called conditionally positive definite (and some authors call Θ conditionally negative definite).

Negative definite functions appear naturally in several contexts, for instance in probability theory as characteristic exponents (i.e. logarithms of characteristic functions) of infinitely divisible laws or Lévy processes, cf. Sato [28] or [10], in harmonic analysis in connection with non-local operators, cf. Berg & Forst [3] or Jacob [19] and in geometry when it comes to characterize certain metrics in Euclidean spaces, cf. Benyamini & Lindenstrauss [2].

The following theorem, compiled from Berg & Forst [3, Sec. 7, pp. 39–48, Thm. 10.8, p. 75] and Jacob [19, Sec. 3.6–7, pp. 120–155], summarizes some basic equivalences and connections.

Theorem 2.1.

For a function $\varTheta :{\mathbb{R}}^{d}\to \mathbb{C}$ with $\varTheta (0)=0$ the following assertions are equivalent

a) Θ is negative definite.
b) $-\varTheta $ is conditionally positive definite.
c) ${\mathrm{e}}^{-t\varTheta }$ is positive definite for every $t>0$.
d) ${t}^{-1}(1-{\mathrm{e}}^{-t\varTheta })$ is negative definite for every $t>0$.

If Θ is continuous, the assertions a)–d) are also equivalent to

e) Θ has the following integral representation

(4)
\[\begin{aligned}{}\varTheta (x)& =\mathrm{i}l\cdot x+\frac{1}{2}x\cdot Qx\\{} & \hspace{1em}+{\int _{{\mathbb{R}}^{d}\setminus \{0\}}}\big(1-{\mathrm{e}}^{\mathrm{i}x\cdot r}+\mathrm{i}x\cdot r{\mathbb{1}_{(0,1)}}(|r|)\big)\hspace{0.1667em}\rho (\mathrm{d}r),\end{aligned}\]
where $l\in {\mathbb{R}}^{d}$, $Q\in {\mathbb{R}}^{d\times d}$ is symmetric and positive semidefinite and ρ is a measure on ${\mathbb{R}}^{d}\setminus \{0\}$ such that ${\int _{{\mathbb{R}}^{d}\setminus \{0\}}}(1\wedge |s{|}^{2})\hspace{0.1667em}\rho (\mathrm{d}s)<\infty $.

We will frequently use the abbreviation cndf instead of continuous negative definite function. The representation (4) is the Lévy–Khintchine formula and any measure ρ satisfying

(5)

\[ \rho \hspace{5pt}\text{is a measure on}\hspace{2.5pt}{\mathbb{R}}^{d}\setminus \{0\}\hspace{2.5pt}\text{such that}\hspace{5pt}{\int _{{\mathbb{R}}^{d}\setminus \{0\}}}\big(1\wedge |r{|}^{2}\big)\hspace{0.1667em}\rho (\mathrm{d}r)<\infty \]

is commonly called Lévy measure. To keep notation simple, we will write $\int \cdots \rho (\mathrm{d}r)$ or ${\int _{{\mathbb{R}}^{d}}}\cdots \rho (\mathrm{d}r)$ instead of the more precise ${\int _{{\mathbb{R}}^{d}\setminus \{0\}}}\cdots \rho (\mathrm{d}r)$.

The triplet $(l,Q,\rho )$ uniquely determines Θ; moreover Θ is real (hence, positive) if, and only if, $l=0$ and ρ is symmetric, i.e. $\rho (B)=\rho (-B)$ for any Borel set $B\subset {\mathbb{R}}^{d}\setminus \{0\}$. In this case (4) becomes

(6)

\[ \varTheta (x)=\frac{1}{2}x\cdot Qx+{\int _{{\mathbb{R}}^{d}}}(1-\cos x\cdot r)\hspace{0.1667em}\rho (\mathrm{d}r).\]

Using the representation (4) it is straightforward to see that we have ${\sup _{x}}|\varTheta (x)|<\infty $ if ρ is a finite measure, i.e., $\rho ({\mathbb{R}}^{d}\setminus \{0\})<\infty $, and $Q=0$. The converse is also true, see [29, pp. 1390–1391, Lem. 6.2].

Table 1 contains some examples of continuous negative definite functions along with the corresponding Lévy measures and infinitely divisible laws.

Table 1.

Some real-valued continuous negative definite functions (cndfs) on ${\mathbb{R}}^{d}$ and the corresponding Lévy measures and infinitely divisible distributions (IDD)

cndf $\Theta (x)$	Lévy measure $\rho (\mathrm{d}r)$	IDD
$\int \left(1-\cos x\cdot r\right)\hspace{0.1667em}\rho (\mathrm{d}r)$	ρ finite measure	compound Poisson (CP)
$\frac{\|x{\|}^{2}}{{\lambda }^{2}+\|x{\|}^{2}}$, $\lambda >0$, $x\in \mathbb{R}$	$2{\lambda }^{-1}{\mathrm{e}}^{-\lambda \|r\|}\hspace{0.1667em}\mathrm{d}r$	CP with exponential (1-d)
$\frac{\|x{\|}^{2}}{{\lambda }^{2}+\|x{\|}^{2}}$, $\lambda >0$, $x\in {\mathbb{R}}^{d}$	${\int _{0}^{\infty }}{\mathrm{e}}^{{\lambda }^{2}u-\frac{\|r{\|}^{2}}{4u}}\hspace{0.1667em}\frac{{\lambda }^{2}\hspace{0.1667em}\mathrm{d}u}{{(4\pi u)}^{d/2}}\hspace{0.1667em}\mathrm{d}r$	CP, cf. [29, Lem. 6.1]
$1-{\mathrm{e}}^{-\frac{1}{2}\|x{\|}^{2}}$	${(2\pi )}^{-d/2}{\mathrm{e}}^{-\frac{1}{2}\|r{\|}^{2}}\hspace{0.1667em}\mathrm{d}r$	CP with normal
$\|x\|$	$\frac{\Gamma \left(\frac{1+d}{2}\right)}{{\pi }^{\frac{1+d}{2}}}\frac{\mathrm{d}r}{\|r{\|}^{d+1}}$	Cauchy
$\frac{1}{2}\|x{\|}^{2}$	no Lévy measure	normal
$\|x{\|}^{\alpha },\hspace{2.5pt}\alpha \in (0,2)$	$\frac{\alpha {2}^{\alpha -1}\Gamma \left(\frac{\alpha +d}{2}\right)}{{\pi }^{d/2}\Gamma \left(1-\alpha 2\right)}\frac{\mathrm{d}r}{\|r{\|}^{d+\alpha }}$	α-stable
$\sqrt[p]{{\sum _{k=1}^{d}}\|{x_{k}}{\|}^{p}},p\in [1,2]$	see Lemma 2.2	see Lemma 2.2
$\ln (1+\frac{{x}^{2}}{2})$, $x\in \mathbb{R}$	$\frac{1}{\|r\|}{\mathrm{e}}^{-\sqrt{\frac{1}{2}}\|r\|}\hspace{0.1667em}\mathrm{d}r$	variance Gamma (1-d)
$\ln \cosh (x)$, $x\in \mathbb{R}$	$\frac{\mathrm{d}r}{2r\sinh (\pi r/2)}$	Meixner (1-d)
$\|x{\|}^{\alpha }+\|x{\|}^{\beta },\hspace{2.5pt}\alpha ,\beta \in (0,2)$		mixture of stable
${(1+\|x{\|}^{\alpha })}^{\frac{1}{\beta }}-1$, $\alpha \in (0,2)$, $\beta \ge \frac{\alpha }{2}$		relativistic stable

A measure ρ on a topological space X is said to have full (topological) support, if $\mu (G)>0$ for any open set $G\subset X$; for Lévy measures we have $X={\mathbb{R}}^{d}\setminus \{0\}$.

Lemma 2.2.

Let $p\in [1,2]$. The Minkowski distance function

\[ {\ell _{p}}(x):={\big(|{x_{1}}{|}^{p}+\cdots +|{x_{d}}{|}^{p}\big)}^{1/p},\hspace{1em}x=({x_{1}},\dots ,{x_{d}})\in {\mathbb{R}}^{d}\]

is a continuous negative definite function on ${\mathbb{R}}^{d}$. If $p\in (1,2]$, the Lévy measure has full support.

It is interesting to note that the Minkowski distances for $p>2$ and $d\ge 2$ are never negative definite functions. This is the consequence of Schoenberg’s problem, cf. Zastavnyi [40, p. 56, Eq. (3)].

Proof of Lemma 2.2.

Since each ${x_{i}}\mapsto |{x_{i}}{|}^{p}$, with $p\in [1,2]$, is a one-dimensional continuous negative definite function, we can use the formula (6) to see that

\[ {\ell _{p}^{p}}(x)=\left\{\begin{array}{l@{\hskip10.0pt}l}{\displaystyle \int _{{\mathbb{R}}^{d}}}(1-\cos x\cdot r)\hspace{0.1667em}{\displaystyle \sum \limits_{i=1}^{d}}\displaystyle \frac{{c_{p}}\hspace{0.1667em}\mathrm{d}{x_{i}}}{|{x_{i}}{|}^{1+p}}\otimes {\delta _{0}}(\mathrm{d}{x_{(i)}}),\hspace{1em}& \text{if}\hspace{5pt}p\in [1,2),\\{} x\cdot x,\hspace{1em}& \text{if}\hspace{5pt}p=2,\end{array}\right.\]

where ${x_{(i)}}=({x_{1}},\dots ,{x_{i-1}},{x_{i+1}},\dots {x_{d}})\in {\mathbb{R}}^{d-1}$ and ${c_{p}}=\frac{p{2}^{p-1}\varGamma (\frac{p+1}{2})}{{\pi }^{1/2}\varGamma (1-\frac{p}{2})}$ is the constant of the one-dimensional p-stable Lévy measure, cf. Table 1.

This means that ${\ell _{p}^{p}}$ is itself a continuous negative definite function, but its Lévy measure is concentrated on the coordinate axes. Writing ${\ell _{p}}(x)={f_{p}}({\ell _{p}^{p}}(x))$ with

\[ {f_{p}}(\tau )={\tau }^{1/p}={\gamma _{1/p}}{\int _{0}^{\infty }}\big(1-{e}^{-\tau t}\big)\hspace{0.1667em}\frac{\mathrm{d}t}{{t}^{1+1/p}},\hspace{1em}\frac{1}{p}\in \big[\frac{1}{2},1\big],\hspace{0.2778em}{\gamma _{1/p}}=\frac{1}{p\varGamma (1-\frac{1}{p})},\]

shows that ${\ell _{p}}$ can be represented as a combination of the Bernstein function ${f_{p}}$ and the negative definite function ${\ell _{p}^{p}}$. In other words, ${\ell _{p}}$ is subordinate to ${\ell _{p}^{p}}$ in the sense of Bochner (cf. Sato [28, Chap. 30] or [30, Chap. 5, Chap. 13.1]) and it is possible to find the corresponding Lévy–Khintchine representation, cf. [28, Thm. 30.1]. We have

\[ {\ell _{p}}(x)=\left\{\begin{array}{l@{\hskip10.0pt}l}{\displaystyle \int _{{\mathbb{R}}^{d}}}(1-\cos x\cdot r)\hspace{0.1667em}{\displaystyle \sum \limits_{i=1}^{d}}\displaystyle \frac{{c_{p}}\hspace{0.1667em}\mathrm{d}{x_{i}}}{|{x_{i}}{|}^{2}}\otimes {\delta _{0}}(\mathrm{d}{x_{(i)}}),\hspace{1em}& \text{if}\hspace{5pt}p=1,\\{} {\displaystyle \int _{{\mathbb{R}}^{d}}}(1-\cos x\cdot r)\hspace{0.1667em}{\displaystyle \int _{0}^{\infty }}{\displaystyle \prod \limits_{i=1}^{d}}{g_{t}}({x_{i}})\hspace{0.1667em}\displaystyle \frac{\mathrm{d}t}{{t}^{1+1/p}},\hspace{1em}& \text{if}\hspace{5pt}p\in (1,2),\\{} \sqrt{x\cdot x},\hspace{1em}& \text{if}\hspace{5pt}p=2,\end{array}\right.\]

where ${x_{i}}\mapsto {g_{t}}({x_{i}})$ is the probability density of the random variable ${t}^{p}X$ where X is a one-dimensional, symmetric $1/p$-stable random variable.

Although the $1/p$-stable density is known explicitly only for $1/p\in \{1,2\}$, one can show – this follows, e.g. from [28, Thm. 15.10] – that it is strictly positive, i.e. the Lévy measure of ${\ell _{p}}$, $p\in (1,2)$ has full support. For $p=1$ the measure does not have full support, since it is concentrated on the coordinate axes. For $p=2$, note that ${\ell _{2}}(x)=|x|$ corresponds to the Cauchy distribution with Lévy measure given in Table 1, which has full support. □

Using the Lévy–Khintchine representation (6) it is not hard to see, cf. [19, Lem. 3.6.21], that square roots of real-valued cndfs are subadditive, i.e.

(7)

\[ \sqrt{\varTheta (x+y)}\le \sqrt{\varTheta (x)}+\sqrt{\varTheta (y)},\hspace{1em}x,y\in {\mathbb{R}}^{d}\]

and, consequently,

(8)

\[ \varTheta (x+y)\le 2\big(\varTheta (x)+\varTheta (y)\big),\hspace{1em}x,y\in {\mathbb{R}}^{d}.\]

Using a standard argument, e.g. [10, p. 44], we can derive from (7), (8) that cndfs grow at most quadratically as $x\to \infty $,

(9)

\[ \varTheta (x)\le 2\underset{|y|\le 1}{\sup }\varTheta (y)\big(1+|x{|}^{2}\big).\]

We will assume that $\varTheta (0)=0$ is the only zero of the function Θ – incidentally, this means that $x\mapsto {\mathrm{e}}^{-\varTheta (x)}$ is the characteristic function of a(n infinitely divisible) random variable the distribution of which is non-lattice. This and (7) show that $(x,y)\mapsto \sqrt{\varTheta (x-y)}$ is a metric on ${\mathbb{R}}^{d}$ and $(x,y)\mapsto \varTheta (x-y)$ is a quasi-metric, i.e. a function which enjoys all properties of a metric, but the triangle inequality holds with a multiplicative constant $c>1$. Metric measure spaces of this type have been investigated by Jacob et al. [20]. Historically, the notion of negative definiteness has been introduced by I.J. Schoenberg [31] in a geometric context: he observed that for a real-valued cndf Θ the function ${d_{\varTheta }}(x,y):=\sqrt{\varTheta (x-y)}$ is a metric on ${\mathbb{R}}^{d}$ and that these are the only metrics such that $({\mathbb{R}}^{d},{d_{\varTheta }})$ can be isometrically embedded into a Hilbert space. In other words: ${d_{\varTheta }}$ behaves like a standard Euclidean metric in a possibly infinite-dimensional space.

2.2 Measuring independence of random variables with metrics

Let $X,Y$ be random variables with values in ${\mathbb{R}}^{m}$ and ${\mathbb{R}}^{n}$, respectively, and write $\mathcal{L}(X)$ and $\mathcal{L}(Y)$ for the corresponding probability laws. For any metric $d(\cdot ,\cdot )$ defined on the family of $(m+n)$-dimensional probability distributions we have

(10)

\[ d\big(\mathcal{L}(X,Y),\mathcal{L}(X)\otimes \mathcal{L}(Y)\big)=0\hspace{1em}\text{if, and only if,}\hspace{2.5pt}X\text{,}\hspace{2.5pt}Y\hspace{2.5pt}\text{are independent}.\]

This equivalence can obviously be extended to finitely many random variables ${X_{i}}$, $i=1,\dots ,n$, taking values in ${\mathbb{R}}^{{d_{i}}}$, respectively: Set $d:={d_{1}}+\cdots +{d_{n}}$, take any metric $d(\cdot ,\cdot )$ on the d-dimensional probability distributions and consider $d(\mathcal{L}({X_{1}},\dots ,{X_{n}}),{\bigotimes _{i=1}^{n}}\mathcal{L}({X_{i}}))$. Moreover, the random variables ${X_{i}},i=1,\dots ,n$, are independent if, and only if, $({X_{1}},\dots ,{X_{k-1}})$ and ${X_{k}}$ are independent for all $2\le k\le n$.2 In other words: ${X_{1}},\dots ,{X_{n}}$ are independent if, and only if, for metrics on the ${d_{1}}+\cdots +{d_{k}}$-dimensional probability distributions the distance of $\mathcal{L}({X_{1}},\dots ,{X_{k}})$ and $\mathcal{L}({X_{1}},\dots ,{X_{k-1}})\otimes \mathcal{L}({X_{k}})$ is zero for $k=2,\dots ,n$. Thus, as in (10), only the concept of independence of pairs of random variables is needed. In [9, Sec. 3.1] we use a variant of this idea to characterize multivariate independence.

Thus (10) is a good starting point for the construction of (new) estimators for independence. For this it is crucial that the (empirical) distance be computationally feasible. For discrete distributions with finitely many values this yields the classical chi-squared test of independence (using the ${\chi }^{2}$-distance). For more general distributions other commonly used distances (e.g. relative entropy, Hellinger distance, total variation, Prokhorov distance, Wasserstein distance) might be employed (e.g. [4]), provided that they are computationally feasible. It turns out that the latter is, in particular, satisfied by the following distance.

Definition 2.3.

Let $U,V$ be d-dimensional random variables and denote by ${f_{U}}$, ${f_{V}}$ their characteristic functions. For any symmetric measure ρ on ${\mathbb{R}}^{d}\setminus \{0\}$ with full support we define the distance

(11)

\[ {d_{\rho }}\big(\mathcal{L}(U),\mathcal{L}(V)\big):=\| {f_{U}}-{f_{V}}{\| _{{L}^{2}(\rho )}}={\bigg(\int |{f_{U}}(r)-{f_{V}}(r){|}^{2}\hspace{0.1667em}\rho (\mathrm{d}r)\bigg)}^{1/2}.\]

The assumption that ρ has full support, i.e. $\rho (G)>0$ for every nonempty open set $G\subset {\mathbb{R}}^{d}\setminus \{0\}$, ensures that ${d_{\rho }}(\mathcal{L}(U),\mathcal{L}(V))=0$ if, and only if, $\mathcal{L}(U)=\mathcal{L}(V)$, hence ${d_{\rho }}(\mathcal{L}(U),\mathcal{L}(V))$ is a metric. The symmetry assumption on ρ is not essential since the integrand appearing in (11) is even; therefore, we can always replace $\rho (\mathrm{d}r)$ by its symmetrization $\frac{1}{2}(\rho (\mathrm{d}r)+\rho (-\mathrm{d}r))$.

Currently it is unknown how the fact that the Lévy measure ρ has full support can be expressed in terms of the cndf $\varTheta (u)=\int (1-\cos u\cdot r)\hspace{0.1667em}\rho (\mathrm{d}r)$ given by ρ, see (6).

Note that ${d_{\rho }}(\mathcal{L}(U),\mathcal{L}(V))$ is always well-defined in $[0,\infty ]$. Any of the following conditions ensure that ${d_{\rho }}(\mathcal{L}(U),\mathcal{L}(V))$ is finite:

a) ρ is a finite measure;
b) ρ is a symmetric Lévy measure (cf. (5)) and $\mathbb{E}|U|+\mathbb{E}|V|<\infty $.

Indeed, ${d_{\rho }}(\mathcal{L}(U),\mathcal{L}(V))<\infty $ follows from the integrability properties (5) of the Lévy measure ρ and the elementary estimates
\[\begin{array}{l}\displaystyle |{f_{U}}(r)-{f_{V}}(r)|\le \mathbb{E}|{e}^{\mathrm{i}r\cdot U}-{e}^{\mathrm{i}r\cdot V}|\le \mathbb{E}|r\cdot (U-V)|\le |r|\cdot \mathbb{E}|U-V|\\{} \displaystyle \text{and}\hspace{1em}|{f_{U}}(r)-{f_{V}}(r)|\le 2.\end{array}\]

We obtain further sufficient conditions for $V(X,Y)<\infty $ in terms of moments of the real-valued cndf Θ, see (6), whose Lévy measure is ρ and with $Q=0$.

Proposition 2.4.

Let ρ be a symmetric Lévy measure on ${\mathbb{R}}^{d}\setminus \{0\}$ with full support and denote by $\varTheta (u)=\int (1-\cos u\cdot r)\hspace{0.1667em}\rho (\mathrm{d}r)$, $u\in {\mathbb{R}}^{d}$, the real-valued cndf with Lévy triplet $(l=0,Q=0,\rho )$. For all d-dimensional random variables $U,V$ the following assertions hold:

a) Assume that $({U^{\prime }},{V^{\prime }})$ is an i.i.d. copy of $(U,V)$. Then

(12)
\[ \mathbb{E}\varTheta \big(U-{U^{\prime }}\big)+\mathbb{E}\varTheta \big(V-{V^{\prime }}\big)\le 2\mathbb{E}\varTheta \big(U-{V^{\prime }}\big).\]
b) Let ${U^{\prime }}$ be an i.i.d. copy of U. Then

(13)
\[ \mathbb{E}\varTheta (U-V)\le 2\big(\mathbb{E}\varTheta (U)+\mathbb{E}\varTheta (V)\big)\]
and for $V={U^{\prime }}$ one has $\mathbb{E}\varTheta (U-{U^{\prime }})\le 4\mathbb{E}\varTheta (U)$.
c) In $[0,\infty ]$ we always have

(14)
\[ {d_{\rho }^{2}}\big(\mathcal{L}(U),\mathcal{L}(V)\big)\le 4\big(\mathbb{E}\varTheta (U)+\mathbb{E}\varTheta (V)\big).\]
d) Let $({U^{\prime }},{V^{\prime }})$ be an i.i.d. copy of $(U,V)$ and assume $\mathbb{E}\varTheta (U)+\mathbb{E}\varTheta (V)<\infty $. Then the following equality holds and all terms are finite

(15)
\[ {d_{\rho }^{2}}\big(\mathcal{L}(U),\mathcal{L}(V)\big)=2\mathbb{E}\varTheta \big(U-{V^{\prime }}\big)-\mathbb{E}\varTheta \big(U-{U^{\prime }}\big)-\mathbb{E}\varTheta \big(V-{V^{\prime }}\big).\]

Proof.

Let us assume first that ρ is a finite Lévy measure, thus, Θ a bounded cndf. We denote by ${U^{\prime }}$ an i.i.d. copy of U. Since $\mathcal{L}(U-{U^{\prime }})$ is symmetric, we can use Tonelli’s theorem to get

(16)

\[\begin{aligned}{}\int \big(1-{f_{U}}(r)\overline{{f_{U}}(r)}\big)\rho (\mathrm{d}r)& =\int \big(1-\mathbb{E}\big({\mathrm{e}}^{\mathrm{i}U\cdot r}{\mathrm{e}}^{-\mathrm{i}{U^{\prime }}\cdot r}\big)\big)\rho (\mathrm{d}r)\\{} & =\int \mathbb{E}\big(1-\cos \big[\big(U-{U^{\prime }}\big)\cdot r\big]\big)\hspace{0.1667em}\rho (\mathrm{d}r)\\{} & =\mathbb{E}\varTheta \big(U-{U^{\prime }}\big)\\{} & =\int \varTheta \big(u-{u^{\prime }}\big)\hspace{0.1667em}{\mathbb{P}_{U}}\otimes {\mathbb{P}_{U}}\big(\mathrm{d}u,\mathrm{d}{u^{\prime }}\big).\end{aligned}\]

Now we consider an i.i.d. copy $({U^{\prime }},{V^{\prime }})$ of $(U,V)$ and use the above equality in (11). This yields

(17)

\[\begin{aligned}{}& {d_{\rho }^{2}}\big(\mathcal{L}(U),\mathcal{L}(V)\big)\\{} & =\int \big({f_{U}}(r)\overline{{f_{U}}(r)}-{f_{U}}(r)\overline{{f_{V}}(r)}-{f_{V}}(r)\overline{{f_{U}}(r)}+{f_{V}}(r)\overline{{f_{V}}(r)}\big)\rho (\mathrm{d}r)\\{} & =\int \big({f_{U}}(r)\overline{{f_{U}}(r)}-1+2-2\big({f_{U}}(r)\overline{{f_{V}}(r)}\big)+{f_{V}}(r)\overline{{f_{V}}(r)}-1\big)\rho (\mathrm{d}r)\\{} & =2\mathbb{E}\varTheta \big(U-{V^{\prime }}\big)-\mathbb{E}\varTheta \big(U-{U^{\prime }}\big)-\mathbb{E}\varTheta \big(V-{V^{\prime }}\big).\end{aligned}\]

This proves (15) and, since ${d_{\rho }}(\mathcal{L}(U),\mathcal{L}(V))\ge 0$, also (12). Combining Part b) and (15) yields (14), while (13) immediately follows from the subadditivity of a cndf (8).

If ρ is an arbitrary Lévy measure, its truncation ${\rho _{\epsilon }}(\mathrm{d}r):={\mathbb{1}_{(\epsilon ,\infty )}}(|r|)\hspace{0.1667em}\rho (\mathrm{d}r)$ is a finite Lévy measure and the corresponding cndf ${\varTheta _{\epsilon }}$ is bounded. In particular, we have a)–d) for ${\rho _{\epsilon }}$ and ${\varTheta _{\epsilon }}$. Using monotone convergence we get

\[ \varTheta (u)=\underset{\epsilon >0}{\sup }{\varTheta _{\epsilon }}(u)=\underset{\epsilon >0}{\sup }{\int _{|r|>\epsilon }}(1-\cos u\cdot r)\hspace{0.1667em}\rho (\mathrm{d}r).\]

Again by monotone convergence we see that the assertions a)–c) remain valid for general Lévy measures – if we allow the expressions to attain values in $[0,\infty ]$. Because of (13), the moment condition assumed in Part d) ensures that the limits

\[ \underset{\epsilon \to 0}{\lim }\mathbb{E}\varTheta \big(U-{V^{\prime }}\big)=\underset{\epsilon >0}{\sup }\mathbb{E}\varTheta \big(U-{V^{\prime }}\big)\hspace{1em}\text{etc.}\]

are finite, and (15) carries over to the general situation. □

Remark 2.5.

a) Since U and V play symmetric roles in (17) it is clear that

(18)

\[\begin{aligned}{}& {d_{\rho }^{2}}\big(\mathcal{L}(U),\mathcal{L}(V)\big)\\{} & =\mathbb{E}\varTheta \big(U-{V^{\prime }}\big)+\mathbb{E}\varTheta \big(V-{U^{\prime }}\big)-\mathbb{E}\varTheta \big(U-{U^{\prime }}\big)-\mathbb{E}\varTheta \big(V-{V^{\prime }}\big)\\{} & =\int \varTheta \big(u-{u^{\prime }}\big)\hspace{0.1667em}(2{\mathbb{P}_{U}}\otimes {\mathbb{P}_{V}}-{\mathbb{P}_{U}}\otimes {\mathbb{P}_{U}}-{\mathbb{P}_{V}}\otimes {\mathbb{P}_{V}})\big(\mathrm{d}u,\mathrm{d}{u^{\prime }}\big).\end{aligned}\]

b) While ${d_{\rho }}(\mathcal{L}(U),\mathcal{L}(V))\in [0,\infty ]$ is always defined, the right-hand side of (15) needs attention. If we do not assume the moment condition $\mathbb{E}\varTheta (U)+\mathbb{E}\varTheta (V)<\infty $, we still have

(19)

\[ {d_{\rho }^{2}}\big(\mathcal{L}(U),\mathcal{L}(V)\big)=\underset{\epsilon \to 0}{\lim }\big(2\mathbb{E}{\varTheta _{\epsilon }}\big(U-{V^{\prime }}\big)-{\varTheta _{\epsilon }}\big(U-{U^{\prime }}\big)-{\varTheta _{\epsilon }}\big(V-{V^{\prime }}\big)\big),\]

but it is not clear whether the limits exist for each term.

The moment condition $\mathbb{E}\varTheta (U)+\mathbb{E}\varTheta (V)<\infty $ is sharp in the sense that it follows from $\mathbb{E}\varTheta (U-{V^{\prime }})<\infty $: Since U and ${V^{\prime }}$ are independent, Tonelli’s theorem entails that $\mathbb{E}\varTheta (u-{V^{\prime }})<\infty $ for some $u\in {\mathbb{R}}^{d}$. Using the symmetry and sub-additivity of $\sqrt{\varTheta }$, see (8), we get $\varTheta ({V^{\prime }})\le 2(\varTheta (u-{V^{\prime }})+\varTheta (u))$, i.e. $\mathbb{E}\varTheta ({V^{\prime }})<\infty $; $\mathbb{E}\varTheta (U)<\infty $ follows in a similar fashion.

c) Since a cndf Θ grows at most quadratically at infinity, see (9), it is clear that

(20)

\[ \mathbb{E}\big(|U{|}^{2}\big)+\mathbb{E}\big(|V{|}^{2}\big)<\infty \hspace{1em}\text{implies}\hspace{1em}\mathbb{E}\varTheta (U)+\mathbb{E}\varTheta (V)<\infty .\]

One should compare this to the condition $\mathbb{E}|U|+\mathbb{E}|V|<\infty $ which ensures the finiteness of ${d_{\rho }^{2}}(\mathcal{L}(U),\mathcal{L}(V))$, but not necessarily the finiteness of the terms appearing in the representation (15).

d) As described at the beginning of this section, a measure of independence of ${X_{1}},\dots ,{X_{n}}$ is given by ${d_{\rho }}(\mathcal{L}({X_{1}},\dots ,{X_{n}}),{\bigotimes _{i=1}^{n}}\mathcal{L}({X_{i}}))$. This can be estimated by empirical estimators for (15). For the 1-stable (i.e. Cauchy) cndf, see Table 1, this direct approach to (multivariate) independence has recently been proposed by [21] – but the exact estimators become computationally challenging even for small samples. A further approximation recovers a computationally feasible estimation, resulting in a loss of power compared with our approach, cf. [7].

It is worth mentioning that the metric ${d_{\rho }}$ can be used to describe convergence in distribution.

Lemma 2.6.

Let ρ be a finite symmetric measure with full support, then ${d_{\rho }}$ given in (11) is a metric which characterizes convergence in distribution, i.e. for random variables ${X_{n}}$, $n\in \mathbb{N}$, and X one has

(21)

\[ {X_{n}}{\underset{n\to \infty }{\overset{d}{\to }}}X\hspace{0.2778em}\Longleftrightarrow \hspace{0.2778em}{d_{\rho }}\big(\mathcal{L}({X_{n}}),\mathcal{L}(X)\big){\underset{n\to \infty }{\overset{}{\to }}}0.\]

The proof below shows that the implication “⇐” does not need the finiteness of the Lévy measure ρ.

Proof.

Convergence in distribution implies pointwise convergence of the characteristic functions. Therefore, we see by dominated convergence and because of the obvious estimates $|{f_{{X_{n}}}}|\le 1$ and $|{f_{X}}|\le 1$ that

\[ \underset{n\to \infty }{\lim }{\int _{{\mathbb{R}}^{d}}}|{f_{{X_{n}}}}(r)-{f_{X}}(r){|}^{2}\hspace{0.1667em}\rho (\mathrm{d}r)={\int _{{\mathbb{R}}^{d}}}\underset{n\to \infty }{\lim }|{f_{{X_{n}}}}(r)-{f_{X}}(r){|}^{2}\hspace{0.1667em}\rho (\mathrm{d}r)=0.\]

Conversely, assume that ${\lim _{n\to \infty }}{d_{\rho }}(\mathcal{L}({X_{n}}),\mathcal{L}(X))=0$. If we interpret this as convergence in ${L}^{2}(\rho )$, we see that there is a Lebesgue a.e. convergent subsequence ${f_{{X_{n(k)}}}}\to {f_{X}}$; since ${f_{{X_{n(k)}}}}$ and ${f_{X}}$ are characteristic functions, this convergence is already pointwise, hence locally uniform, see Sasvári [27, Thm. 1.5.2]. By Lévy’s continuity theorem, this entails the convergence in distribution of the corresponding random variables. Since the limit does not depend on the subsequence, the whole sequence must converge in distribution. □

2.3 An elementary estimate for log-moments

Later on we need certain log-moments of the norm of a random vector. The following lemma allows us to formulate these moment conditions in terms of the coordinate processes.

Lemma 2.7.

Let $X,Y$ be one-dimensional random variables and $\epsilon >0$. Then $\mathbb{E}{\log }^{1+\epsilon }(1\vee \sqrt{{X}^{2}+{Y}^{2}})$ is finite if, and only if, the moments $\mathbb{E}{\log }^{1+\epsilon }(1+{X}^{2})$ and $\mathbb{E}{\log }^{1+\epsilon }(1+{Y}^{2})$ are finite.

Proof.

Assume that $\mathbb{E}{\log }^{1+\epsilon }(1+{X}^{2})+\mathbb{E}{\log }^{1+\epsilon }(1+{Y}^{2})<\infty $. Since

\[ 1\vee \sqrt{{X}^{2}+{Y}^{2}}=\sqrt{\big({X}^{2}+{Y}^{2}\big)\vee 1}\le \sqrt{\big(1+{X}^{2}\big)\big(1+{Y}^{2}\big)},\]

we can use the elementary estimate ${(a+b)}^{1+\epsilon }\le {2}^{\epsilon }({a}^{1+\epsilon }+{b}^{1+\epsilon })$, $a,b\ge 0$, to get

\[\begin{aligned}{}\mathbb{E}{\log }^{1+\epsilon }\big(1\vee \sqrt{{X}^{2}+{Y}^{2}}\big)& \le \mathbb{E}\bigg[{\bigg(\frac{1}{2}\log \big(1+{X}^{2}\big)+\frac{1}{2}\log \big(1+{Y}^{2}\big)\bigg)}^{1+\epsilon }\bigg]\\{} & \le \frac{1}{2}\mathbb{E}{\log }^{1+\epsilon }\big(1+{X}^{2}\big)+\frac{1}{2}\mathbb{E}{\log }^{1+\epsilon }\big(1+{Y}^{2}\big).\end{aligned}\]

Conversely, assume that $\mathbb{E}{\log }^{1+\epsilon }(1\vee \sqrt{{X}^{2}+{Y}^{2}})<\infty $. Then we have

\[\begin{aligned}{}\mathbb{E}& {\log }^{1+\epsilon }\big(1+{X}^{2}\big)\\{} & =\mathbb{E}\big[{\mathbb{1}_{\{|X|<1\}}}{\log }^{1+\epsilon }\big(1+{X}^{2}\big)\big]+\mathbb{E}\big[{\mathbb{1}_{\{|X|\ge 1\}}}{\log }^{1+\epsilon }\big(1+{X}^{2}\big)\big]\\{} & \le {\log }^{1+\epsilon }2+\mathbb{E}{\log }^{1+\epsilon }\big[\big(2{X}^{2}\big)\vee 2\big]\\{} & \le {\log }^{1+\epsilon }2+\mathbb{E}\big[{\big(\log 2+\log \big(1\vee {X}^{2}\big)\big)}^{1+\epsilon }\big]\\{} & \le \big(1+{2}^{\epsilon }\big){\log }^{1+\epsilon }2+\mathbb{E}{\log }^{1+\epsilon }\big(1\vee \big({X}^{2}+{Y}^{2}\big)\big)\\{} & \le \big(1+{2}^{\epsilon }\big){\log }^{1+\epsilon }2+{2}^{1+\epsilon }\mathbb{E}{\log }^{1+\epsilon }\big(1\vee \sqrt{{X}^{2}+{Y}^{2}}\big),\end{aligned}\]

and $\mathbb{E}{\log }^{1+\epsilon }(1+{Y}^{2})<\infty $ follows similarly. □

3 Generalized distance covariance

Székely et al. [37, 34] introduced distance covariance for two random variables X and Y with values in ${\mathbb{R}}^{m}$ and ${\mathbb{R}}^{n}$ as

\[ {\mathcal{V}}^{2}(X,Y;w):={\int _{{\mathbb{R}}^{n}}}{\int _{{\mathbb{R}}^{m}}}{\big|{f_{(X,Y)}}(x,y)-{f_{X}}(x){f_{Y}}(y)\big|}^{2}\hspace{0.1667em}w(x,y)\hspace{0.1667em}\mathrm{d}x\hspace{0.1667em}\mathrm{d}y,\]

with the weight $w(x,y)={w_{\alpha ,m}}(x){w_{\alpha ,n}}(y)$ where ${w_{\alpha ,m}}(x)=c(p,\alpha )|x{|}^{-m-\alpha }$, $m,n\in \mathbb{N}$, $\alpha \in (0,2)$. It is well known from the study of infinitely divisible distributions (see also Székely & Rizzo [33]) that ${w_{\alpha ,m}}(x)$ is the density of an m-dimensional α-stable Lévy measure, and the corresponding cndf is just $|x{|}^{\alpha }$.

We are going to extend distance covariance to products of Lévy measures.

Definition 3.1.

Let X and Y be random variables with values in ${\mathbb{R}}^{m}$ and ${\mathbb{R}}^{n}$ and $\rho :=\mu \otimes \nu $ where μ and ν are symmetric Lévy measures on ${\mathbb{R}}^{m}\setminus \{0\}$ and ${\mathbb{R}}^{n}\setminus \{0\}$, both having full support. The generalized distance covariance $V(X,Y)$ is defined as

(22)

\[ {V}^{2}(X,Y)={\iint _{{\mathbb{R}}^{m+n}}}|{f_{(X,Y)}}(s,t)-{f_{X}}(s){f_{Y}}(t){|}^{2}\hspace{0.1667em}\mu (\mathrm{d}s)\hspace{0.1667em}\nu (\mathrm{d}t).\]

By definition, $V(X,Y)=\| {f_{(X,Y)}}-{f_{X}}\otimes {f_{Y}}{\| _{{L}^{2}(\rho )}}$ and, in view of the discussion in Section 2.2, we have

(23)

\[ V(X,Y)={d_{\rho }}\big(\mathcal{L}(U),\mathcal{L}(V)\big),\]

where $U:=({X_{1}},{Y_{1}}),V:=({X_{2}},{Y_{3}})$ and $({X_{1}},{Y_{1}}),({X_{2}},{Y_{2}}),({X_{3}},{Y_{3}})$ are i.i.d. copies of $(X,Y)$.3 It is clear that the product measure ρ inherits the properties “symmetry” and “full support” from its marginals μ and ν.

From the discussion following Definition 2.3 we immediately get the next lemma.

Lemma 3.2.

Let ${V}^{2}(X,Y)$ be generalized distance covariance of the m- resp. n-dimensional random variables X and Y, cf. (22). The random variables X and Y are independent if, and only if, ${V}^{2}(X,Y)=0$.

3.1 Generalized distance covariance with finite Lévy measures

Fix the dimensions $m,n\in \mathbb{N}$, set $d:=m+n$, and assume that the measure ρ is of the form $\rho =\mu \otimes \nu $ where μ and ν are finite symmetric Lévy measures on ${\mathbb{R}}^{m}\setminus \{0\}$ and ${\mathbb{R}}^{n}\setminus \{0\}$, respectively. If we integrate the elementary estimates

\[\begin{array}{l}\displaystyle 1\wedge |s{|}^{2}\le 1\wedge \big(|s{|}^{2}+|t{|}^{2}\big)\le \big(1\wedge |s{|}^{2}\big)+\big(1\wedge |{t}^{2}|\big),\\{} \displaystyle 1\wedge |t{|}^{2}\le 1\wedge \big(|s{|}^{2}+|t{|}^{2}\big)\le \big(1\wedge |s{|}^{2}\big)+\big(1\wedge |{t}^{2}|\big),\end{array}\]

with respect to $\rho (\mathrm{d}s,\mathrm{d}t)=\mu (\mathrm{d}s)\hspace{0.1667em}\nu (\mathrm{d}t)$, it follows that ρ is a Lévy measure if μ and ν are finite Lévy measures.4 We also assume that μ and ν, hence ρ, have full support.

Since $V(X,Y)$ is a metric in the sense of Section 2.2 we can use all results from the previous section to derive various representations of generalized distance covariance.

We write Φ, Ψ and Θ for the bounded cndfs induced by $\mu (\mathrm{d}s)$, $\nu (\mathrm{d}t)$, and $\rho (\mathrm{d}r)=\mu (\mathrm{d}s)\hspace{0.1667em}\nu (\mathrm{d}t)$,

(24)

\[\begin{aligned}{}\varPhi (x)=& {\int _{{\mathbb{R}}^{m}}}(1-\cos x\cdot s)\hspace{0.1667em}\mu (\mathrm{d}s),\hspace{1em}\varPsi (y)={\int _{{\mathbb{R}}^{n}}}(1-\cos y\cdot t)\hspace{0.1667em}\nu (\mathrm{d}t)\\{} & \text{and}\hspace{1em}\varTheta (x,y)=\varTheta (u)={\int _{{\mathbb{R}}^{m+n}}}(1-\cos u\cdot r)\hspace{0.1667em}\rho (\mathrm{d}r),\end{aligned}\]

with $u=(x,y)\in {\mathbb{R}}^{m+n}$ and $r=(s,t)\in {\mathbb{R}}^{m+n}$. The symmetry in each variable and the elementary identity

\[\begin{aligned}{}\frac{1}{2}\big[1-& \cos (x\cdot s-y\cdot t)+1-\cos (x\cdot s+y\cdot t)\big]\\{} & =(1-\cos y\cdot t)+(1-\cos x\cdot s)-(1-\cos x\cdot s)(1-\cos y\cdot t)\end{aligned}\]

yield the representation

(25)

\[ \varTheta (x,y)=\varPsi (y)\mu \big({\mathbb{R}}^{m}\big)+\varPhi (x)\nu \big({\mathbb{R}}^{n}\big)-\varPhi (x)\varPsi (y).\]

We can now easily apply the results from Section 2.2. In order to do so, we consider six i.i.d. copies $({X_{i}},{Y_{i}})$, $i=1,\dots ,6$, of the random vector $(X,Y)$, and set $U:=({X_{1}},{Y_{1}}),V:=({X_{2}},{Y_{3}}),{U^{\prime }}:=({X_{4}},{Y_{4}}),{V^{\prime }}:=({X_{5}},{Y_{6}})$. This is a convenient way to say that

\[ \mathcal{L}(U)=\mathcal{L}\big({U^{\prime }}\big)=\mathcal{L}\big((X,Y)\big)\hspace{1em}\text{and}\hspace{1em}\mathcal{L}(V)=\mathcal{L}\big({V^{\prime }}\big)=\mathcal{L}(X)\otimes \mathcal{L}(Y)\]

and $U,{U^{\prime }},V$ and ${V^{\prime }}$ are independent.5

The following formulae follow directly from Proposition 2.4.d) and Remark 2.5.a).

Proposition 3.3.

Let X and Y be random variables with values in ${\mathbb{R}}^{m}$ and ${\mathbb{R}}^{n}$ and assume that $\mu ,\nu $ are finite symmetric Lévy measures on ${\mathbb{R}}^{m}\setminus \{0\}$ and ${\mathbb{R}}^{n}\setminus \{0\}$ with full support. Generalized distance covariance has the following representations

(26)

\[\begin{aligned}{}{V}^{2}(X,Y)& ={d_{\rho }^{2}}\big(\mathcal{L}(U),\mathcal{L}(V)\big)\end{aligned}\]

(27)

\[\begin{aligned}{}& =2\mathbb{E}\varTheta \big(U-{V^{\prime }}\big)-\mathbb{E}\varTheta \big(U-{U^{\prime }}\big)-\mathbb{E}\varTheta \big(V-{V^{\prime }}\big)\end{aligned}\]

(28)

\[\begin{aligned}{}& =2\mathbb{E}\varTheta ({X_{1}}-{X_{5}},{Y_{1}}-{Y_{6}})-\mathbb{E}\varTheta ({X_{1}}-{X_{4}},{Y_{1}}-{Y_{4}})\\{} & \hspace{2em}\hspace{1em}-\mathbb{E}\varTheta ({X_{2}}-{X_{5}},{Y_{3}}-{Y_{6}})\end{aligned}\]

(29)

\[\begin{aligned}{}& =\mathbb{E}\varPhi ({X_{1}}-{X_{4}})\varPsi ({Y_{1}}-{Y_{4}})+\mathbb{E}\varPhi ({X_{2}}-{X_{5}})\mathbb{E}\varPsi ({Y_{3}}-{Y_{6}})\\{} & \hspace{2em}\hspace{1em}-2\mathbb{E}\varPhi ({X_{1}}-{X_{5}})\varPsi ({Y_{1}}-{Y_{6}}).\end{aligned}\]

The latter equality follows from (25) since the terms depending only on one of the variables cancel as the random variables $({X_{i}},{Y_{i}})$ are i.i.d. This gives rise to various further representations of $V(X,Y)$.

Corollary 3.4.

Let $(X,Y)$, $({X_{i}},{Y_{i}})$, $\varPhi ,\varPsi $ and $\mu ,\nu $ be as in Proposition 3.3. Generalized distance covariance has the following representations

(30)

\[\begin{aligned}{}& {V}^{2}(X,Y)\\{} & =\mathbb{E}\varPhi ({X_{1}}-{X_{4}})\varPsi ({Y_{1}}-{Y_{4}})-2\mathbb{E}\varPhi ({X_{1}}-{X_{2}})\varPsi ({Y_{1}}-{Y_{3}})\\{} & \hspace{2em}\hspace{1em}+\mathbb{E}\varPhi ({X_{1}}-{X_{2}})\mathbb{E}\varPsi ({Y_{3}}-{Y_{4}})\end{aligned}\]

(31)

\[\begin{aligned}{}& =\mathbb{E}\big[\varPhi ({X_{1}}-{X_{4}})\cdot \big\{\varPsi ({Y_{1}}-{Y_{4}})-2\varPsi ({Y_{1}}-{Y_{3}})+\varPsi ({Y_{2}}-{Y_{3}})\big\}\big]\end{aligned}\]

(32)

\[\begin{aligned}{}& =\mathbb{E}\big[\big\{\varPhi ({X_{1}}-{X_{4}})-\varPhi ({X_{4}}-{X_{2}})\big\}\cdot \big\{\varPsi ({Y_{4}}-{Y_{1}})-\varPsi ({Y_{1}}-{Y_{3}})\big\}\big]\end{aligned}\]

(33)

\[\begin{aligned}{}& =\mathbb{E}\big[\big\{\varPhi ({X_{1}}-{X_{4}})-\mathbb{E}\big(\varPhi ({X_{4}}-{X_{1}})\mid {X_{4}}\big)\big\}\\{} & \hspace{2em}\hspace{1em}\cdot \big\{\varPsi ({Y_{4}}-{Y_{1}})-\mathbb{E}\big(\varPsi ({Y_{1}}-{Y_{4}})\mid {Y_{1}}\big)\big\}\big].\end{aligned}\]

Corollary 3.4 shows, in particular, that $V(X,Y)$ can be written as a function of $({X_{i}},{Y_{i}})$, $i=1,\dots ,4$,

(34)

\[ {V}^{2}(X,Y)=\mathbb{E}\big[g\big(({X_{1}},{Y_{1}}),\dots ,({X_{4}},{Y_{4}})\big)\big]\]

for an appropriate function g; for instance, the formula (30) follows for

\[\begin{aligned}{}& g\big(({x_{1}},{y_{1}}),\dots ,({x_{4}},{y_{4}})\big)\\{} & =\varPhi ({x_{1}}-{x_{4}})\varPsi ({y_{1}}-{y_{4}})-2\varPhi ({x_{1}}-{x_{2}})\varPsi ({y_{1}}-{y_{3}})+\varPhi ({x_{1}}-{x_{2}})\varPsi ({y_{3}}-{y_{4}}).\end{aligned}\]

Corollary 3.5.

Let $(X,Y)$, $({X_{i}},{Y_{i}})$, $\varPhi ,\varPsi $ and $\mu ,\nu $ be as in Proposition 3.3 and write

\[\begin{aligned}{}\overline{\overline{\varPhi }}& =\varPhi ({X_{1}}-{X_{4}})-\mathbb{E}\big(\varPhi ({X_{4}}-{X_{1}})\mid {X_{4}}\big)\\{} & \hspace{1em}\hspace{2em}-\mathbb{E}\big(\varPhi ({X_{4}}-{X_{1}})\mid {X_{1}}\big)+\mathbb{E}\varPhi ({X_{1}}-{X_{4}}),\\{} \overline{\overline{\varPsi }}& =\varPsi ({Y_{1}}-{Y_{4}})-\mathbb{E}\big(\varPsi ({Y_{4}}-{Y_{1}})\mid {Y_{4}}\big)\\{} & \hspace{1em}\hspace{2em}-\mathbb{E}\big(\varPsi ({Y_{4}}-{Y_{1}})\mid {Y_{1}}\big)+\mathbb{E}\varPsi ({Y_{1}}-{Y_{4}}),\end{aligned}\]

for the “doubly centered” versions of $\varPhi ({X_{1}}-{X_{4}})$ and $\varPsi ({Y_{1}}-{Y_{4}})$. Generalized distance covariance has the following representation

(35)

\[ {V}^{2}(X,Y)=\mathbb{E}[\hspace{0.1667em}\overline{\overline{\varPhi }}\cdot \overline{\overline{\varPsi }}\hspace{0.1667em}].\]

Proof.

Denote by $\overline{\varPhi }=\varPhi ({X_{1}}-{X_{4}})-\mathbb{E}(\varPhi ({X_{4}}-{X_{1}})\mid {X_{4}})$ and $\overline{\varPsi }=\varPsi ({Y_{1}}-{Y_{4}})-\mathbb{E}(\varPsi ({Y_{4}}-{Y_{1}})\mid {Y_{1}})$ the centered random variables appearing in (33). Clearly,

\[ \overline{\overline{\varPhi }}=\overline{\varPhi }-\mathbb{E}(\overline{\varPhi }\mid {X_{1}})\hspace{1em}\text{and}\hspace{1em}\overline{\overline{\varPsi }}=\overline{\varPsi }-\mathbb{E}(\overline{\varPsi }\mid {Y_{4}}).\]

Thus,

\[\begin{aligned}{}\mathbb{E}[\overline{\overline{\varPhi }}\cdot \overline{\overline{\varPsi }}]& =\mathbb{E}\big[\big(\overline{\varPhi }-\mathbb{E}(\overline{\varPhi }\mid {X_{1}})\big)\cdot \big(\overline{\varPsi }-\mathbb{E}(\overline{\varPsi }\mid {Y_{4}})\big)\big]\\{} & =\mathbb{E}[\overline{\varPhi }\cdot \overline{\varPsi }]+\mathbb{E}\big[\mathbb{E}(\overline{\varPhi }\mid {X_{1}})\cdot \mathbb{E}(\overline{\varPsi }\mid {Y_{4}})\big]\\{} & \hspace{2em}-\mathbb{E}\big[\mathbb{E}(\overline{\varPhi }\mid {X_{1}})\cdot \overline{\varPsi }\big]-\mathbb{E}\big[\overline{\varPhi }\cdot \mathbb{E}(\overline{\varPsi }\mid {Y_{4}})\big].\end{aligned}\]

Since ${X_{1}}$ and ${Y_{4}}$ are independent, we have

\[ \mathbb{E}\big[\mathbb{E}(\overline{\varPhi }\mid {X_{1}})\cdot \mathbb{E}(\overline{\varPsi }\mid {Y_{4}})\big]=\mathbb{E}\big[\mathbb{E}(\overline{\varPhi }\mid {X_{1}})\big]\cdot \mathbb{E}\big[\mathbb{E}(\overline{\varPsi }\mid {Y_{4}})\big]=\mathbb{E}[\overline{\varPhi }]\cdot \mathbb{E}[\overline{\varPsi }]=0.\]

Using the tower property and the independence of $({X_{1}},{Y_{1}})$ and ${Y_{4}}$ we get

\[\begin{aligned}{}\mathbb{E}\big[\mathbb{E}(\overline{\varPhi }\mid {X_{1}})\cdot \overline{\varPsi }\big]& =\mathbb{E}\big[\mathbb{E}\big[\mathbb{E}(\overline{\varPhi }\mid {X_{1}})\mid {Y_{1}},{Y_{4}}\big]\cdot \overline{\varPsi }\big]\\{} & =\mathbb{E}\big[\mathbb{E}\big[\mathbb{E}(\overline{\varPhi }\mid {X_{1}})\mid {Y_{1}}\big]\cdot \overline{\varPsi }\big]=0,\end{aligned}\]

where we use, for the last equality, that $\overline{\varPsi }$ is orthogonal to the ${L}^{2}$-space of ${Y_{1}}$-measurable functions. In a similar fashion we see $\mathbb{E}[\overline{\varPhi }\cdot \mathbb{E}(\overline{\varPsi }\mid {Y_{4}})]=0$, and the assertion follows because of (33). □

In Section 4 we will encounter further representations of the generalized distance covariance if X and Y have discrete distributions with finitely many values, as it is the case for empirical distributions.

3.2 Generalized distance covariance with arbitrary Lévy measures

So far, we have been considering finite Lévy measures $\mu ,\nu $ and bounded cndfs Φ and Ψ (24). We will now extend our considerations to products of unbounded Lévy measures. The measure $\rho :=\mu \otimes \nu $ satisfies the integrability condition

(36)

\[ \underset{{\mathbb{R}}^{m+n}}{\iint }\hspace{-0.1667em}\hspace{-0.1667em}\big(1\wedge |x{|}^{2}\big)\big(1\wedge |y{|}^{2}\big)\hspace{0.1667em}\rho (\mathrm{d}x,\mathrm{d}y)=\hspace{-0.1667em}\hspace{-0.1667em}\underset{{\mathbb{R}}^{m}}{\int }\hspace{-0.1667em}\hspace{-0.1667em}\big(1\wedge |x{|}^{2}\big)\hspace{0.1667em}\mu (\mathrm{d}x)\hspace{-0.1667em}\hspace{-0.1667em}\underset{{\mathbb{R}}^{n}}{\int }\hspace{-0.1667em}\hspace{-0.1667em}\big(1\wedge |y{|}^{2}\big)\hspace{0.1667em}\nu (\mathrm{d}y)<\infty .\]

Other than in the case of finite marginals, ρ is no longer a Lévy measure, see the footnote on page . Thus, the function Θ defined in (24) need not be a cndf and we cannot directly apply Proposition 2.4; instead we need the following result ensuring the finiteness of $V(X,Y)$.

Lemma 3.6.

Let $X,{X^{\prime }}$ be i.i.d. random variables on ${\mathbb{R}}^{m}$ and $Y,{Y^{\prime }}$ be i.i.d. random variables on ${\mathbb{R}}^{n}$; μ and ν are symmetric Lévy measures on ${\mathbb{R}}^{m}$ and ${\mathbb{R}}^{n}$ with full support and with corresponding cndfs Φ and Ψ as in (24). Then

(37)

\[ {V}^{2}(X,Y)\le \mathbb{E}\varPhi \big(X-{X^{\prime }}\big)\cdot \mathbb{E}\varPsi \big(Y-{Y^{\prime }}\big)\le 16\hspace{0.1667em}\mathbb{E}\varPhi (X)\cdot \mathbb{E}\varPsi (Y).\]

Proof.

Following Székely et al. [37, p. 2772] we get

(38)

\[\begin{aligned}{}|{f_{X,Y}}(s,t)-{f_{X}}(s){f_{Y}}(t){|}^{2}& ={\big|\mathbb{E}\big(\big({\mathrm{e}}^{\mathrm{i}X\cdot s}-{f_{X}}(s)\big)\big({\mathrm{e}}^{\mathrm{i}Y\cdot t}-{f_{Y}}(t)\big)\big)\big|}^{2}\\{} & \le {\big[\mathbb{E}\big(|{\mathrm{e}}^{\mathrm{i}X\cdot s}-{f_{X}}(s)||{\mathrm{e}}^{\mathrm{i}Y\cdot t}-{f_{Y}}(t)|\big)\big]}^{2}\\{} & \le \mathbb{E}\big[|{\mathrm{e}}^{\mathrm{i}X\cdot s}-{f_{X}}(s){|}^{2}\big]\cdot \mathbb{E}\big[|{\mathrm{e}}^{\mathrm{i}Y\cdot t}-{f_{Y}}(t){|}^{2}\big]\end{aligned}\]

and

(39)

\[\begin{aligned}{}\mathbb{E}\big[|{\mathrm{e}}^{\mathrm{i}X\cdot s}-{f_{X}}(s){|}^{2}\big]& =\mathbb{E}\big[\big({\mathrm{e}}^{\mathrm{i}sX}-{f_{X}}(s)\big)\big({\mathrm{e}}^{-\mathrm{i}sX}-\overline{{f_{X}}(s)}\big)\big]\\{} & =1-|{f_{X}}(s){|}^{2}=1-{f_{X}}(s)\overline{{f_{X}}(s)}.\end{aligned}\]

Using (16) for $\rho =\mu $, $\varTheta =\varPhi $, $U=X$ and (13) for $\varTheta =\varPhi $, $U=X$, $V={X^{\prime }}$ shows

\[ {\int _{{\mathbb{R}}^{m}}}\mathbb{E}\big[|{\mathrm{e}}^{\mathrm{i}X\cdot s}-{f_{X}}(s){|}^{2}\big]\hspace{0.1667em}\mu (\mathrm{d}s)=\mathbb{E}\varPhi \big(X-{X^{\prime }}\big)\le 4\mathbb{E}\varPhi (X),\]

and an analogous argument for ν and Y yields the bound (37). □

Looking at the various representations (29)–(33) of $V(X,Y)$ it is clear that these make sense as soon as all expectations in these expressions are finite, i.e. some moment condition in terms of Φ and Ψ should be enough to ensure the finiteness of $V(X,Y)$ and all terms appearing in the respective representations.

In order to use the results of the previous section we fix $\epsilon >0$ and consider the finite symmetric Lévy measures

(40)

\[ {\mu _{\epsilon }}(\mathrm{d}s):=\frac{|s{|}^{2}}{{\epsilon }^{2}+|s{|}^{2}}\hspace{0.1667em}\mu (\mathrm{d}s)\hspace{1em}\text{and}\hspace{1em}{\nu _{\epsilon }}(\mathrm{d}t):=\frac{|t{|}^{2}}{{\epsilon }^{2}+|t{|}^{2}}\hspace{0.1667em}\nu (\mathrm{d}t),\]

and the corresponding cndfs ${\varPhi _{\epsilon }}$ and ${\varPsi _{\epsilon }}$ given by (24); the product measure ${\rho _{\epsilon }}:={\mu _{\epsilon }}\otimes {\nu _{\epsilon }}$ is a finite Lévy measure and the corresponding cndf ${\varTheta _{\epsilon }}$ is also bounded (it can be expressed by ${\varPhi _{\epsilon }}$ and ${\varPsi _{\epsilon }}$ through the formula (25)).

This allows us to derive the representations (29)–(33) for each $\epsilon >0$ and with ${\varPhi _{\epsilon }}$ and ${\varPsi _{\epsilon }}$. Since we have $\varPhi ={\sup _{\epsilon }}{\varPhi _{\epsilon }}$ and $\varPsi ={\sup _{\epsilon }}{\varPsi _{\epsilon }}$, we can use monotone convergence to get the representations for the cndfs Φ and Ψ with Lévy measures μ and ν, respectively. Of course, this requires the existence of certain (mixed) Φ-Ψ moments of the random variables $(X,Y)$.

Theorem 3.7.

Let μ and ν be symmetric Lévy measures on ${\mathbb{R}}^{m}\setminus \{0\}$ and ${\mathbb{R}}^{n}\setminus \{0\}$ with full support and corresponding cndfs Φ and Ψ given by (24). For any random vector $(X,Y)$ with values in ${\mathbb{R}}^{m+n}$ satisfying the following moment condition

(41)

\[ \mathbb{E}\varPhi (X)+\mathbb{E}\varPsi (Y)<\infty ,\]

the generalized distance correlation $V(X,Y)$ is finite. If additionally

(42)

\[ \mathbb{E}\big(\varPhi (X)\varPsi (Y)\big)<\infty ,\]

then also the representations (29)–(33) hold with all terms finite.6

Proof.

We only have to check the finiteness. Using Lemma 3.6, we see that (41) guarantees that $V(X,Y)<\infty $. The finiteness of (all the terms appearing in) the representations (29)–(33) follows from the monotone convergence argument, since the moment condition (42) ensures the finiteness of the limiting expectations. □

Remark 3.8.

a) Using the Hölder inequality the following condition implies (41) and (42):

(43)

\[ \mathbb{E}{\varPhi }^{p}(X)+\mathbb{E}{\varPsi }^{q}(Y)<\infty \hspace{1em}\text{for some}\hspace{2.5pt}p,q>1\hspace{2.5pt}\text{with}\hspace{2.5pt}\frac{1}{p}+\frac{1}{q}=1.\]

If one of Ψ or Φ is bounded then (41) implies (42), and if both are bounded then the expectations are trivially finite.

Since continuous negative definite functions grow at most quadratically, we see that (41) and (42) also follow if

(44)

\[ \mathbb{E}|X{|}^{2p}+\mathbb{E}|Y{|}^{2q}<\infty \hspace{1em}\text{for some}\hspace{2.5pt}p,q>1\hspace{2.5pt}\text{with}\hspace{2.5pt}\frac{1}{p}+\frac{1}{q}=1.\]

b) A slightly different set-up was employed by Lyons [23]: If the cndfs Φ and Ψ are subadditive, then the expectation in (32) is finite. This is a serious restriction on the class of cndfs since subadditivity means that Φ and Ψ can grow at most linearly at infinity, whereas general cndfs grow at most quadratically, cf. (9). Note, however, that square roots of real cndfs are always subadditive, cf. (7).

4 Estimating generalized distance covariance

Let ${({x_{i}},{y_{i}})}_{i=1,\dots ,N}$ be a sample of $(X,Y)$ and denote by $({\widehat{X}}^{(N)},{\widehat{Y}}^{(N)})$ the random variable which has the corresponding empirical distribution, i.e. the uniform distribution on ${({x_{i}},{y_{i}})}_{i=1,\dots ,N}$; to be precise, repeated points will have the corresponding multiple weight. By definition, $({\widehat{X}}^{(N)},{\widehat{Y}}^{(N)})$ is a bounded random variable and for i.i.d. copies with $\mathcal{L}(({\widehat{X}_{i}^{(N)}},{\widehat{Y}_{i}^{(N)}}))=\mathcal{L}(({\widehat{X}}^{(N)},{\widehat{Y}}^{(N)}))$ for $i=1,\dots ,4$, we have using (34)

(45)

\[\begin{aligned}{}\mathbb{E}& \big[g\big(\big({\widehat{X}_{1}^{(N)}},{\widehat{Y}_{1}^{(N)}}\big),\dots ,\big({\widehat{X}_{4}^{(N)}},{\widehat{Y}_{4}^{(N)}}\big)\big)\big]\\{} & ={\int _{{\mathbb{R}}^{n}\setminus \{0\}}}{\int _{{\mathbb{R}}^{m}\setminus \{0\}}}{\big|{f_{({\widehat{X}}^{(N)},{\widehat{Y}}^{(N)})}}(s,t)-{f_{{\widehat{X}}^{(N)}}}(s){f_{{\widehat{Y}}^{(N)}}}(t)\big|}^{2}\hspace{0.1667em}\mu (\mathrm{d}s)\hspace{0.1667em}\nu (\mathrm{d}t).\end{aligned}\]

The formulae (27)–(33) hold, in particular, for the empirical random variables $({\widehat{X}}^{(N)},{\widehat{Y}}^{(N)})$, and so we get

(46)

\[\begin{aligned}{}& \mathbb{E}\big[g\big(\big({\widehat{X}_{1}^{(N)}},{\widehat{Y}_{1}^{(N)}}\big),\dots ,\big({\widehat{X}_{4}^{(N)}},{\widehat{Y}_{4}^{(N)}}\big)\big)\big]\\{} & =\frac{1}{{N}^{4}}{\sum \limits_{i,j,k,l=1}^{N}}g\big(({x_{i}},{y_{i}}),({x_{j}},{y_{j}}),({x_{k}},{y_{k}}),({x_{l}},{y_{l}})\big)\end{aligned}\]

(47)

\[\begin{aligned}{}& =\frac{1}{{N}^{4}}{\sum \limits_{i,j,k,l=1}^{N}}\big[\varPhi ({x_{i}}-{x_{k}})\varPsi ({y_{i}}-{y_{k}})-2\varPhi ({x_{i}}-{x_{j}})\varPsi ({y_{i}}-{y_{l}})\\{} & \phantom{=\frac{1}{{N}^{4}}\sum \sum [}+\varPhi ({x_{i}}-{x_{j}})\varPsi ({y_{k}}-{y_{l}})\big]\end{aligned}\]

(48)

\[\begin{aligned}{}& =\frac{1}{{N}^{2}}{\sum \limits_{i,k=1}^{N}}\varPhi ({x_{i}}-{x_{k}})\varPsi ({y_{i}}-{y_{k}})-\frac{2}{{N}^{3}}{\sum \limits_{i,j,l=1}^{N}}\varPhi ({x_{i}}-{x_{j}})\varPsi ({y_{i}}-{y_{l}})\\{} & \hspace{2em}+\frac{1}{{N}^{2}}{\sum \limits_{i,j=1}^{N}}\varPhi ({x_{i}}-{x_{j}})\frac{1}{{N}^{2}}{\sum \limits_{k,l=1}^{N}}\varPsi ({y_{k}}-{y_{l}})\end{aligned}\]

(49)

\[\begin{aligned}{}& =\bigg(\frac{1}{{N}^{2}}-\frac{2}{{N}^{3}}+\frac{2}{{N}^{4}}\bigg){\sum \limits_{\begin{array}{c}i,k=1\\{} \text{distinct}\end{array}}^{N}}\varPhi ({x_{i}}-{x_{k}})\varPsi ({y_{i}}-{y_{k}})\\{} & \hspace{2em}+\bigg(\frac{4}{{N}^{4}}-\frac{2}{{N}^{3}}\bigg){\sum \limits_{\begin{array}{c}i,j,l=1\\{} \text{distinct}\end{array}}^{N}}\varPhi ({x_{i}}-{x_{j}})\varPsi ({y_{i}}-{y_{l}})\\{} & \hspace{2em}+\frac{1}{{N}^{4}}{\sum \limits_{\begin{array}{c}i,j,k,l=1\\{} \text{distinct}\end{array}}^{N}}\varPhi ({x_{i}}-{x_{j}})\varPsi ({y_{k}}-{y_{l}}).\end{aligned}\]

The sum in (46) is – for functions g which are symmetric under permutations of their variables – a V-statistic.

Definition 4.1.

The estimator ${}^{N}{V}^{2}:={}^{N}{V}^{2}(({x_{1}},{y_{1}}),\dots ,({x_{N}},{y_{N}}))$ of ${V}^{2}(X,Y)$ is defined by (47).

In abuse of notation, we also write ${}^{N}{V}^{2}(({x_{1}},\dots ,{x_{N}}),({y_{1}},\dots ,{y_{N}}))$ and ${}^{N}{V}^{2}(\boldsymbol{x},\boldsymbol{y})$ with $\boldsymbol{x}=({x_{1}},\dots ,{x_{N}})$, $\boldsymbol{y}=({y_{1}},\dots ,{y_{N}})$ instead of the precise ${}^{N}{V}^{2}(({x_{1}},{y_{1}}),\dots ,({x_{N}},{y_{N}}))$.

Since all random variables are bounded, we could use any of the representations (46)–(49) to define an estimator. A computationally feasible representation is given in the following lemma.

Lemma 4.2.

The estimator ${}^{N}{V}^{2}$ of ${V}^{2}(X,Y)$ has the following representation using matrix notation:

(50)

\[ {}^{N}{V}^{2}\big(({x_{1}},{y_{1}}),\dots ,({x_{N}},{y_{N}})\big)=\frac{1}{{N}^{2}}\operatorname{trace}\big({B}^{\top }A\big)=\frac{1}{{N}^{2}}{\sum \limits_{k,l=1}^{N}}{A_{kl}}{B_{kl}},\]

where

(51)

\[ \begin{aligned}{}A& ={C}^{\top }aC,\hspace{1em}a={\big(\varPhi ({x_{k}}-{x_{l}})\big)}_{k,l=1,..,N},\\{} B& ={C}^{\top }bC,\hspace{1em}b={\big(\varPsi ({y_{k}}-{y_{l}})\big)}_{k,l=1,..,N},\end{aligned}\]

and $C=I-\frac{1}{N}\mathbb{1}$ with $\mathbb{1}={(1)_{k,l=1,\dots ,N}}$ and $I={({\delta _{jk}})}_{j,k=1,\dots ,N}$.

Remark 4.3.

If $\varPhi (x)=|x|$ and $\varPsi (y)=|y|$ the matrices a and b in (51) are Euclidean distance matrices. For general cndfs Φ and Ψ, the matrices $-a$ and $-b$ are conditionally positive definite (see Theorem 2.1), and A, B are positive definite. This gives a simple explanation as to why the right-hand side of (50) is positive.

Proof of Lemma 4.2.

By definition,

(52)

\[\begin{aligned}{}\operatorname{trace}\big({b}^{\top }a\big)& ={\sum \limits_{i,j=1}^{N}}{a_{ij}}{b_{ij}}={\sum \limits_{i,j=1}^{N}}\varPhi ({x_{i}}-{x_{j}})\varPsi ({y_{i}}-{y_{j}}),\end{aligned}\]

(53)

\[\begin{aligned}{}\operatorname{trace}\big({b}^{\top }\mathbb{1}a\big)& ={\sum \limits_{i,j,k=1}^{N}}{a_{ij}}{b_{ik}}={\sum \limits_{i,j,k=1}^{N}}\varPhi ({x_{i}}-{x_{j}})\varPsi ({y_{i}}-{y_{k}}),\end{aligned}\]

(54)

\[\begin{aligned}{}\operatorname{trace}\big(\mathbb{1}{b}^{\top }\mathbb{1}a\big)& ={\sum \limits_{i,j,k,l=1}^{N}}{a_{ij}}{b_{kl}}={\sum \limits_{i,j=1}^{N}}\varPhi ({x_{i}}-{x_{j}}){\sum \limits_{k,l=1}^{N}}\varPsi ({y_{k}}-{y_{l}}),\end{aligned}\]

and this allows us to rewrite (48) as

(55)

\[ {}^{N}{V}^{2}=\frac{1}{{N}^{2}}\operatorname{trace}\big({b}^{\top }a\big)-\frac{2}{{N}^{3}}\operatorname{trace}\big({b}^{\top }\mathbb{1}a\big)+\frac{1}{{N}^{4}}\operatorname{trace}\big(\mathbb{1}{b}^{\top }\mathbb{1}a\big).\]

Observe that $C={C}^{\top }$ and $C{C}^{\top }=C$. Using this and the fact that the trace is invariant under cyclic permutations, we get

\[\begin{aligned}{}\operatorname{trace}\big({B}^{\top }A\big)& =\operatorname{trace}\big({C}^{\top }{b}^{\top }C{C}^{\top }aC\big)\\{} & =\operatorname{trace}\big(C{C}^{\top }{b}^{\top }C{C}^{\top }a\big)=\operatorname{trace}\big(C{b}^{\top }Ca\big).\end{aligned}\]

Plugging in the definition of C now gives

\[\begin{aligned}{}\operatorname{trace}\big(& {B}^{\top }A\big)=\operatorname{trace}\big(\big({b}^{\top }-\frac{1}{N}\mathbb{1}{b}^{\top }\big)\big(a-\frac{1}{N}\mathbb{1}a\big)\big)\\{} & =\operatorname{trace}\big({b}^{\top }a\big)-\frac{1}{N}\operatorname{trace}\big(\mathbb{1}{b}^{\top }a\big)-\frac{1}{N}\operatorname{trace}\big({b}^{\top }\mathbb{1}a\big)+\frac{1}{{N}^{2}}\operatorname{trace}\big(\mathbb{1}{b}^{\top }\mathbb{1}a\big)\\{} & =\operatorname{trace}\big({b}^{\top }a\big)-\frac{2}{N}\operatorname{trace}\big({b}^{\top }\mathbb{1}a\big)+\frac{1}{{N}^{2}}\operatorname{trace}\big(\mathbb{1}{b}^{\top }\mathbb{1}a\big).\end{aligned}\]

For the last equality we use that a and b are symmetric matrices. □

We will now show that ${}^{N}{V}^{2}$ is a consistent estimator for ${V}^{2}(X,Y)$.

Theorem 4.4 (Consistency).

Let ${({X_{i}},{Y_{i}})}_{i=1,\dots ,N}$ be i.i.d. copies of $(X,Y)$ and write $\boldsymbol{X}:=({X_{1}},\dots ,{X_{N}})$, $\boldsymbol{Y}:=({Y_{1}},\dots ,{Y_{N}})$. If $\mathbb{E}[\varPhi (X)+\varPsi (Y)]<\infty $ holds, then

(56)

\[ {}^{N}{V}^{2}(\boldsymbol{X},\boldsymbol{Y}){\underset{N\to \infty }{\overset{}{\to }}}{V}^{2}(X,Y)\hspace{1em}\textit{a.s.}\]

Proof.

The moment condition $\mathbb{E}[\varPhi (X)+\varPsi (Y)]<\infty $ ensures that the generalized distance covariance ${V}^{2}(X,Y)$ is finite, cf. Lemma 3.6. Define ${\mu _{\epsilon }}$ and ${\nu _{\epsilon }}$ as in (40), and write ${V_{\epsilon }^{2}}(X,Y)$ for the corresponding generalized distance covariance and ${}^{N}{V_{\epsilon }^{2}}(\boldsymbol{X},\boldsymbol{Y})$ for its estimator. By the triangle inequality we obtain

\[\begin{aligned}{}& |{V}^{2}(X,Y)-{}^{N}{V}^{2}(\boldsymbol{X},\boldsymbol{Y})|\\{} & \hspace{2em}\le |{V}^{2}(X,Y)-{V_{\epsilon }^{2}}(X,Y)|+|{V_{\epsilon }^{2}}(X,Y)-{}^{N}{V_{\epsilon }^{2}}(\boldsymbol{X},\boldsymbol{Y})|\\{} & \hspace{2em}\hspace{2em}+|{}^{N}{V_{\epsilon }^{2}}(\boldsymbol{X},\boldsymbol{Y})-{}^{N}{V}^{2}(\boldsymbol{X},\boldsymbol{Y})|.\end{aligned}\]

We consider the three terms on the right-hand side separately. The first term vanishes as $\epsilon \to 0$, since

\[ \underset{\epsilon \to 0}{\lim }{V_{\epsilon }^{2}}(X,Y)={V}^{2}(X,Y)\]

by monotone convergence. For each $\epsilon >0$, the second term converges to zero as $N\to \infty $, since

\[ \underset{N\to \infty }{\lim }{}^{N}{V_{\epsilon }^{2}}(\boldsymbol{X},\boldsymbol{Y})={V_{\epsilon }^{2}}(X,Y)\hspace{1em}\text{a.s.}\]

by the strong law of large numbers (SLLN) for V-statistics; note that this is applicable since the functions ${\varPhi _{\epsilon }}$ and ${\varPsi _{\epsilon }}$ are bounded (because of the finiteness of the Lévy measures ${\mu _{\epsilon }}$ and ${\nu _{\epsilon }}$).

For the third term we set ${\mu }^{\epsilon }=\mu -{\mu _{\epsilon }}$, ${\nu }^{\epsilon }=\nu -{\nu _{\epsilon }}$ and write ${\varPhi }^{\epsilon }$, ${\varPsi }^{\epsilon }$ for the corresponding continuous negative definite functions. Lemma 3.6 yields the inequality

\[\begin{aligned}{}& |{}^{N}{V_{\epsilon }^{2}}\big(({x_{1}},{y_{1}}),\dots ,({x_{N}},{y_{N}})\big)-{}^{N}{V}^{2}\big(({x_{1}},{y_{1}}),\dots ,({x_{N}},{y_{N}})\big)|\\{} & \hspace{2em}=\iint {\big|{f_{{\widehat{X}}^{(N)},{\widehat{Y}}^{(N)}}}(s,t)-{f_{{\widehat{X}}^{(N)}}}(s){f_{{\widehat{Y}}^{(N)}}}(t)\big|}^{2}{\mu }^{\epsilon }(\mathrm{d}s)\hspace{0.1667em}{\nu }^{\epsilon }(\mathrm{d}t)\\{} & \hspace{2em}\le 16\hspace{0.1667em}\mathbb{E}{\varPhi }^{\epsilon }\big({\widehat{X}}^{(N)}\big)\cdot \mathbb{E}{\varPsi }^{\epsilon }\big({\widehat{Y}}^{(N)}\big)=16{\sum \limits_{i=1}^{N}}\frac{1}{N}{\varPhi }^{\epsilon }({x_{i}})\cdot {\sum \limits_{i=1}^{N}}\frac{1}{N}{\varPsi }^{\epsilon }({y_{i}}).\end{aligned}\]

From the representation (24) we know that ${\varPhi }^{\epsilon }(x)\le \varPhi (x)$, hence also $\mathbb{E}{\varPhi }^{\epsilon }(X)\le \mathbb{E}\varPhi (X)$ and this is finite by assumption. Therefore, we can use monotone convergence to conclude that ${\lim _{\epsilon \to 0}}\mathbb{E}{\varPhi }^{\epsilon }(X)=0$. Thus, the classical SLLN applies and proves

\[\begin{aligned}{}\underset{\epsilon \to 0}{\lim }\underset{N\to \infty }{\limsup }|{}^{N}{V_{\epsilon }^{2}}(\boldsymbol{X},\boldsymbol{Y})-{}^{N}{V}^{2}(\boldsymbol{X},\boldsymbol{Y})|& \le \underset{\epsilon \to 0}{\lim }\mathbb{E}{\varPhi }^{\epsilon }(X)\cdot \mathbb{E}{\varPsi }^{\epsilon }(Y)=0\hspace{1em}\text{a.s.}\end{aligned}\]

□

Next we study the behaviour of the estimator under the hypothesis of independence.

Theorem 4.5.

If X and Y are independent and satisfy the moment conditions

(57)

\[ \mathbb{E}\big[\varPhi (X)+\varPsi (Y)\big]<\infty \hspace{1em}\textit{and}\hspace{1em}\mathbb{E}\big[{\log }^{1+\epsilon }\big(1+|X{|}^{2}\big)+{\log }^{1+\epsilon }\big(1+|Y{|}^{2}\big)\big]<\infty \]

for some $\epsilon >0$, then

(58)

\[ N\cdot {}^{N}{V}^{2}(\boldsymbol{X},\boldsymbol{Y}){\underset{N\to \infty }{\overset{d}{\to }}}\iint |\mathbb{G}(s,t){|}^{2}\hspace{0.1667em}\mu (\mathrm{d}s)\hspace{0.1667em}\nu (\mathrm{d}t)=\| \mathbb{G}{\| _{\mu \otimes \nu }^{2}}\]

in distribution, where ${(\mathbb{G}(s,t))_{(s,t)\in {\mathbb{R}}^{m+n}}}$ is a complex-valued Gaussian random field with $\mathbb{E}(\mathbb{G}(s,t))=0$ and

(59)

\[\begin{aligned}{}& \operatorname{Cov}\big(\mathbb{G}(s,t),\mathbb{G}\big({s^{\prime }},{t^{\prime }}\big)\big)\\{} & \hspace{1em}=\big({f_{X}}\big(s-{s^{\prime }}\big)-{f_{X}}(s)\overline{{f_{X}}\big({s^{\prime }}\big)}\big)\cdot \big({f_{Y}}\big(t-{t^{\prime }}\big)-{f_{Y}}(t)\overline{{f_{Y}}\big({t^{\prime }}\big)}\big).\end{aligned}\]

Proof.

Let $X,Y$ and $\mathbb{G}$ be as above and let ${X^{\prime }},{X_{i}}$ and ${Y^{\prime }},{Y_{i}}$ ($i\in \mathbb{N}$) be independent random variables with laws $\mathcal{L}(X)$ and $\mathcal{L}(Y)$, respectively. Define for $N\in \mathbb{N}$, $s\in {\mathbb{R}}^{m}$, $t\in {\mathbb{R}}^{n}$

(60)

\[ {Z_{N}}(s,t):=\frac{1}{N}{\sum \limits_{k=1}^{N}}{\mathrm{e}}^{\mathrm{i}s\cdot {X_{k}}+\mathrm{i}t\cdot {Y_{k}}}-\frac{1}{{N}^{2}}{\sum \limits_{k,l=1}^{N}}{\mathrm{e}}^{\mathrm{i}s\cdot {X_{k}}+\mathrm{i}t\cdot {Y_{l}}}.\]

Then

(61)

\[ N\cdot {}^{N}{V}^{2}(\boldsymbol{X},\boldsymbol{Y})=\| \sqrt{N}{Z_{N}}{\| _{\mu \otimes \nu }^{2}}.\]

The essential idea is to show that (on an appropriate space) $\sqrt{N}{Z_{N}}\stackrel{d}{\to }\mathbb{G}$ and then to apply the continuous mapping theorem.

Since X and Y are independent, we have

(62)

\[\begin{aligned}{}& \mathbb{E}\big({Z_{N}}(s,t)\big)=0,\end{aligned}\]

(63)

\[\begin{aligned}{}& \mathbb{E}\big({Z_{N}}(s,t)\overline{{Z_{N}}\big({s^{\prime }},{t^{\prime }}\big)}\big)\\{} & \hspace{1em}=\frac{N-1}{{N}^{2}}\big({f_{X}}\big(s-{s^{\prime }}\big)-{f_{X}}(s)\overline{{f_{X}}\big({s^{\prime }}\big)}\big)\big({f_{Y}}\big(t-{t^{\prime }}\big)-{f_{Y}}(t)\overline{{f_{Y}}\big({t^{\prime }}\big)}\big),\end{aligned}\]

(64)

\[\begin{aligned}{}& \mathbb{E}|\sqrt{N}{Z_{N}}(s,t){|}^{2}=\frac{N-1}{N}\big(1-|{f_{X}}(s){|}^{2}\big)\big(1-|{f_{Y}}(t){|}^{2}\big).\end{aligned}\]

The first identity (62) is obvious and (64) follows from (63) if we set $s={s^{\prime }}$ and $t={t^{\prime }}$. The proof of (63) is deferred to Lemma 4.6 following this proof.

The convergence $\sqrt{N}{Z_{N}}\stackrel{d}{\to }\mathbb{G}$ in ${\mathcal{C}_{T}}:=(C({K_{T}}),\| .{\| _{{K_{T}}}})$ with ${K_{T}}=\{x\in {\mathbb{R}}^{m+n}:|x|\le T\}$, i.e. in the space of continuous functions on ${K_{T}}$ equipped with the supremum norm, holds if $\mathbb{E}{\log }^{1+\epsilon }(\sqrt{|X{|}^{2}+|Y{|}^{2}}\vee 1)<\infty $, see Csörgő [14, Thm. on p. 294] or Ushakov [38, Sec. 3.7]. This log-moment condition is equivalent to the log-moment condition (57), see Lemma 2.7.

In fact, the result in [14] is cast in a more general setting, proving the convergence for vectors $({X_{1}},\dots ,{X_{N}})$, but only one-dimensional marginals are considered. The proof for multidimensional marginals is very similar, so we will only give an outline:

Let ${F_{Z}}$ denote the distribution function of the random variable Z and ${}^{N}{F_{z}}$ the empirical distribution of a sample ${z_{1}},\dots ,{z_{N}}$; if the sample is replaced by N independent copies of Z, we write ${}^{N}{F_{Z}}$. Using this notation and the independence of X and Y yields the representation

(65)

\[\begin{aligned}{}\sqrt{N}{Z_{N}}(s,t)& =\sqrt{N}\Bigg(\frac{1}{N}{\sum \limits_{k=1}^{N}}{\mathrm{e}}^{\mathrm{i}s\cdot {X_{k}}+\mathrm{i}t\cdot {Y_{k}}}-{f_{(X,Y)}}(s,t)\Bigg)\\{} & \hspace{2em}-\sqrt{N}\Bigg(\frac{1}{N}{\sum \limits_{k=1}^{N}}{\mathrm{e}}^{\mathrm{i}s\cdot {X_{k}}}-{f_{X}}(s)\Bigg)\Bigg(\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{\mathrm{i}t\cdot {Y_{l}}}\Bigg)\\{} & \hspace{2em}-\sqrt{N}\Bigg(\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{\mathrm{i}t\cdot {Y_{l}}}-{f_{Y}}(t)\Bigg){f_{X}}(s)\\{} & =\int {\mathrm{e}}^{\mathrm{i}(s\cdot x+t\cdot y)}\hspace{0.1667em}\mathrm{d}\big(\sqrt{N}\big({}^{N}{F_{(X,Y)}}(x,y)-{F_{(X,Y)}}(x,y)\big)\big)\\{} & \hspace{2em}-\int {\mathrm{e}}^{\mathrm{i}s\cdot x}\hspace{0.1667em}\mathrm{d}\big(\sqrt{N}\big({}^{N}{F_{X}}(x)-{F_{X}}(x)\big)\big)\cdot \Bigg(\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{\mathrm{i}t\cdot {Y_{l}}}\Bigg)\\{} & \hspace{2em}-\int {\mathrm{e}}^{\mathrm{i}t\cdot y}\hspace{0.1667em}\mathrm{d}\big(\sqrt{N}\big({}^{N}{F_{Y}}(y)-{F_{Y}}(y)\big)\big)\cdot {f_{X}}(s).\end{aligned}\]

Note that for bivariate distributions the integrals with respect to a single variable reduce to integrals with respect to the corresponding marginal distribution, e.g. ${\int _{{\mathbb{R}}^{m+n}}}h(x)d{F_{(X,Y)}}(x,y)={\int _{{\mathbb{R}}^{m}}}h(x)d{F_{X}}(x)$. Therefore, a straightforward calculation shows that $\sqrt{N}{Z_{N}}(s,t)$ equals

(66)

\[ \int g(x,y)\hspace{2.5pt}\mathrm{d}\big(\sqrt{N}\big({}^{N}{F_{(X,Y)}}(x,y)-{F_{(X,Y)}}(x,y)\big)\big)\]

with the integrand

\[ g(x,y):={\mathrm{e}}^{\mathrm{i}(s\cdot x+t\cdot y)}-{\mathrm{e}}^{\mathrm{i}s\cdot x}\Bigg[\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{\mathrm{i}t\cdot {Y_{l}}}\Bigg]-{f_{X}}(t){\mathrm{e}}^{\mathrm{i}t\cdot y}.\]

Following Csörgő [13] we obtain for $N\to \infty $ the limit

(67)

\[ {\int _{{\mathbb{R}}^{m+n}}}\big({\mathrm{e}}^{\mathrm{i}s\cdot x+\mathrm{i}t\cdot y}-{\mathrm{e}}^{\mathrm{i}s\cdot x}{f_{Y}}(t)-{f_{X}}(s){\mathrm{e}}^{\mathrm{i}t\cdot y}\big)\mathrm{d}B(x,y),\]

where B is a Brownian bridge; as in [13, Eq. (3.2)] one can show that it is a Gaussian process indexed by ${\mathbb{R}}^{m+n}$ satisfying

(68)

\[\begin{aligned}{}\mathbb{E}\big(B(x,y)\big)& =0,\end{aligned}\]

(69)

\[\begin{aligned}{}\mathbb{E}\big(B(x,y)B\big({x^{\prime }},{y^{\prime }}\big)\big)& =\mathbb{P}\big(X\le x\wedge {x^{\prime }},Y\le y\wedge {y^{\prime }}\big)\\{} & \hspace{2em}-\mathbb{P}(X\le x,Y\le y)\mathbb{P}\big(X\le {x^{\prime }},Y\le {y^{\prime }}\big),\end{aligned}\]

(70)

\[\begin{aligned}{}\underset{x\to -\infty }{\lim }B(x,y)& =\underset{y\to -\infty }{\lim }B(x,y)=\underset{(x,y)\to (\infty ,\infty )}{\lim }B(x,y)=0.\end{aligned}\]

The limit (67) is continuous if, and only if, a rather complicated tail condition is satisfied [13, Thm. 3.1]; Csörgő [14, p. 294] shows that this condition is implied by the simpler moment condition (57), cf. Lemma 2.7. Thus, $\sqrt{N}{Z_{N}}\stackrel{d}{\to }\mathbb{G}$ in ${\mathcal{C}_{T}}:=(C({K_{T}}),\| .{\| _{{K_{T}}}})$.

Pick $\epsilon >0$, set $\delta :=1/\epsilon $ and define

(71)

\[ {\mu _{\epsilon ,\delta }}(A):=\mu \big(A\cap \{\epsilon \le |s|<\delta \}\big)\hspace{1em}\text{and}\hspace{1em}{\mu }^{\epsilon ,\delta }:=\mu -{\mu _{\epsilon ,\delta }};\]

the measures ${\nu _{\epsilon ,\delta }}$ and ${\nu }^{\epsilon ,\delta }$ are defined analogously. Note that

(72)

\[\begin{aligned}{}{\big|\| h{\| _{{\mu _{\epsilon ,\delta }}\otimes {\nu _{\epsilon ,\delta }}}}-\| {h^{\prime }}{\| _{{\mu _{\epsilon ,\delta }}\otimes {\nu _{\epsilon ,\delta }}}}\big|}^{2}& \le \| h-{h^{\prime }}{\| _{{\mu _{\epsilon ,\delta }}\otimes {\nu _{\epsilon ,\delta }}}^{2}}\\{} & =\int |h-{h^{\prime }}{|}^{2}\hspace{0.1667em}\mathrm{d}{\mu _{\epsilon ,\delta }}\otimes {\nu _{\epsilon ,\delta }}\\{} & \le \| h-{h^{\prime }}{\| _{{K_{T}}}^{2}}\cdot {\mu _{\epsilon ,\delta }}\big({\mathbb{R}}^{n}\big)\cdot {\nu _{\epsilon ,\delta }}\big({\mathbb{R}}^{m}\big)\end{aligned}\]

shows that $h\mapsto \| h{\| _{{\mu _{\epsilon ,\delta }}\otimes {\nu _{\epsilon ,\delta }}}^{2}}$ is continuous on ${\mathcal{C}_{T}}$. Thus, the continuous mapping theorem implies

(73)

\[ \| \sqrt{N}{Z_{N}}{\| _{{\mu _{\epsilon ,\delta }}\otimes {\nu _{\epsilon ,\delta }}}^{2}}{\underset{N\to \infty }{\overset{d}{\to }}}\| \mathbb{G}{\| _{{\mu _{\epsilon ,\delta }}\otimes {\nu _{\epsilon ,\delta }}}^{2}}.\]

By the triangle inequality we have

(74)

\[\begin{aligned}{}\big|N\cdot {}^{N}{V}^{2}(\boldsymbol{X},\boldsymbol{Y})-\| \mathbb{G}{\| _{\mu \otimes \nu }^{2}}\big|& \le \big|N\cdot {}^{N}{V}^{2}(\boldsymbol{X},\boldsymbol{Y})-\| \sqrt{N}{Z_{N}}{\| _{{\mu _{\epsilon ,\delta }}\otimes {\nu _{\epsilon ,\delta }}}^{2}}\big|\\{} & \hspace{2em}+\big|\| \sqrt{N}{Z_{N}}{\| _{{\mu _{\epsilon ,\delta }}\otimes {\nu _{\epsilon ,\delta }}}^{2}}-\| \mathbb{G}{\| _{{\mu _{\epsilon ,\delta }}\otimes {\nu _{\epsilon ,\delta }}}^{2}}\big|\\{} & \hspace{2em}+\big|\| \mathbb{G}{\| _{{\mu _{\epsilon ,\delta }}\otimes {\nu _{\epsilon ,\delta }}}^{2}}-\| \mathbb{G}{\| _{\mu \otimes \nu }^{2}}\big|.\end{aligned}\]

Thus, it remains to show that the first and last terms on the right-hand side vanish uniformly as $\epsilon \to 0$. Note that

(75)

\[\begin{aligned}{}& \| \mathbb{G}{\| _{\mu \otimes \nu }^{2}}-\| \mathbb{G}{\| _{{\mu _{\epsilon ,\delta }}\otimes {\nu _{\epsilon ,\delta }}}^{2}}\\{} & \hspace{1em}=\| \mathbb{G}{\| _{{\mu }^{\epsilon ,\delta }\otimes {\nu }^{\epsilon ,\delta }}^{2}}+\| \mathbb{G}{\| _{{\mu _{\epsilon ,\delta }}\otimes {\nu }^{\epsilon ,\delta }}^{2}}+\| \mathbb{G}{\| _{{\mu }^{\epsilon ,\delta }\otimes {\nu _{\epsilon ,\delta }}}^{2}}{\underset{\epsilon \to 0}{\overset{}{\to }}}0\hspace{2.5pt}\text{a.s.}\end{aligned}\]

This follows from the dominated convergence theorem, since

(76)

\[\begin{aligned}{}\mathbb{E}\big(\| \mathbb{G}{\| _{\mu \otimes \nu }^{2}}\big)& ={\big\| 1-|{f_{X}}{|}^{2}\big\| _{\mu }^{2}}\cdot {\big\| 1-|{f_{Y}}{|}^{2}\big\| _{\nu }^{2}}\\{} & =\mathbb{E}\varPhi \big(X-{X^{\prime }}\big)\cdot \mathbb{E}\varPsi \big(Y-{Y^{\prime }}\big)<\infty .\end{aligned}\]

Moreover,

(77)

\[\begin{aligned}{}\mathbb{E}& \big(\big|N\cdot {}^{N}{V}^{2}(\boldsymbol{X},\boldsymbol{Y})-\| \sqrt{N}{Z_{N}}{\| _{{\mu _{\epsilon ,\delta }}\otimes {\nu _{\epsilon ,\delta }}}^{2}}\big|\big)\\{} & =\mathbb{E}\| \sqrt{N}{Z_{N}}{\| _{{\mu }^{\epsilon ,\delta }\otimes {\nu }^{\epsilon ,\delta }}^{2}}+\mathbb{E}\| \sqrt{N}{Z_{N}}{\| _{{\mu _{\epsilon ,\delta }}\otimes {\nu }^{\epsilon ,\delta }}^{2}}+\mathbb{E}\| \sqrt{N}{Z_{N}}{\| _{{\mu }^{\epsilon ,\delta }\otimes {\nu _{\epsilon ,\delta }}}^{2}}\end{aligned}\]

and for the first term we have

(78)

\[ {\big\| \mathbb{E}\big(|\sqrt{N}{Z_{N}}{|}^{2}\big)\big\| _{{\mu }^{\epsilon ,\delta }\otimes {\nu }^{\epsilon ,\delta }}^{2}}={\big(\frac{N-1}{N}\big)}^{2}\cdot {\big\| 1-|{f_{X}}{|}^{2}\big\| _{{\mu }^{\epsilon ,\delta }}^{2}}\cdot {\big\| 1-|{f_{Y}}{|}^{2}\big\| _{{\nu }^{\epsilon ,\delta }}^{2}}{\underset{\epsilon \to 0}{\overset{}{\to }}}0\]

by dominated convergence, since we have $\| 1-|{f_{X}}{|}^{2}{\| _{{\mu }^{\epsilon ,\delta }}^{2}}\le \mathbb{E}\varPhi (X-{X^{\prime }})<\infty $ and $\| 1-|{f_{Y}}{|}^{2}{\| _{{\nu }^{\epsilon ,\delta }}^{2}}\le \mathbb{E}\varPsi (Y-{Y^{\prime }})<\infty $. The other summands are dealt with similarly.

The result follows since the convergence in (75) and (78) is uniform in N. □

We still have to prove (63).

Lemma 4.6.

In the setting of (the proof of ) Theorem 4.5 we have

Proof.

Observe that

\[\begin{aligned}{}{Z_{N}}(s,t)& =\frac{1}{N}{\sum \limits_{k=1}^{N}}{\mathrm{e}}^{\mathrm{i}s\cdot {X_{k}}+\mathrm{i}t\cdot {Y_{k}}}-\frac{1}{{N}^{2}}{\sum \limits_{k,l=1}^{N}}{\mathrm{e}}^{\mathrm{i}s\cdot {X_{k}}+\mathrm{i}t\cdot {Y_{l}}}\\{} & =\frac{1}{N}{\sum \limits_{k=1}^{N}}\Bigg({\mathrm{e}}^{\mathrm{i}s\cdot {X_{k}}}-\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{\mathrm{i}s\cdot {X_{l}}}\Bigg)\Bigg({\mathrm{e}}^{\mathrm{i}t\cdot {Y_{k}}}-\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{\mathrm{i}t\cdot {Y_{l}}}\Bigg).\end{aligned}\]

Using this formula and the independence of the random variables $({X_{1}},\dots ,{X_{N}})$ and $({Y_{1}},\dots ,{Y_{N}})$ yields

\[\begin{aligned}{}& \mathbb{E}\big({Z_{N}}(s,t)\overline{{Z_{N}}\big({s^{\prime }},{t^{\prime }}\big)}\big)\\{} & =\frac{1}{{N}^{2}}{\sum \limits_{j,k=1}^{N}}\mathbb{E}\Bigg[\Bigg({\mathrm{e}}^{\mathrm{i}s\cdot {X_{k}}}-\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{\mathrm{i}s\cdot {X_{l}}}\Bigg)\Bigg({\mathrm{e}}^{\mathrm{i}t\cdot {Y_{k}}}-\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{\mathrm{i}t\cdot {Y_{l}}}\Bigg)\\{} & \hspace{2em}\hspace{2em}\hspace{2em}\hspace{2em}\times \Bigg({\mathrm{e}}^{-\mathrm{i}{s^{\prime }}\cdot {X_{j}}}-\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{-\mathrm{i}{s^{\prime }}\cdot {X_{l}}}\Bigg)\Bigg({\mathrm{e}}^{-\mathrm{i}{t^{\prime }}\cdot {Y_{j}}}-\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{-\mathrm{i}{t^{\prime }}\cdot {Y_{l}}}\Bigg)\Bigg]\\{} & =\frac{1}{{N}^{2}}{\sum \limits_{j,k=1}^{N}}\mathbb{E}\Bigg[\Bigg({\mathrm{e}}^{\mathrm{i}s\cdot {X_{k}}}-\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{\mathrm{i}s\cdot {X_{l}}}\Bigg)\Bigg({\mathrm{e}}^{-\mathrm{i}{s^{\prime }}\cdot {X_{j}}}-\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{-\mathrm{i}{s^{\prime }}\cdot {X_{l}}}\Bigg)\Bigg]\\{} & \hspace{2em}\hspace{2em}\hspace{2em}\hspace{2em}\times \mathbb{E}\Bigg[\Bigg({\mathrm{e}}^{\mathrm{i}t\cdot {Y_{k}}}-\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{\mathrm{i}t\cdot {Y_{l}}}\Bigg)\Bigg({\mathrm{e}}^{-\mathrm{i}{t^{\prime }}\cdot {Y_{j}}}-\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{-\mathrm{i}{t^{\prime }}\cdot {Y_{l}}}\Bigg)\Bigg].\end{aligned}\]

A lengthy but otherwise straightforward calculation shows that

\[\begin{aligned}{}& \mathbb{E}\Bigg[\Bigg({\mathrm{e}}^{\mathrm{i}s\cdot {X_{k}}}-\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{\mathrm{i}s\cdot {X_{l}}}\Bigg)\Bigg({\mathrm{e}}^{-\mathrm{i}{s^{\prime }}\cdot {X_{j}}}-\frac{1}{N}{\sum \limits_{l=1}^{N}}{\mathrm{e}}^{-\mathrm{i}{s^{\prime }}\cdot {X_{l}}}\Bigg)\Bigg]\\{} & \hspace{2em}=\mathbb{E}\big({\mathrm{e}}^{\mathrm{i}s\cdot {X_{k}}-\mathrm{i}{s^{\prime }}\cdot {X_{j}}}\big)-\frac{N-1}{N}{f_{X}}(s)\overline{{f_{X}}\big({s^{\prime }}\big)}-\frac{1}{N}{f_{X}}\big(s-{s^{\prime }}\big),\end{aligned}\]

and an analogous formula holds for the ${Y_{i}}$. Summing over $k,j=1,\dots ,N$ and distinguishing between the cases $k=j$ and $k\ne j$ finally gives

\[\begin{aligned}{}& \mathbb{E}\big({Z_{N}}(s,t)\overline{{Z_{N}}\big({s^{\prime }},{t^{\prime }}\big)}\big)\\{} & =\bigg(\frac{{(N-1)}^{2}}{{N}^{3}}+\frac{N-1}{{N}^{3}}\bigg)\big({f_{X}}\big(s-{s^{\prime }}\big)-{f_{X}}(s)\overline{{f_{X}}\big({s^{\prime }}\big)}\big)\big({f_{Y}}\big(t-{t^{\prime }}\big)-{f_{Y}}(t)\overline{{f_{Y}}\big({t^{\prime }}\big)}\big),\end{aligned}\]

and the lemma follows. □

Remark 4.7.

a) If we symmetrize the expression for $N\cdot {}^{N}{V}^{2}$ in a suitable way, we can transform it into a degenerate U-statistic. For random variables with bounded second moments we can then use classical results to show the convergence to ${\sum _{k=1}^{\infty }}{\lambda _{k}}{X_{k}^{2}}$, where ${\lambda _{k}}$ are some coefficients and ${X_{k}}$ are i.i.d. standard normal random variables, see e.g. Serfling [32, Sec. 5.5.2] or Witting & Müller-Funk [39, Satz 7.183]. In order to relax the bounds on the moments one would have to show convergence of the corresponding ${\lambda _{k}}$.

b) The log-moment condition can be slightly relaxed, but this leads to a much more involved expression, cf. Csörgő [14], and for the case $\epsilon =0$ a counterexample is known, see Csörgő [12, p. 133]. Unfortunately, the convergence of the characteristic functions is stated without any moment condition in Murata [24, Thm. 4] which is based on Feuerverger & Mureika [16, Thm. 3.1].

Corollary 4.8.

Assume that X, Y are non-degenerate with $\mathbb{E}\varPhi (X)+\mathbb{E}\varPsi (Y)<\infty $ and set

\[ {a_{N}}:=\frac{1}{{N}^{2}}{\sum \limits_{i,k=1}^{N}}\varPhi ({X_{i}}-{X_{k}})\hspace{1em}\textit{and}\hspace{1em}{b_{N}}:=\frac{1}{{N}^{2}}{\sum \limits_{j,l=1}^{N}}\varPsi ({Y_{j}}-{Y_{l}}).\]

a) If X, Y are independent random variables satisfying the log-moment conditions $\mathbb{E}{\log }^{1+\epsilon }(1+|X{|}^{2})<\infty $ and $\mathbb{E}{\log }^{1+\epsilon }(1+|Y{|}^{2})<\infty $ for some $\epsilon >0$, then

(79)
\[ \frac{N\cdot {}^{N}{V}^{2}}{{a_{N}}{b_{N}}}{\underset{N\to \infty }{\overset{d}{\to }}}{\sum \limits_{k=1}^{\infty }}{\alpha _{k}}{X_{k}^{2}},\]
where ${X_{i}}$ are independent standard normal random variables and the coefficients ${\alpha _{k}}$ satisfy ${\alpha _{k}}\ge 0$ with ${\sum _{k=1}^{\infty }}{\alpha _{k}}=1$.
b) If the random variables X and Y are not independent, then

(80)
\[ \frac{N\cdot {}^{N}{V}^{2}}{{a_{N}}{b_{N}}}{\underset{N\to \infty }{\overset{d}{\to }}}\infty .\]

Proof.

We can almost literally follow the proof of Corollary 2 and Theorem 6 in Székely et al. [37]. Just note that ${a_{N}}{b_{N}}$ is an estimator for $\mathbb{E}\varPhi (X-{X^{\prime }})\cdot \mathbb{E}\varPsi (Y-{Y^{\prime }})$ and this is, by (64) and (76), the limit of the expectation of the numerator. □

5 Remarks on the uniqueness of the Cauchy distance covariance

It is a natural question whether it is possible to extend distance covariance further by taking a measure ρ in the definition (22) which does not factorize. To ensure the finiteness of $V(X,Y)$ one still has to assume

(81)

\[ \int \big(1\wedge |s{|}^{2}\big)\big(1\wedge |t{|}^{2}\big)\hspace{0.1667em}\rho (\mathrm{d}s,\mathrm{d}t)<\infty ,\]

see also Székely & Rizzo [36, Eq. (2.4)] and (22) at the beginning of Section 3.2. Furthermore, it is no restriction to assume that ρ is symmetric, since the integrand in (22) is symmetric, hence we can always symmetrize any non-symmetric measure ρ without changing the value of the integral. Thus, the function

(82)

\[ \varTheta (x,y):={\int _{{\mathbb{R}}^{n}}}{\int _{{\mathbb{R}}^{m}}}\big(1-\cos (x\cdot s)\big)\big(1-\cos (x\cdot t)\big)\hspace{0.1667em}\rho (\mathrm{d}s,\mathrm{d}t)\]

is well-defined and symmetric in each variable. The corresponding generalized distance covariance ${V}^{2}(X,Y)$ can be expressed by (28), if the expectations on the right-hand side are finite. Note, however, that the nice and computationally feasible representations of ${V}^{2}(X,Y)$ make essential use of the factorization of ρ which means that they are no longer available in this setting.

Let X and Y be random variables with values in ${\mathbb{R}}^{m}$ and ${\mathbb{R}}^{n}$, respectively, such that for some $x\in {\mathbb{R}}^{m}$ and $y\in {\mathbb{R}}^{n}$

(83)

\[ \mathbb{P}(X=0)=\mathbb{P}(X=x)=\mathbb{P}(Y=0)=\mathbb{P}(Y=y)=\frac{1}{2}.\]

A direct calculation of (28), using $\varTheta (0,y)=\varTheta (x,0)=0$, gives

(84)

\[ {V}^{2}(X,Y)=\gamma \cdot \varTheta (x,y),\]

with

(85)

\[\begin{aligned}{}\gamma & =2\mathbb{P}({X_{1}}\ne {X_{5}},{Y_{1}}\ne {Y_{6}})-\mathbb{P}({X_{1}}\ne {X_{4}},{Y_{1}}\ne {Y_{4}})\\{} & \hspace{1em}-\mathbb{P}({X_{2}}\ne {X_{5}})\mathbb{P}({Y_{3}}\ne {Y_{6}}),\end{aligned}\]

where $({X_{i}},{Y_{i}}),i=1,\dots ,6$, are i.i.d. copies of the random vector $(X,Y)$.

Now suppose that ${V}^{2}(X,Y)$ is homogeneous and/or rotationally invariant, i.e. for some $\alpha ,\beta \in (0,2)$, all scalars $a,b>0$ and orthogonal matrices $A\in {\mathbb{R}}^{n\times n},B\in {\mathbb{R}}^{m\times m}$

(86)

\[\begin{aligned}{}{V}^{2}(aX,bY)& ={a}^{\alpha }{b}^{\beta }{V}^{2}(X,Y),\end{aligned}\]

(87)

\[\begin{aligned}{}{V}^{2}(AX,BY)& ={V}^{2}(X,Y)\end{aligned}\]

hold. The homogeneity, (86), yields

(88)

\[ \varTheta (x,y)=|x{|}^{\alpha }|y{|}^{\beta }\varTheta \big(\frac{x}{|x|},\frac{y}{|y|}\big),\]

and the rotational invariance, (87), shows that $\varTheta (x/|x|,y/|y|)$ is a constant. In particular, homogeneity of degree $\alpha =\beta =1$ and rotational invariance yield that $\varTheta (x,y)=\text{const}\cdot |x|\cdot |y|$. Since the Lévy–Khintchine formula furnishes a one-to-one correspondence between the cndf and its Lévy triplet, see (the comments following) Theorem 2.1, this determines ρ uniquely: it factorizes into two Cauchy Lévy measures. This means that – even in a larger class of weights – the assumptions (86) and (87) imply a unique (up to a constant) choice of weights, and we have recovered Székely-and-Rizzo’s uniqueness result from [36].

Lemma 5.1.

Let ${V}^{2}(X,Y):=\| {f_{X,Y}}-{f_{X}}\otimes {f_{Y}}{\| _{{L}^{2}(\rho )}^{2}}$ be a generalized distance covariance as in Definition 2.3 and assume that the symmetric measure ρ satisfies the integrability condition (81). If ${V}^{2}(X,Y)$ is homogeneous of order $\alpha \in (0,2)$ and $\beta \in (0,2)$ and rotationally invariant in each argument, then the measure ρ defining ${V}^{2}(X,Y)$ is – up to a multiplicative constant – of the form

\[ \rho (\mathrm{d}s,\mathrm{d}t)=c(\alpha ,m)c(\beta ,n)|s{|}^{-\alpha -m}|t{|}^{-\beta -n}\hspace{0.1667em}\mathrm{d}s\hspace{0.1667em}\mathrm{d}t.\]

Moreover, ${V}^{2}(X,Y)$ can be represented by (28) with $\varTheta (x,y)=C\cdot |x{|}^{\alpha }\cdot |y{|}^{\beta }$.

For completeness, let us mention that the constants $c(\alpha ,m)$ and $c(\beta ,n)$ are of the form

\[ c(\alpha ,m)=\alpha {2}^{\alpha -1}{\pi }^{-m/2}\varGamma \big(\frac{\alpha +m}{2}\big)/\varGamma \big(1-\frac{\alpha }{2}\big),\]

see e.g. [10, p. 34, Example 2.4.d)] or [3, p. 184, Exercise 18.23].

6 Generalized distance correlation

We continue our discussion in the setting of Section 3.2. Let $\rho =\mu \otimes \nu $ and assume that μ and ν are symmetric Lévy measures on ${\mathbb{R}}^{m}\setminus \{0\}$ and ${\mathbb{R}}^{n}\setminus \{0\}$, each with full support. For m- and n-dimensional random variables X and Y the generalized distance covariance is, cf. Definition 3.1,

(89)

\[ V(X,Y)=\sqrt{\iint |{f_{(X,Y)}}(s,t)-{f_{X}}(s){f_{Y}}(t){|}^{2}\hspace{0.1667em}\mu (\mathrm{d}s)\hspace{0.1667em}\nu (\mathrm{d}t)}.\]

We set

(90)

\[\begin{aligned}{}V(X):=& \sqrt{\int |{f_{(X,X)}}(s,t)-{f_{X}}(s){f_{X}}(t){|}^{2}\hspace{0.1667em}\mu (\mathrm{d}s)\hspace{0.1667em}\mu (\mathrm{d}t)}\\{} =& \| {f_{(X,X)}}-{f_{X}}\otimes {f_{X}}{\| _{{L}^{2}(\mu \otimes \mu )}},\end{aligned}\]

(91)

\[\begin{aligned}{}V(Y):=& \sqrt{\int |{f_{(Y,Y)}}(s,t)-{f_{Y}}(s){f_{Y}}(t){|}^{2}\hspace{0.1667em}\nu (\mathrm{d}s)\hspace{0.1667em}\nu (\mathrm{d}t)}\\{} =& \| {f_{(Y,Y)}}-{f_{Y}}\otimes {f_{Y}}{\| _{{L}^{2}(\nu \otimes \nu )}},\end{aligned}\]

and define generalized distance correlation as

(92)

\[ R(X,Y):=\left\{\begin{array}{l@{\hskip10.0pt}l}\frac{V(X,Y)}{\sqrt{V(X)V(Y)}},\hspace{1em}& \text{if}\hspace{5pt}V(X)\cdot V(Y)>0,\\{} 0,\hspace{1em}& \text{otherwise.}\end{array}\right.\]

Using the Cauchy–Schwarz inequality it follows from (38) that, whenever $R(X,Y)$ is well defined, one has

(93)

\[ 0\le R(X,Y)\le 1\hspace{1em}\text{and}\hspace{1em}R(X,Y)=0\hspace{2.5pt}\text{iff}\hspace{2.5pt}X,Y\hspace{2.5pt}\text{are independent.}\]

The sample distance correlation is given by

(94)

\[ {}^{N}R\big(({x_{1}},{y_{1}}),\dots ,({x_{N}},{y_{N}})\big)={\Bigg(\frac{\frac{1}{{N}^{2}}{\textstyle\sum _{k,l=1}^{N}}{A_{kl}}{B_{kl}}}{\sqrt{\frac{1}{{N}^{2}}{\textstyle\sum _{k,l=1}^{N}}{A_{kl}}{A_{kl}}}\cdot \sqrt{\frac{1}{{N}^{2}}{\textstyle\sum _{k,l=1}^{N}}{B_{kl}}{B_{kl}}}}\Bigg)}^{\frac{1}{2}},\]

where we use the notation introduced in Lemma 4.2.

Example 6.1.

For standard normal random variables $X,Y$ with $\rho =\operatorname{Cor}(X,Y)$, $\varPhi (x)=|x|$ and $\varPsi (y)=|y|$ the distance correlation becomes

(95)

\[ R(X,Y)={\bigg(\frac{\sqrt{1-{\rho }^{2}}-\sqrt{4-{\rho }^{2}}+\rho (\arcsin \rho -\arcsin \frac{\rho }{2})+1}{1-\sqrt{3}+\pi /3}\bigg)}^{1/2}\le |\rho |,\]

cf. Székely & Rizzo [34, Thm. 6].

7 Gaussian covariance

Let us finally show that the results on Brownian covariance of Székely & Rizzo [34, Sec. 3] do have an analogue for the generalized distance covariance.

For a symmetric Lévy measure with corresponding continuous negative definite function $\varPhi :{\mathbb{R}}^{m}\to \mathbb{R}$ let ${G_{\varPhi }}$ be the Gaussian field indexed by ${\mathbb{R}}^{m}$ with

(96)

\[ \mathbb{E}{G_{\varPhi }}(x)=0\hspace{1em}\text{and}\hspace{1em}\mathbb{E}\big({G_{\varPhi }}(x){G_{\varPhi }}\big({x^{\prime }}\big)\big)=\varPhi (x)+\varPhi \big({x^{\prime }}\big)-\varPhi \big(x-{x^{\prime }}\big).\]

Analogously we define the random field ${G_{\varPsi }}$.

For a random variable Z with values in ${\mathbb{R}}^{d}$ and for a Gaussian random field G indexed by ${\mathbb{R}}^{d}$ we set

(97)

\[ {Z}^{G}:=G(Z)-\mathbb{E}\big(G(Z)\mid G\big).\]

Definition 7.1.

Let ${G_{\varPhi }}$, ${G_{\varPsi }}$ be mean-zero Gaussian random fields indexed by ${\mathbb{R}}^{m}$ and ${\mathbb{R}}^{n}$ and with covariance structure given by the cndfs Φ and Ψ, respectively. For any two m- and n-dimensional random variables X and Y the Gaussian covariance is defined as

(98)

\[ {G}^{2}(X,Y):={\operatorname{Cov}_{{G_{\varPhi }},{G_{\varPsi }}}^{2}}(X,Y):=\mathbb{E}\big({X_{1}^{{G_{\varPhi }}}}{X_{2}^{{G_{\varPhi }}}}{Y_{1}^{{G_{\varPsi }}}}{Y_{2}^{{G_{\varPsi }}}}\big),\]

where $({X_{1}},{Y_{1}}),({X_{2}},{Y_{2}})$ are i.i.d. copies of $(X,Y)$.

We can now identify Gaussian covariance and generalized distance covariance.

Theorem 7.2 (Gaussian covariance is generalized distance covariance).

Assume that X and Y satisfy $\mathbb{E}\varPhi (X)+\mathbb{E}\varPsi (Y)<\infty $. If ${G_{\varPhi }}$ and ${G_{\varPsi }}$ are independent, then

(99)

\[ {G}^{2}(X,Y)={V}^{2}(X,Y).\]

Proof.

The proof is similar to Székely & Rizzo [34, Thm. 8]. By conditioning and the independence of ${G_{\varPhi }}$ and ${G_{\varPsi }}$ we see

(100)

\[\begin{aligned}{}& \mathbb{E}\big({X_{1}^{{G_{\varPhi }}}}{X_{2}^{{G_{\varPhi }}}}{Y_{1}^{{G_{\varPsi }}}}{Y_{2}^{{G_{\varPsi }}}}\big)\\{} & =\mathbb{E}\big(\mathbb{E}\big({X_{1}^{{G_{\varPhi }}}}{X_{2}^{{G_{\varPhi }}}}\mid {X_{1}},{X_{2}},{Y_{1}},{Y_{2}}\big)\cdot \mathbb{E}\big({Y_{1}^{{G_{\varPsi }}}}{Y_{2}^{{G_{\varPsi }}}}\mid {X_{1}},{X_{2}},{Y_{1}},{Y_{2}}\big)\big).\end{aligned}\]

Using $\mathbb{E}({G_{\varPhi }}(x){G_{\varPhi }}({x^{\prime }}))=\varPhi (x)+\varPhi ({x^{\prime }})-\varPhi (x-{x^{\prime }})=:\varphi (x,{x^{\prime }})$ yields

\[\begin{aligned}{}& \mathbb{E}\big({X_{1}^{{G_{\varPhi }}}}{X_{2}^{{G_{\varPhi }}}}\mid {X_{1}},{X_{2}},{Y_{1}},{Y_{2}}\big)\\{} & =\varphi ({X_{1}},{X_{2}})-\mathbb{E}\big(\varphi ({X_{1}},{X_{2}})\mid {X_{1}}\big)-\mathbb{E}\big(\varphi ({X_{1}},{X_{2}})\mid {X_{2}}\big)+\mathbb{E}\varphi ({X_{1}},{X_{2}})\\{} & =-\varPhi ({X_{1}}-{X_{2}})+\mathbb{E}\big(\varPhi ({X_{1}}-{X_{2}})\mid {X_{1}}\big)+\mathbb{E}\big(\varPhi ({X_{1}}-{X_{2}})\mid {X_{2}}\big)\\{} & \hspace{2em}-\mathbb{E}\varPhi ({X_{1}}-{X_{2}}),\end{aligned}\]

where the second equality is due to cancellations. An analogous calculation for $\mathbb{E}({Y_{1}^{{G_{\varPsi }}}}{Y_{2}^{{G_{\varPsi }}}}\mid {X_{1}},{X_{2}},{Y_{1}},{Y_{2}})$ turns (100) into (35).

For (100) we have to make sure that $\mathbb{E}(|{X_{1}^{{G_{\varPhi }}}}{X_{2}^{{G_{\varPhi }}}}{Y_{1}^{{G_{\varPsi }}}}{Y_{2}^{{G_{\varPsi }}}}|)<\infty $. This follows from

\[\begin{aligned}{}& \mathbb{E}\left(|{X_{1}^{{G_{\Phi }}}}{X_{2}^{{G_{\Phi }}}}{Y_{1}^{{G_{\Psi }}}}{Y_{2}^{{G_{\Psi }}}}|\right)\\{} & =\mathbb{E}\left(\mathbb{E}\left[|{X_{1}^{{G_{\Phi }}}}{X_{2}^{{G_{\Phi }}}}|\hspace{0.2778em}\Big|\hspace{0.2778em}{X_{1}},{X_{2}},{Y_{1}},{Y_{2}}\right]\mathbb{E}\left[|{Y_{1}^{{G_{\Psi }}}}{Y_{2}^{{G_{\Psi }}}}|\hspace{0.2778em}\Big|\hspace{0.2778em}{X_{1}},{X_{2}},{Y_{1}},{Y_{2}}\right]\right)\\{} & \le \mathbb{E}\bigg(\sqrt{\mathbb{E}\left[|{X_{1}^{{G_{\Phi }}}}{|}^{2}\hspace{0.2778em}\Big|\hspace{0.2778em}{X_{1}},{X_{2}},{Y_{1}},{Y_{2}}\right]\mathbb{E}\left[|{X_{2}^{{G_{\Phi }}}}{|}^{2}\hspace{0.2778em}\Big|\hspace{0.2778em}{X_{1}},{X_{2}},{Y_{1}},{Y_{2}}\right]}\\{} & \hspace{2em}\hspace{1em}\times \sqrt{\mathbb{E}\left[|{Y_{1}^{{G_{\Psi }}}}{|}^{2}\hspace{0.2778em}\Big|\hspace{0.2778em}{X_{1}},{X_{2}},{Y_{1}},{Y_{2}}\right]\mathbb{E}\left[|{Y_{2}^{{G_{\Psi }}}}{|}^{2}\hspace{0.2778em}\Big|\hspace{0.2778em}{X_{1}},{X_{2}},{Y_{1}},{Y_{2}}\right]}\bigg)\\{} & =\mathbb{E}\bigg(\sqrt{\mathbb{E}\left[|{X_{1}^{{G_{\Phi }}}}{|}^{2}\hspace{0.2778em}\Big|\hspace{0.2778em}{X_{1}},{Y_{1}}\right]\mathbb{E}\left[|{X_{2}^{{G_{\Phi }}}}{|}^{2}\hspace{0.2778em}\Big|\hspace{0.2778em}{X_{2}},{Y_{2}}\right]}\\{} & \hspace{2em}\hspace{1em}\times \sqrt{\mathbb{E}\left[|{Y_{1}^{{G_{\Psi }}}}{|}^{2}\hspace{0.2778em}\Big|\hspace{0.2778em}{X_{1}},{Y_{1}}\right]\mathbb{E}\left[|{Y_{2}^{{G_{\Psi }}}}{|}^{2}\hspace{0.2778em}\Big|\hspace{0.2778em}{X_{2}},{Y_{2}}\right]}\bigg)\\{} & =\mathbb{E}{\bigg(\sqrt{\mathbb{E}\left[|{X_{1}^{{G_{\Phi }}}}{|}^{2}\hspace{0.2778em}\Big|\hspace{0.2778em}{X_{1}},{Y_{1}}\right]}\bigg)}^{2}\mathbb{E}{\bigg(\sqrt{\mathbb{E}\left[|{Y_{1}^{{G_{\Psi }}}}{|}^{2}\hspace{0.2778em}\Big|\hspace{0.2778em}{X_{1}},{Y_{1}}\right]}\bigg)}^{2}\\{} & \le \mathbb{E}\left(|{X_{1}^{{G_{\Phi }}}}{|}^{2}\right)\mathbb{E}\left(|{Y_{1}^{{G_{\Psi }}}}{|}^{2}\right).\end{aligned}\]

In this calculation we use first the independence of ${G_{\Phi }}$ and ${G_{\Psi }}$, the conditional Cauchy–Schwarz inequality and the fact that the random variables $({X_{1}},{Y_{1}})$ and $({X_{2}},{Y_{2}})$ are i.i.d. In the final estimate we use again the Cauchy–Schwarz inequality. In order to see that the right-hand side is finite, we note that (96) and (97) yield

\[\begin{aligned}{}\mathbb{E}\left(|{X}^{{G_{\Phi }}}{|}^{2}\right)& =\mathbb{E}\left(|{G_{\Phi }}(X)-\mathbb{E}({G_{\Phi }}(X)\mid {G_{\Phi }}){|}^{2}\right)\\{} & \le 2\mathbb{E}\left(|{G_{\Phi }}(X){|}^{2}\right)=4\mathbb{E}\Phi (X)<\infty .\end{aligned}\]

A similar estimate for Y completes the proof. □

8 Conclusion

We have shown that the concept of distance covariance introduced by Székely et al. [37] can be embedded into a more general framework based on Lévy measures, cf. Section 3. In this generalized setting the key results for statistical applications are: the convergence of the estimators and the fact that also the limit distribution of the (scaled) estimators is known, cf. Section 4. Moreover – for applications this is of major importance – the estimators have the numerically efficient representation (50).

The results allow the use of generalized distance covariance in the tests for independence developed for distance covariance, e.g. tests based on a general Gaussian quadratic form estimate or resampling tests. The test statistic is the function $T:=\frac{N\cdot {}^{N}{V}^{2}}{{a_{N}}{b_{N}}}$ discussed in Corollary 4.8. Using the quadratic form estimate (see [37] for details) its p-value can be estimated by $1-F(T)$ where F is the distribution function of the Chi-squared distribution with 1 degree of freedom. This test and resampling tests are studied in detail in [9] and [7], respectively. In addition, these papers contain illustrating examples which show that the new flexibility provided by the choice of the Lévy measures (equivalently: by the choice of the continuous negative definite function) can be used to improve the power of these tests. Moreover, new test procedures using distance covariance and its generalizations are developed in [5].

Finally, the presented results are also the foundation for a new approach to testing and measuring multivariate dependence, i.e. the mutual (in)dependence of an arbitrary number of random vectors. This approach is developed in [9] accompanied by extensive examples and further applications in [7]. All functions required for the use of generalized distance covariance in applications are implemented in the R package multivariance [8].

Acknowledgement

We are grateful to Ulrich Brehm (TU Dresden) for insightful discussions on (elementary) symmetric polynomials and to Georg Berschneider (TU Dresden) who read and commented on the whole text. We would also like to thank Gabor J. Székely (NSF) for advice on the current literature.

Footnotes

² This is an immediate consequence of the characterization of independence using characteristic functions: X, Y are independent if, and only if, $\mathbb{E}{\mathrm{e}}^{\mathrm{i}\xi \cdot X+\mathrm{i}\eta \cdot Y}=\mathbb{E}{\mathrm{e}}^{\mathrm{i}\xi \cdot X}\mathbb{E}{\mathrm{e}}^{\mathrm{i}\eta \cdot Y}$ for all $\xi ,\eta $.

³ This is a convenient way to say that $\mathcal{L}(U)=\mathcal{L}((X,Y))$ while $\mathcal{L}(V)=\mathcal{L}(X)\otimes \mathcal{L}(Y)$.

⁴ This argument also shows that the product measure ρ can only be a Lévy measure, if the marginals are finite measures. In this case, ρ is itself a finite measure.

⁵ In other words: in an expression of the form $f({X_{i}},{X_{j}},{Y_{k}},{Y_{l}})$ all random variables are independent if, and only if, all indices are different. As soon as two indices coincide, we have (some kind of) dependence.

⁶ As before, we denote in these formulae by $({X_{i}},{Y_{i}})$, $i=1,\dots ,6$, i.i.d. copies of $(X,Y)$.

References

[1]

Bakirov, N.K., Székely, G.J.: Brownian covariance and central limit theorem for stationary sequences. Theory of Probability and its Applications 55(3), 371–394 (2011) MR2768533. https://doi.org/10.1137/S0040585X97984954

[2]

Banyamini, Y., Lindenstrauss, J.: Geometric Nonlinear Functional Analysis. American Mathematical Society, Providence (RI) (2000) MR1727673

[3]

Berg, C., Forst, G.: Potential Theory on Locally Compact Abelian Groups. Springer, Berlin (1975) MR0481057

[4]

Berrett, T.B., Samworth, R.J.: Nonparametric independence testing via mutual information. arXiv: 1711.06642v1 (2017)

[5]

Berschneider, G., Böttcher, B.: On complex Gaussian random fields, Gaussian quadratic forms and sample distance multivariance. arXiv: 1808.07280v1 (2018)

[6]

Bickel, P.J., Xu, Y.: Discussion of: Brownian distance covariance. The Annals of Applied Statistics 3(4), 1266–1269 (2009) MR2752128. https://doi.org/10.1214/09-AOAS312A

[7]

Böttcher, B.: Dependence structures - estimation and visualization using distance multivariance. arXiv: 1712.06532v1 (2017)

[8]

Böttcher, B.: Multivariance: Measuring Multivariate Dependence Using Distance Multivariance. (2017). R package version 1.0.5

[9]

Böttcher, B., Keller-Ressel, M., Schilling, R.L.: Distance Multivariance: New dependence measures for random vectors (submitted). Revised version of arXiv: 1711.07775v1 (2017)

[10]

Böttcher, B., Schilling, R.L., Wang, J.: Lévy-Type Processes: Construction, Approximation and Sample Path Properties. Lecture Notes in Mathematics, Lévy Matters, vol. 2099. Springer (2013) MR3156646. https://doi.org/10.1007/978-3-319-02684-8

[11]

Cope, L.: Discussion of: Brownian distance covariance. The Annals of Applied Statistics 3(4), 1279–1281 (2009). doi:https://doi.org/10.1214/00-AOAS312C MR2752130. https://doi.org/10.1214/00-AOAS312C

[12]

Csörgő, S.: Limit behaviour of the empirical characteristic function. The Annals of Probability, 130–144 (1981)MR0606802

[13]

Csörgő, S.: Multivariate empirical characteristic functions. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 55(2), 203–229 (1981). MR0608017. https://doi.org/10.1007/BF00535160

[14]

Csörgő, S.: Testing for independence by the empirical characteristic function. Journal of Multivariate Analysis 16(3), 290–299 (1985) MR0793494. https://doi.org/10.1016/0047-259X(85)90022-3

[15]

Feuerverger, A.: Discussion of: Brownian distance covariance. The Annals of Applied Statistics 3(4), 1282–1284 (2009) MR2752131. https://doi.org/10.1214/09-AOAS312D

[16]

Feuerverger, A., Mureika, R.A.: The empirical characteristic function and its applications. The Annals of Statistics, 88–97 (1977)MR0428584

[17]

Genovese, C.R.: Discussion of: Brownian distance covariance. The Annals of Applied Statistics 3(4), 1299–1302 (2009). doi:https://doi.org/10.1214/09-AOAS312G MR2752134. https://doi.org/10.1214/09-AOAS312G

[18]

Gretton, A., Fukumizu, K., Sriperumbudur, B.K.: Discussion of: Brownian distance covariance. The Annals of Applied Statistics 3(4), 1285–1294 (2009). MR2752132. https://doi.org/10.1214/09-AOAS312E

[19]

Jacob, N.: Pseudo-Differential Operators and Markov Processes I. Fourier Analysis and Semigroups. Imperial College Press, London (2001) MR1873235. https://doi.org/10.1142/9781860949746

[20]

Jacob, N., Knopova, V., Landwehr, S., Schilling, R.L.: A geometric interpretation of the transition density of a symmetric Lévy process. Science China: Mathematics 55, 1099–1126 (2012) MR2925579. https://doi.org/10.1007/s11425-012-4368-0

[21]

Jin, Z., Matteson, D.S.: Generalizing Distance Covariance to Measure and Test Multivariate Mutual Dependence. arXiv: 1709.02532v1 (2017)

[22]

Kosorok, M.R.: Discussion of: Brownian distance covariance. The Annals of Applied Statistics 3(4), 1270–1278 (2009). doi:https://doi.org/10.1214/09-AOAS312B MR2752129. https://doi.org/10.1214/09-AOAS312B

[23]

Lyons, R.: Distance covariance in metric spaces. The Annals of Probability 41(5), 3284–3305 (2013) MR3127883. https://doi.org/10.1214/12-AOP803

[24]

Murata, N.: Properties of the empirical characteristic function and its application to testing for independence. In: Third International Workshop on Independent Component Analysis and Signal Separation (ICA2001), pp. 295–300 (2001)

[25]

Newton, M.A.: Introducing the discussion paper by Székely and Rizzo. The Annals of Applied Statistics 3(4), 1233–1235 (2009). MR2752126. https://doi.org/10.1214/09-AOAS34INTRO

[26]

Rémillard, B.: Discussion of: Brownian distance covariance. The Annals of Applied Statistics 3(4), 1295–1298 (2009). doi:https://doi.org/10.1214/09-AOAS312F MR2752133. https://doi.org/10.1214/09-AOAS312F

[27]

Sasvári, Z.: Positive Definite and Definitizable Functions. Akademie-Verlag, Berlin (1994) MR1270018

[28]

Sato, K.: Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, Cambridge (1999) MR1739520

[29]

Schilling, R., Schnurr, A.: The symbol associated with the solution of a stochastic differential equation. Electronic Journal of Probability 15, 1369–1393 (2010) MR2721050. https://doi.org/10.1214/EJP.v15-807

[30]

Schilling, R.L., Song, R., Vondraček, Z.: Bernstein Functions. Theory and Applications, 2nd edn. de Gruyter (2012) MR2978140. https://doi.org/10.1515/9783110269338

[31]

Schoenberg, I.J.: Metric spaces and positive definite functions. Transactions of the American Mathematical Society 44(3), 522–536 (1938) MR1501980. https://doi.org/10.2307/1989894

[32]

Serfling, R.J.: Approximation Theorems of Mathematical Statistics. John Wiley & Sons (2009) MR0595165

[33]

Székely, G.J., Rizzo, M.L.: Hierarchical clustering via joint between-within distances: Extending ward’s minimum variance method. Journal of Classification 22(2), 151–183 (2005) MR2231170. https://doi.org/10.1007/s00357-005-0012-9

[34]

Székely, G.J., Rizzo, M.L.: Brownian distance covariance. The Annals of Applied Statistics 3(4), 1236–1265 (2009) MR2752127. https://doi.org/10.1214/09-AOAS312

[35]

Székely, G.J., Rizzo, M.L.: Rejoinder: Brownian distance covariance. The Annals of Applied Statistics 3(4), 1303–1308 (2009). doi:https://doi.org/10.1214/09-AOAS312REJ MR2752135. https://doi.org/10.1214/09-AOAS312REJ

[36]

Székely, G.J., Rizzo, M.L.: On the uniqueness of distance covariance. Statistics and Probability Letters 82(12), 2278–2282 (2012) MR2979766. https://doi.org/10.1016/j.spl.2012.08.007

[37]

Székely, G.J., Rizzo, M.L., Bakirov, N.K.: Measuring and testing dependence by correlation of distances. The Annals of Statistics 35(6), 2769–2794 (2007) MR2382665. https://doi.org/10.1214/009053607000000505

[38]

Ushakov, N.G.: Selected Topics in Characteristic Functions. VSP (1999) MR1745554. https://doi.org/10.1515/9783110935981

[39]

Witting, H., Müller-Funk, U.: Mathematische Statistik II. Teubner, Stuttgart (1995) MR1363716. https://doi.org/10.1007/978-3-322-90152-1

[40]

Zastavnyi, V.P.: On positive definiteness of some functions. Journal of Multivariate Analysis 73(3), 55–81 (2000) MR1766121. https://doi.org/10.1006/jmva.1999.1864

Exit Reading

Table of contents

1 Introduction
2 Fundamental results
3 Generalized distance covariance
4 Estimating generalized distance covariance
5 Remarks on the uniqueness of the Cauchy distance covariance
6 Generalized distance correlation
7 Gaussian covariance
8 Conclusion
Acknowledgement
Footnotes
References

RSS

Authors

Abstract

1 Introduction

(1)

(2)

(3)

Notation.

2 Fundamental results

2.1 Negative definite functions

Theorem 2.1.

(4)

(5)

(6)

Table 1.

Lemma 2.2.

Proof of Lemma 2.2.

(7)

(8)

(9)

2.2 Measuring independence of random variables with metrics

(10)

Definition 2.3.

(11)

Proposition 2.4.

(12)

(13)

(14)

(15)

Proof.

(16)

(17)

Remark 2.5.

(18)

(19)

(20)

Lemma 2.6.

(21)

Proof.

2.3 An elementary estimate for log-moments

Lemma 2.7.

Proof.

3 Generalized distance covariance

Definition 3.1.

(22)

(23)

Lemma 3.2.

3.1 Generalized distance covariance with finite Lévy measures

(24)

(25)

Proposition 3.3.

(26)

(27)

(28)

(29)

Corollary 3.4.

(30)

(31)

(32)

(33)

(34)

Corollary 3.5.

(35)

Proof.

3.2 Generalized distance covariance with arbitrary Lévy measures

(36)

Lemma 3.6.

(37)

Proof.

(38)

(39)

(40)

Theorem 3.7.

(41)

(42)

Proof.

Remark 3.8.

(43)

(44)

4 Estimating generalized distance covariance

(45)