Optimal transport between determinantal point processes and application to fast simulation

Decreusefond, Laurent; Moroz, Guillaume

doi:10.15559/21-VMSTA180

Abstract

Two optimal transport problems between determinantal point processes (DPP for short) are investigated. It is shown how to estimate the Kantorovitch–Rubinstein and Wasserstein-2 distances between distributions of DPP. These results are applied to evaluate the accuracy of a fast but approximate simulation algorithm of the Ginibre point process restricted to a circle. One can now simulate in a reasonable amount of time more than ten thousands points.

1 Introduction

Determinantal point processes (DPP) have been introduced in the seventies [24] to model fermionic particles with repulsion like electrons. They recently regained interest since they represent locations of eigenvalues of some random matrices. A determinantal point process is characterized by an integral operator with kernel K and a reference measure m. The integral operator is compact and symmetric and thus characterized by its eigenfunctions and its eigenvalues. Following [18], the eigenvalues are not measurable functions of the realizations of the point process so it is difficult to devise how a modification of the eigenfunctions, respectively of the eigenvalues or of the reference measure, modifies the random configurations of a DPP.

A careful analysis of the simulation algorithm given in [18] yields several answers to these questions. For instance, it is clear that the eigenvalues control the distribution of the number of points and the eigenfunctions determine the positions of the atoms once their number is known. The above mentioned algorithm is a beautiful piece of work but requires to draw points according to distributions whose densities are not expressed as combinations of classical functions, hence the necessity to use rejection sampling method. Unfortunately, as the number of drawn points increases, the densities present high peaks and deep valleys which is the most adverse situation for rejection sampling, see Figure 1.

Fig. 1.

Peaks and valleys of some densities

As a consequence, it is hardly feasible to simulate a DPP with more than one thousand points in less than a few hours. As some DPP arise as locations of eigenvalues of some matrix ensembles, it may seem faster and simpler to draw random matrices and compute their eigenvalues with the optimized libraries to do so. However, there are several drawbacks to this approach: a) we cannot control the domain into which the points fall, and for some applications it may be important to simulate DPP restricted to some compact sets, b) as eigenvalues belong to R or C, we cannot imagine DPP in higher dimensions with this approach, c) for a stationary DPP, it is often useful to simulate under the Palm measure (see below) which no longer corresponds to a known random matrix ensemble.

Several refinements of Algorithm 1 have been proposed along the years but the most advanced contributions have been made for DPP on lattices, which are of a totally different nature than continuous DPP. We here propose to speed up the simulation of a DPP by reducing the number of eigenvalues considered and approximating the eigenfunctions by functions whose quadrature can be easily inversed to get rid of the rejection part.

We evaluate the impact of these approximations by bounding the distances between the original distribution of the DPP to be simulated and the distribution according to which the points are drawn. The computations of the error margin are specific to the Ginibre DPP but could be done for other processes with radial symmetry like polyanalytic Ginibre ensembles [17] or the Bergman DPP [19].

Actually, there are several notions of distances between the distributions of point processes (see [11] and references therein). We focus here on the total variation distance and on the quadratic Wasserstein distance. The former counts the difference of the number of points in an optimal coupling between two distributions. The latter evaluates the matching distance between two realizations of an optimal coupling provided that it exists.

The paper is organized as follows. We first recall the definition and salient properties of DPP. In Section 3, we briefly introduce the optimal transportation problem in its full generality and give some elements dedicated to point processes. In Section 4, we show how the eigenvalues and eigenfunctions do appear in the evaluation of the distances under scrutiny. In Section 5, we apply these results to the simulation of the Ginibre process.

2 Determinantal point processes

Let E be a Polish space, $\mathcal{O}(E)$ the family of all nonempty open subsets of E and $\mathcal{B}$ denotes the corresponding Borel σ-algebra. In the sequel, m is a Radon measure on $(E,\mathcal{B})$. Let ${\mathfrak{N}_{E}}$ be the space of locally finite subsets of E, also called the configuration space:

\[ {\mathfrak{N}_{E}}=\{\xi \subset E\hspace{0.1667em}:\hspace{0.1667em}|\Lambda \cap \xi |<\infty \hspace{0.1667em}\hspace{2.5pt}\text{for any compact set}\hspace{2.5pt}\Lambda \subset E\},\]

equipped with the topology of the vague convergence – the topology generated by the seminorms

\[ {p_{f}}(\xi )=\left|\sum \limits_{x\in \xi }f(x)\right|\]

for $f\hspace{0.1667em}:\hspace{0.1667em}E\to \mathbf{R}$ continuous with compact support. This makes ${\mathfrak{N}_{E}}$ a Polish space and we denote by ${\mathcal{F}_{{\mathfrak{N}_{E}}}}$ its Borelean σ-field (see [20]). We call elements of ${\mathfrak{N}_{E}}$ configurations and identify a locally finite configuration ξ with the atomic Radon measure ${\textstyle\sum _{y\in \xi }}{\delta _{y}}$, where we have written ${\delta _{y}}$ for the Dirac measure at $y\in E$.

Next, let ${\mathfrak{N}_{E}^{f}}=\{\xi \in {\mathfrak{N}_{E}}\hspace{0.1667em}:\hspace{0.1667em}\xi (E)<\infty \}$, the space of all finite configurations on E. It is naturally equipped with the trace σ-algebra ${\mathcal{F}_{{\mathfrak{N}_{E}^{f}}}}=\mathcal{F}{|_{{\mathfrak{N}_{E}^{f}}}}$. Note that

\[ {\mathfrak{N}_{E}^{f}}={\bigcup \limits_{n=0}^{\infty }}{\mathfrak{N}_{E}^{(n)}}\]

where

\[ {\mathfrak{N}_{E}^{(n)}}=\{\xi \in {\mathfrak{N}_{E}^{f}},\hspace{0.1667em}\xi (E)=n\}.\]

Since ${\mathfrak{N}_{E}^{(n)}}$ can be identified with ${E^{n}}/{\mathfrak{S}_{n}}$ where ${\mathfrak{S}_{n}}$ is the group of permutations over n elements, every function $f\hspace{0.1667em}:\hspace{0.1667em}{\mathfrak{N}_{E}^{f}}\to \mathbf{R}$ is in fact equivalent to a family of functions $({f_{n}},\hspace{0.1667em}n\ge 1)$ where ${f_{n}}$ is a symmetric function from ${E^{n}}$ to R.

Definition 1.

A locally finite point process is defined as a probability measure on $({\mathfrak{N}_{E}},{\mathcal{F}_{{\mathfrak{N}_{E}}}})$ or as an ${\mathfrak{N}_{E}}$-valued random variable.

A finite point process is a probability measure on $({\mathfrak{N}_{E}^{f}},{\mathcal{F}_{{\mathfrak{N}_{E}^{f}}}})$ or an ${\mathfrak{N}_{E}^{f}}$-valued random variable.

For ${\Xi _{\mu }}$, an ${\mathfrak{N}_{E}^{f}}$-valued random variable of distribution μ, we denote by ${\mu _{n}}$ the distribution of ${\Xi _{\mu }}$ given ${\Xi _{\mu }}(E)=n$. In view of the previous remark, it can be seen as a probability measure on ${E^{n}}$, invariant by the action of ${\mathfrak{S}_{n}}$, which describes the location of the particles given their number.

A random point process μ is characterized by its Laplace transform, which is defined for any measurable nonnegative function f on E as

\[ {\mathcal{L}_{\mu }}(f)={\int _{{\mathfrak{N}_{E}}}}{e^{-{\textstyle\sum _{x\in \xi }}f(x)}}\operatorname{d}\hspace{-0.1667em}\mu (\xi ).\]

A point process is simple if with probability 1, for any $x\in E$, ${\Xi _{\mu }}(\{x\})\le 1$. In the sequel, we consider only simple point processes so that we remove the word simple. Point processes are also often characterized via their correlation functions defined as follows.

Definition 2.

A point process μ is said to have correlation functions $({\rho _{n}},\hspace{0.1667em}n\ge 1)$ if for any ${A_{1}},\dots ,{A_{n}}$, disjoint bounded Borel subsets of E,

\[ \mathbf{E}\left[{\prod \limits_{i=1}^{n}}{\Xi _{\mu }}({A_{i}})\right]={\int _{{A_{1}}\times \cdots \times {A_{n}}}}{\rho _{n}}({x_{1}},\dots ,{x_{n}})\operatorname{d}\hspace{-0.1667em}m({x_{1}})\dots \operatorname{d}\hspace{-0.1667em}m({x_{n}}).\]

We see that ${\rho _{1}}$ is the mean density of particles with respect to m, and

\[ {\rho _{n}}({x_{1}},\dots ,{x_{n}})\operatorname{d}\hspace{-0.1667em}m({x_{1}})\dots \operatorname{d}\hspace{-0.1667em}m({x_{n}})\]

is the probability of finding at least one particle in the vicinity of each ${x_{i}}$, $i=1,\dots ,n$.

Definition 3.

A measure μ on ${\mathfrak{N}_{E}^{f}}$ is regular with respect to the reference measure m when, for all $n\ge 1$, there exist

\[\begin{aligned}{}{j_{n}}\hspace{0.1667em}:\hspace{0.1667em}{\mathfrak{N}_{E}^{(n)}}& \longrightarrow {\mathbf{R}^{+}},\\ {} \{{x_{1}},\dots ,{x_{n}}\}& \longmapsto {j_{n}}({x_{1}},\dots ,{x_{n}})\end{aligned}\]

where ${j_{n}}$ is symmetric on ${E^{n}}$ such that for any measurable bounded $f:{\mathfrak{N}_{E}^{f}}\to \mathbf{R}$,

(1)

\[ E[f({\Xi _{\mu }})]=f(\varnothing )+{\sum \limits_{n=1}^{\infty }}\frac{1}{n!}{\int _{{E^{n}}}}f({x_{1}},\dots ,{x_{n}}){j_{n}}({x_{1}},\dots ,{x_{n}})\operatorname{d}\hspace{-0.1667em}m({x_{1}})\dots \operatorname{d}\hspace{-0.1667em}m({x_{n}}).\]

The function ${j_{n}}$ is called the n-th Janossy density. Intuitively, it can be viewed as the probability to have exactly n points located in the vicinity of $({x_{1}},\dots ,{x_{n}})$.

For details about the relationships between correlation functions and Janossy densities, see [7].

2.1 Determinantal point processes

For details, we mainly refer to [30]. A (symmetric) determinantal point process on X is characterized by a kernel K and a reference measure m. The map K is supposed to be a Hilbert–Schmidt operator from ${L^{2}}(E,\hspace{0.1667em}m)$ into ${L^{2}}(E,\hspace{0.1667em}m)$ which satisfies the following conditions:

1. K is a bounded symmetric integral operator on ${L^{2}}(E,\hspace{0.1667em}m)$, with kernel $K(\cdot ,\cdot )$, i.e., for any $x\in E$,
\[ Kf(x)={\int _{E}}K(x,y)f(y)\operatorname{d}\hspace{-0.1667em}m(y).\]
2. The spectrum of K is included in $[0,\hspace{0.1667em}1]$.
3. The map K is locally trace-class, i.e., for all compact $\Lambda \subset E$, the restriction ${K_{\Lambda }}={P_{\Lambda }}K{P_{\Lambda }}$ of K to ${L^{2}}(\Lambda ,\hspace{0.1667em}m)$ is trace-class. Here and hereafter, ${P_{\Lambda }}$ denotes the orthogonal projection from ${L^{2}}(E,\hspace{0.1667em}m)$ onto ${L^{2}}(\Lambda ,\hspace{0.1667em}m)$.

Definition 4.

The determinantal measure on ${\mathfrak{N}_{E}}$ with characteristics K and m can be defined through its correlation functions

\[ {\rho _{n,\hspace{0.1667em}K}}({x_{1}},\dots ,\hspace{0.1667em}\hspace{0.1667em}{x_{n}})=\det \big(K{\left({x_{k}},\hspace{0.1667em}{x_{l}}\big)\right)_{1\le k,l\le n}},\]

and for $n=0$, ${\rho _{0,\hspace{0.1667em}K}}(\varnothing )=1$.

There is a particular class of DPP which is the building blocks on which general DPP are built upon.

Definition 5.

A DPP whose spectrum is reduced to the singleton $\{1\}$ is called a projection DPP. Actually, its kernel is of the form

\[ {K_{\phi }}(x,y)={\sum \limits_{j=0}^{M}}{\phi _{j}}(x){\phi _{j}}(y)\]

where $M\in \mathbf{N}\cup \{\infty \}$ and $({\phi _{j}},\hspace{0.1667em}j=0,\dots ,M)$ is a family of orthonormal functions of ${L^{2}}(E,m)$.

If M is finite, then almost all configurations of such a point process have M atoms.

Alternatively, when the spectrum of K does not contain 1, we can define a DPP through its Janossy densities. In this situation, the properties of K ensure that there exists a sequence $({\lambda _{j}},\hspace{0.1667em}j\ge 0)$ of elements of $[0,1)$ with no accumulation point but 0 and a complete orthonormal basis $({\phi _{j}},\hspace{0.1667em}j\ge 0)$ of ${L^{2}}(E,m)$ such that

\[ {K_{\phi }}(x,y)=\sum \limits_{j\ge 0}{\lambda _{j}}\hspace{2.5pt}{\phi _{j}}(x){\phi _{j}}(y).\]

Note that if ${L^{2}}(E,\hspace{0.1667em}m)$ is a C-vector space, we must modify this definition accordingly:

\[ {K_{\phi }}(x,y)=\sum \limits_{j\ge 0}{\lambda _{j}}\hspace{2.5pt}{\phi _{j}}(x)\overline{{\phi _{j}}(y)}.\]

For a compact subset $\Lambda \subset X$, the map ${J_{\Lambda }}$ is defined by

\[ {J_{\Lambda }}={\left(\operatorname{Id}-{K_{\Lambda }}\right)^{-1}}{K_{\Lambda }},\]

so that we have

\[ \left(\operatorname{Id}-{K_{\Lambda }}\right)\left(\operatorname{Id}+{J_{\Lambda }}\right)=\operatorname{Id}.\]

For any compact $\Lambda \subset E$, the operator ${J_{\Lambda }}$ is a Hilbert–Schmidt, trace-class operator, whose spectrum is included in $[0,+\infty )$. We denote by ${J_{\Lambda }}$ its kernel. For any $n\in \mathbf{N}$, any compact $\Lambda \subset E$, and any $({x_{1}},\dots ,\hspace{0.1667em}{x_{n}})\in {\Lambda ^{n}}$, the n-th Janossy density is given by

(2)

\[ {j_{n}^{\Lambda }}\left({x_{1}},\hspace{0.1667em}\dots ,\hspace{0.1667em}{x_{n}}\right)=\det {\left({J_{\Lambda }}\left({x_{k}},\hspace{0.1667em}{x_{l}}\right)\right)_{1\le k,l\le n}}.\]

We can now state how the characteristics of a DPP are modified by some usual transformations on the configurations.

Theorem 6.

Let μ be a DPP on ${\mathbf{R}^{k}}$ with kernel K and reference measure $m=h\operatorname{d}\hspace{-0.1667em}x$. Let $({\lambda _{j}},\hspace{0.1667em}j\ge 0)$ be its eigenvalues counted with multiplicity and $({\phi _{j}},\hspace{0.1667em}j\ge 0)$ the corresponding eigenfunctions.

1. A random thinning of probability p (i.e. we keep each atom with probability p independently of the others) transforms μ into a DPP with kernel $pK$.
2. A dilation of ratio ρ transforms μ into a DPP of kernel
\[ {K_{\rho }}(x,y)=\frac{1}{\rho }\hspace{0.1667em}K({\rho ^{-1/k}}x,\hspace{0.1667em}{\rho ^{-1/k}}y).\]
3. If H is a ${\mathcal{C}^{1}}$-diffeomorphism on E, then
\[\begin{aligned}{}\mathcal{H}\hspace{0.1667em}:\hspace{0.1667em}{\mathfrak{N}_{E}}& \longrightarrow {\mathfrak{N}_{E}},\\ {} \sum \limits_{x\in \xi }{\delta _{x}}& \longmapsto \sum \limits_{x\in \xi }{\delta _{H(x)}}\end{aligned}\]
transforms μ into a DPP of kernel
\[ {K_{H}}(x,y)=K\left({H^{-1}}(x),\hspace{0.1667em}{H^{-1}}(y)\right)\]
and reference measure $m\circ {H^{-1}}$, the image measure of m by H, see [6].
4. If $K(x,y)=k(x-y)$ then μ is stationary [22] and thus admits a stationary Palm measure ${\mu ^{0}}$. From [30, Theorem 6.5], we know that ${\mu ^{0}}$ is distributed as the DPP of kernel

(3)
\[ {K^{0}}(x,y)=K(x,y)-\frac{K(x,0)K(0,y)}{K(0,0)}\cdot \]

Remark 1.

It is straightforward to see that the spectrum of ${K_{H}}$ in ${L^{2}}(E,\hspace{0.1667em}m\circ {H^{-1}})$ is the same as the spectrum of K in ${L^{2}}(E,\hspace{0.1667em}m)$. Actually, this transformation will be a particular case of the optimal maps obtained in solving the MKP for the Wassertein-2 distance (see Theorem 17).

Remark 2.

The Ginibre point process, denoted by $\mathfrak{G}$, which will be our toy model, is the stationary DPP with kernel

\[ K(x,y)=\frac{1}{\pi }{e^{-\frac{1}{2}(|x{|^{2}}+|y{|^{2}})}}{e^{x\overline{y}}}\]

for $x,y\in \mathbf{C}$. Expanding the right-most exponential we obtain

(4)

\[ K(x,y)={\sum \limits_{k=0}^{\infty }}\frac{{x^{k}}}{\sqrt{\pi }\sqrt{k!}}{e^{-|x{|^{2}}/2}}\hspace{2.5pt}\frac{{\overline{y}^{k}}}{\sqrt{\pi }\sqrt{k!}}{e^{-|y{|^{2}}/2}}.\]

By a change of variables in polar coordinates, we have

\[\begin{aligned}{}{\int _{\mathbf{C}}}{({x_{1}}+i{x_{2}})^{k}}{({x_{1}}-i{x_{2}})^{m}}{e^{-({x_{1}^{2}}+{x_{2}^{2}})}}\operatorname{d}\hspace{-0.1667em}{x_{1}}\operatorname{d}\hspace{-0.1667em}{x_{2}}& =2\pi {\int _{{\mathbf{R}^{+}}}}{r^{2k+1}}{e^{-{r^{2}}}}\operatorname{d}\hspace{-0.1667em}r\hspace{2.5pt}{\mathbf{1}_{\{k=m\}}}\\ {} & =\pi \hspace{0.1667em}k!{\mathbf{1}_{\{k=m\}}},\end{aligned}\]

so that (4) gives the orthonormal eigenfunction expansion of K. If we truncate this expansion to order N, we obtain the kernel of the point process made by the eigenvalues of complex $N\times N$ Hermitian matrices with Gaussian entries. This would yield an easy way to simulate an approximation of $\mathfrak{G}$ limited by the effectiveness of the algorithms which compute the eigenvalues. There are several reasons for which this random matrix based algorithm is not satisfactory, all governed by the practical applications of the Ginibre point process as a model for locations of some items with repulsiveness. For instance, in mobile telecommunications systems, antennas cannot be too close to each other if we want to mitigate interference. This proscribes to use Poisson point processes, which are the paragon of processes with no dependency among the particles, as models for their distribution and suggests to use point processes with repulsiveness, like DPP. The first problems with the standard Ginibre point process are that we cannot modify neither the mean number of points per unit of surface, which turns to be $1/\pi $, nor the strength of the repulsiveness. We then introduce the $(\lambda ,\beta )$-Ginibre point process, denoted by ${\mathfrak{G}^{\lambda ,\beta }}$, whose kernel is given by

(5)

\[ {K_{\lambda ,\beta }}(x,y)=\frac{\lambda }{\pi }{e^{-\frac{\lambda }{2\beta }(|x{|^{2}}+|y{|^{2}})}}{e^{\frac{\lambda }{\beta }x\overline{y}}}\]

where $\lambda \in {\mathbf{R}^{+}}$ and $\beta \in (0,1)$. The mean number per unit of surface of ${\mathfrak{G}^{\lambda ,1}}$ is $\lambda /\pi $ and ${\mathfrak{G}^{\lambda ,\beta }}$ is obtained by the β-thinning of ${\mathfrak{G}^{\lambda ,1}}$ followed by a $\sqrt{\beta }$ rescaling, so that the mean number of points per unit of surface remains unchanged. We know that ${\mathfrak{G}^{\lambda ,\beta }}$ converges in distribution to a Poisson point process of intensity $\lambda /\pi $ when β goes to zero [9]. This means that the more severe the thinning is (i.e. β small), the more the repulsiveness decreases. In [16], it is shown that the locations of antennas in mobile systems of the last generation can be considered as realizations of ${\mathfrak{G}^{\lambda ,\beta }}$ with λ varying between 1.5 and 3.5 and $\beta \in [0.6,\hspace{0.1667em}1]$, depending on the operator which deployed the antennas. There is a priori no set of random matrices whose eigenvalues exhibit such a behavior. Furthermore, in practical situations, we are often forced to simulate point processes under their Palm measure. We know from [15], that the Palm measure of $\mathfrak{G}$ is realized by adding an atom at 0 and removing the atom of $\mathfrak{G}$ with a modulus distributed as the absolute value of a Gaussian random variable. On a given realization, this atom is clearly impossible to identify but in view of (3), we see that

\[ {K^{0}}(x,y)=\frac{1}{\pi }{e^{-(|x{|^{2}}+|y{|^{2}})/2}}({e^{x\overline{y}}}-1).\]

This amounts to remove the first eigenfunction and to the best of our knowledge, this does not coincide anymore with a DPP arising from a known space of random matrices. All these considerations show how important it is to have a good simulation tool of DPP based solely on the eigenfunction expansion.

2.1.1 Simulation of DPP

The simulation algorithm introduced in [18] produces random configurations distributed according to a determinantal point process. It is based on the following lemma.

Lemma 7.

Let ${\mu _{K,m}}$ be a determinantal point process of a trace-class kernel K and reference measure m. Let $\operatorname{sp}(K;{L^{2}}(E,m))=\{{\lambda _{j}},j\ge 0\}$ and $({\phi _{j}},\hspace{0.1667em}j\ge 0)$ be a CONB of ${L^{2}}(E,\hspace{0.1667em}m)$ composed of eigenfunctions of K. Let $(\operatorname{B}({\lambda _{j}}),j\ge 0)$ be a family of independent Bernoulli random variables of respective parameter ${\lambda _{j}}$. Let

\[ I=\{j\ge 0,\hspace{0.1667em}\operatorname{B}({\lambda _{j}})=1\}.\]

Since $\mathbf{E}\left[|I|\right]={\textstyle\sum _{j=0}^{\infty }}{\lambda _{j}}=\operatorname{trace}(K)<\infty $, I is a.s. a finite subset of N. Consider

\[ {K_{I}}(x,y)=\sum \limits_{j\in I}{\phi _{j}}(x){\phi _{j}}(y)\]

and

\[ {p_{I}}({x_{1}},\dots ,{x_{|I|}})=\frac{1}{|I|!}\det \Big({K_{I}}({x_{k}},{x_{l}}),\hspace{0.1667em}1\le k,l\le |I|\Big).\]

Construct a random configuration ξ as follows: Given I, draw points $({W_{1}},\dots ,{W_{|I|}})$ with joint density ${p_{I}}$. Then ξ is distributed according to ${\mu _{K,m}}$.

In the following, let ${\phi _{I}}(x)=({\phi _{j}}(x),j\in I)$.

Algorithm 1:

Sampling of the locations of the points given the set I of active Bernoulli random variables

We have two kinds of difficulties here: drawing of ${W_{i}}$ according to a density function with no particular feature so we usually have to resort to rejection sampling; furthermore, when $|I|$ is large the computation of the density may be costly as it contains a sum of $|I|$ terms. Figure 1 also suggests that when the number of points becomes high, the profile of the conditional density might be very chaotic with high peaks and deep valleys, involving a large number of rejections in the sampling of this density.

To illustrate this second difficulty, we simulate ${\mathfrak{G}^{\lambda ,\beta }}$ restricted to $B(0,R)$, which we denote by ${\mathfrak{G}_{R}^{\lambda ,\beta }}$. A nice feature of radially symmetric DPP, like Ginibre, Bergmann or polyanalytic Ginibre point processes is that the eigenfunctions expansion of their restriction to a ball is easily calculated by just renormalizing the eigenvalues and the eigenfunctions. For ${\mathfrak{G}_{R}^{\lambda ,\beta }}$, we obtain

\[\begin{aligned}{}{\lambda _{j}}& =\frac{\gamma (j+1,{R^{2}})}{j!},\\ {} {\phi _{j}}(x)& =\sqrt{\frac{\lambda }{\pi \hspace{2.5pt}\gamma (j+1,{R^{2}})}}\hspace{2.5pt}{\left(\sqrt{\frac{\lambda }{\beta }}\hspace{0.1667em}x\right)^{j}}{e^{-\frac{\lambda }{2\beta }|x{|^{2}}}}.\end{aligned}\]

For a simulation of a realization of ${\mathfrak{G}_{R}^{\lambda ,\beta }}$, we define the overhead due to rejections by the ratio of the total number of points drawn and the number of accepted points.

Table 1.

Average overhead (over 10 runs) due to rejections in the simulation of ${\mathfrak{G}_{10}^{3,\beta }}$

β	1	0.7	0.5	0.25	0.1
Overhead	3.1	4.5	6.1	13	29

These are the problems we intend to address in the following.

Remark 3.

Note that this algorithm is fully applicable even if E is a discrete finite space. It has been improved in several ways [21, 31, 14] but when it comes to simulate a DPP with a large number of points as it is necessary in some applications [4], the best way remains to use MCMC methods [1]. By its very construction, this last approach is not applicable when the underlying space E is continuous, though there exist some approximate algorithms (see [28] and references therein) based on it.

3 Distances derived from optimal transport

For details on optimal transport in ${\mathbf{R}^{d}}$ and in general Polish spaces, we refer to [33, 32]. For two Polish spaces X and Y, for μ (respectively ν) a probability measure on X (respectively Y), $\Sigma (\mu ,\hspace{0.1667em}\nu )$ is the set of probability measures on $X\times Y$ whose first marginal is μ and second marginal is ν. We also need to consider a lower semicontinuous function c from $X\times Y$ to ${\mathbf{R}^{+}}$. The Monge–Kantorovitch problem associated to μ, ν and c, denoted by $\operatorname{MKP}(\mu $, ν, c) for short, consists in finding

(6)

\[ \underset{\gamma \in \Sigma (\mu ,\hspace{0.1667em}\nu )}{\inf }{\int _{X\times Y}}c(x,y)\operatorname{d}\hspace{-0.1667em}\gamma (x,\hspace{0.1667em}y).\]

More precisely, since X and Y are Polish and c is l.s.c., it is known from the general theory of optimal transportation, that there exists an optimal measure $\gamma \in \Sigma (\mu ,\hspace{0.1667em}\nu )$ and that the minimum coincides with

\[ \underset{(F,\hspace{0.1667em}G)\in {\Phi _{c}}}{\sup }\left({\int _{X}}F\operatorname{d}\hspace{-0.1667em}\mu +{\int _{Y}}G\operatorname{d}\hspace{-0.1667em}\nu \right),\]

where $(F,\hspace{0.1667em}G)\in {\Phi _{c}}$ if and only if $F\in {L^{1}}(\operatorname{d}\hspace{-0.1667em}\mu )$, $G\in {L^{1}}(\operatorname{d}\hspace{-0.1667em}\nu )$ and $F(x)+G(y)\le c(x,\hspace{0.1667em}y)$. We will denote by ${\mathcal{W}_{c}}(\mu ,\hspace{0.1667em}\nu )$ the value of the infimum in (6). In the sequel, we need the following theorem of Brenier [32, Chapter 2] or [25].

Theorem 8.

Let $c(x,y)={2^{-1}}\| x-y{\| ^{2}}$ be the Euclidean distance on ${\mathbf{R}^{k}}$ and μ, ν two probability measures with finite second moment. If the measure μ is absolutely continuous with respect to the Lebesgue measure, there exists a unique optimal measure ${\gamma _{\textit{opt}}}$ which realizes the minimum in (6). Moreover, there exists a unique function $\psi \hspace{0.1667em}:\hspace{0.1667em}{\mathbf{R}^{k}}\to \mathbf{R}$ such that

\[ y=x-\nabla \psi (x),\hspace{2.5pt}{\gamma _{\textit{opt}}}\textit{-a.s.}\]

Then we have

\[ {\mathcal{W}_{e}}(\mu ,\nu )=\frac{1}{2}{\int _{{\mathbf{R}^{k}}}}\| \nabla \psi {\| _{{\mathbf{R}^{k}}}^{2}}\operatorname{d}\hspace{-0.1667em}\mu .\]

The square root of ${\mathcal{W}_{e}}(\mu ,\nu )$ (the subscript e stands for Euclidean) defines a distance on ${\mathfrak{M}_{1}}({\mathbf{R}^{k}})$, the set of probability measures on ${\mathbf{R}^{k}}$, called the Wasserstein-2 distance.

When $X=Y$ and c is a distance on X, ${\mathcal{W}_{c}}$ also defines a distance on ${\mathfrak{M}_{1}}({\mathbf{R}^{k}})$, often called Kantorovitch–Rubinstein or Wasserstein-1 distance. It admits the alternative characterization.

Theorem 9 (See [12, Chapter 11]).

Let $(X,c)$ be a Polish space. For μ and ν, two probability measures on X,

\[ {\operatorname{dist}_{\operatorname{KR}}}(\mu ,\nu ):={\mathcal{W}_{c}}(\mu ,\nu )=\underset{f\in {\operatorname{Lip}_{1}}(X,c)}{\sup }\left({\int _{X}}f\operatorname{d}\hspace{-0.1667em}\mu -{\int _{X}}f\operatorname{d}\hspace{-0.1667em}\nu \right),\]

where

\[ {\operatorname{Lip}_{1}}(X,c)=\left\{f\hspace{0.1667em}:X\to \mathbf{R},\hspace{2.5pt}\forall x,y\in X,|f(x)-f(y)|\le c(x,y)\right\}.\]

4 Distances between point processes

There are several ways to define a distance between point processes on the same underlying space E. We here focus on two of them. They are constructed similarly: Choose a cost function c on ${\mathfrak{N}_{E}}$ and then consider ${\mathcal{W}_{c}}$ defined by the solution of $\operatorname{MKP}(\mu ,\nu ,c)$ for μ and ν, two elements of ${\mathfrak{M}_{1}}({\mathfrak{N}_{E}})$.

Definition 10.

Consider the distance in total variation ${\operatorname{dist}_{\operatorname{TV}}}$ between two configurations (viewed as discrete measures):

\[ {\operatorname{dist}_{\operatorname{TV}}}(\xi ,\zeta )=(\xi \Delta \zeta )(E)\]

where $\xi \Delta \zeta $ is the symmetric difference between the two sets ξ and ζ, i.e. we count the number of distinct atoms between ξ and ζ. Then, for μ and ν belonging to ${\mathfrak{M}_{1}}({\mathfrak{N}_{E}})$, their Kantorovitch–Rubinstein distance is defined by

(7)

\[\begin{aligned}{}{\operatorname{dist}_{\operatorname{KR}}}(\mu ,\nu )& =\underset{\substack{\text{law}({\Xi _{\mu }})=\mu \\ {} \text{law}({\Xi _{\nu }})=\nu }}{\inf }\mathbf{E}\left[({\Xi _{\mu }}\Delta \hspace{0.1667em}{\Xi _{\nu }})(E)\right]\\ {} & =\underset{f\in {\operatorname{Lip}_{1}}({\mathfrak{N}_{E}},{\operatorname{dist}_{\operatorname{TV}}})}{\sup }\left({\int _{{\mathfrak{N}_{E}}}}f(\xi )\operatorname{d}\hspace{-0.1667em}\mu (\xi )-{\int _{{\mathfrak{N}_{E}}}}f(\zeta )\operatorname{d}\hspace{-0.1667em}\nu (\zeta )\right).\end{aligned}\]

Remark 4.

By the definition of the total variation, it is straightforward that for any compact set $\Lambda \subset E$, the map

\[\begin{aligned}{}{\Omega _{\Lambda }}\hspace{0.1667em}:\hspace{0.1667em}({\mathfrak{N}_{E}},{\operatorname{dist}_{\operatorname{TV}}})& \longrightarrow \mathbf{N},\\ {} \xi & \longmapsto \xi (\Lambda )\end{aligned}\]

is Lipschitz. Let $({\mu _{n}},\hspace{0.1667em}n\ge 1)$ be a sequence of point processes and denote by ${\Xi _{n}}$ an ${\mathfrak{N}_{E}}$-valued random variable whose distribution is ${\mu _{n}}$. Similarly, for another element $\nu \in {\mathfrak{M}_{1}}({\mathfrak{N}_{E}})$, let Υ be an ${\mathfrak{N}_{E}}$-valued random variable whose distribution is ν. In view of (7) and Theorem 9, if ${\operatorname{dist}_{\operatorname{KR}}}({\mu _{n}},\nu )$ tends to zero then for any compact set Λ, the sequence of random variables $({\Xi _{n}}(\Lambda ),\hspace{0.1667em}n\ge 1)$ converges in distribution to $\Upsilon (\Lambda )$.

In the sequel, we assume that $E={\mathbf{R}^{k}}$ and the reference measure m is the Lebesgue measure on ${\mathbf{R}^{k}}$. For the quadratic distance, the cost function is the squared Euclidean distance (the $1/2$ factor is purely cosmetic but traditional)

\[ {c_{e}}=\frac{1}{2}\hspace{0.1667em}\| x-y{\| ^{2}}\]

and we define a cost between configurations (see also [3, 2]) as the lifting of ${c_{e}}$ on ${\mathfrak{N}_{E}}$: For two configurations ${\xi _{1}}$, ${\xi _{2}}$, let

\[ c({\xi _{1}},{\xi _{2}})=\inf \left\{\int {c_{e}}(x,y)\hspace{2.5pt}\operatorname{d}\hspace{-0.1667em}\beta (x,y),\hspace{2.5pt}\beta \in \Sigma ({\xi _{1}},{\xi _{2}})\right\},\]

where $\Sigma ({\xi _{1}},{\xi _{2}})$ denotes the set of $\beta \in {\mathfrak{N}_{E\times E}}$, i.e. configurations on $E\times E$, having marginals ${\xi _{1}}$ and ${\xi _{2}}$. First remark that when ${\xi _{1}}(E)$ is finite, the cost is finite only if ${\xi _{1}}(E)={\xi _{2}}(E)$, otherwise $\Sigma ({\xi _{1}},{\xi _{2}})$ is empty and then, by convention, the cost is infinite.

Fig. 2.

Cost between two configurations. On the left, there is no feasible coupling as the number of atoms are different. When the two configurations have the same cardinality, there are several possible coupling. One is given on the right

Moreover, the optimum cost is attained at the permutation of $\{1,\dots ,{\xi _{1}}(E)\}$ which minimizes the sum of the squared distances

\[ c({\xi _{1}},{\xi _{2}})=\frac{1}{2}\underset{\sigma \in {\mathfrak{S}_{{\xi _{1}}(E)}}}{\min }{\sum \limits_{j=1}^{{\xi _{1}}(E)}}\| {x_{i}}-{y_{\sigma (i)}}{\| ^{2}}\]

where ${\xi _{1}}=\{{x_{j}},1\le j\le {\xi _{1}}(E)\}$ and ${\xi _{2}}=\{{y_{j}},1\le j\le {\xi _{1}}(E)\}$. For infinite configurations, it is not immediate that the cost function so defined has the minimum regularity required to consider an optimal transport problem. According to [29], this is indeed true as c is lower semicontinuous on ${\mathfrak{N}_{E}}\times {\mathfrak{N}_{E}}$. We can then consider the Monge–Kantorovitch problem $\operatorname{MKP}(\mu ,\nu ,c)$ on ${\mathfrak{M}_{1}}({\mathfrak{N}_{E}})$. The main theorem of [8] is the following (see Definition 1 for the notations).

Theorem 11.

Let $\Lambda \subset E$ be a compact set. Let μ be a regular probability measure on ${\mathfrak{N}_{\Lambda }^{f}}$ and ν be a probability measure on ${\mathfrak{N}_{E}}$. We denote by ${j_{n}^{\mu }}$ the n-th order Janossy density of μ. Then ${\mu _{n}}$ satisfies

\[ \operatorname{d}\hspace{-0.1667em}{\mu _{n}}({x_{1}},\dots ,{x_{n}})=\frac{1}{\mu (\zeta (\Lambda )=n)}\hspace{0.1667em}{j_{n}^{\mu }}({x_{1}},\dots ,{x_{n}})\operatorname{d}\hspace{-0.1667em}{x_{1}}\dots \operatorname{d}\hspace{-0.1667em}{x_{n}},\]

and is called the normalized Janossy measure. The Monge–Kantorovitch distance, associated to c, between μ and ν is finite if and only if the following two conditions hold

1. $\mu (\zeta (\Lambda )=n)=\nu (\xi (E)=n):={\mathfrak{c}_{n}}$ for any integer $n\ge 0$,
2. ${\textstyle\sum _{n\ge 1}}{\mathfrak{c}_{n}}\hspace{0.1667em}{\mathcal{W}_{e}}({\mu _{n}},\hspace{0.1667em}{\nu _{n}})$ is finite.

For any $n\ge 1$, let ${\varphi _{n}}\hspace{0.1667em}:\hspace{0.1667em}{\Lambda ^{(n)}}\to {E^{(n)}}$ be the optimal transport map between ${\mu _{n}}$ and ${\nu _{n}}$ given by Theorem 8. Then, for $\xi \in {\mathfrak{N}_{\Lambda }^{(n)}}$,

\[ \zeta ={\varphi _{n}}(\xi ),\hspace{2.5pt}{\gamma _{\textit{opt}}}\textit{-a.s.}\]

Moreover,

(8)

\[ {\mathcal{W}_{c}}(\mu ,\hspace{0.1667em}\nu )=\sum \limits_{n\ge 1}{\mathfrak{c}_{n}}\hspace{2.5pt}{\mathcal{W}_{e}}({\mu _{n}},\hspace{0.1667em}{\nu _{n}}).\]

We denote by ${\operatorname{dist}_{{W_{2}}}}$ the Wasserstein-2 distance between μ and ν defined by

\[ {\operatorname{dist}_{{W_{2}}}}(\mu ,\nu )=\sqrt{{\mathcal{W}_{c}}(\mu ,\nu )}.\]

This means that whenever the distance between μ and ν is finite, there exists a strong coupling which works as follows: 1) draw a discrete random variable with the distribution of $\xi (\Lambda )$, and let ι be the obtained value 2) draw the points of ξ according to ${\mu _{\iota }}$ and then 3) apply the map ${\varphi _{\iota }}$ to ξ. The obtained configuration is distributed according to ${\nu _{\iota }}$.

It is shown in [8] that for two Poisson point processes of respective intensities ${\sigma _{1}}$ and ${\sigma _{2}}$, the distance defined above is finite if and only if ${\sigma _{1}}(E)={\sigma _{2}}(E)$ and

\[ \zeta =\sum \limits_{x\in \xi }{\delta _{t(x)}},\hspace{2.5pt}{\gamma _{\text{opt}}}\text{-a.s.}\]

where t is the optimal transport map between ${\sigma _{1}}/{\sigma _{1}}(E)$ and ${\sigma _{2}}/{\sigma _{2}}(E)$ for the Euclidean cost as defined in Theorem 8. Note that the optimal map is a transformation which is applied to each atom independently of the others. This amounts to say that

\[ {\varphi _{n}}({x_{1}},\dots ,{x_{n}})=(t({x_{1}}),\dots ,t({x_{n}})),\]

instead of

\[ {\varphi _{n}}({x_{1}},\dots ,{x_{n}})=({t_{1}}({x_{1}},\dots ,{x_{n}}),\dots ,{t_{n}}({x_{1}},\dots ,{x_{n}})),\]

in non-Poissonian cases.

4.1 Distances between DPP

For determinantal point processes, we can evaluate the effect of a modification of the eigenvalues with the Kantorovitch–Rubinstein distance and the effect of a modification of the eigenvectors with the Wasserstein-2 distance.

Lemma 12.

Let μ and ν be two determinantal point processes with respective kernels ${K_{\mu }}$ and ${K_{\nu }}$. Assume that ${K_{\mu }}$ and ${K_{\nu }}$ are two projection kernels in some Hilbert space ${L^{2}}(E,m)$ such that ${K_{\mu }}={K_{\nu }}+L$ where L is another projection kernel, orthogonal to ${K_{\nu }}$, i.e. $\operatorname{im}(L)\subset \ker {K_{\nu }}$. Then

(9)

\[ {\operatorname{dist}_{\operatorname{KR}}}(\mu ,\hspace{0.1667em}\nu )\le \operatorname{rank}(L).\]

Proof.

The hypothesis means that there exists $({\phi _{j}},\hspace{0.1667em}j=0,\dots ,n+l)$, a family of orthonormal functions in ${L^{2}}(E,m)$, such that

\[ {K_{\nu }}(x,y)={\sum \limits_{j=0}^{n}}{\phi _{j}}(x){\phi _{j}}(y)\hspace{2.5pt}\text{and}\hspace{2.5pt}L(x,y)={\sum \limits_{j=n+1}^{n+l}}{\phi _{j}}(x){\phi _{j}}(y).\]

Since L is a positive symmetric operator, this exactly means that ${K_{\nu }}\prec {K_{\mu }}$ in the Loewner ordering. According to the Strassen theorem [15, 23], there exists a probability space $({\Omega ^{\prime }},{\mathcal{A}^{\prime }},{\mathbf{P}^{\prime }})$ on which we can construct three ${\mathfrak{N}_{E}}$-valued random variables ${\Xi ^{\prime }_{\mu }}$, ${\Xi ^{\prime }_{\nu }}$ and ${\Upsilon ^{\prime }}$ such that ${\Xi ^{\prime }_{\mu }}$ and ${\Xi ^{\prime }_{\nu }}$ have respective distributions μ, ν and

\[ {\Xi ^{\prime }_{\mu }}={\Xi ^{\prime }_{\nu }}\cup {\Upsilon ^{\prime }}\hspace{2.5pt}\text{and}\hspace{2.5pt}{\xi ^{\prime }_{\nu }}\cap {\Upsilon ^{\prime }}=\varnothing ,\hspace{2.5pt}{\mathbf{P}^{\prime }}\hspace{2.5pt}\text{a.s.}\]

This implies that

\[ {\Xi ^{\prime }_{\mu }}\hspace{0.1667em}\Delta \hspace{0.1667em}{\Xi ^{\prime }_{\nu }}(E)={\Upsilon ^{\prime }}(E)=l.\]

According to the first definition of ${\operatorname{dist}_{\operatorname{KR}}}$, see (7), this implies (9). □

Theorem 13.

Let Λ be a compact subset of E and $({\phi _{j}},\hspace{0.1667em}j\ge 0)$ be a CONB of ${L^{2}}(\Lambda ,m)$ and

\[\begin{aligned}{}{K_{\mu }}(x,y)& ={\sum \limits_{j=0}^{\infty }}{\lambda _{n}^{\mu }}\hspace{2.5pt}{\phi _{j}}(x){\phi _{j}}(y),\\ {} {K_{\nu }}(x,y)& ={\sum \limits_{j=0}^{\infty }}{\lambda _{n}^{\nu }}\hspace{2.5pt}{\phi _{j}}(x){\phi _{j}}(y)\end{aligned}\]

where

\[ {\lambda _{j}^{\nu }}\le {\lambda _{j}^{\mu }},\hspace{2.5pt}\forall j\ge 0.\]

Let μ (respectively ν) be a determinantal point process of characteristics ${K_{\mu }}$ and ${h_{\mu }}$ (respectively ${K_{\nu }}$ and ${h_{\nu }}$). Then

(10)

\[ {\operatorname{dist}_{\operatorname{KR}}}(\mu ,\hspace{0.1667em}\nu )\le {\sum \limits_{j=0}^{\infty }}({\lambda _{j}^{\mu }}-{\lambda _{j}^{\nu }}).\]

Proof.

We make a coupling of the Bernoulli random variables which appear in Lemma 7 by using the same sequence of uniform random variables: Let $({U_{j}},\hspace{0.1667em}j\ge 0)$ be a sequence of independent, identically uniformly distributed over $[0,1]$, random variables, and let us consider

\[ {X_{j}^{\mu }}={\mathbf{1}_{\{{U_{j}}\le {\lambda _{j}^{\mu }}\}}}\hspace{2.5pt}\text{and}\hspace{2.5pt}{X_{j}^{\nu }}={\mathbf{1}_{\{{U_{j}}\le {\lambda _{j}^{\nu }}\}}}.\]

Note that

(11)

\[ \mathbf{P}({X_{j}^{\nu }}\ne {X_{j}^{\mu }})=({\lambda _{j}^{\mu }}-{\lambda _{j}^{\nu }}).\]

Let ${I_{\mu }}=\{j\ge 0,{X_{j}^{\mu }}=1\}$ and ${I_{\nu }}=\{j\ge 0,{X_{j}^{\nu }}=1\}$. In view of the hypothesis, ${X^{\nu }}\le {X^{\mu }}$ hence ${I_{\nu }}\subset {I_{\mu }}$. In other words, ${K_{{I_{\mu }}}}$ and ${K_{{I_{\nu }}}}$ are two projection kernels which satisfy the hypothesis of Lemma 12. Hence, there exists a realization $({\Xi _{\mu }},\hspace{0.1667em}{\Xi _{\nu }})$ of $\Sigma (\mu ,\nu )$ given ${I_{\mu }}$ and ${I_{\nu }}$, such that

\[ {\operatorname{dist}_{\operatorname{TV}}}({\Xi _{\mu }},\hspace{0.1667em}{\Xi _{\nu }})={\sum \limits_{j=0}^{\infty }}{\mathbf{1}_{\{{X_{j}^{\nu }}\ne {X_{j}^{\mu }}\}}}.\]

Gluing these realizations together, we get a coupling $({\Xi _{\mu }},\hspace{0.1667em}{\Xi _{\nu }})$ such that

\[\begin{aligned}{}\mathbf{E}\left[{\operatorname{dist}_{\operatorname{TV}}}({\Xi _{\mu }},\hspace{0.1667em}{\Xi _{\nu }})\right]& =\mathbf{E}\left[\mathbf{E}\left[{\operatorname{dist}_{\operatorname{TV}}}({\Xi _{\mu }},\hspace{0.1667em}{\Xi _{\nu }})\hspace{0.1667em}|\hspace{0.1667em}{I_{\mu }},{I_{\nu }}\right]\right]\\ {} & =\mathbf{E}\left[{\sum \limits_{j=0}^{\infty }}{\mathbf{1}_{\{{X_{j}^{\nu }}\ne {X_{j}^{\mu }}\}}}\right]\\ {} & ={\sum \limits_{j=0}^{\infty }}({\lambda _{j}^{\mu }}-{\lambda _{j}^{\nu }}),\end{aligned}\]

according to (11). Since the Kantorovitch–Rubinstein distance is obtained as the infimum over all couplings of the total variation distance between ${\Xi _{\mu }}$ and ${\Xi _{\nu }}$, this particular construction shows that (10) holds. □

The next corollary is an immediate consequence of the alternative definition of the Kantorovitch–Rubinstein distance for point processes, see (7).

Corollary 14.

With the hypothesis of Theorem 13, let ${\Xi _{\mu }}$ and ${\Xi _{\nu }}$ be random point processes of respective distributions μ and ν. Then we have that

\[ \underset{A\subset \Lambda }{\sup }{\operatorname{dist}_{\operatorname{TV}}}({\Xi _{\mu }}(A),\hspace{0.1667em}{\Xi _{\nu }}(A))\le {\sum \limits_{n=0}^{\infty }}({\lambda _{n}^{\mu }}-{\lambda _{n}^{\nu }}).\]

This means that the Kantorovitch–Rubinstein distance between point processes focuses on the number of atoms in any compact. As we shall see now, the Wasserstein-2 distance evaluates the matching distance between configurations when they have the same cardinality.

Theorem 15.

Let μ (respectively ν) be a determinantal point process of characteristics ${K_{\mu }}$ and ${h_{\mu }}$ (respectively ${K_{\nu }}$ and ${h_{\nu }}$) on a compact set $\Lambda \subset {\mathbf{R}^{k}}$. The Wasserstein-2 distance between μ and ν is finite if and only if

\[ \textit{sp}({K_{\mu }};\hspace{0.1667em}{L^{2}}({h_{\mu }}\hspace{-0.1667em}\operatorname{d}\hspace{-0.1667em}x))=\textit{sp}({K_{\nu }};\hspace{0.1667em}{L^{2}}({h_{\nu }}\hspace{-0.1667em}\operatorname{d}\hspace{-0.1667em}x)).\]

Proof.

If the Wasserstein-2 distance between μ and ν is finite, then, according to point 1 of Theorem 11, the distributions of ${\Xi _{\mu }}(\Lambda )$ and ${\Xi _{\nu }}(\Lambda )$ coincide. We also know from Lemma 7 that

(12)

\[ {\Phi _{\mu }}(z)={\mathbf{E}_{\mu }}\left[{z^{{\Xi _{\mu }}(\Lambda )}}\right]=\prod \limits_{\lambda \in \operatorname{sp}{K_{\mu }}}(1-\lambda +\lambda z),\]

the infinite product being convergent since $\operatorname{trace}{K_{\mu }}={\textstyle\sum _{\lambda \in \operatorname{sp}{K_{\mu }}}}\lambda $ is finite. A similar formula holds for ${\Xi _{\nu }}(\Lambda )$. The zeros of ${\phi _{\mu }}$ are $(1-1/\lambda )$ for $\lambda \in \operatorname{sp}{K_{\mu }}$, counted with multiplicity. Hence the zeros of the holomorphic functions ${\phi _{\mu }}$ and ${\phi _{\nu }}$ appear to be negative and isolated. Let

\[ m(\Phi ,r)=\text{number of zeros (counted with multiplicity) of}\hspace{2.5pt}\Phi \hspace{2.5pt}\text{in}\hspace{2.5pt}B(0,\hspace{0.1667em}r).\]

By the properties of zeros of holomorphic functions we have

\[ m({\Phi _{\mu }},r)=m({\Phi _{\nu }},r)\hspace{2.5pt}\text{for any}\hspace{2.5pt}r\ge 0.\]

Hence, the zeros of ${\phi _{\mu }}$ and ${\phi _{\nu }}$ coincide and so do the two spectra.

Conversely, if the two spectra coincide, it remains to verify point 2 of Theorem 11. By the very definition of ${\mathcal{W}_{e}}$,

\[ {\mathcal{W}_{e}}({\mu _{n}},{\nu _{n}})\le {\int _{\Lambda }}\| x{\| ^{2}}(\operatorname{d}\hspace{-0.1667em}{\mu _{n}}+\operatorname{d}\hspace{-0.1667em}{\nu _{n}})\le \underset{x\in \Lambda }{\sup }\| x{\| ^{2}}\Big({\mu _{n}}(\Lambda )+{\nu _{n}}(\Lambda )\Big).\]

Thus, using the notations of Theorem 11, we have

\[\begin{aligned}{}\sum \limits_{n\ge 1}{\mathcal{W}_{e}}({\mu _{n}},\hspace{0.1667em}{\nu _{n}})\hspace{2.5pt}{\mathfrak{c}_{n}}& \le \underset{x\in \Lambda }{\sup }\| x{\| ^{2}}\sum \limits_{n\ge 1}\Big({\mu _{n}}(\Lambda )+{\nu _{n}}(\Lambda )\Big){\mathfrak{c}_{n}}\\ {} & \le {(\underset{x\in \Lambda }{\sup }\| x\| )^{2}}\left(\mathbf{E}\left[{\Xi _{\mu }}(\Lambda )\right]+\mathbf{E}\left[{\Xi _{\nu }}(\Lambda )\right]\right)\\ {} & ={(\underset{x\in \Lambda }{\sup }\| x\| )^{2}}\left(\operatorname{trace}({K_{\mu }})+\operatorname{trace}({K_{\nu }})\right).\end{aligned}\]

This quantity is finite, hence the Wasserstein-2 distance between μ and ν is finite as soon as the spectra are equal. □

The next lemma is a straightforward consequence of Lemma 7.

Lemma 16.

Let μ be a determinantal point process of characteristics ${K_{\mu }}$ and ${h_{\mu }}$. For a finite subset I, let

\[ {c_{I}}=\prod \limits_{i\in I}{\lambda _{i}^{\mu }}\hspace{0.1667em}\prod \limits_{j\in {I^{c}}}(1-{\lambda _{j}^{\mu }}),\]

where ${\lambda _{i}^{\mu }}$’s are the eigenvalues of ${K_{\mu }}$. Then, its n-th Janossy density is given by

\[ {j_{n}^{\mu }}({x_{1}},\dots ,{x_{n}})=\sum \limits_{\substack{I\subset \mathbf{N}\\ {} |I|=n}}{c_{I}}\hspace{0.1667em}{p_{I}}({x_{1}},\dots ,{x_{n}}).\]

This means that given ${\Xi _{\mu }}(E)=n$, the points are distributed according to the probability density

\[ {\mathfrak{p}_{n}^{\mu }}\hspace{0.1667em}:\hspace{0.1667em}({x_{1}},\dots ,{x_{n}})\longmapsto {\mathfrak{c}_{n}^{-1}}\sum \limits_{\substack{I\subset \mathbf{N}\\ {} |I|=n}}{c_{I}}\hspace{0.1667em}{p_{I}}({x_{1}},\dots ,{x_{n}})\textit{where}\hspace{2.5pt}{\mathfrak{c}_{n}}=\sum \limits_{\substack{I\subset \mathbf{N}\\ {} |I|=n}}{c_{I}}.\]

Proof.

Consider that ${\Xi _{\mu }}$ is constructed with Algorithm 1 and denote by ${I_{{\Xi _{\mu }}}}$ the set of indices of the Bernoulli random variables which are equal to 1 for the drawing of ${\Xi _{\mu }}$. For any bounded $f\hspace{0.1667em}:\hspace{0.1667em}{\mathfrak{N}_{E}^{f}}\to \mathbf{R}$,

\[\begin{aligned}{}\mathbf{E}\left[f({\Xi _{\mu }})\right]& =f(\varnothing )+{\sum \limits_{n=1}^{\infty }}\mathbf{E}\left[f({\Xi _{\mu }}){\mathbf{1}_{\{{\Xi _{\mu }}(E)=n\}}}\right]\\ {} & =f(\varnothing )+{\sum \limits_{n=1}^{\infty }}\sum \limits_{\substack{J\subset \mathbf{N}\\ {} |J|=n}}\mathbf{E}\left[f({\Xi _{\mu }})\hspace{0.1667em}|\hspace{0.1667em}{I_{{\Xi _{\mu }}}}=J\right]\hspace{0.1667em}{c_{J}}\\ {} & =f(\varnothing )+{\sum \limits_{n=1}^{\infty }}\sum \limits_{\substack{J\subset \mathbf{N}\\ {} |J|=n}}{c_{J}}{\int _{{E^{n}}}}f({x_{1}},\dots ,{x_{n}})\hspace{0.1667em}{p_{J}}({x_{1}},\dots ,{x_{n}})\operatorname{d}\hspace{-0.1667em}{x_{1}}\dots \operatorname{d}\hspace{-0.1667em}{x_{n}}\\ {} & =f(\varnothing )+{\sum \limits_{n=1}^{\infty }}{\mathfrak{c}_{n}}{\int _{{E^{n}}}}f({x_{1}},\dots ,{x_{n}})\hspace{0.1667em}{\mathfrak{p}_{n}^{\mu }}({x_{1}},\dots ,{x_{n}})\operatorname{d}\hspace{-0.1667em}{x_{1}}\dots \operatorname{d}\hspace{-0.1667em}{x_{n}}.\end{aligned}\]

The result follows by identification with (1). □

Then, Theorem 11 applies as follows.

Theorem 17.

Suppose that the hypothesis of Theorem 15 holds. Let $\operatorname{Id}-\nabla {\psi _{n}}$ be the optimal transport map between ${\mathfrak{p}_{n}^{\mu }}$ and ${\mathfrak{p}_{n}^{\nu }}$. Then the optimal coupling is given by the following rule: For ${\Xi _{\mu }}$ such that ${\Xi _{\mu }}(E)=n$, it is coupled with ${\Xi _{\nu }}$, the configuration with n atoms described by

\[ {\Xi _{\nu }}=\sum \limits_{x\in {\Xi _{\mu }}}{\delta _{x-{\nabla _{x}}{\psi _{n}}({\Xi _{\mu }})}}.\]

Furthermore,

\[\begin{aligned}{}{\mathcal{W}_{c}}(\mu ,\nu )& ={\sum \limits_{n=1}^{\infty }}{\mathfrak{c}_{n}}\hspace{0.1667em}{\mathcal{W}_{e}}({\mathfrak{p}_{n}^{\mu }},\hspace{0.1667em}{\mathfrak{p}_{n}^{\nu }})\\ {} & =\frac{1}{2}{\sum \limits_{n=1}^{\infty }}{\int _{{E^{n}}}}\| \nabla {\psi _{n}}({x_{1}},\dots ,{x_{n}}){\| _{{E^{n}}}^{2}}\hspace{2.5pt}{j_{n}^{\mu }}({x_{1}},\dots ,{x_{n}})\operatorname{d}\hspace{-0.1667em}{x_{1}}\dots \operatorname{d}\hspace{-0.1667em}{x_{n}}.\end{aligned}\]

Theorems 15 and 17 mean that two determinantal point processes are strongly coupled when and only when their eigenvalues are identical. Moreover, the eigenvalues also control the convex combination of the densities of the projection DPP which appear in the Janossy densities.

Fig. 3.

For two rank-3 DPP, to compute the optimal map at stage 2 (i.e. for configurations with two points), we compute the coefficient of the convex combination which will be used on both sides to compute the Janossy density. Then we solve the optimal transport problem between these two densities. This gives the map to be applied to configurations with 2 points of the first DPP to obtain the matching configuration in the second DPP

4.2 Determinantal projection processes

Recall from Definition 5 that a projection DPP has a spectrum reduced to $\{1\}$. When it is of finite rank M, almost all its configurations have M points distributed according to the density

(13)

\[ {p_{\phi }}({x_{1}},\dots ,{x_{M}})=\frac{1}{M!}\det \Big({K_{\phi }}({x_{i}},\hspace{0.1667em}{x_{j}}),\hspace{0.1667em}1\le i,j\le M\Big).\]

Theorem 17 cannot be used as is since a projection DPP do not possess Janossy densities. However, the initial definition of ${\mathcal{W}_{c}}$ can still be used.

Theorem 18.

Let $\psi =({\psi _{j}},\hspace{0.1667em}1\le j\le M)$ and $\psi =({\psi _{j}},\hspace{0.1667em}1\le j\le M)$ be two orthonormal families of ${L^{2}}(E;m)$. Let ${\mu _{\psi }}$ and ${\mu _{\phi }}$ be the two projection DPP associated to these families. Then

\[ {\mathcal{W}_{c}}({\mu _{\psi }},\hspace{0.1667em}{\mu _{\phi }})\le \underset{\sigma \in {\mathfrak{S}_{M}}}{\inf }{\sum \limits_{j=1}^{M}}{\mathcal{W}_{e}}(|{\psi _{j}}{|^{2}}\operatorname{d}\hspace{-0.1667em}m,\hspace{0.1667em}|{\phi _{\sigma (j)}}{|^{2}}\operatorname{d}\hspace{-0.1667em}m).\]

Proof.

We know that the points of ${\mu _{\psi }}$ (respectively ${\mu _{\phi }}$) are distributed according to ${p_{\psi }}$ (respectively ${p_{\phi }}$) given by (13). Let γ be a probability measure on ${E^{M}}\times {E^{M}}$ whose marginals are ${p_{\psi }}\operatorname{d}\hspace{-0.1667em}m$ and ${p_{\phi }}\operatorname{d}\hspace{-0.1667em}m$. We know that

\[\begin{aligned}{}{\mathcal{W}_{c}}({\mu _{\psi }},\hspace{0.1667em}{\mu _{\phi }})& ={\int _{{E^{M}}\times {E^{M}}}}\underset{\sigma \in {\mathfrak{S}_{M}}}{\inf }{\sum \limits_{j=1}^{M}}|{x_{j}}-{y_{\sigma (j)}}{|_{E}^{2}}\hspace{2.5pt}\operatorname{d}\hspace{-0.1667em}\gamma ({x_{1}},\dots ,{x_{M}},{y_{1}},\dots ,{y_{M}})\\ {} & \le {\sum \limits_{j=1}^{M}}{\int _{{E^{M}}\times {E^{M}}}}|{x_{j}}-{y_{j}}{|_{E}^{2}}\hspace{2.5pt}\operatorname{d}\hspace{-0.1667em}\gamma ({x_{1}},\dots ,{x_{M}},{y_{1}},\dots ,{y_{M}}).\end{aligned}\]

We know from Algorithm 1, that the marginal distribution of a single atom of ${\mu _{\psi }}$ has distribution

\[ \operatorname{d}\hspace{-0.1667em}{\mu _{\psi }^{1}}(x)=\frac{1}{M}{\sum \limits_{j=1}^{M}}|{\psi _{j}}(x){|^{2}}\operatorname{d}\hspace{-0.1667em}m(x).\]

Since ${p_{\psi }}$ and ${p_{\phi }}$ are both invariant with respect to permutations, we obtain

\[\begin{aligned}{}{\mathcal{W}_{c}}({\mu _{\psi }},\hspace{0.1667em}{\mu _{\phi }})& \le M\hspace{2.5pt}{\int _{E\times E}}|{x_{1}}-{y_{1}}{|_{E}^{2}}\operatorname{d}\hspace{-0.1667em}\gamma ({x_{1}},\dots ,{x_{M}},{y_{1}},\dots ,{y_{M}})\\ {} & \le M\hspace{2.5pt}{\mathcal{W}_{e}}({\mu _{\psi }^{1}},{\mu _{\phi }^{1}}).\end{aligned}\]

If ${\gamma _{i}^{1}}$ is a coupling between $|{\psi _{i}}{|^{2}}\operatorname{d}\hspace{-0.1667em}m$ and $|{\phi _{i}}{|^{2}}\operatorname{d}\hspace{-0.1667em}m$ then ${M^{-1}}{\textstyle\sum _{i=1}^{M}}{\gamma _{i}^{1}}$ is a coupling between ${\mu _{\psi }^{1}}$ and ${\mu _{\phi }^{1}}$. Hence,

\[\begin{aligned}{}{\mathcal{W}_{c}}({\mu _{\psi }},\hspace{0.1667em}{\mu _{\phi }})& \le {\sum \limits_{j=1}^{M}}{\int _{E\times E}}|{x_{1}}-{y_{1}}{|^{2}}\operatorname{d}\hspace{-0.1667em}{\gamma _{j}^{1}}({x_{1}},{y_{1}})\\ {} & \le {\sum \limits_{j=1}^{M}}{\mathcal{W}_{e}}(|{\psi _{j}}{|^{2}}\operatorname{d}\hspace{-0.1667em}m,|{\phi _{j}}{|^{2}}\operatorname{d}\hspace{-0.1667em}m).\end{aligned}\]

Since we can arrange the elements of the families ψ and ϕ in any order, the result follows. □

5 Simulation

In this section, we show how the previous theorems can be applied to give some guarantees when we make an approximate simulation of a Ginibre point process. We will consider Ginibre point processes but our reasoning could be applied to any rotational invariant determinantal process like the polyanalytic ensembles [17, 13] or the Bergman process [19]. For the Ginibre process, which will be our toy model, its restriction to ${\mathcal{B}_{R}}$, denoted by ${\mathfrak{G}_{R}^{1,1}}$, has a kernel of the form ([10])

\[ {K^{R}}(x,y)={\sum \limits_{n=0}^{\infty }}{\lambda _{n}^{R}}\hspace{2.5pt}{\phi _{n}^{R}}(x)\overline{{\phi _{n}^{R}}(y)}\]

where

\[\begin{aligned}{}{\lambda _{n}^{R}}& =\frac{\gamma (n+1,{R^{2}})}{n!},\\ {} {\phi _{n}^{R}}(x)& =\frac{1}{\sqrt{\pi \gamma (n+1,{R^{2}})}}\hspace{2.5pt}{x^{n}}{e^{-|x{|^{2}}/2}},\end{aligned}\]

with $\gamma (n,r)$ being the lower incomplete gamma function. We denote by ${\mathfrak{G}_{R,N}^{1,1}}$ the truncated Ginibre process whose kernel is the truncation of ${K_{R}}$ to its first N components:

\[ {K_{N}^{R}}(x,y)={\sum \limits_{n=0}^{N-1}}{\lambda _{n}^{R}}{\phi _{n}^{R}}(x)\overline{{\phi _{n}^{R}}(y)}.\]

We assume here $\lambda =1$ and $\beta =1$ only for the sake of simplicity, the computations could be done similarly for any values of λ and β. In the following, we remove the superscript $(1,1)$ as λ and β are fixed. The strict application of Algorithm 1 for the simulation of ${\mathfrak{G}_{R}}$, requires to compute all the quantities of the form

\[ {\lambda _{n}^{R}}{\prod \limits_{k=n+1}^{\infty }}(1-{\lambda _{k}^{R}})\]

to determine which Bernoulli random variables are active. Strictly speaking, this is unfeasible. However, it is a well-known observation that the number of points of ${\mathfrak{G}_{R}}$ is about ${R^{2}}$. So it is likely that ${\mathfrak{G}_{R}}$ and ${\mathfrak{G}_{R,{N_{R}}}}$ should be close for ${N_{R}}$ close to ${R^{2}}$. This is what proves the next theorem.

Theorem 19.

Let $c>0$ and ${N_{R}}={(R+c)^{2}}$. For $R>c$, we have

\[ {\operatorname{dist}_{\operatorname{KR}}}({\mathfrak{G}_{R}},\hspace{0.1667em}{\mathfrak{G}_{R,{N_{R}}}})\le \sqrt{\frac{2}{\pi }}R{e^{-{c^{2}}}}.\]

Actually, the proof says that with high probability, ${\mathfrak{G}_{R}}$ and ${\mathfrak{G}_{R,{N_{R}}}}$ do coincide.

Proof.

First, using the integral expression $\gamma (j,x)={\textstyle\int _{t=0}^{x}}{t^{j-1}}{e^{-t}}\operatorname{d}\hspace{-0.1667em}t$, observe that ${\textstyle\sum _{j=1}^{\infty }}\frac{\gamma (j,x)}{\Gamma (j)}=x$. Then, using the formula $\gamma (n+1,x)=n\gamma (n,x)-{x^{n}}{e^{-x}}$, we have by induction

\[ {\sum \limits_{j=n+1}^{\infty }}\frac{\gamma (j,x)}{\Gamma (j)}=\frac{{x^{n}}{e^{-x}}-(n-x)\gamma (n,x)}{\Gamma (n)}\cdot \]

For $n={(R+c)^{2}}$ and $x={R^{2}}$, this implies

\[\begin{aligned}{}\sum \limits_{j\ge {(R+c)^{2}}}{\lambda _{j}^{R}}& \le {(R+c)^{2}}\frac{{R^{2{(R+c)^{2}}}}{e^{-{R^{2}}}}}{{(R+c)^{2}}!}.\end{aligned}\]

Using the bound $n!\ge \sqrt{2\pi n}{\left(\frac{n}{e}\right)^{n}}$,

\[\begin{aligned}{}\sum \limits_{j\ge {(R+c)^{2}}}{\lambda _{j}^{R}}& \le \frac{R+c}{\sqrt{2\pi }}\frac{{R^{2{(R+c)^{2}}}}{e^{{(R+c)^{2}}-{R^{2}}}}}{{(R+c)^{2{(R+c)^{2}}}}}\\ {} & \le \frac{R+c}{\sqrt{2\pi }}{e^{{(R+c)^{2}}-{R^{2}}-2{(R+c)^{2}}\log (1+\frac{c}{R})}}\\ {} & \le \frac{R+c}{\sqrt{2\pi }}{e^{{(R+c)^{2}}-{R^{2}}-2{(R+c)^{2}}\frac{\frac{c}{R}}{1+\frac{c}{R}}}}\\ {} & \le \frac{R+c}{\sqrt{2\pi }}{e^{-{c^{2}}}}.\end{aligned}\]

Since $R>c$, the proof is complete. □

As a corollary of the previous proof, we have

(14)

\[ \mathbf{P}(\exists j\ge {(R+c)^{2}},\text{B}({\lambda _{j}^{R}})=1)\le \sum \limits_{j\ge {(R+c)^{2}}}{\lambda _{j}^{R}}\le \kappa R{e^{-{c^{2}}}}\]

for R large enough. This means that the number of active Bernoulli random variables in Algorithm 1 is less than ${(R+c)^{2}}$ with high probability. We can also provide a lower bound on the cardinality of I.

Lemma 20.

For any $R>c>0$,

\[ \mathbf{P}(\operatorname{card}(I)<{(R-c)^{2}})\le \frac{1}{\sqrt{2\pi }}R{e^{-{c^{2}}}}.\]

Proof.

As in the previous proof, we will reduce the problem to bounding a sum of reduced incomplete gamma functions.

\[\begin{aligned}{}\mathbf{P}(\operatorname{card}(I)<{(R-c)^{2}})& =1-\mathbf{P}(\operatorname{card}(I)\ge {(R-c)^{2}})\\ {} & \le 1-\prod \limits_{0\le j<\lfloor {(R-c)^{2}}\rfloor }\mathbf{P}(\operatorname{B}({\lambda _{j}^{R}})=1)\\ {} & \le \sum \limits_{0\le j<\lfloor {(R-c)^{2}}\rfloor }(1-\mathbf{P}(\operatorname{B}({\lambda _{j}^{R}})=1))\\ {} & \le \sum \limits_{1\le j\le \lfloor {(R-c)^{2}}\rfloor }\frac{\Gamma (j,{R^{2}})}{\Gamma (j)}\cdot \end{aligned}\]

Using the formula $\Gamma (n+1,x)=n\Gamma (n,x)+{x^{n}}{e^{-x}}$, we have by induction

\[ {\sum \limits_{j=1}^{n}}\frac{\Gamma (j,x)}{\Gamma (j)}=\frac{{x^{n}}{e^{-x}}-(x-n)\Gamma (n,x)}{\Gamma (n)}\cdot \]

For $n=\lfloor {(R-c)^{2}}\rfloor $ and $x={R^{2}}$, this implies

\[\begin{aligned}{}\mathbf{P}(\operatorname{card}(I)<{(R-c)^{2}})& \le {(R-c)^{2}}\frac{{R^{2{(R-c)^{2}}}}{e^{-{R^{2}}}}}{{(R-c)^{2}}!}\cdot \end{aligned}\]

Using Stirling formula,

\[\begin{aligned}{}\mathbf{P}(\operatorname{card}(I)<{(R-c)^{2}})& \le \frac{R-c}{\sqrt{2\pi }}\frac{{R^{2{(R-c)^{2}}}}{e^{{(R-c)^{2}}-{R^{2}}}}}{{(R-c)^{2{(R-c)^{2}}}}}\\ {} & \le \frac{R-c}{\sqrt{2\pi }}{e^{{(R-c)^{2}}-{R^{2}}-2{(R-c)^{2}}\log (1-\frac{c}{R})}}\\ {} & \le \frac{R-c}{\sqrt{2\pi }}{e^{{(R-c)^{2}}-{R^{2}}+2{(R-c)^{2}}\frac{\frac{c}{R}}{1-\frac{c}{R}}}}\\ {} & \le \frac{R-c}{\sqrt{2\pi }}{e^{-{c^{2}}}}\cdot \end{aligned}\]

The proof is thus complete. □

The combination of Lemma 20 and (14) shows that the cardinality of I is of the order of ${R^{2}}$ with high probability.

5.1 Inverse transform sampling

The next step of the algorithm is to draw the points according to a density given by a determinant. Since we do not have explicit expression of the inverse cumulative function of these densities, we have to resort to rejection sampling. Fortunately, the particular form of the eigenfunctions of isotropic point processes, like the Ginibre point process, is prone to the simulation of modulus and arguments by inverting their respective cumulative distribution function. The key remark is that along the iterations which are necessary to draw the ${W_{i}}$, the densities always have the same form:

(15)

\[ {p_{i}}(z)=\sum \limits_{k\in I}{a_{k,i}}\hspace{2.5pt}{z^{{n_{k}}}}\hspace{0.1667em}{f_{k}}(|z|),\hspace{2.5pt}\text{for}\hspace{2.5pt}i=1,\dots ,|I|.\]

Without any approximation, ${f_{k}}$ corresponds to the radial component of ${p_{I}}$ and is given by (19). When we approximate the eigenfunctions, ${f_{k}}$ is replaced by ${\tilde{f}_{k}}$ as in (20). This new approach is summarized in Algorithm 2.

Lemma 21 (Cumulative distribution of the modulus).

Let

(16)

\[ p(z)=\sum \limits_{i\in I}{a_{i}}{z^{{n_{i}}}}{f_{i}}(|z|)\]

and

\[ P(r)={\int _{\rho =0}^{r}}{\int _{\theta =0}^{2\pi }}|p(\rho {e^{j\theta }}){|^{2}}\rho \operatorname{d}\hspace{-0.1667em}\rho \operatorname{d}\hspace{-0.1667em}\theta .\]

The following equality holds:

\[ P(r)=\sum \limits_{i\in I}|{a_{i}}{|^{2}}{F_{i}}({r^{2}})\]

where

\[ {F_{i}}({r^{2}})={\int _{\rho =0}^{{r^{2}}}}\pi {\rho ^{{n_{i}}}}{f_{i}^{2}}(\sqrt{\rho })\operatorname{d}\hspace{-0.1667em}\rho .\]

Given a sequence of complex numbers ${W_{\ell }}$ for ℓ from 1 to $|I|$, we denote by ${e_{\ell }}$ the orthonormal vectors obtained by the Gram–Schmidt orthonormalization of the vectors ${\phi _{I}^{R}}({W_{\ell }})$. Let also ${M_{\ell }}\in {\mathbb{R}^{|I|}}$ be the vector ${(|{e_{\ell ,i}}{|^{2}})_{i\in I}}$ where ${e_{\ell ,i}}$ is the coordinate of index i of ${e_{\ell }}$. Moreover, let ${U_{F}}(r)={({F_{i}}(r))_{i\in I}}$. Finally, let ${U_{i}}$ be the sequence of vectors defined by induction with ${U_{1}}={(1)_{i\in I}}$ and ${U_{i+1}}={U_{i}}-{M_{i}}$. Then, using Lemma 21, we observe that the cumulative distribution associated with the density ${p_{i}}$ in Algorithm 1 is the function $\frac{1}{|I|-i+1}{U_{i}}\cdot {U_{F}}(r)$. Thus drawing the modulus of ${W_{i}}$ in Algorithm 1 is reduced to sampling ${c_{i}}$ uniformly in $[0,1]$ and solving the equation

(17)

\[ {c_{i}}=\frac{1}{|I|-i+1}{U_{i}}\cdot {U_{F}}(r).\]

Knowing ${U_{i}}$, we can compute ${U_{i+1}}$ in $O(|I|)$ arithmetic operations. By construction Equation (17) has only one solution in the interval $[0,1]$, thus using the bisection method ([5]), it can be solved with precision δ using $O(|I|\log (1/\delta ))$ evaluation of ${F_{i}}$.

Given the modulus, we can now simulate the arguments.

Lemma 22 (Cumulative distribution of the argument).

Let p be given by (16) and

\[ Q(r,\alpha )={\int _{\theta =0}^{\alpha }}|p(r{e^{i\theta }}){|^{2}}r\operatorname{d}\hspace{-0.1667em}\theta .\]

Then Q can be rewritten as a sum of $|I{|^{2}}$ terms:

\[ Q(r,\alpha )=\sum \limits_{i,k\in I}{a_{i}}\overline{{a_{k}}}{G_{i,k}}(r,\alpha )\]

where ${G_{i,k}}(r,\alpha )=\left\{\begin{array}{l@{\hskip10.0pt}l}r{g_{i}}(r){g_{k}}(r)\frac{{e^{j({n_{i}}-{n_{k}})\alpha }}-1}{j({n_{i}}-{n_{k}})}\hspace{1em}& \textit{if}\hspace{2.5pt}i\ne j\\ {} r{g_{i}^{2}}(r)\alpha \hspace{1em}& \textit{if}\hspace{2.5pt}i=k\end{array}\right.$ and ${g_{i}}(r)={r^{{n_{i}}}}{f_{i}}(r)$.

Similarly to the simulation of the modulus, for ℓ from 1 to $|I|$, let ${A_{\ell }}\in {\mathbb{C}^{|I{|^{2}}}}$ be the vector ${({e_{\ell ,i}}\overline{{e_{\ell ,k}}})_{i,k\in I}}$. Let ${V_{G}}(r,\alpha )={\left({G_{i,k}}(r,\alpha )\right)_{i,k\in I}}$. Let ${({V_{i}})_{i=1\dots |I|-1}}$ be the sequence of vectors defined by recurrence with ${V_{1}}={({\mathbf{1}_{i=k}})_{i,k\in I}}$ and ${V_{i+1}}={V_{i}}-{A_{i}}$. Drawing the argument of ${W_{i}}$ in Algorithm 1 is now reduced to sampling ${c_{i}}$ uniformly in $[0,1]$ and solving the equation (i.e. inverting numerically the cdf):

(18)

\[ {c_{i}}=\frac{1}{{V_{i}}\cdot {V_{G}}(r,2\pi )}{V_{i}}\cdot {V_{G}}(r,\alpha ).\]

Computing ${V_{i+1}}$ from ${V_{i}}$ requires $O(|I{|^{2}})$ arithmetic operations. Then, for fixed r, (18) can be solved up to precision δ in $O(|I{|^{2}}+|I|\log \delta )$ arithmetic operations and evaluations of the ${f_{i}}$, using a dichotomy approach.

The total cost of sampling the ${W_{i}}$ with this approach is $O(|I{|^{3}}+|I{|^{2}}\log \delta )$ operations. We will see in the next section how we can reduce this complexity using an approximation of the eigenfunctions.

Gathering the results of this section, we get in Algorithm 2 an efficient method to sample points from a symmetric projection point process.

Algorithm 2:

Simulation of a compact symmetric projection point process restricted to the disc ${\mathcal{B}_{R}}$

5.2 Compact Ginibre and approximation

Using Theorem 18 with a well-chosen approximation, we will show that we can reduce in Algorithm 2 the complexity of steps A. and B. from $O(|I{|^{2}})$ to $O(|I{|^{1.5}})$ operations with high probability. The idea to approximate the eigenfunctions was already present in [27] but it was not used to its whole power. We here approximate the eigenfunctions so that they contain substantially less monomials than the originals.

For a given constant $c>0$ and for an integer n, let ${R_{n}}$ be the ring between the circles of radii ${u_{n}}=\min (R,\sqrt{n}+c)$ and ${l_{n}}=\max (0,\min (\sqrt{n},R)-c)$. Let

(19)

\[\begin{aligned}{}{\mu _{n}}={\int _{{R_{n}}}}|{\phi _{n}^{R}}(z){|^{2}}\operatorname{d}\hspace{-0.1667em}z& =\frac{\gamma (n+1,{u_{n}^{2}})-\gamma (n+1,{l_{n}^{2}})}{\gamma (n+1,{R^{2}})},\\ {} {f_{n}}(|z|)& =\frac{1}{\sqrt{\pi \gamma (n+1,{R^{2}})}}{e^{-\frac{|z{|^{2}}}{2}}}.\end{aligned}\]

and define the following approximated functions:

(20)

\[ {\widetilde{f}_{n}}(|z|)=\left\{\begin{array}{l@{\hskip10.0pt}l}{f_{n}}(|z|)/\sqrt{{\mu _{n}}}\hspace{1em}& \text{if}\hspace{2.5pt}z\in {R_{n}},\\ {} 0\hspace{1em}& \text{otherwise,}\end{array}\right.\]

and let

\[ {\widetilde{\phi }_{n}^{R}}(z)={z^{n}}{\widetilde{f}_{n}}(|z|).\]

Remark 5 (Complexity simplification).

For a complex $z\in B(0,R)$, z belongs to ${R_{n}}$ for integers n such that

\[ {(|z|-c)^{2}}\le n\le {(|z|+c)^{2}}.\]

There are $[4c|z|]\le [4cR]$ such integers. This means that the modified function ${\tilde{U}_{F}}$ contains at most $4cR$ nonzero coefficients instead of ${R^{2}}$ coefficients in the original expression. The numerical effort to find ${c_{i}}$ and ${d_{i}}$ in Algorithm 2 is reduced accordingly.

We now show that replacing ${\phi _{n}^{R}}$ by ${\tilde{\phi }_{n}^{R}}$ does not cost much in terms of accuracy. In the course of the proof, we need the so-called Bakry–Emery–Blower theorem (see [32, Theorem 9.9 and pages 291–292]).

Theorem 23 (Bakry–Emery–Blower theorem).

Let V be a ${\mathcal{C}^{2}}$ function on R with $\textstyle\int {e^{-V(x)}}\operatorname{d}\hspace{-0.1667em}x=1$ and ${V^{\prime\prime }}(x)\ge \lambda >0$ then the measure ${\rho _{\infty }}$ of density ${e^{-V}}$ with respect to the Lebesgue measure satisfies the Talagrand inequality: For any probability measure ρ,

(21)

\[ {\mathcal{W}_{e}}(\rho ,\hspace{0.1667em}{\rho _{\infty }})\le H(\rho \hspace{0.1667em}|\hspace{0.1667em}{\rho _{\infty }})=\int \rho (r)\log \frac{\rho (r)}{{\rho _{\infty }}(r)}\operatorname{d}\hspace{-0.1667em}r.\]

Theorem 24.

For any $I\subset \{1,\dots ,{N_{R}}\}$,

\[ {\mathcal{W}_{c}}({\mu _{\phi }},\hspace{0.1667em}{\mu _{\tilde{\phi }}})\le \sum \limits_{j\in I}\log \left(\frac{1}{{\mu _{j}}}\right)\cdot \]

Proof.

According to Theorem 18, it is sufficient to evaluate

\[ {\mathcal{W}_{e}}\left(|{\phi _{j}^{R}}{|^{2}}\operatorname{d}\hspace{-0.1667em}x,\hspace{0.1667em}|{\tilde{\phi }_{j}^{R}}{|^{2}}\operatorname{d}\hspace{-0.1667em}x\right)\]

for any $j\in I$. Denote the two measures involved in the previous optimal transport problem by

\[ {\zeta _{j}}(\operatorname{d}\hspace{-0.1667em}x)=|{\phi _{j}^{R}}(x){|^{2}}\operatorname{d}\hspace{-0.1667em}x,\hspace{2.5pt}{\tilde{\zeta }_{j}}(\operatorname{d}\hspace{-0.1667em}x)=|{\tilde{\phi }_{j}^{R}}(x){|^{2}}\operatorname{d}\hspace{-0.1667em}x.\]

These are two radially symmetric measures on ${\mathbf{R}^{2}}$. We still denote by ζ and $\tilde{\zeta }$ the two measures they induce on the polar coordinates $(r,\theta )$. Consider

\[ {\zeta _{j}}(\operatorname{d}\hspace{-0.1667em}r\hspace{0.1667em}|\hspace{0.1667em}\theta )={c_{j}}\hspace{0.1667em}{r^{2j+1}}{e^{-{r^{2}}}}{\mathbf{1}_{[0,R]}}(r)\hspace{2.5pt}\text{where}\hspace{2.5pt}{c_{j}}=\frac{1}{\gamma (j+1,{R^{2}})},\]

the distribution of r given θ under ζ and the same quantity for $\tilde{\zeta }$. If we have a coupling between these two measures, i.e. if we have a measure

\[ {\gamma _{\theta }}(\operatorname{d}\hspace{-0.1667em}r,\hspace{0.1667em}\operatorname{d}\hspace{-0.1667em}{r^{\prime }})\in \Sigma \Big({\zeta _{j}}(\operatorname{d}\hspace{-0.1667em}r\hspace{0.1667em}|\hspace{0.1667em}\theta ),\hspace{0.1667em}{\tilde{\zeta }_{j}}(\operatorname{d}\hspace{-0.1667em}{r^{\prime }}\hspace{0.1667em}|\hspace{0.1667em}\theta )\Big),\]

then the measure

\[ \frac{1}{2\pi }{\gamma _{\theta }}(\operatorname{d}\hspace{-0.1667em}r,\hspace{0.1667em}\operatorname{d}\hspace{-0.1667em}{r^{\prime }})\operatorname{d}\hspace{-0.1667em}\theta \]

is a coupling between ${\zeta _{j}}$ and ${\tilde{\zeta }_{j}}$. It follows that

\[ {\mathcal{W}_{e}}({\zeta _{j}},\hspace{0.1667em}{\tilde{\zeta }_{j}})\le {\mathcal{W}_{e}}\left({c_{j}}{r^{2j+1}}{e^{-{r^{2}}}}{\mathbf{1}_{[0,R]}}(r)\operatorname{d}\hspace{-0.1667em}r,\hspace{0.1667em}{\tilde{c}_{j}}{r^{2j+1}}{e^{-{r^{2}}}}{\mathbf{1}_{{R_{j}}}}(r)\operatorname{d}\hspace{-0.1667em}r\right)\]

where

\[ {\tilde{c}_{j}}=\frac{1}{{\mu _{j}}}\cdot \]

Furthermore, we have

\[ -\frac{\operatorname{d}{\hspace{-0.1667em}^{2}}}{\operatorname{d}\hspace{-0.1667em}{r^{2}}}\log ({r^{2j+1}}{e^{-{r^{2}}}})=\frac{2j+1}{{r^{2}}}+2\ge 2.\]

Hence the Bakry–Emery–Blower criterion (Theorem 23) is satisfied and the measure

\[ {\rho _{\infty }}(\operatorname{d}\hspace{-0.1667em}r)={c_{j}}{r^{2j+1}}{e^{-{r^{2}}}}{\mathbf{1}_{[0,R]}}(r)\operatorname{d}\hspace{-0.1667em}r\]

satisfies the Talagrand inequality. Apply (21) to

\[ {\rho _{j}}(\operatorname{d}\hspace{-0.1667em}r)={\tilde{c}_{j}}{r^{2j+1}}{e^{-{r^{2}}}}{\mathbf{1}_{{R_{j}}}}(r)\operatorname{d}\hspace{-0.1667em}r\]

to obtain

\[ {\mathcal{W}_{e}}({\rho _{j}},\hspace{0.1667em}{\rho _{\infty }})\le \log \left(\frac{1}{{\rho _{\infty }}({R_{j}})}\right)=\log \left(\frac{1}{{\mu _{j}}}\right)\cdot \]

The proof is thus complete. □

Finally, using the same techniques as above, we bound the sum of $\log \left(\frac{1}{{\mu _{i}}}\right)$ in the following lemma.

Lemma 25.

There exists a constant κ such that for $\sqrt{\log R}\le c\le R$:

\[ {\sum \limits_{n=0}^{\infty }}\log \left(\frac{1}{{\mu _{n}}}\right)\le \kappa {R^{2}}{e^{-{c^{2}}}}\cdot \]

Proof.

We split the sum in three parts:

\[\begin{aligned}{}{S_{1}}& ={\sum \limits_{n=0}^{{(R-c)^{2}}-1}}\log \left(\frac{1}{{\mu _{n}}}\right),\\ {} {S_{2}}& ={\sum \limits_{n={(R-c)^{2}}}^{{R^{2}}-1}}\log \left(\frac{1}{{\mu _{n}}}\right),\\ {} {S_{3}}& ={\sum \limits_{n={R^{2}}}^{\infty }}\log \left(\frac{1}{{\mu _{n}}}\right).\end{aligned}\]

We will first prove that the terms in ${S_{1}}$ and ${S_{2}}$ are $O({e^{-{c^{2}}}})$ and the terms $\log (\frac{1}{{\mu _{{R^{2}}+k}}})$ in ${S_{3}}$ are $O(R{e^{-{c^{2}}}}{(1-\frac{1}{R})^{k}})$.

For $c\ge 1$ and $n\le {(R-c)^{2}}$, we show that ${{\mu _{n}}^{-1}}$ is roughly equal to $\frac{\gamma (n+1,{R^{2}})}{\Gamma (n+1)}$:

\[\begin{aligned}{}& \gamma (n+1,{u_{n}^{2}})-\gamma (n+1,{l_{n}^{2}})\\ {} & \hspace{1em}=\Gamma (n+1)-\Gamma (n+1,{u_{n}^{2}})-\gamma (n+1,{l_{n}^{2}})\\ {} & \hspace{1em}\ge \Gamma (n+1)-\frac{{u_{n}^{2(n+1)}}}{{u_{n}^{2}}-n-1}{e^{-{u_{n}^{2}}}}-\frac{{l_{n}^{2(n+1)}}}{n+1-{l_{n}^{2}}}{e^{-{l_{n}^{2}}}}\\ {} & \hspace{1em}\ge \Gamma (n+1)-\frac{{u_{n}^{2}}{e^{-{c^{2}}}}}{({u_{n}^{2}}-n-1)\sqrt{2\pi n}}\Gamma (n+1)-\frac{{l_{n}^{2}}{e^{-{c^{2}}}}}{(n+1-{l_{n}^{2}})\sqrt{2\pi n}}\Gamma (n+1)\\ {} & \hspace{1em}\ge \Gamma (n+1)(1-{e^{-{c^{2}}}}).\end{aligned}\]

This implies that

\[ \frac{1}{{\mu _{n}}}\le \frac{\gamma (n+1,{R^{2}})}{\Gamma (n+1)(1-{e^{-{c^{2}}}})}\]

and

\[ \log \left(\frac{1}{{\mu _{n}}}\right)\le \frac{1}{1-{e^{-{c^{2}}}}}\left(\frac{\Gamma (n+1,{R^{2}})}{\Gamma (n+1)}+{e^{-{c^{2}}}}\right)\cdot \]

Thus, for $c\ge 1$, we have

\[ {S_{1}}\le 2R{e^{-{c^{2}}}}+2{R^{2}}{e^{-{c^{2}}}}.\]

For ${S_{2}}$ we have ${u_{n}}=R$ and ${l_{n}}=\sqrt{n}-c$ such that

\[ \frac{1}{{\mu _{n}}}=\frac{1}{1-\gamma (n+1,{(\sqrt{n}-c)^{2}})\gamma (n+1,{R^{2}})}\cdot \]

Moreover, for $n+1\le {R^{2}}$, we know that

\[\begin{aligned}{}\gamma (n+1,{R^{2}})& \ge \frac{\Gamma (n+1)}{2}\end{aligned}\]

and

\[\begin{aligned}{}\gamma (n+1,{(\sqrt{n}-c)^{2}})& \le {e^{-{c^{2}}}}\Gamma (n+1).\end{aligned}\]

Combine these identities with the well-known fact

\[ \sum \limits_{i}\log \left(\frac{1}{1-{\epsilon _{i}}}\right)\le \frac{{\textstyle\sum _{i}}{\epsilon _{i}}}{1-\max {\epsilon _{i}}}\]

to obtain

\[ {S_{2}}\le \frac{2cR{e^{-{c^{2}}}}}{1-{e^{-{c^{2}}}}}\cdot \]

Finally for ${S_{3}}$, we have ${u_{n}}=R$ and ${l_{n}}=R-c$, so that we have

\[ \frac{1}{{\mu _{n}}}=\frac{1}{1-\frac{\gamma (n+1,{(R-c)^{2}})}{\gamma (n+1,{R^{2}})}}\cdot \]

Then remark that

\[\begin{aligned}{}\frac{\gamma (n,{(R-c)^{2}})}{\gamma (n,{R^{2}})}& \le \frac{{(R-c)^{2n}}{e^{-{(R-c)^{2}}}}/(n-{(R-c)^{2}})}{{R^{2n}}{e^{-{R^{2}}}}/n}\\ {} & \le \frac{{(1-\frac{c}{R})^{2n}}{e^{{R^{2}}-{(R-c)^{2}}}}}{1-\frac{{(R-c)^{2}}}{n}}\\ {} & \le \frac{{(1-\frac{c}{R})^{2(n-{R^{2}})}}{e^{-2{R^{2}}\frac{c}{R}-{R^{2}}\frac{{c^{2}}}{{R^{2}}}+2cR-{c^{2}}}}}{1-\frac{{(R-c)^{2}}}{n}}\\ {} & \le \frac{{(1-\frac{c}{R})^{2(n-{R^{2}})}}{e^{-2{c^{2}}}}}{1-\frac{{(R-c)^{2}}}{{R^{2}}}}\\ {} & \le \frac{R}{c}{\bigg(1-\frac{c}{R}\bigg)^{2(n-{R^{2}})}}{e^{-2{c^{2}}}}.\end{aligned}\]

Thus, summing from ${R^{2}}$ to ∞ we get

\[ {S_{3}}\le \frac{\frac{{R^{2}}}{{c^{2}}}{e^{-2{c^{2}}}}}{1-Rc{e^{-2{c^{2}}}}}\cdot \]

The proof is thus complete. □

Fig. 4.

Simulation of determinantal point processes. Left: kernel $K(z,w)=\frac{1}{\pi }{e^{-\frac{1}{2}(|z{|^{2}}+|w{|^{2}})+z\overline{w}}}$, restricted to the disk of radius 100. Right: kernel $K(z,w)=\frac{1}{\pi (1-z\overline{w})}$, restricted to the disk of radius 0.9995

5.3 Experimental results

An implementation in Python of this algorithm publicly available in [26] allowed us to sample $10\hspace{0.1667em}002$ points in $2\hspace{0.1667em}128$ seconds on a 8 core 3Ghz CPU for the Ginibre kernel restricted to a disk of radius 100, shown in Figure 4. In the case of a projection determinantal process, the last matrix V should be 0 if the vectors ${e_{j}}$ are orthonormal. In our simulation, the norm of V is $9.42\times {10^{-11}}$ which is an indicator on the small numerical error that we had with our approximation. For the simulation in a disk of radius 30, leading to roughly 900 points, we ran 10 simulations that all took between 3.2 and 3.4 seconds. The same approach can be used for the DPP with the so-called Bergmann kernel [19]. In this case, the simulation in a disk of radius 0.9995 leads to roughly 1000 points. We ran 10 simulations that all took between 5.9 and 7.3 seconds.

Authors

Abstract

1 Introduction

Fig. 1.

2 Determinantal point processes

Definition 1.

Definition 2.

Definition 3.

(1)

2.1 Determinantal point processes

Definition 4.

Definition 5.

(2)

Theorem 6.

(3)

Remark 1.

Remark 2.

(4)

(5)

2.1.1 Simulation of DPP

Lemma 7.

Algorithm 1:

Table 1.

Remark 3.

3 Distances derived from optimal transport

(6)

Theorem 8.

Theorem 9 (See [12, Chapter 11]).

4 Distances between point processes

Definition 10.

(7)

Remark 4.

Fig. 2.

Theorem 11.

(8)

4.1 Distances between DPP

Lemma 12.

(9)

Proof.

Theorem 13.

(10)

Proof.

(11)

Corollary 14.

Theorem 15.

Proof.

(12)

Lemma 16.

Proof.

Theorem 17.

Fig. 3.

4.2 Determinantal projection processes

(13)

Theorem 18.

Proof.

5 Simulation

Theorem 19.

Proof.

(14)

Lemma 20.

Proof.

5.1 Inverse transform sampling

(15)

Lemma 21 (Cumulative distribution of the modulus).

(16)

(17)

Lemma 22 (Cumulative distribution of the argument).

(18)

Algorithm 2:

5.2 Compact Ginibre and approximation

(19)

(20)

Remark 5 (Complexity simplification).

Theorem 23 (Bakry–Emery–Blower theorem).

(21)

Theorem 24.

Proof.

Lemma 25.

Proof.

Fig. 4.