1 Introduction and overview
The Malliavin–Stein method for probabilistic approximations was initiated in the paper [64], with the aim of providing a quantitative counterpart to the (one- and multi-dimensional) central limit theorems for random variables living in the Wiener chaos of a general separable Gaussian field. As formally discussed in the sections to follow, the basic idea of the approach initiated in [64] is that, in order to assess the discrepancy between some target law (Normal or Gamma, for instance), and the distribution of a nonlinear functional of a Gaussian field, one can fruitfully apply infinite-dimensional integration by parts formulae from the Malliavin calculus of variations [57, 66, 77, 78] to the general bounds associated with the so-called Stein’s method for probabilistic approximations [66, 23]. In particular, the Malliavin–Stein approach captures and amplifies the essence of [21], where Stein’s method was combined with finite-dimensional integration by parts formulae for Gaussian vectors, in order to deduce second order Poincaré inequalities – as applied to random matrix models with Gaussian-subordinated entries (see also [70, 96]).
We recall that, as initiated by P. Malliavin in the path-breaking reference [56], the Malliavin calculus is an infinite-dimensional differential calculus, whose operators act on smooth nonlinear functionals of Gaussian fields (or of more general probabilistic objects). As vividly described in the classical references [57, 77], as well as in the more recent books [66, 78], since its inception such a theory has generated a staggering number of applications, ranging, e.g., from mathematical physics to stochastic differential equations, and from mathematical finance to stochastic geometry (in particular, models involving stabilization, but also hyperplane, flat or cylinder processes), analysis on manifolds and mathematical statistics. On the other hand, the similarly successful and popular Stein’s method (as created by Ch. Stein in the classical reference [92] – see also the 1986 monograph [93]) is a collection of analytical techniques, allowing one to estimate the distance between the distributions of two random objects, by using characterizing differential operators (or difference operator in the case where the random variables of interest are discrete). The discovery in [64] that the two theories can be fruitfully combined has been a major breakthrough in the domain of probabilistic limit theorems and approximations.
Since the publication of [64], the Malliavin–Stein method has generated several hundreds of papers, with ramifications in many (often unexpected) directions, including functional inequalities, random matrix theory, stochastic geometry, noncommutative probability and computer science. Many of these developments largely exceed the scope of the present survey, and we invite the interested reader to consult the following general references (i)–(iii) for a more detailed presentation: (i) the webpage [1] is a constantly updated resource, listing all existing papers written around the Malliavin–Stein method; (ii) the monograph [66], written in 2012, contains a self-contained presentation of Malliavin calculus and Stein’s method, as applied to functionals of general Gaussian fields, with specific emphasis on random variables belonging to a fixed Wiener chaos; (iii) the text [81] is a collection of surveys, containing an in-depth presentation of variational techniques on the Poisson space (including the Malliavin–Stein method), together with their application to asymptotic problems arising in stochastic geometry. The following more specific references (a)–(c) point to some recent developments that we find particularly exciting and ripe for further progress: (a) the papers [58, 59, 68, 82, 85, 88, 94] provide a representative overview of applications of Malliavin–Stein techniques to the study of nodal sets associated with Gaussian random fields on two-dimensional manifolds; (b) the papers [62, 74] – and many of the references therein – display a pervasive use of Malliavin–Stein techniques to determine rates of convergence in total variation in the Breuer–Major Theorem; (c) references [19, 61] deal with the problem of tightness and functional convergence in the Breuer–Major theorem evoked at Point (b).
The aim of the present survey is twofold. On the one hand, we aim at presenting the essence of the Malliavin–Stein method for functionals of Gaussian fields, by discussing the crucial elements of Malliavin calculus and Stein’s method together with their interaction (see Section 2 and Section 3). On the other hand, we aim at introducing the reader to some of the most recent developments of the theory, with specific focus on the general theory of Markov semigroups in a diffusive setting (following the seminal references [52, 5], as well as [73, 53, 54]), and on integration by parts formulae (and associated operators) in the context of functionals of a random point measure [37, 38, 55, 49, 48, 90]. This corresponds to the content of Section 4 and Section 5, respectively. Finally, Section 6 deals with some recent results (and open problems) concerning ${\chi ^{2}}$ approximations.
From now on, every random object will be defined on a suitable common probability space $(\Omega ,\mathcal{F},\mathbb{P})$, with $\mathbb{E}$ indicating mathematical expectation with respect to $\mathbb{P}$. Throughout the paper, the symbol $\mathcal{N}(\mu ,{\sigma ^{2}})$ will be a shorthand for the one-dimensional Gaussian distribution with mean $\mu \in \mathbb{R}$ and variance ${\sigma ^{2}}>0$. In particular, $X\sim \mathcal{N}(\mu ,{\sigma ^{2}})$ if and only if
\[ \mathbb{P}[X\in A]={\int _{A}}{e^{-\frac{{(x-\mu )^{2}}}{2{\sigma ^{2}}}}}\frac{dx}{\sqrt{2\pi {\sigma ^{2}}}},\]
for every Borel set $A\subset \mathbb{R}$.
2 Elements of Stein’s method for normal approximations
In this section, we briefly introduce the main ingredients of Stein’s method for normal approximations in dimension one. The approximation will be performed with respect to the total variation and 1-Wasserstein distances between the distributions of two random variables; more detailed information about these distances can be found in [66, Appendix C] and the references therein.
The crucial intuition behind Stein’s method lies in the following heuristic reasoning: it is a well-known fact (see, e.g., Lemma 2.1-(e) below) that a random variable X has the standard $\mathcal{N}(0,1)$ distribution if and only if
(2.1)
\[ \mathbb{E}[Xf(X)]=\mathbb{E}[{f^{\prime }}(X)],\]
for every smooth mapping $f:\mathbb{R}\to \mathbb{R}$; heuristically, it follows that, if X is a random variable such that the quantity $\mathbb{E}[Xf(X)-{f^{\prime }}(X)]$ is close to zero for a large class of test functions f, then the distribution of X should be close to Gaussian.
The fact that such a heuristic argument can be made rigorous and applied in a wide array of probabilistic models was the main discovery of Stein’s original contribution [92], where the foundations of Stein’s method were first laid. The reader is referred to Stein’s monograph [93], as well as the books [23, 66], for an exhaustive presentation of the theory and its applications (in particular, for extensions to multidimensional approximations).
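The heuristic characterization just described is easy to probe numerically. The following minimal sketch (our own illustration, not taken from the references; the test function $\tanh$ and the comparison law $\mathcal{N}(0,4)$ are arbitrary choices) estimates $\mathbb{E}[Xf(X)-{f^{\prime }}(X)]$ by Monte Carlo: the quantity is near zero for a standard Gaussian sample, and bounded away from zero when the variance is wrong.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

def stein_residual(x, f, fprime):
    # Empirical version of E[X f(X) - f'(X)], which vanishes (over a rich
    # class of test functions f) if and only if X ~ N(0,1).
    return float(np.mean(x * f(x) - fprime(x)))

f = np.tanh
fprime = lambda t: 1.0 - np.tanh(t) ** 2

r_std = stein_residual(rng.standard_normal(n), f, fprime)         # ~ 0
r_wide = stein_residual(2.0 * rng.standard_normal(n), f, fprime)  # > 0
```

For $X\sim \mathcal{N}(0,{\sigma ^{2}})$ one has $\mathbb{E}[Xf(X)]={\sigma ^{2}}\mathbb{E}[{f^{\prime }}(X)]$, so the second residual equals $3\,\mathbb{E}[\operatorname{sech}^{2}(X)]>0$, in line with the heuristic.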
We recall that the total variation distance, between the laws of two real-valued random variables F and G, is defined by
(2.2)
\[ {d_{TV}}(F,G):=\underset{B\in \mathcal{B}(\mathbb{R})}{\sup }\Big|\mathbb{P}[F\in B]-\mathbb{P}[G\in B]\Big|.\]
One has to note that the topology induced by the distance ${d_{TV}}$ – on the set of all probability measures on $\mathbb{R}$ – is stronger than the topology of convergence in distribution; one sometimes uses the following equivalent representation of ${d_{TV}}$ (see, e.g., [66, p. 213]):
(2.3)
\[ {d_{TV}}(F,G)=\frac{1}{2}\underset{\| h{\| _{\infty }}\le 1}{\sup }\Big|\mathbb{E}[h(F)]-\mathbb{E}[h(G)]\Big|,\]
where the supremum runs over all Borel-measurable functions $h:\mathbb{R}\to \mathbb{R}$ bounded by 1. The 1-Wasserstein distance ${d_{W}}$, between the distributions of two real-valued integrable random variables F and G, is given by
(2.4)
\[ {d_{W}}(F,G):=\underset{h\in \mathrm{Lip}(1)}{\sup }\Big|\mathbb{E}[h(F)]-\mathbb{E}[h(G)]\Big|,\]
where $\mathrm{Lip}(K)$, $K>0$, stands for the class of all Lipschitz mappings $h:\mathbb{R}\to \mathbb{R}$ such that h has a Lipschitz constant $\le K$. As for total variation, the topology induced by ${d_{W}}$ – on the set of all probability measures on $\mathbb{R}$ having a finite absolute first moment – is stronger than the topology of convergence in distribution; it is also interesting to recall the dual representation
(2.5)
\[ {d_{W}}(F,G)=\underset{(X,Y)}{\inf }\mathbb{E}|X-Y|,\]
where the infimum is taken over all couplings $(X,Y)$ of F and G; see, e.g., [97, p. 95] for a discussion of this fact.
The following classical result, whose complete proof can be found, e.g., in [66, p. 64 and p. 67], contains all the elements of Stein’s method that are needed for our discussion; as for many fundamental findings in the area, this result can be traced back to [92].
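The dual (coupling) representation of ${d_{W}}$ can be visualized on samples: in dimension one, the quantile (comonotone) coupling attains the infimum, while any other coupling, e.g. the independent one, gives a larger value of $\mathbb{E}|X-Y|$. A small sketch we add for illustration, with the arbitrarily chosen laws $\mathcal{N}(0,1)$ and $\mathcal{N}(1,1)$, for which ${d_{W}}=1$ (a pure mean shift):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Two empirical samples standing in for the laws of F and G.
f_sample = rng.standard_normal(n)
g_sample = rng.standard_normal(n) + 1.0  # shifted Gaussian, d_W(F, G) = 1

# Comonotone (quantile) coupling: pairing sorted values minimizes the mean
# absolute difference, and approximates the infimum over all couplings.
d_quantile = float(np.mean(np.abs(np.sort(f_sample) - np.sort(g_sample))))

# Any other coupling, e.g. the independent one, can only do worse.
d_independent = float(np.mean(np.abs(f_sample - g_sample)))
```

That `d_quantile <= d_independent` holds deterministically here is a rearrangement inequality; `d_quantile` itself is close to the true distance 1.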
Lemma 2.1.
Let $N\sim \mathcal{N}(0,1)$ be a standard Gaussian random variable.
-
(a) Fix $h:\mathbb{R}\to [0,1]$, a Borel-measurable function. Define ${f_{h}}:\mathbb{R}\to \mathbb{R}$ as
(2.6)
\[ {f_{h}}(x):={e^{\frac{{x^{2}}}{2}}}{\int _{-\infty }^{x}}\{h(y)-\mathbb{E}[h(N)]\}{e^{-\frac{{y^{2}}}{2}}}dy,\hspace{1em}x\in \mathbb{R}.\]
-
(b) The function ${f_{h}}$ defined in (2.6) is such that $\| {f_{h}}{\| _{\infty }}\le \sqrt{\frac{\pi }{2}}$ and $\| {f_{h}^{\prime }}{\| _{\infty }}\le 2$, and it satisfies the Stein equation
(2.7)
\[ {f_{h}^{\prime }}(x)-x{f_{h}}(x)=h(x)-\mathbb{E}[h(N)],\]
for almost every $x\in \mathbb{R}$.
-
(c) Let X be an integrable random variable. Then
\[ {d_{TV}}(X,N)\le \underset{(f,{f^{\prime }})}{\sup }\Big|\mathbb{E}[{f^{\prime }}(X)-Xf(X)]\Big|,\]
where the supremum is taken over all pairs $(f,{f^{\prime }})$ such that f is a Lipschitz function whose absolute value is bounded by $\sqrt{\frac{\pi }{2}}$, and ${f^{\prime }}$ is a version of the derivative of f satisfying $\| {f^{\prime }}{\| _{\infty }}\le 2$.
-
(d) Let X be an integrable random variable. Then
\[ {d_{W}}(X,N)\le \underset{f}{\sup }\Big|\mathbb{E}[{f^{\prime }}(X)-Xf(X)]\Big|,\]
where the supremum is taken over all ${C^{1}}$ functions $f:\mathbb{R}\to \mathbb{R}$ such that $\| {f^{\prime }}{\| _{\infty }}\le 2$ and ${f^{\prime }}\in \mathrm{Lip}(2)$.
-
(e) Let X be a general random variable. Then $X\sim \mathcal{N}(0,1)$ if and only if $\mathbb{E}[{f^{\prime }}(X)-Xf(X)]=0$ for every absolutely continuous function f such that $\mathbb{E}|{f^{\prime }}(N)|<+\infty $.
Sketch of the proof. Points (a) and (b) can be verified by a direct computation. Point (c) and Point (d) follow by plugging the left-hand side of (2.7) into (2.3) and (2.4), respectively. Finally, the fact that the relation $\mathbb{E}[{f^{\prime }}(X)-Xf(X)]=0$ implies that $X\sim \mathcal{N}(0,1)$ is a direct consequence of Point (c), whereas the reverse implication follows by an integration by parts argument. □
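Lemma 2.1-(a) lends itself to a numerical sanity check. The sketch below (added for illustration; the logistic choice of h is arbitrary, any Borel h with values in $[0,1]$ would do, and $\mathbb{E}[h(N)]=1/2$ here by symmetry) builds ${f_{h}}$ of (2.6) by quadrature, then verifies the Stein equation ${f_{h}^{\prime }}(x)-x{f_{h}}(x)=h(x)-\mathbb{E}[h(N)]$ and the classical bounds $\| {f_{h}}{\| _{\infty }}\le \sqrt{\pi /2}$, $\| {f_{h}^{\prime }}{\| _{\infty }}\le 2$ on the window $[-3,3]$.

```python
import numpy as np

xs = np.linspace(-8.0, 8.0, 32001)
dx = xs[1] - xs[0]

h = 1.0 / (1.0 + np.exp(-xs))  # smooth Borel function with values in [0, 1]
Eh = 0.5                       # E[h(N)] = 1/2, since h(y) + h(-y) = 1

# cumulative trapezoidal integral of (h - Eh) * exp(-y^2/2) from the left end,
# which is numerically indistinguishable from the integral from -infinity
g = (h - Eh) * np.exp(-xs**2 / 2.0)
cum = np.concatenate(([0.0], np.cumsum((g[1:] + g[:-1]) / 2.0) * dx))

f = np.exp(xs**2 / 2.0) * cum  # the function f_h of (2.6)
fp = np.gradient(f, dx)        # numerical derivative f_h'

mask = np.abs(xs) <= 3.0
residual = fp[mask] - xs[mask] * f[mask] - (h[mask] - Eh)
```

The residual stays at the level of the discretization error, and the two sup-norm bounds hold with room to spare.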
3 Normal approximation with Stein’s method and Malliavin calculus
The first part of the present section contains some elements of Gaussian analysis and Malliavin calculus. The reader can consult, for instance, the references [66, 77, 57, 78] for further details. In Section 3.2 we will shortly explore the connection between Malliavin calculus and the version of Stein’s method presented in Section 2.
3.1 Isonormal processes, multiple integrals, and the Malliavin operators
Let $\mathfrak{H}$ be a real separable Hilbert space. For any $q\ge 1$, we write ${\mathfrak{H}^{\otimes q}}$ and ${\mathfrak{H}^{\odot q}}$ to indicate, respectively, the qth tensor power and the qth symmetric tensor power of $\mathfrak{H}$; we also set by convention ${\mathfrak{H}^{\otimes 0}}={\mathfrak{H}^{\odot 0}}=\mathbb{R}$. When $\mathfrak{H}={L^{2}}(A,\mathcal{A},\mu )=:{L^{2}}(\mu )$, where μ is a σ-finite and nonatomic measure on the measurable space $(A,\mathcal{A})$, then ${\mathfrak{H}^{\otimes q}}\simeq {L^{2}}({A^{q}},{\mathcal{A}^{q}},{\mu ^{q}})=:{L^{2}}({\mu ^{q}})$, and ${\mathfrak{H}^{\odot q}}\simeq {L_{s}^{2}}({A^{q}},{\mathcal{A}^{q}},{\mu ^{q}}):={L_{s}^{2}}({\mu ^{q}})$, where ${L_{s}^{2}}({\mu ^{q}})$ stands for the subspace of ${L^{2}}({\mu ^{q}})$ composed of those functions that are ${\mu ^{q}}$-almost everywhere symmetric. We denote by $W=\{W(h):h\in \mathfrak{H}\}$ an isonormal Gaussian process over $\mathfrak{H}$. This means that W is a centered Gaussian family with a covariance structure given by the relation $\mathbb{E}\left[W(h)W(g)\right]={\langle h,g\rangle _{\mathfrak{H}}}$. Without loss of generality, we can also assume that $\mathcal{F}=\sigma (W)$, that is, $\mathcal{F}$ is generated by W, and use the shorthand notation ${L^{2}}(\Omega ):={L^{2}}(\Omega ,\mathcal{F},\mathbb{P})$.
For every $q\ge 1$, the symbol ${C_{q}}$ stands for the qth Wiener chaos of W, defined as the closed linear subspace of ${L^{2}}(\Omega )$ generated by the family $\{{H_{q}}(W(h)):h\in \mathfrak{H},{\left\| h\right\| _{\mathfrak{H}}}=1\}$, where ${H_{q}}$ is the qth Hermite polynomial, defined as follows:
(3.1)
\[ {H_{q}}(x)={(-1)^{q}}{e^{\frac{{x^{2}}}{2}}}\frac{{d^{q}}}{d{x^{q}}}\big({e^{-\frac{{x^{2}}}{2}}}\big).\]
We write by convention ${C_{0}}=\mathbb{R}$. For any $q\ge 1$, the mapping ${I_{q}}({h^{\otimes q}})={H_{q}}(W(h))$ can be extended to a linear isometry between the symmetric tensor product ${\mathfrak{H}^{\odot q}}$ (equipped with the modified norm $\sqrt{q!}{\left\| \cdot \right\| _{{\mathfrak{H}^{\otimes q}}}}$) and the qth Wiener chaos ${C_{q}}$. For $q=0$, we write by convention ${I_{0}}(c)=c$, $c\in \mathbb{R}$.
It is well known that ${L^{2}}(\Omega )$ can be decomposed into the infinite orthogonal sum of the spaces ${C_{q}}$: this means that any square-integrable random variable $F\in {L^{2}}(\Omega )$ admits the following Wiener–Itô chaotic expansion
(3.2)
\[ F={\sum \limits_{q=0}^{\infty }}{I_{q}}({f_{q}}),\]
where the series converges in ${L^{2}}(\Omega )$, ${f_{0}}=\mathbb{E}[F]$, and the kernels ${f_{q}}\in {\mathfrak{H}^{\odot q}}$, $q\ge 1$, are uniquely determined by F. For every $q\ge 0$, we denote by ${J_{q}}$ the orthogonal projection operator on the qth Wiener chaos. In particular, if $F\in {L^{2}}(\Omega )$ has the form (3.2), then ${J_{q}}F={I_{q}}({f_{q}})$ for every $q\ge 0$.
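Definition (3.1) is equivalent to the three-term recurrence ${H_{q+1}}(x)=x{H_{q}}(x)-q{H_{q-1}}(x)$, and the orthogonality relation $\mathbb{E}[{H_{p}}(N){H_{q}}(N)]=p!\,{\mathbf{1}_{\{p=q\}}}$ is what makes the chaos decomposition orthogonal. Both facts can be verified exactly with Gauss–Hermite quadrature; the sketch below is our own illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def hermite(q, x):
    # Probabilists' Hermite polynomials via the three-term recurrence
    # He_{q+1}(x) = x He_q(x) - q He_{q-1}(x), consistent with (3.1).
    h0, h1 = np.ones_like(x), x
    if q == 0:
        return h0
    for k in range(1, q):
        h0, h1 = h1, x * h1 - k * h0
    return h1

# Gauss-Hermite-e nodes/weights integrate against exp(-x^2/2); renormalizing
# by sqrt(2*pi) turns the sums into expectations under N(0,1). With 30 nodes
# the rule is exact for polynomials of degree <= 59.
nodes, weights = hermegauss(30)
weights = weights / np.sqrt(2.0 * np.pi)

def E(values):
    return float(np.sum(weights * values))

inner_33 = E(hermite(3, nodes) * hermite(3, nodes))  # = 3! = 6
inner_23 = E(hermite(2, nodes) * hermite(3, nodes))  # = 0
```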
Let $\{{e_{k}},\hspace{0.1667em}k\ge 1\}$ be a complete orthonormal system in $\mathfrak{H}$. Given $f\in {\mathfrak{H}^{\odot p}}$ and $g\in {\mathfrak{H}^{\odot q}}$, for every $r=0,\dots ,p\wedge q$, the contraction of f and g of order r is the element of ${\mathfrak{H}^{\otimes (p+q-2r)}}$ defined by
(3.3)
\[ f{\otimes _{r}}g={\sum \limits_{{i_{1}},\dots ,{i_{r}}=1}^{\infty }}{\langle f,{e_{{i_{1}}}}\otimes \cdots \otimes {e_{{i_{r}}}}\rangle _{{\mathfrak{H}^{\otimes r}}}}\otimes {\langle g,{e_{{i_{1}}}}\otimes \cdots \otimes {e_{{i_{r}}}}\rangle _{{\mathfrak{H}^{\otimes r}}}}.\]
Notice that the definition of $f{\otimes _{r}}g$ does not depend on the particular choice of $\{{e_{k}},\hspace{0.1667em}k\ge 1\}$, and that $f{\otimes _{r}}g$ is not necessarily symmetric; we denote its symmetrization by $f{\widetilde{\otimes }_{r}}g\in {\mathfrak{H}^{\odot (p+q-2r)}}$. Moreover, $f{\otimes _{0}}g=f\otimes g$ equals the tensor product of f and g while, for $p=q$, $f{\otimes _{q}}g={\langle f,g\rangle _{{\mathfrak{H}^{\otimes q}}}}$. When $\mathfrak{H}={L^{2}}(A,\mathcal{A},\mu )$ and $r=1,\dots ,p\wedge q$, the contraction $f{\otimes _{r}}g$ is the element of ${L^{2}}({\mu ^{p+q-2r}})$ given by
(3.4)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle f{\otimes _{r}}g({x_{1}},\dots ,{x_{p+q-2r}})\\ {} & & \displaystyle ={\int _{{A^{r}}}}f({x_{1}},\dots ,{x_{p-r}},{a_{1}},\dots ,{a_{r}})\times \\ {} & & \displaystyle \hspace{1em}\hspace{1em}\hspace{1em}\hspace{1em}\times g({x_{p-r+1}},\dots ,{x_{p+q-2r}},{a_{1}},\dots ,{a_{r}})d\mu ({a_{1}})\cdots d\mu ({a_{r}}).\end{array}\]
It is a standard fact of Gaussian analysis that the following multiplication formula holds: if $f\in {\mathfrak{H}^{\odot p}}$ and $g\in {\mathfrak{H}^{\odot q}}$, then
(3.5)
\[ {I_{p}}(f){I_{q}}(g)={\sum \limits_{r=0}^{p\wedge q}}r!\binom{p}{r}\binom{q}{r}{I_{p+q-2r}}(f{\widetilde{\otimes }_{r}}g).\]
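In the simplest situation $f={h^{\otimes p}}$, $g={h^{\otimes q}}$ with ${\| h\| _{\mathfrak{H}}}=1$, the multiplication formula reduces to the classical linearization identity for Hermite polynomials, ${H_{p}}(x){H_{q}}(x)={\textstyle\sum _{r=0}^{p\wedge q}}r!\binom{p}{r}\binom{q}{r}{H_{p+q-2r}}(x)$, which can be checked exactly in coefficient space. A short illustrative sketch (ours; the degrees p = 3, q = 4 are arbitrary):

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import comb, factorial

p, q = 3, 4

def basis(n):
    # Coefficient vector of He_n in the Hermite-e (probabilists') basis.
    c = np.zeros(n + 1)
    c[n] = 1.0
    return c

# Left-hand side: He_p * He_q, multiplied directly in the Hermite-e basis.
lhs = He.hermemul(basis(p), basis(q))

# Right-hand side: the linearization coefficients r! C(p,r) C(q,r).
rhs = np.zeros(p + q + 1)
for r in range(min(p, q) + 1):
    rhs[p + q - 2 * r] += factorial(r) * comb(p, r) * comb(q, r)
```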
We now introduce some basic elements of the Malliavin calculus with respect to the isonormal Gaussian process W.
Let $\mathcal{S}$ be the set of all cylindrical random variables of the form
(3.6)
\[ F=g\left(W({\varphi _{1}}),\dots ,W({\varphi _{n}})\right),\]
where $n\ge 1$, $g:{\mathbb{R}^{n}}\to \mathbb{R}$ is an infinitely differentiable function such that its partial derivatives have polynomial growth, and ${\varphi _{i}}\in \mathfrak{H}$, $i=1,\dots ,n$. The Malliavin derivative of F with respect to W is the element of ${L^{2}}(\Omega ,\mathfrak{H})$ defined as
\[ DF\hspace{0.2778em}=\hspace{0.2778em}{\sum \limits_{i=1}^{n}}\frac{\partial g}{\partial {x_{i}}}\left(W({\varphi _{1}}),\dots ,W({\varphi _{n}})\right){\varphi _{i}}.\]
In particular, $DW(h)=h$ for every $h\in \mathfrak{H}$. By iteration, one can define the mth derivative ${D^{m}}F$, which is an element of ${L^{2}}(\Omega ,{\mathfrak{H}^{\odot m}})$, for every $m\ge 2$. For $m\ge 1$ and $p\ge 1$, ${\mathbb{D}^{m,p}}$ denotes the closure of $\mathcal{S}$ with respect to the norm $\| \cdot {\| _{m,p}}$, defined by the relation
\[ \| F{\| _{m,p}^{p}}\hspace{0.2778em}=\hspace{0.2778em}\mathbb{E}\left[|F{|^{p}}\right]+{\sum \limits_{i=1}^{m}}\mathbb{E}\left[\| {D^{i}}F{\| _{{\mathfrak{H}^{\otimes i}}}^{p}}\right].\]
We often use the (canonical) notation ${\mathbb{D}^{\infty }}:={\textstyle\bigcap _{m\ge 1}}{\textstyle\bigcap _{p\ge 1}}{\mathbb{D}^{m,p}}$. For example, it is a well-known fact that any random variable F that is a finite linear combination of multiple Wiener–Itô integrals is an element of ${\mathbb{D}^{\infty }}$. The Malliavin derivative D obeys the following chain rule. If $\phi :{\mathbb{R}^{n}}\to \mathbb{R}$ is continuously differentiable with bounded partial derivatives and if $F=({F_{1}},\dots ,{F_{n}})$ is a vector of elements of ${\mathbb{D}^{1,2}}$, then $\phi (F)\in {\mathbb{D}^{1,2}}$ and
(3.7)
\[ D\phi (F)={\sum \limits_{i=1}^{n}}\frac{\partial \phi }{\partial {x_{i}}}(F)D{F_{i}}.\]
Note also that a random variable F as in (3.2) is in ${\mathbb{D}^{1,2}}$ if and only if ${\textstyle\sum _{q=1}^{\infty }}q\| {J_{q}}F{\| _{{L^{2}}(\Omega )}^{2}}<\infty $ and in this case one has the following explicit relation:
\[ \mathbb{E}\left[\| DF{\| _{\mathfrak{H}}^{2}}\right]={\sum \limits_{q=1}^{\infty }}q\| {J_{q}}F{\| _{{L^{2}}(\Omega )}^{2}}.\]
If $\mathfrak{H}={L^{2}}(A,\mathcal{A},\mu )$ (with μ nonatomic), then the derivative of a random variable F as in (3.2) can be identified with the element of ${L^{2}}(A\times \Omega )$ given by
\[ {D_{a}}F={\sum \limits_{q=1}^{\infty }}q{I_{q-1}}\left({f_{q}}(\cdot ,a)\right),\hspace{1em}a\in A.\]
The operator L, defined as $\mathbf{L}={\textstyle\sum _{q=0}^{\infty }}-q{J_{q}}$, is the infinitesimal generator of the Ornstein–Uhlenbeck semigroup. The domain of L is
\[ \mathrm{Dom}\mathbf{L}=\Big\{F\in {L^{2}}(\Omega ):{\sum \limits_{q=1}^{\infty }}{q^{2}}\| {J_{q}}F{\| _{{L^{2}}(\Omega )}^{2}}<\infty \Big\}={\mathbb{D}^{2,2}}.\]
For any $F\in {L^{2}}(\Omega )$, we define ${\mathbf{L}^{-1}}F={\textstyle\sum _{q=1}^{\infty }}-\frac{1}{q}{J_{q}}(F)$. The operator ${\mathbf{L}^{-1}}$ is called the pseudoinverse of L. Indeed, for any $F\in {L^{2}}(\Omega )$, we have that ${\mathbf{L}^{-1}}F\in \mathrm{Dom}\mathbf{L}={\mathbb{D}^{2,2}}$, and
\[ \mathbf{L}{\mathbf{L}^{-1}}F=F-\mathbb{E}[F].\]
The following infinite dimensional Malliavin integration by parts formula plays a crucial role in the analysis (see, for instance, [66, Section 2.9] for a proof).
Lemma 3.1.
Let $F\in {\mathbb{D}^{1,2}}$ be such that $\mathbb{E}[F]=0$, and let $g:\mathbb{R}\to \mathbb{R}$ be a ${C^{1}}$ function with a bounded derivative. Then
\[ \mathbb{E}[Fg(F)]=\mathbb{E}\left[{g^{\prime }}(F){\langle DF,-D{\mathbf{L}^{-1}}F\rangle _{\mathfrak{H}}}\right].\]
Inspired by the Malliavin integration by parts formula appearing in Lemma 3.1, we now introduce a class of iterated Gamma operators. We will need such operators in Section 6.
Definition 3.2 (See Chapter 8 in [66]).
Let $F\in {\mathbb{D}^{\infty }}$; the sequence of random variables ${\{{\Gamma _{i}}(F)\}_{i\ge 0}}\subset {\mathbb{D}^{\infty }}$ is recursively defined as follows. Set ${\Gamma _{0}}(F)=F$ and, for every $i\ge 1$,
\[ {\Gamma _{i}}(F)={\langle DF,-D{\mathbf{L}^{-1}}{\Gamma _{i-1}}(F)\rangle _{\mathfrak{H}}}.\]
Definition 3.3 (Cumulants).
Let F be a real-valued random variable such that $\mathbb{E}|F{|^{m}}<\infty $ for some integer $m\ge 1$, and write ${\varphi _{F}}(t)=\mathbb{E}[{e^{itF}}]$, $t\in \mathbb{R}$, for the characteristic function of F. Then, for $r=1,\dots ,m$, the rth cumulant of F, denoted by ${\kappa _{r}}(F)$, is given by
\[ {\kappa _{r}}(F)={(-i)^{r}}\frac{{d^{r}}}{d{t^{r}}}\log {\varphi _{F}}(t)\Big{|_{t=0}}.\]
The following statement explicitly connects the expectation of the random variables ${\Gamma _{r}}(F)$ to the cumulants of F.
Proposition 3.5 (See Chapter 8 in [66]).
Let $F\in {\mathbb{D}^{\infty }}$. Then ${\kappa _{r}}(F)=(r-1)!\mathbb{E}[{\Gamma _{r-1}}(F)]$ for every $r\ge 1$.
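As a concrete check, take $F={H_{2}}(N)={N^{2}}-1$, an element of the second Wiener chaos; its cumulants are known in closed form: ${\kappa _{1}}(F)=0$ and ${\kappa _{r}}(F)={2^{r-1}}(r-1)!$ for $r\ge 2$ (the cumulants of a centered ${\chi ^{2}}(1)$ variable). The sketch below (our own illustration) computes the moments of F by Gauss–Hermite quadrature and converts them into cumulants via the classical moment–cumulant recursion.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from math import comb

# 40-node Gauss-Hermite-e rule: exact for polynomial integrands up to
# degree 79, which covers F**6 (degree 12) with room to spare.
nodes, weights = hermegauss(40)
weights = weights / np.sqrt(2.0 * np.pi)

F = nodes**2 - 1.0  # F = H_2(N), an element of the second Wiener chaos

# raw moments m_0, ..., m_6 of F under the Gaussian measure
m = [1.0] + [float(np.sum(weights * F**n)) for n in range(1, 7)]

# moments -> cumulants: kappa_n = m_n - sum_{j<n} C(n-1, j-1) kappa_j m_{n-j}
kappa = [0.0]  # kappa_0 is unused
for n in range(1, 7):
    k = m[n] - sum(comb(n - 1, j - 1) * kappa[j] * m[n - j] for j in range(1, n))
    kappa.append(k)
# expected: kappa = [_, 0, 2, 8, 48, 384, 3840]
```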
As announced, in the next subsection we show how to use the above Malliavin machinery in order to study the Stein’s bounds presented in Section 2.
3.2 Connection with Stein’s method
Let $F\in {\mathbb{D}^{1,2}}$ with $\mathbb{E}[F]=0$ and $\mathbb{E}[{F^{2}}]=1$. Take a ${C^{1}}$ function f such that $\| f{\| _{\infty }}\le \sqrt{\frac{\pi }{2}}$ and $\| {f^{\prime }}{\| _{\infty }}\le 2$. Using the Malliavin integration by parts formula stated in Lemma 3.1 together with the chain rule (3.7), we can write
(3.12)
\[ \begin{aligned}{}\Big|\mathbb{E}[{f^{\prime }}(F)-Ff(F)]\Big|& =\Big|\mathbb{E}[{f^{\prime }}(F)\left(1-{\langle DF,-D{\mathbf{L}^{-1}}F\rangle _{\mathfrak{H}}}\right)]\Big|\\ {} & \le 2\hspace{0.1667em}\mathbb{E}\Big|1-{\langle DF,-D{\mathbf{L}^{-1}}F\rangle _{\mathfrak{H}}}\Big|.\end{aligned}\]
If we furthermore assume that $F\in {\mathbb{D}^{1,4}}$, then the random variable $1-{\langle DF,-D{\mathbf{L}^{-1}}F\rangle _{\mathfrak{H}}}$ is square-integrable, and using the Cauchy–Schwarz inequality we infer that
\[ \Big|\mathbb{E}[{f^{\prime }}(F)-Ff(F)]\Big|\le 2\sqrt{\operatorname{Var}\left({\langle DF,-D{\mathbf{L}^{-1}}F\rangle _{\mathfrak{H}}}\right)}.\]
Note that here we used the fact that $\mathbb{E}[{\langle DF,-D{\mathbf{L}^{-1}}F\rangle _{\mathfrak{H}}}]=\mathbb{E}[{F^{2}}]=1$. The above arguments combined with Lemma 2.1 yield immediately the next crucial statement, originally proved in [64].
Theorem 3.6.
Let $F\in {\mathbb{D}^{1,2}}$ be a generic random element with $\mathbb{E}[F]=0$ and $\mathbb{E}[{F^{2}}]=1$. Let $N\sim \mathcal{N}(0,1)$. Assume further that F has a density with respect to the Lebesgue measure. Then,
\[ {d_{TV}}(F,N)\le 2\hspace{0.1667em}\mathbb{E}\Big|1-{\langle DF,-D{\mathbf{L}^{-1}}F\rangle _{\mathfrak{H}}}\Big|.\]
Moreover, if $F\in {\mathbb{D}^{1,4}}$, then
\[ {d_{TV}}(F,N)\le 2\sqrt{\operatorname{Var}\left({\langle DF,-D{\mathbf{L}^{-1}}F\rangle _{\mathfrak{H}}}\right)}.\]
In particular, if $F={I_{q}}(f)$ belongs to the Wiener chaos of order $q\ge 2$, then
(3.13)
\[ {d_{TV}}(F,N)\le 2\sqrt{\frac{q-1}{3q}}\sqrt{\mathbb{E}[{F^{4}}]-3}.\]
Note that, by virtue of Lemma 2.1, similar bounds can be immediately obtained for the Wasserstein distance ${d_{W}}$ (and many more – see [66, Chapter 5]). In particular, the previous statement allows one to recover the following central limit theorem for chaotic random variables, first proved in [80].
Corollary 3.7 (Fourth Moment Theorem).
Let ${\{{F_{n}}\}_{n\ge 1}}={\{{I_{q}}({f_{n}})\}_{n\ge 1}}$ be a sequence of random elements in a fixed Wiener chaos of order $q\ge 2$ such that $\mathbb{E}[{F_{n}^{2}}]=q!\| {f_{n}}{\| ^{2}}=1$. Let $N\sim \mathcal{N}(0,1)$. Then, as n tends to infinity, the following assertions are equivalent:
-
(i) $\mathbb{E}[{F_{n}^{4}}]\to 3=\mathbb{E}[{N^{4}}]$;
-
(ii) ${F_{n}}$ converges in distribution to N.
As demonstrated by the webpage [1], the ‘fourth moment theorem’ stated in Corollary 3.7 has been the starting point of a very active line of research, composed of several hundred papers connected with disparate applications. In the next section, we will implicitly provide a general version of Theorem 3.6 (with the 1-Wasserstein distance replacing the total variation distance), whose proof relies only on the spectral properties of the Ornstein–Uhlenbeck generator L and on the so-called Γ calculus (see, e.g., [18]).
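The fourth moment theorem can be watched at work on the simplest second-chaos example ${F_{n}}={n^{-1/2}}{\textstyle\sum _{i\le n}}({Z_{i}^{2}}-1)/\sqrt{2}$, for which $\mathbb{E}[{F_{n}^{2}}]=1$ and, by additivity of the fourth cumulant, $\mathbb{E}[{F_{n}^{4}}]=3+12/n$. A Monte Carlo sketch (ours; the sample sizes and seed are arbitrary) shows the fourth moment drifting towards 3 as n grows:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_F(n, size):
    # F_n = (chi2_n - n) / sqrt(2 n) is equal in law to
    # n^{-1/2} * sum_{i<=n} (Z_i^2 - 1)/sqrt(2): a unit-variance element of
    # the second Wiener chaos, with E[F_n^4] = 3 + 12/n.
    return (rng.chisquare(n, size) - n) / np.sqrt(2.0 * n)

m4_10 = float(np.mean(sample_F(10, 1_000_000) ** 4))    # target: 3 + 12/10 = 4.2
m4_100 = float(np.mean(sample_F(100, 1_000_000) ** 4))  # target: 3 + 12/100 = 3.12
```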
4 The Markov triple approach
In this section, we introduce a general framework for studying and generalizing the fourth moment phenomenon appearing in the statement of Corollary 3.7. The forthcoming approach was first introduced in [52] by M. Ledoux, and then further developed and generalised in [5, 8].
4.1 Diffusive fourth moment structures
We start with the definition of our general setup.
Definition 4.1.
A diffusive fourth moment structure is a triple $(E,\mu ,\mathbf{L})$ such that:
-
(a) $(E,\mu )$ is a probability space;
-
(b) L is a symmetric unbounded operator defined on some dense subset of ${L^{2}}(E,\mu )$, that we denote by $\mathcal{D}(\mathbf{L})$ (the set $\mathcal{D}(\mathbf{L})$ is called the domain of L);
-
(d) the operator L is diffusive, meaning that, for any ${\mathcal{C}_{b}^{2}}$ function $\varphi :\mathbb{R}\to \mathbb{R}$ and any $X\in \mathcal{D}(\mathbf{L})$, it holds that $\varphi (X)\in \mathcal{D}(\mathbf{L})$ and
(4.2)
\[ \mathbf{L}\left[\varphi (X)\right]={\varphi ^{\prime }}(X)\mathbf{L}[X]+{\varphi ^{\prime\prime }}(X)\Gamma [X,X];\]
In this context, $\Gamma [X,Y]:=\frac{1}{2}\left(\mathbf{L}[XY]-X\mathbf{L}[Y]-Y\mathbf{L}[X]\right)$ denotes the associated carré-du-champ operator; we usually write $\Gamma [X]$ instead of $\Gamma [X,X]$, and $\mathbb{E}$ denotes integration against the probability measure μ.
Remark 4.2.
-
(1) Property (d), together with the symmetry of the operator L, determines a functional calculus through the following fundamental integration by parts formula: for any X, Y in $\mathcal{D}(\mathbf{L})$ and $\varphi \in {\mathcal{C}_{b}^{2}}$,
(4.4)
\[ \mathbb{E}\left[{\varphi ^{\prime }}(X)\Gamma [X,Y]\right]=\mathbb{E}\left[\Gamma [\varphi (X),Y]\right]=-\mathbb{E}\left[\varphi (X)\mathbf{L}[Y]\right].\]
-
(2) The results in this section can be stated under the weaker assumption that $\textbf{sp}(-\mathbf{L})=\{0={\lambda _{0}}<{\lambda _{1}}<\cdots <{\lambda _{k}}<\cdots \hspace{0.1667em}\}\subset {\mathbb{R}_{+}}$ is discrete. However, to keep a transparent presentation, we restrict ourselves to the assumption $\textbf{sp}(-\mathbf{L})=\mathbb{N}$. The reader is referred to [5] for further details.
-
(3) We point out that, by a recursive argument, assumption (4.3) yields that for any $X\in \textbf{Ker}(\mathbf{L}+p\textbf{Id})$ and any polynomial P of degree m, we have
\[ P(X)\in {\textstyle\bigoplus _{i\le mp}}\textbf{Ker}\left(\mathbf{L}+i\textbf{Id}\right).\]
-
(4) The eigenspaces of a diffusive fourth moment structure are hypercontractive (see [10] for details and sufficient conditions), that is, for every $k\ge 1$ there exists a constant $C(M,k)$ such that for any $X\in {\textstyle\bigoplus _{i\le M}}\textbf{Ker}\left(\mathbf{L}+i\textbf{Id}\right)$:
(4.6)
\[ \mathbb{E}\left[{X^{2k}}\right]\le C(M,k){\left(\mathbb{E}\left[{X^{2}}\right]\right)^{k}}.\]
-
(5) Property (f) in the previous definition roughly implies that eigenfunctions of L in a diffusive fourth moment structure behave like orthogonal polynomials with respect to multiplication.
For further details on our setup, we refer the reader to [18] as well as [5, 8]. The next example describes some diffusive fourth moment structures. The reader can consult [8, Section 2.2] for two classical methods for building further diffusive fourth moment structures starting from known ones.
Example 4.3.
-
(a) Finite-Dimensional Gaussian Structures: Let $d\ge 1$ and denote by ${\gamma _{d}}$ the d-dimensional standard Gaussian measure on ${\mathbb{R}^{d}}$. It is well known (see, for example, [18]) that ${\gamma _{d}}$ is the invariant measure of the Ornstein–Uhlenbeck generator, defined for any test function φ by
(4.7)
\[ \mathbf{L}\varphi (x)=\Delta \varphi (x)-{\sum \limits_{i=1}^{d}}{x_{i}}{\partial _{i}}\varphi (x).\]
The spectrum of $-\mathbf{L}$ is ${\mathbb{N}_{0}}$ and, for every $k\ge 0$,
\[ \textbf{Ker}(\mathbf{L}+k\textbf{Id})=\left\{\sum \limits_{{i_{1}}+{i_{2}}+\cdots +{i_{d}}=k}\alpha ({i_{1}},\dots ,{i_{d}}){\prod \limits_{j=1}^{d}}{H_{{i_{j}}}}({x_{j}})\right\},\]
where ${H_{n}}$ denotes the Hermite polynomial of order n. Since the eigenfunctions of L are multivariate polynomials, it is straightforward to see that assumption (f) is also verified.
(b) Wiener space and isonormal processes: Letting $d\to \infty $ in the setup of the previous item (a), one recovers the infinite-dimensional generator of the Ornstein–Uhlenbeck semigroup for isonormal processes, as defined in Section 3.1. It is easily verified, in particular by using the multiplication formula (3.5), that $(\Omega ,\mathbb{P},\mathbf{L})$ is also a diffusive fourth moment structure.
-
(c) Laguerre Structure: Let $\nu >-1$, and let ${\pi _{1,\nu }}(dx)={x^{\nu }}\frac{{\mathrm{e}^{-x}}}{\Gamma (\nu +1)}{\textbf{1}_{(0,\infty )}}(x)\mathrm{d}x$ be the Gamma distribution with parameter $\nu +1$ on ${\mathbb{R}_{+}}$. The associated Laguerre generator is defined for any test function φ (in dimension one) by
(4.8)
\[ {\mathbf{L}_{1,\nu }}\varphi (x)=x{\varphi ^{\prime\prime }}(x)+(\nu +1-x){\varphi ^{\prime }}(x).\]
In dimension d, the reference measure is the product measure
\[ {\pi _{d,\nu }}(\mathrm{d}x)={\pi _{1,\nu }}(\mathrm{d}{x_{1}}){\pi _{1,\nu }}(\mathrm{d}{x_{2}})\cdots {\pi _{1,\nu }}(\mathrm{d}{x_{d}}),\]
where $x=({x_{1}},{x_{2}},\dots ,{x_{d}})$, and the Laguerre generator ${\mathbf{L}_{d,\nu }}$ acts coordinate-wise. It is also classical (see, for example, [18]) that the spectrum of ${\mathbf{L}_{d,\nu }}$ is given by $-{\mathbb{N}_{0}}$ and moreover that
(4.10)
\[ \textbf{Ker}({\mathbf{L}_{d,\nu }}+k\textbf{Id})=\left\{\sum \limits_{{i_{1}}+{i_{2}}+\cdots +{i_{d}}=k}\alpha ({i_{1}},\dots ,{i_{d}}){\prod \limits_{j=1}^{d}}{L_{{i_{j}}}^{(\nu )}}({x_{j}})\right\},\]
where ${L_{n}^{(\nu )}}$ denotes the generalized Laguerre polynomial of order n with parameter ν.
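Both structures (a) and (c) can be probed in coefficient space. The sketch below (our illustration, written for $d=1$ and, in the Laguerre case, for $\nu =0$, where the reference measure is Exp(1) and the eigenfunctions are the standard Laguerre polynomials) verifies the eigenrelations $\mathbf{L}{H_{q}}=-q{H_{q}}$ and ${\mathbf{L}_{1,0}}{L_{k}}=-k{L_{k}}$:

```python
import numpy as np
from numpy.polynomial import polynomial as P
from numpy.polynomial import hermite_e as He
from numpy.polynomial import laguerre as Lg

def apply_operator(c, terms):
    # Apply a sum of terms x^shift * d^j/dx^j (with a sign) to the
    # power-basis coefficient vector c of a polynomial.
    out = np.zeros(len(c))
    for j, shift, sign in terms:
        d = P.polyder(c, j) if j > 0 else np.asarray(c, dtype=float)
        if shift:
            d = np.concatenate(([0.0], d))  # multiply by x
        out[: len(d)] += sign * np.asarray(d)
    return out

def ou(c):         # Ornstein-Uhlenbeck: L p = p'' - x p', cf. (4.7) with d = 1
    return apply_operator(c, [(2, 0, 1.0), (1, 1, -1.0)])

def laguerre0(c):  # Laguerre, nu = 0: L p = x p'' + (1 - x) p', cf. (4.8)
    return apply_operator(c, [(2, 1, 1.0), (1, 0, 1.0), (1, 1, -1.0)])

ou_ok = all(
    np.allclose(ou(He.herme2poly([0.0] * q + [1.0])),
                -q * np.asarray(He.herme2poly([0.0] * q + [1.0])))
    for q in range(1, 8)
)
lag_ok = all(
    np.allclose(laguerre0(Lg.lag2poly([0.0] * k + [1.0])),
                -k * np.asarray(Lg.lag2poly([0.0] * k + [1.0])))
    for k in range(1, 8)
)
```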
In the next subsection, we demonstrate how a diffusive fourth moment structure can be combined with the tools of Γ calculus, in order to deduce substantial generalizations of Theorem 3.6.
4.2 Connection with Γ calculus
Throughout this section, we assume that $(E,\mu ,\mathbf{L})$ is a diffusive fourth moment structure. Our principal aim is to prove a fourth moment criterion analogous to that of (3.13) for eigenfunctions of the operator L. To do this, we assume that $X\in \textbf{Ker}(\mathbf{L}+q\textbf{Id})$ for some $q\ge 1$ with $\mathbb{E}[{X^{2}}]=1$. The arguments implemented in the proof will clearly demonstrate that requirements (d) and (f) in Definition 4.1 are the most crucial elements in order to establish our estimates.
Proposition 4.4.
Let $(E,\mu ,\mathbf{L})$ be a diffusive fourth moment structure, and assume that $X\in \textbf{\textit{Ker}}(\mathbf{L}+q\textbf{\textit{Id}})$ for some $q\ge 1$ with $\mathbb{E}[{X^{2}}]=1$. Then
\[ \operatorname{Var}\left(\Gamma [X]\right)\le \frac{{q^{2}}}{3}\left\{\mathbb{E}[{X^{4}}]-3\right\}.\]
Proof.
First note that, by using the integration by parts formula (4.4), we have $\mathbb{E}[\Gamma [X]]=-\mathbb{E}[X\mathbf{L}X]=q\mathbb{E}[{X^{2}}]=q$. Secondly, by using the definition of the carré-du-champ operator Γ and the fact that $\mathbf{L}X=-qX$, one easily verifies that
\[ \Gamma [X]-q=\frac{1}{2}\left(\mathbf{L}+2q\textbf{Id}\right)({X^{2}}-1).\]
Next, taking into account properties (f) and (g) we can conclude that
\[ {X^{2}}-1\in \underset{1\le i\le 2q}{\bigoplus }\textbf{Ker}\left(\mathbf{L}+i\textbf{Id}\right).\]
For the rest of the proof, we use the notation ${J_{i}}$ to denote the projection of a square-integrable element X onto the eigenspace $\textbf{Ker}\left(\mathbf{L}+i\textbf{Id}\right)$. Now,
\[\begin{aligned}{}& \operatorname{Var}\left(\Gamma [X]\right)\\ {} & =\mathbb{E}\left[{\left(\Gamma [X]-q\right)^{2}}\right]=\frac{1}{4}\mathbb{E}\left[\left(\mathbf{L}+2q\textbf{Id}\right)({X^{2}}-1)\times \left(\mathbf{L}+2q\textbf{Id}\right)({X^{2}}-1)\right]\\ {} & =\frac{1}{4}\mathbb{E}\left[\mathbf{L}({X^{2}}-1)\left(\mathbf{L}+2q\textbf{Id}\right)({X^{2}}-1)\right]\hspace{-0.1667em}+\hspace{-0.1667em}\frac{q}{2}\mathbb{E}\left[({X^{2}}-1)\left(\mathbf{L}+2q\textbf{Id}\right)({X^{2}}-1)\right]\\ {} & =\frac{1}{4}\hspace{-0.1667em}\hspace{-0.1667em}\sum \limits_{1\le i\le 2q}(-i)(2q-i)\mathbb{E}\left[{\left({J_{i}}({X^{2}}-1)\right)^{2}}\right]\hspace{-0.1667em}+\hspace{-0.1667em}\frac{q}{2}\mathbb{E}\left[({X^{2}}-1)\left(\mathbf{L}+2q\textbf{Id}\right)({X^{2}}-1)\right]\\ {} & \le \frac{q}{2}\mathbb{E}\left[({X^{2}}-1)\left(\mathbf{L}+2q\textbf{Id}\right)({X^{2}}-1)\right]\\ {} & =q\mathbb{E}\left[({X^{2}}-1)(\Gamma [X]-q)\right]=q\mathbb{E}\left[({X^{2}}-1)\Gamma [X]\right]\\ {} & =q\mathbb{E}\left[\Gamma [\frac{{X^{3}}}{3}-X,X]\right]=-q\mathbb{E}\left[\left(\frac{{X^{3}}}{3}-X\right)\mathbf{L}X\right]\\ {} & ={q^{2}}\mathbb{E}\left[X\left(\frac{{X^{3}}}{3}-X\right)\right]={q^{2}}\mathbb{E}\left[\frac{{X^{4}}}{3}-{X^{2}}\right]\\ {} & =\frac{{q^{2}}}{3}\left\{\mathbb{E}[{X^{4}}]-3\right\},\end{aligned}\]
thus yielding the desired conclusion. □
In order to avoid some technicalities, we now present a quantitative bound in the 1-Wasserstein distance ${d_{W}}$ (and not in the more challenging total variation distance ${d_{TV}}$) for eigenfunctions of the operator L. This requires adapting the Stein's method machinery presented in Section 2 to our setting, as a direct application of the integration by parts formula (4.4). The arguments below are borrowed in particular from [52, Proposition 1].
Proposition 4.5.
Let $(E,\mu ,\mathbf{L})$ be a diffusive fourth moment structure. Assume that $X\in \textbf{\textit{Ker}}(\mathbf{L}+q\textbf{\textit{Id}})$ for some $q\ge 1$ with $\mathbb{E}[{X^{2}}]=1$. Let $N\sim \mathcal{N}(0,1)$. Then,
\[ {d_{W}}(X,N)\le \frac{2}{q}\sqrt{\operatorname{Var}\left(\Gamma [X]\right)}.\]
Proof.
For every function f of class ${C^{1}}$ on $\mathbb{R}$ with $\| {f^{\prime }}{\| _{\infty }}\le 2$ and ${f^{\prime }}\in \mathrm{Lip}(2)$, according to Part (d) in Lemma 2.1, it is enough to show that
\[ \Big|\mathbb{E}\left[{f^{\prime }}(X)-Xf(X)\right]\Big|\le \frac{2}{q}\operatorname{Var}{\left(\Gamma [X]\right)^{\frac{1}{2}}}.\]
Since $\mathbf{L}X=-qX$, using the diffusivity of the operator Γ together with the integration by parts formula (4.4), one can write that
\[\begin{aligned}{}\mathbb{E}\left[{f^{\prime }}(X)-Xf(X)\right]& =\mathbb{E}\left[{f^{\prime }}(X)+\frac{1}{q}\mathbf{L}(X)f(X)\right]=\mathbb{E}\left[{f^{\prime }}(X)-\frac{1}{q}\Gamma [f(X),X]\right]\\ {} & =\mathbb{E}\left[{f^{\prime }}(X)-\frac{1}{q}{f^{\prime }}(X)\Gamma [X]\right]\\ {} & =\frac{1}{q}\mathbb{E}\left[{f^{\prime }}(X)\left(q-\Gamma [X]\right)\right].\end{aligned}\]
Now, the claim follows at once by using the Cauchy–Schwarz inequality and noting that $\mathbb{E}[\Gamma [X]]=q\hspace{0.1667em}\mathbb{E}[{X^{2}}]=q$. □
We end this section with the following general version of the fourth moment theorem for eigenfunctions of the operator L, obtained by combining Propositions 4.4 and 4.5.
Theorem 4.6.
Let $(E,\mu ,\mathbf{L})$ be a diffusive fourth moment structure. Assume that $X\in \textbf{\textit{Ker}}(\mathbf{L}+q\textbf{\textit{Id}})$ for some $q\ge 1$ with $\mathbb{E}[{X^{2}}]=1$. Let $N\sim \mathcal{N}(0,1)$. Then,
\[ {d_{W}}(X,N)\le \frac{2}{q}\operatorname{Var}{\left(\Gamma [X]\right)^{\frac{1}{2}}}\le \frac{2}{\sqrt{3}}{\left(\mathbb{E}[{X^{4}}]-3\right)^{\frac{1}{2}}}.\]
It follows that, if ${\{{X_{n}}\}_{n\ge 1}}$ is a sequence of eigenfunctions in a fixed eigenspace $\textbf{\textit{Ker}}(\mathbf{L}+q\textbf{\textit{Id}})$, where $q\ge 1$ and $\mathbb{E}[{X_{n}^{2}}]=1$ for all $n\ge 1$, then the following equivalence holds: $\mathbb{E}[{X_{n}^{4}}]\to 3$ if and only if ${X_{n}}$ converges in distribution towards the standard Gaussian random variable N.
Remark 4.7.
The fact that the condition $\mathbb{E}[{X_{n}^{4}}]\to 3$ is necessary for convergence to the Gaussian random variable is a direct consequence of the hypercontractive estimate (4.6).
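As a numerical illustration of the above equivalence, the following Python sketch (our own toy example, not taken from the references) samples the second-chaos variables ${X_{n}}={\textstyle\sum _{i\le n}}({\xi _{i}^{2}}-1)/\sqrt{2n}$, for which $\mathbb{E}[{X_{n}^{2}}]=1$ and $\mathbb{E}[{X_{n}^{4}}]=3+12/n$, and checks that the empirical fourth moment approaches 3 as n grows:

```python
import numpy as np

def second_chaos_sample(n, n_samples, rng):
    """Sample X_n = sum_{i<=n} (xi_i^2 - 1) / sqrt(2n): an element of the
    second Wiener chaos with E[X_n^2] = 1 and E[X_n^4] = 3 + 12/n."""
    xi = rng.standard_normal((n_samples, n))
    return (xi ** 2 - 1).sum(axis=1) / np.sqrt(2 * n)

rng = np.random.default_rng(0)
for n in (1, 10, 200):
    x = second_chaos_sample(n, 200_000, rng)
    # empirical variance and fourth moment: E[X^4] should drift toward 3
    print(n, round(x.var(), 3), round((x ** 4).mean(), 3))
```

For $n=1$ the fourth moment equals 15, reflecting the fact that a single centered chi-square variable is far from Gaussian.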
4.3 Transport distances, Stein discrepancy and Γ calculus
The general setting of a Markov triple together with the Γ calculus provides a suitable framework for studying functional inequalities such as the classical logarithmic Sobolev inequality or the celebrated Talagrand quadratic transportation cost inequality. For simplicity, here we restrict ourselves to the Wiener structure, with the Gaussian measure as our reference measure. The reader may consult the references [53, 54] for a presentation of the general setting, and [72, 73] for some previous references connecting fourth moment theorems and entropic estimates.
Let $d\ge 1$, and let $d\gamma (x)={(2\pi )^{-\frac{d}{2}}}{e^{-\frac{|x{|^{2}}}{2}}}dx$ be the standard Gaussian measure on ${\mathbb{R}^{d}}$. Assume that $d\nu =hd\gamma $ is a probability measure on ${\mathbb{R}^{d}}$ with a (smooth) density function $h:{\mathbb{R}^{d}}\to {\mathbb{R}_{+}}$ with respect to the Gaussian measure γ. Inspired by the Gaussian integration by parts formula, we first introduce the crucial notion of a Stein kernel ${\tau _{\nu }}$ associated with the probability measure ν and, then, the concept of Stein discrepancy.
Definition 4.8.
(a) A measurable matrix-valued map ${\tau _{\nu }}$ on ${\mathbb{R}^{d}}$ is called a Stein kernel for the centered probability measure ν if for every smooth test function $\phi :{\mathbb{R}^{d}}\to \mathbb{R}$,
\[ {\int _{{\mathbb{R}^{d}}}}x\cdot \nabla \phi d\nu ={\int _{{\mathbb{R}^{d}}}}{\langle {\tau _{\nu }},\operatorname{Hess}(\phi )\rangle _{\operatorname{HS}}}d\nu ,\]
where $\operatorname{Hess}(\phi )$ stands for the Hessian of ϕ, and ${\langle \cdot ,\cdot \rangle _{\operatorname{HS}}}$ and $\| \cdot {\| _{\operatorname{HS}}}$ denote the usual Hilbert–Schmidt scalar product and norm, respectively.
(b) The Stein discrepancy of ν with respect to γ is defined as
\[ \operatorname{S}(\nu ,\gamma )=\inf {\Big({\int _{{\mathbb{R}^{d}}}}\| {\tau _{\nu }}-\textbf{Id}{\| _{\operatorname{HS}}^{2}}d\nu \Big)^{\frac{1}{2}}}\]
where the infimum is taken over all Stein kernels of ν, and takes the value $+\infty $ if a Stein kernel for ν does not exist.
We recall that the Stein kernel ${\tau _{\nu }}$ is uniquely defined in dimension $d=1$, and that uniqueness may fail in higher dimensions $d\ge 2$; see [73, Appendix A]. Also, ${\tau _{\gamma }}={\textbf{Id}_{d\times d}}$ is the identity matrix. We further refer to [40, 25] for the existence of Stein kernels in general settings. The interest of the Stein discrepancy comes, e.g., from the fact that – as a simple application of Stein’s method –
\[ {d_{TV}}(\nu ,\gamma )\le 2{\int _{\mathbb{R}}}|{\tau _{\nu }}-1|d\nu \le 2{\Big({\int _{\mathbb{R}}}|{\tau _{\nu }}-1{|^{2}}d\nu \Big)^{\frac{1}{2}}},\]
yielding that ${d_{TV}}(\nu ,\gamma )\le 2\operatorname{S}(\nu ,\gamma )$; see [53] for further details.
Next, we need the notion of Wasserstein distance. Let $p\ge 1$. Given two probability measures ν and μ on the Borel sets of ${\mathbb{R}^{d}}$ with finite moments of order p, we define the p-Wasserstein distance between ν and μ as
\[ {\operatorname{W}_{p}}(\nu ,\mu )=\underset{\pi }{\inf }{\Big({\int _{{\mathbb{R}^{d}}\times {\mathbb{R}^{d}}}}|x-y{|^{p}}d\pi (x,y)\Big)^{\frac{1}{p}}}\]
where the infimum is taken over all probability measures π on ${\mathbb{R}^{d}}\times {\mathbb{R}^{d}}$ with marginals ν and μ; note that ${\mathrm{W}_{1}}={d_{W}}$, as defined in Section 2.
We recall that, for a measure $\nu =h\gamma $ with a smooth density function h on ${\mathbb{R}^{d}}$,
\[ \mathrm{H}(\nu \hspace{0.1667em}|\hspace{0.1667em}\gamma ):={\int _{{\mathbb{R}^{d}}}}h\log h\hspace{0.1667em}d\gamma \]
is the relative entropy of the measure ν with respect to γ, and
\[ \mathrm{I}(\nu \hspace{0.1667em}|\hspace{0.1667em}\gamma ):={\int _{{\mathbb{R}^{d}}}}\frac{|\nabla h{|^{2}}}{h}\hspace{0.1667em}d\gamma \]
is the Fisher information of ν with respect to γ. After having established these notions, we can state two popular probabilistic/entropic functional inequalities: the logarithmic Sobolev inequality $\mathrm{H}(\nu \hspace{0.1667em}|\hspace{0.1667em}\gamma )\le \frac{1}{2}\mathrm{I}(\nu \hspace{0.1667em}|\hspace{0.1667em}\gamma )$, and the Talagrand quadratic transportation cost inequality ${\operatorname{W}_{2}}{(\nu ,\gamma )^{2}}\le 2\mathrm{H}(\nu \hspace{0.1667em}|\hspace{0.1667em}\gamma )$.
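In dimension $d=1$, a Stein kernel of a centered measure ν with Lebesgue density p is given explicitly by ${\tau _{\nu }}(x)=p{(x)^{-1}}{\int _{x}^{\infty }}yp(y)dy$. The following numerical sketch (an illustration under this classical formula; the grid and truncation are our own choices) verifies that for $\nu =\mathcal{N}(0,{\sigma ^{2}})$ the recipe returns the constant kernel ${\tau _{\nu }}\equiv {\sigma ^{2}}$, so that $\operatorname{S}(\nu ,\gamma )=|{\sigma ^{2}}-1|$:

```python
import numpy as np

def stein_kernel_1d(density, grid):
    """tau_nu(x) = (1 / p(x)) * int_x^infty y p(y) dy, computed on a grid
    via a right-to-left Riemann sum."""
    p = density(grid)
    dx = grid[1] - grid[0]
    tail = np.cumsum((grid * p)[::-1])[::-1] * dx  # int_x^infty y p(y) dy
    return tail / p

sigma2 = 1.5                                       # variance of nu = N(0, sigma2)
grid = np.linspace(-10.0, 10.0, 200_001)
dens = lambda x: np.exp(-x ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
tau = stein_kernel_1d(dens, grid)

center = np.abs(grid) < 3.0   # avoid truncation error in the far tails
dx = grid[1] - grid[0]
S = np.sqrt(np.sum((tau - 1.0) ** 2 * dens(grid)) * dx)  # Stein discrepancy
print(tau[center].min(), tau[center].max(), S)  # tau close to 1.5, S close to 0.5
```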
The next theorem is borrowed from [53], and represents a significant improvement of the previous logarithmic Sobolev and Talagrand inequalities based on the use of Stein discrepancies: the techniques used in the proof are based on an interpolation argument along the Ornstein–Uhlenbeck semigroup. The theorem establishes connections between the relative entropy H, the Stein discrepancy S, the Fisher information I, and the Wasserstein distance W, customarily called the HSI and the WSH inequalities. The reader is also referred to the recent works [40, 25, 89] for related estimates of the Stein discrepancy based on the use of Poincaré inequalities, as well as on optimal transport techniques. See [15] for a further amplification of the approach of [53], with applications to the quantitative multidimensional CLT in the 2-Wasserstein distance. See also [33].
Theorem 4.9.
Let $d\nu =hd\gamma $ be a centered probability measure on ${\mathbb{R}^{d}}$ with smooth density h, and write $\mathrm{H}=\mathrm{H}(\nu \hspace{0.1667em}|\hspace{0.1667em}\gamma )$, $\mathrm{I}=\mathrm{I}(\nu \hspace{0.1667em}|\hspace{0.1667em}\gamma )$ and $\mathrm{S}=\operatorname{S}(\nu ,\gamma )$. Then,
\[ \mathrm{H}\le \frac{{\mathrm{S}^{2}}}{2}\log \Big(1+\frac{\mathrm{I}}{{\mathrm{S}^{2}}}\Big)\hspace{1em}\textit{(HSI)},\hspace{2em}{\operatorname{W}_{2}}(\nu ,\gamma )\le \mathrm{S}\arccos \Big({e^{-\mathrm{H}/{\mathrm{S}^{2}}}}\Big)\hspace{1em}\textit{(WSH)}.\]
The next subsection deals with the challenging problem of quantitative probabilistic approximations in infinite dimension.
4.4 Functional approximations and Dirichlet structures
Although Stein’s method has already been successfully used for quantifying functional limit theorems of the Donsker type (see [11, 12], as well as [34, 35, 45, 91] for a discussion of recent developments), the general problem of assessing the discrepancy between probability distributions on infinite-dimensional spaces (like, e.g., on classes of smooth functions or on the Skorohod space) is essentially open.
In recent years, a new direction of research has emerged, where the ideas behind the Malliavin–Stein approach are applied in the framework of Dirichlet structures, in order to deal with quantitative estimates on the probabilistic approximation of Hilbert space-valued random variables. A general (and impressive!) contribution on the matter is the recent work by Bourguin and Campese [17], where the authors are able to retrieve several Hilbert space counterparts of the finite-dimensional results discussed in Section 3 above. Bourguin and Campese’s approach (whose discussion requires preliminaries that go beyond the scope of our survey) represents a substantial addition to a line of investigation initiated by L. Coutin and L. Decreusefond in the seminal works [26, 29, 27, 28, 30].
As a quick illustration, we conclude the section with two representative statements, taken from [26, 30] and [29], respectively.
Theorem 4.10 (See [26] and Section 3.2 in [30]).
Let $({N_{\lambda }}(t):t\ge 0)$ be a Poisson process with intensity λ. Then, as $\lambda \to \infty $,
\[ \left(\frac{{N_{\lambda }}(t)-\lambda t}{\sqrt{\lambda }}:t\ge 0\right)\hspace{0.2778em}\Longrightarrow \hspace{0.2778em}\left(B(t):t\ge 0\right)\]
where the convergence takes place weakly in the Skorohod space. Moreover, for every $\beta <\frac{1}{2}$ consider the so-called Besov–Liouville space ${I_{\beta ,2}}$,
\[ {I_{\beta ,2}}=\Big\{f\hspace{0.1667em}:\hspace{0.1667em}\exists \hspace{0.1667em}\dot{f},\hspace{0.1667em}f(x)=\frac{1}{\Gamma (\beta )}{\int _{0}^{x}}{(x-t)^{\beta -1}}\dot{f}(t)dt\Big\}.\]
Let ${\mu _{\beta }}$ denote the Wiener measure on the space ${I_{\beta ,2}}$, and let ${Q_{\lambda }}$ be the probability measure induced on ${I_{\beta ,2}}$ by the normalized process $\big(({N_{\lambda }}(t)-\lambda t)/\sqrt{\lambda }:t\ge 0\big)$. Then, there exists a constant ${c_{\beta }}$ such that
\[ \underset{\| F{\| _{{C_{b}^{2}}({I_{\beta ,2}},\mathbb{R})}}\le 1}{\sup }\Big|\int Fd{Q_{\lambda }}-\int Fd{\mu _{\beta }}\Big|\le \frac{{c_{\beta }}}{\sqrt{\lambda }}\]
where ${C_{b}^{2}}({I_{\beta ,2}},\mathbb{R})$ is the set of twice Fréchet differentiable functionals on ${I_{\beta ,2}}$.
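The convergence of the time marginals in Theorem 4.10 is easy to probe by simulation; the sketch below (with illustrative values of λ, our own choices) checks that $({N_{\lambda }}(t)-\lambda t)/\sqrt{\lambda }$ has mean close to 0, variance close to t and vanishing third moment, consistently with the Brownian limit:

```python
import numpy as np

def normalized_poisson_marginal(lam, t, n_samples, rng):
    """Sample (N_lam(t) - lam * t) / sqrt(lam), the time-t marginal of the
    normalized process in Theorem 4.10; the limit is B(t) ~ N(0, t)."""
    return (rng.poisson(lam * t, size=n_samples) - lam * t) / np.sqrt(lam)

rng = np.random.default_rng(0)
for lam in (5.0, 50.0, 500.0):
    z = normalized_poisson_marginal(lam, 1.0, 100_000, rng)
    # mean ~ 0, variance ~ t = 1, third moment ~ t / sqrt(lam) -> 0
    print(lam, round(z.mean(), 3), round(z.var(), 3), round((z ** 3).mean(), 3))
```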
The next result aims to provide a rate of convergence in the Donsker theorem in the Wasserstein distance. Let $\eta \in (0,1)$, $p\ge 1$. Define the fractional Sobolev space ${W_{\eta ,p}}$ as the closure of the space ${C^{1}}$ with respect to the norm
Also, for $n\ge 1$, define ${\mathcal{A}^{n}}=\{(k,j)\hspace{0.1667em}:\hspace{0.1667em}1\le k\le d,\hspace{0.1667em}0\le j\le n-1\}$, and let
\[ {S^{n}}=\sum \limits_{(k,j)\in {\mathcal{A}^{n}}}{X_{(k,j)}}{h_{(k,j)}^{n}},\hspace{1em}{h_{(k,j)}^{n}}(t)=\sqrt{n}{\int _{0}^{t}}{\textbf{1}_{[j/n,(j+1)/n]}}(s)ds\hspace{0.1667em}{e_{k}}\]
where $({e_{k}}\hspace{0.1667em}:\hspace{0.1667em}1\le k\le d)$ is the canonical basis of ${\mathbb{R}^{d}}$, and $({X_{(k,j)}},(k,j)\in {\mathcal{A}^{n}})$ is a family of independent, identically distributed, ${\mathbb{R}^{d}}$-valued random variables with $\mathbb{E}[X]=0$ and $\mathbb{E}\| X{\| _{{\mathbb{R}^{d}}}^{2}}=1$, where X denotes a random variable having their common distribution.
Theorem 4.11 (See Section 3 in [29]).
Let $W={W_{\eta ,p}}\left([0,1],{\mathbb{R}^{d}}\right)$, and ${\mu _{\eta ,p}}$ be the law of the d-dimensional Brownian motion B on the space W. Then, there exists a constant c such that for $X\in {L^{p}}(W;{\mathbb{R}^{d}},{\mu _{\eta ,p}})$ with $p\ge 3$,
5 Bounds on the Poisson space: fourth moments, second-order Poincaré estimates and two-scale stabilization
We will now describe a nondiffusive Markov triple for which a fourth moment result analogous to Proposition 4.5 holds. Such a Markov triple is associated with the space of square-integrable functionals of a Poisson measure on a general pair $(Z,\mathcal{Z})$, where Z is a Polish space and $\mathcal{Z}$ is the associated Borel σ-field. The requirement that Z is Polish – together with several other assumptions adopted in the present section – is made in order to simplify the discussion; the reader is referred to [37, 38] for statements and proofs in the most general setting. See also [50, 51] for an exhaustive presentation of tools of stochastic analysis for functionals of Poisson processes, as well as [81] for a discussion of the relevance of variational techniques in the framework of modern stochastic geometry.
5.1 Setup
Let μ be a nonatomic σ-finite measure on $(Z,\mathcal{Z})$, and set ${\mathcal{Z}_{\mu }}:=\{B\in \mathcal{Z}\hspace{0.1667em}:\hspace{0.1667em}\mu (B)<\infty \}$. In what follows, we will denote by
\[ \eta =\{\eta (B)\hspace{0.1667em}:\hspace{0.1667em}B\in \mathcal{Z}\}\]
a Poisson measure on $(Z,\mathcal{Z})$ with control (or intensity) μ. This means that η is a random field indexed by the elements of $\mathcal{Z}$, satisfying the following two properties: (i) for every finite collection ${B_{1}},\dots ,{B_{m}}\in \mathcal{Z}$ of pairwise disjoint sets, the random variables $\eta ({B_{1}}),\dots ,\eta ({B_{m}})$ are stochastically independent, and (ii) for every $B\in \mathcal{Z}$, the random variable $\eta (B)$ has the Poisson distribution with mean $\mu (B)$. Whenever $B\in {\mathcal{Z}_{\mu }}$, we also write $\hat{\eta }(B):=\eta (B)-\mu (B)$ and denote by
\[ \hat{\eta }=\{\hat{\eta }(B)\hspace{0.1667em}:\hspace{0.1667em}B\in {\mathcal{Z}_{\mu }}\}\]
the compensated Poisson measure associated with η. Throughout this section, we assume that $\mathcal{F}=\sigma (\eta )$.
It is a well-known fact that one can regard the Poisson measure η as a random element taking values in the space ${\mathbf{N}_{\sigma }}={\mathbf{N}_{\sigma }}(Z)$ of all σ-finite point measures χ on $(Z,\mathcal{Z})$ that satisfy $\chi (B)\in {\mathbb{N}_{0}}\cup \{+\infty \}$ for all $B\in \mathcal{Z}$. Such a space is equipped with the smallest σ-field ${\mathcal{N}_{\sigma }}:={\mathcal{N}_{\sigma }}(Z)$ such that, for each $B\in \mathcal{Z}$, the mapping ${\mathbf{N}_{\sigma }}\ni \chi \mapsto \chi (B)\in [0,+\infty ]$ is measurable. In view of our assumptions on Z and following, e.g., [51, Section 6.1], throughout the paper we can assume without loss of generality that η is proper, in the sense that η can be P-a.s. represented in the form
\[ \eta ={\sum \limits_{n=1}^{\eta (Z)}}{\delta _{{X_{n}}}},\]
where $\{{X_{n}}:n\ge 1\}$ is a countable collection of random elements with values in Z and where we write ${\delta _{z}}$ for the Dirac measure at z. Since we assume μ to be nonatomic, one has that ${X_{k}}\ne {X_{n}}$ for every $k\ne n$, P-a.s.
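When $\mu (Z)<\infty $, properties (i)–(ii) and the proper representation above yield the standard simulation recipe: draw $N\sim \mathrm{Poisson}(\mu (Z))$ and scatter N i.i.d. points according to $\mu /\mu (Z)$. A minimal sketch with $Z={[0,1]^{2}}$ and μ a multiple of the Lebesgue measure (an arbitrary illustrative choice of ours):

```python
import numpy as np

def sample_poisson_measure(t, rng):
    """Proper representation eta = sum_{n<=N} delta_{X_n} on Z = [0,1]^2,
    with control mu = t * Lebesgue: N ~ Poisson(t), X_n i.i.d. uniform."""
    n = rng.poisson(t)
    return rng.uniform(size=(n, 2))  # the support points X_1, ..., X_N

rng = np.random.default_rng(0)
t = 50.0
counts_left, counts_right = [], []
for _ in range(20_000):
    pts = sample_poisson_measure(t, rng)
    counts_left.append(np.sum(pts[:, 0] < 0.5))   # eta(B1), B1 = [0,1/2) x [0,1]
    counts_right.append(np.sum(pts[:, 0] >= 0.5))  # eta(B2), disjoint from B1
cl, cr = np.array(counts_left), np.array(counts_right)
print(cl.mean(), cl.var())        # both close to mu(B1) = 25 (Poisson marginal)
print(np.corrcoef(cl, cr)[0, 1])  # close to 0 (independence over disjoint sets)
```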
Now denote by $\mathbf{F}({\mathbf{N}_{\sigma }})$ the class of all measurable functions $\mathfrak{f}:{\mathbf{N}_{\sigma }}\to \mathbb{R}$ and by ${\mathcal{L}^{0}}(\Omega ):={\mathcal{L}^{0}}(\Omega ,\mathcal{F})$ the class of real-valued, measurable functions F on Ω. Note that, as $\mathcal{F}=\sigma (\eta )$, each $F\in {\mathcal{L}^{0}}(\Omega )$ has the form $F=\mathfrak{f}(\eta )$ for some measurable function $\mathfrak{f}$. This $\mathfrak{f}$, called a representative of F, is ${P_{\eta }}$-a.s. uniquely defined, where ${P_{\eta }}=P\circ {\eta ^{-1}}$ is the image measure of P under η. Using a representative $\mathfrak{f}$ of F, one can introduce the add-one-cost operator ${D^{+}}={({D_{z}^{+}})_{z\in Z}}$ on ${\mathcal{L}^{0}}(\Omega )$ as follows:
(5.2)
\[ {D_{z}^{+}}F:=\mathfrak{f}(\eta +{\delta _{z}})-\mathfrak{f}(\eta )\hspace{0.1667em},\hspace{1em}z\in Z.\]
Similarly, we define ${D^{-}}$ on ${\mathcal{L}^{0}}(\Omega )$ as
(5.3)
\[ {D_{z}^{-}}F:=\mathfrak{f}(\eta )-\mathfrak{f}(\eta -{\delta _{z}})\hspace{0.1667em},\hspace{0.1667em}\hspace{0.1667em}\hspace{2.5pt}\text{if}\hspace{5pt}z\in \mathrm{supp}(\eta )\hspace{0.1667em},\hspace{0.1667em}\hspace{0.1667em}\text{and}\hspace{5pt}{D_{z}^{-}}F:=0,\hspace{0.1667em}\hspace{0.1667em}\text{otherwise,}\]
where $\mathrm{supp}(\chi ):=\big\{z\in Z\hspace{0.1667em}:\hspace{0.1667em}\text{for all}\hspace{2.5pt}A\in \mathcal{Z}\hspace{2.5pt}\text{s.t.}\hspace{2.5pt}z\in A\text{:}\hspace{2.5pt}\chi (A)\ge 1\big\}$ is the support of the measure $\chi \in {\mathbf{N}_{\sigma }}$. We call $-{D^{-}}$ the remove-one-cost operator associated with η. We stress that the definitions of ${D^{+}}F$ and ${D^{-}}F$ are, respectively, $P\otimes \mu $-a.e. and P-a.s. independent of the choice of the representative $\mathfrak{f}$ – see, e.g., the discussion in [37, Section 2] and the references therein. Note that the operator ${D^{+}}$ can be straightforwardly iterated as follows: set ${D^{(1)}}:={D^{+}}$ and, for $n\ge 2$ and ${z_{1}},\dots ,{z_{n}}\in Z$ and $F\in {\mathcal{L}^{0}}(\Omega )$, recursively define
\[ {D_{{z_{1}},\dots ,{z_{n}}}^{(n)}}F:={D_{{z_{1}}}^{+}}\big({D_{{z_{2}},\dots ,{z_{n}}}^{(n-1)}}F\big).\]
5.2 ${L^{1}}$ integration by parts
One of the most fundamental formulae in the theory of Poisson processes is the so-called Mecke formula, stating that, for each measurable function $h:{\mathbf{N}_{\sigma }}\times Z\to [0,+\infty ]$, the identity
(5.4)
\[ \mathbb{E}\bigg[{\int _{Z}}h(\eta +{\delta _{z}},z)\mu (dz)\bigg]=\mathbb{E}\bigg[{\int _{Z}}h(\eta ,z)\eta (dz)\bigg]\]
holds true. In fact, equation (5.4) characterizes the Poisson process; see [51, Chapter 4] for a detailed discussion. Such a formula can be used in order to define an (approximate) integration by parts formula on the Poisson space.
For random variables $F,G\in {\mathcal{L}^{0}}(\Omega )$ such that ${D^{+}}F\hspace{0.1667em}{D^{+}}G\in {L^{1}}(P\otimes \mu )$, we define
(5.5)
\[ {\Gamma _{0}}(F,G):=\frac{1}{2}\left\{{\int _{Z}}({D_{z}^{+}}F{D_{z}^{+}}G)\hspace{0.1667em}\mu (dz)+{\int _{Z}}({D_{z}^{-}}F{D_{z}^{-}}G)\hspace{0.1667em}\eta (dz)\right\}\]
which verifies $\mathbb{E}[|{\Gamma _{0}}(F,G)|]<\infty $, and $\mathbb{E}[{\Gamma _{0}}(F,G)]=\mathbb{E}[{\textstyle\int _{Z}}({D_{z}^{+}}F{D_{z}^{+}}G)\hspace{0.1667em}\mu (dz)]$, in view of the Mecke formula. The following statement, taken from [37], can be regarded as an integration by parts formula in the framework of Poisson random measures, playing a role similar to that of Lemma 3.1 in the setting of Gaussian fields. It is an almost direct consequence of (5.4).
5.3 Multiple integrals
For an integer $p\ge 1$ we denote by ${L^{2}}({\mu ^{p}})$ the Hilbert space of all square-integrable and real-valued functions on ${\mathcal{Z}^{p}}$ and we write ${L_{s}^{2}}({\mu ^{p}})$ for the subspace of those functions in ${L^{2}}({\mu ^{p}})$ which are ${\mu ^{p}}$-a.e. symmetric. Moreover, for ease of notation, we denote by $\| \cdot {\| _{2}}$ and ${\langle \cdot ,\cdot \rangle _{2}}$ the usual norm and scalar product on ${L^{2}}({\mu ^{p}})$ for whatever value of p. We further define ${L^{2}}({\mu ^{0}}):=\mathbb{R}$. For $f\in {L^{2}}({\mu ^{p}})$, we denote by ${I_{p}}(f)$ the multiple Wiener–Itô integral of f with respect to $\hat{\eta }$. If $p=0$, then, by convention, ${I_{0}}(c):=c$ for each $c\in \mathbb{R}$. Now let $p,q\ge 0$ be integers. The following basic properties are proved, e.g., in [50], and are analogous to the properties of multiple integrals in a Gaussian framework, as discussed in Section 3.1:
-
1. ${I_{p}}(f)={I_{p}}(\tilde{f})$, where $\tilde{f}$ denotes the canonical symmetrization of $f\in {L^{2}}({\mu ^{p}})$;
-
2. ${I_{p}}(f)\in {L^{2}}(P)$, and $\mathbb{E}\big[{I_{p}}(f){I_{q}}(g)\big]={\delta _{p,q}}\hspace{0.1667em}p!\hspace{0.1667em}{\langle \tilde{f},\tilde{g}\rangle _{2}}$, where ${\delta _{p,q}}$ denotes the Kronecker delta symbol.
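For $p=1$ the integral is explicit: ${I_{1}}(f)=\textstyle\int fd\hat{\eta }={\textstyle\sum _{n}}f({X_{n}})-\textstyle\int fd\mu $, and property 2 reduces to $\mathbb{E}[{I_{1}}{(f)^{2}}]=\| f{\| _{2}^{2}}$. The following Monte Carlo sketch checks this isometry for $\mu =\lambda \cdot \mathrm{Lebesgue}$ on $[0,1]$ and $f(x)=x$ (our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 30.0                    # control measure mu = lam * Lebesgue on [0, 1]
f = lambda x: x               # <f, f>_2 = lam * int_0^1 x^2 dx = lam / 3
comp_f = lam * 0.5            # compensator: int f dmu = lam * int_0^1 x dx

vals = np.empty(100_000)
for i in range(vals.size):
    pts = rng.uniform(size=rng.poisson(lam))  # support points of eta
    vals[i] = f(pts).sum() - comp_f           # I_1(f) = int f d(eta - mu)

# mean should be close to 0, second moment close to lam / 3 = 10
print(round(vals.mean(), 3), round((vals ** 2).mean(), 3))
```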
As in the Gaussian framework of Section 3.1, for $p\ge 0$ the Hilbert space consisting of all random variables ${I_{p}}(f)$, $f\in {L^{2}}({\mu ^{p}})$, is called the p-th Wiener chaos associated with η, and is customarily denoted by ${C_{p}}$. It is a crucial fact that every $F\in {L^{2}}(P)$ admits a unique representation
(5.7)
\[ F=\mathbb{E}[F]+{\sum \limits_{p=1}^{\infty }}{I_{p}}({f_{p}}),\]
where ${f_{p}}\in {L_{s}^{2}}({\mu ^{p}})$, $p\ge 1$, are suitable symmetric kernel functions, and the series converges in ${L^{2}}(P)$. Identity (5.7) is the analogue of relation (3.2), and is once again referred to as the chaotic decomposition of the functional $F\in {L^{2}}(P)$.
The multiple integrals discussed in this section also enjoy multiplicative properties similar to formula (3.5) above – see, e.g., [50, Proposition 5] for a precise statement. One consequence of such product formulae is that, if $F\in {C_{p}}$ and $G\in {C_{q}}$ are such that $FG$ is square-integrable, then
\[ FG\in {\bigoplus \limits_{r=0}^{p+q}}{C_{r}},\]
which can be seen as a property analogous to (4.3).
5.4 Malliavin operators
We now briefly discuss Malliavin operators on the Poisson space.
-
1. The domain $\mathrm{dom}\hspace{0.1667em}D$ of the Malliavin derivative operator D is the set of all $F\in {L^{2}}(P)$ such that the chaotic decomposition (5.7) of F satisfies ${\textstyle\sum _{p=1}^{\infty }}p\hspace{0.1667em}p!\| {f_{p}}{\| _{2}^{2}}<\infty $. For such an F, the random function $Z\ni z\mapsto {D_{z}}F\in {L^{2}}(P)$ is defined via
(5.9)
\[ {D_{z}}F={\sum \limits_{p=1}^{\infty }}p{I_{p-1}}\big({f_{p}}(z,\cdot )\big)\hspace{0.1667em},\] -
2. The domain $\mathrm{dom}\hspace{0.1667em}\mathbf{L}$ of the Ornstein–Uhlenbeck generator L is the set of those $F\in {L^{2}}(P)$ whose chaotic decomposition (5.7) verifies the condition ${\textstyle\sum _{p=1}^{\infty }}{p^{2}}\hspace{0.1667em}p!\| {f_{p}}{\| _{2}^{2}}<\infty $ (so that $\mathrm{dom}\hspace{0.1667em}\mathbf{L}\subset \mathrm{dom}\hspace{0.1667em}D$) and, for $F\in \mathrm{dom}\hspace{0.1667em}\mathbf{L}$, one defines
(5.11)
\[ \mathbf{L}F:=-{\sum \limits_{p=1}^{\infty }}p{I_{p}}({f_{p}}).\]
By definition, $\mathbb{E}[\mathbf{L}F]=0$; also, from (5.11) it is easy to see that L is symmetric, in the sense that
\[ \mathbb{E}\big[G\hspace{0.1667em}\mathbf{L}F\big]=\mathbb{E}\big[F\hspace{0.1667em}\mathbf{L}G\big]\]
for all $F,G\in \mathrm{dom}\hspace{0.1667em}\mathbf{L}$. Note that, from (5.11), it is immediate that the spectrum of $-\mathbf{L}$ is given by the nonnegative integers and that $F\in \mathrm{dom}\hspace{0.1667em}\mathbf{L}$ is an eigenfunction of $-\mathbf{L}$ with corresponding eigenvalue p if and only if $F={I_{p}}({f_{p}})$ for some ${f_{p}}\in {L_{s}^{2}}({\mu ^{p}})$, that is:
\[ \mathbf{L}F=-pF\hspace{1em}\Longleftrightarrow \hspace{1em}F={I_{p}}({f_{p}}).\]
The following identity corresponds to formula (65) in [50]: if $F\in \mathrm{dom}\hspace{0.1667em}\mathbf{L}$ is such that ${D^{+}}F\in {L^{1}}(P\otimes \mu )$, then
\[ \mathbf{L}F={\int _{Z}}{D_{z}^{+}}F\hspace{0.1667em}\mu (dz)-{\int _{Z}}{D_{z}^{-}}F\hspace{0.1667em}\eta (dz).\]
Define for any $F\in {L^{2}}(P)$ the pseudoinverse ${\mathbf{L}^{-1}}$ by
\[ {\mathbf{L}^{-1}}F:=-{\sum \limits_{p=1}^{\infty }}\frac{1}{p}{I_{p}}({f_{p}}).\]
Recall [50, Section 8] the covariance identity
(5.13)
\[ \operatorname{Cov}(F,G)=-\mathbb{E}\Big[{\int _{Z}}{D_{z}}F\hspace{0.1667em}{D_{z}}{\mathbf{L}^{-1}}G\hspace{0.1667em}\mu (dz)\Big].\]
-
3. For suitable random variables $F,G\in \mathrm{dom}\hspace{0.1667em}\mathbf{L}$ such that $FG\in \mathrm{dom}\hspace{0.1667em}\mathbf{L}$, we introduce the carré du champ operator Γ associated with L by
(5.14)
\[ \Gamma (F,G):=\frac{1}{2}\big(\mathbf{L}(FG)-F\mathbf{L}G-G\mathbf{L}F\big)\hspace{0.1667em}.\]
One then has the integration by parts relation
(5.15)
\[ \mathbb{E}\big[(\mathbf{L}F)G\big]=\mathbb{E}\big[F(\mathbf{L}G)\big]=-\mathbb{E}\big[\Gamma (F,G)\big].\]
The following result – proved in [37] – provides an explicit representation of the carré-du-champ operator Γ in terms of ${\Gamma _{0}}$, as introduced in (5.5).
Proposition 5.2.
For all $F,G\in \mathrm{dom}\hspace{0.1667em}\mathbf{L}$ such that $FG\in \mathrm{dom}\hspace{0.1667em}\mathbf{L}$ and
\[ {D^{+}}F\hspace{0.1667em}{D^{+}}G\in {L^{1}}(P\otimes \mu ),\]
we have that $DF={D^{+}}F$, $DG={D^{+}}G$, in such a way that $DF\hspace{0.1667em}DG={D^{+}}F\hspace{0.1667em}{D^{+}}G\in {L^{1}}(P\otimes \mu )$, and
\[ \Gamma (F,G)={\Gamma _{0}}(F,G),\]
where ${\Gamma _{0}}$ is defined in (5.5).
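Behind Proposition 5.2 lies the exact, pointwise discrete product rule ${D_{z}^{+}}(FG)=F{D_{z}^{+}}G+G{D_{z}^{+}}F+{D_{z}^{+}}F\hspace{0.1667em}{D_{z}^{+}}G$, which holds with no smoothness assumptions and which, combined with the action of L, leads to the identity $\Gamma ={\Gamma _{0}}$. The sketch below checks the product rule on random configurations, for two functionals of our own choosing (not taken from [37]):

```python
import numpy as np

# Two functionals of a finite point configuration chi on [0, 1]
F = lambda chi: np.sum(np.sin(chi))          # sum-type functional
G = lambda chi: float(len(chi)) ** 2         # nonlinear in the point count

def add_one(h, chi, z):
    """Add-one-cost D_z^+ h = h(chi + delta_z) - h(chi)."""
    return h(np.append(chi, z)) - h(chi)

rng = np.random.default_rng(0)
for _ in range(100):
    chi = rng.uniform(size=rng.poisson(10))  # a sample configuration
    z = rng.uniform()
    lhs = add_one(lambda c: F(c) * G(c), chi, z)
    rhs = (F(chi) * add_one(G, chi, z) + G(chi) * add_one(F, chi, z)
           + add_one(F, chi, z) * add_one(G, chi, z))
    assert abs(lhs - rhs) < 1e-9
print("discrete product rule verified on 100 random configurations")
```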
5.5 Fourth moment theorems
Starting at least from the reference [83] (where Malliavin calculus and Stein’s method were first combined on the Poisson space), establishing a fourth moment bound similar to Theorem 4.6 on the Poisson space remained an open problem for several years. As recalled above, the main difficulty in achieving such a result is the discrete nature of the add-one-cost and remove-one-cost operators, preventing in particular the triple $(\Omega ,P,\mathbf{L})$ from enjoying a diffusive property.
The next statement contains one of the main bounds proved in [38], and shows that a quantitative fourth moment bound is available on the Poisson space. Such a bound (which also has a multidimensional extension) is proved by a clever combination of Malliavin-type techniques with an infinitesimal version of the exchangeable pairs approach toward Stein’s method – see, e.g., [23].
One should notice that the first bound of this type was proved in [37] under slightly more restrictive assumptions; also, reference [37] contains analogous bounds in the Kolmogorov distance, that are not achievable by using exchangeable pairs. In particular, one of the key estimates used in [37] is the following remarkable equality and bound
\[ \frac{1}{2q}{\int _{Z}}\mathbb{E}\big[|{D_{z}^{+}}F{|^{4}}\big]\mu (dz)=\frac{3}{q}\mathbb{E}\big[{F^{2}}\Gamma (F,F)\big]-\mathbb{E}\big[{F^{4}}\big]\le \frac{4q-3}{2q}\Big(\mathbb{E}\big[{F^{4}}\big]-3\mathbb{E}{[{F^{2}}]^{2}}\Big),\]
which are valid for every $F\in {C_{q}}$, $q\ge 2$, such that the mapping $z\mapsto {D_{z}^{+}}F$ verifies some minimal integrability conditions.
5.6 Second-order Poincaré estimates
What one calls second-order Poincaré inequalities is a collection of analytic estimates (first established on the Poisson space in [55]) where the Wasserstein and Kolmogorov distances, between a given function of η and a Gaussian random variable, are bounded by integrated moments of iterated add-one-cost operators on the Poisson space. The rationale behind such a name is the following. Just as the Poincaré inequality
(5.17)
\[ \operatorname{Var}[F]\le \mathbb{E}\Big[{\int _{Z}}{({D_{z}^{+}}F)^{2}}\hspace{0.1667em}\mu (dz)\Big]\]
controls the variance of a random variable F by means of integrated moments of the add-one-cost (see [51, Section 18.3]), the integrated moments of the second-order add-one-cost ${D_{x,y}^{2}}F:={D_{x}^{+}}{D_{y}^{+}}F$ control the discrepancy between the distribution of F and that of a Gaussian random variable – a phenomenon already observed in the Gaussian setting [21, 70, 96], where gradients typically replace add-one-cost operators.
For the rest of the section, we exclusively consider square-integrable random variables F such that $F\in \mathrm{dom}\hspace{0.1667em}D$, in such a way that ${D^{+}}F=DF$ (up to negligible sets). The starting point for proving second-order Poincaré estimates is the covariance identity (5.13), which can be proved as in the Gaussian setting by means of chaos expansions. When one combines Stein’s method with such a formula, it is however not possible to deduce the existence of a Stein kernel as in the Gaussian setting (see (3.12)), since Malliavin operators on the Poisson space do not enjoy an exact chain rule such as (3.7). Indeed, we have that, for a sufficiently smooth mapping $f:\mathbb{R}\to \mathbb{R}$,
\[\begin{aligned}{}\operatorname{Cov}(F,f(F))& =-\int \mathbb{E}[{D_{z}}(f(F)){D_{z}}{\mathbf{L}^{-1}}F]\mu (dz)\\ {} & =:-\int \mathbb{E}[{f^{\prime }}(F){D_{z}}F{D_{z}}{\mathbf{L}^{-1}}F]\mu (dz)+R\end{aligned}\]
where we approximate ${D_{z}}(f(F))=f(F+{D_{z}}F)-f(F)$ by ${f^{\prime }}(F){D_{z}}F$, with the error term
\[ f(F+{D_{z}}F)-f(F)-{f^{\prime }}(F){D_{z}}F\]
appearing in the implicit definition of R; notice that, in general, $R\ne 0$, so that the previous computations do not yield the existence of a Stein kernel. Selecting f as in Lemma 2.1-(d), one can bound the error term in the aforementioned calculation by $|{D_{z}}F{|^{2}}$. Therefore, for F such that $\mathbb{E}[F]=0$ and $\operatorname{Var}[F]=1$, one has the bound
\[ {d_{W}}(F,N)\le \sqrt{\operatorname{Var}\Big[\int {D_{z}}F\hspace{0.1667em}{D_{z}}{\mathbf{L}^{-1}}F\hspace{0.1667em}\mu (dz)\Big]}+\int \mathbb{E}[|{D_{z}}F{|^{2}}|{D_{z}}{\mathbf{L}^{-1}}F|]\mu (dz).\]
Applying the Poincaré inequality (5.17) to the variance term, as well as the contraction bound [55, Lemma 3.4] for the add-one-cost
and analogous estimates for the iterated add-one-cost, leads to the following theorem.
Theorem 5.4 (Second-order Poincaré estimates [55]).
Let $F\in \mathrm{dom}\hspace{0.1667em}D$ be such that $\mathbb{E}[F]=0$ and $\operatorname{Var}[F]=1$, and let N be a standard Gaussian random variable. Then,
\[ {d_{W}}(F,N)\le {\gamma _{1}}+{\gamma _{2}}+{\gamma _{3}},\]
where
\[\begin{aligned}{}{\gamma _{1}}& :=2{\Big[\iiint \mathbb{E}{[{({D_{x}}F{D_{y}}F)^{2}}]^{1/2}}\mathbb{E}{[{({D_{x,z}^{2}}F{D_{y,z}^{2}}F)^{2}}]^{1/2}}{\mu ^{3}}(dxdydz)\Big]^{1/2}},\\ {} {\gamma _{2}}& :={\Big[\iiint \mathbb{E}[{({D_{x,z}^{2}}F{D_{y,z}^{2}}F)^{2}}]{\mu ^{3}}(dxdydz)\Big]^{1/2}},\\ {} {\gamma _{3}}& :=\int \mathbb{E}[|{D_{x}}F{|^{3}}]\mu (dx).\end{aligned}\]
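As a sanity check on the three quantities above, take $F=(\eta (B)-\mu (B))/\sqrt{\mu (B)}$: then ${D_{x}}F=\mu {(B)^{-1/2}}{\mathbf{1}_{B}}(x)$ and all second-order differences vanish, so ${\gamma _{1}}={\gamma _{2}}=0$ and ${\gamma _{3}}=\mu {(B)^{-1/2}}$, recovering the classical rate $1/\sqrt{\lambda }$ for a normalized Poisson variable with mean $\lambda =\mu (B)$. A quick numerical sketch (the sample sizes are our own illustrative choices) compares an empirical 1-Wasserstein distance with this bound:

```python
import numpy as np
from statistics import NormalDist

def empirical_dW_to_gaussian(samples):
    """Estimate d_W(law of samples, N(0,1)) by coupling sorted samples
    with equally spaced standard normal quantiles."""
    s = np.sort(samples)
    n = s.size
    nd = NormalDist()
    q = np.array([nd.inv_cdf((i + 0.5) / n) for i in range(n)])
    return float(np.abs(s - q).mean())

rng = np.random.default_rng(0)
lam = 100.0                                  # lam = mu(B)
F = (rng.poisson(lam, size=50_000) - lam) / np.sqrt(lam)
dw = empirical_dW_to_gaussian(F)
print(dw, 1.0 / np.sqrt(lam))                # empirical distance vs gamma_3 = 0.1
```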
As mentioned above, second-order Poincaré techniques are equally useful for obtaining bounds in the Kolmogorov distance – see [55], as well as [90] for a powerful extension to the framework of multivariate normal approximations.
An example of a successful application of second-order Poincaré estimates from [55] (to which we refer the reader for a discussion of the associated literature) is the derivation of presumably optimal Berry–Esseen bounds for the total edge length of the Poisson-based nearest neighbor graph. More precisely, let ${\eta _{t}}$ be a Poisson point process with intensity $t>0$ on a convex compact set $H\subset {\mathbb{R}^{d}}$. We consider the graph with vertex set $\operatorname{supp}{\eta _{t}}$ and with an edge $\{x,y\}\subset \operatorname{supp}{\eta _{t}}$ whenever x is the nearest neighbor of y or vice versa. Denote by ${L_{t}}$ the total edge length of the graph so obtained. Then we have
\[ {d_{W}}\Big(\frac{{L_{t}}-\mathbb{E}[{L_{t}}]}{\sqrt{\operatorname{Var}[{L_{t}}]}},N\Big)\le \frac{C}{\sqrt{t}},\]
where C depends only on H. We refer the reader to [55, Theorem 7.1] for a far more general statement, and to [49] for a collection of presumably optimal bounds on the normal approximation of exponentially stabilizing random variables (see the next subsection).
5.7 Stabilization theory and two-scale bounds
While second-order Poincaré estimates can provide sharp Berry–Esseen bounds, they are not always applicable. This is the case, for instance, for certain combinatorial optimization statistics or connectivity functionals of the underlying Poisson process. The problem is typically that the iterated add-one-costs of the functionals, although well-defined almost surely, are not computationally tractable, e.g., for obtaining moment estimates.
In this section, we present an alternative collection of analytic inequalities, called the two-scale stabilization bounds, which avoid the use of iterated add-one-cost – they are one of the main findings from [48]; see also [22] for several related estimates obtained by a discretization procedure. As their name suggests, these bounds are closely related to the stabilization theory of Penrose and Yukich [87, 86]. Such a theory originated from the ground-breaking central limit theorem of Kesten and Lee [46] for the total edge weight ${M_{n}}$ of Euclidean minimal spanning trees (MST) with stationary Poisson points ${\eta _{n}}$ in a ball of radius $n\in \mathbb{N}$. Recall that the MST is the connected graph over the vertex set ${\eta _{n}}$ that minimizes its total length. Without referring to the stochastic analysis on the Poisson space, Kesten and Lee already performed a fine study of the add-one-cost of ${M_{n}}$ (and not of the iterated add-one-cost) implying some moment estimates of ${D_{x}}{M_{n}}$. Penrose and Yukich [87] extrapolated the high level ideas from [46] and transformed them into a general theory applicable to (nonquantitative) central limit theorems for a plethora of problems in stochastic geometry. The theory was further extended to multivariate normal approximation by Penrose [86]. A variant of the theory using score functionals was put forward by Baryshnikov and Yukich [13].
We now properly define the notions of strong and weak stabilization. We assume for concreteness that the ambient space is ${\mathbb{R}^{d}}$ and that η is a Poisson process of unit intensity. A Poisson functional $F=F(\eta )$ is strongly stabilizing if there exists an almost surely finite random variable R, called the stabilization radius, such that
\[ {D_{0}}F(\eta )={D_{0}}F(\eta {|_{{B_{R}}}}),\]
where ${B_{R}}$ stands for the ball with radius R centered at the origin. Here is a simple example. Fix $r>0$ and connect by an edge any two points of ${\eta _{n}}:=\eta {|_{{B_{n}}}}$ within distance r. The graph $G({\eta _{n}},r)$ so obtained is known as the Gilbert graph or the random geometric graph. Then, the number $F({\eta _{n}})$ of edges within a finite window containing the origin has stabilization radius $R=r$ almost surely, since ${D_{0}}F(\eta )$ is the number of edges incident to the origin in $G(\eta +{\delta _{0}},r)$. Proving strong stabilization often relies on combinatorial and geometric arguments; see [87] for a list of examples. In general situations, R is genuinely random, in contrast to the simple example given above.
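The stabilization claim for the Gilbert graph can be checked directly: ${D_{0}}F$ equals the number of points within distance r of the origin, and is unchanged when the configuration is restricted to ${B_{r}}$. A minimal sketch (all parameters are illustrative choices of ours):

```python
import numpy as np

def edge_count(pts, r):
    """Number of edges of the Gilbert graph G(pts, r)."""
    if len(pts) < 2:
        return 0
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return int((d[np.triu_indices(len(pts), k=1)] <= r).sum())

rng = np.random.default_rng(0)
r = 0.3
for _ in range(50):
    eta = rng.uniform(-2, 2, size=(rng.poisson(40), 2))   # points in a box
    origin = np.zeros((1, 2))
    d0 = edge_count(np.vstack([eta, origin]), r) - edge_count(eta, r)
    # restriction to B_r: the add-one-cost only sees points within distance r
    eta_r = eta[np.linalg.norm(eta, axis=1) <= r]
    d0_local = edge_count(np.vstack([eta_r, origin]), r) - edge_count(eta_r, r)
    assert d0 == d0_local == int((np.linalg.norm(eta, axis=1) <= r).sum())
print("stabilization radius R = r verified on 50 configurations")
```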
To obtain central limit theorems, it actually suffices to show a weaker version of stabilization. We say that F is weakly stabilizing if, for any sequence of measurable sets ${E_{n}}$ satisfying $\liminf {E_{n}}={\mathbb{R}^{d}}$, we have the almost sure convergence
\[ {D_{0}}F(\eta {|_{{E_{n}}}})\longrightarrow \Delta ,\]
where Δ is a random variable. It is clear that a strongly stabilizing functional is also weakly stabilizing with $\Delta ={D_{0}}F(\eta )$.
Theorem 5.5 ( See [87, Theorem 3.1]).
Suppose that F is weakly stabilizing and satisfies the moment condition
\[ \underset{A}{\sup }\hspace{0.1667em}\mathbb{E}\big[{D_{0}}F{(\eta {|_{A}})^{4}}\big]<\infty ,\]
where the supremum is taken over all balls A that contain 0. Then there exists ${\sigma ^{2}}\ge 0$ such that $|{B_{n}}{|^{-1}}\operatorname{Var}[F({\eta _{n}})]\to {\sigma ^{2}}$ and
\[ \frac{F({\eta _{n}})-\mathbb{E}[F({\eta _{n}})]}{\sqrt{|{B_{n}}|}}\hspace{0.2778em}\Longrightarrow \hspace{0.2778em}\mathcal{N}(0,{\sigma ^{2}}),\]
as $n\to \infty $, where ${\eta _{n}}:=\eta {|_{{B_{n}}}}$.
It is remarkable how few assumptions one needs in order to obtain a CLT. Notice that the limiting variance ${\sigma ^{2}}$ could be 0. In [87], it was shown that ${\sigma ^{2}}>0$ whenever Δ is not a constant. Theorem 5.5 was proved by a martingale method and offers no insight into how fast the normalized sequence converges to the normal distribution. The latter question was addressed in a recent preprint by Lachièze-Rey, Peccati and Yang [48]. Under slightly strengthened conditions on the functionals, they assessed the rate of normal approximation in Theorem 5.5. To state one of the bounds that can be deduced from [48], we consider again the ball ${B_{n}}$ of radius n centered at the origin, and introduce the key quantity
\[ {\psi _{n}}={\psi _{n}}({A_{n,\cdot }}):=\underset{x\in {B_{n}}}{\sup }\mathbb{E}[|{D_{x}}F(\eta {|_{{B_{n}}}})-{D_{x}}F(\eta {|_{{A_{n,x}}}})|],\hspace{1em}n\ge 1,\]
where ${A_{n,x}}$ is any measurable set indexed by n and x. In practice, we take ${A_{n,x}}={B_{{b_{n}}}}(x)=\{y:|x-y|\le {b_{n}}\}$ with $1\ll {b_{n}}\ll n$, a local window around x compared to the scale of ${B_{n}}$. In what follows, we adopt this choice and, in view of this interpretation, call ${\psi _{n}}$ a two-scale discrepancy. The following result, taken from [48], can be applied in many concrete problems in stochastic geometry.
Theorem 5.6 ([48, Corollary 1.3]).
Let ${\hat{F}_{n}}=\operatorname{Var}{[F({\eta _{n}})]^{-1/2}}(F({\eta _{n}})-\mathbb{E}[F({\eta _{n}})])$ with ${\eta _{n}}=\eta {|_{{B_{n}}}}$ as before. Suppose that
\[ \underset{n\in \mathbb{N},x\in {B_{n}}}{\sup }\mathbb{E}[|{D_{x}}F({\eta _{n}}){|^{p}}]<\infty \]
for some $p>4$ and also that there exists an absolute constant $b>0$ such that $\operatorname{Var}[F({\eta _{n}})]\ge b|{B_{n}}|$. Then there exists a finite positive constant c such that
This theorem simplifies and extends some arguments in the proof of a quantitative CLT for minimal spanning trees by Chatterjee and Sen [22]. Analogous Kolmogorov bounds for univariate normal approximation, as well as bounds for multivariate normal approximation, are also considered in [48]. Some further remarks are in order.
Remark 5.7.
-
i) The sequence $({b_{n}})$ serves as a free parameter in the bound. One should keep track of the dependence of ${\psi _{n}}$ on ${b_{n}}$ and optimize over ${b_{n}}$ at the end.
-
ii) For any fixed $x\in {\mathbb{R}^{d}}$, applying the weak stabilization condition for F with the two sequences $({B_{n}})$ and $({B_{{b_{n}}}}(x))$ (together with the translation invariance of η and the moment assumption for the add-one cost) yields the convergence ${\psi _{n}}\to 0$ as $n\to \infty $. As such, Theorem 5.6 quantifies Theorem 5.5 after uniformly strengthening its assumptions.
-
iii) When the functional is strongly stabilizing, this bound takes an even simpler form. More precisely, we say that ${R_{x}}$ is a stabilization radius at x if ${D_{x}}F(\eta {|_{A}})={D_{x}}F(\eta {|_{{B_{{R_{x}}}}(x)}})$ for every measurable set $A\supseteq {B_{{R_{x}}}}(x)$. Then, applying Hölder’s inequality and the uniform moment condition for the add-one cost leads to the existence of a finite positive constant c such that\[ {\psi _{n}}\le c\underset{x\in {B_{n}}}{\sup }\mathbb{P}{[{R_{x}}\ge {b_{n}}]^{1-\frac{1}{p}}}.\]Hence, the upper tail of ${R_{x}}$ is relevant to the rate of normal approximation. One may further classify the stabilization condition according to the decay of the upper tail. For instance, we say that the functional F is exponentially stabilizing if ${R_{x}}$ has a sub-exponential upper tail.
-
iv) There are some general methods for obtaining lower bounds on the variance. For example, one can partition the space into nonoverlapping cubes of appropriate size and then use projection methods for functions of independent random variables, such as the Hoeffding decomposition. Another method, via chaos expansion, was given in [55, Section 5].
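To illustrate points ii) and iii) in the simplest case, consider again the edge-count functional of the Gilbert graph: there the stabilization radius is deterministic, ${R_{x}}=r$, so the two-scale discrepancy vanishes as soon as ${b_{n}}\ge r$. A minimal sketch of this effect (Python/NumPy assumed; window size, intensity and radii are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
r, n, b_n = 1.0, 10.0, 3.0   # connection radius, window size, local scale (b_n >= r)

m = rng.poisson((2 * n) ** 2)                # Poisson number of points, unit intensity
eta_n = rng.uniform(-n, n, size=(m, 2))      # eta restricted to the window

def add_one_cost(points, x, radius):
    """D_x F for the edge-count functional: number of edges incident to x."""
    return int((np.sqrt(((points - x) ** 2).sum(-1)) <= radius).sum())

x = np.array([0.5, -0.5])
local = eta_n[np.sqrt(((eta_n - x) ** 2).sum(-1)) <= b_n]   # eta on B_{b_n}(x)

# D_x F only sees points within distance r of x, so for b_n >= r the two add-one
# costs coincide and the (empirical) two-scale discrepancy is exactly zero.
discrepancy = abs(add_one_cost(eta_n, x, r) - add_one_cost(local, x, r))
assert discrepancy == 0
```

For genuinely random ${R_{x}}$, the same quantity would be positive with small probability, with ${\psi _{n}}$ controlled by the tail bound of point iii).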
We mention one application where the second order Poincaré estimates do not apply but the two-scale stabilization bounds do. Fix $r>0$ and consider the number ${K_{n}}$ of components of the Gilbert graph $G({\eta _{n}},2r)$ (or, equivalently, of the Boolean model ${O_{r,n}}={\cup _{x\in {\eta _{n}}}}B(x,r)$) as $n\to \infty $. This corresponds to the so-called thermodynamic regime, in which the family of random sets ${O_{r}}={\cup _{x\in \eta }}B(x,r)$ (the unbounded analogue of ${O_{r,n}}$), indexed by r, exhibits a phase transition at some ${r^{\ast }}\in (0,\infty )$ defined as
\[ {r^{\ast }}=\inf \{r:\mathbb{P}[0\hspace{2.5pt}\text{is connected to infinity in}\hspace{2.5pt}{O_{r}}]>0\}.\]
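For concreteness, the component count on a finite configuration can be computed with a simple union-find pass over all pairs within the connection radius. A minimal sketch (NumPy assumed; the point configuration and radius are illustrative):

```python
import numpy as np

def components(points, radius):
    """Number of connected components of the Gilbert graph G(points, radius),
    via union-find over all pairs within the connection radius."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if np.linalg.norm(points[i] - points[j]) <= radius:
                parent[find(i)] = find(j)   # merge the two clusters

    return len({find(i) for i in range(len(points))})

pts = np.array([[0.0, 0.0], [0.6, 0.0], [1.1, 0.2], [5.0, 5.0]])
print(components(pts, 0.8))   # -> 2: one cluster of three points, one isolated point
```

Applied to a Poisson sample ${\eta _{n}}$ with radius $2r$, `components` returns the quantity ${K_{n}}$ discussed here.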
We stress that the analysis of ${K_{n}}$ is relatively involved in the critical phase due to the co-existence of the unbounded occupied component and the unbounded vacant component (in ${O_{r}^{c}}$). However, the following estimate was obtained in [48] for all $r>0$ in dimension 2 using the strong stabilization bound:
\[ {d_{W}}(({K_{n}}-\mathbb{E}[{K_{n}}])/\sqrt{\operatorname{Var}[{K_{n}}]},N(0,1))\le \frac{C}{{n^{\beta }}},\]
where C and β are finite positive constants. In $d\ge 3$, a polylogarithmic rate was obtained. The bottleneck in these estimates is the two-arm exponents of the critical Boolean models, which are hard to improve.
6 Malliavin–Stein method for targets in the second Wiener chaos
In this section, we present a short overview of some recent developments of the Malliavin–Stein approach for target distributions in the second Gaussian Wiener chaos. We also formulate some complementary conjectures. We adopt the same notation as in Section 3.1 above. Let W stand for an isonormal Gaussian process over a separable Hilbert space $\mathfrak{H}$. Recall that the elements of the second Wiener chaos are random variables of the general form $F={I_{2}}(f)$, with $f\in {\mathfrak{H}^{\odot 2}}$. With any kernel $f\in {\mathfrak{H}^{\odot 2}}$, we associate the Hilbert–Schmidt operator
\[ {A_{f}}:\mathfrak{H}\to \mathfrak{H},\hspace{1em}g\mapsto f{\otimes _{1}}g.\]
We also write ${\{{\alpha _{f,k}}\}_{k\ge 1}}$ and ${\{{e_{f,k}}\}_{k\ge 1}}$, respectively, to indicate the (not necessarily distinct) eigenvalues of ${A_{f}}$ and the corresponding eigenvectors. The next proposition gathers together some relevant properties of the elements of the second Wiener chaos associated with W.
Proposition 6.1 (See Section 2.7.4 in [66]).
Let $F={I_{2}}(f)$, $f\in {\mathfrak{H}^{\odot 2}}$, be a generic element of the second Wiener chaos of W.
-
1. The following equality holds: $F={\textstyle\sum _{k\ge 1}}{\alpha _{f,k}}\big({N_{k}^{2}}-1\big)$, where ${\{{N_{k}}\}_{k\ge 1}}$ is a sequence of i.i.d. $\mathcal{N}(0,1)$ random variables that are elements of the isonormal process W, and the series converges in ${L^{2}}(\Omega )$ and almost surely.
-
2. For every $r\ge 2$, the r-th cumulant of F is given by ${\kappa _{r}}(F)={2^{r-1}}(r-1)!{\textstyle\sum _{k\ge 1}}{\alpha _{f,k}^{r}}$.
-
3. The law of the random variable F is determined by its moments, or equivalently, by its cumulants.
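The spectral representation in Proposition 6.1 can be visualized in a finite-dimensional model: for a symmetric matrix f, the quadratic functional $F={Z^{T}}fZ-\operatorname{tr}(f)$ with Z standard Gaussian plays the role of ${I_{2}}(f)$, and the eigenvalues of f play the role of the ${\alpha _{f,k}}$. A quick numerical check of this pointwise identity (NumPy assumed; the matrix f is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)

# A symmetric "kernel" f and its spectral decomposition f = Q diag(alphas) Q^T.
f = np.array([[0.3, 0.2, 0.0],
              [0.2, -0.1, 0.4],
              [0.0, 0.4, 0.2]])
alphas, Q = np.linalg.eigh(f)          # eigenvalues and orthonormal eigenvectors

Z = rng.standard_normal(3)
W = Q.T @ Z                            # again an i.i.d. standard Gaussian vector

F_quadratic = Z @ f @ Z - np.trace(f)              # quadratic-form expression
F_spectral = np.sum(alphas * (W**2 - 1))           # representation of item 1
assert np.isclose(F_quadratic, F_spectral)
```

The assertion holds sample by sample, since ${Z^{T}}fZ={W^{T}}\operatorname{diag}({\alpha _{k}})W$ under the orthogonal change of variables $W={Q^{T}}Z$.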
For the rest of the section, to avoid unnecessary complications, we consider target distributions in the second Wiener chaos of the form
(6.1)
\[ {F_{\infty }}={\sum \limits_{i=1}^{d}}{\alpha _{\infty ,i}}\big({N_{i}^{2}}-1\big),\]
where the ${N_{i}}\sim \mathcal{N}(0,1)$ are i.i.d., the coefficients $({\alpha _{\infty ,i}}:i=1,\dots ,d)$ are distinct, and ${\alpha _{\infty ,i}}=0$ for $i\ge d+1$. We also work under the normalization assumption $\mathbb{E}[{F_{\infty }^{2}}]=1$. We highlight the following particular cases: (i) ${\alpha _{\infty ,i}}=1$ for $i=1,\dots ,d$, for which the target random variable ${F_{\infty }}$ reduces to a centered chi-squared distribution with d degrees of freedom (here, the Malliavin–Stein method has been successfully implemented in a series of papers [63, 36, 64, 71, 6]); (ii) $d=2$ and ${\alpha _{\infty ,1}}\times {\alpha _{\infty ,2}}<0$, in which case the target random variable ${F_{\infty }}$ belongs to the so-called Variance–Gamma class of probability distributions. We refer to [41–43, 39, 7] for developments of the Stein and Malliavin–Stein methods for the Variance–Gamma distributions.
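The equality in law in case (ii) rests on a pointwise rotation identity: with $U=({N_{1}}+{N_{2}})/\sqrt{2}$ and $V=({N_{1}}-{N_{2}})/\sqrt{2}$ (again i.i.d. standard Gaussian), one has ${N_{1}}{N_{2}}=\frac{1}{2}({U^{2}}-1)-\frac{1}{2}({V^{2}}-1)$. A quick numerical check of this identity (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
n1, n2 = rng.standard_normal((2, 10_000))

# Rotate to another i.i.d. standard Gaussian pair (u, v).
u, v = (n1 + n2) / np.sqrt(2), (n1 - n2) / np.sqrt(2)

# Second-chaos form with d = 2 and alpha = (1/2, -1/2) versus the normal product.
f_chaos = 0.5 * (u**2 - 1) - 0.5 * (v**2 - 1)
assert np.allclose(f_chaos, n1 * n2)   # pointwise identity, hence equality in law
```

Since the identity holds sample by sample and $(U,V)$ has the same law as $({N_{1}},{N_{2}})$, the normal product indeed lies in the second Wiener chaos with coefficients $\pm 1/2$.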
To any target distribution ${F_{\infty }}$ of the form (6.1), we attach the polynomials
(6.2)
\[ Q(x)={\big(P(x)\big)^{2}}:={\Big(x{\prod \limits_{i=1}^{d}}(x-{\alpha _{\infty ,i}})\Big)^{2}}.\]
It turns out that the polynomials P and Q play a major role in quantitative limit theorems in this setup. The next result provides a (suitable) Stein operator for target distributions ${F_{\infty }}$ in the second Wiener chaos. We also mention that the stability phenomenon for weak convergence of sequences in the second Wiener chaos is studied in [69] using tools from complex analysis.
Theorem 6.2 (Stein characterization [3]).
Let ${F_{\infty }}$ be an element of the second Wiener chaos of the form (6.1). Assume that F is a generic centered random variable living in a finite sum of Wiener chaoses (hence smooth in the sense of Malliavin calculus). Then, $F={F_{\infty }}$ (equality in distribution) if and only if $\mathbb{E}\left[{\mathcal{A}_{\infty }}f(F)\right]=0$ for all functions $f:\mathbb{R}\to \mathbb{R}$ such that ${\mathcal{A}_{\infty }}f(F)\in {L^{1}}(\Omega )$, where the differential operator ${\mathcal{A}_{\infty }}$ is given by
(6.3)
\[ {\mathcal{A}_{\infty }}f(x):={\sum \limits_{l=2}^{d+1}}({b_{l}}-{a_{l-1}}x){f^{(d+2-l)}}(x)-{a_{d+1}}xf(x),\]
with coefficients given by relation (6.4), and with the polynomials P and Q given by relation (6.2).
The next conjecture puts forward a non-Gaussian counterpart to Stein’s Lemma 2.1.
Conjecture 6.3 (Stein Universality Lemma).
Let $\mathcal{H}$ denote an appropriate separating (see [66, Definition C.1.1]) class of test functions. For every given test function $h\in \mathcal{H}$, consider the associated Stein equation
(6.6)
\[ {\mathcal{A}_{\infty }}{f_{h}}(x)=h(x)-\mathbb{E}[h({F_{\infty }})].\]
Then, equation (6.6) admits a bounded, d times differentiable solution ${f_{h}}$ such that $\| {f_{h}^{(r)}}{\| _{\infty }}<+\infty $ for all $r=1,\dots ,d$, with bounds independent of the given test function h.
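For the chi-squared case (i), a classical Stein operator for the (unnormalized) centered chi-squared law with ν degrees of freedom is $\mathcal{A}f(x)=2(x+\nu ){f^{\prime }}(x)-xf(x)$, of the same first-order form as (6.3) with $d=1$; the characterizing identity $\mathbb{E}[\mathcal{A}f({F_{\infty }})]=0$ can be probed by Monte Carlo. A sketch (the test function $f(x)=x$, the value of ν and the sample size are illustrative choices, not from the source):

```python
import numpy as np

rng = np.random.default_rng(3)
nu = 3                                    # degrees of freedom of the target
F = rng.chisquare(nu, size=200_000) - nu  # centered chi-squared sample

# Centered-Gamma/chi-squared Stein operator: A f(x) = 2 (x + nu) f'(x) - x f(x).
# With f(x) = x, one has E[A f(F)] = 2 nu + 2 E[F] - E[F^2] = 2 nu - Var[F] = 0.
stat = 2.0 * (F + nu) - F * F
print(float(stat.mean()))                 # Monte Carlo average, close to 0
```

Replacing f by other smooth test functions probes the characterization further, in the spirit of the separating class $\mathcal{H}$ above.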
The rest of the section is devoted to several quantitative estimates involving target distributions in the second Wiener chaos. The first estimate is stated in terms of the 2-Wasserstein transport distance ${\operatorname{W}_{2}}$ (see Section 4.3 for definition). See also [47] for several related results of a quantitative nature.
Theorem 6.4 ([2]).
Let $({F_{n}}:n\ge 1)$ be a sequence of random variables belonging to the second Wiener chaos associated with the isonormal process W, so that $\mathbb{E}[{F_{n}^{2}}]=1$ for all $n\ge 1$. Assume that the target random variable ${F_{\infty }}$ takes the form (6.1). Define
(6.7)
\[ \Delta ({F_{n}}):={\sum \limits_{r=2}^{\textit{deg}(Q)}}\frac{{Q^{(r)}}(0)}{r!}\frac{{\kappa _{r}}({F_{n}})}{(r-1)!{2^{r-1}}},\]
where the polynomial Q is given by (6.2). Then, there exists a constant $C>0$ (possibly depending only on the target random variable ${F_{\infty }}$ but independent of n) such that
(6.8)
\[ {\operatorname{W}_{2}}({F_{n}},{F_{\infty }})\le C\sqrt{\Delta ({F_{n}})}.\]
Example 6.5.
Consider the target random variable ${F_{\infty }}$ of the form (6.1) with $d=2$ and ${\alpha _{\infty ,1}}=-{\alpha _{\infty ,2}}=1/2$. Hence, ${F_{\infty }}$ $(={N_{1}}\times {N_{2}}$, where ${N_{1}},{N_{2}}\sim \mathcal{N}(0,1)$ are independent and the equality holds in law) belongs to the class of Variance–Gamma distributions $V{G_{c}}(r,\theta ,\sigma )$ with parameters $r=\sigma =1$ and $\theta =0$. Then, [39, Corollary 5.10, part (a)] reads
(6.9)
\[ {d_{W}}({F_{n}},{F_{\infty }})\le C\hspace{0.1667em}\sqrt{\Delta ({F_{n}})+1/4\hspace{0.1667em}{\kappa _{3}^{2}}({F_{n}})},\]
which is in line with the estimate (6.8); note that ${\kappa _{3}}({F_{\infty }})=0$.
The next result provides a quantitative bound in the Kolmogorov distance. The proof relies on the classical Berry–Esseen estimate, which bounds the distance in terms of the difference of the characteristic functions. We recall that for two real-valued random variables X and Y the Kolmogorov distance is defined as
\[ {d_{\textit{Kol}}}(X,Y):=\underset{x\in \mathbb{R}}{\sup }|\mathbb{P}[X\le x]-\mathbb{P}[Y\le x]|.\]
Theorem 6.6 ([4]).
Let the target random variable ${F_{\infty }}$ in the second Wiener chaos be of the form (6.1). Assume that $({F_{n}}:n\ge 1)$ is a sequence of centered random elements living in a finite sum of Wiener chaoses. Then, there exists a constant C (possibly depending on the sequence $({F_{n}})$, but not on n) such that
(6.10)
\[ \begin{aligned}{}& {d_{\textit{Kol}}}({F_{n}},{F_{\infty }})\\ {} & \le C\sqrt{\mathbb{E}\left[\Big|{\sum \limits_{r=1}^{d+1}}{a_{r}}\left({\Gamma _{r-1}}({F_{n}})-\mathbb{E}[{\Gamma _{r-1}}({F_{n}})]\right)\Big|\right]+{\sum \limits_{r=2}^{d+1}}|{\kappa _{r}}({F_{n}})-{\kappa _{r}}({F_{\infty }})|}\\ {} & \le C\sqrt{\sqrt{\operatorname{Var}\left({\sum \limits_{r=1}^{d+1}}{a_{r}}{\Gamma _{r-1}}({F_{n}})\right)}+{\sum \limits_{r=2}^{d+1}}|{\kappa _{r}}({F_{n}})-{\kappa _{r}}({F_{\infty }})|}\end{aligned}\]
where the coefficients $({a_{r}}:r=1,\dots ,d+1)$ are given by relation (6.4). In the particular case when the sequence $({F_{n}}:n\ge 1)$ belongs to the second Wiener chaos, it holds that
\[ \operatorname{Var}\left({\sum \limits_{r=1}^{d+1}}{a_{r}}{\Gamma _{r-1}}({F_{n}})\right)=\Delta ({F_{n}}),\]
where the quantity $\Delta ({F_{n}})$ is as in Theorem 6.4, and the estimate (6.10) takes the form (compare with the estimate (6.8))
\[ {d_{\textit{Kol}}}({F_{n}},{F_{\infty }})\le C\sqrt{\sqrt{\Delta ({F_{n}})}+{\sum \limits_{r=2}^{d+1}}|{\kappa _{r}}({F_{n}})-{\kappa _{r}}({F_{\infty }})|}.\]
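As a sanity check on the role of $\Delta (\cdot )$, one can verify symbolically that $\Delta ({F_{\infty }})=0$ for the normal-product target of Example 6.5, using the standard second-chaos cumulant formula ${\kappa _{r}}={2^{r-1}}(r-1)!{\textstyle\sum _{k}}{\alpha _{k}^{r}}$. A standard-library Python sketch in exact rational arithmetic:

```python
from fractions import Fraction as Fr
from math import factorial

# Target: F_inf = (1/2)(N1^2 - 1) - (1/2)(N2^2 - 1), the normal-product law.
alphas = [Fr(1, 2), Fr(-1, 2)]

# P(x) = x (x - 1/2)(x + 1/2) = x^3 - x/4; Q = P^2 via polynomial convolution.
P = [Fr(0), Fr(-1, 4), Fr(0), Fr(1)]           # coefficients of P, degree 3
Q = [Fr(0)] * (2 * len(P) - 1)
for i, a in enumerate(P):
    for j, b in enumerate(P):
        Q[i + j] += a * b

def kappa(r):
    """Second-chaos cumulants: kappa_r = 2^(r-1) (r-1)! sum_k alpha_k^r."""
    return Fr(2 ** (r - 1) * factorial(r - 1)) * sum(a ** r for a in alphas)

# Delta = sum_{r=2}^{deg Q} [Q^{(r)}(0)/r!] * kappa_r / ((r-1)! 2^{r-1});
# note that Q^{(r)}(0)/r! is simply the coefficient of x^r in Q.
delta = sum(Q[r] * kappa(r) / (factorial(r - 1) * 2 ** (r - 1))
            for r in range(2, len(Q)))

print(kappa(2), kappa(3), delta)   # -> 1 0 0
```

The output confirms the normalization ${\kappa _{2}}({F_{\infty }})=1$, the vanishing third cumulant noted in Example 6.5, and $\Delta ({F_{\infty }})=0$, consistent with the bounds (6.7)-(6.10) vanishing at the target.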
We end the section with a conjecture concerning the control of the iterated Gamma operators of Malliavin calculus appearing on the RHS of the estimate (6.10) by means of finitely many cumulants. Lastly, we point out that the forthcoming estimate (6.12) should be compared with the well-known estimate $\operatorname{Var}({\Gamma _{1}}(F))\le C\hspace{0.1667em}{\kappa _{4}}(F)$ in the normal approximation setting, when F is a chaotic random variable.
Conjecture 6.7.
Let ${F_{\infty }}$ be the target random variable in the second Wiener chaos of the form (6.1). Assume that $F={I_{q}}(f)$ is a chaotic random variable in the q-th Wiener chaos with $q\ge 2$. Then, there exists a general constant C (possibly depending on q and d) such that
where the polynomials P and Q are given by equation (6.2). In the particular case of the normal product target distribution, i.e., $d=2$ and ${\alpha _{\infty ,1}}=-{\alpha _{\infty ,2}}=1/2$, the estimate (6.11) boils down to
where C is an absolute constant.