1 Introduction
Let $N=(N_{t})_{t\ge 0}$ be a Poisson process of constant intensity $\lambda >0$, and let $\{Y_{j}\}$ be independent and identically distributed (i.i.d.) ${\mathbb{R}}^{d}$-valued random vectors defined on the same probability space and having a common distribution function R, which is assumed to be absolutely continuous with respect to the Lebesgue measure with density r. Assume that N and $\{Y_{j}\}$ are independent and define the ${\mathbb{R}}^{d}$-valued process $X=(X_{t})_{t\ge 0}$ by
\[X_{t}={\sum \limits_{j=1}^{N_{t}}}Y_{j},\hspace{1em}t\ge 0.\]
The process X is called a compound Poisson process (CPP) and forms a basic stochastic model in a variety of applied fields, such as, for example, risk theory and queueing; see [10, 21].
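For concreteness, a minimal simulation sketch of such a process observed at equally spaced times is given below; all numerical choices in it (d = 2, λ = 1, standard bivariate normal jumps, unit sampling mesh, n = 500) are ours and purely illustrative, not prescribed by the paper.
```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the paper): d = 2, lambda = 1.0, unit sampling mesh,
# jump size density r = standard bivariate normal, n = 500 observation times.
d, lam, n = 2, 1.0, 500

# The number of jumps in each unit interval is Poisson(lam), so the increments
# Z_i = X_i - X_{i-1} are i.i.d. compound Poisson sums of the jump sizes Y_j.
T = rng.poisson(lam, size=n)
Z = np.array([rng.standard_normal((t, d)).sum(axis=0) for t in T])

X = np.cumsum(Z, axis=0)   # the discretely sampled path X_1, X_2, ..., X_n
print(Z[:3])
print(X[-1])
```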
Suppose that, corresponding to the true parameter pair $(\lambda _{0},r_{0})$, a sample $X_{\varDelta }$, $X_{2\varDelta },\dots ,X_{n\varDelta }$ from X is available, where the sampling mesh $\varDelta >0$ is assumed to be fixed and thus independent of n. The problem we study in this note is nonparametric estimation of $r_{0}$ (and of $\lambda _{0}$). This is referred to as decompounding and is well studied for one-dimensional CPPs; see [2, 3, 6, 9, 24]. Some practical situations in which this problem may arise are listed in [9, p. 3964]. However, the methods used in the above papers do not seem to admit (with the exception of [24]) a generalization to the multidimensional setup. This is also true for papers studying nonparametric inference for more general classes of Lévy processes (of which CPPs form a particular class), such as, for example, [4, 5, 19]. In fact, there is a dearth of publications dealing with nonparametric inference for multidimensional Lévy processes. An exception is [1]; however, the setup there is specific in that it is geared to inference in Lévy copula models and, unlike the present work, assumes the high-frequency sampling scheme ($\varDelta =\varDelta _{n}\to 0$ and $n\varDelta _{n}\to \infty $).
In this work, we will establish the posterior contraction rate in a suitable metric around the true parameter pair $(\lambda _{0},r_{0})$. This concerns the study of asymptotic frequentist properties of Bayesian procedures, which has lately received considerable attention in the literature (see, e.g., [14, 15]) and is useful in that it provides a justification of these procedures from the frequentist point of view. Our main result says that for a β-Hölder regular density $r_{0}$, under suitable additional assumptions on the model and the prior, the posterior contracts at the rate ${n}^{-\beta /(2\beta +d)}{(\log n)}^{\ell }$, which, perhaps up to a logarithmic factor, is arguably the optimal posterior contraction rate in our problem. Finally, our Bayesian procedure is adaptive: the construction of our prior does not require knowledge of the smoothness level β in order to attain the posterior contraction rate given above.
The proof of our main theorem employs certain results from [14, 22] but involves a substantial number of technicalities specifically characteristic of decompounding.
We remark that a practical implementation of the Bayesian approach to decompounding lies outside the scope of the present paper. Preliminary investigations and a small-scale simulation study we performed show that it is feasible and under certain conditions leads to good results. However, the technical complications one has to deal with are quite formidable, and therefore the results of our study of the implementational aspects of decompounding will be reported elsewhere.
The rest of the paper is organized as follows. In the next section, we introduce some notation and recall a number of notions useful for our purposes. Section 3 contains our main result, Theorem 2, and a brief discussion on it. The proof of Theorem 2 is given in Section 4. Finally, Section 5 contains the proof of the key technical lemma used in our proofs.
2 Preliminaries
Assume without loss of generality that $\varDelta =1$, and let $Z_{i}=X_{i}-X_{i-1}$, $i=1,\dots ,n$. The ${\mathbb{R}}^{d}$-valued random vectors $Z_{i}$ are i.i.d. copies of a random vector
\[Z={\sum \limits_{j=1}^{T}}Y_{j},\]
where $\{Y_{j}\}$ are i.i.d. with distribution function $R_{0}$, whereas T, which is independent of $\{Y_{j}\}$, has the Poisson distribution with parameter $\lambda _{0}$. The problem of decompounding the jump size density $r_{0}$ introduced in Section 1 is equivalent to estimation of $r_{0}$ from observations $\mathcal{Z}_{n}=\{Z_{1},Z_{2},\dots ,Z_{n}\}$, and we will henceforth concentrate on this alternative formulation. We will use the following notation: $\mathbb{P}_{r}$ denotes the law of $Y_{1}$, that is, the probability measure on ${\mathbb{R}}^{d}$ with density r; $\mathbb{R}_{\lambda ,r}$ denotes the law of the path $(X_{t})_{t\in [0,1]}$ of the CPP with parameter pair $(\lambda ,r)$; $\mathbb{Q}_{\lambda ,r}$ denotes the law of the increment $Z_{1}=X_{1}$; and ${\mathbb{Q}_{\lambda ,r}^{n}}$ stands for the law of the sample $\mathcal{Z}_{n}$.
2.1 Likelihood
We will first specify the dominating measure for $\mathbb{Q}_{\lambda ,r}$, which allows us to write down the likelihood in our model. Define the random measure μ by
\[\mu (B)=\#\big\{t:(t,X_{t}-X_{t-})\in B\big\},\hspace{1em}B\in \mathcal{B}\big([0,1]\big)\otimes \mathcal{B}\big({\mathbb{R}}^{d}\setminus \{0\}\big).\]
Under $\mathbb{R}_{\lambda ,r}$, the random measure μ is a Poisson point process on $[0,1]\times ({\mathbb{R}}^{d}\setminus \{0\})$ with intensity measure $\varLambda (\mathrm{d}t,\mathrm{d}x)=\lambda \mathrm{d}t\hspace{0.1667em}r(x)\mathrm{d}x$. Provided that $\lambda ,\widetilde{\lambda }>0$ and $\widetilde{r}>0$, by formula (46.1) on p. 262 in [23] we have
(1)
\[\frac{\mathrm{d}\mathbb{R}_{\lambda ,r}}{\mathrm{d}\mathbb{R}_{\widetilde{\lambda },\widetilde{r}}}(X)=\exp \Bigg({\int _{0}^{1}}\int _{{\mathbb{R}}^{d}}\log \bigg(\frac{\lambda r(x)}{\widetilde{\lambda }\widetilde{r}(x)}\bigg)\mu (\mathrm{d}t,\mathrm{d}x)-(\lambda -\widetilde{\lambda })\Bigg).\]
The density $k_{\lambda ,r}$ of $\mathbb{Q}_{\lambda ,r}$ with respect to $\mathbb{Q}_{\widetilde{\lambda },\widetilde{r}}$ is then given by the conditional expectation
(2)
\[k_{\lambda ,r}(x)=\mathbb{E}_{\widetilde{\lambda },\widetilde{r}}\bigg(\frac{\mathrm{d}\mathbb{R}_{\lambda ,r}}{\mathrm{d}\mathbb{R}_{\widetilde{\lambda },\widetilde{r}}}(X)\hspace{0.1667em}\Big|\hspace{0.1667em}X_{1}=x\bigg),\]
where the subscript in the conditional expectation operator signifies the fact that it is evaluated under $\mathbb{R}_{\widetilde{\lambda },\widetilde{r}}$; see Theorem 2 on p. 245 in [23] and Corollary 2 on p. 246 there. Hence, the likelihood (in the parameter pair $(\lambda ,r)$) associated with the sample $\mathcal{Z}_{n}$ is given by
(3)
\[L_{n}(\lambda ,r)={\prod \limits_{i=1}^{n}}k_{\lambda ,r}(Z_{i}).\]
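Although it is not used below, it may help intuition to note that, by conditioning on the number of jumps, the law of Z decomposes as $e^{-\lambda }\delta _{0}$ plus an absolutely continuous part with density ${\sum _{m\ge 1}}e^{-\lambda }\frac{{\lambda }^{m}}{m!}{r}^{*m}$. The following minimal numerical sketch for $d=1$ (with an arbitrary illustrative intensity, a standard normal r, and the series truncated at thirty terms) approximates this density on a grid:
```python
import numpy as np
from scipy.stats import norm, poisson

# Illustrative sketch for d = 1: lambda and r are arbitrary choices of ours.
lam = 1.5
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

r = norm.pdf(x)              # jump size density r on the grid
conv = r.copy()              # current m-fold convolution r^{*m}
density = np.zeros_like(x)   # absolutely continuous part of the law of Z

for m in range(1, 31):       # truncate the Poisson mixture at m = 30 terms
    density += poisson.pmf(m, lam) * conv
    conv = np.convolve(conv, r, mode="same") * dx   # next convolution power r^{*(m+1)}

atom_at_zero = poisson.pmf(0, lam)       # P(Z = 0) = exp(-lambda)
print(atom_at_zero, density.sum() * dx)  # the two masses should sum to roughly one
```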
2.2 Prior
We will use the product prior $\varPi =\varPi _{1}\times \varPi _{2}$ for $(\lambda _{0},r_{0})$. The prior $\varPi _{1}$ for $\lambda _{0}$ will be assumed to be supported on the interval $[\underline{\lambda },\overline{\lambda }]$ and to possess a density $\pi _{1}$ with respect to the Lebesgue measure.
The prior for $r_{0}$ will be specified as a Dirichlet process mixture of normal densities. Namely, introduce a convolution density
(4)
\[r_{F,\varSigma }(x)=\int _{{\mathbb{R}}^{d}}\phi _{\varSigma }(x-z)\mathrm{d}F(z),\]
where F is a distribution function on ${\mathbb{R}}^{d}$, Σ is a $d\times d$ positive definite real matrix, and $\phi _{\varSigma }$ denotes the density of the centered d-dimensional normal distribution with covariance matrix Σ. Let α be a finite measure on ${\mathbb{R}}^{d}$, and let $\mathcal{D}_{\alpha }$ denote the Dirichlet process distribution with base measure α (see [11] or, alternatively, [13] for a modern overview). Recall that if $F\sim \mathcal{D}_{\alpha }$, then for any Borel-measurable partition $B_{1},\dots ,B_{k}$ of ${\mathbb{R}}^{d}$, the distribution of the vector $(F(B_{1}),\dots ,F(B_{k}))$ is the k-dimensional Dirichlet distribution with parameters $\alpha (B_{1}),\dots ,\alpha (B_{k})$. The Dirichlet process location mixture of normals prior $\varPi _{2}$ is obtained as the law of the random function $r_{F,\varSigma }$, where $F\sim \mathcal{D}_{\alpha }$ and $\varSigma \sim G$ for some prior distribution function G on the set of $d\times d$ positive definite matrices. For additional information on Dirichlet process mixtures of normal densities, see, for example, the original papers [12] and [18], or a recent paper [22] and the references therein.
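As an illustration of the prior $\varPi _{2}$ (not needed for the theory below), a single random density $r_{F,\varSigma }$ can be generated by approximating $F\sim \mathcal{D}_{\alpha }$ through a truncated stick-breaking representation. In the sketch, every concrete choice (d = 2, total mass $\alpha ({\mathbb{R}}^{d})=1$, a standard normal base measure, an inverse Wishart G, and the truncation level) is ours and purely illustrative.
```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(1)

# Illustrative choices: d = 2, alpha(R^d) = 1, base measure N(0, I), G = inverse Wishart.
d, total_mass, K = 2, 1.0, 200   # K = truncation level of the stick-breaking sum

v = rng.beta(1.0, total_mass, size=K)                      # stick-breaking proportions
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # mixture weights
z = rng.standard_normal((K, d))                            # atom locations from the base measure
Sigma = invwishart(df=d + 2, scale=np.eye(d)).rvs(random_state=rng)

def r_F_Sigma(x):
    """Evaluate the (truncated) draw r_{F,Sigma}(x) = sum_i w_i * phi_Sigma(x - z_i)."""
    Sinv = np.linalg.inv(Sigma)
    norm_const = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(Sigma))
    diff = x - z
    quad = np.einsum("kd,de,ke->k", diff, Sinv, diff)
    return float(np.sum(w * norm_const * np.exp(-0.5 * quad)))

print(r_F_Sigma(np.zeros(d)))
```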
2.3 Posterior
Let $\mathcal{R}$ denote the class of probability densities of the form (4). By Bayes’ theorem, the posterior measure of any measurable set $A\subset (0,\infty )\times \mathcal{R}$ is given by
\[\varPi (A|\mathcal{Z}_{n})=\frac{\iint _{A}L_{n}(\lambda ,r)\mathrm{d}\varPi _{1}(\lambda )\mathrm{d}\varPi _{2}(r)}{\iint L_{n}(\lambda ,r)\mathrm{d}\varPi _{1}(\lambda )\mathrm{d}\varPi _{2}(r)}.\]
The priors $\varPi _{1}$ and $\varPi _{2}$ indirectly induce the prior $\varPi =\varPi _{1}\times \varPi _{2}$ on the collection of densities $k_{\lambda ,r}$. We will use the symbol Π to signify both the prior on $(\lambda _{0},r_{0})$ and the induced prior on the density $k_{\lambda _{0},r_{0}}$. The posterior in the first case will be understood as the posterior for the pair $(\lambda _{0},r_{0})$, whereas in the second case as the posterior for the density $k_{\lambda _{0},r_{0}}$. Thus, setting $\overline{A}=\{k_{\lambda ,r}:(\lambda ,r)\in A\}$, we have
\[\varPi (\overline{A}|\mathcal{Z}_{n})=\varPi (A|\mathcal{Z}_{n}).\]
In the Bayesian paradigm, the posterior encapsulates all the inferential conclusions for the problem at hand. Once the posterior is available, one can next proceed with computation of other quantities of interest in Bayesian statistics, such as Bayes point estimates or credible sets.
2.4 Distances
The Hellinger distance $h(\mathbb{Q}_{0},\mathbb{Q}_{1})$ between two probability laws $\mathbb{Q}_{0}$ and $\mathbb{Q}_{1}$ on a measurable space $(\varOmega ,\mathfrak{F})$ is given by
\[h(\mathbb{Q}_{0},\mathbb{Q}_{1})={\bigg(\int {\big(\mathrm{d}{\mathbb{Q}_{0}^{1/2}}-\mathrm{d}{\mathbb{Q}_{1}^{1/2}}\big)}^{2}\bigg)}^{1/2}.\]
Assuming that $\mathbb{Q}_{0}\ll \mathbb{Q}_{1}$, the Kullback–Leibler divergence $\mathrm{K}(\mathbb{Q}_{0},\mathbb{Q}_{1})$ is
\[\mathrm{K}(\mathbb{Q}_{0},\mathbb{Q}_{1})=\int \log \bigg(\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}_{1}}\bigg)\mathrm{d}\mathbb{Q}_{0}.\]
We also define the V-discrepancy by
\[\mathrm{V}(\mathbb{Q}_{0},\mathbb{Q}_{1})=\int {\log }^{2}\bigg(\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}_{1}}\bigg)\mathrm{d}\mathbb{Q}_{0}.\]
In addition, for positive real numbers x and y, we put
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{K}(x,y)& \displaystyle =x\log \frac{x}{y}-x+y,\\{} \displaystyle \mathrm{V}(x,y)& \displaystyle =x{\log }^{2}\frac{x}{y},\\{} \displaystyle h(x,y)& \displaystyle =\big|\sqrt{x}-\sqrt{y}\big|.\end{array}\]
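For concreteness, these scalar discrepancies are elementary to evaluate; in the small sketch below the values 1.2 and 1.0 are arbitrary.
```python
import numpy as np

def K(x, y):
    """K(x, y) = x log(x / y) - x + y for positive numbers x, y."""
    return x * np.log(x / y) - x + y

def V(x, y):
    """V(x, y) = x log^2(x / y)."""
    return x * np.log(x / y) ** 2

def h(x, y):
    """h(x, y) = |sqrt(x) - sqrt(y)|."""
    return abs(np.sqrt(x) - np.sqrt(y))

x, y = 1.2, 1.0   # arbitrary positive numbers, e.g. two candidate intensities
print(K(x, y), V(x, y), h(x, y))   # all three are small when x is close to y
```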
Using the same symbols K, V, and h is justified as follows. Suppose that Ω is a singleton $\{\omega \}$ and consider the measures $\delta _{x}$ and $\delta _{y}$ that put masses x and y, respectively, on Ω. Then $\mathrm{K}(\delta _{x},\delta _{y})=\mathrm{K}(x,y)$, and similar equalities are valid for the V-discrepancy and the Hellinger distance (for the Kullback–Leibler divergence, this requires its standard extension to finite measures, $\mathrm{K}(\mathbb{Q}_{0},\mathbb{Q}_{1})=\int [\log (\mathrm{d}\mathbb{Q}_{0}/\mathrm{d}\mathbb{Q}_{1})\hspace{0.1667em}\mathrm{d}\mathbb{Q}_{0}-\mathrm{d}\mathbb{Q}_{0}+\mathrm{d}\mathbb{Q}_{1}]$).
2.5 Class of locally β-Hölder functions
For any $\beta \in \mathbb{R}$, by $\lfloor \beta \rfloor $ we denote the largest integer strictly smaller than β, by $\mathbb{N}$ the set of natural numbers, whereas $\mathbb{N}_{0}$ stands for the union $\mathbb{N}\cup \{0\}$. For a multiindex $k=(k_{1},\dots ,k_{d})\in {\mathbb{N}_{0}^{d}}$, we set $k_{.}={\sum _{i=1}^{d}}k_{i}$. The usual Euclidean norm of a vector $y\in {\mathbb{R}}^{d}$ is denoted by $\| y\| $.
Let $\beta >0$ and $\tau _{0}\ge 0$ be constants, and let $L:{\mathbb{R}}^{d}\to \mathbb{R}_{+}$ be a measurable function. We define the class ${\mathcal{C}}^{\beta ,L,\tau _{0}}({\mathbb{R}}^{d})$ of locally β-Hölder regular functions as the set of all functions $r:{\mathbb{R}}^{d}\to \mathbb{R}$ such that all mixed partial derivatives ${D}^{k}r$ of r up to order $k_{.}\le \lfloor \beta \rfloor $ exist and, for every k with $k_{.}=\lfloor \beta \rfloor $, satisfy
\[\big|\big({D}^{k}r\big)(x+y)-\big({D}^{k}r\big)(x)\big|\le L(x)\exp \big(\tau _{0}\| y{\| }^{2}\big)\| y{\| }^{\beta -\lfloor \beta \rfloor },\hspace{1em}x,y\in {\mathbb{R}}^{d}.\]
See p. 625 in [22] for this class of functions.
3 Main result
Define the complements of the Hellinger-type neighborhoods of $(\lambda _{0},r_{0})$ by
\[A(\varepsilon _{n},M)=\big\{(\lambda ,r):h(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})>M\varepsilon _{n}\big\},\]
where $\{\varepsilon _{n}\}$ is a sequence of positive numbers. We say that $\varepsilon _{n}$ is a posterior contraction rate if there exists a constant $M>0$ such that
\[\varPi \big(A(\varepsilon _{n},M)\hspace{0.1667em}|\hspace{0.1667em}\mathcal{Z}_{n}\big)\to 0\]
as $n\to \infty $ in ${\mathbb{Q}_{\lambda _{0},r_{0}}^{n}}$-probability.
The ε-covering number of a subset B of a metric space equipped with the metric ρ is the minimum number of ρ-balls of radius ε needed to cover it. Let $\mathcal{Q}$ be a set of CPP laws $\mathbb{Q}_{\lambda ,r}$. Furthermore, we set
(5)
\[B(\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}})=\big\{(\lambda ,r):\mathrm{K}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})\le {\varepsilon }^{2},\hspace{0.1667em}\mathrm{V}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})\le {\varepsilon }^{2}\big\}.\]
We recall the following general result on posterior contraction rates.
Theorem 1 ([14]).
Suppose that for positive sequences $\overline{\varepsilon }_{n},\widetilde{\varepsilon }_{n}\to 0$ such that $n\min ({\overline{\varepsilon }_{n}^{2}},{\widetilde{\varepsilon }_{n}^{2}})\to \infty $, constants $c_{1},c_{2},c_{3},c_{4}>0$, and sets $\mathcal{Q}_{n}\subset \mathcal{Q}$, we have
(6)
\[\log N(\overline{\varepsilon }_{n},\mathcal{Q}_{n},h)\le c_{1}n{\overline{\varepsilon }_{n}^{2}},\]
(7)
\[\varPi (\mathcal{Q}\setminus \mathcal{Q}_{n})\le c_{3}\exp \big(-(c_{2}+4)n{\widetilde{\varepsilon }_{n}^{2}}\big),\]
(8)
\[\varPi \big(B(\widetilde{\varepsilon }_{n},\mathbb{Q}_{\lambda _{0},r_{0}})\big)\ge c_{4}\exp \big(-c_{2}n{\widetilde{\varepsilon }_{n}^{2}}\big).\]
Then, for $\varepsilon _{n}=\max (\overline{\varepsilon }_{n},\widetilde{\varepsilon }_{n})$ and a constant $M>0$ large enough, we have that
\[\varPi \big(A(\varepsilon _{n},M)\hspace{0.1667em}|\hspace{0.1667em}\mathcal{Z}_{n}\big)\to 0\]
as $n\to \infty $ in ${\mathbb{Q}_{\lambda _{0},r_{0}}^{n}}$-probability, assuming that the i.i.d. observations $\{Z_{j}\}$ have been generated according to $\mathbb{Q}_{\lambda _{0},r_{0}}$.
In order to derive the posterior contraction rate in our problem, we impose the following conditions on the true parameter pair $(\lambda _{0},r_{0})$.
Assumption 1.
Denote by $(\lambda _{0},r_{0})$ the true parameter values for the compound Poisson process.
-
(i) $\lambda _{0}$ is in a compact set $[\underline{\lambda },\overline{\lambda }]\subset (0,\infty )$;
-
(ii) The true density $r_{0}$ is bounded, belongs to the set ${\mathcal{C}}^{\beta ,L,\tau _{0}}({\mathbb{R}}^{d})$, and additionally satisfies, for some $\varepsilon >0$ and all $k\in {\mathbb{N}_{0}^{d}},\hspace{0.1667em}k_{.}\le \beta $,\[\int {\bigg(\frac{L}{r_{0}}\bigg)}^{(2\beta +\varepsilon )/\beta }r_{0}<\infty ,\hspace{2em}\int {\bigg(\frac{|{D}^{k}r_{0}|}{r_{0}}\bigg)}^{(2\beta +\varepsilon )/k_{.}}r_{0}<\infty .\]Furthermore, we assume that there exist strictly positive constants $a,b,c$, and τ such that
(9)
\[r_{0}(x)\le c\exp \big(-b\| x{\| }^{\tau }\big),\hspace{1em}\| x\| >a.\]
The conditions on $r_{0}$ come from Theorem 1 in [22] and are quite reasonable. They simplify greatly when $r_{0}$ has compact support.
We also need to make some assumptions on the prior Π defined in Section 2.2.
Assumption 2.
The prior $\varPi =\varPi _{1}\times \varPi _{2}$ on $(\lambda _{0},r_{0})$ satisfies the following assumptions:
-
(i) The prior $\varPi _{1}$ on λ has a density $\pi _{1}$ (with respect to the Lebesgue measure) that is supported on the finite interval $[\underline{\lambda },\overline{\lambda }]\subset (0,\infty )$ and is such that
(10)
\[0<\underline{\pi }_{1}\le \pi _{1}(\lambda )\le \overline{\pi }_{1}<\infty ,\hspace{1em}\lambda \in [\underline{\lambda },\overline{\lambda }],\] -
(ii) The base measure α of the Dirichlet process prior $\mathcal{D}_{\alpha }$ is finite and possesses a strictly positive density on ${\mathbb{R}}^{d}$ such that for all sufficiently large $x>0$ and some strictly positive constants $a_{1},b_{1}$, and $C_{1}$,\[1-\overline{\alpha }\big({[-x,x]}^{d}\big)\le b_{1}\exp \big(-C_{1}{x}^{a_{1}}\big),\]where $\overline{\alpha }(\cdot )=\alpha (\cdot )/\alpha ({\mathbb{R}}^{d})$;
-
(iii) There exist strictly positive constants $\kappa ,a_{2}$, $a_{3}$, $a_{4}$, $a_{5}$, $b_{2}$, $b_{3}$, $b_{4}$, $C_{2}$, $C_{3}$ such that for all $x>0$ large enough,\[G\big(\varSigma :\operatorname{eig}_{d}\big({\varSigma }^{-1}\big)\ge x\big)\le b_{2}\exp \big(-C_{2}{x}^{a_{2}}\big),\]for all $x>0$ small enough,\[G\big(\varSigma :\operatorname{eig}_{1}\big({\varSigma }^{-1}\big)<x\big)\le b_{3}{x}^{a_{3}},\]and for any $0<s_{1}\le \cdots \le s_{d}$ and $t\in (0,1)$,\[G\big(\varSigma :s_{j}<\operatorname{eig}_{j}\big({\varSigma }^{-1}\big)<s_{j}(1+t),j=1,\dots ,d\big)\ge b_{4}{s_{1}^{a_{4}}}{t}^{a_{5}}\exp \big(-C_{3}{s_{d}^{\kappa /2}}\big).\]Here $\operatorname{eig}_{j}({\varSigma }^{-1})$ denotes the jth smallest eigenvalue of the matrix ${\varSigma }^{-1}$.
This assumption comes from [22, p. 626], to which we refer for an additional discussion. In particular, it is shown there that an inverse Wishart distribution (a popular prior distribution for covariance matrices) satisfies the assumptions on G with $\kappa =2$. As far as α is concerned, we can take it such that its rescaled version $\overline{\alpha }$ is a nondegenerate Gaussian distribution on ${\mathbb{R}}^{d}$.
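To get a feel for the first bound in Assumption 2(iii) in the inverse Wishart case, one can inspect the empirical tail of $\operatorname{eig}_{d}({\varSigma }^{-1})$ by Monte Carlo. The sketch below (with our arbitrary choices d = 2, five degrees of freedom, and identity scale matrix) is an informal illustration only, not a verification of the assumption.
```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(2)

# Illustrative: d = 2, G = inverse Wishart with df = 5 and identity scale.
# If Sigma ~ G, then eig_d(Sigma^{-1}) is the largest eigenvalue of Sigma^{-1}.
d, n_sim = 2, 20000
Sigmas = invwishart(df=5, scale=np.eye(d)).rvs(size=n_sim, random_state=rng)

largest = np.array([np.linalg.eigvalsh(np.linalg.inv(S)).max() for S in Sigmas])

for x in [5.0, 10.0, 20.0, 40.0]:
    # Empirical analogue of G(Sigma : eig_d(Sigma^{-1}) >= x); it should decay quickly in x.
    print(x, np.mean(largest >= x))
```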
Remark 1.
Assumption (10) requiring that the prior density $\pi _{1}$ is bounded away from zero on the interval $[\underline{\lambda },\overline{\lambda }]$ can be relaxed to allowing it to take the zero value at the end points of this interval, provided that $\lambda _{0}$ is an interior point of $[\underline{\lambda },\overline{\lambda }]$.
We now state our main result.
Theorem 2.
We conclude this section with a brief discussion of the obtained result. The logarithmic factor ${(\log n)}^{\ell }$ is negligible for practical purposes. If $\kappa =1$, then the posterior contraction rate obtained in Theorem 2 is essentially ${n}^{-\beta /(2\beta +d)}$, which is the minimax estimation rate in a number of nonparametric settings. This is arguably the minimax estimation rate in our problem as well (cf. Theorem 2.1 in [16] for a related result in the one-dimensional setting), although here we do not give a formal argument. Equally important is the fact that our result is adaptive: the posterior contraction rate in Theorem 2 is attained without the knowledge of the smoothness level β being incorporated in the construction of our prior Π. Finally, Theorem 2, in combination with Theorem 2.5 and the arguments on pp. 506–507 in [15], implies the existence of Bayesian point estimates achieving (in the frequentist sense) this convergence rate.
Remark 2.
After completion of this work, we learned about the paper [8], which deals with nonparametric Bayesian estimation of intensity functions for Aalen counting processes. Although CPPs are in some sense similar to the latter class of processes, they are not counting processes. An essential difference between our work and [8] lies in the fact that, unlike [8], ours deals with discretely observed multidimensional processes. Also, [8] uses a log-spline prior or a Dirichlet mixture of uniform densities, rather than the Dirichlet process mixture of normal densities used here.
4 Proof of Theorem 2
The proof of Theorem 2 consists in verification of the conditions in Theorem 1. The following lemma plays the key role.
Lemma 1.
The following estimates are valid:
(11)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{K}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})& \displaystyle \le \lambda _{0}\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+\mathrm{K}(\lambda _{0},\lambda ),\end{array}\]
(12)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{V}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})& \displaystyle \le 2\lambda _{0}(1+\lambda _{0})\mathrm{V}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+4\lambda _{0}\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})\\{} & \displaystyle \hspace{1em}+2\mathrm{V}(\lambda _{0},\lambda )+4\mathrm{K}(\lambda _{0},\lambda )+2\mathrm{K}{(\lambda _{0},\lambda )}^{2},\end{array}\]
(13)
\[h(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})\le \sqrt{\lambda _{0}}\hspace{0.1667em}h(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+h(\lambda _{0},\lambda ).\]
Moreover, there exists a constant $\overline{C}\in (0,\infty )$, depending on $\underline{\lambda }$ and $\overline{\lambda }$ only, such that for all $\lambda _{0},\lambda \in [\underline{\lambda },\overline{\lambda }]$,
(14)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{K}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})& \displaystyle \le \overline{C}\big(\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+|\lambda _{0}-\lambda {|}^{2}\big),\end{array}\]
(15)
\[\mathrm{V}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})\le \overline{C}\big(\mathrm{V}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+|\lambda _{0}-\lambda {|}^{2}\big),\]
(16)
\[h(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})\le \overline{C}\big(h(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+|\lambda _{0}-\lambda |\big).\]
Let $\varepsilon _{n}={n}^{-\gamma }{(\log n)}^{\ell }$ for γ and $\ell >\ell _{0}$ as in the statement of Theorem 2. Set $\overline{\varepsilon }_{n}=2\overline{C}\varepsilon _{n}$, where $\overline{C}$ is the constant from Lemma 1. We define the sieves of densities $\mathcal{F}_{n}$ as in Theorem 5 in [22]:
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathcal{F}_{n}=\Bigg\{r_{F,\varSigma }\hspace{2.5pt}\text{with}\hspace{2.5pt}F=\sum \limits_{i=1}^{\infty }\pi _{i}\delta _{z_{i}}:& \displaystyle z_{i}\in {[-\alpha _{n},\alpha _{n}]}^{d},\forall i\le I_{n};\sum \limits_{i>I_{n}}\pi _{i}<\varepsilon _{n};\\{} & \displaystyle {\sigma _{0,n}^{2}}\le \operatorname{eig}_{j}(\varSigma )<{\sigma _{0,n}^{2}}{\big(1+{\varepsilon _{n}^{2}}/d\big)}^{J_{n}}\Bigg\},\end{array}\]
where
\[I_{n}=\big\lfloor n{\varepsilon _{n}^{2}}/\log n\big\rfloor ,\hspace{2em}J_{n}={\alpha _{n}^{a_{1}}}={\sigma _{0,n}^{-2a_{2}}}=n,\]
and $a_{1}$ and $a_{2}$ are as in Assumption 2. We also put
(17)
\[\mathcal{Q}_{n}=\big\{\mathbb{Q}_{\lambda ,r}:\lambda \in [\underline{\lambda },\overline{\lambda }],r\in \mathcal{F}_{n}\big\}.\]
In [22], sieves of the type $\mathcal{F}_{n}$ are used to verify the conditions of Theorem 1 and to determine posterior contraction rates in the standard density estimation context. We will further show that these sieves also work in the case of decompounding by verifying the conditions of Theorem 1 for the sieves $\mathcal{Q}_{n}$ defined in (17).
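To make the scaling of the sieve concrete, a small numerical sketch computing $I_{n}$, $\alpha _{n}$, $\sigma _{0,n}$, and $J_{n}$ is given below; the values of $a_{1}$, $a_{2}$, γ, and ℓ used in it are arbitrary illustrative choices of ours, not quantities prescribed by the paper.
```python
import numpy as np

# Illustrative values only: a1, a2 play the role of the constants in Assumption 2,
# gamma and ell those in the rate eps_n = n^{-gamma} (log n)^{ell}.
a1, a2, gamma, ell = 2.0, 1.0, 0.3, 1.0

for n in [10**3, 10**5, 10**7]:
    eps_n = n ** (-gamma) * np.log(n) ** ell
    I_n = int(n * eps_n**2 / np.log(n))   # cap on the number of retained mixture atoms
    alpha_n = n ** (1 / a1)               # atoms are restricted to the cube [-alpha_n, alpha_n]^d
    sigma0_n = n ** (-1 / (2 * a2))       # sigma0_n^2 is the lower bound on the eigenvalues of Sigma
    J_n = n                               # exponent in the upper eigenvalue bound of the sieve
    print(n, I_n, round(alpha_n, 1), round(sigma0_n, 5), J_n)
```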
4.1 Verification of (6)
Introduce the notation
\[\overline{h}_{1}(\lambda _{1},\lambda _{2})=\overline{C}|\lambda _{1}-\lambda _{2}|,\hspace{2em}\overline{h}_{2}(r_{1},r_{2})=\overline{C}h(\mathbb{P}_{r_{1}},\mathbb{P}_{r_{2}}).\]
Let $\{\lambda _{i}\}$ be the centers of the balls from a minimal covering of $[\underline{\lambda },\overline{\lambda }]$ with $\overline{h}_{1}$-intervals of size $\overline{C}\varepsilon _{n}$. Let $\{r_{j}\}$ be centers of the balls from a minimal covering of $\mathcal{F}_{n}$ with $\overline{h}_{2}$-balls of size $\overline{C}\varepsilon _{n}$. By Lemma 1, for any $\mathbb{Q}_{\lambda ,r}\in \mathcal{Q}_{n}$,
\[h(\mathbb{Q}_{\lambda ,r},\mathbb{Q}_{\lambda _{i},r_{j}})\le \overline{h}_{1}(\lambda ,\lambda _{i})+\overline{h}_{2}(r,r_{j})\le \overline{\varepsilon }_{n}\]
by appropriate choices of i and j. Hence,
\[N(\overline{\varepsilon }_{n},\mathcal{Q}_{n},h)\le N\big(\overline{C}\varepsilon _{n},[\underline{\lambda },\overline{\lambda }],\overline{h}_{1}\big)\times N(\overline{C}\varepsilon _{n},\mathcal{F}_{n},\overline{h}_{2}),\]
and so
\[\log N(\overline{\varepsilon }_{n},\mathcal{Q}_{n},h)\le \log N\big(\overline{C}\varepsilon _{n},[\underline{\lambda },\overline{\lambda }],\overline{h}_{1}\big)+\log N(\overline{C}\varepsilon _{n},\mathcal{F}_{n},\overline{h}_{2}).\]
By Proposition 2 and Theorem 5 in [22], there exists a constant $c_{1}>0$ such that for all n large enough,
\[\log N(\overline{C}\varepsilon _{n},\mathcal{F}_{n},\overline{h}_{2})=\log N(\varepsilon _{n},\mathcal{F}_{n},h)\le c_{1}n{\varepsilon _{n}^{2}}=\frac{c_{1}}{4{\overline{C}}^{2}}n{\overline{\varepsilon }_{n}^{2}}.\]
On the other hand,
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \log N\big(\overline{C}\varepsilon _{n},[\underline{\lambda },\overline{\lambda }],\overline{h}_{1}\big)& \displaystyle =\log N\big(\varepsilon _{n},[\underline{\lambda },\overline{\lambda }],|\cdot |\big),\\{} & \displaystyle \lesssim \log \bigg(\frac{1}{\varepsilon _{n}}\bigg)\\{} & \displaystyle \lesssim \log \bigg(\frac{1}{\overline{\varepsilon }_{n}}\bigg).\end{array}\]
With our choice of $\overline{\varepsilon }_{n}$, for all n large enough, we have
\[\frac{c_{1}}{4{\overline{C}}^{2}}n{\overline{\varepsilon }_{n}^{2}}\ge \log \bigg(\frac{1}{\overline{\varepsilon }_{n}}\bigg),\]
so that for all n large enough,
\[\log N(\overline{\varepsilon }_{n},\mathcal{Q}_{n},h)\le \frac{c_{1}}{2{\overline{C}}^{2}}n{\overline{\varepsilon }_{n}^{2}}.\]
We can simply rename the constant $c_{1}/(2{\overline{C}}^{2})$ in this formula as $c_{1}$, and thus (6) is satisfied with that constant.
4.2 Verification of (7) and (8)
We first focus on (8). Introduce
\[\widetilde{B}(\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}})=\big\{(\lambda ,r):\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})\le {\varepsilon }^{2},\mathrm{V}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})\le {\varepsilon }^{2},|\lambda _{0}-\lambda |\le \varepsilon \big\}.\]
Suppose that $(\lambda ,r)\in \widetilde{B}(\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}})$. From (14) we obtain
\[\mathrm{K}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})\le \overline{C}\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+\overline{C}|\lambda -\lambda _{0}{|}^{2}\le 2\overline{C}{\varepsilon }^{2}.\]
Furthermore, using (15), we have
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{V}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})& \displaystyle \le \overline{C}\mathrm{V}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+\overline{C}\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+\overline{C}|\lambda -\lambda _{0}{|}^{2}\le 3\overline{C}{\varepsilon }^{2}.\end{array}\]
Combination of these inequalities with the definition of the set $B(\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}})$ in (5) yields
\[\widetilde{B}(\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}})\subset B(\sqrt{3\overline{C}}\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}}).\]
Consequently,
(18)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \varPi \big(B(\sqrt{3\overline{C}}\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}})\big)& \displaystyle \ge \varPi \big(\widetilde{B}(\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}})\big)\\{} & \displaystyle =\varPi _{1}(|\lambda _{0}-\lambda |\le \varepsilon )\\{} & \displaystyle \hspace{1em}\times \varPi _{2}\big(r_{F,\varSigma }:\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r_{F,\varSigma }})\le {\varepsilon }^{2},\hspace{0.1667em}\mathrm{V}(\mathbb{P}_{r_{0}},\mathbb{P}_{r_{F,\varSigma }})\le {\varepsilon }^{2}\big).\end{array}\]
By Assumption 2(i), for all $\varepsilon >0$ small enough,
\[\varPi _{1}(|\lambda _{0}-\lambda |\le \varepsilon )\ge \underline{\pi }_{1}\varepsilon .\]
Furthermore, Theorem 4 in [22] yields that for some $A,C>0$ and all sufficiently large n,
\[\begin{array}{l}\displaystyle \varPi _{2}\big(r_{F,\varSigma }:\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r_{F,\varSigma }})\le A{n}^{-2\gamma }{(\log n)}^{2\ell _{0}},\mathrm{V}(\mathbb{P}_{r_{0}},\mathbb{P}_{r_{F,\varSigma }})\le A{n}^{-2\gamma }{(\log n)}^{2\ell _{0}}\big)\\{} \displaystyle \ge \exp \big(-Cn{\big\{{n}^{-\gamma }{(\log n)}^{\ell _{0}}\big\}}^{2}\big).\end{array}\]
We substitute ε with $\sqrt{A}{n}^{-\gamma }{(\log n)}^{\ell _{0}}$ and write $\widetilde{\varepsilon }_{n}=\sqrt{3A\overline{C}}{n}^{-\gamma }{(\log n)}^{\ell _{0}}$ to arrive at
\[\varPi \big(B(\widetilde{\varepsilon }_{n},\mathbb{Q}_{\lambda _{0},r_{0}})\big)\ge \underline{\pi }_{1}\sqrt{A}{n}^{-\gamma }{(\log n)}^{\ell _{0}}\times \exp \bigg(-\frac{C}{3A\overline{C}}n{\widetilde{\varepsilon }_{n}^{2}}\bigg).\]
Now, since $\gamma <\frac{1}{2}$, for all n large enough, we have
\[\underline{\pi }_{1}\sqrt{A}{n}^{-\gamma }{(\log n)}^{\ell _{0}}\ge \exp \big(-{n}^{1-2\gamma }{(\log n)}^{2\ell _{0}}\big).\]
Consequently, for all n large enough,
(19)
\[\varPi \big(B(\widetilde{\varepsilon }_{n},\mathbb{Q}_{\lambda _{0},r_{0}})\big)\ge \exp \bigg(-\bigg(\frac{C+1}{3A\overline{C}}\bigg)n{\widetilde{\varepsilon }_{n}^{2}}\bigg).\]
Choosing $c_{2}=\frac{C+1}{3A\overline{C}}$, we have verified (8) (with $c_{4}=1$).
For the verification of (7), we use the constants $c_{2}$ and $\widetilde{\varepsilon }_{n}$ as above. Note first that
\[\varPi (\mathcal{Q}\setminus \mathcal{Q}_{n})\le \varPi _{2}\big({\mathcal{F}_{n}^{c}}\big).\]
By Theorem 5 in [22] (see also p. 627 there), for some $c_{3}>0$ and any constant $c>0$, we have
\[\varPi _{2}\big({\mathcal{F}_{n}^{c}}\big)\le c_{3}\exp \big(-(c+4)n{\big\{{n}^{-\gamma }{(\log n)}^{\ell _{0}}\big\}}^{2}\big),\]
provided that n is large enough. Thus,
\[\varPi (\mathcal{Q}\setminus \mathcal{Q}_{n})\le c_{3}\exp \bigg(-\frac{c+4}{3A\overline{C}}\hspace{0.1667em}n{\widetilde{\varepsilon }_{n}^{2}}\bigg).\]
Without loss of generality, we can take the positive constant c greater than $3A\overline{C}(c_{2}+4)-4$. This gives
\[\varPi (\mathcal{Q}\setminus \mathcal{Q}_{n})\le c_{3}\exp \big(-(c_{2}+4)n{\widetilde{\varepsilon }_{n}^{2}}\big),\]
which is indeed (7).
5 Proof of Lemma 1
We start with a lemma from [7], which will be used three times in the proof of Lemma 1. Consider a probability space $(\varOmega ,\mathfrak{F},\mathbb{P})$. Let $\mathbb{P}_{0}$ be a probability measure on $(\varOmega ,\mathfrak{F})$ and assume that $\mathbb{P}_{0}\ll \mathbb{P}$ with Radon–Nikodym derivative $\zeta =\frac{\mathrm{d}\mathbb{P}_{0}}{\mathrm{d}\mathbb{P}}$. Furthermore, let $\mathfrak{G}$ be a sub-σ-algebra of $\mathfrak{F}$. The restrictions of $\mathbb{P}$ and $\mathbb{P}_{0}$ to $\mathfrak{G}$ are denoted ${\mathbb{P}^{\prime }}$ and ${\mathbb{P}^{\prime }_{0}}$, respectively. Then ${\mathbb{P}^{\prime }_{0}}\ll {\mathbb{P}^{\prime }}$ and $\frac{\mathrm{d}{\mathbb{P}^{\prime }_{0}}}{\mathrm{d}{\mathbb{P}^{\prime }}}=\mathbb{E}_{\mathbb{P}}[\zeta |\mathfrak{G}]=:{\zeta ^{\prime }}$.
Lemma 2 ([7]).
In the setting just described, for every convex function $g:[0,\infty )\to \mathbb{R}$,
\[\mathbb{E}_{{\mathbb{P}^{\prime }}}\big[g\big({\zeta ^{\prime }}\big)\big]\le \mathbb{E}_{\mathbb{P}}\big[g(\zeta )\big].\]
The proof of the lemma consists in an application of Jensen’s inequality for conditional expectations. This lemma is typically used as follows. The measures $\mathbb{P}$ and $\mathbb{P}_{0}$ are possible distributions of some random element X. If ${X^{\prime }}=T(X)$ is some measurable transformation of X, then we consider ${\mathbb{P}^{\prime }}$ and ${\mathbb{P}^{\prime }_{0}}$ as the corresponding distributions of ${X^{\prime }}$. Here T may be a projection. In the present context, we take $X=(X_{t},t\in [0,1])$ and ${X^{\prime }}=X_{1}$, and so $\mathbb{P}$ in the lemma should be taken as $\mathbb{R}=\mathbb{R}_{\lambda ,r}$ and ${\mathbb{P}^{\prime }}$ as $\mathbb{Q}=\mathbb{Q}_{\lambda ,r}$.
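In the notation above, the Jensen step reads, for a convex function g,
\[\mathbb{E}_{{\mathbb{P}^{\prime }}}\big[g\big({\zeta ^{\prime }}\big)\big]=\mathbb{E}_{\mathbb{P}}\big[g\big(\mathbb{E}_{\mathbb{P}}[\zeta |\mathfrak{G}]\big)\big]\le \mathbb{E}_{\mathbb{P}}\big[\mathbb{E}_{\mathbb{P}}\big[g(\zeta )|\mathfrak{G}\big]\big]=\mathbb{E}_{\mathbb{P}}\big[g(\zeta )\big],\]
which is precisely the inequality of Lemma 2.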
In the proof of Lemma 1, for economy of notation, a constant $c(\underline{\lambda },\overline{\lambda })$ depending on $\underline{\lambda }$ and $\overline{\lambda }$ may differ from line to line. We also abbreviate $\mathbb{Q}_{\lambda _{0},r_{0}}$ and $\mathbb{Q}_{\lambda ,r}$ to $\mathbb{Q}_{0}$ and $\mathbb{Q}$, respectively. The same convention will be used for $\mathbb{R}_{\lambda _{0},r_{0}}$, $\mathbb{R}_{\lambda ,r}$, $\mathbb{P}_{r_{0}}$, and $\mathbb{P}_{r}$.
Proof of inequalities (11) and (14).
Application of Lemma 2 with $g(x)=(x\log x)1_{\{x\ge 0\}}$ gives $\mathrm{K}(\mathbb{Q}_{0},\mathbb{Q})\le \mathrm{K}(\mathbb{R}_{0},\mathbb{R})$. Using (1) and the expression for the mean of a stochastic integral with respect to a Poisson point process (see, e.g., property 6 on p. 68 in [23]), we obtain that
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{K}(\mathbb{R}_{0},\mathbb{R})& \displaystyle =\int \log \bigg(\frac{\mathrm{d}\mathbb{R}_{0}}{\mathrm{d}\mathbb{R}}\bigg)\mathrm{d}\mathbb{R}_{0}\\{} & \displaystyle =\lambda _{0}\int \log \bigg(\frac{\lambda _{0}r_{0}}{\lambda r}\bigg)r_{0}-(\lambda _{0}-\lambda )\\{} & \displaystyle =\lambda _{0}\mathrm{K}(\mathbb{P}_{0},\mathbb{P})+\bigg(\lambda _{0}\log \bigg(\frac{\lambda _{0}}{\lambda }\bigg)-[\lambda _{0}-\lambda ]\bigg)\\{} & \displaystyle =\lambda _{0}\mathrm{K}(\mathbb{P}_{0},\mathbb{P})+\mathrm{K}(\lambda _{0},\lambda ).\end{array}\]
Now
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \lambda _{0}\log \bigg(\frac{\lambda _{0}}{\lambda }\bigg)-(\lambda _{0}-\lambda )& \displaystyle =\lambda _{0}\bigg|\log \bigg(\frac{\lambda }{\lambda _{0}}\bigg)-\bigg(\frac{\lambda }{\lambda _{0}}-1\bigg)\bigg|\\{} & \displaystyle \le c(\underline{\lambda },\overline{\lambda })|\lambda _{0}-\lambda {|}^{2},\end{array}\]
where $c(\underline{\lambda },\overline{\lambda })$ is some constant depending on $\underline{\lambda }$ and $\overline{\lambda }$; the last bound follows from a second-order Taylor expansion of the logarithm around 1, since $\lambda /\lambda _{0}$ ranges over a compact subset of $(0,\infty )$. The result follows. □
Proof of inequalities (12) and (15).
We have
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{V}(\mathbb{Q}_{0},\mathbb{Q})& \displaystyle =\mathbb{E}_{\mathbb{Q}_{0}}\bigg[{\log }^{2}\bigg(\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}\bigg)1_{\{\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}\ge 1\}}\bigg]+\mathbb{E}_{\mathbb{Q}_{0}}\bigg[{\log }^{2}\bigg(\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}\bigg)1_{\{\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}<1\}}\bigg]\\{} & \displaystyle =\mathrm{I}+\mathrm{II}.\end{array}\]
Application of Lemma 2 with $g(x)=(x{\log }^{2}(x))1_{\{x\ge 1\}}$ (which is a convex function) gives
(20)
\[\mathrm{I}\le \mathbb{E}_{\mathbb{R}_{0}}\bigg[{\log }^{2}\bigg(\frac{\mathrm{d}\mathbb{R}_{0}}{\mathrm{d}\mathbb{R}}\bigg)1_{[\frac{\mathrm{d}\mathbb{R}_{0}}{\mathrm{d}\mathbb{R}}\ge 1]}\bigg]\le \mathrm{V}(\mathbb{R}_{0},\mathbb{R}).\]
As far as $\mathrm{II}$ is concerned, for $x\ge 0$, we have the inequalities
\[x\le 2\big({e}^{x/2}-1\big),\hspace{2em}x{e}^{-x/2}\le 2\big(1-{e}^{-x/2}\big).\]
The first inequality is elementary (cf. inequality (8.5) in [15]), and the second follows from it upon multiplication by ${e}^{-x/2}$. The two inequalities together yield
\[{x}^{2}{e}^{-x}\le 4{\big(1-{e}^{-x/2}\big)}^{2},\hspace{1em}x\ge 0.\]
Applying this inequality with $x=-\log \frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}$ (which is positive on the event $\{\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}<1\}$) and taking the expectation with respect to $\mathbb{Q}$ give
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{II}& \displaystyle =\mathbb{E}_{\mathbb{Q}}\bigg[\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}{\log }^{2}\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}1_{\{\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}<1\}}\bigg]\\{} & \displaystyle \le 4\int {\bigg(\sqrt{\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}}-1\bigg)}^{2}\mathrm{d}\mathbb{Q}\\{} & \displaystyle =4{h}^{2}(\mathbb{Q}_{0},\mathbb{Q})\le 4\mathrm{K}(\mathbb{Q}_{0},\mathbb{Q}).\end{array}\]
For the final inequality, see [20], p. 62, formula (12). Combining the estimates on $\mathrm{I}$ and $\mathrm{II}$, we obtain that
(21)
\[\mathrm{V}(\mathbb{Q}_{0},\mathbb{Q})\le \mathrm{V}(\mathbb{R}_{0},\mathbb{R})+4\mathrm{K}(\mathbb{Q}_{0},\mathbb{Q}).\]
After some long and tedious calculations employing (1) and the expressions for the mean and variance of a stochastic integral with respect to a Poisson point process (see, e.g., property 6 on p. 68 in [23] and Lemma 1.1 in [17]), we get that
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{V}(\mathbb{R}_{0},\mathbb{R})& \displaystyle =\lambda _{0}\int {\bigg\{\log \bigg(\frac{\lambda _{0}}{\lambda }\bigg)+\log \bigg(\frac{r_{0}}{r}\bigg)\bigg\}}^{2}r_{0}\\{} & \displaystyle \hspace{1em}+{\lambda _{0}^{2}}{\bigg\{\int \log \bigg(\frac{r_{0}}{r}\bigg)r_{0}+\log \bigg(\frac{\lambda _{0}}{\lambda }\bigg)-\bigg(1-\frac{\lambda }{\lambda _{0}}\bigg)\bigg\}}^{2}\\{} & \displaystyle =\mathrm{III}+\mathrm{IV}.\end{array}\]
By the $c_{2}$-inequality ${(a+b)}^{2}\le 2{a}^{2}+2{b}^{2}$ we have
(22)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{III}& \displaystyle \le 2\lambda _{0}{\log }^{2}\bigg(\frac{\lambda _{0}}{\lambda }\bigg)+2\lambda _{0}\int {\log }^{2}\bigg(\frac{r_{0}}{r}\bigg)r_{0}\\{} & \displaystyle =2\mathrm{V}(\lambda _{0},\lambda )+2\lambda _{0}\mathrm{V}(\mathbb{P}_{0},\mathbb{P}),\end{array}\]
from which we deduce
(23)
\[\mathrm{III}\le c(\underline{\lambda },\overline{\lambda })|\lambda _{0}-\lambda {|}^{2}+2\overline{\lambda }\mathrm{V}(\mathbb{P}_{0},\mathbb{P})\]
for some constant $c(\underline{\lambda },\overline{\lambda })$ depending on $\underline{\lambda }$ and $\overline{\lambda }$ only. As far as $\mathrm{IV}$ is concerned, the $c_{2}$-inequality and the Cauchy–Schwarz inequality give that
(24)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{IV}& \displaystyle \le 2{\lambda _{0}^{2}}{\bigg(\int \log \bigg(\frac{r_{0}}{r}\bigg)r_{0}\bigg)}^{2}+2{\lambda _{0}^{2}}{\bigg(\log \bigg(\frac{\lambda _{0}}{\lambda }\bigg)-\bigg[1-\frac{\lambda }{\lambda _{0}}\bigg]\bigg)}^{2}\\{} & \displaystyle \le 2{\lambda _{0}^{2}}\mathrm{V}(\mathbb{P}_{0},\mathbb{P})+2\mathrm{K}{(\lambda _{0},\lambda )}^{2},\end{array}\]
from which we find the upper bound
(25)
\[\mathrm{IV}\le 2{\overline{\lambda }}^{2}\mathrm{V}(\mathbb{P}_{0},\mathbb{P})+c(\underline{\lambda },\overline{\lambda })|\lambda _{0}-\lambda {|}^{2}\]
for some constant $c(\underline{\lambda },\overline{\lambda })$ depending on $\underline{\lambda }$ and $\overline{\lambda }$. Combining estimates (22) and (24) on $\mathrm{III}$ and $\mathrm{IV}$ with inequalities (21) and (11) yields (12). Similarly, the upper bounds (23) and (25), combined with (21) and (11), yield (15). □
Proof of inequalities (13) and (16).
First, note that for $g(x)={(\sqrt{x}-1)}^{2}1_{[x\ge 0]}$,
\[{h}^{2}(\mathbb{Q}_{0},\mathbb{Q})=\mathbb{E}_{\mathbb{Q}}\bigg[{\bigg(\sqrt{\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}}-1\bigg)}^{2}\bigg]=\mathbb{E}_{\mathbb{Q}}\bigg[g\bigg(\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}\bigg)\bigg].\]
Since g is convex, an application of Lemma 2 yields $h(\mathbb{Q}_{0},\mathbb{Q})\le h(\mathbb{R}_{0},\mathbb{R})$. Using (1) and invoking Lemma 1.5 in [17], in particular, using formula (1.30) in its statement, we get that
\[\begin{array}{r@{\hskip0pt}l}\displaystyle h(\mathbb{R}_{0},\mathbb{R})& \displaystyle \le \| \sqrt{\lambda _{0}r_{0}}-\sqrt{\lambda r}\| \\{} & \displaystyle \le \| \sqrt{\lambda _{0}r_{0}}-\sqrt{\lambda _{0}r}\| +\| \sqrt{\lambda _{0}r}-\sqrt{\lambda r}\| \\{} & \displaystyle \le \sqrt{\lambda _{0}}\| \sqrt{r_{0}}-\sqrt{r}\| +|\sqrt{\lambda _{0}}-\sqrt{\lambda }|\\{} & \displaystyle =\sqrt{\lambda _{0}}h(\mathbb{P}_{0},\mathbb{P})+h(\lambda _{0},\lambda ),\end{array}\]
where $\| \cdot \| $ denotes the ${L}^{2}$-norm. This proves (13). Furthermore, from this we obtain the obvious upper bound
\[h(\mathbb{R}_{0},\mathbb{R})\le \sqrt{\overline{\lambda }}\hspace{0.1667em}h(\mathbb{P}_{0},\mathbb{P})+\frac{1}{2\sqrt{\underline{\lambda }}}|\lambda _{0}-\lambda |,\]
which yields (16). □