1 Introduction
Let $N=(N_{t})_{t\ge 0}$ be a Poisson process of constant intensity $\lambda >0$, and let $\{Y_{j}\}$ be independent and identically distributed (i.i.d.) ${\mathbb{R}}^{d}$-valued random vectors defined on the same probability space and having a common distribution function R, which is assumed to be absolutely continuous with respect to the Lebesgue measure with density r. Assume that N and $\{Y_{j}\}$ are independent and define the ${\mathbb{R}}^{d}$-valued process $X=(X_{t})_{t\ge 0}$ by
\[X_{t}={\sum \limits_{j=1}^{N_{t}}}Y_{j},\hspace{1em}t\ge 0.\]
The process X is called a compound Poisson process (CPP) and forms a basic stochastic model in a variety of applied fields, such as, for example, risk theory and queueing; see [10, 21].
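For concreteness, a minimal simulation sketch of such a process observed at equally spaced times is given below; all numerical choices in it (d = 2, λ = 1, standard bivariate normal jumps, unit sampling mesh, n = 500) are ours and purely illustrative, not prescribed by the paper.
```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the paper): d = 2, lambda = 1.0, unit sampling mesh,
# jump size density r = standard bivariate normal, n = 500 observation times.
d, lam, n = 2, 1.0, 500

# The number of jumps in each unit interval is Poisson(lam), so the increments
# Z_i = X_i - X_{i-1} are i.i.d. compound Poisson sums of the jump sizes Y_j.
T = rng.poisson(lam, size=n)
Z = np.array([rng.standard_normal((t, d)).sum(axis=0) for t in T])

X = np.cumsum(Z, axis=0)   # the discretely sampled path X_1, X_2, ..., X_n
print(Z[:3])
print(X[-1])
```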
Suppose that, corresponding to the true parameter pair $(\lambda _{0},r_{0})$, a sample $X_{\varDelta }$, $X_{2\varDelta },\dots ,X_{n\varDelta }$ from X is available, where the sampling mesh $\varDelta >0$ is assumed to be fixed and thus independent of n. The problem we study in this note is nonparametric estimation of $r_{0}$ (and of $\lambda _{0}$). This is referred to as decompounding and is well studied for one-dimensional CPPs; see [2, 3, 6, 9, 24]. Some practical situations in which this problem may arise are listed in [9, p. 3964]. However, the methods used in the above papers do not seem to admit (with the exception of [24]) a generalization to the multidimensional setup. This is also true for papers studying nonparametric inference for more general classes of Lévy processes (of which CPPs form a particular class), such as, for example, [4, 5, 19]. In fact, there is a dearth of publications dealing with nonparametric inference for multidimensional Lévy processes. An exception is [1]; however, the setup there is specific in that it is geared to inference in Lévy copula models and, unlike the present work, assumes the high-frequency sampling scheme ($\varDelta =\varDelta _{n}\to 0$ and $n\varDelta _{n}\to \infty $).
In this work, we will establish the posterior contraction rate in a suitable metric around the true parameter pair $(\lambda _{0},r_{0})$. This concerns the study of asymptotic frequentist properties of Bayesian procedures, which has lately received considerable attention in the literature (see, e.g., [14, 15]) and is useful in that it provides a justification of these procedures from the frequentist point of view. Our main result says that for a β-Hölder regular density $r_{0}$, under suitable additional assumptions on the model and the prior, the posterior contracts at the rate ${n}^{-\beta /(2\beta +d)}{(\log n)}^{\ell }$, which, perhaps up to a logarithmic factor, is arguably the optimal posterior contraction rate in our problem. Finally, our Bayesian procedure is adaptive: the construction of our prior does not require knowledge of the smoothness level β in order to attain the posterior contraction rate given above.
The proof of our main theorem employs certain results from [14, 22] but involves a substantial number of technicalities specifically characteristic of decompounding.
We remark that a practical implementation of the Bayesian approach to decompounding lies outside the scope of the present paper. Preliminary investigations and a small-scale simulation study we performed show that it is feasible and under certain conditions leads to good results. However, the technical complications one has to deal with are quite formidable, and therefore the results of our study of the implementational aspects of decompounding will be reported elsewhere.
The rest of the paper is organized as follows. In the next section, we introduce some notation and recall a number of notions useful for our purposes. Section 3 contains our main result, Theorem 2, and a brief discussion on it. The proof of Theorem 2 is given in Section 4. Finally, Section 5 contains the proof of the key technical lemma used in our proofs.
2 Preliminaries
Assume without loss of generality that $\varDelta =1$, and let $Z_{i}=X_{i}-X_{i-1}$, $i=1,\dots ,n$. The ${\mathbb{R}}^{d}$-valued random vectors $Z_{i}$ are i.i.d. copies of a random vector
\[Z={\sum \limits_{j=1}^{T}}Y_{j},\]
where $\{Y_{j}\}$ are i.i.d. with distribution function $R_{0}$, whereas T, which is independent of $\{Y_{j}\}$, has the Poisson distribution with parameter $\lambda _{0}$. The problem of decompounding the jump size density $r_{0}$ introduced in Section 1 is equivalent to estimation of $r_{0}$ from observations $\mathcal{Z}_{n}=\{Z_{1},Z_{2},\dots ,Z_{n}\}$, and we will henceforth concentrate on this alternative formulation. We will use the following notation: $\mathbb{P}_{r}$ denotes the law of $Y_{1}$, that is, the probability measure on ${\mathbb{R}}^{d}$ with density r; $\mathbb{R}_{\lambda ,r}$ denotes the law of the path $(X_{t})_{t\in [0,1]}$ of the CPP with parameter pair $(\lambda ,r)$; $\mathbb{Q}_{\lambda ,r}$ denotes the law of the increment $Z_{1}=X_{1}$; and ${\mathbb{Q}_{\lambda ,r}^{n}}$ stands for the law of the sample $\mathcal{Z}_{n}$.
2.1 Likelihood
We will first specify the dominating measure for $\mathbb{Q}_{\lambda ,r}$, which allows us to write down the likelihood in our model. Define the random measure μ by
\[\mu (B)=\#\big\{t:(t,X_{t}-X_{t-})\in B\big\},\hspace{1em}B\in \mathcal{B}\big([0,1]\big)\otimes \mathcal{B}\big({\mathbb{R}}^{d}\setminus \{0\}\big).\]
Under $\mathbb{R}_{\lambda ,r}$, the random measure μ is a Poisson point process on $[0,1]\times ({\mathbb{R}}^{d}\setminus \{0\})$ with intensity measure $\varLambda (\mathrm{d}t,\mathrm{d}x)=\lambda \mathrm{d}t\hspace{0.1667em}r(x)\mathrm{d}x$. Provided that $\lambda ,\widetilde{\lambda }>0$ and $\widetilde{r}>0$, by formula (46.1) on p. 262 in [23] we have
(1)
\[\frac{\mathrm{d}\mathbb{R}_{\lambda ,r}}{\mathrm{d}\mathbb{R}_{\widetilde{\lambda },\widetilde{r}}}(X)=\exp \Bigg({\int _{0}^{1}}\int _{{\mathbb{R}}^{d}}\log \bigg(\frac{\lambda r(x)}{\widetilde{\lambda }\widetilde{r}(x)}\bigg)\mu (\mathrm{d}t,\mathrm{d}x)-(\lambda -\widetilde{\lambda })\Bigg).\]
The density $k_{\lambda ,r}$ of $\mathbb{Q}_{\lambda ,r}$ with respect to $\mathbb{Q}_{\widetilde{\lambda },\widetilde{r}}$ is then given by the conditional expectation
(2)
\[k_{\lambda ,r}(x)=\mathbb{E}_{\widetilde{\lambda },\widetilde{r}}\bigg(\frac{\mathrm{d}\mathbb{R}_{\lambda ,r}}{\mathrm{d}\mathbb{R}_{\widetilde{\lambda },\widetilde{r}}}(X)\hspace{0.1667em}\Big|\hspace{0.1667em}X_{1}=x\bigg),\]
where the subscript in the conditional expectation operator signifies the fact that it is evaluated under $\mathbb{R}_{\widetilde{\lambda },\widetilde{r}}$; see Theorem 2 on p. 245 in [23] and Corollary 2 on p. 246 there. Hence, the likelihood (in the parameter pair $(\lambda ,r)$) associated with the sample $\mathcal{Z}_{n}$ is given by
(3)
\[L_{n}(\lambda ,r)={\prod \limits_{i=1}^{n}}k_{\lambda ,r}(Z_{i}).\]
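Although it is not used below, it may help intuition to note that, by conditioning on the number of jumps, the law of Z decomposes as $e^{-\lambda }\delta _{0}$ plus an absolutely continuous part with density ${\sum _{m\ge 1}}e^{-\lambda }\frac{{\lambda }^{m}}{m!}{r}^{*m}$. The following minimal numerical sketch for $d=1$ (with an arbitrary illustrative intensity, a standard normal r, and the series truncated at thirty terms) approximates this density on a grid:
```python
import numpy as np
from scipy.stats import norm, poisson

# Illustrative sketch for d = 1: lambda and r are arbitrary choices of ours.
lam = 1.5
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

r = norm.pdf(x)              # jump size density r on the grid
conv = r.copy()              # current m-fold convolution r^{*m}
density = np.zeros_like(x)   # absolutely continuous part of the law of Z

for m in range(1, 31):       # truncate the Poisson mixture at m = 30 terms
    density += poisson.pmf(m, lam) * conv
    conv = np.convolve(conv, r, mode="same") * dx   # next convolution power r^{*(m+1)}

atom_at_zero = poisson.pmf(0, lam)       # P(Z = 0) = exp(-lambda)
print(atom_at_zero, density.sum() * dx)  # the two masses should sum to roughly one
```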
2.2 Prior
We will use the product prior $\varPi =\varPi _{1}\times \varPi _{2}$ for $(\lambda _{0},r_{0})$. The prior $\varPi _{1}$ for $\lambda _{0}$ will be assumed to be supported on the interval $[\underline{\lambda },\overline{\lambda }]$ and to possess a density $\pi _{1}$ with respect to the Lebesgue measure.
The prior for $r_{0}$ will be specified as a Dirichlet process mixture of normal densities. Namely, introduce a convolution density
(4)
\[r_{F,\varSigma }(x)=\int _{{\mathbb{R}}^{d}}\phi _{\varSigma }(x-z)\mathrm{d}F(z),\]
where F is a distribution function on ${\mathbb{R}}^{d}$, Σ is a $d\times d$ positive definite real matrix, and $\phi _{\varSigma }$ denotes the density of the centered d-dimensional normal distribution with covariance matrix Σ. Let α be a finite measure on ${\mathbb{R}}^{d}$, and let $\mathcal{D}_{\alpha }$ denote the Dirichlet process distribution with base measure α (see [11] or, alternatively, [13] for a modern overview). Recall that if $F\sim \mathcal{D}_{\alpha }$, then for any Borel-measurable partition $B_{1},\dots ,B_{k}$ of ${\mathbb{R}}^{d}$, the distribution of the vector $(F(B_{1}),\dots ,F(B_{k}))$ is the k-dimensional Dirichlet distribution with parameters $\alpha (B_{1}),\dots ,\alpha (B_{k})$. The Dirichlet process location mixture of normals prior $\varPi _{2}$ is obtained as the law of the random function $r_{F,\varSigma }$, where $F\sim \mathcal{D}_{\alpha }$ and $\varSigma \sim G$ for some prior distribution function G on the set of $d\times d$ positive definite matrices. For additional information on Dirichlet process mixtures of normal densities, see, for example, the original papers [12] and [18], or a recent paper [22] and the references therein.
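As an illustration of the prior $\varPi _{2}$ (not needed for the theory below), a single random density $r_{F,\varSigma }$ can be generated by approximating $F\sim \mathcal{D}_{\alpha }$ through a truncated stick-breaking representation. In the sketch, every concrete choice (d = 2, total mass $\alpha ({\mathbb{R}}^{d})=1$, a standard normal base measure, an inverse Wishart G, and the truncation level) is ours and purely illustrative.
```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(1)

# Illustrative choices: d = 2, alpha(R^d) = 1, base measure N(0, I), G = inverse Wishart.
d, total_mass, K = 2, 1.0, 200   # K = truncation level of the stick-breaking sum

v = rng.beta(1.0, total_mass, size=K)                      # stick-breaking proportions
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # mixture weights
z = rng.standard_normal((K, d))                            # atom locations from the base measure
Sigma = invwishart(df=d + 2, scale=np.eye(d)).rvs(random_state=rng)

def r_F_Sigma(x):
    """Evaluate the (truncated) draw r_{F,Sigma}(x) = sum_i w_i * phi_Sigma(x - z_i)."""
    Sinv = np.linalg.inv(Sigma)
    norm_const = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(Sigma))
    diff = x - z
    quad = np.einsum("kd,de,ke->k", diff, Sinv, diff)
    return float(np.sum(w * norm_const * np.exp(-0.5 * quad)))

print(r_F_Sigma(np.zeros(d)))
```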
2.3 Posterior
Let $\mathcal{R}$ denote the class of probability densities of the form (4). By Bayes’ theorem, the posterior measure of any measurable set $A\subset (0,\infty )\times \mathcal{R}$ is given by
\[\varPi (A|\mathcal{Z}_{n})=\frac{\iint _{A}L_{n}(\lambda ,r)\mathrm{d}\varPi _{1}(\lambda )\mathrm{d}\varPi _{2}(r)}{\iint L_{n}(\lambda ,r)\mathrm{d}\varPi _{1}(\lambda )\mathrm{d}\varPi _{2}(r)}.\]
The priors $\varPi _{1}$ and $\varPi _{2}$ indirectly induce the prior $\varPi =\varPi _{1}\times \varPi _{2}$ on the collection of densities $k_{\lambda ,r}$. We will use the symbol Π to signify both the prior on $(\lambda _{0},r_{0})$ and the induced prior on the density $k_{\lambda _{0},r_{0}}$. The posterior in the first case will be understood as the posterior for the pair $(\lambda _{0},r_{0})$, whereas in the second case as the posterior for the density $k_{\lambda _{0},r_{0}}$. Thus, setting $\overline{A}=\{k_{\lambda ,r}:(\lambda ,r)\in A\}$, we have
\[\varPi (\overline{A}|\mathcal{Z}_{n})=\varPi (A|\mathcal{Z}_{n}).\]
In the Bayesian paradigm, the posterior encapsulates all the inferential conclusions for the problem at hand. Once the posterior is available, one can next proceed with computation of other quantities of interest in Bayesian statistics, such as Bayes point estimates or credible sets.
2.4 Distances
The Hellinger distance $h(\mathbb{Q}_{0},\mathbb{Q}_{1})$ between two probability laws $\mathbb{Q}_{0}$ and $\mathbb{Q}_{1}$ on a measurable space $(\varOmega ,\mathfrak{F})$ is given by
\[h(\mathbb{Q}_{0},\mathbb{Q}_{1})={\bigg(\int {\big(\mathrm{d}{\mathbb{Q}_{0}^{1/2}}-\mathrm{d}{\mathbb{Q}_{1}^{1/2}}\big)}^{2}\bigg)}^{1/2}.\]
Assuming that $\mathbb{Q}_{0}\ll \mathbb{Q}_{1}$, the Kullback–Leibler divergence $\mathrm{K}(\mathbb{Q}_{0},\mathbb{Q}_{1})$ is
\[\mathrm{K}(\mathbb{Q}_{0},\mathbb{Q}_{1})=\int \log \bigg(\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}_{1}}\bigg)\mathrm{d}\mathbb{Q}_{0}.\]
We also define the V-discrepancy by
\[\mathrm{V}(\mathbb{Q}_{0},\mathbb{Q}_{1})=\int {\log }^{2}\bigg(\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}_{1}}\bigg)\mathrm{d}\mathbb{Q}_{0}.\]
In addition, for positive real numbers x and y, we put
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{K}(x,y)& \displaystyle =x\log \frac{x}{y}-x+y,\\{} \displaystyle \mathrm{V}(x,y)& \displaystyle =x{\log }^{2}\frac{x}{y},\\{} \displaystyle h(x,y)& \displaystyle =\big|\sqrt{x}-\sqrt{y}\big|.\end{array}\]
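For concreteness, these scalar discrepancies are elementary to evaluate; in the small sketch below the values 1.2 and 1.0 are arbitrary.
```python
import numpy as np

def K(x, y):
    """K(x, y) = x log(x / y) - x + y for positive numbers x, y."""
    return x * np.log(x / y) - x + y

def V(x, y):
    """V(x, y) = x log^2(x / y)."""
    return x * np.log(x / y) ** 2

def h(x, y):
    """h(x, y) = |sqrt(x) - sqrt(y)|."""
    return abs(np.sqrt(x) - np.sqrt(y))

x, y = 1.2, 1.0   # arbitrary positive numbers, e.g. two candidate intensities
print(K(x, y), V(x, y), h(x, y))   # all three are small when x is close to y
```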
Using the same symbols K, V, and h is justified as follows. Suppose that Ω is a singleton $\{\omega \}$ and consider the measures $\delta _{x}$ and $\delta _{y}$ that put masses x and y, respectively, on Ω. Then $\mathrm{K}(\delta _{x},\delta _{y})=\mathrm{K}(x,y)$, and similar equalities are valid for the V-discrepancy and the Hellinger distance (for the Kullback–Leibler divergence, this requires its standard extension to finite measures, $\mathrm{K}(\mathbb{Q}_{0},\mathbb{Q}_{1})=\int [\log (\mathrm{d}\mathbb{Q}_{0}/\mathrm{d}\mathbb{Q}_{1})\hspace{0.1667em}\mathrm{d}\mathbb{Q}_{0}-\mathrm{d}\mathbb{Q}_{0}+\mathrm{d}\mathbb{Q}_{1}]$).
2.5 Class of locally β-Hölder functions
For any $\beta \in \mathbb{R}$, by $\lfloor \beta \rfloor $ we denote the largest integer strictly smaller than β, by $\mathbb{N}$ the set of natural numbers, whereas $\mathbb{N}_{0}$ stands for the union $\mathbb{N}\cup \{0\}$. For a multiindex $k=(k_{1},\dots ,k_{d})\in {\mathbb{N}_{0}^{d}}$, we set $k_{.}={\sum _{i=1}^{d}}k_{i}$. The usual Euclidean norm of a vector $y\in {\mathbb{R}}^{d}$ is denoted by $\| y\| $.
Let $\beta >0$ and $\tau _{0}\ge 0$ be constants, and let $L:{\mathbb{R}}^{d}\to \mathbb{R}_{+}$ be a measurable function. We define the class ${\mathcal{C}}^{\beta ,L,\tau _{0}}({\mathbb{R}}^{d})$ of locally β-Hölder regular functions as the set of all functions $r:{\mathbb{R}}^{d}\to \mathbb{R}$ such that all mixed partial derivatives ${D}^{k}r$ of r up to order $k_{.}\le \lfloor \beta \rfloor $ exist and, for every k with $k_{.}=\lfloor \beta \rfloor $, satisfy
\[\big|\big({D}^{k}r\big)(x+y)-\big({D}^{k}r\big)(x)\big|\le L(x)\exp \big(\tau _{0}\| y{\| }^{2}\big)\| y{\| }^{\beta -\lfloor \beta \rfloor },\hspace{1em}x,y\in {\mathbb{R}}^{d}.\]
See p. 625 in [22] for this class of functions.
3 Main result
Define the complements of the Hellinger-type neighborhoods of $(\lambda _{0},r_{0})$ by
\[A(\varepsilon _{n},M)=\big\{(\lambda ,r):h(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})>M\varepsilon _{n}\big\},\]
where $\{\varepsilon _{n}\}$ is a sequence of positive numbers. We say that $\varepsilon _{n}$ is a posterior contraction rate if there exists a constant $M>0$ such that
\[\varPi \big(A(\varepsilon _{n},M)\hspace{0.1667em}|\hspace{0.1667em}\mathcal{Z}_{n}\big)\to 0\]
as $n\to \infty $ in ${\mathbb{Q}_{\lambda _{0},r_{0}}^{n}}$-probability.
The ε-covering number of a subset B of a metric space equipped with the metric ρ is the minimum number of ρ-balls of radius ε needed to cover it. Let $\mathcal{Q}$ be a set of CPP laws $\mathbb{Q}_{\lambda ,r}$. Furthermore, we set
(5)
\[B(\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}})=\big\{(\lambda ,r):\mathrm{K}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})\le {\varepsilon }^{2},\hspace{0.1667em}\mathrm{V}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})\le {\varepsilon }^{2}\big\}.\]
We recall the following general result on posterior contraction rates.
Theorem 1 ([14]).
Suppose that for positive sequences $\overline{\varepsilon }_{n},\widetilde{\varepsilon }_{n}\to 0$ such that $n\min ({\overline{\varepsilon }_{n}^{2}},{\widetilde{\varepsilon }_{n}^{2}})\to \infty $, constants $c_{1},c_{2},c_{3},c_{4}>0$, and sets $\mathcal{Q}_{n}\subset \mathcal{Q}$, we have
(6)
\[\log N(\overline{\varepsilon }_{n},\mathcal{Q}_{n},h)\le c_{1}n{\overline{\varepsilon }_{n}^{2}},\]
(7)
\[\varPi (\mathcal{Q}\setminus \mathcal{Q}_{n})\le c_{3}\exp \big(-(c_{2}+4)n{\widetilde{\varepsilon }_{n}^{2}}\big),\]
(8)
\[\varPi \big(B(\widetilde{\varepsilon }_{n},\mathbb{Q}_{\lambda _{0},r_{0}})\big)\ge c_{4}\exp \big(-c_{2}n{\widetilde{\varepsilon }_{n}^{2}}\big).\]
Then, for $\varepsilon _{n}=\max (\overline{\varepsilon }_{n},\widetilde{\varepsilon }_{n})$ and a constant $M>0$ large enough, we have that
\[\varPi \big(A(\varepsilon _{n},M)\hspace{0.1667em}|\hspace{0.1667em}\mathcal{Z}_{n}\big)\to 0\]
as $n\to \infty $ in ${\mathbb{Q}_{\lambda _{0},r_{0}}^{n}}$-probability, assuming that the i.i.d. observations $\{Z_{j}\}$ have been generated according to $\mathbb{Q}_{\lambda _{0},r_{0}}$.
In order to derive the posterior contraction rate in our problem, we impose the following conditions on the true parameter pair $(\lambda _{0},r_{0})$.
Assumption 1.
Denote by $(\lambda _{0},r_{0})$ the true parameter values for the compound Poisson process.
-
(i) $\lambda _{0}$ is in a compact set $[\underline{\lambda },\overline{\lambda }]\subset (0,\infty )$;
-
(ii) The true density $r_{0}$ is bounded, belongs to the set ${\mathcal{C}}^{\beta ,L,\tau _{0}}({\mathbb{R}}^{d})$, and additionally satisfies, for some $\varepsilon >0$ and all $k\in {\mathbb{N}_{0}^{d}},\hspace{0.1667em}k_{.}\le \beta $,\[\int {\bigg(\frac{L}{r_{0}}\bigg)}^{(2\beta +\varepsilon )/\beta }r_{0}<\infty ,\hspace{2em}\int {\bigg(\frac{|{D}^{k}r_{0}|}{r_{0}}\bigg)}^{(2\beta +\varepsilon )/k_{.}}r_{0}<\infty .\]Furthermore, we assume that there exist strictly positive constants $a,b,c$, and τ such that
(9)
\[r_{0}(x)\le c\exp \big(-b\| x{\| }^{\tau }\big),\hspace{1em}\| x\| >a.\]
The conditions on $r_{0}$ come from Theorem 1 in [22] and are quite reasonable. They simplify greatly when $r_{0}$ has compact support.
We also need to make some assumptions on the prior Π defined in Section 2.2.
Assumption 2.
The prior $\varPi =\varPi _{1}\times \varPi _{2}$ on $(\lambda _{0},r_{0})$ satisfies the following assumptions:
-
(i) The prior $\varPi _{1}$ on λ has a density $\pi _{1}$ (with respect to the Lebesgue measure) that is supported on the finite interval $[\underline{\lambda },\overline{\lambda }]\subset (0,\infty )$ and is such that
(10)
\[0<\underline{\pi }_{1}\le \pi _{1}(\lambda )\le \overline{\pi }_{1}<\infty ,\hspace{1em}\lambda \in [\underline{\lambda },\overline{\lambda }],\] -
(ii) The base measure α of the Dirichlet process prior $\mathcal{D}_{\alpha }$ is finite and possesses a strictly positive density on ${\mathbb{R}}^{d}$ such that for all sufficiently large $x>0$ and some strictly positive constants $a_{1},b_{1}$, and $C_{1}$,\[1-\overline{\alpha }\big({[-x,x]}^{d}\big)\le b_{1}\exp \big(-C_{1}{x}^{a_{1}}\big),\]where $\overline{\alpha }(\cdot )=\alpha (\cdot )/\alpha ({\mathbb{R}}^{d})$;
-
(iii) There exist strictly positive constants $\kappa ,a_{2}$, $a_{3}$, $a_{4}$, $a_{5}$, $b_{2}$, $b_{3}$, $b_{4}$, $C_{2}$, $C_{3}$ such that for all $x>0$ large enough,\[G\big(\varSigma :\operatorname{eig}_{d}\big({\varSigma }^{-1}\big)\ge x\big)\le b_{2}\exp \big(-C_{2}{x}^{a_{2}}\big),\]for all $x>0$ small enough,\[G\big(\varSigma :\operatorname{eig}_{1}\big({\varSigma }^{-1}\big)<x\big)\le b_{3}{x}^{a_{3}},\]and for any $0<s_{1}\le \cdots \le s_{d}$ and $t\in (0,1)$,\[G\big(\varSigma :s_{j}<\operatorname{eig}_{j}\big({\varSigma }^{-1}\big)<s_{j}(1+t),j=1,\dots ,d\big)\ge b_{4}{s_{1}^{a_{4}}}{t}^{a_{5}}\exp \big(-C_{3}{s_{d}^{\kappa /2}}\big).\]Here $\operatorname{eig}_{j}({\varSigma }^{-1})$ denotes the jth smallest eigenvalue of the matrix ${\varSigma }^{-1}$.
This assumption comes from [22, p. 626], to which we refer for an additional discussion. In particular, it is shown there that an inverse Wishart distribution (a popular prior distribution for covariance matrices) satisfies the assumptions on G with $\kappa =2$. As far as α is concerned, we can take it such that its rescaled version $\overline{\alpha }$ is a nondegenerate Gaussian distribution on ${\mathbb{R}}^{d}$.
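To get a feel for the first bound in Assumption 2(iii) in the inverse Wishart case, one can inspect the empirical tail of $\operatorname{eig}_{d}({\varSigma }^{-1})$ by Monte Carlo. The sketch below (with our arbitrary choices d = 2, five degrees of freedom, and identity scale matrix) is an informal illustration only, not a verification of the assumption.
```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(2)

# Illustrative: d = 2, G = inverse Wishart with df = 5 and identity scale.
# If Sigma ~ G, then eig_d(Sigma^{-1}) is the largest eigenvalue of Sigma^{-1}.
d, n_sim = 2, 20000
Sigmas = invwishart(df=5, scale=np.eye(d)).rvs(size=n_sim, random_state=rng)

largest = np.array([np.linalg.eigvalsh(np.linalg.inv(S)).max() for S in Sigmas])

for x in [5.0, 10.0, 20.0, 40.0]:
    # Empirical analogue of G(Sigma : eig_d(Sigma^{-1}) >= x); it should decay quickly in x.
    print(x, np.mean(largest >= x))
```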
Remark 1.
Assumption (10) requiring that the prior density $\pi _{1}$ is bounded away from zero on the interval $[\underline{\lambda },\overline{\lambda }]$ can be relaxed to allowing it to take the zero value at the end points of this interval, provided that $\lambda _{0}$ is an interior point of $[\underline{\lambda },\overline{\lambda }]$.
We now state our main result.
Theorem 2.
We conclude this section with a brief discussion of the obtained result. The logarithmic factor ${(\log n)}^{\ell }$ is negligible for practical purposes. If $\kappa =1$, then the posterior contraction rate obtained in Theorem 2 is essentially ${n}^{-\beta /(2\beta +d)}$, which is the minimax estimation rate in a number of nonparametric settings. This is arguably the minimax estimation rate in our problem as well (cf. Theorem 2.1 in [16] for a related result in the one-dimensional setting), although here we do not give a formal argument. Equally important is the fact that our result is adaptive: the posterior contraction rate in Theorem 2 is attained without the knowledge of the smoothness level β being incorporated in the construction of our prior Π. Finally, Theorem 2, in combination with Theorem 2.5 and the arguments on pp. 506–507 in [15], implies the existence of Bayesian point estimates achieving (in the frequentist sense) this convergence rate.
Remark 2.
After completion of this work, we learned about the paper [8], which deals with nonparametric Bayesian estimation of intensity functions for Aalen counting processes. Although CPPs are in some sense similar to the latter class of processes, they are not counting processes. An essential difference between our work and [8] lies in the fact that, unlike [8], ours deals with discretely observed multidimensional processes. Also, [8] uses a log-spline prior or a Dirichlet mixture of uniform densities, rather than the Dirichlet process mixture of normal densities used here.
4 Proof of Theorem 2
The proof of Theorem 2 consists in verification of the conditions in Theorem 1. The following lemma plays the key role.
Lemma 1.
The following estimates are valid:
(11)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{K}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})& \displaystyle \le \lambda _{0}\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+\mathrm{K}(\lambda _{0},\lambda ),\end{array}\]
(12)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{V}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})& \displaystyle \le 2\lambda _{0}(1+\lambda _{0})\mathrm{V}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+4\lambda _{0}\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})\\{} & \displaystyle \hspace{1em}+2\mathrm{V}(\lambda _{0},\lambda )+4\mathrm{K}(\lambda _{0},\lambda )+2\mathrm{K}{(\lambda _{0},\lambda )}^{2},\end{array}\]
(13)
\[h(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})\le \sqrt{\lambda _{0}}\hspace{0.1667em}h(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+h(\lambda _{0},\lambda ).\]
Moreover, there exists a constant $\overline{C}\in (0,\infty )$, depending on $\underline{\lambda }$ and $\overline{\lambda }$ only, such that for all $\lambda _{0},\lambda \in [\underline{\lambda },\overline{\lambda }]$,
(14)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{K}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})& \displaystyle \le \overline{C}\big(\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+|\lambda _{0}-\lambda {|}^{2}\big),\end{array}\]
(15)
\[\mathrm{V}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})\le \overline{C}\big(\mathrm{V}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+|\lambda _{0}-\lambda {|}^{2}\big),\]
(16)
\[h(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})\le \overline{C}\big(h(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+|\lambda _{0}-\lambda |\big).\]
Let $\varepsilon _{n}={n}^{-\gamma }{(\log n)}^{\ell }$ for γ and $\ell >\ell _{0}$ as in the statement of Theorem 2. Set $\overline{\varepsilon }_{n}=2\overline{C}\varepsilon _{n}$, where $\overline{C}$ is the constant from Lemma 1. We define the sieves of densities $\mathcal{F}_{n}$ as in Theorem 5 in [22]:
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathcal{F}_{n}=\Bigg\{r_{F,\varSigma }\hspace{2.5pt}\text{with}\hspace{2.5pt}F=\sum \limits_{i=1}^{\infty }\pi _{i}\delta _{z_{i}}:& \displaystyle z_{i}\in {[-\alpha _{n},\alpha _{n}]}^{d},\forall i\le I_{n};\sum \limits_{i>I_{n}}\pi _{i}<\varepsilon _{n};\\{} & \displaystyle {\sigma _{0,n}^{2}}\le \operatorname{eig}_{j}(\varSigma )<{\sigma _{0,n}^{2}}{\big(1+{\varepsilon _{n}^{2}}/d\big)}^{J_{n}}\Bigg\},\end{array}\]
where
\[I_{n}=\big\lfloor n{\varepsilon _{n}^{2}}/\log n\big\rfloor ,\hspace{2em}J_{n}={\alpha _{n}^{a_{1}}}={\sigma _{0,n}^{-2a_{2}}}=n,\]
and $a_{1}$ and $a_{2}$ are as in Assumption 2. We also put
(17)
\[\mathcal{Q}_{n}=\big\{\mathbb{Q}_{\lambda ,r}:\lambda \in [\underline{\lambda },\overline{\lambda }],r\in \mathcal{F}_{n}\big\}.\]
In [22], sieves of the type $\mathcal{F}_{n}$ are used to verify the conditions of Theorem 1 and to determine posterior contraction rates in the standard density estimation context. We will further show that these sieves also work in the case of decompounding by verifying the conditions of Theorem 1 for the sieves $\mathcal{Q}_{n}$ defined in (17).
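To make the scaling of the sieve concrete, a small numerical sketch computing $I_{n}$, $\alpha _{n}$, $\sigma _{0,n}$, and $J_{n}$ is given below; the values of $a_{1}$, $a_{2}$, γ, and ℓ used in it are arbitrary illustrative choices of ours, not quantities prescribed by the paper.
```python
import numpy as np

# Illustrative values only: a1, a2 play the role of the constants in Assumption 2,
# gamma and ell those in the rate eps_n = n^{-gamma} (log n)^{ell}.
a1, a2, gamma, ell = 2.0, 1.0, 0.3, 1.0

for n in [10**3, 10**5, 10**7]:
    eps_n = n ** (-gamma) * np.log(n) ** ell
    I_n = int(n * eps_n**2 / np.log(n))   # cap on the number of retained mixture atoms
    alpha_n = n ** (1 / a1)               # atoms are restricted to the cube [-alpha_n, alpha_n]^d
    sigma0_n = n ** (-1 / (2 * a2))       # sigma0_n^2 is the lower bound on the eigenvalues of Sigma
    J_n = n                               # exponent in the upper eigenvalue bound of the sieve
    print(n, I_n, round(alpha_n, 1), round(sigma0_n, 5), J_n)
```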
4.1 Verification of (6)
Introduce the notation
\[\overline{h}_{1}(\lambda _{1},\lambda _{2})=\overline{C}|\lambda _{1}-\lambda _{2}|,\hspace{2em}\overline{h}_{2}(r_{1},r_{2})=\overline{C}h(\mathbb{P}_{r_{1}},\mathbb{P}_{r_{2}}).\]
Let $\{\lambda _{i}\}$ be the centers of the balls from a minimal covering of $[\underline{\lambda },\overline{\lambda }]$ with $\overline{h}_{1}$-intervals of size $\overline{C}\varepsilon _{n}$. Let $\{r_{j}\}$ be centers of the balls from a minimal covering of $\mathcal{F}_{n}$ with $\overline{h}_{2}$-balls of size $\overline{C}\varepsilon _{n}$. By Lemma 1, for any $\mathbb{Q}_{\lambda ,r}\in \mathcal{Q}_{n}$,
\[h(\mathbb{Q}_{\lambda ,r},\mathbb{Q}_{\lambda _{i},r_{j}})\le \overline{h}_{1}(\lambda ,\lambda _{i})+\overline{h}_{2}(r,r_{j})\le \overline{\varepsilon }_{n}\]
by appropriate choices of i and j. Hence,
\[N(\overline{\varepsilon }_{n},\mathcal{Q}_{n},h)\le N\big(\overline{C}\varepsilon _{n},[\underline{\lambda },\overline{\lambda }],\overline{h}_{1}\big)\times N(\overline{C}\varepsilon _{n},\mathcal{F}_{n},\overline{h}_{2}),\]
and so
\[\log N(\overline{\varepsilon }_{n},\mathcal{Q}_{n},h)\le \log N\big(\overline{C}\varepsilon _{n},[\underline{\lambda },\overline{\lambda }],\overline{h}_{1}\big)+\log N(\overline{C}\varepsilon _{n},\mathcal{F}_{n},\overline{h}_{2}).\]
By Proposition 2 and Theorem 5 in [22], there exists a constant $c_{1}>0$ such that for all n large enough,
\[\log N(\overline{C}\varepsilon _{n},\mathcal{F}_{n},\overline{h}_{2})=\log N(\varepsilon _{n},\mathcal{F}_{n},h)\le c_{1}n{\varepsilon _{n}^{2}}=\frac{c_{1}}{4{\overline{C}}^{2}}n{\overline{\varepsilon }_{n}^{2}}.\]
On the other hand,
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \log N\big(\overline{C}\varepsilon _{n},[\underline{\lambda },\overline{\lambda }],\overline{h}_{1}\big)& \displaystyle =\log N\big(\varepsilon _{n},[\underline{\lambda },\overline{\lambda }],|\cdot |\big),\\{} & \displaystyle \lesssim \log \bigg(\frac{1}{\varepsilon _{n}}\bigg)\\{} & \displaystyle \lesssim \log \bigg(\frac{1}{\overline{\varepsilon }_{n}}\bigg).\end{array}\]
With our choice of $\overline{\varepsilon }_{n}$, for all n large enough, we have
\[\frac{c_{1}}{4{\overline{C}}^{2}}n{\overline{\varepsilon }_{n}^{2}}\ge \log \bigg(\frac{1}{\overline{\varepsilon }_{n}}\bigg),\]
so that for all n large enough,
\[\log N(\overline{\varepsilon }_{n},\mathcal{Q}_{n},h)\le \frac{c_{1}}{2{\overline{C}}^{2}}n{\overline{\varepsilon }_{n}^{2}}.\]
We can simply rename the constant $c_{1}/(2{\overline{C}}^{2})$ in this formula as $c_{1}$, and thus (6) is satisfied with that constant.
4.2 Verification of (7) and (8)
We first focus on (8). Introduce
\[\widetilde{B}(\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}})=\big\{(\lambda ,r):\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})\le {\varepsilon }^{2},\mathrm{V}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})\le {\varepsilon }^{2},|\lambda _{0}-\lambda |\le \varepsilon \big\}.\]
Suppose that $(\lambda ,r)\in \widetilde{B}(\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}})$. From (14) we obtain
\[\mathrm{K}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})\le \overline{C}\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+\overline{C}|\lambda -\lambda _{0}{|}^{2}\le 2\overline{C}{\varepsilon }^{2}.\]
Furthermore, using (15), we have
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{V}(\mathbb{Q}_{\lambda _{0},r_{0}},\mathbb{Q}_{\lambda ,r})& \displaystyle \le \overline{C}\mathrm{V}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+\overline{C}\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r})+\overline{C}|\lambda -\lambda _{0}{|}^{2}\le 3\overline{C}{\varepsilon }^{2}.\end{array}\]
Combination of these inequalities with the definition of the set $B(\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}})$ in (5) yields
\[\widetilde{B}(\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}})\subset B(\sqrt{3\overline{C}}\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}}).\]
Consequently,
(18)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \varPi \big(B(\sqrt{3\overline{C}}\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}})\big)& \displaystyle \ge \varPi \big(\widetilde{B}(\varepsilon ,\mathbb{Q}_{\lambda _{0},r_{0}})\big)\\{} & \displaystyle =\varPi _{1}(|\lambda _{0}-\lambda |\le \varepsilon )\\{} & \displaystyle \hspace{1em}\times \varPi _{2}\big(r_{F,\varSigma }:\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r_{F,\varSigma }})\le {\varepsilon }^{2},\hspace{0.1667em}\mathrm{V}(\mathbb{P}_{r_{0}},\mathbb{P}_{r_{F,\varSigma }})\le {\varepsilon }^{2}\big).\end{array}\]
By Assumption 2(i), for all $\varepsilon >0$ small enough,
\[\varPi _{1}(|\lambda _{0}-\lambda |\le \varepsilon )\ge \underline{\pi }_{1}\varepsilon .\]
Furthermore, Theorem 4 in [22] yields that for some $A,C>0$ and all sufficiently large n,
\[\begin{array}{l}\displaystyle \varPi _{2}\big(r_{F,\varSigma }:\mathrm{K}(\mathbb{P}_{r_{0}},\mathbb{P}_{r_{F,\varSigma }})\le A{n}^{-2\gamma }{(\log n)}^{2\ell _{0}},\mathrm{V}(\mathbb{P}_{r_{0}},\mathbb{P}_{r_{F,\varSigma }})\le A{n}^{-2\gamma }{(\log n)}^{2\ell _{0}}\big)\\{} \displaystyle \ge \exp \big(-Cn{\big\{{n}^{-\gamma }{(\log n)}^{\ell _{0}}\big\}}^{2}\big).\end{array}\]
We substitute ε with $\sqrt{A}{n}^{-\gamma }{(\log n)}^{\ell _{0}}$ and write $\widetilde{\varepsilon }_{n}=\sqrt{3A\overline{C}}{n}^{-\gamma }{(\log n)}^{\ell _{0}}$ to arrive at
\[\varPi \big(B(\widetilde{\varepsilon }_{n},\mathbb{Q}_{\lambda _{0},r_{0}})\big)\ge \underline{\pi }_{1}\sqrt{A}{n}^{-\gamma }{(\log n)}^{\ell _{0}}\times \exp \bigg(-\frac{C}{3A\overline{C}}n{\widetilde{\varepsilon }_{n}^{2}}\bigg).\]
Now, since $\gamma <\frac{1}{2}$, for all n large enough, we have
\[\underline{\pi }_{1}\sqrt{A}{n}^{-\gamma }{(\log n)}^{\ell _{0}}\ge \exp \big(-{n}^{1-2\gamma }{(\log n)}^{2\ell _{0}}\big).\]
Consequently, for all n large enough,
(19)
\[\varPi \big(B(\widetilde{\varepsilon }_{n},\mathbb{Q}_{\lambda _{0},r_{0}})\big)\ge \exp \bigg(-\bigg(\frac{C+1}{3A\overline{C}}\bigg)n{\widetilde{\varepsilon }_{n}^{2}}\bigg).\]
Choosing $c_{2}=\frac{C+1}{3A\overline{C}}$, we have verified (8) (with $c_{4}=1$).
For the verification of (7), we use the constants $c_{2}$ and $\widetilde{\varepsilon }_{n}$ as above. Note first that
\[\varPi (\mathcal{Q}\setminus \mathcal{Q}_{n})\le \varPi _{2}\big({\mathcal{F}_{n}^{c}}\big).\]
By Theorem 5 in [22] (see also p. 627 there), for some $c_{3}>0$ and any constant $c>0$, we have
\[\varPi _{2}\big({\mathcal{F}_{n}^{c}}\big)\le c_{3}\exp \big(-(c+4)n{\big\{{n}^{-\gamma }{(\log n)}^{\ell _{0}}\big\}}^{2}\big),\]
provided that n is large enough. Thus,
\[\varPi (\mathcal{Q}\setminus \mathcal{Q}_{n})\le c_{3}\exp \bigg(-\frac{c+4}{3A\overline{C}}\hspace{0.1667em}n{\widetilde{\varepsilon }_{n}^{2}}\bigg).\]
Without loss of generality, we can take the positive constant c greater than $3A\overline{C}(c_{2}+4)-4$. This gives
\[\varPi (\mathcal{Q}\setminus \mathcal{Q}_{n})\le c_{3}\exp \big(-(c_{2}+4)n{\widetilde{\varepsilon }_{n}^{2}}\big),\]
which is indeed (7).
5 Proof of Lemma 1
We start with a lemma from [7], which will be used three times in the proof of Lemma 1. Consider a probability space $(\varOmega ,\mathfrak{F},\mathbb{P})$. Let $\mathbb{P}_{0}$ be a probability measure on $(\varOmega ,\mathfrak{F})$ and assume that $\mathbb{P}_{0}\ll \mathbb{P}$ with Radon–Nikodym derivative $\zeta =\frac{\mathrm{d}\mathbb{P}_{0}}{\mathrm{d}\mathbb{P}}$. Furthermore, let $\mathfrak{G}$ be a sub-σ-algebra of $\mathfrak{F}$. The restrictions of $\mathbb{P}$ and $\mathbb{P}_{0}$ to $\mathfrak{G}$ are denoted ${\mathbb{P}^{\prime }}$ and ${\mathbb{P}^{\prime }_{0}}$, respectively. Then ${\mathbb{P}^{\prime }_{0}}\ll {\mathbb{P}^{\prime }}$ and $\frac{\mathrm{d}{\mathbb{P}^{\prime }_{0}}}{\mathrm{d}{\mathbb{P}^{\prime }}}=\mathbb{E}_{\mathbb{P}}[\zeta |\mathfrak{G}]=:{\zeta ^{\prime }}$.
Lemma 2 ([7]).
In the setting just described, for every convex function $g:[0,\infty )\to \mathbb{R}$,
\[\mathbb{E}_{{\mathbb{P}^{\prime }}}\big[g\big({\zeta ^{\prime }}\big)\big]\le \mathbb{E}_{\mathbb{P}}\big[g(\zeta )\big].\]
The proof of the lemma consists in an application of Jensen’s inequality for conditional expectations. This lemma is typically used as follows. The measures $\mathbb{P}$ and $\mathbb{P}_{0}$ are possible distributions of some random element X. If ${X^{\prime }}=T(X)$ is some measurable transformation of X, then we consider ${\mathbb{P}^{\prime }}$ and ${\mathbb{P}^{\prime }_{0}}$ as the corresponding distributions of ${X^{\prime }}$. Here T may be a projection. In the present context, we take $X=(X_{t},t\in [0,1])$ and ${X^{\prime }}=X_{1}$, and so $\mathbb{P}$ in the lemma should be taken as $\mathbb{R}=\mathbb{R}_{\lambda ,r}$ and ${\mathbb{P}^{\prime }}$ as $\mathbb{Q}=\mathbb{Q}_{\lambda ,r}$.
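In the notation above, the Jensen step reads, for a convex function g,
\[\mathbb{E}_{{\mathbb{P}^{\prime }}}\big[g\big({\zeta ^{\prime }}\big)\big]=\mathbb{E}_{\mathbb{P}}\big[g\big(\mathbb{E}_{\mathbb{P}}[\zeta |\mathfrak{G}]\big)\big]\le \mathbb{E}_{\mathbb{P}}\big[\mathbb{E}_{\mathbb{P}}\big[g(\zeta )|\mathfrak{G}\big]\big]=\mathbb{E}_{\mathbb{P}}\big[g(\zeta )\big],\]
which is precisely the inequality of Lemma 2.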
In the proof of Lemma 1, for economy of notation, a constant $c(\underline{\lambda },\overline{\lambda })$ depending on $\underline{\lambda }$ and $\overline{\lambda }$ may differ from line to line. We also abbreviate $\mathbb{Q}_{\lambda _{0},r_{0}}$ and $\mathbb{Q}_{\lambda ,r}$ to $\mathbb{Q}_{0}$ and $\mathbb{Q}$, respectively. The same convention will be used for $\mathbb{R}_{\lambda _{0},r_{0}}$, $\mathbb{R}_{\lambda ,r}$, $\mathbb{P}_{r_{0}}$, and $\mathbb{P}_{r}$.
Proof of inequalities (11) and (14).
Application of Lemma 2 with $g(x)=(x\log x)1_{\{x\ge 0\}}$ gives $\mathrm{K}(\mathbb{Q}_{0},\mathbb{Q})\le \mathrm{K}(\mathbb{R}_{0},\mathbb{R})$. Using (1) and the expression for the mean of a stochastic integral with respect to a Poisson point process (see, e.g., property 6 on p. 68 in [23]), we obtain that
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{K}(\mathbb{R}_{0},\mathbb{R})& \displaystyle =\int \log \bigg(\frac{\mathrm{d}\mathbb{R}_{0}}{\mathrm{d}\mathbb{R}}\bigg)\mathrm{d}\mathbb{R}_{0}\\{} & \displaystyle =\lambda _{0}\int \log \bigg(\frac{\lambda _{0}r_{0}}{\lambda r}\bigg)r_{0}-(\lambda _{0}-\lambda )\\{} & \displaystyle =\lambda _{0}\mathrm{K}(\mathbb{P}_{0},\mathbb{P})+\bigg(\lambda _{0}\log \bigg(\frac{\lambda _{0}}{\lambda }\bigg)-[\lambda _{0}-\lambda ]\bigg)\\{} & \displaystyle =\lambda _{0}\mathrm{K}(\mathbb{P}_{0},\mathbb{P})+\mathrm{K}(\lambda _{0},\lambda ).\end{array}\]
Now
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \lambda _{0}\log \bigg(\frac{\lambda _{0}}{\lambda }\bigg)-(\lambda _{0}-\lambda )& \displaystyle =\lambda _{0}\bigg|\log \bigg(\frac{\lambda }{\lambda _{0}}\bigg)-\bigg(\frac{\lambda }{\lambda _{0}}-1\bigg)\bigg|\\{} & \displaystyle \le c(\underline{\lambda },\overline{\lambda })|\lambda _{0}-\lambda {|}^{2},\end{array}\]
where $c(\underline{\lambda },\overline{\lambda })$ is some constant depending on $\underline{\lambda }$ and $\overline{\lambda }$; the last bound follows from a second-order Taylor expansion of the logarithm around 1, since $\lambda /\lambda _{0}$ ranges over a compact subset of $(0,\infty )$. The result follows. □
Proof of inequalities (12) and (15).
We have
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{V}(\mathbb{Q}_{0},\mathbb{Q})& \displaystyle =\mathbb{E}_{\mathbb{Q}_{0}}\bigg[{\log }^{2}\bigg(\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}\bigg)1_{\{\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}\ge 1\}}\bigg]+\mathbb{E}_{\mathbb{Q}_{0}}\bigg[{\log }^{2}\bigg(\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}\bigg)1_{\{\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}<1\}}\bigg]\\{} & \displaystyle =\mathrm{I}+\mathrm{II}.\end{array}\]
Application of Lemma 2 with $g(x)=(x{\log }^{2}(x))1_{\{x\ge 1\}}$ (which is a convex function) gives
(20)
\[\mathrm{I}\le \mathbb{E}_{\mathbb{R}_{0}}\bigg[{\log }^{2}\bigg(\frac{\mathrm{d}\mathbb{R}_{0}}{\mathrm{d}\mathbb{R}}\bigg)1_{[\frac{\mathrm{d}\mathbb{R}_{0}}{\mathrm{d}\mathbb{R}}\ge 1]}\bigg]\le \mathrm{V}(\mathbb{R}_{0},\mathbb{R}).\]
As far as $\mathrm{II}$ is concerned, for $x\ge 0$, we have the inequalities
\[x\le 2\big({e}^{x/2}-1\big),\hspace{2em}x{e}^{-x/2}\le 2\big(1-{e}^{-x/2}\big).\]
The first inequality is elementary (cf. inequality (8.5) in [15]), and the second follows from it upon multiplication by ${e}^{-x/2}$. The two inequalities together yield
\[{x}^{2}{e}^{-x}\le 4{\big(1-{e}^{-x/2}\big)}^{2},\hspace{1em}x\ge 0.\]
Applying this inequality with $x=-\log \frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}$ (which is positive on the event $\{\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}<1\}$) and taking the expectation with respect to $\mathbb{Q}$ give
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{II}& \displaystyle =\mathbb{E}_{\mathbb{Q}}\bigg[\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}{\log }^{2}\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}1_{\{\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}<1\}}\bigg]\\{} & \displaystyle \le 4\int {\bigg(\sqrt{\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}}-1\bigg)}^{2}\mathrm{d}\mathbb{Q}\\{} & \displaystyle =4{h}^{2}(\mathbb{Q}_{0},\mathbb{Q})\le 4\mathrm{K}(\mathbb{Q}_{0},\mathbb{Q}).\end{array}\]
For the final inequality, see [20], p. 62, formula (12). Combining the estimates on $\mathrm{I}$ and $\mathrm{II}$, we obtain that
(21)
\[\mathrm{V}(\mathbb{Q}_{0},\mathbb{Q})\le \mathrm{V}(\mathbb{R}_{0},\mathbb{R})+4\mathrm{K}(\mathbb{Q}_{0},\mathbb{Q}).\]
After some long and tedious calculations employing (1) and the expressions for the mean and variance of a stochastic integral with respect to a Poisson point process (see, e.g., property 6 on p. 68 in [23] and Lemma 1.1 in [17]), we get that
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{V}(\mathbb{R}_{0},\mathbb{R})& \displaystyle =\lambda _{0}\int {\bigg\{\log \bigg(\frac{\lambda _{0}}{\lambda }\bigg)+\log \bigg(\frac{r_{0}}{r}\bigg)\bigg\}}^{2}r_{0}\\{} & \displaystyle \hspace{1em}+{\lambda _{0}^{2}}{\bigg\{\int \log \bigg(\frac{r_{0}}{r}\bigg)r_{0}+\log \bigg(\frac{\lambda _{0}}{\lambda }\bigg)-\bigg(1-\frac{\lambda }{\lambda _{0}}\bigg)\bigg\}}^{2}\\{} & \displaystyle =\mathrm{III}+\mathrm{IV}.\end{array}\]
By the $c_{2}$-inequality ${(a+b)}^{2}\le 2{a}^{2}+2{b}^{2}$ we have
(22)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{III}& \displaystyle \le 2\lambda _{0}{\log }^{2}\bigg(\frac{\lambda _{0}}{\lambda }\bigg)+2\lambda _{0}\int {\log }^{2}\bigg(\frac{r_{0}}{r}\bigg)r_{0}\\{} & \displaystyle =2\mathrm{V}(\lambda _{0},\lambda )+2\lambda _{0}\mathrm{V}(\mathbb{P}_{0},\mathbb{P}),\end{array}\]
from which we deduce
(23)
\[\mathrm{III}\le c(\underline{\lambda },\overline{\lambda })|\lambda _{0}-\lambda {|}^{2}+2\overline{\lambda }\mathrm{V}(\mathbb{P}_{0},\mathbb{P})\]
for some constant $c(\underline{\lambda },\overline{\lambda })$ depending on $\underline{\lambda }$ and $\overline{\lambda }$ only. As far as $\mathrm{IV}$ is concerned, the $c_{2}$-inequality and the Cauchy–Schwarz inequality give that
(24)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathrm{IV}& \displaystyle \le 2{\lambda _{0}^{2}}{\bigg(\int \log \bigg(\frac{r_{0}}{r}\bigg)r_{0}\bigg)}^{2}+2{\lambda _{0}^{2}}{\bigg(\log \bigg(\frac{\lambda _{0}}{\lambda }\bigg)-\bigg[1-\frac{\lambda }{\lambda _{0}}\bigg]\bigg)}^{2}\\{} & \displaystyle \le 2{\lambda _{0}^{2}}\mathrm{V}(\mathbb{P}_{0},\mathbb{P})+2\mathrm{K}{(\lambda _{0},\lambda )}^{2},\end{array}\]
from which we find the upper bound
(25)
\[\mathrm{IV}\le 2{\overline{\lambda }}^{2}\mathrm{V}(\mathbb{P}_{0},\mathbb{P})+c(\underline{\lambda },\overline{\lambda })|\lambda _{0}-\lambda {|}^{2}\]
for some constant $c(\underline{\lambda },\overline{\lambda })$ depending on $\underline{\lambda }$ and $\overline{\lambda }$. Combining estimates (22) and (24) on $\mathrm{III}$ and $\mathrm{IV}$ with inequalities (21) and (11) yields (12). Similarly, the upper bounds (23) and (25), combined with (21) and (11), yield (15). □
Proof of inequalities (13) and (16).
First, note that for $g(x)={(\sqrt{x}-1)}^{2}1_{[x\ge 0]}$,
\[{h}^{2}(\mathbb{Q}_{0},\mathbb{Q})=\mathbb{E}_{\mathbb{Q}}\bigg[{\bigg(\sqrt{\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}}-1\bigg)}^{2}\bigg]=\mathbb{E}_{\mathbb{Q}}\bigg[g\bigg(\frac{\mathrm{d}\mathbb{Q}_{0}}{\mathrm{d}\mathbb{Q}}\bigg)\bigg].\]
Since g is convex, an application of Lemma 2 yields $h(\mathbb{Q}_{0},\mathbb{Q})\le h(\mathbb{R}_{0},\mathbb{R})$. Using (1) and invoking Lemma 1.5 in [17], in particular, using formula (1.30) in its statement, we get that
\[\begin{array}{r@{\hskip0pt}l}\displaystyle h(\mathbb{R}_{0},\mathbb{R})& \displaystyle \le \| \sqrt{\lambda _{0}r_{0}}-\sqrt{\lambda r}\| \\{} & \displaystyle \le \| \sqrt{\lambda _{0}r_{0}}-\sqrt{\lambda _{0}r}\| +\| \sqrt{\lambda _{0}r}-\sqrt{\lambda r}\| \\{} & \displaystyle \le \sqrt{\lambda _{0}}\| \sqrt{r_{0}}-\sqrt{r}\| +|\sqrt{\lambda _{0}}-\sqrt{\lambda }|\\{} & \displaystyle =\sqrt{\lambda _{0}}h(\mathbb{P}_{0},\mathbb{P})+h(\lambda _{0},\lambda ),\end{array}\]
where $\| \cdot \| $ denotes the ${L}^{2}$-norm. This proves (13). Furthermore, from this we obtain the obvious upper bound
\[h(\mathbb{R}_{0},\mathbb{R})\le \sqrt{\overline{\lambda }}\hspace{0.1667em}h(\mathbb{P}_{0},\mathbb{P})+\frac{1}{2\sqrt{\underline{\lambda }}}|\lambda _{0}-\lambda |,\]
which yields (16). □