1 Introduction
Let us recall the well-known relationship between the one-dimensional stochastic differential equation (SDE)
(1.1)
\[ {X_{t}^{x}}=x+{\int _{0}^{t}}b\big({X_{s}^{x}}\big)\hspace{0.1667em}ds+{\int _{0}^{t}}\sigma \big({X_{s}^{x}}\big)\hspace{0.1667em}dB_{s},\hspace{1em}{X_{0}^{x}}=x,\]
and the following parabolic partial differential equation (PDE), called the backward Kolmogorov equation, with initial condition
(1.2)
\[ \left\{\begin{array}{l}\partial _{t}u(t,x)=Au(t,x),\hspace{1em}\\{} u(0,x)=f(x),\hspace{1em}\end{array}\right.\]
where $Af=b{f^{\prime }}+\frac{1}{2}{\sigma }^{2}{f^{\prime\prime }}$ is the generator of the diffusion defined by SDE (1.1). If the coefficients $b,\sigma :\mathbb{R}\to \mathbb{R}$ and the initial function f are sufficiently “good,” then the function $u=u(t,x):=\mathbb{E}f({X_{t}^{x}})$ is a (classical) solution to PDE (1.2). From this, by Itô’s formula, it follows that the random process
\[ {M_{t}^{x}}:=u\big(T-t,{X_{t}^{x}}\big),\hspace{1em}t\in [0,T],\]
is a martingale with mean $\mathbb{E}\hspace{0.1667em}{M_{t}^{x}}=u(T,x)$ satisfying the final condition ${M_{T}^{x}}=f({X_{T}^{x}})$. This fact is essential in rigorous proofs of the convergence rates of weak approximations of SDEs. The higher the convergence rate, the more smoothness of the coefficients and of the final condition has to be assumed in order to get sufficient smoothness of the solution u to (1.2). The question of the existence of smooth classical solutions to the backward Kolmogorov equation is more complicated than it might seem at first sight. General results typically require smoothness and polynomial growth of several higher-order derivatives of the coefficients; we refer to the book by Kloeden and Platen [9], Theorem 4.8.6 on p. 153.
However, the coefficients of many SDEs used in financial mathematics are not sufficiently good, and therefore the general theory is not applicable. A classic example is the well-known Cox–Ingersoll–Ross (CIR) process [5], the solution to the SDE
(1.3)
\[ {X_{t}^{x}}=x+{\int _{0}^{t}}\theta \big(\kappa -{X_{s}^{x}}\big)\hspace{0.1667em}ds+{\int _{0}^{t}}\sigma \sqrt{{X_{s}^{x}}}\hspace{0.1667em}dB_{s},\hspace{1em}t\in [0,T],\]
with parameters $\theta ,\kappa ,\sigma >0$, $x\ge 0$, where the diffusion coefficient $\tilde{\sigma }(x)=\sigma \sqrt{x}$ has unbounded derivatives.
Alfonsi [1, Prop. 4.1], using the known expression of the transition density of the CIR process by a rather complicated function series, gave an ad hoc proof that, indeed, $u=u(t,x):=\mathbb{E}f({X_{t}^{x}})$ is a classical solution to the PDE (1.2), where
\[ Af(x)=\theta (\kappa -x){f^{\prime }}(x)+\displaystyle\frac{1}{2}{\sigma }^{2}x{f^{\prime\prime }}(x),\hspace{1em}x\ge 0,\]
is the generator of the CIR process (1.3). Moreover, he proved that if $f:\mathbb{R}_{+}\to \mathbb{R}$ is sufficiently smooth with derivatives of polynomial growth, then so is the solution u.
In this paper, in the case where the coefficients of Eq. (1.3) satisfy the condition ${\sigma }^{2}\le 4\theta \kappa $, we give another proof of this result that does not use the transition function. We believe that our approach will be applicable to a wider class of “square-root-type” processes for which an explicit form of the transition function is not known (e.g., the well-known square-root stochastic-volatility Heston process [7]). The main tools are the additivity property of CIR processes and their representation in terms of squared Bessel processes. More precisely, after a smooth time–space transformation, we use the expression of the solution to Eq. (1.3) in the form ${X_{t}^{x}}={(\sqrt{x}+B_{t})}^{2}+Y_{t}$, where Y is a squared Bessel process independent of B. The main challenge is the negative powers of x appearing in the expression of $u(t,x)=\mathbb{E}f({X_{t}^{x}})$ after differentiation with respect to $x>0$. To overcome this, we use a “symmetrization” trick (see Step 1 in the proof of Theorem 4) based on the simple fact that replacing $B_{t}$ by the “opposite” Brownian motion $\bar{B}_{t}:=-B_{t}$ does not change the distribution of ${X_{t}^{x}}$.
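To see the symmetrization trick at work, here is a minimal Monte Carlo sketch (ours, not part of the argument) for the toy case $\delta =n=2$ and $f(y)={y}^{2}$: a finite-difference estimate of $\partial _{x}\mathbb{E}f({(\sqrt{x}+B_{t})}^{2}+Y_{t})$ is compared with the symmetrized expression derived in (3.3) below, in which the $1/\sqrt{x}$ singularities cancel. All names and parameter values are ours.

```python
# Monte Carlo check (illustrative only) of the symmetrization identity (3.3)
# for n = 2, f(y) = y^2; here Y_t^1 = t*eta^2 with eta ~ N(0,1).
import numpy as np

rng = np.random.default_rng(0)
N = 10**6
t, x = 0.7, 1.3
xi = rng.standard_normal(N)    # plays the role of B_t / sqrt(t)
eta = rng.standard_normal(N)

f = lambda y: y**2
df = lambda y: 2.0 * y

def u(z):
    # u(z) = E f((sqrt(z) + xi*sqrt(t))^2 + t*eta^2), common random numbers
    return f((np.sqrt(z) + xi * np.sqrt(t))**2 + t * eta**2).mean()

h = 1e-3
lhs = (u(x + h) - u(x - h)) / (2 * h)          # finite-difference derivative

Yp = (np.sqrt(x) + xi * np.sqrt(t))**2 + t * eta**2   # Y_t^+(x) + Y_t^1
Ym = (np.sqrt(x) - xi * np.sqrt(t))**2 + t * eta**2   # Y_t^-(x) + Y_t^1
g1 = (df(Yp) - df(Ym)) / np.sqrt(x)                   # g_1(x, xi*sqrt(t), t*eta^2)
rhs = df(Yp).mean() + 0.5 * np.sqrt(t) * (xi * g1).mean()

print(lhs, rhs)   # the two estimates agree up to Monte Carlo error
```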
Both proofs, Alfonsi’s and ours, are “probabilistic.” It is natural to ask whether there are similar results with “nonprobabilistic” proofs in the literature. Equation (1.2) seems to be a very simple equation, with coefficients analytic everywhere and the diffusion nondegenerate everywhere except a single point. However, although there is a vast literature on degenerate parabolic and elliptic equations, we could find only a few related results, and they do not include the case of initial functions f from ${C_{\mathit{pol}}^{n}}(\mathbb{R}_{+})$ or ${C_{\mathit{pol}}^{\infty }}(\mathbb{R}_{+})$ (see Notation in Section 2); instead, the boundedness of f and its derivatives is assumed as a rule. For example, the general Theorem 1.1 of Feehan and Pop [6] (see also Cerrai [4]) in our particular (one-dimensional) case gives an a priori estimate of the solution in terms of the corresponding Hölder and weighted Hölder space supremum norms.
2 Preliminaries
Definition 1 ([8], Def. 6.1.2.1).
For every $\delta \ge 0$ and $x\ge 0$, the unique strong solution Y to the equation
(2.1)
\[ Y_{t}=x+\delta t+2{\int _{0}^{t}}\sqrt{Y_{s}}\hspace{0.1667em}dB_{s},\hspace{1em}t\ge 0,\]
is called a squared Bessel process with dimension δ, starting at x (BESQ${_{x}^{\delta }}$ for short). We further denote it by ${Y_{t}^{\delta }}(x)$ or ${Y}^{\delta }(t,x)$, and also ${Y_{t}^{\delta }}:={Y_{t}^{\delta }}(0)$.
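For numerical illustrations, it is convenient to use the standard fact (see, e.g., [8]) that ${Y_{t}^{\delta }}(x)$ is distributed as t times a noncentral ${\chi }^{2}$ random variable with δ degrees of freedom and noncentrality parameter $x/t$, which allows exact one-step sampling. A sketch (ours; the function name is hypothetical):

```python
# Exact one-step sampling of Y_t^delta(x) via the noncentral chi-square law.
import numpy as np
from scipy.stats import ncx2

def sample_besq(delta, x, t, size, rng=None):
    """Draw `size` independent samples of Y_t^delta(x)."""
    return t * ncx2.rvs(df=delta, nc=x / t, size=size, random_state=rng)

# Sanity check: E Y_t^delta(x) = x + delta*t.
s = sample_besq(delta=2.5, x=1.0, t=0.8, size=10**6, rng=np.random.default_rng(1))
print(s.mean(), 1.0 + 2.5 * 0.8)
```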
Lemma 1 (See [8], Section 6.1).
Let $B=({B}^{1},{B}^{2},\dots ,{B}^{n})$ be a standard n-dimensional Brownian motion, $n\in \mathbb{N}$. Then the process
\[ {R_{t}^{2}}:=\| z+B_{t}{\| }^{2}={\sum \limits_{i=1}^{n}}{\big(z_{i}+{B_{t}^{i}}\big)}^{2},\hspace{1em}t\ge 0,\]
where $z=(z_{1},\dots ,z_{n})\in {\mathbb{R}}^{n}$, coincides in distribution with ${Y_{t}^{n}}(\| z{\| }^{2})$, that is, with a BESQ${_{x}^{n}}$ random process starting at $x=\| z{\| }^{2}={\sum _{i=1}^{n}}{z_{i}^{2}}$. In particular,
(2.2)
\[ {Y_{t}^{n}}(x)\stackrel{d}{=}{\big(\sqrt{x}+{B_{t}^{1}}\big)}^{2}+{\sum \limits_{i=2}^{n}}{\big({B_{t}^{i}}\big)}^{2}\stackrel{d}{=}{(\sqrt{x}+\xi \sqrt{t})}^{2}+{Y_{t}^{n-1}},\hspace{1em}t\ge 0,\]
where ξ is a standard normal variable independent of ${Y_{t}^{n-1}}$, and $\stackrel{d}{=}$ means equality in distribution.
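As a quick illustration (ours, not from [8]), the two constructions in (2.2) can be compared by simulation; the sketch below checks the first few moments for $n=3$.

```python
# Monte Carlo comparison of the two sides of (2.2) for n = 3.
import numpy as np

rng = np.random.default_rng(1)
N, n, t, x = 10**6, 3, 0.5, 2.0

B = rng.standard_normal((N, n)) * np.sqrt(t)          # (B_t^1, ..., B_t^n)
z = np.array([np.sqrt(x)] + [0.0] * (n - 1))          # z with ||z||^2 = x
lhs = ((z + B)**2).sum(axis=1)                        # ||z + B_t||^2

xi = rng.standard_normal(N)
Yn1 = (rng.standard_normal((N, n - 1))**2 * t).sum(axis=1)   # Y_t^{n-1}
rhs = (np.sqrt(x) + xi * np.sqrt(t))**2 + Yn1

for p in (1, 2, 3):
    print(p, (lhs**p).mean(), (rhs**p).mean())        # moments agree up to MC error
```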
Lemma 2 ([8], Prop. 6.3.1.1).
The distribution of the CIR process (1.3) can be expressed in terms of a squared Bessel process as follows:
(2.3)
\[ X_{t}(x)\stackrel{d}{=}{\mathrm{e}}^{-\theta t}{Y}^{\delta }\bigg(\frac{{\sigma }^{2}}{4\theta }\big({\mathrm{e}}^{\theta t}-1\big),x\bigg),\hspace{1em}t\ge 0,\]
where $\delta =4\theta \kappa /{\sigma }^{2}$.
We will frequently use differentiation under the integral sign (in particular, under the expectation sign). Without special mention, such steps will always be justified by Lemma 3, which seems to be a folklore result; we refer to the technical report [3].
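Identity (2.3) is also easy to probe numerically. A rough sketch (ours; the Euler scheme and the parameter choice $\theta =\kappa =\sigma =1$, so that $\delta =4$ and Lemma 1 applies, are for illustration only):

```python
# Numerical check of (2.3): Euler scheme for the CIR equation (1.3) vs. the
# time-changed squared Bessel representation, sampled via Lemma 1 (delta = 4).
import numpy as np

rng = np.random.default_rng(2)
N, M, T = 10**5, 2000, 1.0
theta = kappa = sigma = 1.0
x = 1.5

# Euler scheme, truncating the argument of the square root at zero.
X = np.full(N, x)
dt = T / M
for _ in range(M):
    dB = rng.standard_normal(N) * np.sqrt(dt)
    X += theta * (kappa - X) * dt + sigma * np.sqrt(np.maximum(X, 0.0)) * dB

# Right-hand side of (2.3): exp(-theta*T) * Y^4(s, x) with
# s = sigma^2/(4*theta) * (exp(theta*T) - 1).
s = sigma**2 / (4 * theta) * (np.exp(theta * T) - 1)
xi = rng.standard_normal(N)
Y3 = (rng.standard_normal((N, 3))**2 * s).sum(axis=1)
Z = np.exp(-theta * T) * ((np.sqrt(x) + xi * np.sqrt(s))**2 + Y3)

print(X.mean(), Z.mean())            # both approximate E X_T(x)
print((X**2).mean(), (Z**2).mean())  # and the second moments
```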
Definition 2.
Let $(E,\mathcal{A},\mu )$ be a measure space. Let $X\subset {\mathbb{R}}^{k}$ be an open set, and $f:X\times E\to \mathbb{R}$ be a measurable function. The function f is said to be locally integrable in X if
\[ \underset{K}{\int }\underset{E}{\int }\big|f(x,\omega )\big|\hspace{0.1667em}\mu (d\omega )\hspace{0.1667em}dx<\infty \]
for all compact sets $K\subset X$.
Lemma 3 (Differentiation under the integral sign; see [3], Thm. 4.1).
Let $(E,\mathcal{A},\mu )$, X, and f be as in Definition 2. Suppose that f has partial derivatives $\frac{\partial f}{\partial x_{i}}(x,\omega )$ for all $(x,\omega )\in X\times E$ and that both f and $\frac{\partial f}{\partial x_{i}}$ are locally integrable in X. Then
\[ \frac{\partial }{\partial x_{i}}\underset{E}{\int }f(x,\omega )\mu (d\omega )=\underset{E}{\int }\frac{\partial }{\partial x_{i}}f(x,\omega )\mu (d\omega )\]
for almost all $x\in X$. In particular, if both sides are continuous in X, then we have equality for all $x\in X$.
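A toy illustration (ours) of Lemma 3, with $E=\mathbb{R}$, μ the standard Gaussian measure, and $f(x,\omega )=\sin (x\omega )$, for which both sides are continuous in x:

```python
# d/dx E sin(x*Z) should equal E[Z*cos(x*Z)] for Z ~ N(0,1).
import numpy as np

rng = np.random.default_rng(3)
Z = rng.standard_normal(10**6)
x, h = 0.8, 1e-4

lhs = (np.sin((x + h) * Z).mean() - np.sin((x - h) * Z).mean()) / (2 * h)
rhs = (Z * np.cos(x * Z)).mean()
print(lhs, rhs)   # agree up to finite-difference and Monte Carlo error
```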
Notation.
As usual, $\mathbb{N}$ and $\mathbb{R}$ are the sets of natural and real numbers, $\mathbb{R}_{+}:=[0,\infty )$, and $\overline{\mathbb{N}}:=\mathbb{N}\cup \{0\}$. We denote by ${C_{\mathit{pol}}^{n}}(\mathbb{R}_{+})$ the set of n times continuously differentiable functions $f:\mathbb{R}_{+}\to \mathbb{R}$ for which there exist constants $C_{i}\ge 0$ and $k_{i}\in \mathbb{N}$, $i=0,1,\dots ,n$, such that
(2.4)
\[ \big|{f}^{(i)}(x)\big|\le C_{i}\big(1+{x}^{k_{i}}\big),\hspace{1em}x\ge 0,\]
for all $i=0,1,\dots ,n$. Then, following Alfonsi [2], we say that the set of constants $\{(C_{i},k_{i})$, $i=0,1,\dots ,n\}$ is good for f. If $f\in {C_{\mathit{pol}}^{\infty }}(\mathbb{R}_{+})$, that is, $f:\mathbb{R}_{+}\to \mathbb{R}$ is infinitely differentiable and there exist constants $C_{i}\ge 0$ and $k_{i}\in \mathbb{N}$, $i\in \overline{\mathbb{N}}$, such that
(2.5)
\[ \big|{f}^{(i)}(x)\big|\le C_{i}\big(1+{x}^{k_{i}}\big),\hspace{1em}x\ge 0,\hspace{2.5pt}i\in \overline{\mathbb{N}},\]
then the sequence of constants $\{(C_{i},k_{i})$, $i\in \overline{\mathbb{N}}\}$ is said to be good for f. Finally, by $C\ge 0$ and $k\in \mathbb{N}$ we will denote constants that depend only on the good set of a function f and may vary from line to line.
3 Existence and properties of a solution to the backward Kolmogorov equation related to the CIR process
Our main result is a direct proof of the following:
Theorem 4 (cf. Alfonsi [1], Prop. 4.1).
Let $X_{t}(x)={X_{t}^{x}}$ be a CIR process with coefficients satisfying the condition ${\sigma }^{2}\le 4\theta \kappa $ and starting at $x\ge 0$. Let $f\in {C_{\mathit{pol}}^{q}}(\mathbb{R}_{+})$ for some $q\ge 4$. Then the function
\[ u(t,x):=\mathbb{E}f\big({X_{t}^{x}}\big),\hspace{1em}(t,x)\in [0,T]\times \mathbb{R}_{+},\]
is l times continuously differentiable in $x\ge 0$ and ${l^{\prime }}$ times continuously differentiable in $t\in [0,T]$ for $l,{l^{\prime }}\in \mathbb{N}$ such that $2l+4{l^{\prime }}\le q$. Moreover, there exist constants $C\ge 0$ and $k\in \mathbb{N}$, depending only on a good set $\{(C_{i},k_{i})$, $i=0,1,\dots ,q\}$ for f, such that
(3.1)
\[ \big|{\partial _{x}^{j}}{\partial _{t}^{i}}u(t,x)\big|\le C\big(1+{x}^{k}\big),\hspace{1em}x\ge 0,\hspace{2.5pt}t\in [0,T],\]
for $j=0,1,\dots ,l$, $i=0,1,\dots ,{l^{\prime }}$. In particular, $u(t,x)$ is a (classical) solution to the Kolmogorov backward equation (1.2) for $(t,x)\in [0,T]\times \mathbb{R}_{+}$.
As a consequence, if $f\in {C_{\mathit{pol}}^{\infty }}(\mathbb{R}_{+})$, then $u(t,x)$ is infinitely differentiable on $[0,T]\times \mathbb{R}_{+}$, and estimate (3.1) holds for all $i,j\in \overline{\mathbb{N}}$ with C and k depending on $(i,j)$ and a good sequence $\{(C_{i},k_{i})$, $i\in \overline{\mathbb{N}}\}$ for f.
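The statement of Theorem 4 can be probed numerically before turning to the proof. A minimal sketch (ours; it assumes $\theta =\kappa =\sigma =1$, so that $\delta =4$, and takes $f(y)={y}^{2}$) estimates u by exact sampling based on (2.3) and Lemma 1 and then checks the PDE (1.2) by finite differences:

```python
# Monte Carlo + finite-difference check that u(t,x) = E f(X_t^x) solves (1.2).
import numpy as np

rng = np.random.default_rng(4)
theta = kappa = sigma = 1.0          # delta = 4*theta*kappa/sigma^2 = 4
f = lambda y: y**2
N = 10**6
xi = rng.standard_normal(N)
chi3 = (rng.standard_normal((N, 3))**2).sum(axis=1)

def u(t, x):
    # exact sampling of X_t(x) via (2.3) and Lemma 1, common random numbers
    s = sigma**2 / (4 * theta) * (np.exp(theta * t) - 1)
    Y = (np.sqrt(x) + xi * np.sqrt(s))**2 + s * chi3
    return f(np.exp(-theta * t) * Y).mean()

t, x, h = 0.5, 1.2, 1e-3
ut = (u(t + h, x) - u(t - h, x)) / (2 * h)
ux = (u(t, x + h) - u(t, x - h)) / (2 * h)
uxx = (u(t, x + h) - 2 * u(t, x) + u(t, x - h)) / h**2

print(ut, theta * (kappa - x) * ux + 0.5 * sigma**2 * x * uxx)  # nearly equal
```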
Proof.
We first focus on the differentiability in $x\ge 0$. By Lemma 2 the process $X_{t}(x)$ can be reduced, by a space–time transformation, to the BESQ${}^{\delta }$ process ${Y_{t}^{\delta }}(x)$ with $\delta =\frac{4\theta \kappa }{{\sigma }^{2}}\ge 1$. Since only bounded smooth functions of $t\in [0,T]$ are involved in (2.3), it suffices to show estimate (3.1) for ${Y_{t}^{\delta }}(x)$, $t\in [0,\tilde{T}],$ instead of $X_{t}(x)$, $t\in [0,T]$, with $\tilde{T}=\frac{{\sigma }^{2}}{4\theta }({\mathrm{e}}^{\theta T}-1)$. With an abuse of notation, we further write T instead of $\tilde{T}$. We proceed by induction on l.
Step 1. Let $l=1$. First, suppose that $\delta =n\in \mathbb{N}$. By Lemma 1 we have
(3.2)
\[ {Y_{t}^{n}}(x)\stackrel{d}{=}{(\sqrt{x}+\xi \sqrt{t})}^{2}+{Y_{t}^{n-1}},\hspace{1em}t\in [0,T],\]
where $\xi \sim \mathcal{N}(0,1)$ is independent of ${Y_{t}^{n-1}}$ (in the case $n=1$, ${Y_{t}^{0}}:=0$). Denote
\[ {Y_{t}^{\pm }}(x):={(\sqrt{x}\pm \xi \sqrt{t})}^{2}.\]
Since the distributions of ${Y_{t}^{+}}(x)$ and ${Y_{t}^{-}}(x)$ coincide, we have
(3.3)
\[\begin{array}{r@{\hskip0pt}l}& \displaystyle \partial _{x}\mathbb{E}f\big({Y_{t}^{n}}(x)\big)\\{} & \displaystyle \hspace{1em}=\partial _{x}\mathbb{E}f\big({Y_{t}^{+}}(x)+{Y_{t}^{n-1}}\big)\\{} & \displaystyle \hspace{1em}=\frac{1}{2}\big[\partial _{x}\mathbb{E}f\big({Y_{t}^{+}}(x)+{Y_{t}^{n-1}}\big)+\partial _{x}\mathbb{E}f\big({Y_{t}^{-}}(x)+{Y_{t}^{n-1}}\big)\big]\\{} & \displaystyle \hspace{1em}=\frac{1}{2}\mathbb{E}\bigg[{f^{\prime }}\big({Y_{t}^{+}}(x)+{Y_{t}^{n-1}}\big)\bigg(1+\xi \sqrt{\frac{t}{x}}\bigg)\hspace{2em}\hspace{2em}\hspace{2em}\hspace{2em}\hspace{2em}\hspace{2em}\text{(Lemma 3)}\\{} & \displaystyle \hspace{2em}+{f^{\prime }}\big({Y_{t}^{-}}(x)+{Y_{t}^{n-1}}\big)\bigg(1-\xi \sqrt{\frac{t}{x}}\bigg)\bigg]\\{} & \displaystyle \hspace{1em}=\mathbb{E}{f^{\prime }}\big({Y_{t}^{n}}(x)\big)\\{} & \displaystyle \hspace{2em}+\frac{1}{2}\sqrt{\frac{t}{x}}\mathbb{E}\big\{\xi \big[{f^{\prime }}\big({Y_{t}^{+}}(x)+{Y_{t}^{n-1}}\big)-{f^{\prime }}\big({Y_{t}^{-}}(x)+{Y_{t}^{n-1}}\big)\big]\big\}\\{} & \displaystyle \hspace{1em}=\mathbb{E}{f^{\prime }}\big({Y_{t}^{n}}(x)\big)+\frac{1}{2}\sqrt{t}\hspace{0.1667em}\mathbb{E}\big(\xi g_{1}\big(x,\xi \sqrt{t},{Y_{t}^{n-1}}\big)\big)=:P(t,x)+R(t,x),\hspace{1em}x>0,\end{array}\]
where
\[ g_{1}(x,a,b):=\frac{{f^{\prime }}({(\sqrt{x}+a)}^{2}+b)-{f^{\prime }}({(\sqrt{x}-a)}^{2}+b)}{\sqrt{x}},\hspace{1em}x>0,\hspace{2.5pt}a\in \mathbb{R},\hspace{2.5pt}b\ge 0.\]
We now estimate $P(t,x)$ and $R(t,x)$ separately. By the well-known inequality
(3.4)
\[ {\Bigg|{\sum \limits_{i=1}^{n}}a_{i}\Bigg|}^{p}\le {n}^{p-1}{\sum \limits_{i=1}^{n}}|a_{i}{|}^{p}\hspace{1em}\text{for any}\hspace{2.5pt}n\in \mathbb{N},\hspace{2.5pt}p\ge 1,\hspace{2.5pt}a_{i}\in \mathbb{R},\hspace{2.5pt}i=1,2,\dots ,n,\]
we have the following estimates:
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathbb{E}{\big({Y_{t}^{\pm }}(x)\big)}^{p}& \displaystyle =\mathbb{E}{(\sqrt{x}\pm \xi \sqrt{t})}^{2p}\le {2}^{2p-1}\big({x}^{p}+\mathbb{E}|\xi {|}^{2p}{t}^{p}\big)\\{} & \displaystyle ={2}^{2p-1}\bigg({x}^{p}+\frac{{2}^{p}\varGamma (p+\frac{1}{2})}{\sqrt{\pi }}{t}^{p}\bigg)\\{} & \displaystyle \le C\big(1+{x}^{p}\big),\hspace{1em}x\ge 0,\hspace{2.5pt}t\in [0,T],\\{} \displaystyle \mathbb{E}{\big({Y_{t}^{n}}\big)}^{p}& \displaystyle =\mathbb{E}{\Bigg({\sum \limits_{i=1}^{n}}{\big|{B_{t}^{i}}\big|}^{2}\Bigg)}^{p}\le {n}^{p-1}{\sum \limits_{i=1}^{n}}\mathbb{E}{\big|{B_{t}^{i}}\big|}^{2p}\\{} & \displaystyle ={n}^{p}\frac{{2}^{p}\varGamma (p+\frac{1}{2})}{\sqrt{\pi }}{t}^{p}\le C,\hspace{1em}t\in [0,T],\end{array}\]
and, as a consequence,
(3.5)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \mathbb{E}{\big({Y_{t}^{n}}(x)\big)}^{p}& \displaystyle =\mathbb{E}{\big({Y_{t}^{+}}(x)+{Y_{t}^{n-1}}\big)}^{p}\le {2}^{p-1}\big(\mathbb{E}{\big({Y_{t}^{+}}(x)\big)}^{p}+\mathbb{E}{\big({Y_{t}^{n-1}}\big)}^{p}\big)\\{} & \displaystyle \le C\big(1+{x}^{p}\big),\hspace{1em}x\ge 0,\hspace{2.5pt}t\in [0,T].\end{array}\]
Now, for $P(t,x)$, we have
(3.6)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \big|P(t,x)\big|& \displaystyle \le \mathbb{E}\big|{f^{\prime }}\big({Y_{t}^{n}}(x)\big)\big|\le C_{1}\big(1+\mathbb{E}{\big({Y_{t}^{n}}(x)\big)}^{k_{1}}\big)\le C_{1}\big(1+C\big(1+{x}^{k_{1}}\big)\big)\\{} & \displaystyle \le C\big(1+{x}^{k_{1}}\big),\hspace{1em}x\ge 0,\hspace{2.5pt}t\in [0,T],\end{array}\]
where the constant C depends only on $C_{1}$, $k_{1}$, T, and n.
At this point, we need the following technical lemma, which we will prove in the Appendix.
Lemma 5.
For a function $f:\mathbb{R}_{+}\to \mathbb{R}$, define the function
\[ g(x;a,b):=\frac{f({(\sqrt{x}+a)}^{2}+b)-f({(\sqrt{x}-a)}^{2}+b)}{\sqrt{x}},\hspace{1em}x>0,\hspace{2.5pt}a\in \mathbb{R},\hspace{2.5pt}b\in \mathbb{R}_{+}.\]
If $f\in {C_{\mathit{pol}}^{q}}(\mathbb{R}_{+})$ for some $q=2l+1\in \mathbb{N}$ $(l\in \overline{\mathbb{N}})$, then the function g is extendable to a continuous function on $\mathbb{R}_{+}\times \mathbb{R}\times \mathbb{R}_{+}$ such that $g(\cdot ;a,b)\in {C_{\mathit{pol}}^{l}}(\mathbb{R}_{+})$ for all $a\in \mathbb{R}$ and $b\in \mathbb{R}_{+}$. Moreover, there exist constants $C\ge 0$ and $k\in \mathbb{N}$, depending only on a good set $\{(C_{i},k_{i})$, $i=0,1,\dots ,q\}$ for f, such that
(3.7)
\[ \big|{\partial _{x}^{j}}g(x;a,b)\big|\le C|a|\big(1+{x}^{k}+{\big|{a}^{2}+b\big|}^{k}\big),\hspace{1em}x\in \mathbb{R}_{+},\hspace{2.5pt}a\in \mathbb{R},\hspace{2.5pt}b\in \mathbb{R}_{+},\]
for all $j=0,1,\dots ,l$.
Now consider $R(t,x)$. Applying Lemma 5 with ${f^{\prime }}$ instead of f (and thus with $g_{1}$ instead of g), we have
(3.8)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \big|R(t,x)\big|& \displaystyle \le \frac{1}{2}\sqrt{t}\mathbb{E}\big|\xi g_{1}\big(x,\xi \sqrt{t},{Y_{t}^{n-1}}\big)\big|\\{} & \displaystyle \le Ct\mathbb{E}\big[{\xi }^{2}\big(1+{x}^{k_{2}}+{\big|{(\xi \sqrt{t})}^{2}+{Y_{t}^{n-1}}\big|}^{k_{2}}\big)\big]\\{} & \displaystyle \le Ct\mathbb{E}\big[{\xi }^{2}\big(1+{x}^{k_{2}}+{2}^{k_{2}-1}\big({(\xi \sqrt{t})}^{2k_{2}}+{\big({Y_{t}^{n-1}}\big)}^{k_{2}}\big)\big)\big]\\{} & \displaystyle \le C\big(1+{x}^{k_{2}}\big),\hspace{1em}x\ge 0,\hspace{2.5pt}t\in [0,T],\end{array}\]
where the constant C clearly depends only on $C_{2}$, $k_{2}$, T, and n. Combining the obtained estimates, we finally get
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \big|\partial _{x}\mathbb{E}f\big(X_{t}(x)\big)\big|& \displaystyle \le C\big(1+{x}^{k_{1}}\big)+C\big(1+{x}^{k_{2}}\big)\\{} & \displaystyle \le C\big(1+{x}^{k}\big),\hspace{1em}x\ge 0,\hspace{2.5pt}t\in [0,T],\end{array}\]
where $k=\max \{k_{1},k_{2}\}$, and the constant C depends only on $C_{1},C_{2}$, $k_{1},k_{2}$, T, and n.
Now consider the general case where $\delta \ge 1$, $\delta \notin \mathbb{N}$. Note that we consider the general case only for $l=1$ because the reasoning for higher-order derivatives is the same.
Let $n<\delta <n+1$, $n\in \mathbb{N}$. According to [8, Prop. 6.2.1.1], ${Y_{t}^{\delta }}(x)$ has the same distribution as the affine sum of two independent BESQ processes, namely,
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {Y_{t}^{\delta }}(x)& \displaystyle \stackrel{d}{=}\lambda _{1}{\widetilde{Y}_{t}^{n}}(x)+\lambda _{2}{\widehat{Y}_{t}^{n+1}}(x),\end{array}\]
where ${\widetilde{Y}_{t}^{n}}(x)$ and ${\widehat{Y}_{t}^{n+1}}(x)$ are two independent BESQ processes of dimensions n and $n+1$, respectively, starting at x, and $\lambda _{1}=n+1-\delta \in (0,1)$, $\lambda _{2}=1-\lambda _{1}=\delta -n\in (0,1)$ (so that $\delta =\lambda _{1}n+\lambda _{2}(n+1)$). Using the estimates just obtained for $\delta \in \mathbb{N}$, we have
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \partial _{x}\mathbb{E}f\big({Y_{t}^{\delta }}(x)\big)& \displaystyle =\partial _{x}\mathbb{E}f\big(\lambda _{1}{\widetilde{Y}_{t}^{n}}(x)+\lambda _{2}{\widehat{Y}_{t}^{n+1}}(x)\big)\\{} & \displaystyle =\frac{1}{2}\big[\partial _{x}\mathbb{E}f\big(\lambda _{1}\big({\widetilde{Y}_{t}^{+}}(x)+{\widetilde{Y}_{t}^{n-1}}\big)+\lambda _{2}\big({\widehat{Y}_{t}^{+}}(x)+{\widehat{Y}_{t}^{n}}\big)\big)\\{} & \displaystyle \hspace{1em}+\partial _{x}\mathbb{E}f\big(\lambda _{1}\big({\widetilde{Y}_{t}^{-}}(x)+{\widetilde{Y}_{t}^{n-1}}\big)+\lambda _{2}\big({\widehat{Y}_{t}^{-}}(x)+{\widehat{Y}_{t}^{n}}\big)\big)\big],\end{array}\]
where
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {\widetilde{Y}_{t}^{+}}(x)& \displaystyle \hspace{0.1667em}:=\hspace{0.1667em}{(\sqrt{x}+\tilde{\xi }\sqrt{t})}^{2},\hspace{2em}\hspace{-0.1667em}{\widetilde{Y}_{t}^{-}}(x)\hspace{0.1667em}:=\hspace{0.1667em}{(\sqrt{x}-\tilde{\xi }\sqrt{t})}^{2},\hspace{2em}\hspace{-0.1667em}{\widehat{Y}_{t}^{+}}(x)\hspace{0.1667em}:=\hspace{0.1667em}{(\sqrt{x}+\hat{\xi }\sqrt{t})}^{2},\\{} \displaystyle {\widehat{Y}_{t}^{-}}(x)& \displaystyle :={(\sqrt{x}-\hat{\xi }\sqrt{t})}^{2},\hspace{2em}{\widetilde{Y}_{t}^{n-1}}:={\sum \limits_{i=1}^{n-1}}{\big({\widetilde{B}_{t}^{i}}\big)}^{2},\hspace{2em}{\widehat{Y}_{t}^{n}}:={\sum \limits_{i=1}^{n}}{\big({\widehat{B}_{t}^{i}}\big)}^{2},\end{array}\]
with independent standard normal variables $\tilde{\xi }$ and $\hat{\xi }$ and standard Brownian motions ${\widetilde{B}}^{i}$ and ${\widehat{B}}^{i}$. Using again the fact that the distributions of $({\widetilde{Y}_{t}^{+}}(x),{\widehat{Y}_{t}^{+}}(x))$ and $({\widetilde{Y}_{t}^{-}}(x),{\widehat{Y}_{t}^{-}}(x))$ coincide and proceeding as in (3.3), we have
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \partial _{x}\mathbb{E}f\big({Y_{t}^{\delta }}(x)\big)& \displaystyle =\frac{1}{2}\big[\partial _{x}\mathbb{E}f\big(\lambda _{1}\big({(\sqrt{x}+\tilde{\xi }\sqrt{t})}^{2}+{\widetilde{Y}_{t}^{n-1}}\big)+\lambda _{2}\big({(\sqrt{x}+\hat{\xi }\sqrt{t})}^{2}+{\widehat{Y}_{t}^{n}}\big)\big)\\{} & \displaystyle \hspace{1em}+\partial _{x}\mathbb{E}f\big(\lambda _{1}\big({(\sqrt{x}-\tilde{\xi }\sqrt{t})}^{2}+{\widetilde{Y}_{t}^{n-1}}\big)+\lambda _{2}\big({(\sqrt{x}-\hat{\xi }\sqrt{t})}^{2}+{\widehat{Y}_{t}^{n}}\big)\big)\big]\\{} & \displaystyle =\frac{1}{2}\bigg[\mathbb{E}{f^{\prime }}\big(\lambda _{1}\big({(\sqrt{x}+\tilde{\xi }\sqrt{t})}^{2}+{\widetilde{Y}_{t}^{n-1}}\big)\\{} & \displaystyle \hspace{1em}+\lambda _{2}\big({(\sqrt{x}+\hat{\xi }\sqrt{t})}^{2}+{\widehat{Y}_{t}^{n}}\big)\big)\bigg(1+(\lambda _{1}\tilde{\xi }+\lambda _{2}\hat{\xi })\sqrt{\frac{t}{x}}\bigg)\\{} & \displaystyle \hspace{1em}+\mathbb{E}{f^{\prime }}\big(\lambda _{1}\big({(\sqrt{x}-\tilde{\xi }\sqrt{t})}^{2}+{\widetilde{Y}_{t}^{n-1}}\big)\\{} & \displaystyle \hspace{1em}+\lambda _{2}\big({(\sqrt{x}-\hat{\xi }\sqrt{t})}^{2}+{\widehat{Y}_{t}^{n}}\big)\big)\bigg(1-(\lambda _{1}\tilde{\xi }+\lambda _{2}\hat{\xi })\sqrt{\frac{t}{x}}\bigg)\bigg]\\{} & \displaystyle =\mathbb{E}{f^{\prime }}\big({Y_{t}^{\delta }}(x)\big)+\frac{\sqrt{t}}{2}\mathbb{E}\big[(\lambda _{1}\tilde{\xi }+\lambda _{2}\hat{\xi })\\{} & \displaystyle \hspace{1em}\times g_{1}\big(x,(\lambda _{1}\tilde{\xi }+\lambda _{2}\hat{\xi })\sqrt{t},\lambda _{1}\lambda _{2}{(\tilde{\xi }-\hat{\xi })}^{2}t+\lambda _{1}{\widetilde{Y}_{t}^{n-1}}+\lambda _{2}{\widehat{Y}_{t}^{n}}\big)\big]\\{} & \displaystyle =:P_{1}(t,x)+R_{1}(t,x).\end{array}\]
Combination of estimates (3.4) and (3.5) leads to the estimate
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \big|P_{1}(t,x)\big|& \displaystyle =\big|\mathbb{E}{f^{\prime }}\big({Y_{t}^{\delta }}(x)\big)\big|\le C_{1}\mathbb{E}\big(1+{\big|{Y_{t}^{\delta }}(x)\big|}^{k_{1}}\big)\\{} & \displaystyle \le C_{1}\mathbb{E}\big(1+{4}^{k_{1}-1}\big|{\lambda _{1}^{k_{1}}}\big({\big({\widetilde{Y}_{t}^{+}}(x)\big)}^{k_{1}}\\{} & \displaystyle \hspace{1em}+{\big({\widetilde{Y}_{t}^{n-1}}\big)}^{k_{1}}\big)+{\lambda _{2}^{k_{1}}}\big({\big({\widehat{Y}_{t}^{+}}(x)\big)}^{k_{1}}+{\big({\widehat{Y}_{t}^{n}}\big)}^{k_{1}}\big)\big|\big)\\{} & \displaystyle \le C\big(1+{x}^{k_{1}}\big),\hspace{1em}x\ge 0,\hspace{2.5pt}t\in [0,T],\end{array}\]
where the constant C depends only on $C_{1}$, $k_{1}$, T, and n. By Lemma 5, similarly to estimate (3.8), we have
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \big|R_{1}(t,x)\big|& \displaystyle \le C\mathbb{E}\big[{(\lambda _{1}\tilde{\xi }+\lambda _{2}\hat{\xi })}^{2}\big(1+{x}^{k}\\{} & \displaystyle \hspace{1em}+{\big|{(\lambda _{1}\tilde{\xi }+\lambda _{2}\hat{\xi })}^{2}t+\lambda _{1}\lambda _{2}{(\tilde{\xi }-\hat{\xi })}^{2}t+\lambda _{1}{\widetilde{Y}_{t}^{n-1}}+\lambda _{2}{\widehat{Y}_{t}^{n}}\big|}^{k}\big)\big]\\{} & \displaystyle \le C\big(1+{x}^{k}\big),\hspace{1em}x\ge 0,\hspace{2.5pt}t\in [0,T],\end{array}\]
where the constant C depends only on $C_{2}$, $k_{2}$, T, and n. Combining the last two estimates, we get
(3.9)
\[ \big|\partial _{x}\mathbb{E}f\big(X_{t}(x)\big)\big|\le C\big(1+{x}^{k}\big),\hspace{1em}x\ge 0,\hspace{2.5pt}t\in [0,T],\]
where $k=\max \{k_{1},k_{2}\}$, and the constant C depends only on $C_{1},C_{2}$, $k_{1},k_{2}$, and T.
Step 2. Let $l=2$. From Step 1 we have
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \partial _{x}\mathbb{E}f\big({Y_{t}^{n}}(x)\big)& \displaystyle =\mathbb{E}{f^{\prime }}\big({Y_{t}^{n}}(x)\big)+\frac{1}{2}\sqrt{t}\hspace{0.1667em}\mathbb{E}\big(\xi g_{1}\big(x,\xi \sqrt{t},{Y_{t}^{n-1}}\big)\big).\end{array}\]
Therefore,
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {\partial _{x}^{2}}\mathbb{E}f\big({Y_{t}^{n}}(x)\big)& \displaystyle =\partial _{x}\mathbb{E}{f^{\prime }}\big({Y_{t}^{n}}(x)\big)+\frac{1}{2}\sqrt{t}\hspace{0.1667em}\mathbb{E}\big(\xi \partial _{x}g_{1}\big(x,\xi \sqrt{t},{Y_{t}^{n-1}}\big)\big)\\{} & \displaystyle =:P_{2}(t,x)+R_{2}(t,x).\end{array}\]
From estimate (3.9) with f replaced by ${f^{\prime }}$ we obtain
(3.10)
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \big|P_{2}(t,x)\big|& \displaystyle \le C\big(1+{x}^{k_{3}}\big),\hspace{1em}x\ge 0,\hspace{2.5pt}t\in [0,T],\end{array}\]
where the constant C depends only on $C_{1}$, $C_{3}$, $k_{1}$, $k_{3}$, T, and n. For $R_{2}(t,x)$, applying Lemma 5 once more to $g_{1}$ instead of g, we get
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \big|R_{2}(t,x)\big|& \displaystyle \hspace{0.1667em}\le \hspace{0.1667em}\frac{1}{2}\sqrt{t}\mathbb{E}\big|\xi \partial _{x}g_{1}\big(x,\xi \sqrt{t},{Y_{t}^{n-1}}\big)\big|\hspace{0.1667em}\le \hspace{0.1667em}Ct\mathbb{E}\big({\xi }^{2}\big(1+{x}^{k}+|\xi \sqrt{t}{|}^{k}+{\big({Y_{t}^{n-1}}\big)}^{k}\big)\big)\\{} & \displaystyle \le C\big(1+{x}^{k}\big),\hspace{1em}x\ge 0,\hspace{2.5pt}t\in [0,T],\end{array}\]
where the constants C and $k\in \mathbb{N}$ depend only on $\{(C_{i},k_{i})$, $i=1,2,3,4\}$, T, and n. Combining the obtained estimates, we finally get
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \big|{\partial _{x}^{2}}\hspace{0.1667em}\mathbb{E}f\big(X_{t}(x)\big)\big|& \displaystyle \le C\big(1+{x}^{k}\big),\hspace{1em}x\ge 0,\hspace{2.5pt}t\in [0,T],\end{array}\]
where the constants C and $k\in \mathbb{N}$ depend only on $\{(C_{i},k_{i})$, $i=1,2,3,4\}$, T, and n.
Step 3. Now we may continue by induction on l. Suppose that estimate (3.1) is valid for $l=m-1$. Let us show that it is still valid for $l=m$. The arguments are similar to those in the case $m=2$ (Step 2). We have
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {\partial _{x}^{m}}\hspace{0.1667em}\mathbb{E}f\big({Y_{t}^{n}}(x)\big)& \displaystyle ={\partial _{x}^{m-1}}\mathbb{E}{f^{\prime }}\big({Y_{t}^{n}}(x)\big)+\frac{1}{2}\sqrt{t}\hspace{0.1667em}{\partial _{x}^{m-1}}\mathbb{E}\big(\xi g_{1}\big(x,\xi \sqrt{t},{Y_{t}^{n-1}}\big)\big)\\{} & \displaystyle =:P_{m}(t,x)+R_{m}(t,x).\end{array}\]
Then, similarly to estimates (3.6) and (3.10), we have
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \big|P_{m}(t,x)\big|& \displaystyle \le C\big(1+{x}^{k_{m}}\big),\end{array}\]
where the constant C depends only on $\{(C_{i},k_{i})$, $i=1,3,\dots ,2m-1\}$, T, and n. For $R_{m}(t,x)$, applying Lemma 5 to $g_{1}$ instead of g, we get
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \big|R_{m}(t,x)\big|& \displaystyle \le \frac{1}{2}\sqrt{t}\mathbb{E}\big|\xi {\partial _{x}^{m-1}}g_{1}\big(x,\xi \sqrt{t},{Y_{t}^{n-1}}\big)\big|\le C\big(1+{x}^{k}\big),\end{array}\]
where the constants C and $k\in \mathbb{N}$ depend only on $\{(C_{i},k_{i})$, $i=1,\dots ,2m\}$, T, and n. Combining the obtained estimates, we get
\[\begin{array}{r@{\hskip0pt}l}\displaystyle \big|{\partial _{x}^{m}}\mathbb{E}f\big(X_{t}(x)\big)\big|& \displaystyle \le C\big(1+{x}^{k}\big),\hspace{1em}x>0,\hspace{2.5pt}t\in [0,T],\end{array}\]
where the constants C and $k\in \mathbb{N}$ depend only on $\{(C_{i},k_{i})$, $i=1,\dots ,2m\}$, T, and n. Thus, Theorem 4 is proved for all $l\in \mathbb{N}$.
Step 4. As in Alfonsi [1, p. 28], inequality (3.1) for the derivatives with respect to t and mixed derivatives follows automatically by an induction on ${l^{\prime }}$ using that, for ${l^{\prime }}\ge 1$ such that $4{l^{\prime }}+2l\le q$,
\[\begin{array}{r@{\hskip0pt}l}\displaystyle {\partial _{x}^{l}}{\partial _{t}^{{l^{\prime }}}}u(t,x)& \displaystyle ={\partial _{x}^{l}}\bigg(\theta (\kappa -x)\partial _{x}{\partial _{t}^{{l^{\prime }}-1}}u(t,x)+\frac{{\sigma }^{2}}{2}x{\partial _{x}^{2}}{\partial _{t}^{{l^{\prime }}-1}}u(t,x)\bigg)\\{} & \displaystyle =\frac{{\sigma }^{2}}{2}x{\partial _{x}^{l+2}}{\partial _{t}^{{l^{\prime }}-1}}u(t,x)+\bigg(l\frac{{\sigma }^{2}}{2}+\theta (\kappa -x)\bigg){\partial _{x}^{l+1}}{\partial _{t}^{{l^{\prime }}-1}}u(t,x)\\{} & \displaystyle \hspace{1em}-l\theta {\partial _{x}^{l}}{\partial _{t}^{{l^{\prime }}-1}}u(t,x).\end{array}\]
□
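As a side check (ours), the Leibniz-rule identity behind the last display of Step 4 is easy to verify symbolically for small l; below, v stands for ${\partial _{t}^{{l^{\prime }}-1}}u$.

```python
# Symbolic verification of the differentiation identity used in Step 4.
import sympy as sp

t, x, theta, kappa, sigma = sp.symbols('t x theta kappa sigma')
v = sp.Function('v')(t, x)           # stands for partial_t^{l'-1} u
Av = theta * (kappa - x) * sp.diff(v, x) \
     + sp.Rational(1, 2) * sigma**2 * x * sp.diff(v, x, 2)

for l in range(1, 4):
    lhs = sp.diff(Av, x, l)
    rhs = (sp.Rational(1, 2) * sigma**2 * x * sp.diff(v, x, l + 2)
           + (l * sp.Rational(1, 2) * sigma**2 + theta * (kappa - x)) * sp.diff(v, x, l + 1)
           - l * theta * sp.diff(v, x, l))
    print(l, sp.expand(lhs - rhs) == 0)   # prints True for each l
```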